Re: [RFC PATCH] powerpc/powernv: recheck lock bit in core_idle_lock_held after lwarx
> On 25 Oct 2016, at 12:15, Paul Mackerras wrote:
>
> On Tue, Oct 25, 2016 at 11:24:34AM +0800, Li Zhong wrote:
>> The core_idle_lock_held loop spins while the lock bit is held by others.
>> However, it is possible that after the lock bit is cleared, someone
>> else sees it first and sets the lock bit again. Then lwarx loads a
>> value with the lock bit set, and the lock bit may be cleared by the
>> following stwcx., while the first holder may still be executing in the
>> critical section.
>>
>> This patch rechecks the lock bit after lwarx, and goes back to the
>> loop if it is set.
>
> You're quite correct, in fact I posted almost exactly the same patch a
> few days ago... See http://patchwork.ozlabs.org/patch/684963/.

Forgot to check the list before sending…

One minor difference: maybe we can use bne- for the recheck :)

Thanks, Zhong

> Thanks,
> Paul.
[RFC PATCH] powerpc/powernv: recheck lock bit in core_idle_lock_held after lwarx
The core_idle_lock_held loop spins while the lock bit is held by others.
However, it is possible that after the lock bit is cleared, someone else
sees it first and sets the lock bit again. Then lwarx loads a value with
the lock bit set, and the lock bit may be cleared by the following
stwcx., while the first holder may still be executing in the critical
section.

This patch rechecks the lock bit after lwarx, and goes back to the loop
if it is set.

Signed-off-by: Li Zhong
---
 arch/powerpc/kernel/idle_book3s.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index bd739fe..ce07b3f 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -99,6 +99,8 @@ core_idle_lock_held:
 	bne	3b
 	HMT_MEDIUM
 	lwarx	r15,0,r14
+	andi.	r9,r15,PNV_CORE_IDLE_LOCK_BIT
+	bne-	core_idle_lock_held
 	blr

 /*
--
1.9.1
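The window the patch closes is easier to see in C. Below is a sketch, not the kernel code (which is powerpc assembly): a compare-and-swap stands in for the lwarx/stwcx. pair, and the bit position and function name are made up for illustration. The point is the recheck between the load and the store attempt: if the freshly loaded word already has the lock bit set, we must go back to spinning instead of storing.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit position, for illustration only. */
#define CORE_IDLE_LOCK_BIT (1u << 8)

/*
 * Hypothetical C model of the fixed sequence: reload the state word
 * (lwarx), recheck the lock bit, and only then try the conditional
 * store (stwcx., modeled here with a compare-and-swap).
 */
static bool try_set_lock(unsigned int *state)
{
	unsigned int old, new;

	for (;;) {
		old = *state;			/* lwarx */
		if (old & CORE_IDLE_LOCK_BIT)	/* recheck after the load */
			return false;		/* back to the spin loop */
		new = old | CORE_IDLE_LOCK_BIT;
		if (__atomic_compare_exchange_n(state, &old, new, false,
						__ATOMIC_ACQUIRE,
						__ATOMIC_RELAXED))
			return true;		/* stwcx. succeeded */
	}
}
```

Without the recheck, a caller whose load observed the lock bit set could still complete its store and clobber a lock whose owner is mid-critical-section.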
Re: [RFC PATCH 1/2] KVM: PPC: divide the ics lock into smaller ones for each irq
> On Jun 20, 2016, at 13:25, Paul Mackerras wrote:
>
> On Mon, May 16, 2016 at 02:58:18PM +0800, Li Zhong wrote:
>> This patch tries to use smaller locks for each irq in the ics, instead
>> of a lock at the ics level, to provide better scalability.
>
> This looks like a worth-while thing to do. Do you have any
> performance measurements to justify the change? This will increase
> the size of struct kvmppc_ics by 4kB, so it would be useful to show
> the performance increase that justifies it.

Actually, I saw some "improvement" because the vcpus were not bound, and
the io jobs and irqs on the guest were not bound either. After I fixed
those random factors, the result became stable, but I couldn't see any
obvious improvement from the patches... Maybe I need to find some other
test cases that could support this change.

> Also, when you resend the patch, please make the patch description
> more definite - say "With this patch, we use" rather than "this patch
> tries to use", for instance.

OK, I will change that when doing a resend, if I can find some workload
that could benefit from this change.

Thanks, Zhong

> Regards,
> Paul.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH 2/2] KVM: PPC: Don't take lock when checking irq's resend flag
> On Jun 20, 2016, at 13:27, Paul Mackerras wrote:
>
> On Mon, May 16, 2016 at 03:02:13PM +0800, Li Zhong wrote:
>> It seems that we don't need to take the lock before evaluating the
>> irq's resend flag. What is needed is to make sure that when we clear
>> the ics's bit in the icp's resend_map, we don't miss the resend flag
>> of the irqs that set the bit.
>>
>> This can be ordered through the barrier in test_and_clear_bit(), and
>> a newly added wmb when setting the irq's resend flag and the icp's
>> resend_map.
>
> This looks fine to me. Is there a measurable performance improvement
> from this? I understand it could be hard to measure.
>
> Also, you could make the patch description more definite - just say
> that we don't need to take the lock, there's no need for "seems".

OK :)

However, we may need to ignore this one for now. To implement the P/Q
stuff, we probably need to make sure the resend irqs are resent only
once. It's easier to ensure that with the lock here, and the resend flag
can be cleared inside the lock.

Thanks, Zhong

> Paul.
[RFC PATCH 2/2] KVM: PPC: Don't take lock when checking irq's resend flag
We don't need to take the lock before evaluating the irq's resend flag.
What is needed is to make sure that when we clear the ics's bit in the
icp's resend_map, we don't miss the resend flag of the irqs that set the
bit.

This is ordered through the barrier in test_and_clear_bit(), and a newly
added wmb between setting the irq's resend flag and setting the icp's
resend_map bit.

Signed-off-by: Li Zhong
---
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 14 +++---
 arch/powerpc/kvm/book3s_xics.c | 22 +++---
 2 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index c3f604e..c3fa386 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -39,14 +39,9 @@ static void ics_rm_check_resend(struct kvmppc_xics *xics,
 	for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
 		struct ics_irq_state *state = &ics->irq_state[i];

-		arch_spin_lock(&state->lock);
-
-		if (!state->resend) {
-			arch_spin_unlock(&state->lock);
+		if (!state->resend)
 			continue;
-		}

-		arch_spin_unlock(&state->lock);
 		icp_rm_deliver_irq(xics, icp, state->number);
 	}
 }
@@ -374,8 +369,13 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 	 * We failed to deliver the interrupt we need to set the
 	 * resend map bit and mark the ICS state as needing a resend
 	 */
-	set_bit(ics->icsid, icp->resend_map);
 	state->resend = 1;
+	/*
+	 * Make sure when checking resend, we don't miss the resend if
+	 * resend_map bit is seen and cleared.
+	 */
+	smp_wmb();
+	set_bit(ics->icsid, icp->resend_map);

 	/*
 	 * If the need_resend flag got cleared in the ICP some time
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index b9dbf75..420c4fc 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -110,30 +110,17 @@ static void ics_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
 {
 	int i;

-	unsigned long flags;
-
-	local_irq_save(flags);
-
 	for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
 		struct ics_irq_state *state = &ics->irq_state[i];

-		arch_spin_lock(&state->lock);
-
-		if (!state->resend) {
-			arch_spin_unlock(&state->lock);
+		if (!state->resend)
 			continue;
-		}

 		XICS_DBG("resend %#x prio %#x\n", state->number,
 			      state->priority);

-		arch_spin_unlock(&state->lock);
-		local_irq_restore(flags);
 		icp_deliver_irq(xics, icp, state->number);
-		local_irq_save(flags);
 	}
-
-	local_irq_restore(flags);
 }

 static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
@@ -474,8 +461,13 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 	 * We failed to deliver the interrupt we need to set the
 	 * resend map bit and mark the ICS state as needing a resend
 	 */
-	set_bit(ics->icsid, icp->resend_map);
 	state->resend = 1;
+	/*
+	 * Make sure when checking resend, we don't miss the resend if
+	 * resend_map bit is seen and cleared.
+	 */
+	smp_wmb();
+	set_bit(ics->icsid, icp->resend_map);

 	/*
 	 * If the need_resend flag got cleared in the ICP some time
--
1.8.3.1
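The ordering the patch relies on can be sketched outside the kernel. This is an illustrative model, not the kernel code: the names mirror the patch, and the barriers are expressed as `__atomic` operations so the sketch is runnable single-threaded. The producer publishes `state->resend = 1` before setting the `resend_map` bit; the consumer's `test_and_clear_bit()` is a full barrier, so once it observes and clears the map bit, the earlier flag store must also be visible — which is why no per-irq lock is needed on the check side.

```c
#include <assert.h>

static unsigned int resend_map;	/* one bit per ics, as in the icp */
static int irq_resend;		/* state->resend for one irq */

/* Producer side: the patch's store + smp_wmb() + set_bit() sequence. */
static void mark_for_resend(int icsid)
{
	irq_resend = 1;				 /* state->resend = 1 */
	__atomic_thread_fence(__ATOMIC_RELEASE); /* the added smp_wmb() */
	resend_map |= 1u << icsid;		 /* set_bit(icsid, map) */
}

/*
 * Consumer side: the atomic fetch-and models test_and_clear_bit(),
 * which orders the flag read after the bit clear.
 */
static int check_resend(int icsid)
{
	unsigned int bit = 1u << icsid;

	if (!(__atomic_fetch_and(&resend_map, ~bit, __ATOMIC_ACQ_REL) & bit))
		return 0;	/* map bit was not set: nothing to resend */
	return irq_resend;	/* guaranteed to see the flag */
}
```

If the two producer stores were in the other order (map bit first, flag second), the consumer could clear the map bit and still read `irq_resend == 0`, losing the resend — the exact race the `smp_wmb()` prevents.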
[RFC PATCH 1/2] KVM: PPC: divide the ics lock into smaller ones for each irq
With this patch, we use smaller locks for each irq in the ics, instead
of a lock at the ics level, to provide better scalability.

Signed-off-by: Li Zhong
---
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 24 ++---
 arch/powerpc/kvm/book3s_xics.c | 41 +++-
 arch/powerpc/kvm/book3s_xics.h | 2 +-
 3 files changed, 34 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 980d8a6..c3f604e 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -36,20 +36,19 @@ static void ics_rm_check_resend(struct kvmppc_xics *xics,
 {
 	int i;

-	arch_spin_lock(&ics->lock);
-
 	for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
 		struct ics_irq_state *state = &ics->irq_state[i];

-		if (!state->resend)
+		arch_spin_lock(&state->lock);
+
+		if (!state->resend) {
+			arch_spin_unlock(&state->lock);
 			continue;
+		}

-		arch_spin_unlock(&ics->lock);
+		arch_spin_unlock(&state->lock);
 		icp_rm_deliver_irq(xics, icp, state->number);
-		arch_spin_lock(&ics->lock);
 	}
-
-	arch_spin_unlock(&ics->lock);
 }

 /* -- ICP routines -- */
@@ -310,8 +309,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 	}
 	state = &ics->irq_state[src];

-	/* Get a lock on the ICS */
-	arch_spin_lock(&ics->lock);
+	arch_spin_lock(&state->lock);

 	/* Get our server */
 	if (!icp || state->server != icp->server_num) {
@@ -367,7 +365,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 		 * Delivery was successful, did we reject somebody else ?
 		 */
 		if (reject && reject != XICS_IPI) {
-			arch_spin_unlock(&ics->lock);
+			arch_spin_unlock(&state->lock);
 			new_irq = reject;
 			goto again;
 		}
@@ -387,12 +385,12 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 		 */
 		smp_mb();
 		if (!icp->state.need_resend) {
-			arch_spin_unlock(&ics->lock);
+			arch_spin_unlock(&state->lock);
 			goto again;
 		}
 	}
- out:
-	arch_spin_unlock(&ics->lock);
+out:
+	arch_spin_unlock(&state->lock);
 }

 static void icp_rm_down_cppr(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 46871d5..b9dbf75 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -113,25 +113,26 @@ static void ics_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
 	unsigned long flags;

 	local_irq_save(flags);
-	arch_spin_lock(&ics->lock);

 	for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
 		struct ics_irq_state *state = &ics->irq_state[i];

-		if (!state->resend)
+		arch_spin_lock(&state->lock);
+
+		if (!state->resend) {
+			arch_spin_unlock(&state->lock);
 			continue;
+		}

 		XICS_DBG("resend %#x prio %#x\n", state->number,
 			      state->priority);

-		arch_spin_unlock(&ics->lock);
+		arch_spin_unlock(&state->lock);
 		local_irq_restore(flags);
 		icp_deliver_irq(xics, icp, state->number);
 		local_irq_save(flags);
-		arch_spin_lock(&ics->lock);
 	}

-	arch_spin_unlock(&ics->lock);
 	local_irq_restore(flags);
 }

@@ -143,7 +144,7 @@ static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
 	unsigned long flags;

 	local_irq_save(flags);
-	arch_spin_lock(&ics->lock);
+	arch_spin_lock(&state->lock);

 	state->server = server;
 	state->priority = priority;
@@ -154,7 +155,7 @@ static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
 		deliver = true;
 	}

-	arch_spin_unlock(&ics->lock);
+	arch_spin_unlock(&state->lock);
 	local_irq_restore(flags);

 	return deliver;
@@ -207,10 +208,10 @@ int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority)
 	state = &ics->irq_state[src];

 	local_irq_save(flags);
-	arch_spin_l
[PATCH v3 powerpc] perf/hv-24x7 fail 24x7 initcall if create_events_from_catalog() fails
As Michael pointed out, create_events_from_catalog() fails when we
either have:
 - a kernel bug
 - some sort of hypervisor misconfiguration
 - ENOMEM

In all the above cases, we can also fail the 24x7 initcall. For
hypervisor errors, EIO is used so that something is reported in dmesg.

Signed-off-by: Li Zhong
---
 arch/powerpc/perf/hv-24x7.c | 37 +
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 9445a82..ead1420 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -637,7 +637,7 @@ static ssize_t catalog_event_len_validate(struct hv_24x7_event_data *event,

 #define MAX_4K (SIZE_MAX / 4096)

-static void create_events_from_catalog(struct attribute ***events_,
+static int create_events_from_catalog(struct attribute ***events_,
 		struct attribute ***event_descs_,
 		struct attribute ***event_long_descs_)
 {
@@ -655,19 +655,25 @@ static void create_events_from_catalog(struct attribute ***events_,
 	void *event_data, *end;
 	struct hv_24x7_event_data *event;
 	struct rb_root ev_uniq = RB_ROOT;
+	int ret = 0;

-	if (!page)
+	if (!page) {
+		ret = -ENOMEM;
 		goto e_out;
+	}

 	hret = h_get_24x7_catalog_page(page, 0, 0);
-	if (hret)
+	if (hret) {
+		ret = -EIO;
 		goto e_free;
+	}

 	catalog_version_num = be64_to_cpu(page_0->version);
 	catalog_page_len = be32_to_cpu(page_0->length);

 	if (MAX_4K < catalog_page_len) {
 		pr_err("invalid page count: %zu\n", catalog_page_len);
+		ret = -EIO;
 		goto e_free;
 	}

@@ -686,6 +692,7 @@ static void create_events_from_catalog(struct attribute ***events_,
 			|| (MAX_4K - event_data_offs < event_data_len)) {
 		pr_err("invalid event data offs %zu and/or len %zu\n",
 				event_data_offs, event_data_len);
+		ret = -EIO;
 		goto e_free;
 	}

@@ -694,12 +701,14 @@ static void create_events_from_catalog(struct attribute ***events_,
 				event_data_offs,
 				event_data_offs + event_data_len,
 				catalog_page_len);
+		ret = -EIO;
 		goto e_free;
 	}

 	if (SIZE_MAX / MAX_EVENTS_PER_EVENT_DATA - 1 < event_entry_count) {
 		pr_err("event_entry_count %zu is invalid\n",
 				event_entry_count);
+		ret = -EIO;
 		goto e_free;
 	}

@@ -712,6 +721,7 @@ static void create_events_from_catalog(struct attribute ***events_,
 	event_data = vmalloc(event_data_bytes);
 	if (!event_data) {
 		pr_err("could not allocate event data\n");
+		ret = -ENOMEM;
 		goto e_free;
 	}

@@ -731,6 +741,7 @@ static void create_events_from_catalog(struct attribute ***events_,
 		if (hret) {
 			pr_err("failed to get event data in page %zu\n",
 					i + event_data_offs);
+			ret = -EIO;
 			goto e_event_data;
 		}
 	}
@@ -778,18 +789,24 @@ static void create_events_from_catalog(struct attribute ***events_,
 			event_idx_last, event_entry_count, junk_events);

 	events = kmalloc_array(attr_max + 1, sizeof(*events), GFP_KERNEL);
-	if (!events)
+	if (!events) {
+		ret = -ENOMEM;
 		goto e_event_data;
+	}

 	event_descs = kmalloc_array(event_idx + 1, sizeof(*event_descs),
 				GFP_KERNEL);
-	if (!event_descs)
+	if (!event_descs) {
+		ret = -ENOMEM;
 		goto e_event_attrs;
+	}

 	event_long_descs = kmalloc_array(event_idx + 1,
 			sizeof(*event_long_descs), GFP_KERNEL);
-	if (!event_long_descs)
+	if (!event_long_descs) {
+		ret = -ENOMEM;
 		goto e_event_descs;
+	}

 	/* Iterate over the catalog filling in the attribute vector */
 	for (junk_events = 0, event_attr_ct = 0, desc_ct = 0, long_desc_ct = 0,
@@ -843,7 +860,7 @@ static void create_events_from_catalog(struct attribute ***events_,
 	*events_ = events;
 	*event_descs_ = event_descs;
 	*event_long_descs_ = event_long_descs;
-	return;
+	return 0;

 e_event_descs:
	kfree(event_descs);
@@ -857,6 +874,7 @@ e_out:
 	*events_ = NULL;
 	*event_descs_ = NULL;
 	*event_long_descs_ = NULL;
+	return ret;
 }

 static ssize_t catalog_read(struct file *filp, struct kobject *kobj,
@@ -1219
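The pattern the patch applies — each failure site picks a specific errno-style code and jumps to shared unwind labels — can be shown in a minimal user-space sketch. This is not the kernel code: `alloc_page_stub()` and `read_catalog_stub()` are hypothetical stand-ins for the allocation and hypercall steps.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Stand-ins for the real allocation and catalog-read steps. */
static void *alloc_page_stub(int fail) { return fail ? NULL : malloc(4096); }
static long read_catalog_stub(int fail) { return fail ? -1 : 0; }

/*
 * Sketch of the converted function: formerly void, now every error
 * path sets ret and unwinds through goto labels, so the initcall can
 * fail cleanly instead of registering a half-initialized PMU.
 */
static int create_events(int fail_alloc, int fail_read)
{
	int ret = 0;
	void *page = alloc_page_stub(fail_alloc);

	if (!page) {
		ret = -ENOMEM;	/* allocation failure */
		goto e_out;
	}
	if (read_catalog_stub(fail_read)) {
		ret = -EIO;	/* "hypervisor" error: visible to callers */
		goto e_free;
	}
	/* success path falls through the unwind, freeing the work page */
e_free:
	free(page);
e_out:
	return ret;
}
```

The caller can then treat any non-zero return as "fail the whole initcall", which is exactly what the patch enables.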
[RFC PATCH powerpc] perf/hv-24x7 set the attr group to NULL if events failed to be initialized
sysfs_create_groups() creates groups one by one in the attr_groups
array until a NULL entry is encountered. But if an error is seen, it
stops and removes all the groups already created:

	for (i = 0; groups[i]; i++) {
		error = sysfs_create_group(kobj, groups[i]);
		if (error) {
			while (--i >= 0)
				sysfs_remove_group(kobj, groups[i]);
			break;
		}
	}

So for the three event groups of 24x7, if they are not supported, the
above logic causes the format and interface groups to be removed as
well because of the error.

This patch moves the three event groups to the end of the attr_groups
array, and if create_events_from_catalog() fails to set their
attributes, we set their entries in attr_groups to NULL.

Signed-off-by: Li Zhong
---
 arch/powerpc/perf/hv-24x7.c | 37 -
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 9445a82..1e433f7 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -637,7 +637,7 @@ static ssize_t catalog_event_len_validate(struct hv_24x7_event_data *event,

 #define MAX_4K (SIZE_MAX / 4096)

-static void create_events_from_catalog(struct attribute ***events_,
+static int create_events_from_catalog(struct attribute ***events_,
 		struct attribute ***event_descs_,
 		struct attribute ***event_long_descs_)
 {
@@ -843,7 +843,7 @@ static void create_events_from_catalog(struct attribute ***events_,
 	*events_ = events;
 	*event_descs_ = event_descs;
 	*event_long_descs_ = event_long_descs;
-	return;
+	return 0;

 e_event_descs:
 	kfree(event_descs);
@@ -857,6 +857,7 @@ e_out:
 	*events_ = NULL;
 	*event_descs_ = NULL;
 	*event_long_descs_ = NULL;
+	return -1;
 }

 static ssize_t catalog_read(struct file *filp, struct kobject *kobj,
@@ -967,13 +968,25 @@ static struct attribute_group if_group = {
 	.attrs = if_attrs,
 };

+enum GROUP_INDEX {
+	FORMAT,
+	IF,
+	EVENT,
+	EVENT_DESC,
+	EVENT_LONG_DESC,
+	MAX
+};
+
+/*
+ * keep the attr group whose attr is set in init (may fail)
+ * at the end
+ */
 static const struct attribute_group *attr_groups[] = {
-	&format_group,
-	&event_group,
-	&event_desc_group,
-	&event_long_desc_group,
-	&if_group,
-	NULL,
+	[FORMAT] = &format_group,
+	[IF] = &if_group,
+	[EVENT] = &event_group,
+	[EVENT_DESC] = &event_desc_group,
+	[EVENT_LONG_DESC] = &event_long_desc_group
 };

 DEFINE_PER_CPU(char, hv_24x7_reqb[4096]) __aligned(4096);
@@ -1219,10 +1232,16 @@ static int hv_24x7_init(void)
 		/* sampling not supported */
 		h_24x7_pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;

-	create_events_from_catalog(&event_group.attrs,
+	r = create_events_from_catalog(&event_group.attrs,
 				&event_desc_group.attrs,
 				&event_long_desc_group.attrs);

+	if (r) {
+		h_24x7_pmu.attr_groups[EVENT] =
+		h_24x7_pmu.attr_groups[EVENT_DESC] =
+		h_24x7_pmu.attr_groups[EVENT_LONG_DESC] = 0;
+	}
+
 	r = perf_pmu_register(&h_24x7_pmu, h_24x7_pmu.name, -1);
 	if (r)
 		return r;
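Why moving the event groups to the tail works can be modeled in a few lines. This is a toy sketch, not the kernel code: the strings stand in for `struct attribute_group` pointers, and the walk imitates how group registration stops at the first NULL entry, so NULLing the trailing entries on failure silently drops just those groups while the always-valid format and interface groups survive.

```c
#include <assert.h>
#include <stddef.h>

/* Indices mirror the patch: may-fail event groups kept at the end. */
enum { FORMAT, IF, EVENT, EVENT_DESC, EVENT_LONG_DESC, MAX };

static const char *attr_groups[MAX + 1] = {
	[FORMAT]          = "format",
	[IF]              = "interface",
	[EVENT]           = "events",
	[EVENT_DESC]      = "event_descs",
	[EVENT_LONG_DESC] = "event_long_descs",
	/* [MAX] is implicitly NULL, terminating the walk */
};

/* Models the sysfs_create_groups() loop: iterate until the first NULL. */
static int groups_visited(const char **g)
{
	int n = 0;

	while (g[n])
		n++;
	return n;
}
```

With the original ordering (event groups in the middle), NULLing them would also hide every group after them — which is why the patch reorders the array instead of just clearing entries.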
Re: [powerpc] init nvram_pstore_info's buf_lock
On Tue, 2014-11-18 at 15:33 +1100, Michael Ellerman wrote:
> On Mon, 2014-17-11 at 02:52:30 UTC, Li Zhong wrote:
> > It seems nvram_pstore_info's buf_lock is not initialized before
> > registering, which causes some strange behavior when trying to obtain
> > the lock during kdump process.
>
> What kind of strange behaviour? Does it still work and just print a warning?
>
> It's static, so I'd expect it to be unlocked by default.

Yes, you are right. It still works.

I saw it on a UP configuration: I noticed the console stopped for a
couple of seconds, then a "lockup suspected" warning was printed out,
but then it continued to run.

After some further checking, it seems to be working as designed for UP
spinlock debugging, from the comments in spinlock_up.h:

 * In the debug case, 1 means unlocked, 0 means locked. (the values
 * are inverted, to catch initialization bugs)

So the try lock fails, and a lockup is reported, but then
arch_spin_lock() passes.

Thanks, Zhong

> cheers
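The inverted UP debug convention quoted above can be captured in a tiny model. This is a sketch of the idea from `spinlock_up.h`, not the kernel structures: 1 means unlocked and 0 means locked, deliberately inverted so that a zero-initialized (i.e. never `spin_lock_init()`ed) lock looks locked and trips the lockup detector.

```c
#include <assert.h>

/* Toy model of the UP debug spinlock; field name is illustrative. */
struct up_debug_spinlock { int slock; };

#define UP_SPIN_LOCK_INIT { .slock = 1 }	/* 1 == unlocked */

/*
 * Mirrors the trylock logic described in spinlock_up.h: grab the old
 * value, mark the lock taken, and succeed only if it was unlocked.
 * An uninitialized lock (slock == 0) always fails here, which is what
 * produced the "lockup suspected" warning in the report above.
 */
static int up_trylock(struct up_debug_spinlock *l)
{
	int oldval = l->slock;

	l->slock = 0;		/* 0 == locked */
	return oldval > 0;
}
```

This is exactly why the nvram patch below adds `spin_lock_init()` before `pstore_register()`: without it, the lock word is 0 and the debug code treats it as already held.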
[PATCH powerpc] init nvram_pstore_info's buf_lock
It seems nvram_pstore_info's buf_lock is not initialized before
registering, which causes some strange behavior when trying to obtain
the lock during the kdump process.

Signed-off-by: Li Zhong
---
 arch/powerpc/platforms/pseries/nvram.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/nvram.c b/arch/powerpc/platforms/pseries/nvram.c
index 11a3b61..054a0ed 100644
--- a/arch/powerpc/platforms/pseries/nvram.c
+++ b/arch/powerpc/platforms/pseries/nvram.c
@@ -715,6 +715,8 @@ static int nvram_pstore_init(void)
 	nvram_pstore_info.buf = oops_data;
 	nvram_pstore_info.bufsize = oops_data_sz;

+	spin_lock_init(&nvram_pstore_info.buf_lock);
+
 	rc = pstore_register(&nvram_pstore_info);
 	if (rc != 0)
 		pr_err("nvram: pstore_register() failed, defaults to "
Re: [RFC PATCH v3 1/3] powerpc: Fix warning reported by verify_cpu_node_mapping()
On Tue, 2014-10-14 at 15:35 +1100, Michael Ellerman wrote:
> On Tue, 2014-10-14 at 10:39 +0800, Li Zhong wrote:
> > On Tue, 2014-10-07 at 08:33 -0700, Nishanth Aravamudan wrote:
> > > On 07.10.2014 [17:28:38 +1100], Michael Ellerman wrote:
> > > > On Fri, 2014-10-03 at 16:26 -0700, Nishanth Aravamudan wrote:
> > > > > On 03.10.2014 [10:50:20 +1000], Michael Ellerman wrote:
> > > > > > On Thu, 2014-10-02 at 14:13 -0700, Nishanth Aravamudan wrote:
> > > > > > > Ben & Michael,
> > > > > > >
> > > > > > > What's the status of these patches?
> > > > > >
> > > > > > Been in my next for a week :)
> > > > > >
> > > > > > https://git.kernel.org/cgit/linux/kernel/git/mpe/linux.git/log/?h=next
> > > > >
> > > > > Ah ok, thanks -- I wasn't following your tree, my fault.
> > > >
> > > > Not really your fault, I hadn't announced my trees existence :)
> > > >
> > > > > Do we want these to go back to 3.17-stable, as they fix some
> > > > > annoying splats during boot (non-fatal afaict, though)?
> > > >
> > > > Up to you really, I don't know how often/bad they were. I haven't
> > > > added CC stable tags to the commits, so if you want them in stable
> > > > you should send them explicitly.
> > >
> > > I think they occur every boot, unconditionally, on pseries. Doesn't
> > > prevent boot, just really noisy. I think it'd be good to get them
> > > into -stable.
> > >
> > > Li Zhong, can you push them once they get sent upstream?
> >
> > I guess I only need to send the first two patches to stable?
>
> Probably. It's not clear from the changelog how serious a problem it fixes.
>
> See Documentation/stable_kernel_rules.txt

OK, I'll send the first two to stable.

Thanks, Zhong

> cheers
Re: [RFC PATCH v3 1/3] powerpc: Fix warning reported by verify_cpu_node_mapping()
On Tue, 2014-10-07 at 08:33 -0700, Nishanth Aravamudan wrote:
> On 07.10.2014 [17:28:38 +1100], Michael Ellerman wrote:
> > On Fri, 2014-10-03 at 16:26 -0700, Nishanth Aravamudan wrote:
> > > On 03.10.2014 [10:50:20 +1000], Michael Ellerman wrote:
> > > > On Thu, 2014-10-02 at 14:13 -0700, Nishanth Aravamudan wrote:
> > > > > Ben & Michael,
> > > > >
> > > > > What's the status of these patches?
> > > >
> > > > Been in my next for a week :)
> > > >
> > > > https://git.kernel.org/cgit/linux/kernel/git/mpe/linux.git/log/?h=next
> > >
> > > Ah ok, thanks -- I wasn't following your tree, my fault.
> >
> > Not really your fault, I hadn't announced my trees existence :)
> >
> > > Do we want these to go back to 3.17-stable, as they fix some annoying
> > > splats during boot (non-fatal afaict, though)?
> >
> > Up to you really, I don't know how often/bad they were. I haven't added
> > CC stable tags to the commits, so if you want them in stable you should
> > send them explicitly.
>
> I think they occur every boot, unconditionally, on pseries. Doesn't
> prevent boot, just really noisy. I think it'd be good to get them into
> -stable.
>
> Li Zhong, can you push them once they get sent upstream?

I guess I only need to send the first two patches to stable?

Thanks, Zhong

> Thanks,
> Nish
Re: [PATCH] powerpc: Reimplement __get_SP() as a function not a define
On Wed, 2014-10-01 at 15:10 +1000, Anton Blanchard wrote:
> Li Zhong points out an issue with our current __get_SP()
> implementation. If ftrace function tracing is enabled (ie -pg
> profiling using _mcount) we spill a stack frame on 64bit all the
> time.
>
> If a function calls __get_SP() and later calls a function that is
> tail call optimised, we will pop the stack frame and the value
> returned by __get_SP() is no longer valid. An example from Li can
> be found in save_stack_trace -> save_context_stack:
>
> c00432c0 <.save_stack_trace>:
> c00432c0:	mflr	r0
> c00432c4:	std	r0,16(r1)
> c00432c8:	stdu	r1,-128(r1)	<-- stack frame for _mcount
> c00432cc:	std	r3,112(r1)
> c00432d0:	bl	<._mcount>
> c00432d4:	nop
>
> c00432d8:	mr	r4,r1		<-- __get_SP()
>
> c00432dc:	ld	r5,632(r13)
> c00432e0:	ld	r3,112(r1)
> c00432e4:	li	r6,1
>
> c00432e8:	addi	r1,r1,128	<-- pop stack frame
>
> c00432ec:	ld	r0,16(r1)
> c00432f0:	mtlr	r0
> c00432f4:	b	<.save_context_stack>	<-- tail call optimized
>
> save_context_stack ends up with a stack pointer below the current
> one, and it is likely to be scribbled over.
>
> Fix this by making __get_SP() a function which returns the
> callers stack frame. Also replace inline assembly which grabs
> the stack pointer in save_stack_trace and show_stack with
> __get_SP().
>
> Reported-by: Li Zhong
> Signed-off-by: Anton Blanchard
> ---
>  arch/powerpc/include/asm/reg.h   | 3 +--
>  arch/powerpc/kernel/misc.S       | 4 ++++
>  arch/powerpc/kernel/process.c    | 2 +-
>  arch/powerpc/kernel/stacktrace.c | 2 +-
>  4 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 0c05059..0f973c0 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -1264,8 +1264,7 @@ static inline unsigned long mfvtb (void)
>
>  #define proc_trap()	asm volatile("trap")
>
> -#define __get_SP()	({unsigned long sp; \
> -	asm volatile("mr %0,1": "=r" (sp)); sp;})
> +extern unsigned long __get_SP(void);

It seems that some module code is using __get_SP, e.g. xfs in the
example below:

ERROR: ".__get_SP" [fs/xfs/xfs.ko] undefined!

Maybe we need to export this symbol in arch/powerpc/kernel/ppc_ksyms.c?

diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
index 48d17d6f..eebd4e4 100644
--- a/arch/powerpc/kernel/ppc_ksyms.c
+++ b/arch/powerpc/kernel/ppc_ksyms.c
@@ -207,3 +207,5 @@ EXPORT_SYMBOL_GPL(mmu_psize_defs);
 #ifdef CONFIG_EPAPR_PARAVIRT
 EXPORT_SYMBOL(epapr_hypercall_start);
 #endif
+
+EXPORT_SYMBOL(__get_SP);

With the above compiling error fixed, this patch solved the SP issue I
saw, so

Tested-by: Li Zhong

>
>  extern unsigned long scom970_read(unsigned int address);
>  extern void scom970_write(unsigned int address, unsigned long value);
> diff --git a/arch/powerpc/kernel/misc.S b/arch/powerpc/kernel/misc.S
> index 7ce26d4..120deb7 100644
> --- a/arch/powerpc/kernel/misc.S
> +++ b/arch/powerpc/kernel/misc.S
> @@ -114,3 +114,7 @@ _GLOBAL(longjmp)
>  	mtlr	r0
>  	mr	r3,r4
>  	blr
> +
> +_GLOBAL(__get_SP)
> +	PPC_LL	r3,0(r1)
> +	blr
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index aa1df89..3cc6439 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1545,7 +1545,7 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
>  		tsk = current;
>  	if (sp == 0) {
>  		if (tsk == current)
> -			asm("mr %0,1" : "=r" (sp));
> +			sp = __get_SP();
>  		else
>  			sp = tsk->thread.ksp;
>  	}
> diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
> index 3d30ef1..7f65bae 100644
> --- a/arch/powerpc/kernel/stacktrace.c
> +++ b/arch/powerpc/kernel/stacktrace.c
> @@ -50,7 +50,7 @@ void save_stack_trace(struct stack_trace *trace)
>  {
>  	unsigned long sp;
>
> -	asm("mr %0,1" : "=r" (sp));
> +	sp = __get_SP();
>
>  	save_context_stack(trace, sp, current, 1);
>  }
Re: [RFC PATCH v3 1/3] powerpc: Fix warning reported by verify_cpu_node_mapping()
On Tue, 2014-10-07 at 08:33 -0700, Nishanth Aravamudan wrote:
> On 07.10.2014 [17:28:38 +1100], Michael Ellerman wrote:
> > On Fri, 2014-10-03 at 16:26 -0700, Nishanth Aravamudan wrote:
> > > On 03.10.2014 [10:50:20 +1000], Michael Ellerman wrote:
> > > > On Thu, 2014-10-02 at 14:13 -0700, Nishanth Aravamudan wrote:
> > > > > Ben & Michael,
> > > > >
> > > > > What's the status of these patches?
> > > >
> > > > Been in my next for a week :)
> > > >
> > > > https://git.kernel.org/cgit/linux/kernel/git/mpe/linux.git/log/?h=next
> > >
> > > Ah ok, thanks -- I wasn't following your tree, my fault.
> >
> > Not really your fault, I hadn't announced my trees existence :)
> >
> > > Do we want these to go back to 3.17-stable, as they fix some annoying
> > > splats during boot (non-fatal afaict, though)?
> >
> > Up to you really, I don't know how often/bad they were. I haven't added
> > CC stable tags to the commits, so if you want them in stable you should
> > send them explicitly.
>
> I think they occur every boot, unconditionally, on pseries. Doesn't
> prevent boot, just really noisy. I think it'd be good to get them into
> -stable.
>
> Li Zhong, can you push them once they get sent upstream?

OK, I will send these patches to the stable mailing list after they are
merged.

Thanks, Zhong

> Thanks,
> Nish
[RFC PATCH v3 3/3] powerpc: some changes in numa_setup_cpu()
This patch changes some error handling logic in numa_setup_cpu() for
when the cpu node is not found:

If the cpu is possible, but not present, -1 is kept in
numa_cpu_lookup_table, so that later, if the cpu is added, we can set
correct numa information for it.

If the cpu is present, then we set the first online node in
numa_cpu_lookup_table instead of 0 (in case 0 might not be an online
node?).

Cc: Nishanth Aravamudan
Cc: Nathan Fontenot
Signed-off-by: Li Zhong
---
 arch/powerpc/mm/numa.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 3a9061e..ec32d46 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -538,7 +538,7 @@ static int of_drconf_to_nid_single(struct of_drconf_cell *drmem,
  */
 static int numa_setup_cpu(unsigned long lcpu)
 {
-	int nid;
+	int nid = -1;
 	struct device_node *cpu;

 	/*
@@ -555,19 +555,21 @@ static int numa_setup_cpu(unsigned long lcpu)

 	if (!cpu) {
 		WARN_ON(1);
-		nid = 0;
-		goto out;
+		if (cpu_present(lcpu))
+			goto out_present;
+		else
+			goto out;
 	}

 	nid = of_node_to_nid_single(cpu);

+out_present:
 	if (nid < 0 || !node_online(nid))
 		nid = first_online_node;
-out:
-	map_cpu_to_node(lcpu, nid);

+	map_cpu_to_node(lcpu, nid);
 	of_node_put(cpu);
-
+out:
 	return nid;
 }
--
1.9.1
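The decision tree above is small enough to model as a pure function. This is a stand-alone sketch, not the kernel code: the kernel state (whether the device-tree node was found, whether the cpu is present, whether the node is online) is passed in as flags, and the constants are illustrative.

```c
#include <assert.h>

#define NID_NONE	(-1)
#define FIRST_ONLINE	1	/* assumed first online node id */

/*
 * Model of the reworked numa_setup_cpu() logic:
 *  - node not found, cpu not present: keep -1 so a later hot-add can
 *    fill in the right node;
 *  - node not found, cpu present: fall back to the first online node;
 *  - node found: use it, unless it is invalid or offline.
 */
static int numa_setup_cpu_model(int node_found, int present, int dt_nid,
				int dt_nid_online)
{
	int nid = NID_NONE;

	if (!node_found) {
		if (!present)
			return NID_NONE;	/* possible but not present */
	} else {
		nid = dt_nid;
	}
	if (nid < 0 || !dt_nid_online)
		nid = FIRST_ONLINE;
	return nid;
}
```

The key behavioral change versus the old code is the first branch: previously a missing node always mapped the cpu to node 0, losing the chance to assign the real node at hot-add time.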
[RFC PATCH v3 2/3] powerpc: Only set numa node information for present cpus at boottime
As Nish suggested, it makes more sense to init the numa node
information for present cpus at boottime, which also avoids the
WARN_ON(1) in numa_setup_cpu().

With this change, we also need to change smp_prepare_cpus() to set up
numa information only on present cpus. For those possible, but not
present cpus, their numa information will be set up after they are
started, as the original code did before commit 2fabf084b6ad.

Cc: Nishanth Aravamudan
Cc: Nathan Fontenot
Signed-off-by: Li Zhong
---
 arch/powerpc/kernel/smp.c | 10 --
 arch/powerpc/mm/numa.c | 2 +-
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index a0738af..dc0e774 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -379,8 +379,11 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 		/*
 		 * numa_node_id() works after this.
 		 */
-		set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
-		set_cpu_numa_mem(cpu, local_memory_node(numa_cpu_lookup_table[cpu]));
+		if (cpu_present(cpu)) {
+			set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
+			set_cpu_numa_mem(cpu,
+				local_memory_node(numa_cpu_lookup_table[cpu]));
+		}
 	}

 	cpumask_set_cpu(boot_cpuid, cpu_sibling_mask(boot_cpuid));
@@ -728,6 +731,9 @@ void start_secondary(void *unused)
 	}
 	traverse_core_siblings(cpu, true);

+	set_numa_node(numa_cpu_lookup_table[cpu]);
+	set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
+
 	smp_wmb();
 	notify_cpu_starting(cpu);
 	set_cpu_online(cpu, true);
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 9918c02..3a9061e 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1127,7 +1127,7 @@ void __init do_init_bootmem(void)
 	 * even before we online them, so that we can use cpu_to_{node,mem}
 	 * early in boot, cf. smp_prepare_cpus().
 	 */
-	for_each_possible_cpu(cpu) {
+	for_each_present_cpu(cpu) {
 		numa_setup_cpu((unsigned long)cpu);
 	}
 }
--
1.9.1
[RFC PATCH v3 1/3] powerpc: Fix warning reported by verify_cpu_node_mapping()
With commit 2fabf084b6ad ("powerpc: reorder per-cpu NUMA information's initialization"), during boottime, cpu_numa_callback() is called earlier (before the cpus are online) for each cpu, and verify_cpu_node_mapping() uses cpu_to_node() to check whether siblings are in the same node. It skips the checking for siblings that are not online yet. So the only check done here is for the bootcpu, which is online at that time. But the per-cpu numa_node that cpu_to_node() uses hasn't been set up yet (it will be set up in smp_prepare_cpus()). So I saw something like the following reported: [0.00] CPU thread siblings 1/2/3 and 0 don't belong to the same node! As we don't actually do the checking during this early stage, maybe we could directly call numa_setup_cpu() in do_init_bootmem(). Cc: Nishanth Aravamudan Cc: Nathan Fontenot Signed-off-by: Li Zhong --- arch/powerpc/mm/numa.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index d7737a5..9918c02 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -1128,8 +1128,7 @@ void __init do_init_bootmem(void) * early in boot, cf. smp_prepare_cpus(). */ for_each_possible_cpu(cpu) { - cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE, - (void *)(unsigned long)cpu); + numa_setup_cpu((unsigned long)cpu); } } -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2] powerpc: Fix warning reported by verify_cpu_node_mapping()
On 三, 2014-08-27 at 09:41 +0800, Li Zhong wrote: > On 二, 2014-08-26 at 08:10 -0500, Nathan Fontenot wrote: > > On 08/25/2014 02:22 AM, Li Zhong wrote: > > > With commit 2fabf084b, during boottime, cpu_numa_callback() is called > > > earlier(before their online) for each cpu, and verify_cpu_node_mapping() > > > uses cpu_to_node() to check whether siblings are in the same node. > > > > > > It skips the checking for siblings that are not online yet. So the only > > > check done here is for the bootcpu, which is online at that time. But > > > the per-cpu numa_node cpu_to_node() uses hasn't been set up yet (which > > > will be set up in smp_prepare_cpus()). > > > > > > So I saw something like following reported: > > > [0.00] CPU thread siblings 1/2/3 and 0 don't belong to the same > > > node! > > > > > > As we don't actually do the checking during this early stage, so maybe > > > we could directly call numa_setup_cpu() in do_init_bootmem(). > > > > > > Also, as Nish suggested, here it's better to use present cpu mask > > > instead of possible mask to avoid warning in numa_setup_cpu(). > > > > > > Signed-off-by: Li Zhong > > > --- > > > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c > > > index d7737a5..3a9061e 100644 > > > --- a/arch/powerpc/mm/numa.c > > > +++ b/arch/powerpc/mm/numa.c > > > @@ -1127,9 +1127,8 @@ void __init do_init_bootmem(void) > > >* even before we online them, so that we can use cpu_to_{node,mem} > > >* early in boot, cf. smp_prepare_cpus(). > > >*/ > > > - for_each_possible_cpu(cpu) { > > > - cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE, > > > - (void *)(unsigned long)cpu); > > > + for_each_present_cpu(cpu) { > > > + numa_setup_cpu((unsigned long)cpu); > > > } > > > } > > > > > > > I am getting the following error on my system booting with this patch. 
> > > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.16.0-202712-g9e81330-dirty #42 > > task: c000fea4 ti: c000fea8 task.ti: c000fea8 > > NIP: c01afad8 LR: c0193b68 CTR: > > REGS: c000fea839e0 TRAP: 0300 Not tainted > > (3.16.0-202712-g9e81330-dirty) > > MSR: 80019033 CR: 2400 XER: 2004 > > CFAR: c00084d4 DAR: 1690 DSISR: 4000 SOFTE: 1 > > GPR00: c0b6db9c c000fea83c60 c0cd0628 1688 > > GPR04: 0001 c000fea83c80 0990 > > GPR08: c0d531e0 c0d66218 c0d60628 > > GPR12: cec6 c000bc88 > > GPR16: > > GPR20: c0c21b88 c0c03738 > > GPR24: c0c03638 c0d24b10 c0c03638 c0c03738 > > GPR28: 0080 0080 c0d208e8 0010 > > NIP [c01afad8] next_zones_zonelist+0x8/0xa0 > > LR [c0193b68] local_memory_node+0x38/0x60 > > Call Trace: > > [c000fea83c60] [c000fea83c90] 0xc000fea83c90 (unreliable) > > [c000fea83c90] [c0b6db9c] smp_prepare_cpus+0x16c/0x278 > > [c000fea83d00] [c0b64098] kernel_init_freeable+0x150/0x340 > > [c000fea83dc0] [c000bca4] kernel_init+0x24/0x140 > > [c000fea83e30] [c0009560] ret_from_kernel_thread+0x5c/0x7c > > Instruction dump: > > e9230038 39490f00 7fa35040 409c000c 38630780 4e800020 7d234b78 4b64 > > 6000 6042 2c25 40c2004c <81230008> 7f892040 419d0014 4830 > > ---[ end trace cb88537fdc8fa200 ]--- > > > > Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b > > > > I think the loop needs to go back to initializing all possibe cpus instead > > of > > only the present cpus. We can add a check for present cpus in > > numa_setup_cpu() > > to avoid printing the WARN_ON() for cpus that are not present, something > > like > > the following... > > Ah, yes, seems the panic was caused by smp_prepare_cpus() using > uninitialized numa_cpu_lookup_table for cpus which are possible but not > present during boottime. > > However, by following change, it seems those cpus will be set to node 0 > at boottime, and not be changed after they become presen
Re: [PATCH v2] powerpc: Fix warning reported by verify_cpu_node_mapping()
On 二, 2014-08-26 at 08:10 -0500, Nathan Fontenot wrote: > On 08/25/2014 02:22 AM, Li Zhong wrote: > > With commit 2fabf084b, during boottime, cpu_numa_callback() is called > > earlier(before their online) for each cpu, and verify_cpu_node_mapping() > > uses cpu_to_node() to check whether siblings are in the same node. > > > > It skips the checking for siblings that are not online yet. So the only > > check done here is for the bootcpu, which is online at that time. But > > the per-cpu numa_node cpu_to_node() uses hasn't been set up yet (which > > will be set up in smp_prepare_cpus()). > > > > So I saw something like following reported: > > [0.00] CPU thread siblings 1/2/3 and 0 don't belong to the same > > node! > > > > As we don't actually do the checking during this early stage, so maybe > > we could directly call numa_setup_cpu() in do_init_bootmem(). > > > > Also, as Nish suggested, here it's better to use present cpu mask > > instead of possible mask to avoid warning in numa_setup_cpu(). > > > > Signed-off-by: Li Zhong > > --- > > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c > > index d7737a5..3a9061e 100644 > > --- a/arch/powerpc/mm/numa.c > > +++ b/arch/powerpc/mm/numa.c > > @@ -1127,9 +1127,8 @@ void __init do_init_bootmem(void) > > * even before we online them, so that we can use cpu_to_{node,mem} > > * early in boot, cf. smp_prepare_cpus(). > > */ > > - for_each_possible_cpu(cpu) { > > - cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE, > > - (void *)(unsigned long)cpu); > > + for_each_present_cpu(cpu) { > > + numa_setup_cpu((unsigned long)cpu); > > } > > } > > > > I am getting the following error on my system booting with this patch. 
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.16.0-202712-g9e81330-dirty #42 > task: c000fea4 ti: c000fea8 task.ti: c000fea8 > NIP: c01afad8 LR: c0193b68 CTR: > REGS: c000fea839e0 TRAP: 0300 Not tainted > (3.16.0-202712-g9e81330-dirty) > MSR: 80019033 CR: 2400 XER: 2004 > CFAR: c00084d4 DAR: 1690 DSISR: 4000 SOFTE: 1 > GPR00: c0b6db9c c000fea83c60 c0cd0628 1688 > GPR04: 0001 c000fea83c80 0990 > GPR08: c0d531e0 c0d66218 c0d60628 > GPR12: cec6 c000bc88 > GPR16: > GPR20: c0c21b88 c0c03738 > GPR24: c0c03638 c0d24b10 c0c03638 c0c03738 > GPR28: 0080 0080 c0d208e8 0010 > NIP [c01afad8] next_zones_zonelist+0x8/0xa0 > LR [c0193b68] local_memory_node+0x38/0x60 > Call Trace: > [c000fea83c60] [c000fea83c90] 0xc000fea83c90 (unreliable) > [c000fea83c90] [c0b6db9c] smp_prepare_cpus+0x16c/0x278 > [c000fea83d00] [c0b64098] kernel_init_freeable+0x150/0x340 > [c000fea83dc0] [c000bca4] kernel_init+0x24/0x140 > [c000fea83e30] [c0009560] ret_from_kernel_thread+0x5c/0x7c > Instruction dump: > e9230038 39490f00 7fa35040 409c000c 38630780 4e800020 7d234b78 4b64 > 6000 6042 2c25 40c2004c <81230008> 7f892040 419d0014 4830 > ---[ end trace cb88537fdc8fa200 ]--- > > Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b > > I think the loop needs to go back to initializing all possibe cpus instead of > only the present cpus. We can add a check for present cpus in numa_setup_cpu() > to avoid printing the WARN_ON() for cpus that are not present, something like > the following... Ah, yes, seems the panic was caused by smp_prepare_cpus() using uninitialized numa_cpu_lookup_table for cpus which are possible but not present during boottime. 
However, with the following change, it seems those cpus will be set to node 0 at boottime, and will not be changed after they become present, because of the following check in numa_setup_cpu(): if ((nid = numa_cpu_lookup_table[lcpu]) >= 0) { map_cpu_to_node(lcpu, nid); return nid; } Maybe we could change smp_prepare_cpus() to set numa information for present cpus only instead? And for those possible, !present cpus, we could do the setup after they are started. Thanks, Zhong > > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c >
[PATCH v2] powerpc: Fix warning reported by verify_cpu_node_mapping()
With commit 2fabf084b, during boottime, cpu_numa_callback() is called earlier (before the cpus are online) for each cpu, and verify_cpu_node_mapping() uses cpu_to_node() to check whether siblings are in the same node. It skips the checking for siblings that are not online yet. So the only check done here is for the bootcpu, which is online at that time. But the per-cpu numa_node that cpu_to_node() uses hasn't been set up yet (it will be set up in smp_prepare_cpus()). So I saw something like the following reported: [0.00] CPU thread siblings 1/2/3 and 0 don't belong to the same node! As we don't actually do the checking during this early stage, maybe we could directly call numa_setup_cpu() in do_init_bootmem(). Also, as Nish suggested, it's better here to use the present cpu mask instead of the possible mask, to avoid a warning in numa_setup_cpu(). Signed-off-by: Li Zhong --- diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index d7737a5..3a9061e 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -1127,9 +1127,8 @@ void __init do_init_bootmem(void) * even before we online them, so that we can use cpu_to_{node,mem} * early in boot, cf. smp_prepare_cpus(). */ - for_each_possible_cpu(cpu) { - cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE, - (void *)(unsigned long)cpu); + for_each_present_cpu(cpu) { + numa_setup_cpu((unsigned long)cpu); } } ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH powerpc] Fix warning reported by verify_cpu_node_mapping()
On 五, 2014-08-22 at 15:04 -0700, Nishanth Aravamudan wrote: > On 22.08.2014 [10:12:56 +0800], Li Zhong wrote: > > On ???, 2014-08-21 at 08:45 -0700, Nishanth Aravamudan wrote: > > > On 21.08.2014 [16:14:02 +0800], Li Zhong wrote: > > > > With commit 2fabf084b, during boottime, cpu_numa_callback() is called > > > > earlier(before their online) for each cpu, and verify_cpu_node_mapping() > > > > uses cpu_to_node() to check whether siblings are in the same node. > > > > > > > > It skips the checking for siblings that are not online yet. So the only > > > > check done here is for the bootcpu, which is online at that time. But > > > > the per-cpu numa_node cpu_to_node() uses hasn't been set up yet (which > > > > will be set up in smp_prepare_cpus()). > > > > > > > > So I could see something like following reported: > > > > [0.00] CPU thread siblings 1/2/3 and 0 don't belong to the same > > > > node! > > > > > > You mean you did see this, right? (as opposed to "could" based upon code > > > inspection or something) > > > > Yes, I did see the warnings. Seems I didn't express it precisely in > > English ... > > > > > > > > > > > > > As we don't actually do the checking during this early stage, so maybe > > > > we could directly call numa_setup_cpu() in do_init_bootmem()? > > > > > > > > Signed-off-by: Li Zhong > > > > > > Acked-by: Nishanth Aravamudan > > > > Thank you for the review, > > > > Zhong > > > > > > > > > --- > > > > arch/powerpc/mm/numa.c | 3 +-- > > > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > > > > > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c > > > > index d7737a5..9918c02 100644 > > > > --- a/arch/powerpc/mm/numa.c > > > > +++ b/arch/powerpc/mm/numa.c > > > > @@ -1128,8 +1128,7 @@ void __init do_init_bootmem(void) > > > > * early in boot, cf. smp_prepare_cpus(). 
> > > > */ > > > > for_each_possible_cpu(cpu) { > > > > - cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE, > > > > - (void *)(unsigned long)cpu); > > > > + numa_setup_cpu((unsigned long)cpu); > > > > > > This is a good change, thanks for catching it. I must have glossed over > > > those messages in my testing, my apologies! > > Actually, thinking about this more, do you think it makes more sense to > do: > > for_each_present_cpu(cpu) in this loop? That is, at boot, ensure all > present (but possibly offline) CPUs have their NUMA mapping set up. CPUs > that aren't present (but are possible) might trigger other warnings, > right? (e.g., the WARN_ON(1) in numa_setup_cpu) After reading the code that set up the cpu masks, I think you are right. I will send a new version with this fixed. Thanks, Zhong > > -Nish ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH powerpc] Fix warning reported by verify_cpu_node_mapping()
On 四, 2014-08-21 at 08:45 -0700, Nishanth Aravamudan wrote: > On 21.08.2014 [16:14:02 +0800], Li Zhong wrote: > > With commit 2fabf084b, during boottime, cpu_numa_callback() is called > > earlier(before their online) for each cpu, and verify_cpu_node_mapping() > > uses cpu_to_node() to check whether siblings are in the same node. > > > > It skips the checking for siblings that are not online yet. So the only > > check done here is for the bootcpu, which is online at that time. But > > the per-cpu numa_node cpu_to_node() uses hasn't been set up yet (which > > will be set up in smp_prepare_cpus()). > > > > So I could see something like following reported: > > [0.00] CPU thread siblings 1/2/3 and 0 don't belong to the same > > node! > > You mean you did see this, right? (as opposed to "could" based upon code > inspection or something) Yes, I did see the warnings. Seems I didn't express it precisely in English ... > > > > > As we don't actually do the checking during this early stage, so maybe > > we could directly call numa_setup_cpu() in do_init_bootmem()? > > > > Signed-off-by: Li Zhong > > Acked-by: Nishanth Aravamudan Thank you for the review, Zhong > > > --- > > arch/powerpc/mm/numa.c | 3 +-- > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c > > index d7737a5..9918c02 100644 > > --- a/arch/powerpc/mm/numa.c > > +++ b/arch/powerpc/mm/numa.c > > @@ -1128,8 +1128,7 @@ void __init do_init_bootmem(void) > > * early in boot, cf. smp_prepare_cpus(). > > */ > > for_each_possible_cpu(cpu) { > > - cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE, > > - (void *)(unsigned long)cpu); > > + numa_setup_cpu((unsigned long)cpu); > > This is a good change, thanks for catching it. I must have glossed over > those messages in my testing, my apologies! > > -Nish ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH powerpc] Fix warning reported by verify_cpu_node_mapping()
With commit 2fabf084b, during boottime, cpu_numa_callback() is called earlier (before the cpus are online) for each cpu, and verify_cpu_node_mapping() uses cpu_to_node() to check whether siblings are in the same node. It skips the checking for siblings that are not online yet. So the only check done here is for the bootcpu, which is online at that time. But the per-cpu numa_node that cpu_to_node() uses hasn't been set up yet (it will be set up in smp_prepare_cpus()). So I could see something like the following reported: [0.00] CPU thread siblings 1/2/3 and 0 don't belong to the same node! As we don't actually do the checking during this early stage, maybe we could directly call numa_setup_cpu() in do_init_bootmem()? Signed-off-by: Li Zhong --- arch/powerpc/mm/numa.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index d7737a5..9918c02 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -1128,8 +1128,7 @@ void __init do_init_bootmem(void) * early in boot, cf. smp_prepare_cpus(). */ for_each_possible_cpu(cpu) { - cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE, - (void *)(unsigned long)cpu); + numa_setup_cpu((unsigned long)cpu); } } ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH powerpc] use machine_subsys_initcall() for opal_hmi_handler_init()
As opal_message_init() uses machine_early_initcall(powernv, ), and opal_hmi_handler_init() depends on that early initcall, it also needs to use a machine_* initcall so that the machine_id is checked. Signed-off-by: Li Zhong --- arch/powerpc/platforms/powernv/opal-hmi.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/powernv/opal-hmi.c b/arch/powerpc/platforms/powernv/opal-hmi.c index 97ac8dc..5e1ed15 100644 --- a/arch/powerpc/platforms/powernv/opal-hmi.c +++ b/arch/powerpc/platforms/powernv/opal-hmi.c @@ -28,6 +28,7 @@ #include #include +#include static int opal_hmi_handler_nb_init; struct OpalHmiEvtNode { @@ -185,4 +186,4 @@ static int __init opal_hmi_handler_init(void) } return 0; } -subsys_initcall(opal_hmi_handler_init); +machine_subsys_initcall(powernv, opal_hmi_handler_init); ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH powerpc] use _GLOBAL_TOC for memmove
memmove may be called from module code copy_pages(btrfs), and it may call memcpy, which may call back to C code, so it needs to use _GLOBAL_TOC to set up r2 correctly. This fixes following error when I tried to boot an le guest: Vector: 300 (Data Access) at [c00073f97210] pc: c0015004: enable_kernel_altivec+0x24/0x80 lr: c0058fbc: enter_vmx_copy+0x3c/0x60 sp: c00073f97490 msr: 82009033 dar: d1d50170 dsisr: 4000 current = 0xc000734c paca= 0xcfff softe: 0irq_happened: 0x01 pid = 815, comm = mktemp enter ? for help [c00073f974f0] c0058fbc enter_vmx_copy+0x3c/0x60 [c00073f97510] c0057d34 memcpy_power7+0x274/0x840 [c00073f97610] d1c3179c copy_pages+0xfc/0x110 [btrfs] [c00073f97660] d1c3c248 memcpy_extent_buffer+0xe8/0x160 [btrfs] [c00073f97700] d1be4be8 setup_items_for_insert+0x208/0x4a0 [btrfs] [c00073f97820] d1be50b4 btrfs_insert_empty_items+0xf4/0x140 [btrfs] [c00073f97890] d1bfed30 insert_with_overflow+0x70/0x180 [btrfs] [c00073f97900] d1bff174 btrfs_insert_dir_item+0x114/0x2f0 [btrfs] [c00073f979a0] d1c1f92c btrfs_add_link+0x10c/0x370 [btrfs] [c00073f97a40] d1c20e94 btrfs_create+0x204/0x270 [btrfs] [c00073f97b00] c026d438 vfs_create+0x178/0x210 [c00073f97b50] c0270a70 do_last+0x9f0/0xe90 [c00073f97c20] c0271010 path_openat+0x100/0x810 [c00073f97ce0] c0272ea8 do_filp_open+0x58/0xd0 [c00073f97dc0] c025ade8 do_sys_open+0x1b8/0x300 [c00073f97e30] c000a008 syscall_exit+0x0/0x7c --- Exception: c01 (System Call) at 3fffaabe2854 Signed-off-by: Li Zhong --- arch/powerpc/lib/mem_64.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/lib/mem_64.S b/arch/powerpc/lib/mem_64.S index 0738f96..43435c6 100644 --- a/arch/powerpc/lib/mem_64.S +++ b/arch/powerpc/lib/mem_64.S @@ -77,7 +77,7 @@ _GLOBAL(memset) stb r4,0(r6) blr -_GLOBAL(memmove) +_GLOBAL_TOC(memmove) cmplw 0,r3,r4 bgt backwards_memcpy b memcpy ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH powerpc] Fix parameter restoring issue in commit 752a6422f
DaYu just reminded me that Michael already had a patch removing all the related code. Please ignore this patch.. Thanks, Zhong On Wed, 2014-06-25 at 12:00 +0800, Li Zhong wrote: > In commit 752a6422f, new stack frame is created for parameters. > > However, the r1 is added back a little earlier, so r3 and r4 are > restored from a wrong place, which could cause following error during > boot: > > Querying for OPAL presence... there ! > DEFAULT CATCH!, exception-handler=fff00700 > at %SRR0: 04223058 %SRR1: 80081002 > Open Firmware exception handler entered from non-OF code > > Client's Fix Pt Regs: > 00 04223054 04223020 04fbe838 0002 > 04 28002024 04fbe838 0427e838 04222f20 > 08 04222f20 1002 > 0c a001 01a3fd20 040eb170 > 10 040eb628 040eb368 fffd 01a3fd20 > 14 01b37f00 0f34 00cc 0f34 > 18 040ebb08 0358 040eb128 04285920 > 1c 01a3fd60 040eb100 7c0802a6f8010010 f821ff914b91ebd1 > Special Regs: > %IV: 0700 %CR: 28002022%XER: %DSISR: 4200 > %SRR0: 04223058 %SRR1: 80081002 > %LR: 04223054%CTR: 0000 > %DAR: 01a3fcf00020b4ac > Virtual PID = 0 > ok > > Signed-off-by: Li Zhong > --- > diff --git a/arch/powerpc/platforms/powernv/opal-takeover.S > b/arch/powerpc/platforms/powernv/opal-takeover.S > index 11a3169..9093540 100644 > --- a/arch/powerpc/platforms/powernv/opal-takeover.S > +++ b/arch/powerpc/platforms/powernv/opal-takeover.S > @@ -5,6 +5,7 @@ > * > * This program is free software; you can redistribute it and/or > * modify it under the terms of the GNU General Public License > +B > * as published by the Free Software Foundation; either version > * 2 of the License, or (at your option) any later version. 
> */ > @@ -27,11 +28,11 @@ _GLOBAL(opal_query_takeover) > li r3,H_HAL_TAKEOVER > li r4,H_HAL_TAKEOVER_QUERY_MAGIC > HVSC > - addir1,r1,STACKFRAMESIZE > ld r10,STK_PARAM(R3)(r1) > std r4,0(r10) > ld r10,STK_PARAM(R4)(r1) > std r5,0(r10) > + addir1,r1,STACKFRAMESIZE > lwz r0,8(r1) > mtcrf 0xff,r0 > blr > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH powerpc] Fix parameter restoring issue in commit 752a6422f
In commit 752a6422f, new stack frame is created for parameters. However, the r1 is added back a little earlier, so r3 and r4 are restored from a wrong place, which could cause following error during boot: Querying for OPAL presence... there ! DEFAULT CATCH!, exception-handler=fff00700 at %SRR0: 04223058 %SRR1: 80081002 Open Firmware exception handler entered from non-OF code Client's Fix Pt Regs: 00 04223054 04223020 04fbe838 0002 04 28002024 04fbe838 0427e838 04222f20 08 04222f20 1002 0c a001 01a3fd20 040eb170 10 040eb628 040eb368 fffd 01a3fd20 14 01b37f00 0f34 00cc 0f34 18 040ebb08 0358 040eb128 04285920 1c 01a3fd60 040eb100 7c0802a6f8010010 f821ff914b91ebd1 Special Regs: %IV: 0700 %CR: 28002022%XER: %DSISR: 4200 %SRR0: 04223058 %SRR1: 80081002 %LR: 04223054%CTR: %DAR: 01a3fcf00020b4ac Virtual PID = 0 ok Signed-off-by: Li Zhong --- diff --git a/arch/powerpc/platforms/powernv/opal-takeover.S b/arch/powerpc/platforms/powernv/opal-takeover.S index 11a3169..9093540 100644 --- a/arch/powerpc/platforms/powernv/opal-takeover.S +++ b/arch/powerpc/platforms/powernv/opal-takeover.S @@ -5,6 +5,7 @@ * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License +B * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. */ @@ -27,11 +28,11 @@ _GLOBAL(opal_query_takeover) li r3,H_HAL_TAKEOVER li r4,H_HAL_TAKEOVER_QUERY_MAGIC HVSC - addir1,r1,STACKFRAMESIZE ld r10,STK_PARAM(R3)(r1) std r4,0(r10) ld r10,STK_PARAM(R4)(r1) std r5,0(r10) + addir1,r1,STACKFRAMESIZE lwz r0,8(r1) mtcrf 0xff,r0 blr ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 4/4] powerpc: start loop at section start of start in vmemmap_populated()
vmemmap_populated() checks whether [start, start + page_size) contains valid pfns, to know whether a vmemmap mapping has been created that includes this range. Some range before end might not be checked by this loop: sec11start..start11..sec11end/sec12start..end..start12..sec12end As shown above, for start11 (in section 11), the loop checks [sec11start, sec11end), and then ends because the next start (start12) is bigger than end. However, [sec11end/sec12start, end) is never checked. So before the loop, adjust start to be the start of its section, so we don't miss ranges like the above. After we adjust start to the start of the section, it is also aligned with vmemmap in units of sizeof(struct page), so we can use page_to_pfn directly in the loop. Signed-off-by: Li Zhong Cc: Nathan Fontenot --- arch/powerpc/mm/init_64.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c index 4963790..253b4b9 100644 --- a/arch/powerpc/mm/init_64.c +++ b/arch/powerpc/mm/init_64.c @@ -175,9 +175,10 @@ static unsigned long __meminit vmemmap_section_start(unsigned long page) static int __meminit vmemmap_populated(unsigned long start, int page_size) { unsigned long end = start + page_size; + start = (unsigned long)(pfn_to_page(vmemmap_section_start(start))); for (; start < end; start += (PAGES_PER_SECTION * sizeof(struct page))) - if (pfn_valid(vmemmap_section_start(start))) + if (pfn_valid(page_to_pfn((struct page *)start))) return 1; return 0; -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 3/4] powerpc: implement vmemmap_free()
vmemmap_free() does the opposite of vmemmap_populate(). This patch also puts vmemmap_free() and vmemmap_list_free() under CONFIG_MEMORY_HOTPLUG. Signed-off-by: Li Zhong Cc: Nathan Fontenot --- arch/powerpc/mm/init_64.c | 85 ++--- 1 file changed, 64 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c index 69203c8..4963790 100644 --- a/arch/powerpc/mm/init_64.c +++ b/arch/powerpc/mm/init_64.c @@ -298,6 +298,37 @@ static __meminit void vmemmap_list_populate(unsigned long phys, vmemmap_list = vmem_back; } +int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node) +{ + unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift; + + /* Align to the page size of the linear mapping. */ + start = _ALIGN_DOWN(start, page_size); + + pr_debug("vmemmap_populate %lx..%lx, node %d\n", start, end, node); + + for (; start < end; start += page_size) { + void *p; + + if (vmemmap_populated(start, page_size)) + continue; + + p = vmemmap_alloc_block(page_size, node); + if (!p) + return -ENOMEM; + + vmemmap_list_populate(__pa(p), start, node); + + pr_debug(" * %016lx..%016lx allocated at %p\n", +start, start + page_size, p); + + vmemmap_create_mapping(start, page_size, __pa(p)); + } + + return 0; +} + +#ifdef CONFIG_MEMORY_HOTPLUG static unsigned long vmemmap_list_free(unsigned long start) { struct vmemmap_backing *vmem_back, *vmem_back_prev; @@ -330,40 +361,52 @@ static unsigned long vmemmap_list_free(unsigned long start) return vmem_back->phys; } -int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node) +void __ref vmemmap_free(unsigned long start, unsigned long end) { unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift; - /* Align to the page size of the linear mapping. 
*/ start = _ALIGN_DOWN(start, page_size); - pr_debug("vmemmap_populate %lx..%lx, node %d\n", start, end, node); + pr_debug("vmemmap_free %lx...%lx\n", start, end); for (; start < end; start += page_size) { - void *p; + unsigned long addr; + /* +* the section has already be marked as invalid, so +* vmemmap_populated() true means some other sections still +* in this page, so skip it. +*/ if (vmemmap_populated(start, page_size)) continue; - p = vmemmap_alloc_block(page_size, node); - if (!p) - return -ENOMEM; - - vmemmap_list_populate(__pa(p), start, node); - - pr_debug(" * %016lx..%016lx allocated at %p\n", -start, start + page_size, p); - - vmemmap_create_mapping(start, page_size, __pa(p)); + addr = vmemmap_list_free(start); + if (addr) { + struct page *page = pfn_to_page(addr >> PAGE_SHIFT); + + if (PageReserved(page)) { + /* allocated from bootmem */ + if (page_size < PAGE_SIZE) { + /* +* this shouldn't happen, but if it is +* the case, leave the memory there +*/ + WARN_ON_ONCE(1); + } else { + unsigned int nr_pages = + 1 << get_order(page_size); + while (nr_pages--) + free_reserved_page(page++); + } + } else + free_pages((unsigned long)(__va(addr)), + get_order(page_size)); + + vmemmap_remove_mapping(start, page_size); + } } - - return 0; -} - -void vmemmap_free(unsigned long start, unsigned long end) -{ } - +#endif void register_page_bootmem_memmap(unsigned long section_nr, struct page *start_page, unsigned long size) { -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 2/4] powerpc: implement vmemmap_remove_mapping() for BOOK3S
This is to be called in vmemmap_free(); the implementation on BOOK3E is left empty as before. Signed-off-by: Li Zhong Cc: Nathan Fontenot --- arch/powerpc/mm/hash_utils_64.c |2 +- arch/powerpc/mm/init_64.c | 22 ++ 2 files changed, 23 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c index 88fdd9d..25d9d66 100644 --- a/arch/powerpc/mm/hash_utils_64.c +++ b/arch/powerpc/mm/hash_utils_64.c @@ -243,7 +243,7 @@ int htab_bolt_mapping(unsigned long vstart, unsigned long vend, } #ifdef CONFIG_MEMORY_HOTPLUG -static int htab_remove_mapping(unsigned long vstart, unsigned long vend, +int htab_remove_mapping(unsigned long vstart, unsigned long vend, int psize, int ssize) { unsigned long vaddr; diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c index fa5d28b..69203c8 100644 --- a/arch/powerpc/mm/init_64.c +++ b/arch/powerpc/mm/init_64.c @@ -212,6 +212,13 @@ static void __meminit vmemmap_create_mapping(unsigned long start, for (i = 0; i < page_size; i += PAGE_SIZE) BUG_ON(map_kernel_page(start + i, phys, flags)); } + +#ifdef CONFIG_MEMORY_HOTPLUG +static void vmemmap_remove_mapping(unsigned long start, + unsigned long page_size) +{ +} +#endif #else /* CONFIG_PPC_BOOK3E */ static void __meminit vmemmap_create_mapping(unsigned long start, unsigned long page_size, @@ -223,6 +230,21 @@ static void __meminit vmemmap_create_mapping(unsigned long start, mmu_kernel_ssize); BUG_ON(mapped < 0); } + +#ifdef CONFIG_MEMORY_HOTPLUG +extern int htab_remove_mapping(unsigned long vstart, unsigned long vend, + int psize, int ssize); + +static void vmemmap_remove_mapping(unsigned long start, + unsigned long page_size) +{ + int mapped = htab_remove_mapping(start, start + page_size, +mmu_vmemmap_psize, +mmu_kernel_ssize); + BUG_ON(mapped < 0); +} +#endif + #endif /* CONFIG_PPC_BOOK3E */ struct vmemmap_backing *vmemmap_list; -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org 
https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/4] powerpc: implement vmemmap_list_free()
This patch implements vmemmap_list_free() for vmemmap_free(). The freed entries are removed from vmemmap_list and form a freed list, with next as its head. The next position in the last allocated page is kept at the list tail. On allocation, if there are freed entries left, take one from the freed list; otherwise, get one as before from the last allocated page. With this change, realmode_pfn_to_page() also needs to walk all the entries in vmemmap_list, as the virt_addr of the entries might not be stored in order anymore. This helps to reuse memory when continuously doing memory hot-plug/remove operations, but it doesn't reclaim pages already allocated, so memory usage will only increase, though it won't exceed the value for the largest memory configuration. Signed-off-by: Li Zhong Cc: Nathan Fontenot --- arch/powerpc/mm/init_64.c | 62 + 1 file changed, 52 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c index e3734ed..fa5d28b 100644 --- a/arch/powerpc/mm/init_64.c +++ b/arch/powerpc/mm/init_64.c @@ -226,14 +226,24 @@ static void __meminit vmemmap_create_mapping(unsigned long start, #endif /* CONFIG_PPC_BOOK3E */ struct vmemmap_backing *vmemmap_list; +static struct vmemmap_backing *next; +static int num_left; +static int num_freed; static __meminit struct vmemmap_backing * vmemmap_list_alloc(int node) { - static struct vmemmap_backing *next; - static int num_left; + struct vmemmap_backing *vmem_back; + /* get from freed entries first */ + if (num_freed) { + num_freed--; + vmem_back = next; + next = next->list; + + return vmem_back; + } /* allocate a page when required and hand out chunks */ - if (!next || !num_left) { + if (!num_left) { next = vmemmap_alloc_block(PAGE_SIZE, node); if (unlikely(!next)) { WARN_ON(1); @@ -266,6 +276,38 @@ static __meminit void vmemmap_list_populate(unsigned long phys, vmemmap_list = vmem_back; } +static unsigned long vmemmap_list_free(unsigned long
start) +{ + struct vmemmap_backing *vmem_back, *vmem_back_prev; + + vmem_back_prev = vmem_back = vmemmap_list; + + /* look for it with prev pointer recorded */ + for (; vmem_back; vmem_back = vmem_back->list) { + if (vmem_back->virt_addr == start) + break; + vmem_back_prev = vmem_back; + } + + if (unlikely(!vmem_back)) { + WARN_ON(1); + return 0; + } + + /* remove it from vmemmap_list */ + if (vmem_back == vmemmap_list) /* remove head */ + vmemmap_list = vmem_back->list; + else + vmem_back_prev->list = vmem_back->list; + + /* next point to this freed entry */ + vmem_back->list = next; + next = vmem_back; + num_freed++; + + return vmem_back->phys; +} + int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node) { unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift; @@ -331,16 +373,16 @@ struct page *realmode_pfn_to_page(unsigned long pfn) if (pg_va < vmem_back->virt_addr) continue; - /* Check that page struct is not split between real pages */ - if ((pg_va + sizeof(struct page)) > - (vmem_back->virt_addr + page_size)) - return NULL; - - page = (struct page *) (vmem_back->phys + pg_va - + /* After vmemmap_list entry free is possible, need check all */ + if ((pg_va + sizeof(struct page)) <= + (vmem_back->virt_addr + page_size)) { + page = (struct page *) (vmem_back->phys + pg_va - vmem_back->virt_addr); - return page; + return page; + } } + /* Probably that page struct is split between real pages */ return NULL; } EXPORT_SYMBOL_GPL(realmode_pfn_to_page); -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH] powerpc: fix wrong sp saved in save_stack_trace()
I found that the stack trace sometimes could not be saved. After some investigation, it seems that when function tracing is enabled,

void save_stack_trace(struct stack_trace *trace)
{
	unsigned long sp;

	asm("mr %0,1" : "=r" (sp));

	save_context_stack(trace, sp, current, 1);
}

is compiled into:

c00432c0 <.save_stack_trace>:
c00432c0:	7c 08 02 a6	mflr    r0
c00432c4:	f8 01 00 10	std     r0,16(r1)
c00432c8:	f8 21 ff 81	stdu    r1,-128(r1)
c00432cc:	f8 61 00 70	std     r3,112(r1)
c00432d0:	4b fc 77 bd	bl      c000aa8c <._mcount>
c00432d4:	60 00 00 00	nop
c00432d8:	7c 24 0b 78	mr      r4,r1
c00432dc:	e8 ad 02 78	ld      r5,632(r13)
c00432e0:	e8 61 00 70	ld      r3,112(r1)
c00432e4:	38 c0 00 01	li      r6,1
c00432e8:	38 21 00 80	addi    r1,r1,128
c00432ec:	e8 01 00 10	ld      r0,16(r1)
c00432f0:	7c 08 03 a6	mtlr    r0
c00432f4:	4b ff fe 5c	b       c0043150 <.save_context_stack>
c00432f8:	60 00 00 00	nop
c00432fc:	60 42 00 00	ori     r2,r2,0

A new stack frame at -128(r1) is created to call ._mcount, and this new r1 is copied into sp as the stack pointer, which could then be overwritten by save_context_stack()'s prologue. I don't know how to specify in C that the embedded asm must be compiled after r1 has been added back to its original value, but as a workaround we could move this embedded asm into save_context_stack().
Signed-off-by: Li Zhong
---
 arch/powerpc/kernel/stacktrace.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
index 3d30ef1..5c0b461 100644
--- a/arch/powerpc/kernel/stacktrace.c
+++ b/arch/powerpc/kernel/stacktrace.c
@@ -22,6 +22,9 @@ static void save_context_stack(struct stack_trace *trace, unsigned long sp,
 			struct task_struct *tsk, int savesched)
 {
+	if (tsk == current)
+		asm("mr %0,1" : "=r" (sp));
+
 	for (;;) {
 		unsigned long *stack = (unsigned long *) sp;
 		unsigned long newsp, ip;
@@ -48,11 +51,7 @@ static void save_context_stack(struct stack_trace *trace, unsigned long sp,
 
 void save_stack_trace(struct stack_trace *trace)
 {
-	unsigned long sp;
-
-	asm("mr %0,1" : "=r" (sp));
-
-	save_context_stack(trace, sp, current, 1);
+	save_context_stack(trace, 0, current, 1);
 }
 EXPORT_SYMBOL_GPL(save_stack_trace);

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] powerpc: Fix Oops in rtas_stop_self()
commit 41dd03a9 may cause an Oops in rtas_stop_self(). The reason is that rtas_args was moved into stack space: on a box with more than 4GB RAM, the stack could easily be outside the 32-bit range, but RTAS is 32-bit. So this patch moves rtas_args off the stack by making it static.

Signed-off-by: Li Zhong
Signed-off-by: Anton Blanchard
Cc: sta...@vger.kernel.org # 3.14+
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 9b8e050..20d6297 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -88,13 +88,14 @@ void set_default_offline_state(int cpu)
 
 static void rtas_stop_self(void)
 {
-	struct rtas_args args = {
-		.token = cpu_to_be32(rtas_stop_self_token),
+	static struct rtas_args args = {
 		.nargs = 0,
 		.nret = 1,
 		.rets = &args.args[0],
 	};
 
+	args.token = cpu_to_be32(rtas_stop_self_token);
+
 	local_irq_disable();
 
 	BUG_ON(rtas_stop_self_token == RTAS_UNKNOWN_SERVICE);

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH] Fix Oops in rtas_stop_self()
On Fri, 2014-04-25 at 22:18 +1000, Anton Blanchard wrote: > Hi, > > > When trying offline cpus, I noticed following Oops in > > rtas_stop_self(), and it seems caused by commit 41dd03a9. The Oops > > disappears after reverting this commit. > > > > After reading the code, I guess it might be caused by moving the > > rtas_args to stack. Still need some more time to read enter_rtas to > > understand why it happens, but the problem seems could be solved by > > moving the rtas_args away from stack by adding static before it. > > Nice catch. RTAS is 32bit and if your box has more than 4GB RAM then > your stack could easily be outside 32bit range. Ah, yes, the stack here is obviously at a much higher address than 4GB. > > You can add: > > Signed-off-by: Anton Blanchard > > And also: > > Cc: sta...@vger.kernel.org # 3.14+ OK, Thanks, Zhong > > > Signed-off-by: Li Zhong > > --- > > arch/powerpc/platforms/pseries/hotplug-cpu.c | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c > > b/arch/powerpc/platforms/pseries/hotplug-cpu.c index 9b8e050..20d6297 > > 100644 --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c > > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c > > @@ -88,13 +88,14 @@ void set_default_offline_state(int cpu) > > > > static void rtas_stop_self(void) > > { > > - struct rtas_args args = { > > - .token = cpu_to_be32(rtas_stop_self_token), > > + static struct rtas_args args = { > > .nargs = 0, > > .nret = 1, > > .rets = &args.args[0], > > }; > > > > + args.token = cpu_to_be32(rtas_stop_self_token); > > + > > local_irq_disable(); > > > > BUG_ON(rtas_stop_self_token == RTAS_UNKNOWN_SERVICE); > > > > > > ___ > > Linuxppc-dev mailing list > > Linuxppc-dev@lists.ozlabs.org > > https://lists.ozlabs.org/listinfo/linuxppc-dev > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH] Fix Oops in rtas_stop_self()
When trying offline cpus, I noticed following Oops in rtas_stop_self(), and it seems caused by commit 41dd03a9. The Oops disappears after reverting this commit. After reading the code, I guess it might be caused by moving the rtas_args to stack. Still need some more time to read enter_rtas to understand why it happens, but the problem seems could be solved by moving the rtas_args away from stack by adding static before it. [ 247.194623] cpu 28 (hwid 28) Ready to die... [ 247.194641] Unable to handle kernel paging request for data at address 0xecb213c50 [ 247.194642] Faulting instruction address: 0x0f34157c [ 247.194645] Oops: Kernel access of bad area, sig: 11 [#1] [ 247.194648] SMP NR_CPUS=1024 NUMA pSeries [ 247.194664] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables ehea ibmvscsi scsi_transport_srp scsi_tgt [ 247.194667] CPU: 28 PID: 0 Comm: swapper/28 Not tainted 3.15.0-rc1+ #10 [ 247.194670] CPU: 27 PID: 1171 Comm: drmgr Not tainted 3.15.0-rc1+ #10 [ 247.194673] task: c007ca2f0ba0 ti: c00ecb21 task.ti: c00ecb21 [ 247.194679] NIP: 0f34157c LR: a84c CTR: 0151 [ 247.194680] REGS: c00ecb213780 TRAP: 0300 Not tainted (3.15.0-rc1+) [ 247.194684] MSR: 80001000 CR: XER: [ 247.194707] CFAR: c000a844 DAR: 000ecb213c50 DSISR: 4000 SOFTE: 0 [ 247.194707] GPR00: 80001030 c00ecb213a00 c14c7880 000ecb213c50 [ 247.194707] GPR04: 0f34 000ecb213c50 80001032 [ 247.194707] GPR08: 1000 c00ecb213a00 [ 247.194707] GPR12: cf227e00 c091f5b0 0f394e2c [ 247.194707] GPR16: 001c 0001 001c c13ad5b0 [ 247.194707] GPR20: 0001 c13bfb89 c0d57688 [ 247.194707] GPR24: 001c c0d44e2c c00ecb21 c13c0f48 [ 247.194707] GPR28: c0d44e28 c150ca68 00e0 c00ecb21 [ 247.194710] NIP [0f34157c] 0xf34157c [ 247.194712] LR [a84c] 0xa84c [ 
247.194714] Call Trace: [ 247.194716] [c00ecb213a00] [001c] 0x1c (unreliable) [ 247.194722] [c00ecb213be0] [c007ffe0] .pseries_mach_cpu_die+0x150/0x350 [ 247.194725] [c00ecb213cf0] [c00420cc] .cpu_die+0x3c/0x60 [ 247.194727] [c00ecb213d60] [c0017e58] .arch_cpu_idle_dead+0x28/0x40 [ 247.194732] [c00ecb213dd0] [c0104f40] .cpu_startup_entry+0x570/0x590 [ 247.194735] [c00ecb213ed0] [c0041d4c] .start_secondary+0x2dc/0x310 [ 247.194738] [c00ecb213f90] [c000996c] .start_secondary_prolog+0x10/0x14 [ 247.194740] Instruction dump: [ 247.194744] [ 247.194748] [ 247.194751] ---[ end trace d1d21584135396ba ]--- Signed-off-by: Li Zhong --- arch/powerpc/platforms/pseries/hotplug-cpu.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c index 9b8e050..20d6297 100644 --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c @@ -88,13 +88,14 @@ void set_default_offline_state(int cpu) static void rtas_stop_self(void) { - struct rtas_args args = { - .token = cpu_to_be32(rtas_stop_self_token), + static struct rtas_args args = { .nargs = 0, .nret = 1, .rets = &args.args[0], }; + args.token = cpu_to_be32(rtas_stop_self_token); + local_irq_disable(); BUG_ON(rtas_stop_self_token == RTAS_UNKNOWN_SERVICE); ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH v2 powerpc] Protect remove_memory() with device hotplug lock
While testing memory hot-remove, I found the following deadlock:

Process #1141 is drmgr, trying to remove some memory, i.e. memory499. It holds memory_hotplug_mutex, and blocks when trying to remove the file "online" under dir memory499, in kernfs_drain(), at

	wait_event(root->deactivate_waitq,
		   atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);

Process #1120 is trying to online memory499 by

	echo 1 > memory499/online

In .kernfs_fop_write, it uses kernfs_get_active() to increase &kn->active, thus blocking process #1141, while it is itself blocked later when trying to acquire memory_hotplug_mutex, which is held by process #1141.

The backtraces of both processes are shown below:

# cat /proc/1120/stack
[] 0xc1b18600
[] .__switch_to+0x144/0x200
[] .online_pages+0x74/0x7b0
[] .memory_subsys_online+0x9c/0x150
[] .device_online+0xb8/0x120
[] .online_store+0xb4/0xc0
[] .dev_attr_store+0x64/0xa0
[] .sysfs_kf_write+0x7c/0xb0
[] .kernfs_fop_write+0x154/0x1e0
[] .vfs_write+0xe0/0x260
[] .SyS_write+0x64/0x110
[] syscall_exit+0x0/0x7c

# cat /proc/1141/stack
[] 0xc1b18600
[] .__switch_to+0x144/0x200
[] .__kernfs_remove+0x204/0x300
[] .kernfs_remove_by_name_ns+0x68/0xf0
[] .sysfs_remove_file_ns+0x38/0x60
[] .device_remove_attrs+0x54/0xc0
[] .device_del+0x158/0x250
[] .device_unregister+0x34/0xa0
[] .unregister_memory_section+0x164/0x170
[] .__remove_pages+0x108/0x4c0
[] .arch_remove_memory+0x60/0xc0
[] .remove_memory+0x8c/0xe0
[] .pseries_remove_memblock+0xd4/0x160
[] .pseries_memory_notifier+0x27c/0x290
[] .notifier_call_chain+0x8c/0x100
[] .__blocking_notifier_call_chain+0x6c/0xe0
[] .of_property_notify+0x7c/0xc0
[] .of_update_property+0x3c/0x1b0
[] .ofdt_write+0x3dc/0x740
[] .proc_reg_write+0xac/0x110
[] .vfs_write+0xe0/0x260
[] .SyS_write+0x64/0x110
[] syscall_exit+0x0/0x7c

This patch uses lock_device_hotplug() to protect remove_memory() called in pseries_remove_memblock(), as is also stated in the comment before remove_memory():

 * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
 * and online/offline operations before this call, as required by
 * try_offline_node().
 */
void __ref remove_memory(int nid, u64 start, u64 size)

With this lock held, the other process (#1120 above) trying to online the memory block will retry the system call when calling lock_device_hotplug_sysfs(), and finally get a "No such device" error.

Signed-off-by: Li Zhong
---
 arch/powerpc/platforms/pseries/hotplug-memory.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 573b488..7f75c94 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -100,10 +100,10 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
 
 	start_pfn = base >> PAGE_SHIFT;
 
-	if (!pfn_valid(start_pfn)) {
-		memblock_remove(base, memblock_size);
-		return 0;
-	}
+	lock_device_hotplug();
+
+	if (!pfn_valid(start_pfn))
+		goto out;
 
 	block_sz = memory_block_size_bytes();
 	sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
@@ -114,8 +114,10 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
 		base += MIN_MEMORY_BLOCK_SIZE;
 	}
 
+out:
 	/* Update memory regions for memory remove */
 	memblock_remove(base, memblock_size);
+	unlock_device_hotplug();
 
 	return 0;
 }

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH powerpc] Protect remove_memory() with device hotplug lock
While testing memory hot-remove, I found following dead lock: Process #1141 is drmgr, trying to remove some memory, i.e. memory499. It holds the memory_hotplug_mutex, and blocks when trying to remove file "online" under dir memory499, in kernfs_drain(), at wait_event(root->deactivate_waitq, atomic_read(&kn->active) == KN_DEACTIVATED_BIAS); Process #1120 is trying to online memory499 by echo 1 > memory499/online In .kernfs_fop_write, it uses kernfs_get_active() to increase &kn->active, thus blocking process #1141. While itself is blocked later when trying to acquire memory_hotplug_mutex, which is held by process #1141. The backtrace of both processes are shown below: # cat /proc/1120/stack [] 0xc1b18600 [] .__switch_to+0x144/0x200 [] .online_pages+0x74/0x7b0 [] .memory_subsys_online+0x9c/0x150 [] .device_online+0xb8/0x120 [] .online_store+0xb4/0xc0 [] .dev_attr_store+0x64/0xa0 [] .sysfs_kf_write+0x7c/0xb0 [] .kernfs_fop_write+0x154/0x1e0 [] .vfs_write+0xe0/0x260 [] .SyS_write+0x64/0x110 [] syscall_exit+0x0/0x7c # cat /proc/1141/stack [] 0xc1b18600 [] .__switch_to+0x144/0x200 [] .__kernfs_remove+0x204/0x300 [] .kernfs_remove_by_name_ns+0x68/0xf0 [] .sysfs_remove_file_ns+0x38/0x60 [] .device_remove_attrs+0x54/0xc0 [] .device_del+0x158/0x250 [] .device_unregister+0x34/0xa0 [] .unregister_memory_section+0x164/0x170 [] .__remove_pages+0x108/0x4c0 [] .arch_remove_memory+0x60/0xc0 [] .remove_memory+0x8c/0xe0 [] .pseries_remove_memblock+0xd4/0x160 [] .pseries_memory_notifier+0x27c/0x290 [] .notifier_call_chain+0x8c/0x100 [] .__blocking_notifier_call_chain+0x6c/0xe0 [] .of_property_notify+0x7c/0xc0 [] .of_update_property+0x3c/0x1b0 [] .ofdt_write+0x3dc/0x740 [] .proc_reg_write+0xac/0x110 [] .vfs_write+0xe0/0x260 [] .SyS_write+0x64/0x110 [] syscall_exit+0x0/0x7c This patch uses lock_device_hotplug() to protect remove_memory() called in pseries_remove_memblock(), which is also stated before function remove_memory(): * NOTE: The caller must call lock_device_hotplug() to 
serialize hotplug * and online/offline operations before this call, as required by * try_offline_node(). */ void __ref remove_memory(int nid, u64 start, u64 size) With this lock held, the other process(#1120 above) trying to online the memory block will retry the system call when calling lock_device_hotplug_sysfs(), and finally find No such device error. Signed-off-by: Li Zhong --- arch/powerpc/platforms/pseries/hotplug-memory.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c index 573b488..e96357c 100644 --- a/arch/powerpc/platforms/pseries/hotplug-memory.c +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c @@ -109,10 +109,12 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE; nid = memory_add_physaddr_to_nid(base); + lock_device_hotplug(); for (i = 0; i < sections_per_block; i++) { remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE); base += MIN_MEMORY_BLOCK_SIZE; } + unlock_device_hotplug(); /* Update memory regions for memory remove */ memblock_remove(base, memblock_size); ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] powerpc: use of_node_init() for the fakenode in msi_bitmap.c
This patch uses of_node_init() to initialize the kobject in the fake node used in test_of_node(), to avoid following kobject warning. [0.897654] kobject: '(null)' (c007ca183a08): is not initialized, yet kobject_put() is being called. [0.897682] [ cut here ] [0.897688] WARNING: at lib/kobject.c:670 [0.897692] Modules linked in: [0.897701] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 3.14.0+ #1 [0.897708] task: c007ca10 ti: c007ca18 task.ti: c007ca18 [0.897715] NIP: c046a1f0 LR: c046a1ec CTR: 01704660 [0.897721] REGS: c007ca1835c0 TRAP: 0700 Not tainted (3.14.0+) [0.897727] MSR: 80029032 CR: 2824 XER: 000d [0.897749] CFAR: c08ef4ec SOFTE: 1 GPR00: c046a1ec c007ca183840 c14c59b8 005c GPR04: 0001 c0129770 0001 GPR08: 3fef GPR12: cf221200 c000c350 GPR16: GPR20: GPR24: c144e808 c0c56f20 00d8 GPR28: c0cd5058 c1454ca8 c007ca183a08 [0.897856] NIP [c046a1f0] .kobject_put+0xa0/0xb0 [0.897863] LR [c046a1ec] .kobject_put+0x9c/0xb0 [0.897868] Call Trace: [0.897874] [c007ca183840] [c046a1ec] .kobject_put+0x9c/0xb0 (unreliable) [0.897885] [c007ca1838c0] [c0743f9c] .of_node_put+0x2c/0x50 [0.897894] [c007ca183940] [c0c83954] .test_of_node+0x1dc/0x208 [0.897902] [c007ca183b80] [c0c839a4] .msi_bitmap_selftest+0x24/0x38 [0.897913] [c007ca183bf0] [c000bb34] .do_one_initcall+0x144/0x200 [0.897922] [c007ca183ce0] [c0c748e4] .kernel_init_freeable+0x2b4/0x394 [0.897931] [c007ca183db0] [c000c374] .kernel_init+0x24/0x130 [0.897940] [c007ca183e30] [c000a2f4] .ret_from_kernel_thread+0x5c/0x68 [0.897947] Instruction dump: [0.897952] 7fe3fb78 38210080 e8010010 ebe1fff8 7c0803a6 4800014c e89f 3c62ff6e [0.897971] 7fe5fb78 3863a950 48485279 6000 <0fe0> 3900 393f0038 4b80 [0.897992] ---[ end trace 1eeffdb9f825a556 ]--- Signed-off-by: Li Zhong --- arch/powerpc/sysdev/msi_bitmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/sysdev/msi_bitmap.c b/arch/powerpc/sysdev/msi_bitmap.c index 8ba6042..2ff6302 100644 --- a/arch/powerpc/sysdev/msi_bitmap.c +++ 
b/arch/powerpc/sysdev/msi_bitmap.c @@ -202,7 +202,7 @@ void __init test_of_node(void) /* There should really be a struct device_node allocator */ memset(&of_node, 0, sizeof(of_node)); - kref_init(&of_node.kobj.kref); + of_node_init(&of_node); of_node.full_name = node_name; check(0 == msi_bitmap_alloc(&bmp, size, &of_node)); ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH powerpc] cpuidle: fix per cpu accessing of cpuidle_devices
The patch tries to fix following bad address accessing, seems caused by a typo in commit 7f74dc0f. cpuidle_devices is defined statically, so per_cpu should be used to access it. [ 204.774193] Unable to handle kernel paging request for data at address 0x00a7d000 [ 204.774215] Faulting instruction address: 0xc06ee92c [ 204.774225] Oops: Kernel access of bad area, sig: 11 [#1] [ 204.774230] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries [ 204.774244] Modules linked in: binfmt_misc [ 204.774254] CPU: 8 PID: 3590 Comm: bash Not tainted 3.13.0-08526-gb2e448e-dirty #2 [ 204.774262] task: c001faf4 ti: c001fa904000 task.ti: c001fa904000 [ 204.774268] NIP: c06ee92c LR: c06ee920 CTR: c0064e94 [ 204.774275] REGS: c001fa907240 TRAP: 0300 Not tainted (3.13.0-08526-gb2e448e-dirty) [ 204.774281] MSR: 80009032 CR: 22242424 XER: 2000 [ 204.774303] CFAR: 9088 DAR: 00a7d000 DSISR: 4000 SOFTE: 1 GPR00: c06ee920 c001fa9074c0 c0d764a8 c0c5d4c0 GPR04: 0010 0010 GPR08: c16564a8 GPR12: 0002 cf33e800 1012b3dc GPR16: 10129c58 10129bf8 1012b948 GPR20: 1012b3e4 1013ea90 GPR24: c0c5d858 c15f38e8 c0cb8b26 c0c664a8 GPR28: ffe5 00a7d000 00a7d000 [ 204.774407] NIP [c06ee92c] .cpuidle_disable_device+0x30/0xc4 [ 204.774413] LR [c06ee920] .cpuidle_disable_device+0x24/0xc4 [ 204.774419] Call Trace: [ 204.774424] [c001fa9074c0] [c06ee920] .cpuidle_disable_device+0x24/0xc4 (unreliable) [ 204.774435] [c001fa907540] [c00652a4] .pseries_cpuidle_add_cpu_notifier+0xb8/0xe0 [ 204.774445] [c001fa9075c0] [c0855548] .notifier_call_chain+0x150/0x1c0 [ 204.774455] [c001fa907670] [c00b3564] .__raw_notifier_call_chain+0x40/0x50 [ 204.774463] [c001fa907710] [c0079c40] .__cpu_notify+0x50/0x9c [ 204.774472] [c001fa9077a0] [c018a054] ._cpu_down+0x1ec/0x388 [ 204.774479] [c001fa9078b0] [c018a234] .cpu_down+0x44/0x64 [ 204.774488] [c001fa907940] [c05627bc] .cpu_subsys_offline+0x24/0x3c [ 204.774497] [c001fa9079c0] [c055b2a4] .device_offline+0xc8/0x120 [ 204.774504] [c001fa907a50] [c055d000] .online_store+0x74/0xb0 [ 
204.774512] [c001fa907b00] [c0559924] .dev_attr_store+0x60/0x78 [ 204.774520] [c001fa907ba0] [c028cc78] .sysfs_kf_write+0x7c/0x9c [ 204.774528] [c001fa907c30] [c0291ae8] .kernfs_fop_write+0x108/0x174 [ 204.774537] [c001fa907cd0] [c01f8308] .vfs_write+0x110/0x21c [ 204.774545] [c001fa907d70] [c01f854c] .SyS_write+0x70/0xbc [ 204.774553] [c001fa907e30] [c0009d88] syscall_exit+0x0/0x7c [ 204.774560] Instruction dump: [ 204.774565] 7c0802a6 f8010010 fbe1fff8 f821ff81 7c7f1b78 6000 6000 7fe3fb78 [ 204.774582] 48000c89 6000 2fbf 419e0084 780b17e1 41820078 2fa3 [ 204.774601] ---[ end trace edf1df93cf81e28d ]--- Signed-off-by: Li Zhong --- arch/powerpc/platforms/pseries/processor_idle.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c index 94134a5..197cadc 100644 --- a/arch/powerpc/platforms/pseries/processor_idle.c +++ b/arch/powerpc/platforms/pseries/processor_idle.c @@ -190,7 +190,7 @@ static int pseries_cpuidle_add_cpu_notifier(struct notifier_block *n, { int hotcpu = (unsigned long)hcpu; struct cpuidle_device *dev = - per_cpu_ptr(cpuidle_devices, hotcpu); + per_cpu(cpuidle_devices, hotcpu); if (dev && cpuidle_get_driver()) { switch (action) { ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH powerpc] Fix compile error of pgtable-ppc64.h
On Fri, 2014-01-17 at 12:27 +0530, Aneesh Kumar K.V wrote:
> Li Zhong writes:
>
> > It seems that forward declaration couldn't work well with typedef; use
> > struct spinlock directly to avoid the following build errors:
> >
> > In file included from include/linux/spinlock.h:81,
> > from include/linux/seqlock.h:35,
> > from include/linux/time.h:5,
> > from include/uapi/linux/timex.h:56,
> > from include/linux/timex.h:56,
> > from include/linux/sched.h:17,
> > from arch/powerpc/kernel/asm-offsets.c:17:
> > include/linux/spinlock_types.h:76: error: redefinition of typedef 'spinlock_t'
> > /root/linux-next/arch/powerpc/include/asm/pgtable-ppc64.h:563: note:
> > previous declaration of 'spinlock_t' was here
>
> what compiler version? I have seen that error in gcc 4.3 and it was
> concluded that it is too old a compiler version to worry about. That
> specific compiler version also gave an error for forward declaring the struct.

gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC), and it doesn't report an error after using the struct directly.
> > > Signed-off-by: Li Zhong > > --- > > arch/powerpc/include/asm/pgtable-ppc64.h |6 +++--- > > 1 files changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h > > b/arch/powerpc/include/asm/pgtable-ppc64.h > > index d27960c..bc141c9 100644 > > --- a/arch/powerpc/include/asm/pgtable-ppc64.h > > +++ b/arch/powerpc/include/asm/pgtable-ppc64.h > > @@ -560,9 +560,9 @@ extern void pmdp_invalidate(struct vm_area_struct *vma, > > unsigned long address, > > pmd_t *pmdp); > > > > #define pmd_move_must_withdraw pmd_move_must_withdraw > > -typedef struct spinlock spinlock_t; > > -static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl, > > -spinlock_t *old_pmd_ptl) > > +struct spinlock; > > +static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl, > > +struct spinlock *old_pmd_ptl) > > { > > /* > > * Archs like ppc64 use pgtable to store per pmd ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH powerpc] Fix compile error of pgtable-ppc64.h
It seems that forward declaration couldn't work well with typedef; use struct spinlock directly to avoid the following build errors:

In file included from include/linux/spinlock.h:81,
                 from include/linux/seqlock.h:35,
                 from include/linux/time.h:5,
                 from include/uapi/linux/timex.h:56,
                 from include/linux/timex.h:56,
                 from include/linux/sched.h:17,
                 from arch/powerpc/kernel/asm-offsets.c:17:
include/linux/spinlock_types.h:76: error: redefinition of typedef 'spinlock_t'
/root/linux-next/arch/powerpc/include/asm/pgtable-ppc64.h:563: note: previous declaration of 'spinlock_t' was here

Signed-off-by: Li Zhong
---
 arch/powerpc/include/asm/pgtable-ppc64.h |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index d27960c..bc141c9 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -560,9 +560,9 @@ extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			    pmd_t *pmdp);
 
 #define pmd_move_must_withdraw pmd_move_must_withdraw
-typedef struct spinlock spinlock_t;
-static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
-					 spinlock_t *old_pmd_ptl)
+struct spinlock;
+static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
+					 struct spinlock *old_pmd_ptl)
 {
 	/*
 	 * Archs like ppc64 use pgtable to store per pmd

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH powerpc 2/2] adjust the variable type and name in __pte_free_tlb()
This patch adjusts the type and name of the page variable in __pte_free_tlb(), whose current naming is a little confusing.

Signed-off-by: Li Zhong
---
 arch/powerpc/include/asm/pgalloc-64.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/pgalloc-64.h b/arch/powerpc/include/asm/pgalloc-64.h
index d7543c2..1580f9e 100644
--- a/arch/powerpc/include/asm/pgalloc-64.h
+++ b/arch/powerpc/include/asm/pgalloc-64.h
@@ -148,11 +148,11 @@ static inline void pgtable_free_tlb(struct mmu_gather *tlb,
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
 				  unsigned long address)
 {
-	struct page *page = page_address(table);
+	void *page_addr = page_address(table);
 
 	tlb_flush_pgtable(tlb, address);
 	pgtable_page_dtor(table);
-	pgtable_free_tlb(tlb, page, 0);
+	pgtable_free_tlb(tlb, page_addr, 0);
 }
 
 #else /* if CONFIG_PPC_64K_PAGES */

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH powerpc 1/2] fix using page address as the page structure pointer in __pte_free_tlb()
It seems that in __pte_free_tlb (non 64K page code path), we wrongly pass the page address as the page structure pointer to pgtable_page_dtor(), which needs the page structure pointer as the argument. The change also fixes following bug on next-1128: [0.563113] Unable to handle kernel paging request for data at address 0x [0.563121] Faulting instruction address: 0xc01d8e30 [0.563128] Oops: Kernel access of bad area, sig: 11 [#1] [0.563132] PREEMPT SMP NR_CPUS=16 NUMA pSeries [0.563143] Modules linked in: [0.563150] CPU: 9 PID: 1 Comm: init Not tainted 3.13.0-rc1-next-20131128-dirty #1 [0.563157] task: c001fed4 ti: c001fed3c000 task.ti: c001fed3c000 [0.563163] NIP: c01d8e30 LR: c01da4e8 CTR: c018ed28 [0.563171] REGS: c001fed3f060 TRAP: 0300 Not tainted (3.13.0-rc1-next-20131128-dirty) [0.563177] MSR: 80009032 CR: 24222982 XER: 2001 [0.563197] CFAR: c01d8e0c DAR: DSISR: 4200 SOFTE: 1 GPR00: c001fed3f2e0 c0d4ce98 c001fe01f500 GPR04: f000 c01a9980 GPR08: 00b69000 0004 c001fc550108 GPR12: 44779982 cf33eb00 0020 1000 GPR16: c000 0001 c001fcbd8000 c001fa4b83f8 GPR20: 3fe0 c001fa0a8ff8 GPR24: 0029 c001fed3c000 00210d00 0001 GPR28: c001fe01f500 f000 [0.563299] NIP [c01d8e30] .__slab_free+0xc8/0x42c [0.563306] LR [c01da4e8] .kmem_cache_free+0x1d4/0x364 [0.563311] Call Trace: [0.563316] [c001fed3f2e0] [6db6db6db6db6db7] 0x6db6db6db6db6db7 (unreliable) [0.563325] [c001fed3f410] [c01da4e8] .kmem_cache_free+0x1d4/0x364 [0.563334] [c001fed3f4d0] [c01a9980] .ptlock_free+0x2c/0x44 [0.563342] [c001fed3f550] [c01aec98] .free_pgd_range+0x340/0x3d8 [0.563350] [c001fed3f680] [c01ee6b8] .shift_arg_pages+0x130/0x1a0 [0.563358] [c001fed3f7c0] [c01ee888] .setup_arg_pages+0x160/0x214 [0.563366] [c001fed3f870] [c0257664] .load_elf_binary+0x540/0x168c [0.563374] [c001fed3f9d0] [c01ecf28] .search_binary_handler+0xcc/0x238 [0.563382] [c001fed3fa80] [c02551e8] .load_script+0x26c/0x290 [0.563390] [c001fed3fb90] [c01ecf28] .search_binary_handler+0xcc/0x238 [0.563397] [c001fed3fc40] [c01ef1f4] 
.do_execve_common+0x6b4/0xa0c [0.563406] [c001fed3fd20] [c01ef634] .do_execve+0x40/0x58 [0.563414] [c001fed3fdb0] [c000bdb4] .kernel_init+0x74/0x158 [0.563422] [c001fed3fe30] [c000a0ec] .ret_from_kernel_thread+0x5c/0x70 [0.563429] Instruction dump: [0.563433] 635a0d00 3b61 2fbc 41fe0014 e8810070 7f83e378 486543c1 6000 [0.563450] e81e0022 829f0018 92810090 eaff0010 <7efd012a> a1210090 8aa10093 3809 [0.563469] ---[ end trace 3be1fd3d950f1716 ]--- Signed-off-by: Li Zhong --- arch/powerpc/include/asm/pgalloc-64.h |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/include/asm/pgalloc-64.h b/arch/powerpc/include/asm/pgalloc-64.h index 6940128..d7543c2 100644 --- a/arch/powerpc/include/asm/pgalloc-64.h +++ b/arch/powerpc/include/asm/pgalloc-64.h @@ -151,7 +151,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table, struct page *page = page_address(table); tlb_flush_pgtable(tlb, address); - pgtable_page_dtor(page); + pgtable_page_dtor(table); pgtable_free_tlb(tlb, page, 0); } ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH powerpc] Revert c6102609 and replace it with the correct fix for vio dma mask setting
This patch reverts my previous "fix" and replaces it with the correct fix from Russell. As Russell pointed out, dma_set_mask_and_coherent() (and the other dma_set_mask() functions) are really supposed to be used by drivers only.

Signed-off-by: Li Zhong
---
 arch/powerpc/kernel/vio.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c
index 76a6482..d771778 100644
--- a/arch/powerpc/kernel/vio.c
+++ b/arch/powerpc/kernel/vio.c
@@ -1419,7 +1419,8 @@ struct vio_dev *vio_register_device_node(struct device_node *of_node)
 
 		/* needed to ensure proper operation of coherent allocations
 		 * later, in case driver doesn't set it explicitly */
-		dma_coerce_mask_and_coherent(&viodev->dev, DMA_BIT_MASK(64));
+		viodev->dev.coherent_dma_mask = DMA_BIT_MASK(64);
+		viodev->dev.dma_mask = &viodev->dev.coherent_dma_mask;
 	}
 
 	/* register with generic device framework */

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH powerpc] Fix a dma_mask issue of vio
On Thu, 2013-11-21 at 11:22 +1100, Benjamin Herrenschmidt wrote: > On Thu, 2013-11-21 at 00:08 +, Russell King - ARM Linux wrote: > > On Thu, Nov 21, 2013 at 11:01:42AM +1100, Benjamin Herrenschmidt wrote: > > > On Wed, 2013-11-20 at 23:23 +, Russell King - ARM Linux wrote: > > > > Li Zong's patch works around the issue of a failing dma_set_mask(), > > > > but as I've already said elsewhere, the real fix is to get whatever > > > > created the struct device to initialise the dev->dma_mask with a > > > > bus default. > > > > > > > > Using dma_coerce_xxx() merely makes the problem "go away" papering > > > > over the issue - it's fine to do it this way, but someone should still > > > > fix the broken code creating these devices... > > > > > > Ok, they are created by the vio bus core, so it should be doing the > > > job here of setting the dma_mask pointer to a proper value. > > > > > > Li, can you take care of that ? Look at other bus types we have in > > > there such as the macio bus etc... > > > > Oh, hang on a moment, this is the "bus" code. > > > > In which case, the question becomes: do vio devices ever need to have > > a separate streaming DMA mask from a coherent DMA mask? If not, then > > something like the following is what's needed here, and I should've > > never have used dma_set_mask_and_coherent(). > > No, a single mask. > > > dma_set_mask_and_coherent() (and the other dma_set_mask() functions) > > are really supposed to be used by drivers only. 
> >
> > arch/powerpc/kernel/vio.c | 3 ++-
> > 1 files changed, 2 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c
> > index e7d0c88f621a..d771778f398e 100644
> > --- a/arch/powerpc/kernel/vio.c
> > +++ b/arch/powerpc/kernel/vio.c
> > @@ -1419,7 +1419,8 @@ struct vio_dev *vio_register_device_node(struct device_node *of_node)
> >
> > 		/* needed to ensure proper operation of coherent allocations
> > 		 * later, in case driver doesn't set it explicitly */
> > -		dma_set_mask_and_coherent(&viodev->dev, DMA_BIT_MASK(64));
> > +		viodev->dev.coherent_dma_mask = DMA_BIT_MASK(64);
> > +		viodev->dev.dma_mask = &viodev->dev.coherent_dma_mask;
> > 	}
> >
> > 	/* register with generic device framework */
>
> Right that's exactly what I had in mind. Li, can you test this please ?

Sure, and it works.

Tested-by: Li Zhong

> The previous "fix" using dma_coerce_mask_and_coherent() is already on
> its way to Linus, so we'll rework the above patch to undo it but for
> now please test.
>
> Cheers,
> Ben.
Re: [RFC PATCH powerpc] Fix a dma_mask issue of vio
On Wed, 2013-11-20 at 12:28 +1100, Benjamin Herrenschmidt wrote: > On Tue, 2013-11-19 at 16:11 +0800, Li Zhong wrote: > > I encountered following issue: > > [0.283035] ibmvscsi 3015: couldn't initialize event pool > > [5.688822] ibmvscsi: probe of 3015 failed with error -1 > > > > which prevents the storage from being recognized, and the machine from > > booting. > > > > After some digging, it seems that it is caused by commit 4886c399da > > > > as dma_mask pointer in viodev->dev is not set, so in > > dma_set_mask_and_coherent(), dma_set_coherent_mask() is not called > > because dma_set_mask(), which is dma_set_mask_pSeriesLP() returned EIO. > > While before the commit, dma_set_coherent_mask() is always called. > > > > I tried to replace dma_set_mask_and_coherent() with > > dma_coerce_mask_and_coherent(), and the machine could boot again. > > > > But I'm not sure whether this is the correct fix... > > Russell, care to chime in ? I can't make sense of the semantics... > > The original commit was fairly clear: > > << > Replace the following sequence: > > dma_set_mask(dev, mask); > dma_set_coherent_mask(dev, mask); > > with a call to the new helper dma_set_mask_and_coherent(). > >> > > It all makes sense so far ... but doesn't work for some odd reason, > and the "fix" uses a function whose name doesn't make much sense to > me ... what is the difference between "setting" and "coercing" > the mask ? And why doe replacing two "set" with a "set both" doesn't > work and require a coerce ? I think the difference is because the check of return value from dma_set_mask in dma_coerce_mask_and_coherent(): -- int rc = dma_set_mask(dev, mask); if (rc == 0) dma_set_coherent_mask(dev, mask); -- and in struct device {, dma_mask is a pointer, while coherent_dma_mask is value (don't know why we have this difference). And here for pseries, dma_set_mask() failed because the dma_mask pointer still remains null. 
And in dma_coerce_mask_and_coherent(), the dma_mask is set to the address of coherent_dma_mask:
--
dev->dma_mask = &dev->coherent_dma_mask;
--

Thanks, Zhong

> I'm asking because I'm worried about breakage elsewhere...
>
> Cheers,
> Ben.
>
> > ---
> > arch/powerpc/kernel/vio.c | 2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c
> > index e7d0c88..76a6482 100644
> > --- a/arch/powerpc/kernel/vio.c
> > +++ b/arch/powerpc/kernel/vio.c
> > @@ -1419,7 +1419,7 @@ struct vio_dev *vio_register_device_node(struct device_node *of_node)
> >
> > 		/* needed to ensure proper operation of coherent allocations
> > 		 * later, in case driver doesn't set it explicitly */
> > -		dma_set_mask_and_coherent(&viodev->dev, DMA_BIT_MASK(64));
> > +		dma_coerce_mask_and_coherent(&viodev->dev, DMA_BIT_MASK(64));
> > 	}
> >
> > 	/* register with generic device framework */
Re: [RFC PATCH powerpc] Fix compiling error in powernv/rng.c
On Tue, 2013-11-19 at 15:04 +1100, Michael Ellerman wrote:
> On Fri, Nov 15, 2013 at 03:36:04PM +0800, Li Zhong wrote:
> > This is seen when CONFIG_SMP is not enabled:
> >
> > arch/powerpc/platforms/powernv/rng.c: In function 'rng_init_per_cpu':
> > arch/powerpc/platforms/powernv/rng.c:74: error: implicit declaration of
> > function 'cpu_to_chip_id'
>
> Hi Li,
>
> We try whenever possible to avoid adding #ifdefs in C code.
>
> Also on a multi chip system where there are multiple RNGs, your code for
> UP will not necessarily choose the RNG on the same core as the cpu.

OK, thank you for the review, Michael.

Just to make sure I understand it: so even in UP, we could have multiple RNG sources, and we should try to use the source which has the same chip_id as the logical cpu?

Thanks, Zhong

> I have a different fix that I will send.
>
> cheers
[RFC PATCH powerpc] Fix a dma_mask issue of vio
I encountered the following issue:

[0.283035] ibmvscsi 3015: couldn't initialize event pool
[5.688822] ibmvscsi: probe of 3015 failed with error -1

which prevents the storage from being recognized, and the machine from booting.

After some digging, it seems that it is caused by commit 4886c399da, as the dma_mask pointer in viodev->dev is not set, so in dma_set_mask_and_coherent(), dma_set_coherent_mask() is not called because dma_set_mask(), which is dma_set_mask_pSeriesLP(), returned EIO. Before the commit, dma_set_coherent_mask() was always called.

I tried replacing dma_set_mask_and_coherent() with dma_coerce_mask_and_coherent(), and the machine could boot again.

But I'm not sure whether this is the correct fix...

---
 arch/powerpc/kernel/vio.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c
index e7d0c88..76a6482 100644
--- a/arch/powerpc/kernel/vio.c
+++ b/arch/powerpc/kernel/vio.c
@@ -1419,7 +1419,7 @@ struct vio_dev *vio_register_device_node(struct device_node *of_node)
 		/* needed to ensure proper operation of coherent allocations
 		 * later, in case driver doesn't set it explicitly */
-		dma_set_mask_and_coherent(&viodev->dev, DMA_BIT_MASK(64));
+		dma_coerce_mask_and_coherent(&viodev->dev, DMA_BIT_MASK(64));
 	}

 	/* register with generic device framework */
[RFC PATCH powerpc] Fix compiling error in powernv/rng.c
This is seen when CONFIG_SMP is not enabled:

arch/powerpc/platforms/powernv/rng.c: In function 'rng_init_per_cpu':
arch/powerpc/platforms/powernv/rng.c:74: error: implicit declaration of function 'cpu_to_chip_id'

Signed-off-by: Li Zhong
---
 arch/powerpc/platforms/powernv/rng.c | 8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/rng.c b/arch/powerpc/platforms/powernv/rng.c
index 8844628..04430a7 100644
--- a/arch/powerpc/platforms/powernv/rng.c
+++ b/arch/powerpc/platforms/powernv/rng.c
@@ -59,6 +59,7 @@ int powernv_get_random_long(unsigned long *v)
 }
 EXPORT_SYMBOL_GPL(powernv_get_random_long);

+#ifdef CONFIG_SMP
 static __init void rng_init_per_cpu(struct powernv_rng *rng,
 				    struct device_node *dn)
 {
@@ -75,6 +76,13 @@ static __init void rng_init_per_cpu(struct powernv_rng *rng,
 		}
 	}
 }
+#else
+static __init void rng_init_per_cpu(struct powernv_rng *rng,
+				    struct device_node *dn)
+{
+	per_cpu(powernv_rng, 0) = rng;
+}
+#endif

 static __init int rng_create(struct device_node *dn)
 {
[PATCH powerpc] Fix compiling error in rng.c
This patch fixes the following compile errors by including the needed header file:

arch/powerpc/platforms/pseries/rng.c: In function 'pseries_get_random_long':
arch/powerpc/platforms/pseries/rng.c:20: error: 'PLPAR_HCALL_BUFSIZE' undeclared (first use in this function)
...

Signed-off-by: Li Zhong
---
 arch/powerpc/platforms/pseries/rng.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/rng.c b/arch/powerpc/platforms/pseries/rng.c
index a702f1c..62c7838 100644
--- a/arch/powerpc/platforms/pseries/rng.c
+++ b/arch/powerpc/platforms/pseries/rng.c
@@ -13,7 +13,7 @@
 #include
 #include
 #include
-
+#include

 static int pseries_get_random_long(unsigned long *v)
 {
Re: [RFC PATCH powerpc] Set cpu sibling mask before online cpu
On Tue, 2013-06-11 at 14:03 +0530, Srivatsa S. Bhat wrote: > On 06/11/2013 12:30 PM, Benjamin Herrenschmidt wrote: > > On Thu, 2013-05-16 at 18:20 +0800, Li Zhong wrote: > >> It seems following race is possible: > >> > > > > .../... > > > >>vdso_getcpu_init(); > >> #endif > >> - notify_cpu_starting(cpu); > >> - set_cpu_online(cpu, true); > >>/* Update sibling maps */ > >>base = cpu_first_thread_sibling(cpu); > >>for (i = 0; i < threads_per_core; i++) { > >> - if (cpu_is_offline(base + i)) > >> + if (cpu_is_offline(base + i) && (cpu != base + i)) > >>continue; > >>cpumask_set_cpu(cpu, cpu_sibling_mask(base + i)); > >>cpumask_set_cpu(base + i, cpu_sibling_mask(cpu)); > >> @@ -667,6 +665,10 @@ __cpuinit void start_secondary(void *unused) > >>} > >>of_node_put(l2_cache); > >> > >> + smp_wmb(); > >> + notify_cpu_starting(cpu); > >> + set_cpu_online(cpu, true); > >> + > > > > So we could have an online CPU with an empty sibling mask. Now we can > > have a sibling that isn't online ... Is that ok ? > > I think it is OK. We do the same thing on x86 as well - we set up the > sibling links before calling notify_cpu_starting() and setting the cpu > in the cpu_online_mask. In fact, there is even a comment explicitly > noting that order: > > arch/x86/kernel/smpboot.c: > 220 /* > 221 * This must be done before setting cpu_online_mask > 222 * or calling notify_cpu_starting. > 223 */ > 224 set_cpu_sibling_map(raw_smp_processor_id()); > 225 wmb(); > 226 > 227 notify_cpu_starting(cpuid); > 228 > 229 /* > 230 * Allow the master to continue. > 231 */ > 232 cpumask_set_cpu(cpuid, cpu_callin_mask); > > > So I agree with Li Zhong's solution. > > [Arch-specific CPU hotplug code consolidation efforts such as [1] would > have weeded out such nasty bugs.. I guess we should revive that patchset > sometime soon.] > Thank you both for the review and comments. Good to know that it matches that of x86, and there is a patchset consolidating the code. 
With the patches in [1], it seems this patch would only need the change that includes the to-be-onlined cpu in the sibling-map update.

Thanks, Zhong

> Regards,
> Srivatsa S. Bhat
>
> [1]. https://lwn.net/Articles/500185/
[RFC PATCH powerpc] Set cpu sibling mask before online cpu
It seems following race is possible: cpu0cpux smp_init->cpu_up->_cpu_up __cpu_up kick_cpu(1) - waiting online ... ... notify CPU_STARTING set cpux active set cpux online - finish waiting online ... sched_init_smp init_sched_domains(cpu_active_mask) build_sched_domains set cpux sibling info - Execution of cpu0 and cpux could be concurrent between two separator lines. So if the cpux sibling information was set too late (normally impossible, but could be triggered by adding some delay in start_secondary, after setting cpu online), build_sched_domains() running on cpu0 might see cpux active, with an empty sibling mask, then cause some bad address accessing like following: [0.099855] Unable to handle kernel paging request for data at address 0xc0038518078f [0.099868] Faulting instruction address: 0xc00b7a64 [0.099883] Oops: Kernel access of bad area, sig: 11 [#1] [0.099895] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries [0.099922] Modules linked in: [0.099940] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-rc1-00120-gb973425-dirty #16 [0.099956] task: c001fed8 ti: c001fed7c000 task.ti: c001fed7c000 [0.099971] NIP: c00b7a64 LR: c00b7a40 CTR: c00b4934 [0.099985] REGS: c001fed7f760 TRAP: 0300 Not tainted (3.10.0-rc1-00120-gb973425-dirty) [0.07] MSR: 80009032 CR: 24272828 XER: 2003 [0.100045] SOFTE: 1 [0.100053] CFAR: c0445ee8 [0.100064] DAR: c0038518078f, DSISR: 4000 [0.100073] GPR00: 0080 c001fed7f9e0 c0c84d48 0010 GPR04: 0010 c001fc55e090 GPR08: c0b80b30 c0c962d8 0003845ffc5f GPR12: cf33d000 c000b9e4 GPR16: 0001 GPR20: c0ccf750 c0c94d48 c001fc504000 GPR24: c001fc504000 c001fecef848 c0c94d48 c0ccf000 GPR28: c001fc522090 0010 c001fecef848 c001fed7fae0 [0.100293] NIP [c00b7a64] .get_group+0x84/0xc4 [0.100307] LR [c00b7a40] .get_group+0x60/0xc4 [0.100318] Call Trace: [0.100332] [c001fed7f9e0] [c00dbce4] .lock_is_held+0xa8/0xd0 (unreliable) [0.100354] [c001fed7fa70] [c00bf62c] .build_sched_domains+0x728/0xd14 [0.100375] [c001fed7fbe0] [c0af67bc] 
.sched_init_smp+0x4fc/0x654 [0.100394] [c001fed7fce0] [c0adce24] .kernel_init_freeable+0x17c/0x30c [0.100413] [c001fed7fdb0] [c000ba08] .kernel_init+0x24/0x12c [0.100431] [c001fed7fe30] [c0009f74] .ret_from_kernel_thread+0x5c/0x68 [0.100445] Instruction dump: [0.100456] 38800010 38a0 4838e3f5 6000 7c6307b4 2fbf 419e0040 3d220001 [0.100496] 78601f24 39491590 e93e0008 7d6a002a <7d69582a> f97f 7d4a002a e93e0010 [0.100559] ---[ end trace 31fd0ba7d8756001 ]--- This patch tries to move the sibling maps updating before notify_cpu_starting() and cpu online, and a write barrier there to make sure sibling maps are updated before active and online mask. Signed-off-by: Li Zhong --- arch/powerpc/kernel/smp.c |8 +--- 1 files changed, 5 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index ee7ac5e..c765937 100644 --- a/arch/powerpc/kernel/smp.c +++ b/arch/powerpc/kernel/smp.c @@ -637,12 +637,10 @@ __cpuinit void start_secondary(void *unused) vdso_getcpu_init(); #endif - notify_cpu_starting(cpu); - set_cpu_online(cpu, true); /* Update sibling maps */ base = cpu_first_thread_sibling(cpu); for (i = 0; i < threads_per_core; i++) { - if (cpu_is_offline(base + i)) + if (cpu_is_offline(base + i) && (cpu != base + i)) continue; cpumask_set_cpu(cpu, cpu_sibling_mask(base + i)); cpumask_set_cpu(base + i, cpu_sibling_mask(cpu)); @@ -667,6 +665,10 @@ __cpuinit void start_secondary(void *unused) } of_node_put(l2_cache); + smp_wmb(); + notify_cpu_starting(cpu); + set_cpu_online(cpu, true); + local_irq_enable();
[RFC PATCH v4 5/5] powerpc: select HAVE_CONTEXT_TRACKING for pSeries
Start context tracking support from pSeries.

Signed-off-by: Li Zhong
---
 arch/powerpc/platforms/pseries/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 9a0941b..023b288 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -18,6 +18,7 @@ config PPC_PSERIES
 	select PPC_PCI_CHOICE if EXPERT
 	select ZLIB_DEFLATE
 	select PPC_DOORBELL
+	select HAVE_CONTEXT_TRACKING
 	default y

 config PPC_SPLPAR
--
1.7.9.5
[RFC PATCH v4 4/5] powerpc: Use the new schedule_user API on userspace preemption
This patch corresponds to
[PATCH] x86: Use the new schedule_user API on userspace preemption
commit 0430499ce9d78691f3985962021b16bf8f8a8048

Signed-off-by: Li Zhong
---
 arch/powerpc/include/asm/context_tracking.h | 10 ++
 arch/powerpc/kernel/entry_64.S              |  3 ++-
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/context_tracking.h

diff --git a/arch/powerpc/include/asm/context_tracking.h b/arch/powerpc/include/asm/context_tracking.h
new file mode 100644
index 000..b6f5a33
--- /dev/null
+++ b/arch/powerpc/include/asm/context_tracking.h
@@ -0,0 +1,10 @@
+#ifndef _ASM_POWERPC_CONTEXT_TRACKING_H
+#define _ASM_POWERPC_CONTEXT_TRACKING_H
+
+#ifdef CONFIG_CONTEXT_TRACKING
+#define SCHEDULE_USER bl .schedule_user
+#else
+#define SCHEDULE_USER bl .schedule
+#endif
+
+#endif
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 915fbb4..d418977 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -33,6 +33,7 @@
 #include
 #include
 #include
+#include

 /*
  * System calls.
@@ -634,7 +635,7 @@ _GLOBAL(ret_from_except_lite)
 	andi.	r0,r4,_TIF_NEED_RESCHED
 	beq	1f
 	bl	.restore_interrupts
-	bl	.schedule
+	SCHEDULE_USER
 	b	.ret_from_except_lite
1:	bl	.save_nvgprs
--
1.7.9.5
[RFC PATCH v4 3/5] powerpc: Exit user context on notify resume
This patch allows RCU usage in do_notify_resume, e.g. signal handling. It corresponds to
[PATCH] x86: Exit RCU extended QS on notify resume
commit edf55fda35c7dc7f2d9241c3abaddaf759b457c6

Signed-off-by: Li Zhong
---
 arch/powerpc/kernel/signal.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index cf12eae..d63b502 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -159,6 +160,8 @@ static int do_signal(struct pt_regs *regs)

 void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
 {
+	user_exit();
+
 	if (thread_info_flags & _TIF_UPROBE)
 		uprobe_notify_resume(regs);

@@ -169,4 +172,6 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
 		clear_thread_flag(TIF_NOTIFY_RESUME);
 		tracehook_notify_resume(regs);
 	}
+
+	user_enter();
 }
--
1.7.9.5
[RFC PATCH v4 2/5] powerpc: Exception hooks for context tracking subsystem
This is the exception hooks for context tracking subsystem, including data access, program check, single step, instruction breakpoint, machine check, alignment, fp unavailable, altivec assist, unknown exception, whose handlers might use RCU. This patch corresponds to [PATCH] x86: Exception hooks for userspace RCU extended QS commit 6ba3c97a38803883c2eee489505796cb0a727122 But after the exception handling moved to generic code, and some changes in following two commits: 56dd9470d7c8734f055da2a6bac553caf4a468eb context_tracking: Move exception handling to generic code 6c1e0256fad84a843d915414e4b5973b7443d48d context_tracking: Restore correct previous context state on exception exit it is able for exception hooks to use the generic code above instead of a redundant arch implementation. Signed-off-by: Li Zhong --- arch/powerpc/kernel/traps.c | 80 --- arch/powerpc/mm/fault.c | 41 +--- arch/powerpc/mm/hash_utils_64.c | 36 +- 3 files changed, 112 insertions(+), 45 deletions(-) diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index 83efa2f..a7a648f 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -35,6 +35,7 @@ #include #include #include +#include #include #include @@ -667,6 +668,7 @@ int machine_check_generic(struct pt_regs *regs) void machine_check_exception(struct pt_regs *regs) { + enum ctx_state prev_state = exception_enter(); int recover = 0; __get_cpu_var(irq_stat).mce_exceptions++; @@ -683,7 +685,7 @@ void machine_check_exception(struct pt_regs *regs) recover = cur_cpu_spec->machine_check(regs); if (recover > 0) - return; + goto bail; #if defined(CONFIG_8xx) && defined(CONFIG_PCI) /* the qspan pci read routines can cause machine checks -- Cort @@ -693,20 +695,23 @@ void machine_check_exception(struct pt_regs *regs) * -- BenH */ bad_page_fault(regs, regs->dar, SIGBUS); - return; + goto bail; #endif if (debugger_fault_handler(regs)) - return; + goto bail; if (check_io_access(regs)) - return; + goto bail; die("Machine 
check", regs, SIGBUS); /* Must die if the interrupt is not recoverable */ if (!(regs->msr & MSR_RI)) panic("Unrecoverable Machine check"); + +bail: + exception_exit(prev_state); } void SMIException(struct pt_regs *regs) @@ -716,20 +721,29 @@ void SMIException(struct pt_regs *regs) void unknown_exception(struct pt_regs *regs) { + enum ctx_state prev_state = exception_enter(); + printk("Bad trap at PC: %lx, SR: %lx, vector=%lx\n", regs->nip, regs->msr, regs->trap); _exception(SIGTRAP, regs, 0, 0); + + exception_exit(prev_state); } void instruction_breakpoint_exception(struct pt_regs *regs) { + enum ctx_state prev_state = exception_enter(); + if (notify_die(DIE_IABR_MATCH, "iabr_match", regs, 5, 5, SIGTRAP) == NOTIFY_STOP) - return; + goto bail; if (debugger_iabr_match(regs)) - return; + goto bail; _exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip); + +bail: + exception_exit(prev_state); } void RunModeException(struct pt_regs *regs) @@ -739,15 +753,20 @@ void RunModeException(struct pt_regs *regs) void __kprobes single_step_exception(struct pt_regs *regs) { + enum ctx_state prev_state = exception_enter(); + clear_single_step(regs); if (notify_die(DIE_SSTEP, "single_step", regs, 5, 5, SIGTRAP) == NOTIFY_STOP) - return; + goto bail; if (debugger_sstep(regs)) - return; + goto bail; _exception(SIGTRAP, regs, TRAP_TRACE, regs->nip); + +bail: + exception_exit(prev_state); } /* @@ -1005,6 +1024,7 @@ int is_valid_bugaddr(unsigned long addr) void __kprobes program_check_exception(struct pt_regs *regs) { + enum ctx_state prev_state = exception_enter(); unsigned int reason = get_reason(regs); extern int do_mathemu(struct pt_regs *regs); @@ -1014,26 +1034,26 @@ void __kprobes program_check_exception(struct pt_regs *regs) if (reason & REASON_FP) { /* IEEE FP exception */ parse_fpe(regs); - return; + goto bail; } if (reason & REASON_TRAP) { /* Debugger is first in line to stop recursive faults in * rcu_lock, notify_die, or atomic_notifier_call_chain */ if (debugger_bpt(regs)) - ret
[RFC PATCH v4 1/5] powerpc: Syscall hooks for context tracking subsystem
This is the syscall slow path hooks for context tracking subsystem, corresponding to [PATCH] x86: Syscall hooks for userspace RCU extended QS commit bf5a3c13b939813d28ce26c01425054c740d6731 TIF_MEMDIE is moved to the second 16-bits (with value 17), as it seems there is no asm code using it. TIF_NOHZ is added to _TIF_SYCALL_T_OR_A, so it is better for it to be in the same 16 bits with others in the group, so in the asm code, andi. with this group could work. Signed-off-by: Li Zhong Acked-by: Frederic Weisbecker --- arch/powerpc/include/asm/thread_info.h |7 +-- arch/powerpc/kernel/ptrace.c |5 + 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h index 8ceea14..ba7b197 100644 --- a/arch/powerpc/include/asm/thread_info.h +++ b/arch/powerpc/include/asm/thread_info.h @@ -97,7 +97,7 @@ static inline struct thread_info *current_thread_info(void) #define TIF_PERFMON_CTXSW 6 /* perfmon needs ctxsw calls */ #define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */ #define TIF_SINGLESTEP 8 /* singlestepping active */ -#define TIF_MEMDIE 9 /* is terminating due to OOM killer */ +#define TIF_NOHZ 9 /* in adaptive nohz mode */ #define TIF_SECCOMP10 /* secure computing */ #define TIF_RESTOREALL 11 /* Restore all regs (implies NOERROR) */ #define TIF_NOERROR12 /* Force successful syscall return */ @@ -106,6 +106,7 @@ static inline struct thread_info *current_thread_info(void) #define TIF_SYSCALL_TRACEPOINT 15 /* syscall tracepoint instrumentation */ #define TIF_EMULATE_STACK_STORE16 /* Is an instruction emulation for stack store? 
 */
+#define TIF_MEMDIE		17	/* is terminating due to OOM killer */

 /* as above, but as bit values */
 #define _TIF_SYSCALL_TRACE	(1<

 #include
 #include
 #include
+#include
 #include
 #include

@@ -1788,6 +1789,8 @@ long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;

+	user_exit();
+
 	secure_computing_strict(regs->gpr[0]);

 	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
@@ -1832,4 +1835,6 @@ void do_syscall_trace_leave(struct pt_regs *regs)
 	step = test_thread_flag(TIF_SINGLESTEP);
 	if (step || test_thread_flag(TIF_SYSCALL_TRACE))
 		tracehook_report_syscall_exit(regs, step);
+
+	user_enter();
 }
--
1.7.9.5
[RFC PATCH v4 0/5] powerpc: Support context tracking for Power pSeries
These patches try to support context tracking for the Power arch, beginning with 64-bit pSeries. The code is ported from that of x86_64, and in each patch I listed the corresponding patch for x86.

v4: fixed some cosmetic issues suggested by Ben.

Li Zhong (5):
  powerpc: Syscall hooks for context tracking subsystem
  powerpc: Exception hooks for context tracking subsystem
  powerpc: Exit user context on notify resume
  powerpc: Use the new schedule_user API on userspace preemption
  powerpc: select HAVE_CONTEXT_TRACKING for pSeries

 arch/powerpc/include/asm/context_tracking.h | 10 ++++
 arch/powerpc/include/asm/thread_info.h      |  7 ++-
 arch/powerpc/kernel/entry_64.S              |  3 +-
 arch/powerpc/kernel/ptrace.c                |  5 ++
 arch/powerpc/kernel/signal.c                |  5 ++
 arch/powerpc/kernel/traps.c                 | 80 +++
 arch/powerpc/mm/fault.c                     | 41 +-
 arch/powerpc/mm/hash_utils_64.c             | 36 +---
 arch/powerpc/platforms/pseries/Kconfig      |  1 +
 9 files changed, 140 insertions(+), 48 deletions(-)
 create mode 100644 arch/powerpc/include/asm/context_tracking.h
--
1.7.9.5
Re: [RFC PATCH v3 2/5] powerpc: Exception hooks for context tracking subsystem
On Mon, 2013-05-13 at 19:06 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2013-05-13 at 16:44 +0800, Li Zhong wrote:
> > Yes, the above and hash_page() are two C functions for a same exception.
> > And the exception hooks enable RCU usage in those C codes. But for asm
> > codes, I think we could assume that there would be no RCU usage there,
> > so we don't need to wrap them in the hooks.
>
> hash_page() won't start a new RCU, at least not in its current incarnation,
> the only thing I can see it ever doing would be to take some RCU read locks
> one day (it doesn't today).

It seems I added the hooks because of the trace point of hcall entry/exit ...

Thanks, Zhong

> low_hash_fault() is a different beast. It will typically kill things, thus
> involving sending signals etc... RCU might well be involved.
>
> Cheers,
> Ben.
Re: [RFC PATCH v3 0/5] powerpc: Support context tracking for Power pSeries
On Mon, 2013-05-13 at 18:59 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2013-05-13 at 16:03 +0800, Li Zhong wrote:
> >
> > To my understanding, it is used to enable RCU user extended quiescent
> > state, so RCU on that cpu doesn't need scheduler ticks. And together
> > with some other code (already in 3.10), we are able to remove the ticks
> > in some cases (e.g. only 1 task running on the cpu, with some other
> > limitations).
>
> Ok, sounds interesting. Once you fix the little cosmetic issue, I don't
> see any reason not to merge them as it's basically wiring up an existing
> feature (in that regard the patches are pretty straightforward) and I
> assume the overhead is only there when you enable it.

Thanks for your support, Ben. Yes, there should be no overhead if not enabled.

Thanks, Zhong

> Cheers,
> Ben.
Re: [RFC PATCH v3 2/5] powerpc: Exception hooks for context tracking subsystem
On Mon, 2013-05-13 at 15:57 +1000, Benjamin Herrenschmidt wrote: > On Mon, 2013-05-13 at 13:21 +0800, Li Zhong wrote: > > int recover = 0; > > + enum ctx_state prev_state; > > + > > + prev_state = exception_enter(); > > Please make it nicer: > > enum ctx_state prev_state = exception_enter(); > int recover = 0; OK, I'll fix it, and all the others. > > __get_cpu_var(irq_stat).mce_exceptions++; > > > > @@ -683,7 +687,7 @@ void machine_check_exception(struct pt_regs *regs) > > recover = cur_cpu_spec->machine_check(regs); > > > > if (recover > 0) > > - return; > > + goto exit; > > I'm no fan of "exit" as a label as it's otherwise a libc function (I > know there is no relationship here but I just find that annoying). > > I usually use "bail" OK, I'll fix it, and all the others. > > > #if defined(CONFIG_8xx) && defined(CONFIG_PCI) > > /* the qspan pci read routines can cause machine checks -- Cort > > @@ -693,20 +697,23 @@ void machine_check_exception(struct pt_regs *regs) > > * -- BenH > > */ > > bad_page_fault(regs, regs->dar, SIGBUS); > > - return; > > + goto exit; > > #endif > > > > if (debugger_fault_handler(regs)) > > - return; > > + goto exit; > > > > if (check_io_access(regs)) > > - return; > > + goto exit; > > > > die("Machine check", regs, SIGBUS); > > > > /* Must die if the interrupt is not recoverable */ > > if (!(regs->msr & MSR_RI)) > > panic("Unrecoverable Machine check"); > > + > > +exit: > > + exception_exit(prev_state); > > } > > > > void SMIException(struct pt_regs *regs) > > @@ -716,20 +723,31 @@ void SMIException(struct pt_regs *regs) > > > > void unknown_exception(struct pt_regs *regs) > > { > > + enum ctx_state prev_state; > > + prev_state = exception_enter(); > > + > > Same cosmetic comment for all other cases > > /... 
> > > #include > > #include > > @@ -193,8 +194,8 @@ static int mm_fault_error(struct pt_regs *regs, > > unsigned long addr, int fault) > > * The return value is 0 if the fault was handled, or the signal > > * number if this is a kernel fault that can't be handled here. > > */ > > -int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address, > > - unsigned long error_code) > > +static int __kprobes __do_page_fault(struct pt_regs *regs, > > + unsigned long address, unsigned long error_code) > > { > > struct vm_area_struct * vma; > > struct mm_struct *mm = current->mm; > > @@ -475,6 +476,17 @@ bad_area_nosemaphore: > > > > } > > > > +int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address, > > + unsigned long error_code) > > +{ > > + int ret; > > + enum ctx_state prev_state; > > + prev_state = exception_enter(); > > + ret = __do_page_fault(regs, address, error_code); > > + exception_exit(prev_state); > > + return ret; > > +} > > Any reason why you didn't use the same wrapping trick in traps.c ? I'd > rather we stay consistent in the way we do things between the two files. OK, I'll make it consistent with those in traps.c > > > /* > > * bad_page_fault is called when we have a bad access from the kernel. 
> > * It is called from the DSI and ISI handlers in head.S and from some > > diff --git a/arch/powerpc/mm/hash_utils_64.c > > b/arch/powerpc/mm/hash_utils_64.c > > index 88ac0ee..d92fb26 100644 > > --- a/arch/powerpc/mm/hash_utils_64.c > > +++ b/arch/powerpc/mm/hash_utils_64.c > > @@ -33,6 +33,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > #include > > @@ -962,6 +963,9 @@ int hash_page(unsigned long ea, unsigned long access, > > unsigned long trap) > > const struct cpumask *tmp; > > int rc, user_region = 0, local = 0; > > int psize, ssize; > > + enum ctx_state prev_state; > > + > > + prev_state = exception_enter(); > > > > DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n", > > ea, access, trap); > > @@ -973,7 +977,8 @@ int hash_page(unsigned long ea, unsigned long a
Re: [RFC PATCH v3 0/5] powerpc: Support context tracking for Power pSeries
On Mon, 2013-05-13 at 15:51 +1000, Benjamin Herrenschmidt wrote: > On Mon, 2013-05-13 at 13:21 +0800, Li Zhong wrote: > > These patches try to support context tracking for Power arch, beginning with > > 64-bit pSeries. The codes are ported from that of the x86_64, and in each > > patch, I listed the corresponding patch for x86. > > So that's yet another pile of bloat on all syscall entry/exit and > exception entry/exit. What is it used for ? (I haven't followed on > x86_64 side). To my understanding, it is used to enable RCU user extended quiescent state, so RCU on that cpu doesn't need scheduler ticks. And together with some other code(already in 3.10), we are able to remove the ticks in some cases (e.g. only 1 task running on the cpu, with some other limitations). Maybe Paul, or Frederic could give some better descriptions. Thanks, Zhong > > Cheers, > Ben. > > > v3: > > > > This version is mainly a rebasing, against 3.10-rc1, also as the common > > code > > to handle the exception are pulled into 3.10, so there is no dependency on > > tip tree. So patch #2 and #6 in previous version_2 is merged together. 
> > > > Li Zhong (5): > > powerpc: Syscall hooks for context tracking subsystem > > powerpc: Exception hooks for context tracking subsystem > > powerpc: Exit user context on notify resume > > powerpc: Use the new schedule_user API on userspace preemption > > powerpc: select HAVE_CONTEXT_TRACKING for pSeries > > > > arch/powerpc/include/asm/context_tracking.h | 10 +++ > > arch/powerpc/include/asm/thread_info.h |7 ++- > > arch/powerpc/kernel/entry_64.S |3 +- > > arch/powerpc/kernel/ptrace.c|5 ++ > > arch/powerpc/kernel/signal.c|5 ++ > > arch/powerpc/kernel/traps.c | 91 > > --- > > arch/powerpc/mm/fault.c | 16 - > > arch/powerpc/mm/hash_utils_64.c | 38 --- > > arch/powerpc/platforms/pseries/Kconfig |1 + > > 9 files changed, 140 insertions(+), 36 deletions(-) > > create mode 100644 arch/powerpc/include/asm/context_tracking.h > > > > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH v3 5/5] powerpc: select HAVE_CONTEXT_TRACKING for pSeries
Start context tracking support from pSeries. Signed-off-by: Li Zhong --- arch/powerpc/platforms/pseries/Kconfig |1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig index 9a0941b..023b288 100644 --- a/arch/powerpc/platforms/pseries/Kconfig +++ b/arch/powerpc/platforms/pseries/Kconfig @@ -18,6 +18,7 @@ config PPC_PSERIES select PPC_PCI_CHOICE if EXPERT select ZLIB_DEFLATE select PPC_DOORBELL + select HAVE_CONTEXT_TRACKING default y config PPC_SPLPAR -- 1.7.9.5
[RFC PATCH v3 4/5] powerpc: Use the new schedule_user API on userspace preemption
This patch corresponds to [PATCH] x86: Use the new schedule_user API on userspace preemption commit 0430499ce9d78691f3985962021b16bf8f8a8048 Signed-off-by: Li Zhong --- arch/powerpc/include/asm/context_tracking.h | 10 ++ arch/powerpc/kernel/entry_64.S |3 ++- 2 files changed, 12 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/include/asm/context_tracking.h diff --git a/arch/powerpc/include/asm/context_tracking.h b/arch/powerpc/include/asm/context_tracking.h new file mode 100644 index 000..b6f5a33 --- /dev/null +++ b/arch/powerpc/include/asm/context_tracking.h @@ -0,0 +1,10 @@ +#ifndef _ASM_POWERPC_CONTEXT_TRACKING_H +#define _ASM_POWERPC_CONTEXT_TRACKING_H + +#ifdef CONFIG_CONTEXT_TRACKING +#define SCHEDULE_USER bl .schedule_user +#else +#define SCHEDULE_USER bl .schedule +#endif + +#endif diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 915fbb4..d418977 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -33,6 +33,7 @@ #include #include #include +#include /* * System calls. @@ -634,7 +635,7 @@ _GLOBAL(ret_from_except_lite) andi. r0,r4,_TIF_NEED_RESCHED beq 1f bl .restore_interrupts - bl .schedule + SCHEDULE_USER b .ret_from_except_lite 1: bl .save_nvgprs -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH v3 3/5] powerpc: Exit user context on notify resume
This patch allows RCU usage in do_notify_resume, e.g. signal handling. It corresponds to [PATCH] x86: Exit RCU extended QS on notify resume commit edf55fda35c7dc7f2d9241c3abaddaf759b457c6 Signed-off-by: Li Zhong --- arch/powerpc/kernel/signal.c |5 + 1 file changed, 5 insertions(+) diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c index cf12eae..d63b502 100644 --- a/arch/powerpc/kernel/signal.c +++ b/arch/powerpc/kernel/signal.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -159,6 +160,8 @@ static int do_signal(struct pt_regs *regs) void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags) { + user_exit(); + if (thread_info_flags & _TIF_UPROBE) uprobe_notify_resume(regs); @@ -169,4 +172,6 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags) clear_thread_flag(TIF_NOTIFY_RESUME); tracehook_notify_resume(regs); } + + user_enter(); } -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH v3 2/5] powerpc: Exception hooks for context tracking subsystem
This is the exception hooks for context tracking subsystem, including data access, program check, single step, instruction breakpoint, machine check, alignment, fp unavailable, altivec assist, unknown exception, whose handlers might use RCU. This patch corresponds to [PATCH] x86: Exception hooks for userspace RCU extended QS commit 6ba3c97a38803883c2eee489505796cb0a727122 But after the exception handling moved to generic code, and some changes in following two commits: 56dd9470d7c8734f055da2a6bac553caf4a468eb context_tracking: Move exception handling to generic code 6c1e0256fad84a843d915414e4b5973b7443d48d context_tracking: Restore correct previous context state on exception exit it is able for exception hooks to use the generic code above instead of a redundant arch implementation. Signed-off-by: Li Zhong --- arch/powerpc/kernel/traps.c | 91 +-- arch/powerpc/mm/fault.c | 16 ++- arch/powerpc/mm/hash_utils_64.c | 38 3 files changed, 112 insertions(+), 33 deletions(-) diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index 83efa2f..9d3c000 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -35,6 +35,7 @@ #include #include #include +#include #include #include @@ -668,6 +669,9 @@ int machine_check_generic(struct pt_regs *regs) void machine_check_exception(struct pt_regs *regs) { int recover = 0; + enum ctx_state prev_state; + + prev_state = exception_enter(); __get_cpu_var(irq_stat).mce_exceptions++; @@ -683,7 +687,7 @@ void machine_check_exception(struct pt_regs *regs) recover = cur_cpu_spec->machine_check(regs); if (recover > 0) - return; + goto exit; #if defined(CONFIG_8xx) && defined(CONFIG_PCI) /* the qspan pci read routines can cause machine checks -- Cort @@ -693,20 +697,23 @@ void machine_check_exception(struct pt_regs *regs) * -- BenH */ bad_page_fault(regs, regs->dar, SIGBUS); - return; + goto exit; #endif if (debugger_fault_handler(regs)) - return; + goto exit; if (check_io_access(regs)) - return; + goto exit; 
die("Machine check", regs, SIGBUS); /* Must die if the interrupt is not recoverable */ if (!(regs->msr & MSR_RI)) panic("Unrecoverable Machine check"); + +exit: + exception_exit(prev_state); } void SMIException(struct pt_regs *regs) @@ -716,20 +723,31 @@ void SMIException(struct pt_regs *regs) void unknown_exception(struct pt_regs *regs) { + enum ctx_state prev_state; + prev_state = exception_enter(); + printk("Bad trap at PC: %lx, SR: %lx, vector=%lx\n", regs->nip, regs->msr, regs->trap); _exception(SIGTRAP, regs, 0, 0); + + exception_exit(prev_state); } void instruction_breakpoint_exception(struct pt_regs *regs) { + enum ctx_state prev_state; + prev_state = exception_enter(); + if (notify_die(DIE_IABR_MATCH, "iabr_match", regs, 5, 5, SIGTRAP) == NOTIFY_STOP) - return; + goto exit; if (debugger_iabr_match(regs)) - return; + goto exit; _exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip); + +exit: + exception_exit(prev_state); } void RunModeException(struct pt_regs *regs) @@ -739,15 +757,21 @@ void RunModeException(struct pt_regs *regs) void __kprobes single_step_exception(struct pt_regs *regs) { + enum ctx_state prev_state; + prev_state = exception_enter(); + clear_single_step(regs); if (notify_die(DIE_SSTEP, "single_step", regs, 5, 5, SIGTRAP) == NOTIFY_STOP) - return; + goto exit; if (debugger_sstep(regs)) - return; + goto exit; _exception(SIGTRAP, regs, TRAP_TRACE, regs->nip); + +exit: + exception_exit(prev_state); } /* @@ -1006,34 +1030,37 @@ int is_valid_bugaddr(unsigned long addr) void __kprobes program_check_exception(struct pt_regs *regs) { unsigned int reason = get_reason(regs); + enum ctx_state prev_state; extern int do_mathemu(struct pt_regs *regs); + prev_state = exception_enter(); + /* We can now get here via a FP Unavailable exception if the core * has no FPU, in that case the reason flags will be 0 */ if (reason & REASON_FP) { /* IEEE FP exception */ parse_fpe(regs); - return; + goto exit; } if (reason & REASON_TRAP) { /* Debugger is first in line to 
stop recursive fault
[RFC PATCH v3 1/5] powerpc: Syscall hooks for context tracking subsystem
This is the syscall slow path hooks for context tracking subsystem, corresponding to [PATCH] x86: Syscall hooks for userspace RCU extended QS commit bf5a3c13b939813d28ce26c01425054c740d6731 TIF_MEMDIE is moved to the second 16-bits (with value 17), as it seems there is no asm code using it. TIF_NOHZ is added to _TIF_SYCALL_T_OR_A, so it is better for it to be in the same 16 bits with others in the group, so in the asm code, andi. with this group could work. Signed-off-by: Li Zhong Acked-by: Frederic Weisbecker --- arch/powerpc/include/asm/thread_info.h |7 +-- arch/powerpc/kernel/ptrace.c |5 + 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h index 8ceea14..ba7b197 100644 --- a/arch/powerpc/include/asm/thread_info.h +++ b/arch/powerpc/include/asm/thread_info.h @@ -97,7 +97,7 @@ static inline struct thread_info *current_thread_info(void) #define TIF_PERFMON_CTXSW 6 /* perfmon needs ctxsw calls */ #define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */ #define TIF_SINGLESTEP 8 /* singlestepping active */ -#define TIF_MEMDIE 9 /* is terminating due to OOM killer */ +#define TIF_NOHZ 9 /* in adaptive nohz mode */ #define TIF_SECCOMP10 /* secure computing */ #define TIF_RESTOREALL 11 /* Restore all regs (implies NOERROR) */ #define TIF_NOERROR12 /* Force successful syscall return */ @@ -106,6 +106,7 @@ static inline struct thread_info *current_thread_info(void) #define TIF_SYSCALL_TRACEPOINT 15 /* syscall tracepoint instrumentation */ #define TIF_EMULATE_STACK_STORE16 /* Is an instruction emulation for stack store? 
*/ +#define TIF_MEMDIE 17 /* is terminating due to OOM killer */ /* as above, but as bit values */ #define _TIF_SYSCALL_TRACE (1< #include #include +#include #include #include @@ -1788,6 +1789,8 @@ long do_syscall_trace_enter(struct pt_regs *regs) { long ret = 0; + user_exit(); + secure_computing_strict(regs->gpr[0]); if (test_thread_flag(TIF_SYSCALL_TRACE) && @@ -1832,4 +1835,6 @@ void do_syscall_trace_leave(struct pt_regs *regs) step = test_thread_flag(TIF_SINGLESTEP); if (step || test_thread_flag(TIF_SYSCALL_TRACE)) tracehook_report_syscall_exit(regs, step); + + user_enter(); } -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH v3 0/5] powerpc: Support context tracking for Power pSeries
These patches try to support context tracking for Power arch, beginning with 64-bit pSeries. The codes are ported from that of the x86_64, and in each patch, I listed the corresponding patch for x86. v3: This version is mainly a rebasing, against 3.10-rc1, also as the common code to handle the exception are pulled into 3.10, so there is no dependency on tip tree. So patch #2 and #6 in previous version_2 is merged together. Li Zhong (5): powerpc: Syscall hooks for context tracking subsystem powerpc: Exception hooks for context tracking subsystem powerpc: Exit user context on notify resume powerpc: Use the new schedule_user API on userspace preemption powerpc: select HAVE_CONTEXT_TRACKING for pSeries arch/powerpc/include/asm/context_tracking.h | 10 +++ arch/powerpc/include/asm/thread_info.h |7 ++- arch/powerpc/kernel/entry_64.S |3 +- arch/powerpc/kernel/ptrace.c|5 ++ arch/powerpc/kernel/signal.c|5 ++ arch/powerpc/kernel/traps.c | 91 --- arch/powerpc/mm/fault.c | 16 - arch/powerpc/mm/hash_utils_64.c | 38 --- arch/powerpc/platforms/pseries/Kconfig |1 + 9 files changed, 140 insertions(+), 36 deletions(-) create mode 100644 arch/powerpc/include/asm/context_tracking.h -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2] powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning again
Saw this warning again, and this time from the ret_from_fork path. It seems we could clear the back chain earlier in copy_thread(), which could cover both paths, and also fix potential lockdep usage in schedule_tail(), or when an exception occurs before we clear the back chain. Signed-off-by: Li Zhong --- arch/powerpc/kernel/entry_32.S |2 -- arch/powerpc/kernel/entry_64.S |2 -- arch/powerpc/kernel/process.c |1 + 3 files changed, 1 insertion(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index e514de5..d22e73e 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -439,8 +439,6 @@ ret_from_fork: ret_from_kernel_thread: REST_NVGPRS(r1) bl schedule_tail - li r3,0 - stw r3,0(r1) mtlr r14 mr r3,r15 PPC440EP_ERR42 diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 3fe5259..48e8a86 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -376,8 +376,6 @@ _GLOBAL(ret_from_fork) _GLOBAL(ret_from_kernel_thread) bl .schedule_tail REST_NVGPRS(r1) - li r3,0 - std r3,0(r1) ld r14, 0(r14) mtlr r14 mr r3,r15 diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index ceb4e7b..80af366 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -971,6 +971,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, * do some house keeping and then return from the fork or clone * system call, using the stack frame created above. */ + ((unsigned long *)sp)[0] = 0; sp -= sizeof(struct pt_regs); kregs = (struct pt_regs *) sp; sp -= STACK_FRAME_OVERHEAD;
Re: [PATCH] powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning again
On Tue, 2013-05-07 at 10:32 +0800, Li Zhong wrote: > Saw this warning again, and this time from the ret_from_fork path. > > It seems we could clear the back chain earlier in copy_thread(), which > could cover both path, and also fix potential lockdep usage in > schedule_tail(), or exception occurred before we clear the back chain. Sorry, I made some mistake, please ignore this patch... It seems clearing the back chain shouldn't use kregs->gpr[1] below, it should be ((unsigned long*)sp)[0]. I'll send an updated version. Thanks, Zhong > > Signed-off-by: Li Zhong > --- > arch/powerpc/kernel/entry_32.S |2 -- > arch/powerpc/kernel/entry_64.S |2 -- > arch/powerpc/kernel/process.c |1 + > 3 files changed, 1 insertions(+), 4 deletions(-) > > diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S > index e514de5..d22e73e 100644 > --- a/arch/powerpc/kernel/entry_32.S > +++ b/arch/powerpc/kernel/entry_32.S > @@ -439,8 +439,6 @@ ret_from_fork: > ret_from_kernel_thread: > REST_NVGPRS(r1) > bl schedule_tail > - li r3,0 > - stw r3,0(r1) > mtlrr14 > mr r3,r15 > PPC440EP_ERR42 > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S > index 3fe5259..48e8a86 100644 > --- a/arch/powerpc/kernel/entry_64.S > +++ b/arch/powerpc/kernel/entry_64.S > @@ -376,8 +376,6 @@ _GLOBAL(ret_from_fork) > _GLOBAL(ret_from_kernel_thread) > bl .schedule_tail > REST_NVGPRS(r1) > - li r3,0 > - std r3,0(r1) > ld r14, 0(r14) > mtlrr14 > mr r3,r15 > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c > index ceb4e7b..2c9fc5e 100644 > --- a/arch/powerpc/kernel/process.c > +++ b/arch/powerpc/kernel/process.c > @@ -973,6 +973,7 @@ int copy_thread(unsigned long clone_flags, unsigned long > usp, >*/ > sp -= sizeof(struct pt_regs); > kregs = (struct pt_regs *) sp; > + kregs->gpr[1] = 0; > sp -= STACK_FRAME_OVERHEAD; > p->thread.ksp = sp; > p->thread.ksp_limit = (unsigned long)task_stack_page(p) + ___ Linuxppc-dev mailing list 
[PATCH] powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning again
Saw this warning again, and this time from the ret_from_fork path. It seems we could clear the back chain earlier in copy_thread(), which could cover both paths, and also fix potential lockdep usage in schedule_tail(), or when an exception occurs before we clear the back chain. Signed-off-by: Li Zhong --- arch/powerpc/kernel/entry_32.S |2 -- arch/powerpc/kernel/entry_64.S |2 -- arch/powerpc/kernel/process.c |1 + 3 files changed, 1 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index e514de5..d22e73e 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -439,8 +439,6 @@ ret_from_fork: ret_from_kernel_thread: REST_NVGPRS(r1) bl schedule_tail - li r3,0 - stw r3,0(r1) mtlr r14 mr r3,r15 PPC440EP_ERR42 diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 3fe5259..48e8a86 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -376,8 +376,6 @@ _GLOBAL(ret_from_fork) _GLOBAL(ret_from_kernel_thread) bl .schedule_tail REST_NVGPRS(r1) - li r3,0 - std r3,0(r1) ld r14, 0(r14) mtlr r14 mr r3,r15 diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index ceb4e7b..2c9fc5e 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -973,6 +973,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, */ sp -= sizeof(struct pt_regs); kregs = (struct pt_regs *) sp; + kregs->gpr[1] = 0; sp -= STACK_FRAME_OVERHEAD; p->thread.ksp = sp; p->thread.ksp_limit = (unsigned long)task_stack_page(p) + -- 1.7.1
Re: [RFC PATCH powerpc] make CONFIG_NUMA depends on CONFIG_SMP
On Thu, 2013-04-18 at 11:46 +1000, Michael Ellerman wrote: > On Wed, May 30, 2012 at 05:31:58PM +0800, Li Zhong wrote: > > I'm not sure whether it makes sense to add this dependency to avoid > > CONFIG_NUMA && !CONFIG_SMP. > > > > I want to do this because I saw some build errors on next-tree when > > compiling with CONFIG_SMP disabled, and it seems they are caused by some > > code under the CONFIG_NUMA #ifdefs. > > This seems to make sense to me. Can you please repost with a better > changelog and a description of the actual build error you were seeing. I tried it today, but didn't find any build errors any more; I guess those errors have already been fixed. But it seems to me that disabling CONFIG_NUMA when CONFIG_SMP is disabled could at least prevent some unnecessary code from being compiled into the kernel. (After building a kernel with/without CONFIG_NUMA just now, the vmlinux is ~100K smaller without CONFIG_NUMA.) I'm not sure whether this is still needed. Thanks, Zhong > > cheers >
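For reference, the dependency under discussion amounts to a one-line Kconfig change. A hedged sketch of what it might look like — the option name and the existing `depends on PPC64` are taken from the discussion context, and the exact placement in arch/powerpc/Kconfig is an assumption:

```
config NUMA
	bool "NUMA support"
	# adding SMP here is the proposed change; PPC64 is assumed pre-existing
	depends on PPC64 && SMP
```

With this in place, Kconfig itself rules out the CONFIG_NUMA && !CONFIG_SMP combination instead of relying on #ifdefs compiling cleanly.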
[RFC PATCH v3 3/3] powerpc: Try to insert the hptes repeatedly in kernel_map_linear_page()
This patch fixes the following oops, which could be triggered by building the kernel with many concurrent threads, under CONFIG_DEBUG_PAGEALLOC. hpte_insert() might return -1, indicating that the bucket (primary here) is full. We don't necessarily need to report a BUG in this case. Instead, we could try repeatedly (try secondary, remove and try again) until we find a slot. [ 543.075675] [ cut here ] [ 543.075701] kernel BUG at arch/powerpc/mm/hash_utils_64.c:1239! [ 543.075714] Oops: Exception in kernel mode, sig: 5 [#1] [ 543.075722] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries [ 543.075741] Modules linked in: binfmt_misc ehea [ 543.075759] NIP: c0036eb0 LR: c0036ea4 CTR: c005a594 [ 543.075771] REGS: c000a90832c0 TRAP: 0700 Not tainted (3.8.0-next-20130222) [ 543.075781] MSR: 80029032 CR: 4482 XER: [ 543.075816] SOFTE: 0 [ 543.075823] CFAR: c004c200 [ 543.075830] TASK = c000e506b750[23934] 'cc1' THREAD: c000a908 CPU: 1 GPR00: 0001 c000a9083540 c0c600a8 GPR04: 0050 fffa c000a90834e0 004ff594 GPR08: 0001 9592d4d8 c0c86854 GPR12: 0002 c6ead300 00a51000 0001 GPR16: f3354380 ff80 GPR20: 0001 c0c600a8 0001 0001 GPR24: 03354380 c000 c0b65950 GPR28: 0020 000cd50e 00bf50d9 c0c7c230 [ 543.076005] NIP [c0036eb0] .kernel_map_pages+0x1e0/0x3f8 [ 543.076016] LR [c0036ea4] .kernel_map_pages+0x1d4/0x3f8 [ 543.076025] Call Trace: [ 543.076033] [c000a9083540] [c0036ea4] .kernel_map_pages+0x1d4/0x3f8 (unreliable) [ 543.076053] [c000a9083640] [c0167638] .get_page_from_freelist+0x6cc/0x8dc [ 543.076067] [c000a9083800] [c0167a48] .__alloc_pages_nodemask+0x200/0x96c [ 543.076082] [c000a90839c0] [c01ade44] .alloc_pages_vma+0x160/0x1e4 [ 543.076098] [c000a9083a80] [c018ce04] .handle_pte_fault+0x1b0/0x7e8 [ 543.076113] [c000a9083b50] [c018d5a8] .handle_mm_fault+0x16c/0x1a0 [ 543.076129] [c000a9083c00] [c07bf1dc] .do_page_fault+0x4d0/0x7a4 [ 543.076144] [c000a9083e30] [c00090e8] handle_page_fault+0x10/0x30 [ 543.076155] Instruction dump: [ 543.076163] 7c630038 78631d88 e80a f8410028 7c0903a6
e91f01de e96a0010 e84a0008 [ 543.076192] 4e800421 e8410028 7c7107b4 7a200fe0 <0b00> 7f63db78 48785781 6000 [ 543.076224] ---[ end trace bd5807e8d6ae186b ]--- Signed-off-by: Li Zhong --- arch/powerpc/mm/hash_utils_64.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c index ead9fa8..1ed4419 100644 --- a/arch/powerpc/mm/hash_utils_64.c +++ b/arch/powerpc/mm/hash_utils_64.c @@ -1268,21 +1268,22 @@ repeat: #ifdef CONFIG_DEBUG_PAGEALLOC static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi) { - unsigned long hash, hpteg; + unsigned long hash; unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize); unsigned long vpn = hpt_vpn(vaddr, vsid, mmu_kernel_ssize); unsigned long mode = htab_convert_pte_flags(PAGE_KERNEL); - int ret; + long ret; hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize); - hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP); /* Don't create HPTE entries for bad address */ if (!vsid) return; - ret = ppc_md.hpte_insert(hpteg, vpn, __pa(vaddr), -mode, HPTE_V_BOLTED, -mmu_linear_psize, mmu_kernel_ssize); + + ret = hpte_insert_repeating(hash, vpn, __pa(vaddr), mode, + HPTE_V_BOLTED, + mmu_linear_psize, mmu_kernel_ssize); + BUG_ON (ret < 0); spin_lock(&linear_map_hash_lock); BUG_ON(linear_map_hash_slots[lmi] & 0x80); -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH v3 2/3] powerpc: Split the code trying to insert hpte repeatedly as a helper function
Move the logic trying to insert hpte in __hash_page_huge() to an helper function, so it could also be used by others. Signed-off-by: Li Zhong --- arch/powerpc/mm/hash_utils_64.c | 35 ++ arch/powerpc/mm/hugetlbpage-hash64.c | 31 ++ 2 files changed, 41 insertions(+), 25 deletions(-) diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c index f410c3e..ead9fa8 100644 --- a/arch/powerpc/mm/hash_utils_64.c +++ b/arch/powerpc/mm/hash_utils_64.c @@ -1230,6 +1230,41 @@ void low_hash_fault(struct pt_regs *regs, unsigned long address, int rc) bad_page_fault(regs, address, SIGBUS); } +long hpte_insert_repeating(unsigned long hash, unsigned long vpn, + unsigned long pa, unsigned long rflags, + unsigned long vflags, int psize, int ssize) +{ + unsigned long hpte_group; + long slot; + +repeat: + hpte_group = ((hash & htab_hash_mask) * + HPTES_PER_GROUP) & ~0x7UL; + + /* Insert into the hash table, primary slot */ + slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, vflags, + psize, ssize); + + /* Primary is full, try the secondary */ + if (unlikely(slot == -1)) { + hpte_group = ((~hash & htab_hash_mask) * + HPTES_PER_GROUP) & ~0x7UL; + slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, + vflags | HPTE_V_SECONDARY, + psize, ssize); + if (slot == -1) { + if (mftb() & 0x1) + hpte_group = ((hash & htab_hash_mask) * + HPTES_PER_GROUP)&~0x7UL; + + ppc_md.hpte_remove(hpte_group); + goto repeat; + } + } + + return slot; +} + #ifdef CONFIG_DEBUG_PAGEALLOC static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi) { diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c b/arch/powerpc/mm/hugetlbpage-hash64.c index edb4129..b913f41 100644 --- a/arch/powerpc/mm/hugetlbpage-hash64.c +++ b/arch/powerpc/mm/hugetlbpage-hash64.c @@ -14,6 +14,10 @@ #include #include +extern long hpte_insert_repeating(unsigned long hash, unsigned long vpn, + unsigned long pa, unsigned long rlags, + unsigned long vflags, int psize, int ssize); + int 
__hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid, pte_t *ptep, unsigned long trap, int local, int ssize, unsigned int shift, unsigned int mmu_psize) @@ -83,7 +87,6 @@ int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid, if (likely(!(old_pte & _PAGE_HASHPTE))) { unsigned long hash = hpt_hash(vpn, shift, ssize); - unsigned long hpte_group; pa = pte_pfn(__pte(old_pte)) << PAGE_SHIFT; @@ -97,30 +100,8 @@ int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid, rflags |= (new_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE | _PAGE_COHERENT | _PAGE_GUARDED)); -repeat: - hpte_group = ((hash & htab_hash_mask) * - HPTES_PER_GROUP) & ~0x7UL; - - /* Insert into the hash table, primary slot */ - slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0, - mmu_psize, ssize); - - /* Primary is full, try the secondary */ - if (unlikely(slot == -1)) { - hpte_group = ((~hash & htab_hash_mask) * - HPTES_PER_GROUP) & ~0x7UL; - slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, - HPTE_V_SECONDARY, - mmu_psize, ssize); - if (slot == -1) { - if (mftb() & 0x1) - hpte_group = ((hash & htab_hash_mask) * - HPTES_PER_GROUP)&~0x7UL; - - ppc_md.hpte_remove(hpte_group); - goto repeat; -} - } + slot = hpte_insert_repeating(hash, vpn, pa, rflags, 0, +mmu_psize, ssize); /* * Hypervisor failure. Restore old pte and return -1 -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH v3 1/3] powerpc: Move the setting of rflags out of loop in __hash_page_huge
It seems that new_pte and rflags don't get changed in the repeating loop, so move their assignment out of the loop. Signed-off-by: Li Zhong --- arch/powerpc/mm/hugetlbpage-hash64.c |8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c b/arch/powerpc/mm/hugetlbpage-hash64.c index cecad34..edb4129 100644 --- a/arch/powerpc/mm/hugetlbpage-hash64.c +++ b/arch/powerpc/mm/hugetlbpage-hash64.c @@ -87,10 +87,6 @@ int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid, pa = pte_pfn(__pte(old_pte)) << PAGE_SHIFT; -repeat: - hpte_group = ((hash & htab_hash_mask) * - HPTES_PER_GROUP) & ~0x7UL; - /* clear HPTE slot informations in new PTE */ #ifdef CONFIG_PPC_64K_PAGES new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | _PAGE_HPTE_SUB0; @@ -101,6 +97,10 @@ repeat: rflags |= (new_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE | _PAGE_COHERENT | _PAGE_GUARDED)); +repeat: + hpte_group = ((hash & htab_hash_mask) * + HPTES_PER_GROUP) & ~0x7UL; + /* Insert into the hash table, primary slot */ slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0, mmu_psize, ssize); -- 1.7.9.5
Re: [RFC PATCH v2 3/4] powerpc: Don't bolt the hpte in kernel_map_linear_page()
On Mon, 2013-04-15 at 13:27 +0200, Benjamin Herrenschmidt wrote: > On Mon, 2013-04-15 at 16:15 +0800, Li Zhong wrote: > > > So the code is implemented in ppc_md.hpte_remove(), may be called by > > __hash_huge_page(), and asm code htab_call_hpte_remove? > > > > > This is why the linear mapping (and the vmemmap) must be bolted. > > > > If not, it would result the infinite recursion like above? > > Potentially, we don't expect to fault linear mapping or vmemmap entries > on demand. We aren't equipped to do it and we occasionally have code > path that access the linear mapping and cannot afford to have SRR0 and > SRR1 clobbered by a page fault. Thank you for the education :) Thanks, Zhong > > Cheers, > Ben. > >
Re: [RFC PATCH v2 4/4] powerpc: Try to insert the hptes repeatedly in kernel_map_linear_page()
On Mon, 2013-04-15 at 16:36 +1000, Michael Ellerman wrote: > On Fri, Apr 12, 2013 at 10:17:00AM +0800, Li Zhong wrote: > > This patch tries to fix following issue when CONFIG_DEBUG_PAGEALLOC > > is enabled: > > Please include a better changelog. OK, I'll use the following as a template, thank you for the suggestion. > > This patch does fix (I hope?) the following oops, caused by xxx. Reproducible > by doing yyy. > > cheers > > > > > [ 543.075675] [ cut here ] > > [ 543.075701] kernel BUG at arch/powerpc/mm/hash_utils_64.c:1239! > > [ 543.075714] Oops: Exception in kernel mode, sig: 5 [#1] > > [ 543.075722] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries > > [ 543.075741] Modules linked in: binfmt_misc ehea > > [ 543.075759] NIP: c0036eb0 LR: c0036ea4 CTR: > > c005a594 > > [ 543.075771] REGS: c000a90832c0 TRAP: 0700 Not tainted > > (3.8.0-next-20130222) > > [ 543.075781] MSR: 80029032 CR: 4482 > > XER: > > [ 543.075816] SOFTE: 0 > > [ 543.075823] CFAR: c004c200 > > [ 543.075830] TASK = c000e506b750[23934] 'cc1' THREAD: > > c000a908 CPU: 1 > > GPR00: 0001 c000a9083540 c0c600a8 > > GPR04: 0050 fffa c000a90834e0 004ff594 > > GPR08: 0001 9592d4d8 c0c86854 > > GPR12: 0002 c6ead300 00a51000 0001 > > GPR16: f3354380 ff80 > > GPR20: 0001 c0c600a8 0001 0001 > > GPR24: 03354380 c000 c0b65950 > > GPR28: 0020 000cd50e 00bf50d9 c0c7c230 > > [ 543.076005] NIP [c0036eb0] .kernel_map_pages+0x1e0/0x3f8 > > [ 543.076016] LR [c0036ea4] .kernel_map_pages+0x1d4/0x3f8 > > [ 543.076025] Call Trace: > > [ 543.076033] [c000a9083540] [c0036ea4] > > .kernel_map_pages+0x1d4/0x3f8 (unreliable) > > [ 543.076053] [c000a9083640] [c0167638] > > .get_page_from_freelist+0x6cc/0x8dc > > [ 543.076067] [c000a9083800] [c0167a48] > > .__alloc_pages_nodemask+0x200/0x96c > > [ 543.076082] [c000a90839c0] [c01ade44] > > .alloc_pages_vma+0x160/0x1e4 > > [ 543.076098] [c000a9083a80] [c018ce04] > > .handle_pte_fault+0x1b0/0x7e8 > > [ 543.076113] [c000a9083b50] [c018d5a8] > > .handle_mm_fault+0x16c/0x1a0 > > 
[ 543.076129] [c000a9083c00] [c07bf1dc] > > .do_page_fault+0x4d0/0x7a4 > > [ 543.076144] [c000a9083e30] [c00090e8] > > handle_page_fault+0x10/0x30 > > [ 543.076155] Instruction dump: > > [ 543.076163] 7c630038 78631d88 e80a f8410028 7c0903a6 e91f01de > > e96a0010 e84a0008 > > [ 543.076192] 4e800421 e8410028 7c7107b4 7a200fe0 <0b00> 7f63db78 > > 48785781 6000 > > [ 543.076224] ---[ end trace bd5807e8d6ae186b ]--- > > > > Signed-off-by: Li Zhong > > --- > > arch/powerpc/mm/hash_utils_64.c | 10 ++ > > 1 file changed, 6 insertions(+), 4 deletions(-) > > > > diff --git a/arch/powerpc/mm/hash_utils_64.c > > b/arch/powerpc/mm/hash_utils_64.c > > index a7f54f0..4b449a0 100644 > > --- a/arch/powerpc/mm/hash_utils_64.c > > +++ b/arch/powerpc/mm/hash_utils_64.c > > @@ -1272,7 +1272,7 @@ static void kernel_map_linear_page(unsigned long > > vaddr, unsigned long lmi) > > unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize); > > unsigned long vpn = hpt_vpn(vaddr, vsid, mmu_kernel_ssize); > > unsigned long mode = htab_convert_pte_flags(PAGE_KERNEL); > > - int ret; > > + long ret; > > > > hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize); > > hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP); > > @@ -1280,9 +1280,11 @@ static void kernel_map_linear_page(unsigned long > > vaddr, unsigned long lmi) > > /* Don't create HPTE entries for bad address */ > > if (!vsid) > > return; > > - ret = ppc_md.hpte_insert(hpteg, vpn, __pa(vaddr), > > -mode, 0, > > -mmu_linear_psize, mmu_kernel_ssize); > > + > > + ret = hpte_insert_repeating(hash, vpn, __pa(vaddr), > > + mode, mmu_linear_psize, > > + mmu_kernel_ssize); > > + > > BUG_ON (ret < 0); > > spin_lock(&linear_map_hash_lock); > > BUG_ON(linear_map_hash_slots[lmi] & 0x80); > > -- > > 1.7.9.5 > > > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH v2 1/4] powerpc: Move the setting of rflags out of loop in __hash_page_huge
On Mon, 2013-04-15 at 16:32 +1000, Michael Ellerman wrote: > On Fri, Apr 12, 2013 at 10:16:57AM +0800, Li Zhong wrote: > > It seems that rflags don't get changed in the repeating loop, so move > > it out of the loop. > You've also changed the way new_pte is handled on repeat, but I think > that's OK too. OK, I'll add it in the description :) Thanks, Zhong > cheers > > > diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c > > b/arch/powerpc/mm/hugetlbpage-hash64.c > > index cecad34..edb4129 100644 > > --- a/arch/powerpc/mm/hugetlbpage-hash64.c > > +++ b/arch/powerpc/mm/hugetlbpage-hash64.c > > @@ -87,10 +87,6 @@ int __hash_page_huge(unsigned long ea, unsigned long > > access, unsigned long vsid, > > > > pa = pte_pfn(__pte(old_pte)) << PAGE_SHIFT; > > > > -repeat: > > - hpte_group = ((hash & htab_hash_mask) * > > - HPTES_PER_GROUP) & ~0x7UL; > > - > > /* clear HPTE slot informations in new PTE */ > > #ifdef CONFIG_PPC_64K_PAGES > > new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | _PAGE_HPTE_SUB0; > > ie. here new_pte was updated on repeat, but now it's not. > > > @@ -101,6 +97,10 @@ repeat: > > rflags |= (new_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE | > > _PAGE_COHERENT | _PAGE_GUARDED)); > > > > +repeat: > > + hpte_group = ((hash & htab_hash_mask) * > > + HPTES_PER_GROUP) & ~0x7UL; > > + > > /* Insert into the hash table, primary slot */ > > slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0, > > mmu_psize, ssize); > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH v2 3/4] powerpc: Don't bolt the hpte in kernel_map_linear_page()
On Mon, 2013-04-15 at 08:56 +0200, Benjamin Herrenschmidt wrote: > On Mon, 2013-04-15 at 13:50 +1000, Paul Mackerras wrote: > > On Fri, Apr 12, 2013 at 10:16:59AM +0800, Li Zhong wrote: > > > It seems that in kernel_unmap_linear_page(), it only checks whether there > > > is a map in the linear_map_hash_slots array, so seems we don't need bolt > > > the hpte. > > Hi Paul, Ben Thank you both for the comments and detailed information. I'll keep it bolted in the next version. If you have time, please help to check whether my understanding below is correct. Thanks, Zhong > > I don't exactly understand your rationale here, but I don't think it's > > safe not to have linear mapping pages bolted. Basically, if a page > > will be used in the process of calling hash_page to demand-fault an > > HPTE into the hash table, then that page needs to be bolted, otherwise > > we can get an infinite recursion of HPT misses. So the infinite recursion happens like below? fault for PAGE A hash_page for PAGE A some page B needed by hash_page processing removed by others, before inserting the HPTE fault for PAGE B hash_page for PAGE B and recursion for ever > That includes all > > kernel stack pages, among other things, so I think we need to leave > > the HPTE_V_BOLTED in there. > > I suspect Li's confusion comes from the fact that he doesn't realizes > that we might evict random hash slots. If the linear mapping hash > entries could only be thrown out via kernel_unmap_linear_page() then his > comment would make sense. However this isn't the case. > > Li: When faulting something in, if both the primary and secondary > buckets are full, we "somewhat randomly" evict the content of a slot and > replace it. However we only do that on non-bolted slots. So the code is implemented in ppc_md.hpte_remove(), may be called by __hash_huge_page(), and asm code htab_call_hpte_remove? > This is why the linear mapping (and the vmemmap) must be bolted. 
If not, it would result in the infinite recursion like above? > Cheers, > Ben. >
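The eviction rule Ben describes — when a bucket is full, a victim slot is evicted, but only among non-bolted entries — can be sketched as a small user-space model (hypothetical struct and field names, not the real HPTE layout):

```c
#include <assert.h>

/* Toy model of HPT slot eviction: when a hash bucket is full, a victim
 * is picked among its entries, but bolted entries are never chosen.
 * (Hypothetical simplification of the guarantee around hpte_remove().) */
#define GROUP_SIZE 8

struct hpte {
    int valid;
    int bolted; /* models HPTE_V_BOLTED */
};

/* Return the index of an evictable entry, or -1 if nothing may go. */
int pick_victim(const struct hpte group[GROUP_SIZE])
{
    for (int i = 0; i < GROUP_SIZE; i++)
        if (group[i].valid && !group[i].bolted)
            return i;
    return -1;
}
```

In this model a bolted entry can never be chosen as a victim, which is the property that protects pages needed while servicing a hash fault (kernel stacks, the linear mapping) from the recursion scenario discussed above.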
[RFC PATCH v2 4/4] powerpc: Try to insert the hptes repeatedly in kernel_map_linear_page()
This patch tries to fix following issue when CONFIG_DEBUG_PAGEALLOC is enabled: [ 543.075675] [ cut here ] [ 543.075701] kernel BUG at arch/powerpc/mm/hash_utils_64.c:1239! [ 543.075714] Oops: Exception in kernel mode, sig: 5 [#1] [ 543.075722] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries [ 543.075741] Modules linked in: binfmt_misc ehea [ 543.075759] NIP: c0036eb0 LR: c0036ea4 CTR: c005a594 [ 543.075771] REGS: c000a90832c0 TRAP: 0700 Not tainted (3.8.0-next-20130222) [ 543.075781] MSR: 80029032 CR: 4482 XER: [ 543.075816] SOFTE: 0 [ 543.075823] CFAR: c004c200 [ 543.075830] TASK = c000e506b750[23934] 'cc1' THREAD: c000a908 CPU: 1 GPR00: 0001 c000a9083540 c0c600a8 GPR04: 0050 fffa c000a90834e0 004ff594 GPR08: 0001 9592d4d8 c0c86854 GPR12: 0002 c6ead300 00a51000 0001 GPR16: f3354380 ff80 GPR20: 0001 c0c600a8 0001 0001 GPR24: 03354380 c000 c0b65950 GPR28: 0020 000cd50e 00bf50d9 c0c7c230 [ 543.076005] NIP [c0036eb0] .kernel_map_pages+0x1e0/0x3f8 [ 543.076016] LR [c0036ea4] .kernel_map_pages+0x1d4/0x3f8 [ 543.076025] Call Trace: [ 543.076033] [c000a9083540] [c0036ea4] .kernel_map_pages+0x1d4/0x3f8 (unreliable) [ 543.076053] [c000a9083640] [c0167638] .get_page_from_freelist+0x6cc/0x8dc [ 543.076067] [c000a9083800] [c0167a48] .__alloc_pages_nodemask+0x200/0x96c [ 543.076082] [c000a90839c0] [c01ade44] .alloc_pages_vma+0x160/0x1e4 [ 543.076098] [c000a9083a80] [c018ce04] .handle_pte_fault+0x1b0/0x7e8 [ 543.076113] [c000a9083b50] [c018d5a8] .handle_mm_fault+0x16c/0x1a0 [ 543.076129] [c000a9083c00] [c07bf1dc] .do_page_fault+0x4d0/0x7a4 [ 543.076144] [c000a9083e30] [c00090e8] handle_page_fault+0x10/0x30 [ 543.076155] Instruction dump: [ 543.076163] 7c630038 78631d88 e80a f8410028 7c0903a6 e91f01de e96a0010 e84a0008 [ 543.076192] 4e800421 e8410028 7c7107b4 7a200fe0 <0b00> 7f63db78 48785781 6000 [ 543.076224] ---[ end trace bd5807e8d6ae186b ]--- Signed-off-by: Li Zhong --- arch/powerpc/mm/hash_utils_64.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git 
a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c index a7f54f0..4b449a0 100644 --- a/arch/powerpc/mm/hash_utils_64.c +++ b/arch/powerpc/mm/hash_utils_64.c @@ -1272,7 +1272,7 @@ static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi) unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize); unsigned long vpn = hpt_vpn(vaddr, vsid, mmu_kernel_ssize); unsigned long mode = htab_convert_pte_flags(PAGE_KERNEL); - int ret; + long ret; hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize); hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP); @@ -1280,9 +1280,11 @@ static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi) /* Don't create HPTE entries for bad address */ if (!vsid) return; - ret = ppc_md.hpte_insert(hpteg, vpn, __pa(vaddr), -mode, 0, -mmu_linear_psize, mmu_kernel_ssize); + + ret = hpte_insert_repeating(hash, vpn, __pa(vaddr), + mode, mmu_linear_psize, + mmu_kernel_ssize); + BUG_ON (ret < 0); spin_lock(&linear_map_hash_lock); BUG_ON(linear_map_hash_slots[lmi] & 0x80); -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH v2 3/4] powerpc: Don't bolt the hpte in kernel_map_linear_page()
It seems that in kernel_unmap_linear_page(), it only checks whether there is a map in the linear_map_hash_slots array, so it seems we don't need to bolt the hpte. Signed-off-by: Li Zhong --- arch/powerpc/mm/hash_utils_64.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c index 716f42b..a7f54f0 100644 --- a/arch/powerpc/mm/hash_utils_64.c +++ b/arch/powerpc/mm/hash_utils_64.c @@ -1281,7 +1281,7 @@ static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi) if (!vsid) return; ret = ppc_md.hpte_insert(hpteg, vpn, __pa(vaddr), -mode, HPTE_V_BOLTED, +mode, 0, mmu_linear_psize, mmu_kernel_ssize); BUG_ON (ret < 0); spin_lock(&linear_map_hash_lock); -- 1.7.9.5
[RFC PATCH v2 1/4] powerpc: Move the setting of rflags out of loop in __hash_page_huge
It seems that rflags don't get changed in the repeating loop, so move it out of the loop. Signed-off-by: Li Zhong --- arch/powerpc/mm/hugetlbpage-hash64.c |8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c b/arch/powerpc/mm/hugetlbpage-hash64.c index cecad34..edb4129 100644 --- a/arch/powerpc/mm/hugetlbpage-hash64.c +++ b/arch/powerpc/mm/hugetlbpage-hash64.c @@ -87,10 +87,6 @@ int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid, pa = pte_pfn(__pte(old_pte)) << PAGE_SHIFT; -repeat: - hpte_group = ((hash & htab_hash_mask) * - HPTES_PER_GROUP) & ~0x7UL; - /* clear HPTE slot informations in new PTE */ #ifdef CONFIG_PPC_64K_PAGES new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | _PAGE_HPTE_SUB0; @@ -101,6 +97,10 @@ repeat: rflags |= (new_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE | _PAGE_COHERENT | _PAGE_GUARDED)); +repeat: + hpte_group = ((hash & htab_hash_mask) * + HPTES_PER_GROUP) & ~0x7UL; + /* Insert into the hash table, primary slot */ slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0, mmu_psize, ssize); -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
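The hoisting this patch performs — computing rflags once before the repeat loop, since it depends only on new_pte — can be illustrated with a tiny stand-alone C sketch (hypothetical mask and counter, purely illustrative, not the real _PAGE_* flags):

```c
#include <assert.h>

static int rflags_computations; /* how many times rflags was derived */

static unsigned long compute_rflags(unsigned long new_pte)
{
    rflags_computations++;
    return new_pte & 0xfUL; /* stand-in for the _PAGE_* flag mask */
}

/* Retry loop with the invariant hoisted: the insert "fails" until the
 * last attempt, yet rflags is derived exactly once. */
long insert_with_retries(unsigned long new_pte, int tries_until_success)
{
    unsigned long rflags = compute_rflags(new_pte); /* hoisted out */
    long slot = -1;

    for (int t = 0; t < tries_until_success && slot < 0; t++) {
        if (t == tries_until_success - 1)
            slot = (long)rflags; /* model a successful hpte_insert() */
    }
    return slot;
}
```

However many times the loop repeats, the invariant computation runs once, which is what makes the later split into a helper function straightforward.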
[RFC PATCH v2 2/4] powerpc: Split the code trying to insert hpte repeatedly as a helper function
Move the logic that tries to insert an hpte in __hash_page_huge() into a helper function, so it can also be used by others. Signed-off-by: Li Zhong --- arch/powerpc/mm/hash_utils_64.c | 35 ++ arch/powerpc/mm/hugetlbpage-hash64.c | 31 ++ 2 files changed, 41 insertions(+), 25 deletions(-) diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c index f410c3e..716f42b 100644 --- a/arch/powerpc/mm/hash_utils_64.c +++ b/arch/powerpc/mm/hash_utils_64.c @@ -1230,6 +1230,41 @@ void low_hash_fault(struct pt_regs *regs, unsigned long address, int rc) bad_page_fault(regs, address, SIGBUS); } +long hpte_insert_repeating(unsigned long hash, unsigned long vpn, + unsigned long pa, unsigned long rflags, + int psize, int ssize) +{ + unsigned long hpte_group; + long slot; + +repeat: + hpte_group = ((hash & htab_hash_mask) * + HPTES_PER_GROUP) & ~0x7UL; + + /* Insert into the hash table, primary slot */ + slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0, + psize, ssize); + + /* Primary is full, try the secondary */ + if (unlikely(slot == -1)) { + hpte_group = ((~hash & htab_hash_mask) * + HPTES_PER_GROUP) & ~0x7UL; + slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, + HPTE_V_SECONDARY, + psize, ssize); + if (slot == -1) { + if (mftb() & 0x1) + hpte_group = ((hash & htab_hash_mask) * + HPTES_PER_GROUP)&~0x7UL; + + ppc_md.hpte_remove(hpte_group); + goto repeat; + } + } + + return slot; +} + #ifdef CONFIG_DEBUG_PAGEALLOC static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi) { diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c b/arch/powerpc/mm/hugetlbpage-hash64.c index edb4129..bd7d38b 100644 --- a/arch/powerpc/mm/hugetlbpage-hash64.c +++ b/arch/powerpc/mm/hugetlbpage-hash64.c @@ -14,6 +14,10 @@ #include #include +extern long hpte_insert_repeating(unsigned long hash, unsigned long vpn, + unsigned long pa, unsigned long rflags, + int psize, int ssize); + int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
pte_t *ptep, unsigned long trap, int local, int ssize, unsigned int shift, unsigned int mmu_psize) @@ -83,7 +87,6 @@ int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid, if (likely(!(old_pte & _PAGE_HASHPTE))) { unsigned long hash = hpt_hash(vpn, shift, ssize); - unsigned long hpte_group; pa = pte_pfn(__pte(old_pte)) << PAGE_SHIFT; @@ -97,30 +100,8 @@ int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid, rflags |= (new_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE | _PAGE_COHERENT | _PAGE_GUARDED)); -repeat: - hpte_group = ((hash & htab_hash_mask) * - HPTES_PER_GROUP) & ~0x7UL; - - /* Insert into the hash table, primary slot */ - slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0, - mmu_psize, ssize); - - /* Primary is full, try the secondary */ - if (unlikely(slot == -1)) { - hpte_group = ((~hash & htab_hash_mask) * - HPTES_PER_GROUP) & ~0x7UL; - slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, - HPTE_V_SECONDARY, - mmu_psize, ssize); - if (slot == -1) { - if (mftb() & 0x1) - hpte_group = ((hash & htab_hash_mask) * - HPTES_PER_GROUP)&~0x7UL; - - ppc_md.hpte_remove(hpte_group); - goto repeat; -} - } + slot = hpte_insert_repeating(hash, vpn, pa, +rflags, mmu_psize, ssize); /* * Hypervisor failure. Restore old pte and return -1 -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
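The retry strategy implemented by hpte_insert_repeating() — try the primary group, then the secondary, then evict an entry from a pseudo-randomly chosen group and start over — can be modelled in plain user-space C (hypothetical table sizes and a counter in place of the mftb() timebase; the real code goes through ppc_md.hpte_insert()/hpte_remove()):

```c
#include <assert.h>

/* Toy model of hpte_insert_repeating(): 4 groups of 8 slots. */
#define HPTES_PER_GROUP 8
#define NGROUPS 4
#define HASH_MASK (NGROUPS - 1UL)

static int table[NGROUPS * HPTES_PER_GROUP]; /* 0 means empty */
static unsigned long tb; /* stand-in for the mftb() timebase */

/* Try every slot in one group; return the global slot index or -1. */
static long try_group(unsigned long group, int val)
{
    for (int i = 0; i < HPTES_PER_GROUP; i++) {
        long slot = (long)(group * HPTES_PER_GROUP + i);
        if (table[slot] == 0) {
            table[slot] = val;
            return slot;
        }
    }
    return -1;
}

long model_insert_repeating(unsigned long hash, int val)
{
    unsigned long group;
    long slot;

repeat:
    /* Primary group first. */
    group = hash & HASH_MASK;
    slot = try_group(group, val);
    if (slot == -1) {
        /* Primary is full, try the secondary group. */
        group = ~hash & HASH_MASK;
        slot = try_group(group, val);
        if (slot == -1) {
            /* Both full: evict one entry from a "randomly" picked
             * group (modelling ppc_md.hpte_remove()) and retry. */
            group = (tb++ & 0x1) ? (hash & HASH_MASK)
                                 : (~hash & HASH_MASK);
            table[group * HPTES_PER_GROUP] = 0;
            goto repeat;
        }
    }
    return slot;
}
```

Because the eviction always frees a slot in either the primary or the secondary group, the retry is guaranteed to make progress on the next pass.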
[RFC PATCH v2 0/4] try secondary hash before BUG in kernel_map_linear_page()
Hi Michael, Here is the updated version; could you please help to review it again? As you suggested, this version doesn't copy the code, but splits the logic into a helper function, so both kernel_map_linear_page() and __hash_page_huge() can use it. Also, patch #1 moves some unnecessary code out of the repeating loop, so the splitting is easier. Patch #3 removes the HPTE_V_BOLTED flag in kernel_map_linear_page(), as it seems not to be needed based on my understanding. The changes are split into smaller ones, so each one does only one thing. Thanks, Zhong Li Zhong (4): powerpc: Move the setting of rflags out of loop in __hash_page_huge powerpc: Split the code trying to insert hpte repeatedly as a helper function powerpc: Don't bolt the hpte in kernel_map_linear_page() powerpc: Try to insert the hptes repeatedly in kernel_map_linear_page() arch/powerpc/mm/hash_utils_64.c | 45 +++--- arch/powerpc/mm/hugetlbpage-hash64.c | 31 +-- 2 files changed, 47 insertions(+), 29 deletions(-) -- 1.7.9.5
Re: [RFC PATCH v2 6/6] powerpc: Use generic code for exception handling
On Wed, 2013-04-10 at 13:32 +0800, Li Zhong wrote: > On Wed, 2013-04-10 at 14:56 +1000, Michael Ellerman wrote: > > On Fri, Mar 29, 2013 at 06:00:21PM +0800, Li Zhong wrote: > > > After the exception handling moved to generic code, and some changes in > > ... > > > diff --git a/arch/powerpc/mm/hash_utils_64.c > > > b/arch/powerpc/mm/hash_utils_64.c > > > index 360fba8..eeab30f 100644 > > > --- a/arch/powerpc/mm/hash_utils_64.c > > > +++ b/arch/powerpc/mm/hash_utils_64.c > > > @@ -33,6 +33,7 @@ > > > #include > > > #include > > > #include > > > +#include > > > > > > #include > > > #include > > > @@ -56,7 +57,6 @@ > > > #include > > > #include > > > #include > > > -#include > > > > > > #ifdef DEBUG > > > #define DBG(fmt...) udbg_printf(fmt) > > > @@ -919,13 +919,17 @@ int hash_page(unsigned long ea, unsigned long > > > access, unsigned long trap) > > > const struct cpumask *tmp; > > > int rc, user_region = 0, local = 0; > > > int psize, ssize; > > > + enum ctx_state prev_state; > > > + > > > + prev_state = exception_enter(); > > > > > > DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n", > > > ea, access, trap); > > > > > > if ((ea & ~REGION_MASK) >= PGTABLE_RANGE) { > > > DBG_LOW(" out of pgtable range !\n"); > > > - return 1; > > > + rc = 1; > > > + goto exit; > > > } > > > > > > /* Get region & vsid */ > > > > This no longer applies on mainline, please send an updated version. > > Yes, for current mainline (powerpc tree), only previous five patches > could be applied. The dependency of this patch is current in tip tree, > and seems would be in for 3.10. > > There are some more details in the cover letter (#0): > > "I assume these patches would get in through powerpc tree, so I didn't > combine the new patch (#6) with the original one (#2). So that if > powerpc tree picks these, it could pick the first five patches, and > apply patch #6 later when the dependency enters into powerpc tree (maybe > on some 3.10-rcs)." 
And I will send an updated version of this one when I see the dependency commits in mainline. Thanks, Zhong > Thanks, Zhong > > cheers >
Re: [RFC PATCH v2 6/6] powerpc: Use generic code for exception handling
On Wed, 2013-04-10 at 14:56 +1000, Michael Ellerman wrote: > On Fri, Mar 29, 2013 at 06:00:21PM +0800, Li Zhong wrote: > > After the exception handling moved to generic code, and some changes in > ... > > diff --git a/arch/powerpc/mm/hash_utils_64.c > > b/arch/powerpc/mm/hash_utils_64.c > > index 360fba8..eeab30f 100644 > > --- a/arch/powerpc/mm/hash_utils_64.c > > +++ b/arch/powerpc/mm/hash_utils_64.c > > @@ -33,6 +33,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > #include > > @@ -56,7 +57,6 @@ > > #include > > #include > > #include > > -#include > > > > #ifdef DEBUG > > #define DBG(fmt...) udbg_printf(fmt) > > @@ -919,13 +919,17 @@ int hash_page(unsigned long ea, unsigned long access, > > unsigned long trap) > > const struct cpumask *tmp; > > int rc, user_region = 0, local = 0; > > int psize, ssize; > > + enum ctx_state prev_state; > > + > > + prev_state = exception_enter(); > > > > DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n", > > ea, access, trap); > > > > if ((ea & ~REGION_MASK) >= PGTABLE_RANGE) { > > DBG_LOW(" out of pgtable range !\n"); > > - return 1; > > + rc = 1; > > + goto exit; > > } > > > > /* Get region & vsid */ > > This no longer applies on mainline, please send an updated version. Yes, for current mainline (powerpc tree), only previous five patches could be applied. The dependency of this patch is current in tip tree, and seems would be in for 3.10. There are some more details in the cover letter (#0): "I assume these patches would get in through powerpc tree, so I didn't combine the new patch (#6) with the original one (#2). So that if powerpc tree picks these, it could pick the first five patches, and apply patch #6 later when the dependency enters into powerpc tree (maybe on some 3.10-rcs)." Thanks, Zhong > cheers > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH powerpc] try secondary hash before BUG in kernel_map_linear_page()
On Wed, 2013-04-10 at 12:21 +1000, Michael Ellerman wrote: > On Mon, Feb 25, 2013 at 05:29:35PM +0800, Li Zhong wrote: > > This patch tries to fix following issue when CONFIG_DEBUG_PAGEALLOC > > is enabled: > > > > [ 543.075675] [ cut here ] > > [ 543.075701] kernel BUG at arch/powerpc/mm/hash_utils_64.c:1239! > > [ 543.075714] Oops: Exception in kernel mode, sig: 5 [#1] Hi Michael, Thanks for the review. > So the issue is that kernel_map_linear_page() doesn't try the secondary > hash slot. It seems so. > > The code is borrowed from that in __hash_page_huge(). > > It is, and in fact there is another copy in hash_low_64.S - in assembler. > > So I think we should at least try and keep ourselves to two > implementations, one in asm and one in C. So can you split it out into a > helper routine called by both kernel_map_linear_page() and > __hash_page_huge() ? OK, I'll try to update it as you suggested. Thanks, Zhong > cheers > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH v2 2/6] powerpc: Exception hooks for context tracking subsystem
On Fri, 2013-04-05 at 13:50 +1100, Paul Mackerras wrote: > On Fri, Mar 29, 2013 at 06:00:17PM +0800, Li Zhong wrote: > > This is the exception hooks for context tracking subsystem, including > > data access, program check, single step, instruction breakpoint, machine > > check, > > alignment, fp unavailable, altivec assist, unknown exception, whose handlers > > might use RCU. > > > > This patch corresponds to > > [PATCH] x86: Exception hooks for userspace RCU extended QS > > commit 6ba3c97a38803883c2eee489505796cb0a727122 > > > > Signed-off-by: Li Zhong Hi Paul, Thanks for your review! Please check my answers below, and correct me if any errors. > Is there a reason why you didn't put the exception_exit() call in > ret_from_except_lite in entry_64.S, and the exception_entry() call in > EXCEPTION_PROLOG_COMMON? That would seem to catch all these cases in > a more centralized place. It seems to me that ret_from_except_lite and EXCEPTION_PROLOG_COMMON are also used by interrupts, where I think we don't need the hooks. So using this way could help to avoid adding overhead to these code path (interrupts, and some exit path of syscall). And I think adding the hook on higher level code seems a little easier for reading and checking. It seems that some exceptions don't use EXCEPTION_PROLOG_COMMON, and some don't go ret_from_except_lite exit path (like fp unavailable might go directly to fast_exception_return ). Maybe fast_exception_return is a centralized place for us to return to user space? But it still adds some overheads which is not necessarily needed. And I think it also makes the implementation here consistent with the style that x86 uses. > Also, I notice that with the exception_exit calls where they are, we > can still deliver signals (thus possibly taking a page fault) or call > schedule() for preemption after the exception_exit() call. Is that > OK, or is it a potential problem? 
If I understand correctly, I guess you are talking about the cases where we might return to user space without the context state correctly being set as in user? user_enter() is called in do_notify_resume() in patch #3, so after handling the signals we always call user_enter(). There are also some changes to the context_tracking code from Frederic, which might be related (they are now in the tip tree; the URL of the patches for your convenience: https://lkml.org/lkml/2013/3/1/266): 6c1e0256fad84a843d915414e4b5973b7443d48d context_tracking: Restore correct previous context state on exception exit. With this patch, if a later exception happens after user_enter(), before the CPU actually returns to user space, the correct context state (in user) is saved and restored when handling the later exception. Patch #6 converts the code to use these new APIs, which are currently not available in the powerpc tree. b22366cd54c6fe05db426f20adb10f461c19ec06 context_tracking: Restore preempted context state after preempt_schedule_irq With this patch, the user context state can be correctly restored after schedule returns. Thanks, Zhong > Paul. >
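The save/restore behaviour of those commits can be sketched as a user-space model (hypothetical names with a _model suffix; the real implementation keeps per-CPU state and has RCU side effects):

```c
#include <assert.h>

/* Toy model of the context_tracking state handling discussed above:
 * exception_enter() returns the previous state and switches to kernel
 * context; exception_exit() restores whatever was saved, so nested
 * exceptions (e.g. a page fault taken while delivering a signal) end
 * up back in the right state.  (Simplified: one global instead of
 * per-CPU state, no RCU bookkeeping.) */
enum ctx_state { CTX_KERNEL = 0, CTX_USER = 1 };

static enum ctx_state cur_state = CTX_KERNEL;

enum ctx_state exception_enter_model(void)
{
    enum ctx_state prev = cur_state;
    cur_state = CTX_KERNEL; /* user_exit(): we are no longer in userspace */
    return prev;
}

void exception_exit_model(enum ctx_state prev)
{
    cur_state = prev; /* restore; re-enters user only if we came from user */
}
```

A nested exception restores CTX_KERNEL (the outer exception's state) rather than unconditionally switching to user, which is exactly the correctness issue the "Restore correct previous context state on exception exit" commit addresses.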
[RFC PATCH v2 5/6] powerpc: select HAVE_CONTEXT_TRACKING for pSeries
Start context tracking support from pSeries. Signed-off-by: Li Zhong --- arch/powerpc/platforms/pseries/Kconfig |1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig index 9a0941b..023b288 100644 --- a/arch/powerpc/platforms/pseries/Kconfig +++ b/arch/powerpc/platforms/pseries/Kconfig @@ -18,6 +18,7 @@ config PPC_PSERIES select PPC_PCI_CHOICE if EXPERT select ZLIB_DEFLATE select PPC_DOORBELL + select HAVE_CONTEXT_TRACKING default y config PPC_SPLPAR -- 1.7.9.5
[RFC PATCH v2 6/6] powerpc: Use generic code for exception handling
After the exception handling moved to generic code, and some changes in following two commits: 56dd9470d7c8734f055da2a6bac553caf4a468eb context_tracking: Move exception handling to generic code 6c1e0256fad84a843d915414e4b5973b7443d48d context_tracking: Restore correct previous context state on exception exit it is able for this patch to replace the implementation in arch code with the generic code in above commits. Signed-off-by: Li Zhong --- arch/powerpc/include/asm/context_tracking.h | 29 --- arch/powerpc/kernel/exceptions-64s.S|4 +-- arch/powerpc/kernel/traps.c | 42 +- arch/powerpc/mm/fault.c |7 ++-- arch/powerpc/mm/hash_utils_64.c | 51 ++- 5 files changed, 57 insertions(+), 76 deletions(-) diff --git a/arch/powerpc/include/asm/context_tracking.h b/arch/powerpc/include/asm/context_tracking.h index 4da287e..b6f5a33 100644 --- a/arch/powerpc/include/asm/context_tracking.h +++ b/arch/powerpc/include/asm/context_tracking.h @@ -1,39 +1,10 @@ #ifndef _ASM_POWERPC_CONTEXT_TRACKING_H #define _ASM_POWERPC_CONTEXT_TRACKING_H -#ifndef __ASSEMBLY__ -#include -#include - -/* - * temporarily defined to avoid potential conflicts with the common - * implementation, these will be removed by a later patch after the common - * code enters powerpc tree - */ -#define exception_enter __exception_enter -#define exception_exit __exception_exit - -static inline void __exception_enter(struct pt_regs *regs) -{ - user_exit(); -} - -static inline void __exception_exit(struct pt_regs *regs) -{ -#ifdef CONFIG_CONTEXT_TRACKING - if (user_mode(regs)) - user_enter(); -#endif -} - -#else /* __ASSEMBLY__ */ - #ifdef CONFIG_CONTEXT_TRACKING #define SCHEDULE_USER bl .schedule_user #else #define SCHEDULE_USER bl .schedule #endif -#endif /* !__ASSEMBLY__ */ - #endif diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 6d82f4f..a8a5361 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1368,17 +1368,15 @@ 
END_MMU_FTR_SECTION_IFCLR(MMU_FTR_SLB) rlwimi r4,r0,32-13,30,30 /* becomes _PAGE_USER access bit */ ori r4,r4,1 /* add _PAGE_PRESENT */ rlwimi r4,r5,22+2,31-2,31-2/* Set _PAGE_EXEC if trap is 0x400 */ - addir6,r1,STACK_FRAME_OVERHEAD /* * r3 contains the faulting address * r4 contains the required access permissions * r5 contains the trap number -* r6 contains the address of pt_regs * * at return r3 = 0 for success, 1 for page fault, negative for error */ - bl .hash_page_ct /* build HPTE if possible */ + bl .hash_page /* build HPTE if possible */ cmpdi r3,0/* see if hash_page succeeded */ /* Success */ diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index 6228b6b..1b46c2d9 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -35,6 +35,7 @@ #include #include #include +#include #include #include @@ -60,7 +61,6 @@ #include #include #include -#include #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC) int (*__debugger)(struct pt_regs *regs) __read_mostly; @@ -669,8 +669,9 @@ int machine_check_generic(struct pt_regs *regs) void machine_check_exception(struct pt_regs *regs) { int recover = 0; + enum ctx_state prev_state; - exception_enter(regs); + prev_state = exception_enter(); __get_cpu_var(irq_stat).mce_exceptions++; @@ -712,7 +713,7 @@ void machine_check_exception(struct pt_regs *regs) panic("Unrecoverable Machine check"); exit: - exception_exit(regs); + exception_exit(prev_state); } void SMIException(struct pt_regs *regs) @@ -722,19 +723,21 @@ void SMIException(struct pt_regs *regs) void unknown_exception(struct pt_regs *regs) { - exception_enter(regs); + enum ctx_state prev_state; + prev_state = exception_enter(); printk("Bad trap at PC: %lx, SR: %lx, vector=%lx\n", regs->nip, regs->msr, regs->trap); _exception(SIGTRAP, regs, 0, 0); - exception_exit(regs); + exception_exit(prev_state); } void instruction_breakpoint_exception(struct pt_regs *regs) { - exception_enter(regs); + enum ctx_state prev_state; + prev_state 
= exception_enter(); if (notify_die(DIE_IABR_MATCH, "iabr_match", regs, 5, 5, SIGTRAP) == NOTIFY_STOP) @@ -744,7 +747,7 @@ void instruction_breakpoint_exception(struct pt_regs *regs) _exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip); exit: - exception_exit(regs); + exception_exi
[RFC PATCH v2 4/6] powerpc: Use the new schedule_user API on userspace preemption
This patch corresponds to [PATCH] x86: Use the new schedule_user API on userspace preemption commit 0430499ce9d78691f3985962021b16bf8f8a8048 Signed-off-by: Li Zhong --- arch/powerpc/include/asm/context_tracking.h | 11 +++ arch/powerpc/kernel/entry_64.S |3 ++- 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/context_tracking.h b/arch/powerpc/include/asm/context_tracking.h index 377146e..4da287e 100644 --- a/arch/powerpc/include/asm/context_tracking.h +++ b/arch/powerpc/include/asm/context_tracking.h @@ -1,6 +1,7 @@ #ifndef _ASM_POWERPC_CONTEXT_TRACKING_H #define _ASM_POWERPC_CONTEXT_TRACKING_H +#ifndef __ASSEMBLY__ #include #include @@ -25,4 +26,14 @@ static inline void __exception_exit(struct pt_regs *regs) #endif } +#else /* __ASSEMBLY__ */ + +#ifdef CONFIG_CONTEXT_TRACKING +#define SCHEDULE_USER bl .schedule_user +#else +#define SCHEDULE_USER bl .schedule +#endif + +#endif /* !__ASSEMBLY__ */ + #endif diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 256c5bf..f7e4622 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -33,6 +33,7 @@ #include #include #include +#include /* * System calls. @@ -618,7 +619,7 @@ _GLOBAL(ret_from_except_lite) andi. r0,r4,_TIF_NEED_RESCHED beq 1f bl .restore_interrupts - bl .schedule + SCHEDULE_USER b .ret_from_except_lite 1: bl .save_nvgprs -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH v2 3/6] powerpc: Exit user context on notify resume
This patch allows RCU usage in do_notify_resume, e.g. signal handling. It corresponds to [PATCH] x86: Exit RCU extended QS on notify resume commit edf55fda35c7dc7f2d9241c3abaddaf759b457c6 Signed-off-by: Li Zhong --- arch/powerpc/kernel/signal.c |5 + 1 file changed, 5 insertions(+) diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c index cf12eae..d63b502 100644 --- a/arch/powerpc/kernel/signal.c +++ b/arch/powerpc/kernel/signal.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -159,6 +160,8 @@ static int do_signal(struct pt_regs *regs) void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags) { + user_exit(); + if (thread_info_flags & _TIF_UPROBE) uprobe_notify_resume(regs); @@ -169,4 +172,6 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags) clear_thread_flag(TIF_NOTIFY_RESUME); tracehook_notify_resume(regs); } + + user_enter(); } -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH v2 2/6] powerpc: Exception hooks for context tracking subsystem
This is the exception hooks for context tracking subsystem, including data
access, program check, single step, instruction breakpoint, machine check,
alignment, fp unavailable, altivec assist, unknown exception, whose handlers
might use RCU.

This patch corresponds to
[PATCH] x86: Exception hooks for userspace RCU extended QS
commit 6ba3c97a38803883c2eee489505796cb0a727122

Signed-off-by: Li Zhong
---
 arch/powerpc/include/asm/context_tracking.h | 28 +
 arch/powerpc/kernel/exceptions-64s.S        |  4 +-
 arch/powerpc/kernel/traps.c                 | 83 ---
 arch/powerpc/mm/fault.c                     | 15 -
 arch/powerpc/mm/hash_utils_64.c             | 17 ++
 5 files changed, 122 insertions(+), 25 deletions(-)
 create mode 100644 arch/powerpc/include/asm/context_tracking.h

diff --git a/arch/powerpc/include/asm/context_tracking.h b/arch/powerpc/include/asm/context_tracking.h
new file mode 100644
index 000..377146e
--- /dev/null
+++ b/arch/powerpc/include/asm/context_tracking.h
@@ -0,0 +1,28 @@
+#ifndef _ASM_POWERPC_CONTEXT_TRACKING_H
+#define _ASM_POWERPC_CONTEXT_TRACKING_H
+
+#include
+#include
+
+/*
+ * temporarily defined to avoid potential conflicts with the common
+ * implementation, these will be removed by a later patch after the common
+ * code enters powerpc tree
+ */
+#define exception_enter __exception_enter
+#define exception_exit __exception_exit
+
+static inline void __exception_enter(struct pt_regs *regs)
+{
+	user_exit();
+}
+
+static inline void __exception_exit(struct pt_regs *regs)
+{
+#ifdef CONFIG_CONTEXT_TRACKING
+	if (user_mode(regs))
+		user_enter();
+#endif
+}
+
+#endif
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index a8a5361..6d82f4f 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1368,15 +1368,17 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_SLB)
 	rlwimi	r4,r0,32-13,30,30	/* becomes _PAGE_USER access bit */
 	ori	r4,r4,1			/* add _PAGE_PRESENT */
 	rlwimi	r4,r5,22+2,31-2,31-2	/* Set _PAGE_EXEC if trap is 0x400 */
+	addi	r6,r1,STACK_FRAME_OVERHEAD
 
 	/*
 	 * r3 contains the faulting address
 	 * r4 contains the required access permissions
 	 * r5 contains the trap number
+	 * r6 contains the address of pt_regs
 	 *
 	 * at return r3 = 0 for success, 1 for page fault, negative for error
 	 */
-	bl	.hash_page		/* build HPTE if possible */
+	bl	.hash_page_ct		/* build HPTE if possible */
 	cmpdi	r3,0			/* see if hash_page succeeded */
 
 	/* Success */
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 37cc40e..6228b6b 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -60,6 +60,7 @@
 #include
 #include
 #include
+#include
 
 #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC)
 int (*__debugger)(struct pt_regs *regs) __read_mostly;
@@ -669,6 +670,8 @@ void machine_check_exception(struct pt_regs *regs)
 {
 	int recover = 0;
 
+	exception_enter(regs);
+
 	__get_cpu_var(irq_stat).mce_exceptions++;
 
 	/* See if any machine dependent calls. In theory, we would want
@@ -683,7 +686,7 @@ void machine_check_exception(struct pt_regs *regs)
 		recover = cur_cpu_spec->machine_check(regs);
 
 	if (recover > 0)
-		return;
+		goto exit;
 
 #if defined(CONFIG_8xx) && defined(CONFIG_PCI)
 	/* the qspan pci read routines can cause machine checks -- Cort
@@ -693,20 +696,23 @@ void machine_check_exception(struct pt_regs *regs)
 	 * -- BenH
 	 */
 	bad_page_fault(regs, regs->dar, SIGBUS);
-	return;
+	goto exit;
 #endif
 
 	if (debugger_fault_handler(regs))
-		return;
+		goto exit;
 	if (check_io_access(regs))
-		return;
+		goto exit;
 	die("Machine check", regs, SIGBUS);
 
 	/* Must die if the interrupt is not recoverable */
 	if (!(regs->msr & MSR_RI))
 		panic("Unrecoverable Machine check");
+
+exit:
+	exception_exit(regs);
 }
 
 void SMIException(struct pt_regs *regs)
@@ -716,20 +722,29 @@ void SMIException(struct pt_regs *regs)
 
 void unknown_exception(struct pt_regs *regs)
 {
+	exception_enter(regs);
+
 	printk("Bad trap at PC: %lx, SR: %lx, vector=%lx\n",
 	       regs->nip, regs->msr, regs->trap);
 
 	_exception(SIGTRAP, regs, 0, 0);
+
+	exception_exit(regs);
 }
 
 void instruction_breakpoint_exception(struct pt_regs *regs)
 {
+	exception_enter(regs);
+
 	if (notify_die(DIE_IABR_MATCH, "iabr_match", regs, 5, 5, SIGTRAP) == N
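The recurring transformation in the traps.c hunks above is mechanical: bracket the handler with exception_enter()/exception_exit(), and turn every early `return` into `goto exit` so the exit hook runs on all paths. A minimal userspace sketch of that pattern (the counters and `handle_check()` are illustrative, not kernel code):

```c
#include <assert.h>

/* Instrumented stand-ins for the context-tracking hooks. */
static int entered, exited;
static void exception_enter(void) { entered++; }
static void exception_exit(void)  { exited++; }

/* Shape of a converted handler such as machine_check_exception():
 * the early "return" on recovery becomes "goto exit" so that
 * exception_exit() is balanced on every path. */
static int handle_check(int recoverable)
{
	int handled = 0;

	exception_enter();

	if (recoverable) {
		handled = 1;
		goto exit;		/* was: return; */
	}
	/* ... die()/panic() paths elided in this sketch ... */

exit:
	exception_exit();
	return handled;
}
```

An unbalanced path — one early return that skips the exit hook — would leave the CPU permanently marked as in-kernel, which is exactly what the `goto exit` rewrite prevents.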
[RFC PATCH v2 1/6] powerpc: Syscall hooks for context tracking subsystem
This is the syscall slow path hooks for context tracking subsystem,
corresponding to
[PATCH] x86: Syscall hooks for userspace RCU extended QS
commit bf5a3c13b939813d28ce26c01425054c740d6731

TIF_MEMDIE is moved to the second 16-bits (with value 17), as it seems there
is no asm code using it. TIF_NOHZ is added to _TIF_SYCALL_T_OR_A, so it is
better for it to be in the same 16 bits with others in the group, so in the
asm code, andi. with this group could work.

Signed-off-by: Li Zhong
Acked-by: Frederic Weisbecker
---
 arch/powerpc/include/asm/thread_info.h | 7 +--
 arch/powerpc/kernel/ptrace.c           | 5 +
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 406b7b9..414a261 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -97,7 +97,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_PERFMON_CTXSW	6	/* perfmon needs ctxsw calls */
 #define TIF_SYSCALL_AUDIT	7	/* syscall auditing active */
 #define TIF_SINGLESTEP		8	/* singlestepping active */
-#define TIF_MEMDIE		9	/* is terminating due to OOM killer */
+#define TIF_NOHZ		9	/* in adaptive nohz mode */
 #define TIF_SECCOMP		10	/* secure computing */
 #define TIF_RESTOREALL		11	/* Restore all regs (implies NOERROR) */
 #define TIF_NOERROR		12	/* Force successful syscall return */
@@ -106,6 +106,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_SYSCALL_TRACEPOINT	15	/* syscall tracepoint instrumentation */
 #define TIF_EMULATE_STACK_STORE	16	/* Is an instruction emulation
 					   for stack store? */
+#define TIF_MEMDIE		17	/* is terminating due to OOM killer */
 
 /* as above, but as bit values */
 #define _TIF_SYSCALL_TRACE	(1<
 #include
 #include
 #include
+#include
 
 #include
 #include
 
@@ -1778,6 +1779,8 @@ long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;
 
+	user_exit();
+
 	secure_computing_strict(regs->gpr[0]);
 
 	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
@@ -1822,4 +1825,6 @@ void do_syscall_trace_leave(struct pt_regs *regs)
 	step = test_thread_flag(TIF_SINGLESTEP);
 	if (step || test_thread_flag(TIF_SYSCALL_TRACE))
 		tracehook_report_syscall_exit(regs, step);
+
+	user_enter();
 }
-- 
1.7.9.5
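The reason TIF_NOHZ must sit in the low 16 bits — and TIF_MEMDIE can safely move out of them — is that powerpc's `andi.` instruction takes only a 16-bit unsigned immediate, so a flag group tested with a single `andi.` must fit entirely within bits 0–15. This can be checked in plain C; the values mirror the patch, but the snippet is a standalone illustration, not kernel code:

```c
#include <assert.h>

/* TIF bit numbers as arranged by the patch (subset). */
#define TIF_SYSCALL_TRACE	0
#define TIF_SYSCALL_AUDIT	7
#define TIF_NOHZ		9	/* kept in the low 16 bits */
#define TIF_SECCOMP		10
#define TIF_SYSCALL_TRACEPOINT	15
#define TIF_MEMDIE		17	/* moved above bit 15: no asm tests it */

#define _TIF_SYSCALL_T_OR_A						\
	((1u << TIF_SYSCALL_TRACE) | (1u << TIF_SYSCALL_AUDIT) |	\
	 (1u << TIF_SECCOMP) | (1u << TIF_SYSCALL_TRACEPOINT) |	\
	 (1u << TIF_NOHZ))

/* Can this mask be encoded as an andi. immediate (16-bit unsigned)? */
static int fits_andi(unsigned int mask)
{
	return (mask & ~0xffffu) == 0;
}
```

Had TIF_NOHZ taken a bit above 15 instead, `_TIF_SYSCALL_T_OR_A` would no longer be encodable in one `andi.` and the syscall entry assembly would need an extra instruction.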
[RFC PATCH v2 0/6] powerpc: Support context tracking for Power pSeries
These patches try to support context tracking for Power arch, beginning with
64-bit pSeries. The codes are ported from that of the x86_64, and in each
patch, I listed the corresponding patch for x86.

Would you please help review and give your comments?

v2: I rebased the patches against 3.9-rcs, and also added a patch to replace
the exception handling with the generic code in tip timers/nohz.

I assume these patches would get in through powerpc tree, so I didn't combine
the new patch (#6) with the original one (#2). So that if powerpc tree picks
these, it could pick the first five patches, and apply patch #6 later when
the dependency enters into powerpc tree (maybe on some 3.10-rcs).

I'm also wondering whether it is possible for these to go through tip
timers/nohz, so for now, patches #6 and #2 could be combined into one, and no
need to worry about the issues caused by arch/common code merging. And it
might also make future changes easier.

Thanks, Zhong

Li Zhong (6):
  powerpc: Syscall hooks for context tracking subsystem
  powerpc: Exception hooks for context tracking subsystem
  powerpc: Exit user context on notify resume
  powerpc: Use the new schedule_user API on userspace preemption
  powerpc: select HAVE_CONTEXT_TRACKING for pSeries
  powerpc: Use generic code for exception handling

 arch/powerpc/include/asm/context_tracking.h | 10 +++
 arch/powerpc/include/asm/thread_info.h      |  7 ++-
 arch/powerpc/kernel/entry_64.S              |  3 +-
 arch/powerpc/kernel/ptrace.c                |  5 ++
 arch/powerpc/kernel/signal.c                |  5 ++
 arch/powerpc/kernel/traps.c                 | 91 ---
 arch/powerpc/mm/fault.c                     | 16 -
 arch/powerpc/mm/hash_utils_64.c             | 38 ---
 arch/powerpc/platforms/pseries/Kconfig      |  1 +
 9 files changed, 140 insertions(+), 36 deletions(-)
 create mode 100644 arch/powerpc/include/asm/context_tracking.h
-- 
1.7.9.5
[RFC PATCH powerpc] try secondary hash before BUG in kernel_map_linear_page()
This patch tries to fix following issue when CONFIG_DEBUG_PAGEALLOC is
enabled:

[  543.075675] ------------[ cut here ]------------
[  543.075701] kernel BUG at arch/powerpc/mm/hash_utils_64.c:1239!
[  543.075714] Oops: Exception in kernel mode, sig: 5 [#1]
[  543.075722] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries
[  543.075741] Modules linked in: binfmt_misc ehea
[  543.075759] NIP: c0036eb0 LR: c0036ea4 CTR: c005a594
[  543.075771] REGS: c000a90832c0 TRAP: 0700  Not tainted (3.8.0-next-20130222)
[  543.075781] MSR: 80029032 CR: 4482 XER:
[  543.075816] SOFTE: 0
[  543.075823] CFAR: c004c200
[  543.075830] TASK = c000e506b750[23934] 'cc1' THREAD: c000a908 CPU: 1
GPR00: 0001 c000a9083540 c0c600a8
GPR04: 0050 fffa c000a90834e0 004ff594
GPR08: 0001 9592d4d8 c0c86854
GPR12: 0002 c6ead300 00a51000 0001
GPR16: f3354380 ff80
GPR20: 0001 c0c600a8 0001 0001
GPR24: 03354380 c000 c0b65950
GPR28: 0020 000cd50e 00bf50d9 c0c7c230
[  543.076005] NIP [c0036eb0] .kernel_map_pages+0x1e0/0x3f8
[  543.076016] LR [c0036ea4] .kernel_map_pages+0x1d4/0x3f8
[  543.076025] Call Trace:
[  543.076033] [c000a9083540] [c0036ea4] .kernel_map_pages+0x1d4/0x3f8 (unreliable)
[  543.076053] [c000a9083640] [c0167638] .get_page_from_freelist+0x6cc/0x8dc
[  543.076067] [c000a9083800] [c0167a48] .__alloc_pages_nodemask+0x200/0x96c
[  543.076082] [c000a90839c0] [c01ade44] .alloc_pages_vma+0x160/0x1e4
[  543.076098] [c000a9083a80] [c018ce04] .handle_pte_fault+0x1b0/0x7e8
[  543.076113] [c000a9083b50] [c018d5a8] .handle_mm_fault+0x16c/0x1a0
[  543.076129] [c000a9083c00] [c07bf1dc] .do_page_fault+0x4d0/0x7a4
[  543.076144] [c000a9083e30] [c00090e8] handle_page_fault+0x10/0x30
[  543.076155] Instruction dump:
[  543.076163] 7c630038 78631d88 e80a f8410028 7c0903a6 e91f01de e96a0010 e84a0008
[  543.076192] 4e800421 e8410028 7c7107b4 7a200fe0 <0b00> 7f63db78 48785781 6000
[  543.076224] ---[ end trace bd5807e8d6ae186b ]---

The code is borrowed from that in __hash_page_huge().

Signed-off-by: Li Zhong
---
 arch/powerpc/mm/hash_utils_64.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 1b6e127..31c7924 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1231,11 +1231,28 @@ static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
 	int ret;
 
 	hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize);
+
+repeat:
 	hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
 
 	ret = ppc_md.hpte_insert(hpteg, vpn, __pa(vaddr), mode,
 				 HPTE_V_BOLTED,
 				 mmu_linear_psize, mmu_kernel_ssize);
+
+	if (unlikely(ret == -1)) {
+		hpteg = (~hash & htab_hash_mask) * HPTES_PER_GROUP;
+		ret = ppc_md.hpte_insert(hpteg, vpn, __pa(vaddr), mode,
+					 HPTE_V_SECONDARY,
+					 mmu_linear_psize, mmu_kernel_ssize);
+		if (ret == -1) {
+			if (mftb() & 0x1)
+				hpteg = (hash & htab_hash_mask) *
+					HPTES_PER_GROUP;
+			ppc_md.hpte_remove(hpteg);
+			goto repeat;
+		}
+	}
+
 	BUG_ON (ret < 0);
 	spin_lock(&linear_map_hash_lock);
 	BUG_ON(linear_map_hash_slots[lmi] & 0x80);
-- 
1.7.9.5
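The retry logic above — primary hash group, then the secondary group (`~hash`), then evict a slot and start over — can be sketched as a small userspace model. `hpte_insert()` here is a toy stand-in for `ppc_md.hpte_insert()`, with made-up group sizes; the point is the control flow, not the real HPT layout:

```c
#include <assert.h>

#define GROUPS		4	/* toy hash table: 4 groups */
#define HPTES_PER_GROUP	8
static int slots[GROUPS * HPTES_PER_GROUP];	/* 0 = free, 1 = used */

/* Toy ppc_md.hpte_insert(): first free slot in the group, or -1 if full. */
static int hpte_insert(int hpteg)
{
	for (int i = 0; i < HPTES_PER_GROUP; i++)
		if (!slots[hpteg + i]) {
			slots[hpteg + i] = 1;
			return hpteg + i;
		}
	return -1;
}

/* Control flow of the patched kernel_map_linear_page(). */
static int map_page(unsigned int hash)
{
	int hpteg, ret;
repeat:
	hpteg = (hash & (GROUPS - 1)) * HPTES_PER_GROUP;	/* primary */
	ret = hpte_insert(hpteg);
	if (ret == -1) {
		hpteg = (~hash & (GROUPS - 1)) * HPTES_PER_GROUP; /* secondary */
		ret = hpte_insert(hpteg);
		if (ret == -1) {
			/* both groups full: evict one entry and retry
			 * (stand-in for ppc_md.hpte_remove()) */
			hpteg = (hash & (GROUPS - 1)) * HPTES_PER_GROUP;
			slots[hpteg] = 0;
			goto repeat;
		}
	}
	return ret;	/* never -1: the BUG_ON can't fire this way */
}
```

Because a full secondary group triggers an eviction before looping, the insert always terminates with a valid slot, which is why the original unconditional `BUG_ON` can no longer trigger merely because the primary group filled up.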
Re: [RFC PATCH 2/5] powerpc: Exception hooks for context tracking subsystem
On Sun, 2013-02-10 at 15:10 +0100, Frederic Weisbecker wrote:
> 2013/2/1 Li Zhong :
> > This is the exception hooks for context tracking subsystem, including
> > data access, program check, single step, instruction breakpoint, machine
> > check, alignment, fp unavailable, altivec assist, unknown exception,
> > whose handlers might use RCU.
> >
> > This patch corresponds to
> > [PATCH] x86: Exception hooks for userspace RCU extended QS
> > commit 6ba3c97a38803883c2eee489505796cb0a727122
> >
> > Signed-off-by: Li Zhong
>
> Looks good!
>
> I guess we should move exception_enter/exit definition to the generic
> code. They should be the same for all archs after all.

Indeed.

> Also we are relying on user_mode(regs) but this may be buggy with some
> corner cases. For example if an exception happen after a call to user_exit()

I guess you mean user_enter() here, or am I confused?

> (on syscall exit) but before we actually resume in userspace, the
> exception will exit in kernel mode from the context tracking POV.
>
> So instead on relying on the regs, which are not sync with the context
> tracking state, we should use something like:
>
> prev_state = exception_enter();
> ...
> exception_exit(prev_state);
>
> Also preempt_schedule_irq() is concerned as well by this problem. So I
> should convert it to that scheme as well. I'm going to prepare some
> patches.
>
> Feel free to merge this patch in the powerpc tree, I'll do the
> conversion along the way.

Or if your patches gets merged earlier than these, I can update my code
according to yours.

Thanks, Zhong

>
> Thanks.
>
Re: [RFC PATCH 1/5] powerpc: Syscall hooks for context tracking subsystem
On Thu, 2013-02-07 at 01:29 +0100, Frederic Weisbecker wrote:
> 2013/2/1 Li Zhong :
> > This is the syscall slow path hooks for context tracking subsystem,
> > corresponding to
> > [PATCH] x86: Syscall hooks for userspace RCU extended QS
> > commit bf5a3c13b939813d28ce26c01425054c740d6731
> >
> > TIF_MEMDIE is moved to the second 16-bits (with value 17), as it seems there
> > is no asm code using it. TIF_NOHZ is added to _TIF_SYCALL_T_OR_A, so it is
> > better for it to be in the same 16 bits with others in the group, so in the
> > asm code, andi. with this group could work.
> >
> > Signed-off-by: Li Zhong
> > ---
> >  arch/powerpc/include/asm/thread_info.h | 7 +--
> >  arch/powerpc/kernel/ptrace.c           | 5 +
> >  2 files changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/thread_info.h
> > b/arch/powerpc/include/asm/thread_info.h
> > index 406b7b9..414a261 100644
> > --- a/arch/powerpc/include/asm/thread_info.h
> > +++ b/arch/powerpc/include/asm/thread_info.h
> > @@ -97,7 +97,7 @@ static inline struct thread_info
> > *current_thread_info(void)
> >  #define TIF_PERFMON_CTXSW	6	/* perfmon needs ctxsw calls */
> >  #define TIF_SYSCALL_AUDIT	7	/* syscall auditing active */
> >  #define TIF_SINGLESTEP	8	/* singlestepping active */
> > -#define TIF_MEMDIE	9	/* is terminating due to OOM killer */
> > +#define TIF_NOHZ	9	/* in adaptive nohz mode */
> >  #define TIF_SECCOMP	10	/* secure computing */
> >  #define TIF_RESTOREALL	11	/* Restore all regs (implies NOERROR) */
> >  #define TIF_NOERROR	12	/* Force successful syscall return */
> > @@ -106,6 +106,7 @@ static inline struct thread_info
> > *current_thread_info(void)
> >  #define TIF_SYSCALL_TRACEPOINT	15	/* syscall tracepoint instrumentation */
> >  #define TIF_EMULATE_STACK_STORE	16	/* Is an instruction emulation
> > 					   for stack store? */
> > +#define TIF_MEMDIE	17	/* is terminating due to OOM killer */
> >
> >  /* as above, but as bit values */
> >  #define _TIF_SYSCALL_TRACE	(1<
> > @@ -124,8 +125,10 @@ static inline struct thread_info
> > *current_thread_info(void)
> >  #define _TIF_UPROBE	(1<
> >  #define _TIF_SYSCALL_TRACEPOINT	(1<
> >  #define _TIF_EMULATE_STACK_STORE	(1<
> > +#define _TIF_NOHZ	(1<
> >  #define _TIF_SYSCALL_T_OR_A	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
> > -	_TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT)
> > +	_TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT | \
> > +	_TIF_NOHZ)
> >
> >  #define _TIF_USER_WORK_MASK	(_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
> > 	_TIF_NOTIFY_RESUME | _TIF_UPROBE)
> > diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> > index c497000..62238dd 100644
> > --- a/arch/powerpc/kernel/ptrace.c
> > +++ b/arch/powerpc/kernel/ptrace.c
> > @@ -32,6 +32,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >
> >  #include
> >  #include
> >
> > @@ -1745,6 +1746,8 @@ long do_syscall_trace_enter(struct pt_regs *regs)
> >  {
> > 	long ret = 0;
> >
> > +	user_exit();
> > +
> > 	secure_computing_strict(regs->gpr[0]);
> >
> > 	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
> > @@ -1789,4 +1792,6 @@ void do_syscall_trace_leave(struct pt_regs *regs)
> > 	step = test_thread_flag(TIF_SINGLESTEP);
> > 	if (step || test_thread_flag(TIF_SYSCALL_TRACE))
> > 		tracehook_report_syscall_exit(regs, step);
> > +
> > +	user_enter();
>
> In x86-64, schedule_user() and do_notify_resume() can be called before
> syscall_trace_leave(). As a result we may be entering
> syscall_trace_leave() in user mode (from a context tracking POV). To
> fix this I added a call to user_exit() on the very beginning of that
> function.
>
> You can find the details in 2c5594df344cd1ff0cc9bf007dea3235582b3acf
> ("rcu: Fix unrecovered RCU user mode in syscall_trace_leave()").

Hi Frederic,

Thank you very much for the reminding.

> Could this problem happen in ppc as well?

I checked the code(64 bit) today, it seems to me that it won't happen. But
fortunately, we are in the ppc mailing list, please correct me if my
understanding is wrong.

By the way, I enabled CONTEXT_TRACKING_FORCE and PROVE_RCU, so if it could
happen, I think there should be some illegal RCU usage complaints reported.

Thanks, Zhong

> Thanks.
>
[RFC PATCH 5/5] powerpc: select HAVE_CONTEXT_TRACKING for pSeries
Start context tracking support from pSeries.

Signed-off-by: Li Zhong
---
 arch/powerpc/platforms/pseries/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 837cf49..a9570fe 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -17,6 +17,7 @@ config PPC_PSERIES
 	select PPC_NATIVE
 	select PPC_PCI_CHOICE if EXPERT
 	select ZLIB_DEFLATE
+	select HAVE_CONTEXT_TRACKING
 	default y
 
 config PPC_SPLPAR
-- 
1.7.9.5
[RFC PATCH 4/5] powerpc: Use the new schedule_user API on userspace preemption
This patch corresponds to
[PATCH] x86: Use the new schedule_user API on userspace preemption
commit 0430499ce9d78691f3985962021b16bf8f8a8048

Signed-off-by: Li Zhong
---
 arch/powerpc/include/asm/context_tracking.h | 11 +++
 arch/powerpc/kernel/entry_64.S              |  3 ++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/context_tracking.h b/arch/powerpc/include/asm/context_tracking.h
index 3adccd8..2e042ba 100644
--- a/arch/powerpc/include/asm/context_tracking.h
+++ b/arch/powerpc/include/asm/context_tracking.h
@@ -1,6 +1,7 @@
 #ifndef _ASM_POWERPC_CONTEXT_TRACKING_H
 #define _ASM_POWERPC_CONTEXT_TRACKING_H
 
+#ifndef __ASSEMBLY__
 #include
 #include
 
@@ -17,4 +18,14 @@ static inline void exception_exit(struct pt_regs *regs)
 #endif
 }
 
+#else /* __ASSEMBLY__ */
+
+#ifdef CONFIG_CONTEXT_TRACKING
+#define SCHEDULE_USER bl .schedule_user
+#else
+#define SCHEDULE_USER bl .schedule
+#endif
+
+#endif /* !__ASSEMBLY__ */
+
 #endif
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 3d990d3..91f09ec 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -33,6 +33,7 @@
 #include
 #include
 #include
+#include
 
 /*
  * System calls.
@@ -595,7 +596,7 @@ _GLOBAL(ret_from_except_lite)
 	andi.	r0,r4,_TIF_NEED_RESCHED
 	beq	1f
 	bl	.restore_interrupts
-	bl	.schedule
+	SCHEDULE_USER
 	b	.ret_from_except_lite
 
1:	bl	.save_nvgprs
-- 
1.7.9.5
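The SCHEDULE_USER macro does nothing more than pick the callee at build time: with CONFIG_CONTEXT_TRACKING the userspace-preemption path must go through schedule_user() (which exits the RCU user quiescent state before scheduling), otherwise plain schedule() suffices. A C analogue of that assembly-side `#ifdef`, purely illustrative:

```c
#include <assert.h>
#include <string.h>

/* Pretend the config option is enabled for this sketch. */
#define CONFIG_CONTEXT_TRACKING 1

/* Build-time selection of the scheduling entry point, mirroring
 * the SCHEDULE_USER macro in the patch (strings stand in for the
 * "bl .schedule_user" / "bl .schedule" branch targets). */
#ifdef CONFIG_CONTEXT_TRACKING
#define SCHEDULE_USER "schedule_user"
#else
#define SCHEDULE_USER "schedule"
#endif
```

The benefit of doing the selection in the macro is that the entry_64.S call site stays a single line regardless of the config.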