(2014/10/08 20:51), Wanpeng Li wrote:
>
> On 10/8/14, 2:43 PM, Yasuaki Ishimatsu wrote:
>> While offlining a node by hot removing memory, the following divide error
>> occurs:
>>
>> divide error: 0000 [#1] SMP
>> [...]
>> Call Trace:
>>  [...] handle_mm_fault
>>  [...] ? try_to_wake_up
>>  [...] ? wake_up_state
>>  [...] __do_page_fault
>>  [...] ? do_futex
>>  [...] ? put_prev_entity
>>  [...] ? __switch_to
>>  [...] do_page_fault
>>  [...] page_fault
>> [...]
>> RIP  [<ffffffff810a7081>] task_numa_fault
>>  RSP <ffff88084eb2bcb0>
>>
>> The issue occurs as follows:
>>
>>  1. When a page fault occurs and the page is allocated from node 1,
>>     task_struct->numa_faults_buffer_memory[] of node 1 is incremented
>>     and p->numa_faults_locality[] is also incremented, as follows:
>>
>>      o numa_faults_buffer_memory[]          o numa_faults_locality[]
>>                NR_NUMA_HINT_FAULT_TYPES
>>               |      0     |     1     |
>>      ----------------------------------     ----------------------
>>       node 0  |      0     |     0     |     remote  |     0     |
>>       node 1  |      0     |     1     |     local   |     1     |
>>      ----------------------------------     ----------------------
>>
>>  2. Node 1 is offlined by hot removing memory.
>>
>>  3. When a page fault occurs, fault_types[] is calculated in
>>     task_numa_placement() from p->numa_faults_buffer_memory[] of all
>>     online nodes. But node 1 was offlined in step 2, so fault_types[]
>>     is calculated only from p->numa_faults_buffer_memory[] of node 0,
>>     and both entries of fault_types[] end up 0.
>>
>>  4. These values (0) of fault_types[] are passed to
>>     update_task_scan_period().
>>
>>  5. numa_faults_locality[1] is set to 1, so the following division is
>>     calculated:
>>
>>      static void update_task_scan_period(struct task_struct *p,
>>                                          unsigned long shared,
>>                                          unsigned long private)
>>      {
>>          ...
>>          ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS,
>>                               (private + shared));
>>      }
>>
>>  6. But both private and shared are 0, so the divide error occurs here.
>>
>> The divide error is a rare case because the trigger is node offline.
>> With this patch, when both private and shared are 0, diff is simply set
>> to 0 without calculating the division.
>>
>> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>
>> ---
>>  kernel/sched/fair.c | 30 +++++++++++++++++++-----------
>>  1 file changed, 19 insertions(+), 11 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index bfa3c86..fb7dc3f 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -1496,18 +1496,26 @@ static void update_task_scan_period(struct task_struct *p,
>>  		slot = 1;
>>  		diff = slot * period_slot;
>>  	} else {
>> -		diff = -(NUMA_PERIOD_THRESHOLD - ratio) * period_slot;
>> +		if (unlikely((private + shared) == 0))
>> +			/*
>> +			 * This is a rare case. The trigger is node offline.
>> +			 */
>> +			diff = 0;
>> +		else {
>> +			diff = -(NUMA_PERIOD_THRESHOLD - ratio) * period_slot;
>>
>> -		/*
>> -		 * Scale scan rate increases based on sharing. There is an
>> -		 * inverse relationship between the degree of sharing and
>> -		 * the adjustment made to the scanning period. Broadly
>> -		 * speaking the intent is that there is little point
>> -		 * scanning faster if shared accesses dominate as it may
>> -		 * simply bounce migrations uselessly
>> -		 */
>> -		ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, (private + shared));
>> -		diff = (diff * ratio) / NUMA_PERIOD_SLOTS;
>> +			/*
>> +			 * Scale scan rate increases based on sharing. There is
>> +			 * an inverse relationship between the degree of sharing
>> +			 * and the adjustment made to the scanning period.
>> +			 * Broadly speaking the intent is that there is little
>> +			 * point scanning faster if shared accesses dominate as
>> +			 * it may simply bounce migrations uselessly
>> +			 */
>> +			ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS,
>> +					     (private + shared));
>> +			diff = (diff * ratio) / NUMA_PERIOD_SLOTS;
>> +		}
>>  	}
>
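For reference, a minimal user-space sketch of the failing computation from
steps 1-6 above. It re-declares DIV_ROUND_UP() with the definition from
include/linux/kernel.h and uses 10 as a stand-in for NUMA_PERIOD_SLOTS; it is
only an illustration of the zero/zero case, not the kernel code itself:

    #include <stdio.h>

    #define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
    #define NUMA_PERIOD_SLOTS	10	/* stand-in for the kernel value */

    int main(void)
    {
    	/*
    	 * After the faulting node is hot-removed, task_numa_placement()
    	 * sums numa_faults_buffer_memory[] over online nodes only, so
    	 * both counters passed to update_task_scan_period() can be 0.
    	 */
    	unsigned long shared = 0, private = 0;
    	unsigned long ratio;

    	/*
    	 * The denominator (private + shared) is 0: on x86 this raises
    	 * SIGFPE, the user-space counterpart of the reported divide error.
    	 */
    	ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, private + shared);

    	printf("ratio = %lu\n", ratio);
    	return 0;
    }

With both counters at 0 the numerator does not matter; the division itself
faults, which is why the patch above short-circuits the (private + shared) == 0
case before ever reaching it.
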
> How about just
>
> ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, (private + shared + 1));

Thank you for providing sample code. Rik also provided another idea,
so I am confirming which is the better one.

Thanks,
Yasuaki Ishimatsu

>
>
> Regards,
> Wanpeng Li
>
>> 	p->numa_scan_period = clamp(p->numa_scan_period + diff,
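
For comparison, a sketch of how the suggested "+ 1" variant behaves, using the
same assumed DIV_ROUND_UP() definition and NUMA_PERIOD_SLOTS stand-in as the
sketch above (illustrative only, not a statement about which fix was merged):

    #include <stdio.h>

    #define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
    #define NUMA_PERIOD_SLOTS	10	/* stand-in for the kernel value */

    static unsigned long ratio_plus_one(unsigned long private, unsigned long shared)
    {
    	/* the "+ 1" keeps the denominator non-zero in all cases */
    	return DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, private + shared + 1);
    }

    int main(void)
    {
    	/* node-offline case: 0 / 1 -> ratio 0, so diff is scaled to 0 */
    	printf("ratio(0, 0)   = %lu\n", ratio_plus_one(0, 0));

    	/* ordinary case: the extra "+ 1" barely affects the rounding */
    	printf("ratio(30, 10) = %lu (with +1)\n", ratio_plus_one(30, 10));
    	printf("ratio(30, 10) = %d (without +1)\n",
    	       DIV_ROUND_UP(30 * NUMA_PERIOD_SLOTS, 30 + 10));
    	return 0;
    }

Both approaches avoid the fault: the explicit (private + shared) == 0 check in
the patch keeps the ratio exact for non-zero counts at the cost of an extra
branch, while the "+ 1" version is smaller but slightly biases the rounding.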