(2014/10/22 16:09), Wanpeng Li wrote:
> 10/22/14, 3:04 PM, Yasuaki Ishimatsu:
>> While offlining a node by hot-removing memory, the following divide error
>> occurs:
>>
>>    divide error: 0000 [#1] SMP
>>    [...]
>>    Call Trace:
>>     [...] handle_mm_fault
>>     [...] ? try_to_wake_up
>>     [...] ? wake_up_state
>>     [...] __do_page_fault
>>     [...] ? do_futex
>>     [...] ? put_prev_entity
>>     [...] ? __switch_to
>>     [...] do_page_fault
>>     [...] page_fault
>>    [...]
>>    RIP  [<ffffffff810a7081>] task_numa_fault
>>     RSP <ffff88084eb2bcb0>
>>
>> The issue occurs as follows:
>>    1. When a page fault occurs and the page is allocated from node 1,
>>       task_struct->numa_faults_buffer_memory[] of node 1 is
>>       incremented and p->numa_faults_locality[] is also incremented
>>       as follows:
>>
>>       o numa_faults_buffer_memory[]       o numa_faults_locality[]
>>                NR_NUMA_HINT_FAULT_TYPES
>>               |      0     |     1     |
>>       ----------------------------------  ----------------------
>>        node 0 |      0     |     0     |   remote |      0     |
>>        node 1 |      0     |     1     |    local |      1     |
>>       ----------------------------------  ----------------------
>>
>>    2. Node 1 is offlined by hot-removing memory.
>>
>>    3. When a page fault occurs, task_numa_placement() calculates
>>       fault_types[] from p->numa_faults_buffer_memory[] of all online
>>       nodes. But node 1 was offlined in step 2, so fault_types[] is
>>       calculated from p->numa_faults_buffer_memory[] of node 0 only,
>>       and both elements of fault_types[] are set to 0.
>>
>>    4. The values (0) of fault_types[] are passed to
>>       update_task_scan_period().
>>
>>    5. Since numa_faults_locality[1] is set to 1, the following division
>>       is performed:
>>
>>          static void update_task_scan_period(struct task_struct *p,
>>                                  unsigned long shared, unsigned long private)
>>          {
>>          ...
>>                  ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, (private + shared));
>>          }
>>
>>    6. But both private and shared are 0, so the divide error occurs
>>       here.
>>
>> The divide error is rare because it is triggered only by node offlining.
>> This patch always increments the denominator to avoid the divide error.
>>
>> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>
> 
> Reviewed-by: Wanpeng Li <wanpeng...@linux.intel.com>

Thank you for your review.

Thanks,
Yasuaki Ishimatsu

> 
>> ---
>> v2:
>>   - Simply increment the denominator
>>
>>   kernel/sched/fair.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 0b069bf..f3b492d 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -1520,7 +1520,7 @@ static void update_task_scan_period(struct task_struct *p,
>>               * scanning faster if shared accesses dominate as it may
>>               * simply bounce migrations uselessly
>>               */
>> -            ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, (private + shared));
>> +            ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, (private + shared + 1));
>>              diff = (diff * ratio) / NUMA_PERIOD_SLOTS;
>>      }
>>
> 
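
For anyone following along, the arithmetic in steps 5 and 6 and the effect
of the v2 change can be sketched in userspace C. This is only an
illustration, not the kernel code: the function names scan_ratio_original()
and scan_ratio_v2() are hypothetical, DIV_ROUND_UP is written out the way
the usual kernel macro is, and the value 10 for NUMA_PERIOD_SLOTS is an
assumption made for the example.

```c
/* Round-up integer division, written like the usual kernel macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Assumed value, for illustration only. */
#define NUMA_PERIOD_SLOTS 10

/*
 * Hypothetical helper mirroring the expression before the patch.
 * Calling it with private == 0 and shared == 0 divides by zero,
 * which is the case described in step 6.
 */
unsigned long scan_ratio_original(unsigned long private, unsigned long shared)
{
	return DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, (private + shared));
}

/*
 * Hypothetical helper mirroring the expression after the patch.
 * The denominator is at least 1, so the division is always defined.
 */
unsigned long scan_ratio_v2(unsigned long private, unsigned long shared)
{
	return DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, (private + shared + 1));
}
```

With both counters at 0, scan_ratio_v2() returns 0 instead of trapping. The
extra 1 also lowers the result when the counters are small: with private = 1
and shared = 0 the original expression gives 10 and the patched one gives 5.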

