On 2015/3/4 11:53, Gu Zheng wrote:

> Hi Xishi,
> 
> On 03/04/2015 10:22 AM, Xishi Qiu wrote:
> 
>> On 2015/3/3 18:20, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>
>>>> When hot-remove a numa node, we will clear pgdat,
>>>> but is memset 0 safe in try_offline_node()?
>>>
>>> It is not safe here. In fact, this is a temporary solution here.
>>> As you know, pgdat is accessed lock-less now, so protection
>>> mechanism (RCU?) is needed to make it completely safe here,
>>> but it seems a bit over-kill.
>>>
>>>>
>>>> process A:                 offline node XX:
>>>> for_each_populated_zone()
>>>> find online node XX
>>>> cond_resched()
>>>>                            offline cpu and memory, then try_offline_node()
>>>>                            node_set_offline(nid), and memset(pgdat, 0, 
>>>> sizeof(*pgdat))
>>>> access node XX's pgdat
>>>> NULL pointer access error
>>>
>>> It's possible, but I did not meet this condition, did you?
>>>
>>
>> Yes, we test hot-add/hot-remove node with stress, and meet the following
>> call trace several times.
> 
> Thanks.
> 
>>
>>      next_online_pgdat()
>>              int nid = next_online_node(pgdat->node_id);  // it's here, 
>> pgdat is NULL
> 
>       memset(pgdat, 0, sizeof(*pgdat));
> This memset just sets the context of pgdat to 0, but it will not free pgdat, 
> so the *pgdat is
> NULL* is strange here.
> But anyway, the bug is real, we must fix it.

next_zone()
        pg_data_t *pgdat = zone->zone_pgdat;  // I think this pgdat is NULL, 
and NODE_DATA() is not NULL.
        ...
        pgdat = next_online_pgdat(pgdat);
                int nid = next_online_node(pgdat->node_id);  // so here is the 
null pointer access

Thanks for your new patch, I'll test it.

Thanks,
Xishi Qiu

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to