On 2015/3/4 11:53, Gu Zheng wrote: > Hi Xishi, > > On 03/04/2015 10:22 AM, Xishi Qiu wrote: > >> On 2015/3/3 18:20, Gu Zheng wrote: >> >>> Hi Xishi, >>> On 03/03/2015 11:30 AM, Xishi Qiu wrote: >>> >>>> When hot-remove a numa node, we will clear pgdat, >>>> but is memset 0 safe in try_offline_node()? >>> >>> It is not safe here. In fact, this is a temporary solution here. >>> As you know, pgdat is accessed lock-less now, so protection >>> mechanism (RCU?) is needed to make it completely safe here, >>> but it seems a bit over-kill. >>> >>>> >>>> process A: offline node XX: >>>> for_each_populated_zone() >>>> find online node XX >>>> cond_resched() >>>> offline cpu and memory, then try_offline_node() >>>> node_set_offline(nid), and memset(pgdat, 0, >>>> sizeof(*pgdat)) >>>> access node XX's pgdat >>>> NULL pointer access error >>> >>> It's possible, but I did not meet this condition, did you? >>> >> >> Yes, we test hot-add/hot-remove node with stress, and meet the following >> call trace several times. > > Thanks. > >> >> next_online_pgdat() >> int nid = next_online_node(pgdat->node_id); // it's here, >> pgdat is NULL > > memset(pgdat, 0, sizeof(*pgdat)); > This memset just sets the context of pgdat to 0, but it will not free pgdat, > so the *pgdat is > NULL* is strange here. > But anyway, the bug is real, we must fix it.
next_zone() pg_data_t *pgdat = zone->zone_pgdat; // I think this pgdat is NULL, and NODE_DATA() is not NULL. ... pgdat = next_online_pgdat(pgdat); int nid = next_online_node(pgdat->node_id); // so here is the null pointer access Thanks for your new patch, I'll test it. Thanks, Xishi Qiu -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/