Re: [PATCH v3 2/2] mm: be more verbose about zonelist initialization

2019-02-13 Thread Michal Hocko
On Wed 13-02-19 08:14:50, Dave Hansen wrote:
> On 2/13/19 1:43 AM, Michal Hocko wrote:
> > 
> > We have seen several bugs where zonelists have not been initialized
> > properly and it is not really straightforward to track those bugs down.
> > One way to help a bit at least is to dump zonelists of each node when
> > they are (re)initialized.
> 
> Were you thinking of boot-time bugs and crashes, or just stuff going
> wonky after boot?

Mostly boot time. I haven't seen hotplug related bugs in this direction.
All the issues I have seen so far is that we forget a node altogether
and it ends up with no zonelists at all. But who knows maybe we have
some hidden bugs where zonelists is initialized only partially for some
reason and there is no real way to find out.

> We don't have the zonelists dumped in /proc anywhere, do we?  Would that
> help?

I would prefer to not export such an implementation detail into proc

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v3 2/2] mm: be more verbose about zonelist initialization

2019-02-13 Thread Dave Hansen
On 2/13/19 1:43 AM, Michal Hocko wrote:
> 
> We have seen several bugs where zonelists have not been initialized
> properly and it is not really straightforward to track those bugs down.
> One way to help a bit at least is to dump zonelists of each node when
> they are (re)initialized.

Were you thinking of boot-time bugs and crashes, or just stuff going
wonky after boot?

We don't have the zonelists dumped in /proc anywhere, do we?  Would that
help?


Re: [PATCH v3 2/2] mm: be more verbose about zonelist initialization

2019-02-13 Thread Michal Hocko
On Wed 13-02-19 14:11:31, Peter Zijlstra wrote:
> On Wed, Feb 13, 2019 at 12:50:14PM +0100, Michal Hocko wrote:
> > On Wed 13-02-19 11:32:31, Peter Zijlstra wrote:
> > > On Wed, Feb 13, 2019 at 10:43:15AM +0100, Michal Hocko wrote:
> > > > @@ -5259,6 +5261,11 @@ static void build_zonelists(pg_data_t *pgdat)
> > > >  
> > > > build_zonelists_in_node_order(pgdat, node_order, nr_nodes);
> > > > build_thisnode_zonelists(pgdat);
> > > > +
> > > > +   pr_info("node[%d] zonelist: ", pgdat->node_id);
> > > > +   for_each_zone_zonelist(zone, z, 
> > > > &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1)
> > > > +   pr_cont("%d:%s ", zone_to_nid(zone), zone->name);
> > > > +   pr_cont("\n");
> > > >  }
> > > 
> > > Have you ran this by the SGI and other stupid large machine vendors?
> > 
> > I do not have such a large machine handy. The biggest I have has
> > handfull (say dozen) of NUMA nodes.
> > 
> > > Traditionally they tend to want to remove such things instead of adding
> > > them.
> > 
> > I do not insist on this patch but I find it handy. If there is an
> > opposition I will not miss it much.
> 
> Well, I don't have machines like that either and don't mind the patch.
> Just raising the issue; I've had the big iron boys complain about
> similar things (typically printing something for every CPU, which gets
> out of hand much faster than zones, but still).

Maybe we can try to push this through and revert if somebody complains
about an excessive output.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v3 2/2] mm: be more verbose about zonelist initialization

2019-02-13 Thread Peter Zijlstra
On Wed, Feb 13, 2019 at 12:50:14PM +0100, Michal Hocko wrote:
> On Wed 13-02-19 11:32:31, Peter Zijlstra wrote:
> > On Wed, Feb 13, 2019 at 10:43:15AM +0100, Michal Hocko wrote:
> > > @@ -5259,6 +5261,11 @@ static void build_zonelists(pg_data_t *pgdat)
> > >  
> > >   build_zonelists_in_node_order(pgdat, node_order, nr_nodes);
> > >   build_thisnode_zonelists(pgdat);
> > > +
> > > + pr_info("node[%d] zonelist: ", pgdat->node_id);
> > > + for_each_zone_zonelist(zone, z, 
> > > &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1)
> > > + pr_cont("%d:%s ", zone_to_nid(zone), zone->name);
> > > + pr_cont("\n");
> > >  }
> > 
> > Have you ran this by the SGI and other stupid large machine vendors?
> 
> I do not have such a large machine handy. The biggest I have has
> handfull (say dozen) of NUMA nodes.
> 
> > Traditionally they tend to want to remove such things instead of adding
> > them.
> 
> I do not insist on this patch but I find it handy. If there is an
> opposition I will not miss it much.

Well, I don't have machines like that either and don't mind the patch.
Just raising the issue; I've had the big iron boys complain about
similar things (typically printing something for every CPU, which gets
out of hand much faster than zones, but still).


Re: [PATCH v3 2/2] mm: be more verbose about zonelist initialization

2019-02-13 Thread Michal Hocko
On Wed 13-02-19 11:32:31, Peter Zijlstra wrote:
> On Wed, Feb 13, 2019 at 10:43:15AM +0100, Michal Hocko wrote:
> > @@ -5259,6 +5261,11 @@ static void build_zonelists(pg_data_t *pgdat)
> >  
> > build_zonelists_in_node_order(pgdat, node_order, nr_nodes);
> > build_thisnode_zonelists(pgdat);
> > +
> > +   pr_info("node[%d] zonelist: ", pgdat->node_id);
> > +   for_each_zone_zonelist(zone, z, 
> > &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1)
> > +   pr_cont("%d:%s ", zone_to_nid(zone), zone->name);
> > +   pr_cont("\n");
> >  }
> 
> Have you ran this by the SGI and other stupid large machine vendors?

I do not have such a large machine handy. The biggest I have has
handfull (say dozen) of NUMA nodes.

> Traditionally they tend to want to remove such things instead of adding
> them.

I do not insist on this patch but I find it handy. If there is an
opposition I will not miss it much.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v3 2/2] mm: be more verbose about zonelist initialization

2019-02-13 Thread Peter Zijlstra
On Wed, Feb 13, 2019 at 10:43:15AM +0100, Michal Hocko wrote:
> @@ -5259,6 +5261,11 @@ static void build_zonelists(pg_data_t *pgdat)
>  
>   build_zonelists_in_node_order(pgdat, node_order, nr_nodes);
>   build_thisnode_zonelists(pgdat);
> +
> + pr_info("node[%d] zonelist: ", pgdat->node_id);
> + for_each_zone_zonelist(zone, z, 
> &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1)
> + pr_cont("%d:%s ", zone_to_nid(zone), zone->name);
> + pr_cont("\n");
>  }

Have you ran this by the SGI and other stupid large machine vendors?
Traditionally they tend to want to remove such things instead of adding
them.