Our problem seems to correlate with the unintentional creation of a tree of
>500M files. Some of the crashes we've had since then appeared
to be related to vm.zone_reclaim_mode=1. We also enabled quotas right after
the 500M-file incident, and were wondering whether inconsistent
quota records might cause this sort of crash.
Have you set vm.zone_reclaim_mode=0 yet? I had an issue with this on my
file system a while back when it was set to 1.
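In case it helps, here's a quick way to check and change the setting; the commands are standard, though the sysctl.d path may vary by distro:

```shell
# Show the current value; anything nonzero enables NUMA zone reclaim
cat /proc/sys/vm/zone_reclaim_mode

# Disable it at runtime (takes effect immediately, no reboot needed)
sysctl -w vm.zone_reclaim_mode=0

# Persist the setting across reboots
echo 'vm.zone_reclaim_mode = 0' > /etc/sysctl.d/99-zone-reclaim.conf
```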
All our existing Lustre MDSes run happily with vm.zone_reclaim_mode=0,
and making this one consistent appears to have resolved a problem
(in which one family of Lustre kernel threads would appear to spin,
with "perf top" showing nearly all time spent in spinlock_irq, IIRC).
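For the record, this is roughly how we'd spot that symptom again (the exact spin-lock symbol name is from memory, as noted above):

```shell
# Sample kernel hot spots system-wide; kernel time concentrated in
# spin-lock symbols (e.g. something like spinlock_irq) suggests the
# reclaim-related spin described above
perf top --sort symbol

# Or record for ~30s and inspect call graphs afterwards
perf record -a -g -- sleep 30
perf report --sort symbol
```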
Might your system have had a *lot* of memory? Ours tend to be
fairly modest (32-64 GB, dual-socket Intel).
Thanks,
Mark Hahn | SHARCnet Sysadmin | h...@sharcnet.ca | http://www.sharcnet.ca
| McMaster RHPCS | h...@mcmaster.ca | 905 525 9140 x24687
| Compute/Calcul Canada | http://www.computecanada.ca
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org