We’ve been testing DNE Phase II and tried scaling the number of MDSes (one MDT 
each in all of our tests) very high, but when we did, we couldn’t mount the 
filesystem on a client. After some trial and error we narrowed it down: with 
56 MDSes the client mount fails, while 55 MDSes mount without issue, and it 
appears any number below that mounts as well. The failure at 56 MDSes was 
reproducible across different nodes serving as the MDSes, all of which had 
been tested in working configurations, so it doesn’t seem to be a bad server.
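
For reference, each MDT is brought up essentially like this (a simplified 
sketch: in reality every MDT lives on its own MDS node, and the paths, sizes, 
and NIDs here are placeholders):

    FSNAME=lustre
    MGSNID=x.x.x.x@o2ib
    for i in $(seq 0 55); do                  # 56 MDTs -> indices 0..55
        truncate -s 2G /dev/shm/mdt$i.img     # file-in-memory backing device
        mkfs.lustre --fsname=$FSNAME --mdt --index=$i \
            --mgsnode=$MGSNID --backfstype=ldiskfs /dev/shm/mdt$i.img
        mkdir -p /mnt/mdt$i
        mount -t lustre -o loop /dev/shm/mdt$i.img /mnt/mdt$i
    done

and the client mount that fails once the 56th MDT exists:

    mount -t lustre $MGSNID:/$FSNAME /mnt/lustre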

Here’s the error info we saw in dmesg on the client:

LustreError: 28880:0:(obd_config.c:559:class_setup()) setup 
lustre-MDT0037-mdc-ffff95923d31b000 failed (-16)
LustreError: 28880:0:(obd_config.c:1836:class_config_llog_handler()) 
MGCx.x.x.x@o2ib: cfg command failed: rc = -16
Lustre:    cmd=cf003 0:lustre-MDT0037-mdc  1:lustre-MDT0037_UUID  2:x.x.x.x@o2ib
LustreError: 15c-8: MGCx.x.x.x@o2ib: The configuration from log 'lustre-client' 
failed (-16). This may be the result of communication errors between this node 
and the MGS, a bad configuration, or other errors. See the syslog for more 
information.
LustreError: 28858:0:(obd_config.c:610:class_cleanup()) Device 58 not setup
Lustre: Unmounted lustre-client
LustreError: 28858:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount  
(-16)
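
For what it’s worth, errno 16 is EBUSY, and the device that fails is the 56th 
one: the index in the target name is hex, so MDT0037 is index 55, i.e. the 
56th MDT. When it happens we poke at the configuration with roughly the 
following (standard lctl commands, output trimmed):

    # On the MGS: list the config logs, then dump the client log
    # that the failing mount is replaying.
    lctl --device MGS llog_catlist
    lctl --device MGS llog_print lustre-client

    # On a client with a working (55-MDS) mount, count configured MDCs:
    lctl dl | grep -c mdc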

OS: CentOS 7.6.1810
Kernel: 3.10.0-957.5.1.el7.x86_64
Lustre: 2.12.1
Network card: Qlogic InfiniPath_QLE7340

Other things to note for completeness’ sake: this happened with both the 
ldiskfs and zfs backfstypes, and these tests used files in memory as the 
backing devices.
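
In case the backend matters, the zfs runs were set up roughly like this (pool 
and dataset names are placeholders):

    truncate -s 2G /dev/shm/mdt0.img
    zpool create mdt0pool /dev/shm/mdt0.img   # file vdev backed by tmpfs
    mkfs.lustre --fsname=lustre --mdt --index=0 \
        --mgsnode=x.x.x.x@o2ib --backfstype=zfs mdt0pool/mdt0
    mount -t lustre mdt0pool/mdt0 /mnt/mdt0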

Is there something I’m missing as to why 56 or more MDSes won’t mount?

Thanks,
Scott White
Scientist, HPC
Los Alamos National Laboratory
