Re: [lustre-discuss] Do old clients ever go away?
I expect a reboot of the EVLA Lustre system will remove unused IPs from that directory. Many of those "gone" IPs are for retired CBE nodes (cbe-node-{17..33}), and one (192.168.200.14) is for I don't know what. Using "lshowmount -l" might be more useful than looking in that directory.

On Jun 05 08:37, William D. Colburn wrote:
} I was looking in /proc/fs/lustre/mgs/MGS/exports/, and I see IP
} addresses in there that don't go anywhere anymore. I'm pretty sure they
} have been gone so long that they predate the uptime of the MDS. Does a lost
} client linger forever, or am I just wrong about when the machines went
} offline in relation to the uptime of the MDS?
}
} --Schlake

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
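As an aside, the comparison between the exports directory and lshowmount can be scripted. This is only a sketch under assumptions not stated in the thread: it assumes the entries under /proc/fs/lustre/mgs/MGS/exports/ are client NID names and that the connected list has been parsed out of `lshowmount -l`; both lists are hard-coded sample data here, since the real ones come from a live MGS.

```python
# Hypothetical sketch: flag export entries with no matching connected client.
# On a live MGS the first list would come from
# os.listdir("/proc/fs/lustre/mgs/MGS/exports/") and the second from
# parsing `lshowmount -l` output; here both are sample data.

def stale_exports(export_entries, connected_nids):
    """Return export entries that have no currently connected client."""
    return sorted(set(export_entries) - set(connected_nids))

# Sample data modeled loosely on the thread (retired nodes still listed):
exports = ["192.168.200.14@tcp", "192.168.200.17@tcp", "192.168.200.5@tcp"]
connected = ["192.168.200.5@tcp"]

print(stale_exports(exports, connected))
# -> ['192.168.200.14@tcp', '192.168.200.17@tcp']
```

Entries flagged this way would still need manual confirmation before any cleanup, since a client may simply be between reconnects.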
[lustre-discuss] ldlm_enqueue and ldlm_cli_enqueue errors
We migrated from an MGS/MDS and OSSes running lustre-1.8.5 to a completely new MGS/MDS and OSSes running lustre-2.4.3 on Sep. 24, 2016. We use a mix of lustre-1.8.9 and lustre-2.4.3 clients, both of which mount Lustre with the options "defaults,noauto,user_xattr,flock". Since the migration, we have seen various ldlm_enqueue and ldlm_cli_enqueue errors like the following.

These are from our lustre-2.4.3 clients connected with InfiniBand:

Sep 26 06:37:48 nmpost047 kernel: LustreError: 11-0: aoclst03-MDT-mdc-88101f7c7000: Communicating with 192.168.1.30@o2ib, operation ldlm_enqueue failed with -116.
Sep 26 06:37:48 nmpost047 kernel: LustreError: 38632:0:(mdc_locks.c:848:mdc_enqueue()) ldlm_cli_enqueue: -116
Sep 26 08:46:58 nmpost060 kernel: LustreError: 11-0: aoclst03-MDT-mdc-8810622d5c00: Communicating with 192.168.1.30@o2ib, operation ldlm_enqueue failed with -95.
Sep 26 08:46:58 nmpost060 kernel: LustreError: 124585:0:(mdc_locks.c:848:mdc_enqueue()) ldlm_cli_enqueue: -95
Sep 26 09:42:01 nmpost036 kernel: LustreError: 21189:0:(mdc_locks.c:848:mdc_enqueue()) ldlm_cli_enqueue: -2
Sep 26 20:37:08 nmpost017 kernel: LustreError: 11-0: aoclst03-MDT-mdc-880804845000: Communicating with 192.168.1.30@o2ib, operation ldlm_enqueue failed with -11.

These are from our lustre-1.8.9 clients connected with 1Gb Ethernet and LNET routers:

Sep 26 12:57:45 tofino kernel: LustreError: 11-0: an error occurred while communicating with 192.168.1.30@o2ib. The ldlm_enqueue operation failed with -11

I saw a reference to some of these messages in https://jira.hpdd.intel.com/browse/LU-4705, but it was not clear how serious the errors are. Can anyone tell me if these are errors we should worry about, or are they more like warnings that should be ignored? And if they should be ignored, is there a way to disable them?
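For what it's worth, the negative numbers in those console lines are Linux kernel errno values returned to the client: on Linux, 116 is ESTALE, 95 is EOPNOTSUPP, 2 is ENOENT, and 11 is EAGAIN. A quick way to decode them with Python's standard library (a convenience sketch only, not a diagnosis of the underlying Lustre issue):

```python
import errno
import os

def decode(ret):
    """Translate a negative kernel return code into an errno name and message."""
    e = -ret
    return errno.errorcode.get(e, "?"), os.strerror(e)

# Decode the return codes seen in the log excerpts above:
for ret in (-116, -95, -2, -11):
    name, msg = decode(ret)
    print(f"{ret}: {name} ({msg})")
```

Knowing the errno names at least makes it easier to search Jira and the list archives for each specific failure.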
[Lustre-discuss] Proper shutdown sans clients
We use Lustre 1.8.7. Our environment has many Lustre clients spread out across several networks. When an emergency happens, like a power outage, and we need to quickly shut down the Lustre servers, we frequently are unable to shut down the clients first. I know that the documentation recommends shutting down Lustre in this order:

unmount clients
unmount MDT
unmount OSTs

So my question is, what would the recommended procedure be if one cannot shut down all the clients first? Would it just be

unmount MDT
unmount OSTs

Or is there something else that should be done because we cannot get the clients shut down first?

--
K. Scott Rowe -- Linux Group Lead
Array Operations Center, National Radio Astronomy Observatory
kr...@nrao.edu -- http://www.aoc.nrao.edu/~krowe/
1.575.835.7000 -- 1003 Lopezville Socorro, NM 87801
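The documented ordering can be expressed as a dry-run script. This is only a sketch under assumptions not in the thread (a single hypothetical /lustre mount point on clients, ssh access to them, and made-up /mnt/mdt and /mnt/ost* server mount points); with run=False it just prints the plan instead of executing anything.

```python
import subprocess

def shutdown_plan(client_nodes, mdt_mount, ost_mounts, run=False):
    """Build (and optionally execute) the documented unmount sequence:
    clients first, then the MDT, then the OSTs."""
    steps = []
    for node in client_nodes:                       # 1. unmount clients
        steps.append(["ssh", node, "umount", "-f", "/lustre"])
    steps.append(["umount", mdt_mount])             # 2. unmount MDT
    for ost in ost_mounts:                          # 3. unmount OSTs
        steps.append(["umount", ost])
    for cmd in steps:
        if run:
            subprocess.run(cmd, check=False)
        else:
            print(" ".join(cmd))
    return steps

# Dry run with hypothetical hosts and mount points:
shutdown_plan(["nmpost047"], "/mnt/mdt", ["/mnt/ost0", "/mnt/ost1"])
```

The "umount -f" for clients follows the forced-unmount suggestion that comes up later in this thread; whether skipping the client step entirely is safe is exactly the open question here.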
Re: [Lustre-discuss] Proper shutdown sans clients
The problem is that many of our clients are workstations on people's desks as well as servers and clusters. I don't have a good list of all the Lustre clients, so I don't know all the clients to unmount. I've thought about making a script to parse lshowmount and unmount clients that way, but haven't had the time. I was hoping that ignoring the clients would be acceptable and we could just let them recover once Lustre is back up.

So, I guess it is a question of risk. If the consensus is that shutting down the clients first is very important to prevent data loss and/or corruption, then I will get a script working to do that. But if the risk is minimal, given a recommended way to shut down Lustre without dealing with the clients, I would be interested in that.

On Oct 22 15:06, Lee, Brett wrote:
} Scott,
}
} What is preventing the clients from being shut down? Unable to unmount the file system? If that is the case, please try "umount -f" instead of "umount". This will result in an unclean unmount, but the client *will* be able to unmount Lustre. If other, please advise.
}
} Regarding the order of shutdown, I've been advocating:
}
} Clients
} MDT
} OSTs
}
} as the MDS is a client of the OSTs (the MDS runs an OSC for each OST).
}
} However, using this method has seemingly triggered the recovery process on targets when bringing up the file system.
}
} I understand that in version 2.4, OSSs will need to communicate with the MDSs, seemingly creating a two-way dependency (possibly my confusion on the matter, but clarification here is requested). However, as you are using 1.8.7, this latter point should not cause you any problems.
}
} --
} Brett Lee
} Sr. Systems Engineer
} Intel High Performance Data Division
}
} -----Original Message-----
} From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of K. Scott Rowe
} Sent: Tuesday, October 22, 2013 8:51 AM
} To: lustre-discuss@lists.lustre.org
} Subject: [Lustre-discuss] Proper shutdown sans clients
}
} We use Lustre 1.8.7. Our environment has many Lustre clients spread out
} across several networks. When an emergency happens, like a power outage,
} where we need to quickly shut down the Lustre servers, we frequently are
} unable to shut down the clients first. I know that the documentation
} recommends shutting down Lustre in this order:
}
} unmount clients
} unmount MDT
} unmount OSTs
}
} So my question is, what would the recommended procedure be if one cannot
} shut down all the clients first? Would it just be
}
} unmount MDT
} unmount OSTs
}
} Or is there something else that should be done because we cannot get the
} clients shut down first?
}
} --
} K. Scott Rowe -- Linux Group Lead
} Array Operations Center, National Radio Astronomy Observatory
} kr...@nrao.edu -- http://www.aoc.nrao.edu/~krowe/
} 1.575.835.7000 -- 1003 Lopezville Socorro, NM 87801
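The "parse lshowmount" script mentioned in the reply above could start like this. This is a sketch against invented sample output, since the exact lshowmount layout varies by version; the regex simply pulls out anything shaped like an IP@network NID, which could then be fed to a remote-unmount loop.

```python
import re

# Matches client NIDs of the form IP@network, e.g. 192.168.1.30@o2ib
NID_RE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}@\w+")

def client_nids(lshowmount_output):
    """Return the unique client NIDs found in `lshowmount -l` output."""
    return sorted(set(NID_RE.findall(lshowmount_output)))

# Invented sample output (hypothetical fsname and NIDs, layout not exact):
sample = """\
MGS:
    192.168.1.30@o2ib
testfs-MDT0000:
    192.168.1.30@o2ib
    10.64.1.7@tcp
"""
print(client_nids(sample))
# -> ['10.64.1.7@tcp', '192.168.1.30@o2ib']
```

Mapping each NID back to a reachable hostname (for ssh and a forced unmount) would still be site-specific, but this at least turns lshowmount into a client list.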