Re: [lustre-discuss] Do old clients ever go away?

2020-06-05 Thread K. Scott Rowe
I expect a reboot of the EVLA Lustre system will remove unused
IPs from that directory.  Many of those "gone" IPs are for
retired CBE nodes (cbe-node-{17..33}) and one (192.168.200.14)
is for I don't know what.

Using "lshowmount -l" might be more useful than looking in that
directory.
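
If you do want to look at the directory itself, here is a minimal
sketch for listing what is in there.  It assumes the entries are named
like NIDs ("address@network", e.g. 192.168.200.14@tcp) and skips
anything that does not fit that pattern.  The function names are made
up for illustration, and the naming pattern is my assumption, not
something I have checked against every Lustre version.

```python
import os
import re

# Path taken from the original question in this thread.
EXPORTS_DIR = "/proc/fs/lustre/mgs/MGS/exports"

# Assumption: export entries are named "<address>@<network>",
# for example 192.168.200.14@tcp or 192.168.1.30@o2ib.
NID_RE = re.compile(r"^(?P<addr>[^@\s]+)@(?P<net>[a-z0-9]+)$")


def nids_from_entries(entries):
    """Return sorted (address, network) tuples for NID-like names."""
    found = []
    for name in entries:
        match = NID_RE.match(name)
        if match:
            found.append((match.group("addr"), match.group("net")))
    return sorted(found)


def list_export_nids(path=EXPORTS_DIR):
    """List NID-like entries under the exports directory, if present."""
    if not os.path.isdir(path):
        return []  # not an MGS node, or a different /proc layout
    return nids_from_entries(os.listdir(path))


if __name__ == "__main__":
    for addr, net in list_export_nids():
        print(f"{addr}@{net}")
```

Run on the MGS node, this prints one NID per line, which can then be
compared against the list of hosts you believe are still alive.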

On Jun 05 08:37, William D. Colburn wrote:
}I was looking in /proc/fs/lustre/mgs/MGS/exports/, and I see ip
}addresses in there that don't go anywhere anymore.  I'm pretty sure they
}are gone so long that they predate the uptime of the mds.  Does a lost
}client linger forever, or am I just wrong about when the machines went
}offline in relation to the uptime of the MDS?
}
}--Schlake
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] ldlm_enqueue and ldlm_cli_enqueue errors

2016-09-27 Thread K. Scott Rowe

We migrated from an MGS/MDS and OSSes running lustre-1.8.5 to a
completely new MGS/MDS and OSSes running lustre-2.4.3 on Sep. 24,
2016. We use a mix of lustre-1.8.9 and lustre-2.4.3 clients, both of
which mount lustre with the following options
"defaults,noauto,user_xattr,flock".  Since the migration, we have seen
various ldlm_enqueue and ldlm_cli_enqueue errors like the following...

These are from our lustre-2.4.3 clients connected with InfiniBand

  Sep 26 06:37:48 nmpost047 kernel: LustreError: 11-0: aoclst03-MDT-mdc-88101f7c7000: Communicating with 192.168.1.30@o2ib, operation ldlm_enqueue failed with -116.
  Sep 26 06:37:48 nmpost047 kernel: LustreError: 38632:0:(mdc_locks.c:848:mdc_enqueue()) ldlm_cli_enqueue: -116

  Sep 26 08:46:58 nmpost060 kernel: LustreError: 11-0: aoclst03-MDT-mdc-8810622d5c00: Communicating with 192.168.1.30@o2ib, operation ldlm_enqueue failed with -95.
  Sep 26 08:46:58 nmpost060 kernel: LustreError: 124585:0:(mdc_locks.c:848:mdc_enqueue()) ldlm_cli_enqueue: -95

  Sep 26 09:42:01 nmpost036 kernel: LustreError: 21189:0:(mdc_locks.c:848:mdc_enqueue()) ldlm_cli_enqueue: -2

  Sep 26 20:37:08 nmpost017 kernel: LustreError: 11-0: aoclst03-MDT-mdc-880804845000: Communicating with 192.168.1.30@o2ib, operation ldlm_enqueue failed with -11.


These are from our lustre-1.8.9 clients connected with 1Gb and LNET
routers

  Sep 26 12:57:45 tofino kernel: LustreError: 11-0: an error occurred while communicating with 192.168.1.30@o2ib. The ldlm_enqueue operation failed with -11


I saw a reference to some of these messages in
https://jira.hpdd.intel.com/browse/LU-4705 but it was not clear how
serious the errors are.  Can anyone tell me if these are
errors we should worry about or are they more like warnings that
should be ignored?  And if they should be ignored, is there a way to
disable them?
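
For reference, the numbers at the end of these messages are negated
Linux errno values.  Below is a minimal sketch that pulls the code out
of a log line and looks up its symbolic name.  The table is hardcoded
with the Linux values for the four codes seen above so that the result
does not depend on the machine running the script.  The function name
and the regular expression are only illustrative, and decoding the
number says nothing by itself about how serious the condition is.

```python
import re

# Linux errno values for the codes that appear in the logs above.
LINUX_ERRNO = {
    2: ("ENOENT", "No such file or directory"),
    11: ("EAGAIN", "Resource temporarily unavailable"),
    95: ("EOPNOTSUPP", "Operation not supported"),
    116: ("ESTALE", "Stale file handle"),
}

# Matches "... failed with -116" and "ldlm_cli_enqueue: -116".
CODE_RE = re.compile(r"(?:failed with|ldlm_cli_enqueue:)\s+(-\d+)")


def decode(line):
    """Return (code, name, description) for a log line, or None."""
    match = CODE_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    name, text = LINUX_ERRNO.get(-code, ("UNKNOWN", "not in table"))
    return code, name, text


if __name__ == "__main__":
    sample = ("Sep 26 06:37:48 nmpost047 kernel: LustreError: "
              "38632:0:(mdc_locks.c:848:mdc_enqueue()) "
              "ldlm_cli_enqueue: -116")
    print(decode(sample))
```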



[Lustre-discuss] Proper shutdown sans clients

2013-10-22 Thread K. Scott Rowe

We use Lustre 1.8.7.  Our environment has many Lustre clients spread
out across several networks.  When an emergency happens, like a power
outage, where we need to quickly shut down the Lustre servers, we
frequently are unable to shut down the clients first.  I know that the
documentation recommends shutting down Lustre in this order:

  unmount clients
  unmount MDT
  unmount OSTs

So my question is, what would the recommended procedure be if one
cannot shut down all the clients first?  Would it just be

  unmount MDT
  unmount OSTs

Or is there something else that should be done because we cannot get
the clients shut down first?

--
K. Scott Rowe -- Linux Group Lead
Array Operations Center, National Radio Astronomy Observatory
kr...@nrao.edu -- http://www.aoc.nrao.edu/~krowe/
1.575.835.7000 -- 1003 Lopezville Socorro, NM 87801 
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Proper shutdown sans clients

2013-10-22 Thread K. Scott Rowe
The problem is many of our clients are workstations on people's desks
as well as servers and clusters.  I don't have a good list of all the
Lustre clients so I don't know all the clients to unmount.  I've
thought about making a script to parse lshowmount and unmount clients
that way but haven't had the time.  I was hoping that ignoring the
clients would be acceptable and just let them recover once Lustre is
back up.

So, I guess it is a question of risk.  If the consensus is that
shutting down the clients first is very important to prevent data loss
and/or corruption then I will get a script working to do that.  But,
if the risk is minimal given a recommended way to shut down Lustre
without dealing with the clients, I would be interested in that.
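
For what it is worth, the script I have in mind would look roughly
like the sketch below.  It assumes lshowmount is run so that it prints
one client per line (possibly with an @network suffix), that section
header lines end in a colon, that every client mounts Lustre at the
same place (/lustre here is a made-up mount point), and that the
clients are reachable over ssh.  None of that has been verified
against our systems, so it only builds the commands and prints them
instead of running anything.

```python
import shlex

MOUNT_POINT = "/lustre"  # hypothetical client mount point


def parse_clients(output):
    """Return unique client names from assumed one-per-line output."""
    clients = []
    for raw in output.splitlines():
        line = raw.strip()
        # Skip blank lines and assumed "target:" style header lines.
        if not line or line.endswith(":"):
            continue
        host = line.split("@", 1)[0]  # drop the network suffix, if any
        if host not in clients:
            clients.append(host)
    return clients


def umount_commands(clients, mount_point=MOUNT_POINT):
    """Build (but do not run) one ssh umount command per client."""
    return [["ssh", host, "umount", mount_point] for host in clients]


if __name__ == "__main__":
    # Made-up sample text standing in for real lshowmount output.
    sample = "192.168.1.17@o2ib\n192.168.1.18@o2ib\n192.168.1.17@o2ib\n"
    for cmd in umount_commands(parse_clients(sample)):
        print(shlex.join(cmd))
```

Keeping the parsing separate from the command building makes it easy
to review the client list before anything is actually unmounted.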


On Oct 22 15:06, Lee, Brett wrote:
}Scott,
}
}What is preventing the clients from being shut down?  Unable to unmount the
}file system?  If that is the case, please try umount -f instead of umount.
}This will result in an unclean unmount, but the client *will* be able to
}unmount Lustre.  If other, please advise.
}
}Regarding the order of shutdown, I've been advocating:
}
}Clients
}MDT
}OSTs
}
}As the MDS is a client of the OSTs (the MDS runs an OSC for each OST).
}
}However, using this method has seemingly triggered the recovery process on
}targets when bringing up the file system.
}
}I understand that in version 2.4, OSSs will need to communicate with the MDSs,
}seemingly creating a two way dependency (possibly my confusion on the matter,
}but clarification here is requested).  However, as you are using 1.8.7, this
}latter point should not cause you any problems.
}
}--
}Brett Lee
}Sr. Systems Engineer
}Intel High Performance Data Division
}
}
} -Original Message-
} From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-
} boun...@lists.lustre.org] On Behalf Of K. Scott Rowe
} Sent: Tuesday, October 22, 2013 8:51 AM
} To: lustre-discuss@lists.lustre.org
} Subject: [Lustre-discuss] Proper shutdown sans clients
} 
} 
} We use Lustre 1.8.7.  Our environment has many Lustre clients spread out
} across several networks.  When an emergency happens, like a power outage,
} where we need to quickly shut down the Lustre servers, we frequently are
} unable to shut down the clients first.  I know that the documentation
} recommends shutting down Lustre in this order:
} 
}   unmount clients
}   unmount MDT
}   unmount OSTs
} 
} So my question is, what would the recommended procedure be if one cannot
} shut down all the clients first?  Would it just be
} 
}   unmount MDT
}   unmount OSTs
} 
} Or is there something else that should be done because we cannot get the
} clients shut down first?
} 
} --
} K. Scott Rowe -- Linux Group Lead
} Array Operations Center, National Radio Astronomy Observatory
} kr...@nrao.edu -- http://www.aoc.nrao.edu/~krowe/
} 1.575.835.7000 -- 1003 Lopezville Socorro, NM 87801
