Re: [lustre-discuss] reflecting state of underlying store in Lustre with HSM
Can you provide an example of what you're attempting to accomplish? Am I understanding correctly that you've got a Lustre file system and you're then writing data into it?

On Mon, Jan 6, 2020 at 10:02 PM Kristian Kvilekval wrote:
> We are using Lustre on AWS backed by S3 buckets.
> When creating a new Lustre filesystem, S3 metadata can be automatically
> imported into Lustre. When changes occur to the underlying S3 store,
> these changes are not automatically reflected.
>
> Is it possible to indicate the creation / deletion of the underlying S3
> files after filesystem creation using HSM?
> Is it possible to reimport the underlying metadata after creation?
>
> Any pointers appreciated.
>
> Thanks,
> Kris
>
> --
> Kris Kvilekval, Ph.D.
> ViQi Inc
> (805)-699-6081

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] LDLM locks not expiring/cancelling
Thanks Diego, long time no see! I haven't been using NRS TBF. I think there are a few problems, some of which we were aware of before, but the lack of lock cancels was causing chaos:

* Mark lustre_inode_cache as reclaimable: https://jira.whamcloud.com/browse/LU-12313
* Tested on a 2.12.3 client (without the patch above); we are actually getting lock cancels now.

So I think I'll join 2020 and run 2.12.3, and probably add the SUnreclaim patch to that as well, as it seems simple enough.

Thank you!

~Steve

On Mon, Jan 6, 2020 at 2:33 AM Moreno Diego (ID SIS) <diego.mor...@id.ethz.ch> wrote:
> Hi Steve,
>
> I was having a similar problem in the past months where the MDS servers
> would go OOM because of SlabUnreclaim. The root cause has not yet been
> found, but we stopped seeing this the day we disabled the NRS TBF (QoS) for
> any LDLM service (just in case you have it enabled). It would also be good
> to check what's being consumed in the slab cache. In our case it was mostly
> kernel objects and not LDLM.
>
> Diego
>
> *From: *lustre-discuss on behalf of Steve Crusan
> *Date: *Thursday, 2 January 2020 at 20:25
> *To: *"lustre-discuss@lists.lustre.org"
> *Subject: *[lustre-discuss] LDLM locks not expiring/cancelling
>
> Hi all,
>
> We are running into a bizarre situation where we aren't having stale locks
> cancel themselves, and even worse, it seems as if
> ldlm.namespaces.*.lru_size is being ignored.
>
> For instance, I unmount our Lustre file systems on a client machine, then
> remount. Next, I'll run "lctl set_param ldlm.namespaces.*.lru_max_age=60s;
> lctl set_param ldlm.namespaces.*.lru_size=1024". This (I believe) should
> allow only 1024 LDLM locks per OSC, and then I'd see a lot of lock cancels
> (via ldlm.namespaces.${ost}.pool.stats). We should also see cancels if the
> grant time > lru_max_age.
> We can trigger this simply by running 'find' on the root of our Lustre
> file system and waiting for a while. Eventually the client's SUnreclaim
> value bloats to 60-70GB (!!!), and each of our OSTs has 30-40k LRU locks
> (via lock_count). This is early in the process:
>
> """
> ldlm.namespaces.h5-OST003f-osc-8802d8559000.lock_count=2090
> ldlm.namespaces.h5-OST0040-osc-8802d8559000.lock_count=2127
> ldlm.namespaces.h5-OST0047-osc-8802d8559000.lock_count=52
> ldlm.namespaces.h5-OST0048-osc-8802d8559000.lock_count=1962
> ldlm.namespaces.h5-OST0049-osc-8802d8559000.lock_count=1247
> ldlm.namespaces.h5-OST004a-osc-8802d8559000.lock_count=1642
> ldlm.namespaces.h5-OST004b-osc-8802d8559000.lock_count=1340
> ldlm.namespaces.h5-OST004c-osc-8802d8559000.lock_count=1208
> ldlm.namespaces.h5-OST004d-osc-8802d8559000.lock_count=1422
> ldlm.namespaces.h5-OST004e-osc-8802d8559000.lock_count=1244
> ldlm.namespaces.h5-OST004f-osc-8802d8559000.lock_count=1117
> ldlm.namespaces.h5-OST0050-osc-8802d8559000.lock_count=1165
> """
>
> But this will grow over time, and eventually this compute node gets
> evicted from the MDS (after 10 minutes of cancelling locks/hanging). The
> only way we have been able to reduce the slab usage is to drop caches and
> set lru_size=clear... but the problem just comes back depending on the
> workload.
>
> We are running 2.10.3 client side, 2.10.1 server side. Have there been any
> fixes added to the codebase for 2.10 that we need to apply? This seems to
> be the closest to what we are experiencing:
>
> https://jira.whamcloud.com/browse/LU-11518
>
> PS: I've checked other systems across our cluster, and some of them have
> as many as 50k locks per OST. I am kind of wondering if these locks are
> staying around much longer than the lru_max_age default (65 minutes), but I
> cannot prove that. Is there a good way to translate held locks to fids?
> I have been messing around with lctl set_param debug="XXX" and lctl
> set_param ldlm.namespaces.*.dump_namespace, but I don't feel like I'm
> getting *all* of the locks.
>
> ~Steve
> --
> *Steve Crusan*
> Storage Specialist
> DownUnder GeoSolutions
> 16200 Park Row Drive, Suite 100
> Houston TX 77084, USA
> tel +1 832 582 3221
> ste...@dug.com
> www.dug.com
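For reference, the tuning and inspection commands discussed in this thread can be collected into a short client-side script. This is a sketch using the values from the thread (60s max age, 1024-lock LRU cap); the namespace wildcards match whatever OSCs exist on your client, and the right values depend on your workload:

```shell
#!/bin/sh
# Age out unused LDLM locks faster and cap the per-namespace LRU.
lctl set_param ldlm.namespaces.*.lru_max_age=60s
lctl set_param ldlm.namespaces.*.lru_size=1024

# Watch per-OSC lock counts to see whether cancels are actually happening.
lctl get_param ldlm.namespaces.*.lock_count

# Cancellation activity shows up in the per-namespace pool stats.
lctl get_param ldlm.namespaces.*.pool.stats

# The workaround mentioned above: immediately cancel all unused locks.
lctl set_param ldlm.namespaces.*.lru_size=clear
```

Note that `lru_size=clear` only flushes the LRU once; if the slab growth comes back under load, the underlying cause (as discussed above) is likely elsewhere.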
[lustre-discuss] project quotas
I need to set up project quotas, but am having a difficult time finding documentation/examples of how to do this. Could someone point me to a good source of information? Thanks.

Heath
Re: [lustre-discuss] project quotas
At the risk of stating the obvious – have you tried following section 25 of the Lustre manual, http://doc.lustre.org/lustre_manual.pdf ? It would also help anyone looking to give guidance to know which version of Lustre you are running and whether you are using ZFS or ldiskfs.

From: lustre-discuss on behalf of "Peeples, Heath"
Date: Tuesday, January 7, 2020 at 12:08 PM
To: "lustre-discuss@lists.lustre.org"
Subject: [lustre-discuss] project quotas

I am needing to set up project quotas, but am having a difficult time finding documentation/examples of how to do this. Could someone point me to a good source of information for this? Thanks.

Heath
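To make the manual's steps concrete, here is a minimal sketch of the usual project-quota sequence. The filesystem name `lustre`, the mount point `/mnt/lustre`, the directory, the project ID, and the limits are all illustrative placeholders; section 25 of the manual is the authoritative reference, and enabling quotas on ldiskfs targets may additionally require the backing devices to be formatted (or retrofitted with tune2fs) with the project feature:

```shell
#!/bin/sh
# Enable enforcement of user, group, and project ("ugp") quotas.
# Run on the MGS; "lustre" is the filesystem name.
lctl conf_param lustre.quota.mdt=ugp
lctl conf_param lustre.quota.ost=ugp

# Tag a directory tree with a project ID from a client;
# -s sets the inherit flag so new files get the same project ID.
lfs project -p 1000 -s /mnt/lustre/projects/alpha

# Set soft/hard block and inode limits for that project.
lfs setquota -p 1000 -b 10G -B 11G -i 100000 -I 110000 /mnt/lustre

# Report usage for the project against its limits.
lfs quota -p 1000 /mnt/lustre
```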
Re: [lustre-discuss] reflecting state of underlying store in Lustre with HSM
We have Lustre <- HSM -> S3.
We have direct modifications to S3 that occur after the Lustre filesystem is created.
I was wondering if there is any way to register a new/deleted file at the Lustre level using HSM or other commands.

Say a user uploads a file to S3, and I know the mapped path in Lustre. I would like to do:

lfs hsm_register /path/to/file/in/S3/   # Create a metadata entry in Lustre
lfs hsm_restore /path/to/file/in/S3     # Fetch file from S3 into Lustre

Thx

On Tue, Jan 7, 2020 at 8:04 AM Colin Faber wrote:
> Can you provide an example of what you're attempting to accomplish? Am I
> understanding correctly that you've got a Lustre file system and you're
> then writing data into it?
>
> On Mon, Jan 6, 2020 at 10:02 PM Kristian Kvilekval wrote:
>
>> We are using Lustre on AWS backed by S3 buckets.
>> When creating a new Lustre filesystem, S3 metadata can be automatically
>> imported into Lustre. When changes occur to the underlying S3 store,
>> these changes are not automatically reflected.
>>
>> Is it possible to indicate the creation / deletion of the underlying S3
>> files after filesystem creation using HSM?
>> Is it possible to reimport the underlying metadata after creation?
>>
>> Any pointers appreciated.
>>
>> Thanks,
>> Kris

--
Kris Kvilekval, Ph.D.
ViQi Inc
(805)-699-6081
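A note on the sketch above: `lfs hsm_register` is not a command in stock Lustre, so the first step has no direct equivalent there; creating a metadata entry for an object added to the archive after filesystem creation is an import operation on the copytool side, not an `lfs` one. For a file Lustre already has a metadata entry for, the standard HSM commands do cover the restore/release cycle. A sketch with illustrative paths, assuming a copytool is running against the S3 archive:

```shell
#!/bin/sh
# Inspect the HSM state and flags of a file Lustre already knows about.
lfs hsm_state /mnt/lustre/path/to/file

# Queue a restore: the copytool fetches the data from the archive (S3)
# and the file becomes a normal, fully present Lustre file again.
lfs hsm_restore /mnt/lustre/path/to/file

# Later, release the local data again, leaving only the metadata entry
# (the data remains in the archive).
lfs hsm_release /mnt/lustre/path/to/file
```

For brand-new S3 objects with no Lustre entry at all, the POSIX copytool has an import mode that creates a released Lustre file pointing at existing archive data; whether your S3-backed copytool exposes something similar depends on that tool, so check its documentation.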
Re: [lustre-discuss] project quotas
The manual has good examples; see Chap. 25.2, "Enabling Disk Quotas", for instance.

On 1/7/20 9:07 PM, Peeples, Heath wrote:
> I am needing to set up project quotas, but am having a difficult time
> finding documentation/examples of how to do this. Could someone point
> me to a good source of information for this? Thanks.
>
> Heath

--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se
Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se