Re: [lustre-discuss] reflecting state of underlying store in Lustre with HSM

2020-01-07 Thread Colin Faber
Can you provide an example of what you're attempting to accomplish? Am I
understanding correctly that you've got a Lustre file system and you're then
writing data into it?

On Mon, Jan 6, 2020 at 10:02 PM Kristian Kvilekval  wrote:

> We are using Lustre on AWS backed by S3 buckets.
> When creating a new Lustre filesystem, S3 metadata can be automatically
> imported  into Lustre.  When changes occur to the underlying S3 store,
> these changes are not automatically reflected.
>
> Is it possible to indicate the creation / deletion of the underlying S3
> files after filesystem creation using HSM?
> Is it possible to reimport the underlying metadata after creation?
>
> Any pointers appreciated.
>
> Thanks,
> Kris
>
> --
> Kris Kvilekval, Ph.D.
> ViQi Inc
> (805)-699-6081
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] LDLM locks not expiring/cancelling

2020-01-07 Thread Steve Crusan
Thanks Diego, long time no see! I haven't been using NRS TBF.

I think there are a few problems, some of which we were aware of before, but
the lack of lock cancels was causing chaos.

* Mark lustre_inode_cache as reclaimable:
https://jira.whamcloud.com/browse/LU-12313
* Tested on a 2.12.3 client (without the patch above), and we are actually
getting lock cancels now (see the sketch below).
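
A rough sketch of the knobs and counters involved when checking this on a test
client (pieced together from the thread below; parameter names may differ
slightly between versions):

# age locks out quickly and cap the per-namespace LRU
lctl set_param ldlm.namespaces.*.lru_max_age=60s
lctl set_param ldlm.namespaces.*.lru_size=1024

# per-namespace lock counts should now drop back down after a 'find' workload
lctl get_param ldlm.namespaces.*.lock_count

# grant/cancel counters per namespace
lctl get_param ldlm.namespaces.*.pool.stats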

So I think I'll join 2020 and run 2.12.3, and probably add the SUnreclaim
patch on top as well, since it seems simple enough.

Thank you!

~Steve
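
P.S. A quick sketch of how one might check the two things Diego mentions below:
whether TBF is active on the LDLM services, and what is actually consuming the
slab. The ldlm.services parameter paths are an assumption on my part; verify
them first with "lctl list_param -R ldlm.services" on the servers.

# current NRS policy on the LDLM services (fifo is the default)
lctl get_param ldlm.services.ldlm_canceld.nrs_policies
lctl get_param ldlm.services.ldlm_cbd.nrs_policies

# if tbf shows up as active, this should switch it back to fifo (on the servers)
# lctl set_param ldlm.services.ldlm_canceld.nrs_policies=fifo

# see which slab caches the unreclaimable memory is sitting in
grep SUnreclaim /proc/meminfo
slabtop -o -s c | head -20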

On Mon, Jan 6, 2020 at 2:33 AM Moreno Diego (ID SIS) <
diego.mor...@id.ethz.ch> wrote:

> Hi Steve,
>
>
>
> I was having a similar problem in the past few months where the MDS servers
> would go OOM because of SlabUnreclaim. The root cause has not yet been
> found, but we stopped seeing this the day we disabled NRS TBF (QoS) for
> all LDLM services (just in case you have it enabled). It would also be good
> to check what’s being consumed in the slab cache; in our case it was mostly
> kernel objects, not ldlm.
>
>
>
> Diego
>
>
>
>
>
> *From: *lustre-discuss  on
> behalf of Steve Crusan 
> *Date: *Thursday, 2 January 2020 at 20:25
> *To: *"lustre-discuss@lists.lustre.org" 
> *Subject: *[lustre-discuss] LDLM locks not expiring/cancelling
>
>
>
> Hi all,
>
>
>
> We are running into a bizarre situation where we aren't having stale locks
> cancel themselves, and even worse, it seems as if
> ldlm.namespaces.*.lru_size is being ignored.
>
>
>
> For instance, I unmount our Lustre file systems on a client machine, then
> remount. Next, I'll run "lctl set_param ldlm.namespaces.*.lru_max_age=60s"
> and "lctl set_param ldlm.namespaces.*.lru_size=1024". This (I believe)
> theoretically would only allow 1024 ldlm locks per osc, and then I'd see a
> lot of lock cancels (via ldlm.namespaces.${ost}.pool.stats). We also should
> see cancels if the grant time > lru_max_age.
>
>
>
> We can trigger this simply by running 'find' on the root of our Lustre
> file system and waiting for a while. Eventually the client's SUnreclaim
> value bloats to 60-70GB (!!!), and each of our OSTs has 30-40k LRU locks
> (via lock_count). This is early in the process:
>
>
>
> """
>
> ldlm.namespaces.h5-OST003f-osc-8802d8559000.lock_count=2090
> ldlm.namespaces.h5-OST0040-osc-8802d8559000.lock_count=2127
> ldlm.namespaces.h5-OST0047-osc-8802d8559000.lock_count=52
> ldlm.namespaces.h5-OST0048-osc-8802d8559000.lock_count=1962
> ldlm.namespaces.h5-OST0049-osc-8802d8559000.lock_count=1247
> ldlm.namespaces.h5-OST004a-osc-8802d8559000.lock_count=1642
> ldlm.namespaces.h5-OST004b-osc-8802d8559000.lock_count=1340
> ldlm.namespaces.h5-OST004c-osc-8802d8559000.lock_count=1208
> ldlm.namespaces.h5-OST004d-osc-8802d8559000.lock_count=1422
> ldlm.namespaces.h5-OST004e-osc-8802d8559000.lock_count=1244
> ldlm.namespaces.h5-OST004f-osc-8802d8559000.lock_count=1117
> ldlm.namespaces.h5-OST0050-osc-8802d8559000.lock_count=1165
>
> """
>
>
>
> But this will grow over time, and eventually this compute node gets
> evicted from the MDS (after 10 minutes of cancelling locks/hanging). The
> only way we have been able to reduce the slab usage is to drop caches and
> set LRU=clear...but the problem just comes back depending on the workload.
>
>
>
> We are running 2.10.3 client side, 2.10.1 server side. Have there been any
> fixes added into the codebase for 2.10 that we need to apply? This seems to
> be the closest to what we are experiencing:
>
>
>
> https://jira.whamcloud.com/browse/LU-11518
>
>
>
>
>
> PS: I've checked other systems across our cluster, and some of them have
> as many as 50k locks per OST. I am kind of wondering if these locks are
> staying around much longer than the lru_max_age default (65 minutes), but I
> cannot prove that. Is there a good way to translate held locks to fids? I
> have been messing around with lctl set_param debug="XXX" and lctl set_param
> ldlm.namespaces.*.dump_namespace, but I don't feel like I'm getting *all*
> of the locks.
>
>
>
> ~Steve
>
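
On the question above about translating held locks back to files: a rough,
untested sketch of one way to do it. The namespace dump lands in the kernel
debug log, and the fid2path step is an assumption on my part that the dumped
resource names correspond to FIDs.

# make sure DLM trace messages are being collected
lctl set_param debug=+dlmtrace

# ask each namespace to dump its resources and locks into the debug log
lctl set_param ldlm.namespaces.*.dump_namespace=1

# pull the debug buffer to a file and grep for the resource ids
lctl dk /tmp/ldlm_dump.txt

# map a FID back to a path (the FID here is just an example)
lfs fid2path /mnt/lustre "[0x200000401:0x1:0x0]"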


-- 
Steve Crusan
Storage Specialist

DownUnder GeoSolutions
16200 Park Row Drive, Suite 100
Houston TX 77084, USA
tel +1 832 582 3221
ste...@dug.com
www.dug.com
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] project quotas

2020-01-07 Thread Peeples, Heath
I need to set up project quotas but am having a difficult time finding 
documentation/examples of how to do this.  Could someone point me to a good 
source of information?  Thanks.

Heath
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] project quotas

2020-01-07 Thread Peter Jones
At the risk of stating the obvious – have you tried following section 25 of the 
Lustre manual (http://doc.lustre.org/lustre_manual.pdf)? It would also help 
anyone looking to give guidance to know what version of Lustre you are running 
and whether you are using ZFS or ldiskfs.
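
For a concrete starting point, the basic flow from that section looks roughly
like the following. This is only a sketch: the exact steps depend on the Lustre
version and backend (ldiskfs targets need the project feature enabled on the
underlying ext4, ZFS needs a release with project quota support), and the
filesystem name, project ID and limits are placeholders.

# enable quota enforcement for users, groups and projects (run against the MGS)
lctl conf_param myfs.quota.mdt=ugp
lctl conf_param myfs.quota.ost=ugp

# tag a directory tree with a project ID, inherited by new files underneath it
lfs project -p 1000 -s -r /mnt/myfs/projects/alpha

# set block and inode limits for that project
lfs setquota -p 1000 -b 10g -B 11g -i 100000 -I 110000 /mnt/myfs

# report usage against the project quota
lfs quota -p 1000 /mnt/myfs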

From: lustre-discuss  on behalf of 
"Peeples, Heath" 
Date: Tuesday, January 7, 2020 at 12:08 PM
To: "lustre-discuss@lists.lustre.org" 
Subject: [lustre-discuss] project quotas

I need to set up project quotas but am having a difficult time finding 
documentation/examples of how to do this.  Could someone point me to a good 
source of information?  Thanks.

Heath
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] reflecting state of underlying store in Lustre with HSM

2020-01-07 Thread Kristian Kvilekval
We have Lustre <- HSM -> S3.

Direct modifications to S3 occur after the Lustre filesystem is created.
I was wondering if there is any way to register a new/deleted file at the
Lustre level using HSM or other commands.

Say a user uploads a file to S3 and I know the mapped path in Lustre; I
would like to do:

lfs hsm_register /path/to/file/in/S3  # Create a metadata entry in Lustre
lfs hsm_restore /path/to/file/in/S3   # Fetch the file from S3 into Lustre
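
Perhaps something close could be approximated with the existing HSM flag
commands? A rough, untested sketch (paths and the size are placeholders, and it
assumes the copytool can locate the matching S3 object at restore time, which I
suspect is the hard part):

# create a stub entry in Lustre for an object that already lives in S3
touch /mnt/lustre/path/to/file
truncate -s <size_of_s3_object> /mnt/lustre/path/to/file

# mark it as having a valid archived copy, then drop the (empty) local data
lfs hsm_set --exists --archived /mnt/lustre/path/to/file
lfs hsm_release /mnt/lustre/path/to/file

# later: pull the data back from S3 through the copytool, and check progress
lfs hsm_restore /mnt/lustre/path/to/file
lfs hsm_state /mnt/lustre/path/to/file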

Thx






On Tue, Jan 7, 2020 at 8:04 AM Colin Faber  wrote:

> Can you provide an example of what you're attempting to accomplish? Am I
> understanding correctly that you've got a Lustre file system and you're then
> writing data into it?
>
> On Mon, Jan 6, 2020 at 10:02 PM Kristian Kvilekval  wrote:
>
>> We are using Lustre on AWS backed by S3 buckets.
>> When creating a new Lustre filesystem, S3 metadata can be automatically
>> imported  into Lustre.  When changes occur to the underlying S3 store,
>> these changes are not automatically reflected.
>>
>> Is it possible to indicate the creation / deletion of the underlying S3
>> files after filesystem creation using HSM?
>> Is it possible to reimport the underlying metadata after creation?
>>
>> Any pointers appreciated.
>>
>> Thanks,
>> Kris
>>
>> --
>> Kris Kvilekval, Ph.D.
>> ViQi Inc
>> (805)-699-6081
>

-- 
Kris Kvilekval, Ph.D.
ViQi Inc
(805)-699-6081
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] project quotas

2020-01-07 Thread Åke Sandgren
The manual has good examples; see chapter 25.2, "Enabling Disk Quotas", for instance.

On 1/7/20 9:07 PM, Peeples, Heath wrote:
> I am needing to set up project quotas, but am having a difficult time
> finding documentation/examples of how to do this.  Could someone point
> me to a good source of information for this?  Thanks.
> 
>  
> 
> Heath
> 
> 

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org