[ceph-users] Re: force-create-pg not working

2022-09-20 Thread Jesper Lykkegaard Karlsen
Hi Josh, 

Thanks for your reply. 
But I have already tried that, with no luck. 
The primary OSD goes down and hangs forever upon the "mark_unfound_lost delete” 
command. 

I guess it is too damaged to salvage, unless one really starts deleting 
individual corrupt objects?

Anyway, as I said, the files in the PG are identified and under backup, so I just 
want it healthy again, no matter what ;-)

I actually discovered that removing the PG's shards with objectstore-tool 
indeed works for getting the PG back to active+clean (containing 0 objects, though). 

One just needs to run a final remove - start/stop OSD - repair - mark-complete 
on the primary OSD. 
A scrub tells me that the "active+clean” state is for real.
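
As a reference, a minimal sketch of that remove - start/stop - repair - mark-complete 
sequence (the OSD id, data path and shard id below are assumptions, not values from 
this thread, and the OSD must be stopped while ceph-objectstore-tool runs):

OSD_PATH=/var/lib/ceph/osd/ceph-123   # assumed primary OSD of the PG
PGID=20.13fs0                         # assumed shard id; EC shards carry an sN suffix

systemctl stop ceph-osd@123
ceph-objectstore-tool --data-path "$OSD_PATH" --pgid "$PGID" --op remove --force
systemctl start ceph-osd@123; sleep 30; systemctl stop ceph-osd@123
ceph-objectstore-tool --data-path "$OSD_PATH" --op repair   # "repair" might equally mean "ceph pg repair" here
ceph-objectstore-tool --data-path "$OSD_PATH" --pgid "$PGID" --op mark-complete
systemctl start ceph-osd@123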

I also found out that the more automated "force-create-pg" command only works on PGs 
that are in a down state. 

Best, 
Jesper  
 

----------
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

> On 20 Sep 2022, at 15.40, Josh Baergen  wrote:
> 
> Hi Jesper,
> 
> Given that the PG is marked recovery_unfound, I think you need to
> follow 
> https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-pg/#unfound-objects.
> 
> Josh
> 
> On Tue, Sep 20, 2022 at 12:56 AM Jesper Lykkegaard Karlsen
>  wrote:
>> 
>> Dear all,
>> 
>> System: latest Octopus, 8+3 erasure Cephfs
>> 
>> I have a PG that has been driving me crazy.
>> It had gotten to a bad state after heavy backfilling, combined with OSD 
>> going down in turn.
>> 
>> State is:
>> 
>> active+recovery_unfound+undersized+degraded+remapped
>> 
>> I have tried repairing it with ceph-objectstore-tool, but no luck so far.
>> Given the time recovery takes this way and since data are under backup, I 
>> thought that I would do the "easy" approach instead and:
>> 
>>  *   scan pg_files with cephfs-data-scan
>>  *   delete data belonging to that pool
>>  *   recreate PG with "ceph osd force-create-pg"
>>  *   restore data
>> 
>> Although, this has shown not to be so easy after all.
>> 
>> ceph osd force-create-pg 20.13f --yes-i-really-mean-it
>> 
>> seems to be accepted well enough with "pg 20.13f now creating, ok", but then 
>> nothing happens.
>> Issuing the command again just gives a "pg 20.13f already creating" response.
>> 
>> If I restart the primary OSD, then the pending force-create-pg disappears.
>> 
>> I read that this could be due to crush map issue, but I have checked and 
>> that does not seem to be the case.
>> 
>> Would it, for instance, be possible to do the force-create-pg manually with 
>> something like this?:
>> 
>>  *   set nobackfill and norecovery
>>  *   delete the pgs shards one by one
>>  *   unset nobackfill and norecovery
>> 
>> 
>> Any idea on how to proceed from here is most welcome.
>> 
>> Thanks,
>> Jesper
>> 
>> 
>> --
>> Jesper Lykkegaard Karlsen
>> Scientific Computing
>> Centre for Structural Biology
>> Department of Molecular Biology and Genetics
>> Aarhus University
>> Universitetsbyen 81
>> 8000 Aarhus C
>> 
>> E-mail: je...@mbg.au.dk
>> Tlf:+45 50906203
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] force-create-pg not working

2022-09-19 Thread Jesper Lykkegaard Karlsen
Dear all,

System: latest Octopus, 8+3 erasure Cephfs

I have a PG that has been driving me crazy.
It had gotten into a bad state after heavy backfilling, combined with OSDs going 
down in turn.

State is:

active+recovery_unfound+undersized+degraded+remapped

I have tried repairing it with ceph-objectstore-tool, but no luck so far.
Given the time recovery takes this way and since data are under backup, I 
thought that I would do the "easy" approach instead and:

  *   scan pg_files with cephfs-data-scan
  *   delete data belonging to that pool
  *   recreate PG with "ceph osd force-create-pg"
  *   restore data

However, this has proven not to be so easy after all.

ceph osd force-create-pg 20.13f --yes-i-really-mean-it

seems to be accepted well enough with "pg 20.13f now creating, ok", but then 
nothing happens.
Issuing the command again just gives a "pg 20.13f already creating" response.

If I restart the primary OSD, then the pending force-create-pg disappears.

I read that this could be due to a crush map issue, but I have checked, and that 
does not seem to be the case.
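
The kind of check meant here could, for example, look like this (the pool name is an 
assumption, as it is not given in the thread):

ceph pg map 20.13f                        # which OSDs CRUSH currently maps the PG to
ceph osd pool get cephfs_data crush_rule  # assumed pool name
ceph osd crush rule dump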

Would it, for instance, be possible to do the force-create-pg manually with 
something like this?:

  *   set nobackfill and norecovery
  *   delete the pgs shards one by one
  *   unset nobackfill and norecovery
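
A rough sketch of what that could look like (OSD ids, data paths and shard suffixes 
are assumptions; note the CLI flag is spelled norecover):

ceph osd set nobackfill
ceph osd set norecover
# on each OSD host holding a shard of 20.13f, with that OSD stopped:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-123 --pgid 20.13fs0 --op remove --force
# ...repeat for the remaining shards (20.13fs1, 20.13fs2, ...), then restart the OSDs
ceph osd unset nobackfill
ceph osd unset norecover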


Any idea on how to proceed from here is most welcome.

Thanks,
Jesper


------
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Remove corrupt PG

2022-09-01 Thread Jesper Lykkegaard Karlsen
Well, not the total solution after all.
There is still some metadata and header structure left that I still cannot 
delete with ceph-objectstore-tool --op remove. 
It just makes a core dump. 

I think I need to declare the OSD lost anyway to get through this. 
Unless somebody has a better suggestion?

Best, 
Jesper
--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

> On 1 Sep 2022, at 22.01, Jesper Lykkegaard Karlsen  wrote:
> 
> To answer my own question. 
> 
> The removal of the corrupt PG could be fixed by using the ceph-objectstore-tool 
> fuse mount-thingy. 
> Then, from the mount point, delete everything in the PG's head directory. 
> 
> This took only a few seconds (compared to 7.5 days), and after unmounting and 
> restarting the OSD it came back online. 
> 
> Best, 
> Jesper
> 
> ------
> Jesper Lykkegaard Karlsen
> Scientific Computing
> Centre for Structural Biology
> Department of Molecular Biology and Genetics
> Aarhus University
> Universitetsbyen 81
> 8000 Aarhus C
> 
> E-mail: je...@mbg.au.dk
> Tlf:+45 50906203
> 
>> On 31 Aug 2022, at 20.53, Jesper Lykkegaard Karlsen  wrote:
>> 
>> Hi all, 
>> 
>> I wanted to move a PG to an empty OSD, so I could do repairs on it without 
>> the whole OSD, which is full of other PG’s, would be effected with extensive 
>> downtime. 
>> 
>> Thus, I exported the PG with ceph-objectstore-tool, an after successful 
>> export I removed it. Unfortunately, the remove command was interrupted 
>> midway. 
>> This resulted in a PG that could not be remove with “ceph-objectstore-tool 
>> —op remove ….”, since the header is gone. 
>> Worse is that the OSD does not boot, due to it can see objects from the 
>> removed PG, but cannot access them. 
>> 
>> I have tried to remove the individual objects in that PG (also with 
>> objectstore-tool), but this process is extremely slow. 
>> When looping over the >65,000 object, each remove takes ~10 sec and is very 
>> compute intensive, which is approximately 7.5 days. 
>> 
>> Is the a faster way to get around this? 
>> 
>> Mvh. Jesper
>> 
>> --
>> Jesper Lykkegaard Karlsen
>> Scientific Computing
>> Centre for Structural Biology
>> Department of Molecular Biology and Genetics
>> Aarhus University
>> Universitetsbyen 81
>> 8000 Aarhus C
>> 
>> E-mail: je...@mbg.au.dk
>> Tlf:+45 50906203
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Remove corrupt PG

2022-09-01 Thread Jesper Lykkegaard Karlsen
To answer my own question. 

The removal of the corrupt PG could be fixed by using the ceph-objectstore-tool 
fuse mount-thingy. 
Then, from the mount point, delete everything in the PG's head directory. 

This took only a few seconds (compared to 7.5 days), and after unmounting and 
restarting the OSD it came back online. 
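
A minimal sketch of that fuse approach, with assumed OSD id, paths and PG shard name 
(ceph-objectstore-tool keeps running in the foreground, so the rm is done from a 
second shell; the exact directory name under the mount point may differ):

systemctl stop ceph-osd@123
mkdir -p /mnt/osd-123-fuse
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-123 --op fuse --mountpoint /mnt/osd-123-fuse
# from another shell: wipe the contents of the PG's head directory
rm -rf /mnt/osd-123-fuse/20.13fs0_head/*
umount /mnt/osd-123-fuse
systemctl start ceph-osd@123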

Best, 
Jesper

--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

> On 31 Aug 2022, at 20.53, Jesper Lykkegaard Karlsen  wrote:
> 
> Hi all, 
> 
> I wanted to move a PG to an empty OSD, so I could do repairs on it without 
> the whole OSD, which is full of other PGs, being affected by extensive 
> downtime. 
> 
> Thus, I exported the PG with ceph-objectstore-tool, and after a successful 
> export I removed it. Unfortunately, the remove command was interrupted 
> midway. 
> This resulted in a PG that could not be removed with “ceph-objectstore-tool 
> --op remove ….”, since the header is gone. 
> Worse is that the OSD does not boot, because it can see objects from the 
> removed PG but cannot access them. 
> 
> I have tried to remove the individual objects in that PG (also with 
> objectstore-tool), but this process is extremely slow. 
> When looping over the >65,000 objects, each remove takes ~10 sec and is very 
> compute intensive, which adds up to approximately 7.5 days. 
> 
> Is there a faster way to get around this? 
> 
> Mvh. Jesper
> 
> --
> Jesper Lykkegaard Karlsen
> Scientific Computing
> Centre for Structural Biology
> Department of Molecular Biology and Genetics
> Aarhus University
> Universitetsbyen 81
> 8000 Aarhus C
> 
> E-mail: je...@mbg.au.dk
> Tlf:+45 50906203
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Remove corrupt PG

2022-08-31 Thread Jesper Lykkegaard Karlsen
Hi all, 

I wanted to move a PG to an empty OSD, so I could do repairs on it without the 
whole OSD, which is full of other PGs, being affected by extensive 
downtime. 

Thus, I exported the PG with ceph-objectstore-tool, and after a successful export 
I removed it. Unfortunately, the remove command was interrupted midway. 
This resulted in a PG that could not be removed with “ceph-objectstore-tool --op 
remove ….”, since the header is gone. 
Worse is that the OSD does not boot, because it can see objects from the removed 
PG but cannot access them. 

I have tried to remove the individual objects in that PG (also with 
objectstore-tool), but this process is extremely slow. 
When looping over the >65,000 objects, each remove takes ~10 sec and is very 
compute intensive, which adds up to approximately 7.5 days. 

Is there a faster way to get around this? 
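
For reference, the per-object loop described above could look roughly like this 
(OSD path and PG shard id are assumptions; the OSD must be stopped):

OSD_PATH=/var/lib/ceph/osd/ceph-123
PGID=20.13fs0
# list the objects in the PG (one JSON object spec per line), then remove them one by one
ceph-objectstore-tool --data-path "$OSD_PATH" --pgid "$PGID" --op list > objects.txt
while read -r obj; do
    ceph-objectstore-tool --data-path "$OSD_PATH" "$obj" remove
done < objects.txt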

Mvh. Jesper

--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Potential bug in cephfs-data-scan?

2022-08-19 Thread Jesper Lykkegaard Karlsen
Actually, it might have worked better if the PG had stayed down while running 
cephfs-data-scan, as it would then only get the file structure from the metadata pool 
and not touch each file/link in the data pool?
That would at least probably have given the list of files in (only) the 
affected PG?

//Jesper


Fra: Jesper Lykkegaard Karlsen 
Sendt: 19. august 2022 22:49
Til: Patrick Donnelly 
Cc: ceph-users@ceph.io 
Emne: [ceph-users] Re: Potential bug in cephfs-data-scan?



Fra: Patrick Donnelly 
Sendt: 19. august 2022 16:16
Til: Jesper Lykkegaard Karlsen 
Cc: ceph-users@ceph.io 
Emne: Re: [ceph-users] Potential bug in cephfs-data-scan?

On Fri, Aug 19, 2022 at 5:02 AM Jesper Lykkegaard Karlsen
 wrote:
>>
> >Hi,
>>
>> I have recently been scanning the files in a PG with "cephfs-data-scan 
>> pg_files ...".

>Why?

I had an incident where a PG went down+incomplete after some OSDs crashed + 
heavy load + ongoing snap trimming.
Got it back up again with the objectstore tool by marking it complete.
Then I wanted to list the possibly affected files in the unfortunate PG with 
cephfs-data-scan, so I could recover potential losses from backup.


>> Although, after a long time the scan was still running and the list of files 
>> consumed 44 GB, I stopped it, as something obviously was very wrong.
>>
>> It turns out some users had symlinks that looped and even a user had a 
>> symlink to "/".

>Symlinks are not stored in the data pool. This should be irrelevant.

Okay, it may be a case of me "holding it wrong", but I do see "cephfs-data-scan 
pg_files" trying to follow any global or local symlink in the file structure, 
which leads to many more files being registered than could possibly be in that PG, 
and even endless loops in some cases.

If the symlinks are not stored in the data pool, how can cephfs-data-scan then 
follow the link?
And how do I get "cephfs-data-scan" to just show the symlinks as links and not 
follow them up or down in the directory structure?

Best,
Jesper


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Potential bug in cephfs-data-scan?

2022-08-19 Thread Jesper Lykkegaard Karlsen



Fra: Patrick Donnelly 
Sendt: 19. august 2022 16:16
Til: Jesper Lykkegaard Karlsen 
Cc: ceph-users@ceph.io 
Emne: Re: [ceph-users] Potential bug in cephfs-data-scan?

On Fri, Aug 19, 2022 at 5:02 AM Jesper Lykkegaard Karlsen
 wrote:
>>
> >Hi,
>>
>> I have recently been scanning the files in a PG with "cephfs-data-scan 
>> pg_files ...".

>Why?

I had an incident where a PG went down+incomplete after some OSDs crashed + 
heavy load + ongoing snap trimming.
Got it back up again with the objectstore tool by marking it complete.
Then I wanted to list the possibly affected files in the unfortunate PG with 
cephfs-data-scan, so I could recover potential losses from backup.


>> Although, after a long time the scan was still running and the list of files 
>> consumed 44 GB, I stopped it, as something obviously was very wrong.
>>
>> It turns out some users had symlinks that looped and even a user had a 
>> symlink to "/".

>Symlinks are not stored in the data pool. This should be irrelevant.

Okay, it may be a case of me "holding it wrong", but I do see "cephfs-data-scan 
pg_files" trying to follow any global or local symlink in the file structure, 
which leads to many more files being registered than could possibly be in that PG, 
and even endless loops in some cases.

If the symlinks are not stored in the data pool, how can cephfs-data-scan then 
follow the link?
And how do I get "cephfs-data-scan" to just show the symlinks as links and not 
follow them up or down in the directory structure?

Best,
Jesper


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Potential bug in cephfs-data-scan?

2022-08-19 Thread Jesper Lykkegaard Karlsen
Hi,

I have recently been scanning the files in a PG with "cephfs-data-scan pg_files 
...".

However, after a long time the scan was still running and the list of files 
had consumed 44 GB, so I stopped it, as something was obviously very wrong.

It turns out some users had symlinks that looped, and one user even had a symlink 
to "/".

It does not make sense that cephfs-data-scan follows symlinks, as this will 
give a wrong picture of which files are in the target PG.
I have looked through Ceph's bug reports, but I do not see anyone mentioning this.

Although I am still on the recently deprecated Octopus, I suspect that this bug 
is also present in Pacific and Quincy?

It might be related to this bug?

https://tracker.ceph.com/issues/46166

But symptoms are different.

Or, maybe there is a way to disable the following of symlinks in 
"cephfs-data-scan pg_files ..."?

Best,
Jesper

----------
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: replacing OSD nodes

2022-07-28 Thread Jesper Lykkegaard Karlsen
Cool thanks a lot! 
I will definitely put it in my toolbox. 

Best, 
Jesper

--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

> On 29 Jul 2022, at 00.35, Josh Baergen  wrote:
> 
>> I know the balancer will reach a well balanced PG landscape eventually, but 
>> I am not sure that it will prioritise backfill after “most available 
>> location” first.
> 
> Correct, I don't believe it prioritizes in this way.
> 
>> Have you tried the pgremapper youself Josh?
> 
> My team wrote and maintains pgremapper and we've used it extensively,
> but I'd always recommend trying it in test environments first. Its
> effect on the system isn't much different than what you're proposing
> (it simply manipulates the upmap exception table).
> 
> Josh

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: replacing OSD nodes

2022-07-28 Thread Jesper Lykkegaard Karlsen
Thank you for your suggestions, Josh, it is really appreciated. 

Pgremapper looks interesting and definitely something I will look into.
 
I know the balancer will reach a well-balanced PG landscape eventually, but I 
am not sure that it will prioritise backfill by “most available location” 
first. 
Then I might end up in the same situation, where some of the old (but not 
retired) OSDs start getting full. 

Then there is the “undo-upmaps” script left, or maybe even the script that I 
propose in combination with “cancel-backfill”, as it just moves what Ceph was 
planning to move anyway, just in a prioritised manner. 

Have you tried the pgremapper yourself, Josh? 
Is it safe to use? 
And do the Ceph developers vouch for this method?   

Status now is that ~1,600,000,000 objects have been moved, which is about half of all 
of the planned backfills. 
I have been reweighting OSDs down as they get too close to maximum usage, which 
works to some extent. 
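
The down-weighting is of the temporary kind, e.g. something like the following, where 
the OSD id and factor are just placeholders:

ceph osd reweight osd.123 0.90   # nudge data away from a nearfull OSD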

The monitors, on the other hand, are now complaining about using a lot of disk space, 
due to the long-running backfill. 
There is still plenty of disk space on the mons, but I feel that the backfill 
is getting slower and slower, although the same number of PGs are still 
backfilling. 

Can large disk usage on mons slow down backfill and other operations? 
Is it dangerous? 

Best, 
Jesper

--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

> On 28 Jul 2022, at 22.26, Josh Baergen  wrote:
> 
> I don't have many comments on your proposed approach, but just wanted
> to note that how I would have approached this, assuming that you have
> the same number of old hosts, would be to:
> 1. Swap-bucket the hosts.
> 2. Downweight the OSDs on the old hosts to 0.001. (Marking them out
> (i.e. weight 0) prevents maps from being applied.)
> 3. Add the old hosts back to the CRUSH map in their old racks or whatever.
> 4. Use https://github.com/digitalocean/pgremapper#cancel-backfill.
> 5. Then run https://github.com/digitalocean/pgremapper#undo-upmaps in
> a loop to drain the old OSDs.
> 
> This gives you the maximum concurrency and efficiency of movement, but
> doesn't necessarily solve your balance issue if it's the new OSDs that
> are getting full (that wasn't clear to me). It's still possible to
> apply steps 2, 4, and 5 if the new hosts are in place. If you're not
> in a rush could actually use the balancer instead of undo-upmaps in
> step 5 to perform the rest of the data migration and then you wouldn't
> have full OSDs.
> 
> Josh
> 
> On Fri, Jul 22, 2022 at 1:57 AM Jesper Lykkegaard Karlsen
>  wrote:
>> 
>> It seems like a low hanging fruit to fix?
>> There must be a reason why the developers have not made a prioritized order 
>> of backfilling PGs.
>> Or maybe the prioritization is something else than available space?
>> 
>> The question remains unanswered, as does whether my suggested approach/script 
>> would work or not?
>> 
>> Summer vacation?
>> 
>> Best,
>> Jesper
>> 
>> --
>> Jesper Lykkegaard Karlsen
>> Scientific Computing
>> Centre for Structural Biology
>> Department of Molecular Biology and Genetics
>> Aarhus University
>> Universitetsbyen 81
>> 8000 Aarhus C
>> 
>> E-mail: je...@mbg.au.dk
>> Tlf:    +45 50906203
>> 
>> 
>> Fra: Janne Johansson 
>> Sendt: 20. juli 2022 19:39
>> Til: Jesper Lykkegaard Karlsen 
>> Cc: ceph-users@ceph.io 
>> Emne: Re: [ceph-users] replacing OSD nodes
>> 
>> Den ons 20 juli 2022 kl 11:22 skrev Jesper Lykkegaard Karlsen 
>> :
>>> Thanks for your answer, Janne.
>>> Yes, I am also running "ceph osd reweight" on the "nearfull" osds, once 
>>> they get too close for comfort.
>>> 
>>> But I just thought a continuous prioritization of rebalancing PGs could 
>>> make this process smoother, with less/no need for manual operations.
>> 
>> You are absolutely right there, just wanted to chip in with my
>> experiences of "it nags at me but it will still work out" so other
>> people finding these mails later on can feel a bit relieved at knowing
>> that a few toofull warnings aren't a major disaster and that it
>> sometimes happens, because ceph looks for all possible moves, even
>> those who will run late in the rebalancing.
>> 
>> --
>> May the most significant bit of your life be positive.
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cannot set quota on ceph fs root

2022-07-28 Thread Jesper Lykkegaard Karlsen
Hi Frank, 

I guess there is always the possibility to set quotas at the pool level with 
"target_max_objects" and “target_max_bytes”.
The CephFS quotas set through attributes are only for sub-directories, as far as I 
recall. 
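
For example (pool name and values are made up; "ceph osd pool set-quota" is the 
generic pool-quota interface, while target_max_* are the properties mentioned above):

ceph osd pool set cephfs_data target_max_bytes 100000000000
ceph osd pool set cephfs_data target_max_objects 1000000
# alternatively, the dedicated pool quota command:
ceph osd pool set-quota cephfs_data max_bytes 100000000000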

Best, 
Jesper

------
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

> On 28 Jul 2022, at 17.22, Frank Schilder  wrote:
> 
> Hi Gregory,
> 
> thanks for your reply. It should be possible to set a quota on the root, 
> other vattribs can be set as well despite it being a mount point. There must 
> be something on the ceph side (or another bug in the kclient) preventing it.
> 
> By the way, I can't seem to find cephfs-tools like cephfs-shell. I'm using 
> the image quay.io/ceph/ceph:v15.2.16 and its not installed in the image. A 
> "yum provides cephfs-shell" returns no candidate and I can't find 
> installation instructions. Could you help me out here?
> 
> Thanks and best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> 
> 
> From: Gregory Farnum 
> Sent: 28 July 2022 16:59:50
> To: Frank Schilder
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] cannot set quota on ceph fs root
> 
> On Thu, Jul 28, 2022 at 1:01 AM Frank Schilder  wrote:
>> 
>> Hi all,
>> 
>> I'm trying to set a quota on the ceph fs file system root, but it fails with 
>> "setfattr: /mnt/adm/cephfs: Invalid argument". I can set quotas on any 
>> sub-directory. Is this intentional? The documentation 
>> (https://docs.ceph.com/en/octopus/cephfs/quota/#quotas) says
>> 
>>> CephFS allows quotas to be set on any directory in the system.
>> 
>> Any includes the fs root. Is the documentation incorrect or is this a bug?
> 
> I'm not immediately seeing why we can't set quota on the root, but the
> root inode is special in a lot of ways so this doesn't surprise me.
> I'd probably regard it as a docs bug.
> 
> That said, there's also a good chance that the setfattr is getting
> intercepted before Ceph ever sees it, since by setting it on the root
> you're necessarily interacting with a mount point in Linux and those
> can also be finicky...You could see if it works by using cephfs-shell.
> -Greg
> 
> 
>> 
>> Best regards,
>> =
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG does not become active

2022-07-28 Thread Jesper Lykkegaard Karlsen
Ah, I see, I should have looked at the “raw” data instead ;-)

Then I agree, this is very weird?

Best, 
Jesper

--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

> On 28 Jul 2022, at 12.45, Frank Schilder  wrote:
> 
> Hi Jesper,
> 
> thanks for looking at this. The failure domain is OSD and not host. I typed 
> it wrong in the text, the copy of the crush rule shows it right: step choose 
> indep 0 type osd.
> 
> I'm trying to reproduce the observation to file a tracker item, but it is 
> more difficult than expected. It might be a race condition, so far I didn't 
> see it again. I hope I can figure out when and why this is happening.
> 
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> 
> ____
> From: Jesper Lykkegaard Karlsen 
> Sent: 28 July 2022 12:02:51
> To: Frank Schilder
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] PG does not become active
> 
> Hi Frank,
> 
> I think you need at least 6 OSD hosts to make EC 4+2 with failure domain 
> host.
> 
> I do not know how it was possible for you to create that configuration in 
> the first place?
> Could it be that you have multiple names for the OSD hosts?
> That would at least explain the one OSD down being shown as two OSDs down.
> 
> Also, I believe that min_size should never be smaller than “coding” shards, 
> which is 4 in this case.
> 
> You can either make a new test setup with your three test OSD hosts using EC 
> 2+1 or make e.g. 4+2, but with failure domain set to OSD.
> 
> Best,
> Jesper
> 
> --
> Jesper Lykkegaard Karlsen
> Scientific Computing
> Centre for Structural Biology
> Department of Molecular Biology and Genetics
> Aarhus University
> Universitetsbyen 81
> 8000 Aarhus C
> 
> E-mail: je...@mbg.au.dk
> Tlf:+45 50906203
> 
>> On 27 Jul 2022, at 17.32, Frank Schilder  wrote:
>> 
>> Update: the inactive PG got recovered and active after a lnngg wait. The 
>> middle question is now answered. However, these two questions are still of 
>> great worry:
>> 
>> - How can 2 OSDs be missing if only 1 OSD is down?
>> - If the PG should recover, why is it not prioritised considering its severe 
>> degradation
>> compared with all other PGs?
>> 
>> I don't understand how a PG can loose 2 shards if 1 OSD goes down. That 
>> looks really really bad to me (did ceph loose track of data??).
>> 
>> The second is of no less importance. The inactive PG was holding back client 
>> IO, leading to further warnings about slow OPS/requests/... Why are such 
>> critically degraded PGs not scheduled for recovery first? There is a service 
>> outage but only a health warning?
>> 
>> Thanks and best regards.
>> =
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>> 
>> 
>> From: Frank Schilder 
>> Sent: 27 July 2022 17:19:05
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] PG does not become active
>> 
>> I'm testing octopus 15.2.16 and run into a problem right away. I'm filling 
>> up a small test cluster with 3 hosts 3x3 OSDs and killed one OSD to see how 
>> recovery works. I have one 4+2 EC pool with failure domain host and on 1 PGs 
>> of this pool 2 (!!!) shards are missing. This most degraded PG is not 
>> becoming active, its stuck inactive but peered.
>> 
>> Questions:
>> 
>> - How can 2 OSDs be missing if only 1 OSD is down?
>> - Wasn't there an important code change to allow recovery for an EC PG with 
>> at
>> least k shards present even if min_size>k? Do I have to set something?
>> - If the PG should recover, why is it not prioritised considering its severe 
>> degradation
>> compared with all other PGs?
>> 
>> I have already increased these crush tunables and executed a pg repeer to no 
>> avail:
>> 
>> tunable choose_total_tries 250 <-- default 100
>> rule fs-data {
>>   id 1
>>   type erasure
>>   min_size 3
>>   max_size 6
>>   step set_chooseleaf_tries 50 <-- default 5
>>   step set_choose_tries 200 <-- default 100
>>   step take default
>>   step choose indep 0 type osd
>>   step emit
>> }
>> 
>> Ceph health detail says to that:

[ceph-users] Re: PG does not become active

2022-07-28 Thread Jesper Lykkegaard Karlsen
Hi Frank, 

I think you need at least 6 OSD hosts to make EC 4+2 with failure domain host. 

I do not know how it was possible for you to create that configuration in 
the first place? 
Could it be that you have multiple names for the OSD hosts? 
That would at least explain the one OSD down being shown as two OSDs down. 

Also, I believe that min_size should never be smaller than “coding” shards, 
which is 4 in this case. 

You can either make a new test setup with your three test OSD hosts using EC 
2+1 or make e.g. 4+2, but with failure domain set to OSD. 
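
A quick sketch of the latter option, with made-up profile and pool names:

ceph osd erasure-code-profile set ec42-osd k=4 m=2 crush-failure-domain=osd
ceph osd pool create test-ec42 32 32 erasure ec42-osd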

Best, 
Jesper
  
--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

> On 27 Jul 2022, at 17.32, Frank Schilder  wrote:
> 
> Update: the inactive PG got recovered and active after a lnngg wait. The 
> middle question is now answered. However, these two questions are still of 
> great worry:
> 
> - How can 2 OSDs be missing if only 1 OSD is down?
> - If the PG should recover, why is it not prioritised considering its severe 
> degradation
>  compared with all other PGs?
> 
> I don't understand how a PG can loose 2 shards if 1 OSD goes down. That looks 
> really really bad to me (did ceph loose track of data??).
> 
> The second is of no less importance. The inactive PG was holding back client 
> IO, leading to further warnings about slow OPS/requests/... Why are such 
> critically degraded PGs not scheduled for recovery first? There is a service 
> outage but only a health warning?
> 
> Thanks and best regards.
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> 
> 
> From: Frank Schilder 
> Sent: 27 July 2022 17:19:05
> To: ceph-users@ceph.io
> Subject: [ceph-users] PG does not become active
> 
> I'm testing octopus 15.2.16 and run into a problem right away. I'm filling up 
> a small test cluster with 3 hosts 3x3 OSDs and killed one OSD to see how 
> recovery works. I have one 4+2 EC pool with failure domain host and on 1 PGs 
> of this pool 2 (!!!) shards are missing. This most degraded PG is not 
> becoming active, its stuck inactive but peered.
> 
> Questions:
> 
> - How can 2 OSDs be missing if only 1 OSD is down?
> - Wasn't there an important code change to allow recovery for an EC PG with at
>  least k shards present even if min_size>k? Do I have to set something?
> - If the PG should recover, why is it not prioritised considering its severe 
> degradation
>  compared with all other PGs?
> 
> I have already increased these crush tunables and executed a pg repeer to no 
> avail:
> 
> tunable choose_total_tries 250 <-- default 100
> rule fs-data {
>id 1
>type erasure
>min_size 3
>max_size 6
>step set_chooseleaf_tries 50 <-- default 5
>step set_choose_tries 200 <-- default 100
>step take default
>step choose indep 0 type osd
>step emit
> }
> 
> Ceph health detail says to that:
> 
> [WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive
>pg 4.32 is stuck inactive for 37m, current state 
> recovery_wait+undersized+degraded+remapped+peered, last acting 
> [1,2147483647,2147483647,4,5,2]
> 
> I don't want to cheat and set min_size=k on this pool. It should work by 
> itself.
> 
> Thanks for any pointers!
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: replacing OSD nodes

2022-07-22 Thread Jesper Lykkegaard Karlsen
It seems like a low-hanging fruit to fix?
There must be a reason why the developers have not made a prioritized order of 
backfilling PGs.
Or maybe the prioritization is based on something other than available space?

The question remains unanswered, as does whether my suggested approach/script would 
work or not?

Summer vacation?

Best,
Jesper

--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203


Fra: Janne Johansson 
Sendt: 20. juli 2022 19:39
Til: Jesper Lykkegaard Karlsen 
Cc: ceph-users@ceph.io 
Emne: Re: [ceph-users] replacing OSD nodes

Den ons 20 juli 2022 kl 11:22 skrev Jesper Lykkegaard Karlsen :
> Thanks for your answer, Janne.
> Yes, I am also running "ceph osd reweight" on the "nearfull" osds, once they 
> get too close for comfort.
>
> But I just thought a continuous prioritization of rebalancing PGs could make 
> this process smoother, with less/no need for manual operations.

You are absolutely right there, just wanted to chip in with my
experiences of "it nags at me but it will still work out" so other
people finding these mails later on can feel a bit relieved at knowing
that a few toofull warnings aren't a major disaster and that it
sometimes happens, because ceph looks for all possible moves, even
those who will run late in the rebalancing.

--
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: replacing OSD nodes

2022-07-20 Thread Jesper Lykkegaard Karlsen
Thanks for your answer, Janne.

Yes, I am also running "ceph osd reweight" on the "nearfull" osds, once they 
get too close for comfort.

But I just thought a continuous prioritization of rebalancing PGs could make 
this process smoother, with less/no need for manual operations.

Best,
Jesper

----------
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203


Fra: Janne Johansson 
Sendt: 20. juli 2022 10:47
Til: Jesper Lykkegaard Karlsen 
Cc: ceph-users@ceph.io 
Emne: Re: [ceph-users] replacing OSD nodes

Den tis 19 juli 2022 kl 13:09 skrev Jesper Lykkegaard Karlsen :
>
> Hi all,
> Setup: Octopus - erasure 8-3
> I had gotten to the point where I had some rather old OSD nodes, that I 
> wanted to replace with new ones.
> The procedure was planned like this:
>
>   *   add new replacement OSD nodes
>   *   set all OSDs on the retiring nodes to out.
>   *   wait for everything to rebalance
>   *   remove retiring nodes

> After around 50% misplaced objects remaining, the OSDs started to complain 
> about backfillfull OSDs and nearfull OSDs.
> A bit of a surprise to me, as RAW size is only 47% used.
> It seems that rebalancing does not happen in a prioritized manner, where 
> planed backfill starts with the OSD with most space available space, but 
> "alphabetically" according to pg-name.
> Is this really true?

I don't know if it does it in any particular order, just that it
certainly doesn't fire off requests to the least filled OSD to receive
data first, so when I have gotten into similar situations, it just
tried to run as many moves as possible given max_backfill and all
that, then some/most might get stuck in toofull, but as the rest of
the slots progress, space gets available and at some point those
toofull ones get handled. It delays the completion but hasn't caused
me any other specific problems.

Though I will admit I have used "ceph osd reweight osd.123
" at times to force emptying of some OSDs, but that was
more my impatience than anything else.


--
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] replacing OSD nodes

2022-07-19 Thread Jesper Lykkegaard Karlsen
5 377 46 322 24 306 53 200 240 338   #1.9TiB bytes available on most full OSD (306)
ceph osd pg-upmap-items 20.6c5 334 371 30 340 70 266 241 407 3 233 186 356 40 312 294 391   #1.9TiB bytes available on most full OSD (233)
ceph osd pg-upmap-items 20.6b4 344 338 226 389 319 362 309 411 85 379 248 233 121 318 0 254   #1.9TiB bytes available on most full OSD (233)
ceph osd pg-upmap-items 20.6b1 325 292 35 371 347 153 146 390 12 343 88 327 27 355 54 250 192 408   #1.9TiB bytes available on most full OSD (153)
ceph osd pg-upmap-items 20.57 82 389 282 356 103 165 62 284 67 408 252 366   #1.9TiB bytes available on most full OSD (165)
ceph osd pg-upmap-items 20.50 244 355 319 228 154 397 63 317 113 378 97 276 288 150   #1.9TiB bytes available on most full OSD (228)
ceph osd pg-upmap-items 20.47 343 351 107 283 81 332 76 398 160 410 26 378   #1.9TiB bytes available on most full OSD (283)
ceph osd pg-upmap-items 20.3e 56 322 31 283 330 377 107 360 199 309 190 385 78 406   #1.9TiB bytes available on most full OSD (283)
ceph osd pg-upmap-items 20.3b 91 349 312 414 268 386 45 244 125 371   #1.9TiB bytes available on most full OSD (244)
ceph osd pg-upmap-items 20.3a 277 371 290 359 91 415 165 392 107 167   #1.9TiB bytes available on most full OSD (167)
ceph osd pg-upmap-items 20.39 74 175 18 302 240 393 3 269 224 374 194 408 173 364   #1.9TiB bytes available on most full OSD (302)
...
...

If I were to put this into effect, I would first set norecover and nobackfill, 
then run the script, and unset norecover and nobackfill again.
But I am uncertain whether it would work, or even whether this is a good idea?
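
In other words, roughly the following, where the file name is just a placeholder for 
the generated "ceph osd pg-upmap-items ..." lines (note the CLI flag is spelled 
norecover):

ceph osd set nobackfill
ceph osd set norecover
bash prioritized-upmaps.sh
ceph osd unset nobackfill
ceph osd unset norecover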

It would be nice if Ceph did something similar automatically 🙂
Or maybe Ceph already does something similar, and I have just not been able to 
find it?

If Ceph were to do this, it would be nice if the prioritization of backfill_wait PGs 
were recomputed, perhaps every 24 hours, as the OSD availability landscape of course 
changes during backfill.

I imagine this, especially, could stabilize recovery/rebalance on systems where 
space is a little tight.

Best regards,
Jesper

--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs quota used

2021-12-17 Thread Jesper Lykkegaard Karlsen
Thanks Konstantin,

Actually, I went a bit further and made the script more universal in usage:

ceph_du_dir:

#!/bin/bash
# usage: ceph_du_dir $DIR1 ($DIR2 ...)
for i in "$@"; do
    if [[ -d $i && ! -L $i ]]; then
        # rbytes = recursive size of the directory; format it human-readable with a space before the unit
        echo "$(numfmt --to=iec --suffix=B --padding=7 "$(getfattr --only-values -n ceph.dir.rbytes "$i" 2>/dev/null)" | sed -r 's/([0-9])([a-zA-Z])/\1 \2/g; s/([a-zA-Z])([0-9])/\1 \2/g') $i"
    fi
done

The above can be run as:

ceph_du_dir $DIR

with multiple directories:

ceph_du_dir $DIR1 $DIR2 $DIR3 ..

Or even with wildcard:

ceph_du_dir $DIR/*

Best,
Jesper

----------
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Gustav Wieds Vej 10
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203


Fra: Konstantin Shalygin 
Sendt: 17. december 2021 09:17
Til: Jesper Lykkegaard Karlsen 
Cc: Robert Gallop ; ceph-users@ceph.io 

Emne: Re: [ceph-users] cephfs quota used

Or you can mount with the 'dirstat' option and use 'cat .' to determine CephFS 
stats:

alias fsdf="cat . | grep rbytes | awk '{print \$2}' | numfmt --to=iec 
--suffix=B"

[root@host catalog]# fsdf
245GB
[root@host catalog]#


Cheers,
k

On 17 Dec 2021, at 00:25, Jesper Lykkegaard Karlsen 
mailto:je...@mbg.au.dk>> wrote:

Anyway, I just made my own ceph-fs version of "du".

ceph_du_dir:

#!/bin/bash
# usage: ceph_du_dir $DIR
SIZE=$(getfattr -n ceph.dir.rbytes $1 2>/dev/null| grep "ceph\.dir\.rbytes" | 
awk -F\= '{print $2}' | sed s/\"//g)
numfmt --to=iec-i --suffix=B --padding=7 $SIZE

Prints out the ceph-fs dir size in "human-readable" form.
It works like a charm and, my god, it is fast!

Tools like that could be very useful, if provided by the development team 🙂

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs quota used

2021-12-16 Thread Jesper Lykkegaard Karlsen
Not to spam, but to make it output prettier, one can also separate the number 
from the byte-size prefix.

numfmt --to=iec --suffix=B --padding=7 $(getfattr --only-values -n ceph.dir.rbytes $1 2>/dev/null) | sed -r 's/([0-9])([a-zA-Z])/\1 \2/g; s/([a-zA-Z])([0-9])/\1 \2/g'

//Jesper
------
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Gustav Wieds Vej 10
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

____
Fra: Jesper Lykkegaard Karlsen 
Sendt: 16. december 2021 23:07
Til: Jean-Francois GUILLAUME 
Cc: Robert Gallop ; ceph-users@ceph.io 

Emne: [ceph-users] Re: cephfs quota used

Brilliant, thanks Jean-François

Best,
Jesper

------
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Gustav Wieds Vej 10
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203


Fra: Jean-Francois GUILLAUME 
Sendt: 16. december 2021 23:03
Til: Jesper Lykkegaard Karlsen 
Cc: Robert Gallop ; ceph-users@ceph.io 

Emne: Re: [ceph-users] Re: cephfs quota used

Hi,

You can avoid using awk by passing --only-values to getfattr.

This should look something like this :

> #!/bin/bash
> numfmt --to=iec-i --suffix=B --padding=7 $(getfattr --only-values -n
> ceph.dir.rbytes $1 2>/dev/null)

Best,
---
Cordialement,
Jean-François GUILLAUME
Plateforme Bioinformatique BiRD

Tél. : +33 (0)2 28 08 00 57
www.pf-bird.univ-nantes.fr

Inserm UMR 1087/CNRS UMR 6291
IRS-UN - 8 quai Moncousu - BP 70721
44007 Nantes Cedex 1

Le 2021-12-16 22:25, Jesper Lykkegaard Karlsen a écrit :
> To answer my own question.
> It seems Frank Schilder asked a similar question two years ago:
>
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/6ENI42ZMHTTP2OONBRD7FDP7LQBC4P2E/
>
> listxattr() was apparently removed and not much has happened since then,
> it seems.
>
> Anyway, I just made my own ceph-fs version of "du".
>
> ceph_du_dir:
>
> #!/bin/bash
> # usage: ceph_du_dir $DIR
> SIZE=$(getfattr -n ceph.dir.rbytes $1 2>/dev/null| grep
> "ceph\.dir\.rbytes" | awk -F\= '{print $2}' | sed s/\"//g)
> numfmt --to=iec-i --suffix=B --padding=7 $SIZE
>
> Prints out ceph-fs dir size in "human-readble"
> It works like a charm and my god it is fast!.
>
> Tools like that could be very useful, if provided by the development
> team 🙂
>
> Best,
> Jesper
>
> --
> Jesper Lykkegaard Karlsen
> Scientific Computing
> Centre for Structural Biology
> Department of Molecular Biology and Genetics
> Aarhus University
> Gustav Wieds Vej 10
> 8000 Aarhus C
>
> E-mail: je...@mbg.au.dk
> Tlf:+45 50906203
>
> 
> Fra: Jesper Lykkegaard Karlsen 
> Sendt: 16. december 2021 14:37
> Til: Robert Gallop 
> Cc: ceph-users@ceph.io 
> Emne: [ceph-users] Re: cephfs quota used
>
> Woops, wrong copy/pasta:
>
> getfattr -n ceph.dir.rbytes $DIR
>
> works on all distributions I have tested.
>
> It is:
>
> getfattr -d -m 'ceph.*' $DIR
>
> that does not work on Rocky Linux 8, Ubuntu 18.04, but works on CentOS
> 7.
>
> Best,
> Jesper
> ------
> Jesper Lykkegaard Karlsen
> Scientific Computing
> Centre for Structural Biology
> Department of Molecular Biology and Genetics
> Aarhus University
> Gustav Wieds Vej 10
> 8000 Aarhus C
>
> E-mail: je...@mbg.au.dk
> Tlf:+45 50906203
>
> 
> Fra: Jesper Lykkegaard Karlsen 
> Sendt: 16. december 2021 13:57
> Til: Robert Gallop 
> Cc: ceph-users@ceph.io 
> Emne: [ceph-users] Re: cephfs quota used
>
> Just tested:
>
> getfattr -n ceph.dir.rbytes $DIR
>
> Works on CentOS 7, but not on Ubuntu 18.04 eighter.
> Weird?
>
> Best,
> Jesper
> ----------
> Jesper Lykkegaard Karlsen
> Scientific Computing
> Centre for Structural Biology
> Department of Molecular Biology and Genetics
> Aarhus University
> Gustav Wieds Vej 10
> 8000 Aarhus C
>
> E-mail: je...@mbg.au.dk
> Tlf:+45 50906203
>
> 
> Fra: Robert Gallop 
> Sendt: 16. december 2021 13:42
> Til: Jesper Lykkegaard Karlsen 
> Cc: ceph-users@ceph.io 
> Emne: Re: [ceph-users] Re: cephfs quota used
>
> From what I understand you used to be able to do that but cannot on
> later kernels?
>
> Seems there would be a list somewhere,

[ceph-users] Re: cephfs quota used

2021-12-16 Thread Jesper Lykkegaard Karlsen
Brilliant, thanks Jean-François

Best,
Jesper

--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Gustav Wieds Vej 10
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203


Fra: Jean-Francois GUILLAUME 
Sendt: 16. december 2021 23:03
Til: Jesper Lykkegaard Karlsen 
Cc: Robert Gallop ; ceph-users@ceph.io 

Emne: Re: [ceph-users] Re: cephfs quota used

Hi,

You can avoid using awk by passing --only-values to getfattr.

This should look something like this :

> #!/bin/bash
> numfmt --to=iec-i --suffix=B --padding=7 $(getfattr --only-values -n
> ceph.dir.rbytes $1 2>/dev/null)

Best,
---
Cordialement,
Jean-François GUILLAUME
Plateforme Bioinformatique BiRD

Tél. : +33 (0)2 28 08 00 57
www.pf-bird.univ-nantes.fr

Inserm UMR 1087/CNRS UMR 6291
IRS-UN - 8 quai Moncousu - BP 70721
44007 Nantes Cedex 1

Le 2021-12-16 22:25, Jesper Lykkegaard Karlsen a écrit :
> To answer my own question.
> It seems Frank Schilder asked a similar question two years ago:
>
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/6ENI42ZMHTTP2OONBRD7FDP7LQBC4P2E/
>
> listxattr() was apparently removed and not much has happened since then,
> it seems.
>
> Anyway, I just made my own ceph-fs version of "du".
>
> ceph_du_dir:
>
> #!/bin/bash
> # usage: ceph_du_dir $DIR
> SIZE=$(getfattr -n ceph.dir.rbytes $1 2>/dev/null| grep
> "ceph\.dir\.rbytes" | awk -F\= '{print $2}' | sed s/\"//g)
> numfmt --to=iec-i --suffix=B --padding=7 $SIZE
>
> Prints out ceph-fs dir size in "human-readble"
> It works like a charm and my god it is fast!.
>
> Tools like that could be very useful, if provided by the development
> team 🙂
>
> Best,
> Jesper
>
> --
> Jesper Lykkegaard Karlsen
> Scientific Computing
> Centre for Structural Biology
> Department of Molecular Biology and Genetics
> Aarhus University
> Gustav Wieds Vej 10
> 8000 Aarhus C
>
> E-mail: je...@mbg.au.dk
> Tlf:+45 50906203
>
> 
> Fra: Jesper Lykkegaard Karlsen 
> Sendt: 16. december 2021 14:37
> Til: Robert Gallop 
> Cc: ceph-users@ceph.io 
> Emne: [ceph-users] Re: cephfs quota used
>
> Woops, wrong copy/pasta:
>
> getfattr -n ceph.dir.rbytes $DIR
>
> works on all distributions I have tested.
>
> It is:
>
> getfattr -d -m 'ceph.*' $DIR
>
> that does not work on Rocky Linux 8, Ubuntu 18.04, but works on CentOS
> 7.
>
> Best,
> Jesper
> --
> Jesper Lykkegaard Karlsen
> Scientific Computing
> Centre for Structural Biology
> Department of Molecular Biology and Genetics
> Aarhus University
> Gustav Wieds Vej 10
> 8000 Aarhus C
>
> E-mail: je...@mbg.au.dk
> Tlf:+45 50906203
>
> 
> Fra: Jesper Lykkegaard Karlsen 
> Sendt: 16. december 2021 13:57
> Til: Robert Gallop 
> Cc: ceph-users@ceph.io 
> Emne: [ceph-users] Re: cephfs quota used
>
> Just tested:
>
> getfattr -n ceph.dir.rbytes $DIR
>
> Works on CentOS 7, but not on Ubuntu 18.04 eighter.
> Weird?
>
> Best,
> Jesper
> ------
> Jesper Lykkegaard Karlsen
> Scientific Computing
> Centre for Structural Biology
> Department of Molecular Biology and Genetics
> Aarhus University
> Gustav Wieds Vej 10
> 8000 Aarhus C
>
> E-mail: je...@mbg.au.dk
> Tlf:+45 50906203
>
> 
> Fra: Robert Gallop 
> Sendt: 16. december 2021 13:42
> Til: Jesper Lykkegaard Karlsen 
> Cc: ceph-users@ceph.io 
> Emne: Re: [ceph-users] Re: cephfs quota used
>
> From what I understand you used to be able to do that but cannot on
> later kernels?
>
> Seems there would be a list somewhere, but I can’t find it, maybe
> it’s changing too often depending on the kernel your using or
> something.
>
> But yeah, these attrs are one of the major reasons we are moving from
> traditional appliance NAS to ceph, the many other benefits come with
> it.
>
> On Thu, Dec 16, 2021 at 5:38 AM Jesper Lykkegaard Karlsen
> mailto:je...@mbg.au.dk>> wrote:
> Thanks everybody,
>
> That was a quick answer.
>
> getfattr -n ceph.dir.rbytes $DIR
>
> Was the answer that worked for me. So getfattr was the solution after
> all.
>
> Is there some way I can display all attributes, without knowing them
> in forehand?
>
> I have tried:
>
> getfattr -d -m 'ceph.*' $DIR
>
> which gives me no output. Should that not list all atributes?
>
> This 

[ceph-users] Re: cephfs quota used

2021-12-16 Thread Jesper Lykkegaard Karlsen
To answer my own question.
It seems Frank Schilder asked a similar question two years ago:

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/6ENI42ZMHTTP2OONBRD7FDP7LQBC4P2E/

listxattr() was apparently removed and not much has happened since then, it seems.
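
A possible workaround is to query the known CephFS directory vattrs one by one; the 
list of names below is an assumption about which ones are of interest:

for attr in ceph.dir.entries ceph.dir.files ceph.dir.subdirs \
            ceph.dir.rentries ceph.dir.rfiles ceph.dir.rsubdirs \
            ceph.dir.rbytes ceph.dir.rctime; do
    # print each attribute name and its value for the directory $DIR
    printf '%-20s %s\n' "$attr" "$(getfattr --only-values -n "$attr" "$DIR" 2>/dev/null)"
done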

Anyway, I just made my own ceph-fs version of "du".

ceph_du_dir:

#!/bin/bash
# usage: ceph_du_dir $DIR
SIZE=$(getfattr -n ceph.dir.rbytes $1 2>/dev/null| grep "ceph\.dir\.rbytes" | 
awk -F\= '{print $2}' | sed s/\"//g)
numfmt --to=iec-i --suffix=B --padding=7 $SIZE

Prints out the ceph-fs dir size in "human-readable" form.
It works like a charm and, my god, it is fast!

Tools like that could be very useful, if provided by the development team 🙂

Best,
Jesper

--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Gustav Wieds Vej 10
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

________
Fra: Jesper Lykkegaard Karlsen 
Sendt: 16. december 2021 14:37
Til: Robert Gallop 
Cc: ceph-users@ceph.io 
Emne: [ceph-users] Re: cephfs quota used

Woops, wrong copy/pasta:

getfattr -n ceph.dir.rbytes $DIR

works on all distributions I have tested.

It is:

getfattr -d -m 'ceph.*' $DIR

that does not work on Rocky Linux 8, Ubuntu 18.04, but works on CentOS 7.

Best,
Jesper
--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Gustav Wieds Vej 10
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

____
Fra: Jesper Lykkegaard Karlsen 
Sendt: 16. december 2021 13:57
Til: Robert Gallop 
Cc: ceph-users@ceph.io 
Emne: [ceph-users] Re: cephfs quota used

Just tested:

getfattr -n ceph.dir.rbytes $DIR

Works on CentOS 7, but not on Ubuntu 18.04 eighter.
Weird?

Best,
Jesper
--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Gustav Wieds Vej 10
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

____
Fra: Robert Gallop 
Sendt: 16. december 2021 13:42
Til: Jesper Lykkegaard Karlsen 
Cc: ceph-users@ceph.io 
Emne: Re: [ceph-users] Re: cephfs quota used

From what I understand you used to be able to do that but cannot on later 
kernels?

Seems there would be a list somewhere, but I can’t find it, maybe it’s changing 
too often depending on the kernel your using or something.

But yeah, these attrs are one of the major reasons we are moving from 
traditional appliance NAS to ceph, the many other benefits come with it.

On Thu, Dec 16, 2021 at 5:38 AM Jesper Lykkegaard Karlsen 
mailto:je...@mbg.au.dk>> wrote:
Thanks everybody,

That was a quick answer.

getfattr -n ceph.dir.rbytes $DIR

Was the answer that worked for me. So getfattr was the solution after all.

Is there some way I can display all attributes, without knowing them 
beforehand?

I have tried:

getfattr -d -m 'ceph.*' $DIR

which gives me no output. Should that not list all attributes?

This is on Rocky Linux kernel 4.18.0-348.2.1.el8_5.x86_64

Best,
Jesper
--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Gustav Wieds Vej 10
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203


Fra: Sebastian Knust 
mailto:skn...@physik.uni-bielefeld.de>>
Sendt: 16. december 2021 13:01
Til: Jesper Lykkegaard Karlsen mailto:je...@mbg.au.dk>>; 
ceph-users@ceph.io<mailto:ceph-users@ceph.io> 
mailto:ceph-users@ceph.io>>
Emne: Re: [ceph-users] cephfs quota used

Hi Jesper,

On 16.12.21 12:45, Jesper Lykkegaard Karlsen wrote:
> Now, I want to access the usage information of folders with quotas from root 
> level of the cephfs.
> I have failed to find this information through getfattr commands, only quota 
> limits are shown here, and du-command on individual folders is a suboptimal 
> solution.

`getfattr -n ceph.quota.max_bytes /path` gives the specified quota for a
given path.
`getfattr -n ceph.dir.rbytes /path` gives the size of the path, as you
would usually get with du for conventional file systems.

As an example, I am using this script for weekly utilisation reports:
> for i in /ceph-path-to-home-dirs/*; do
> if [ -d "$i" ]; then
> SIZE=$(getfattr -n ceph.dir.rbytes --only-values "$i")
> QUOTA=$(getfattr -n ceph.quota.max_bytes --only-values "$i" 
> 2>/dev/null || echo 0)
> PERC=$(echo $SIZE*100/$QUOTA | bc 2> /dev/null)
> if [ -z "$PERC" ]; then PERC="--"; fi
>   

[ceph-users] Re: cephfs quota used

2021-12-16 Thread Jesper Lykkegaard Karlsen
Woops, wrong copy/pasta:

getfattr -n ceph.dir.rbytes $DIR

works on all distributions I have tested.

It is:

getfattr -d -m 'ceph.*' $DIR

that does not work on Rocky Linux 8, Ubuntu 18.04, but works on CentOS 7.

Best,
Jesper
------
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Gustav Wieds Vej 10
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203


Fra: Jesper Lykkegaard Karlsen 
Sendt: 16. december 2021 13:57
Til: Robert Gallop 
Cc: ceph-users@ceph.io 
Emne: [ceph-users] Re: cephfs quota used

Just tested:

getfattr -n ceph.dir.rbytes $DIR

Works on CentOS 7, but not on Ubuntu 18.04 either.
Weird?

Best,
Jesper
------
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Gustav Wieds Vej 10
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203


Fra: Robert Gallop 
Sendt: 16. december 2021 13:42
Til: Jesper Lykkegaard Karlsen 
Cc: ceph-users@ceph.io 
Emne: Re: [ceph-users] Re: cephfs quota used

From what I understand you used to be able to do that but cannot on later 
kernels?

Seems there would be a list somewhere, but I can’t find it, maybe it’s changing 
too often depending on the kernel your using or something.

But yeah, these attrs are one of the major reasons we are moving from 
traditional appliance NAS to ceph, the many other benefits come with it.

On Thu, Dec 16, 2021 at 5:38 AM Jesper Lykkegaard Karlsen 
mailto:je...@mbg.au.dk>> wrote:
Thanks everybody,

That was a quick answer.

getfattr -n ceph.dir.rbytes $DIR

Was the answer that worked for me. So getfattr was the solution after all.

Is there some way I can display all attributes, without knowing them 
beforehand?

I have tried:

getfattr -d -m 'ceph.*' $DIR

which gives me no output. Should that not list all attributes?

This is on Rocky Linux kernel 4.18.0-348.2.1.el8_5.x86_64

Best,
Jesper
----------
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Gustav Wieds Vej 10
8000 Aarhus C

E-mail: je...@mbg.au.dk<mailto:je...@mbg.au.dk>
Tlf:+45 50906203


Fra: Sebastian Knust 
mailto:skn...@physik.uni-bielefeld.de>>
Sendt: 16. december 2021 13:01
Til: Jesper Lykkegaard Karlsen mailto:je...@mbg.au.dk>>; 
ceph-users@ceph.io<mailto:ceph-users@ceph.io> 
mailto:ceph-users@ceph.io>>
Emne: Re: [ceph-users] cephfs quota used

Hi Jesper,

On 16.12.21 12:45, Jesper Lykkegaard Karlsen wrote:
> Now, I want to access the usage information of folders with quotas from root 
> level of the cephfs.
> I have failed to find this information through getfattr commands, only quota 
> limits are shown here, and du-command on individual folders is a suboptimal 
> solution.

`getfattr -n ceph.quota.max_bytes /path` gives the specified quota for a
given path.
`getfattr -n ceph.dir.rbytes /path` gives the size of the path, as you
would usually get with du for conventional file systems.

As an example, I am using this script for weekly utilisation reports:
> for i in /ceph-path-to-home-dirs/*; do
> if [ -d "$i" ]; then
> SIZE=$(getfattr -n ceph.dir.rbytes --only-values "$i")
> QUOTA=$(getfattr -n ceph.quota.max_bytes --only-values "$i" 
> 2>/dev/null || echo 0)
> PERC=$(echo $SIZE*100/$QUOTA | bc 2> /dev/null)
> if [ -z "$PERC" ]; then PERC="--"; fi
> printf "%-30s %8s %8s %8s%%\n" "$i" `numfmt --to=iec $SIZE` `numfmt 
> --to=iec $QUOTA` $PERC
> fi
> done


Note that you can also mount CephFS with the "rbytes" mount option. IIRC
the fuse client defaults to it; for the kernel client you have to
specify it in the mount command or fstab entry.
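
As an illustration (hedged, with a made-up monitor address and mount point), a 
kernel-client mount with rbytes could look like:

mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret,rbytes

or, as the corresponding fstab entry:

10.0.0.1:6789:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/admin.secret,rbytes,_netdev  0  0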

The rbytes option returns the recursive path size (so the
ceph.dir.rbytes fattr) in stat calls to directories, so you will see it
with ls immediately. I really like it!

Just beware that some software might have issues with this behaviour -
alpine is the only example (bug report and patch proposal have been
submitted) that I know of.

Cheers
Sebastian
___
ceph-users mailing list -- ceph-users@ceph.io<mailto:ceph-users@ceph.io>
To unsubscribe send an email to 
ceph-users-le...@ceph.io<mailto:ceph-users-le...@ceph.io>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs quota used

2021-12-16 Thread Jesper Lykkegaard Karlsen
Just tested:

getfattr -n ceph.dir.rbytes $DIR

Works on CentOS 7, but not on Ubuntu 18.04 either.
Weird?

Best,
Jesper
--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Gustav Wieds Vej 10
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203


Fra: Robert Gallop 
Sendt: 16. december 2021 13:42
Til: Jesper Lykkegaard Karlsen 
Cc: ceph-users@ceph.io 
Emne: Re: [ceph-users] Re: cephfs quota used

From what I understand you used to be able to do that, but cannot on later 
kernels?

Seems there would be a list somewhere, but I can’t find it; maybe it’s changing 
too often depending on the kernel you’re using or something.

But yeah, these attrs are one of the major reasons we are moving from 
traditional appliance NAS to ceph, along with the many other benefits that come with it.

On Thu, Dec 16, 2021 at 5:38 AM Jesper Lykkegaard Karlsen 
mailto:je...@mbg.au.dk>> wrote:
Thanks everybody,

That was a quick answer.

getfattr -n ceph.dir.rbytes $DIR

Was the answer that worked for me. So getfattr was the solution after all.

Is there some way I can display all attributes, without knowing them 
beforehand?

I have tried:

getfattr -d -m 'ceph.*' $DIR

which gives me no output. Should that not list all attributes?

This is on Rocky Linux kernel 4.18.0-348.2.1.el8_5.x86_64

Best,
Jesper
----------
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Gustav Wieds Vej 10
8000 Aarhus C

E-mail: je...@mbg.au.dk<mailto:je...@mbg.au.dk>
Tlf:+45 50906203


Fra: Sebastian Knust 
mailto:skn...@physik.uni-bielefeld.de>>
Sendt: 16. december 2021 13:01
Til: Jesper Lykkegaard Karlsen mailto:je...@mbg.au.dk>>; 
ceph-users@ceph.io<mailto:ceph-users@ceph.io> 
mailto:ceph-users@ceph.io>>
Emne: Re: [ceph-users] cephfs quota used

Hi Jesper,

On 16.12.21 12:45, Jesper Lykkegaard Karlsen wrote:
> Now, I want to access the usage information of folders with quotas from root 
> level of the cephfs.
> I have failed to find this information through getfattr commands, only quota 
> limits are shown here, and du-command on individual folders is a suboptimal 
> solution.

`getfattr -n ceph.quota.max_bytes /path` gives the specified quota for a
given path.
`getfattr -n ceph.dir.rbytes /path` gives the size of the path, as you
would usually get with du for conventional file systems.

As an example, I am using this script for weekly utilisation reports:
> for i in /ceph-path-to-home-dirs/*; do
> if [ -d "$i" ]; then
> SIZE=$(getfattr -n ceph.dir.rbytes --only-values "$i")
> QUOTA=$(getfattr -n ceph.quota.max_bytes --only-values "$i" 
> 2>/dev/null || echo 0)
> PERC=$(echo $SIZE*100/$QUOTA | bc 2> /dev/null)
> if [ -z "$PERC" ]; then PERC="--"; fi
> printf "%-30s %8s %8s %8s%%\n" "$i" `numfmt --to=iec $SIZE` `numfmt 
> --to=iec $QUOTA` $PERC
> fi
> done


Note that you can also mount CephFS with the "rbytes" mount option. IIRC
the fuse client defaults to it; for the kernel client you have to
specify it in the mount command or fstab entry.

The rbytes option returns the recursive path size (so the
ceph.dir.rbytes fattr) in stat calls to directories, so you will see it
with ls immediately. I really like it!

Just beware that some software might have issues with this behaviour -
alpine is the only example (bug report and patch proposal have been
submitted) that I know of.

Cheers
Sebastian
___
ceph-users mailing list -- ceph-users@ceph.io<mailto:ceph-users@ceph.io>
To unsubscribe send an email to 
ceph-users-le...@ceph.io<mailto:ceph-users-le...@ceph.io>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs quota used

2021-12-16 Thread Jesper Lykkegaard Karlsen
Thanks everybody,

That was a quick answer.

getfattr -n ceph.dir.rbytes $DIR

Was the answer that worked for me. So getfattr was the solution after all.

Is there some way I can display all attributes, without knowing them 
beforehand?

I have tried:

getfattr -d -m 'ceph.*' $DIR

which gives me no output. Should that not list all attributes?

This is on Rocky Linux kernel 4.18.0-348.2.1.el8_5.x86_64

Best,
Jesper
------
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Gustav Wieds Vej 10
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203


Fra: Sebastian Knust 
Sendt: 16. december 2021 13:01
Til: Jesper Lykkegaard Karlsen ; ceph-users@ceph.io 

Emne: Re: [ceph-users] cephfs quota used

Hi Jesper,

On 16.12.21 12:45, Jesper Lykkegaard Karlsen wrote:
> Now, I want to access the usage information of folders with quotas from root 
> level of the cephfs.
> I have failed to find this information through getfattr commands, only quota 
> limits are shown here, and du-command on individual folders is a suboptimal 
> solution.

`getfattr -n ceph.quota.max_bytes /path` gives the specified quota for a
given path.
`getfattr -n ceph.dir.rbytes /path` gives the size of the path, as you
would usually get with du for conventional file systems.

As an example, I am using this script for weekly utilisation reports:
> for i in /ceph-path-to-home-dirs/*; do
> if [ -d "$i" ]; then
> SIZE=$(getfattr -n ceph.dir.rbytes --only-values "$i")
> QUOTA=$(getfattr -n ceph.quota.max_bytes --only-values "$i" 
> 2>/dev/null || echo 0)
> PERC=$(echo $SIZE*100/$QUOTA | bc 2> /dev/null)
> if [ -z "$PERC" ]; then PERC="--"; fi
> printf "%-30s %8s %8s %8s%%\n" "$i" `numfmt --to=iec $SIZE` `numfmt 
> --to=iec $QUOTA` $PERC
> fi
> done


Note that you can also mount CephFS with the "rbytes" mount option. IIRC
the fuse client defaults to it; for the kernel client you have to
specify it in the mount command or fstab entry.

The rbytes option returns the recursive path size (so the
ceph.dir.rbytes fattr) in stat calls to directories, so you will see it
with ls immediately. I really like it!

Just beware that some software might have issues with this behaviour -
alpine is the only example (bug report and patch proposal have been
submitted) that I know of.

Cheers
Sebastian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephfs quota used

2021-12-16 Thread Jesper Lykkegaard Karlsen
Hi all,

Cephfs quotas work really well for me.
A cool feature is that if one mounts a folder which has quotas enabled, then 
the mountpoint shows up as a partition of the quota size, together with how 
much is used (e.g. with the df command), nice!
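
Just to illustrate what I mean (made-up paths and numbers):

setfattr -n ceph.quota.max_bytes -v 1000000000000 /mnt/cephfs/projects/foo   # 1 TB quota
mount -t ceph 10.0.0.1:6789:/projects/foo /mnt/foo -o name=foo,secretfile=/etc/ceph/foo.secret
df -h /mnt/foo    # Size shows the 1 TB quota, Used shows the current consumption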

Now, I want to access the usage information of folders with quotas from root 
level of the cephfs.
I have failed to find this information through getfattr commands: only quota 
limits are shown there, and running du on individual folders is a suboptimal 
solution.
The usage information must be somewhere in the ceph metadata/mon db, but where 
and how do I read it?

Best,
Jesper

--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Gustav Wieds Vej 10
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Recover data from Cephfs snapshot

2021-03-12 Thread Jesper Lykkegaard Karlsen
Hi Ceph'ers,

I love the possibility to make snapshots on Cephfs systems.

Although there is one thing that puzzles me.

Creating a snapshot takes no time at all, and deleting snapshots can bring PGs 
into the snaptrim state for some hours, while recovering data from a snapshot 
always invokes a full data transfer, where data is "physically" copied back 
into place.

This can make recovering from snapshots on Cephfs a rather heavy procedure.
I have even tried the "mv" command, but that also starts transferring real data 
instead of just moving metadata pointers.

Am I missing some "ceph snapshot recover" command that can move metadata 
pointers and make recovery much lighter, or is this just the way it is?
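
For completeness, the only way I know of today is a plain copy out of the 
hidden .snap directory, e.g. (hypothetical names):

ls /mnt/cephfs/projects/foo/.snap
cp -a /mnt/cephfs/projects/foo/.snap/weekly-2021-03-07/data.bin /mnt/cephfs/projects/foo/data.bin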

Best regards,
Jesper

------
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Gustav Wieds Vej 10
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cephfs metadata and MDS on same node

2021-03-09 Thread Jesper Lykkegaard Karlsen
Dear Ceph’ers

I am about to upgrade the MDS nodes for Cephfs in the Ceph cluster (erasure 
code 8+3) I am administrating.

Since they will get plenty of memory and CPU cores, I was wondering if it would 
be a good idea to move the metadata OSDs (NVMes currently on OSD nodes together 
with the cephfs_data OSDs (HDD)) to the MDS nodes?

Configured as:

4 x MDS, each with a metadata OSD, and the metadata pool configured with 4 x 
replication,

so that each metadata OSD would hold a complete copy of the metadata.
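
Placement-wise I imagine this could be steered with a CRUSH rule on the NVMe 
device class, roughly like this (a sketch only, assuming the metadata pool is 
called cephfs_metadata):

ceph osd crush rule create-replicated meta-nvme default host nvme
ceph osd pool set cephfs_metadata crush_rule meta-nvme
ceph osd pool set cephfs_metadata size 4
ceph osd pool set cephfs_metadata min_size 2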

I know the MDS stores a lot of metadata in RAM, but if the metadata OSDs were 
on the MDS nodes, would that not bring down latency?

Anyway, I am just asking for your opinion on this: pros and cons, or even 
better, input from somebody who has actually tried this?

Best regards,
Jesper

--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Gustav Wieds Vej 10
8000 Aarhus C

E-mail: je...@mbg.au.dk<mailto:je...@mbg.au.dk>
Tlf:+45 50906203

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Healthy objects trapped in incomplete pgs

2020-04-23 Thread Jesper Lykkegaard Karlsen
Dear Cephers,


A few days ago disaster struck the Ceph cluster (erasure-coded) I am 
administrating, as the UPS power was pulled from the cluster, causing a power 
outage.


After rebooting the system, 6 osds were lost (spread over 5 osd nodes) as they 
could not mount anymore, and several others were damaged. This was more than 
the host-failure domain was set up to handle; auto-recovery failed and osds 
started going down in a cascading manner.


When the dust settled, there were 8 pgs (of 2048) inactive and a bunch of osds 
down. I managed to recover 5 pgs, mainly by ceph-objectstore-tool 
export/import/repair commands, but now I am left with 3 pgs that are inactive 
and incomplete.


One of the pgs seems unsalvageable, as I cannot get it to become active at all 
(repair/import/export/lowering min_size), but the two others I can get active 
if I export/import one of the pg shards and restart the osd.
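
To be concrete, the export/import I am doing is roughly the following, with the 
OSDs stopped (the osd ids and pg/shard id here are just placeholders; for an 
erasure-coded pool the shard id is part of the pgid):

systemctl stop ceph-osd@12
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --pgid 7.1as0 --op export --file /root/pg7.1a.export
systemctl start ceph-osd@12

systemctl stop ceph-osd@34
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 --pgid 7.1as0 --op remove --force
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 --op import --file /root/pg7.1a.export
systemctl start ceph-osd@34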


Rebuilding then starts but after a while one of the osds holding the pgs goes 
down, with a "FAILED ceph_assert(clone_size.count(clone))" message in the log.

If I set the osds to noout/nodown, then I can see that only rather few objects, 
e.g. 161 of a pg of >10, are failing to be remapped.


Since most of the objects in the two pgs seem intact, it would be sad to delete 
the whole pg (force-create-pg) and lose all that data.


Is there a way to show and delete the failing objects?


I have thought of a recovery plan and want to share it with you, so you can 
comment on whether it sounds doable or not.


  *   Stop osds from recovering:   ceph osd set norecover
  *   bring back pgs active:       ceph-objectstore-tool export/import and restart osd
  *   find files in pgs:           cephfs-data-scan pg_files 
  *   pull out as many as possible of those files to another location
  *   recreate pgs:                ceph osd force-create-pg 
  *   restart recovery:            ceph osd unset norecover
  *   copy back in the recovered files


Would that work or do you have a better suggestion?
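
In terms of concrete commands, I picture the plan roughly like this (a sketch 
only; the pg id and paths are placeholders):

ceph osd set norecover
# bring the pgs active again with ceph-objectstore-tool export/import, as above
cephfs-data-scan pg_files /some/subtree 7.1a > /root/pg7.1a.files
# copy the listed files to a safe location, then recreate the pg
ceph osd force-create-pg 7.1a      # newer releases also want --yes-i-really-mean-it
ceph osd unset norecover
# finally copy the rescued files back into place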


Cheers,

Jesper


------
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Gustav Wieds Vej 10
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io