[ceph-users] Re: CephFS: Isolating folders for different users

2023-01-02 Thread Robert Gallop
One side effect of using subvolumes is that you can then only take a snapshot
at the subvolume level, nothing further down the tree.

I find you can use the same path in the auth caps without a subvolume, unless
I'm missing something in this thread.
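
For what it's worth, a minimal sketch of granting snapshot-capable caps with
`ceph fs authorize` (the filesystem name, client name, and path below are
placeholders, not taken from this thread):

  # Grant rw plus the 's' flag (snapshot create/delete) on one directory
  # of the filesystem "cephfs".
  ceph fs authorize cephfs client.projecta /projects/a rws

  # Verify the resulting caps.
  ceph auth get client.projecta

The 's' flag is what permits `mkdir .snap/<name>`; without it the client gets
"Operation not permitted", like the error quoted below.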

On Mon, Jan 2, 2023 at 10:21 AM Jonas Schwab <
jonas.sch...@physik.uni-wuerzburg.de> wrote:

> Thank you very much! Works like a charm, except for one thing: I gave my
> clients the MDS caps 'allow rws path=<path>' to also be able
> to create snapshots from the client, but `mkdir .snap/test` still returns
>  mkdir: cannot create directory ‘.snap/test’: Operation not permitted
>
> Do you have an idea what might be the issue here?
>
> Best regards,
> Jonas
>
> PS: A happy new year to everyone!
>
> On 23.12.22 10:05, Kai Stian Olstad wrote:
> > On 22.12.2022 15:47, Jonas Schwab wrote:
> >> Now the question: Since I established this setup more or less through
> >> trial and error, I was wondering if there is a more elegant/better
> >> approach than what is outlined above?
> >
> > You can use namespace so you don't need separate pools.
> > Unfortunately the documentation is sparse on the subject, I use it
> > with subvolume like this
> >
> >
> > # Create a subvolume
> >
> > ceph fs subvolume create <vol_name> <subvol_name> --pool_layout <data_pool> --namespace-isolated
> >
> > The subvolume is created with the namespace fsvolumens_<subvol_name>.
> > You can also find the name with
> >
> > ceph fs subvolume info <vol_name> <subvol_name> | jq -r .pool_namespace
> >
> >
> > # Create a user with access to the subvolume and the namespace
> >
> > ## First find the path to the subvolume
> >
> > ceph fs subvolume getpath <vol_name> <subvol_name>
> >
> > ## Create the user
> >
> > ceph auth get-or-create client.<name> mon 'allow r' osd 'allow rw pool=<data_pool> namespace=fsvolumens_<subvol_name>'
> >
> >
> > I have found this by looking at how Openstack does it and some trial
> > and error.
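
Pulling the quoted steps together, a hedged end-to-end sketch with placeholder
names (volume "cephfs", subvolume "project_a", data pool "cephfs_data"); note
that clients typically also need an MDS cap restricted to the subvolume path
before they can mount it, which the OpenStack-derived caps above do not include:

  # 1. Create a namespace-isolated subvolume.
  ceph fs subvolume create cephfs project_a --pool_layout cephfs_data --namespace-isolated

  # 2. Find its path and RADOS namespace.
  SUBVOL_PATH=$(ceph fs subvolume getpath cephfs project_a)
  NS=$(ceph fs subvolume info cephfs project_a | jq -r .pool_namespace)

  # 3. Create a client restricted to that path and namespace.
  ceph auth get-or-create client.project_a \
      mon "allow r" \
      mds "allow rw path=${SUBVOL_PATH}" \
      osd "allow rw pool=cephfs_data namespace=${NS}"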
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs ha mount expectations

2022-10-26 Thread Robert Gallop
I use this very setup, with a few more servers.  I have no outage windows for my
Ceph deployments, as they support several production environments.

MDS is your focus, there are many knobs, but MDS is the key to client
experience.  In my environment, MDS failover takes 30-180 seconds,
depending on how much replay and rejoin needs to take place.  During this
failover I/O on the client is paused, but not broken. If you were to do an
ls at the time of failover, it may not return for a couple of minutes in the worst case.
If a file transfer is ongoing it will stop writing for this failover time,
but both will complete after failover.

If I have MDS issues and failover for whatever reason takes > 5 min, my
clients are lost.  I must reboot all clients tied to that MDS to recover,
due to thousands of open files in various states.  This is obviously a major
impact, but as we learn Ceph it happens less frequently; it has only happened 3
times in the first year of operation.

It’s awesome tech, and I look forward to future enhancements in general.
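
A hedged sketch of the failover-related knobs and checks discussed in this
thread (the filesystem name "cephfs" is a placeholder):

  # Keep a standby MDS following the active one so replay is shorter on failover.
  ceph fs set cephfs allow_standby_replay true

  # See which MDS is active and which is standby-replay.
  ceph fs status cephfs

  # Trigger a controlled failover of rank 0, e.g. to measure failover time
  # before relying on it in production.
  ceph mds fail 0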

On Wed, Oct 26, 2022 at 3:41 AM William Edwards 
wrote:

>
> > On 26 Oct 2022 at 10:11, mj wrote the following:
> >
> > Hi!
> >
> > We have read https://docs.ceph.com/en/latest/man/8/mount.ceph, and
> would like to see our expectations confirmed (or denied) here. :-)
> >
> > Suppose we build a three-node cluster, three monitors, three MDSs, etc,
> in order to export a cephfs to multiple client nodes.
> >
> > On the (RHEL8) clients (web application servers) fstab, we will mount
> the cephfs like:
> >
> >> ceph1,ceph2,ceph3:/ /mnt/ha-pool/ ceph
> name=admin,secretfile=/etc/ceph/admin.secret,noatime 0 2
> >
> > We expect that the RHEL clients will then be able to use (read/write) a
> shared /mnt/ha-pool directory simultaneously.
> >
> > Our question: how HA can we expect this setup to be? Looking for some
> practical experience here.
> >
> > Specific: Can we reboot any of the three involved ceph servers without
> the clients noticing anything? Or will there be certain timeouts involved,
> during which /mnt/ha-pool/ will appear unresponsive, and *after* a timeout
> the client switches monitor node, and /mnt/ha-pool/ will respond again?
>
> Monitor failovers don’t cause a noticeable disruption IIRC.
>
> MDS failovers do. The MDS needs to replay. You can minimise the effect
> with mds_standby_replay.
>
> >
> > Of course we hope the answer is: in such a setup, cephfs clients should
> not notice a reboot at all. :-)
> >
> > All the best!
> >
> > MJ
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm container configurations

2022-10-25 Thread Robert Gallop
Host networking is used by default as the network layer (no IP forwarding
requirement), so if your OS interfaces are configured for jumbo frames, your
containers are too.

As for the resources, I'll let someone more knowledgeable answer that, but you
can certainly run MONs and OSDs on the same box assuming you have enough CPU
and memory.  I have a small POC cluster where some nodes run MON, MGR, and
OSDs, and other nodes run MON, OSD, and MDS; it all works great with 192 GB RAM
and 48-core boxes.
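
On the memory side, one Ceph-native way to bound colocated daemons, regardless
of what the container runtime enforces, is the daemons' own memory targets; a
hedged sketch with placeholder values:

  # Cap the memory each OSD tries to stay under (enforced by the OSD's own
  # cache trimming, not by the container runtime).
  ceph config set osd osd_memory_target 4294967296    # 4 GiB per OSD

  # Or let cephadm derive osd_memory_target from host RAM automatically.
  ceph config set osd osd_memory_target_autotune true

  # The MON's cache sizing can be bounded in a similar spirit.
  ceph config set mon mon_memory_target 2147483648    # 2 GiB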



On Tue, Oct 25, 2022 at 6:39 AM Mikhail Sidorov 
wrote:

> Hello!
>
> I am planning a ceph cluster and evaluating cephadm as an orchestration
> tool.
> My cluster is going to be relatively small at the start, so I am planning
> to run monitor daemons on the same node as osd. But I wanted to provide
> some QoS on memory and cpu resources, so I am wondering if it is possible
> to set the resource limits for containers via cephadm? And if not, wouldn't
> they be overwritten, if I configure them some other way? What is the most
> convenient way to do so?
> Also I wanted to configure the containers to use jumbo frames and
> preferably to use host networking to avoid additional overhead, is that
> possible?
>
> Best regards,
> Michael
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: subdirectory pinning and reducing ranks / max_mds

2022-10-21 Thread Robert Gallop
In my experience it just falls back to behaving like it's un-pinned.

For my use case I do the following:

/ pinned to rank 0
/env1 to rank 1
/env2 to rank 2
/env3 to rank 3

If I do an upgrade it will collapse to a single rank; all access/IO continues
after what would be a normal failover-type interval, i.e. IO may stop on
clients for 10-60 seconds or so, as if a normal MDS rank failover
occurred.

But it will not remain in a locked state for the entire time from what I’ve
seen.

YMMV, but as long as the reduction in ranks actually works (we’ve had them
crash when trying to shut down and stuff), you should be in good shape.

If you do hit issues of ranks crashing, be ready to pause the upgrade, and
set your max_mds back to 3 or 4 to stop the immediate bleeding and continue
your troubleshooting without impact to the clients.
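
For reference, a hedged sketch of how a layout like the one above is typically
applied, and how max_mds is dropped and restored around an upgrade (mount point
and filesystem name are placeholders):

  # Pin directories to ranks via the ceph.dir.pin extended attribute,
  # run on a client with the filesystem mounted at /mnt/cephfs.
  setfattr -n ceph.dir.pin -v 0 /mnt/cephfs
  setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/env1
  setfattr -n ceph.dir.pin -v 2 /mnt/cephfs/env2
  setfattr -n ceph.dir.pin -v 3 /mnt/cephfs/env3

  # Collapse to a single rank for the upgrade, restore afterwards.
  ceph fs set cephfs max_mds 1
  # ... upgrade ...
  ceph fs set cephfs max_mds 4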

On Fri, Oct 21, 2022 at 12:29 PM Wesley Dillingham 
wrote:

> In a situation where you have say 3 active MDS (and 3 standbys).
> You have 3 ranks, 0,1,2
> In your filesystem you have three directories at the root level [/a, /b,
> /c]
>
> you pin:
> /a to rank 0
> /b to rank 1
> /c to rank 2
>
> and you need to upgrade your Ceph version. When it becomes time to reduce
> max_mds to 1 and thereby reduce the number of ranks to 1, leaving just rank 0,
> what happens to directories /b and /c? Do they become unavailable between the
> time when max_mds is reduced to 1 and when, after the upgrade, max_mds is
> restored to 3? Alternatively, if a rank disappears, does the CephFS client
> understand this and begin to ignore the pinned rank and make use of the
> remaining ranks? Thanks.
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can't remove MON of failed node

2022-07-21 Thread Robert Gallop
Good deal.

I ended up going unmanaged on my MONs…. Had issues from time to time
where orch would decide I didn't need a MON where I had pointed it, and also
wouldn't deploy them without the --placement flag; labels, numbers, etc.
wouldn't work either…

But it's been fine since going unmanaged, of course!

Glad you're back to HEALTH_OK for now!
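
For context, a hedged sketch of taking MON placement out of cephadm's hands, as
described above (hostname and IP are placeholders):

  # Stop cephadm from (re)scheduling MON daemons on its own.
  ceph orch apply mon --unmanaged

  # MONs are then added or removed explicitly.
  ceph orch daemon add mon host1:10.0.0.11
  ceph orch daemon rm mon.host1 --force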

On Thu, Jul 21, 2022 at 5:04 AM Dominik Baack <
dominik.ba...@cs.uni-dortmund.de> wrote:

> Hi
>
> I tried
>
> ceph orch daemon rm mon.ml2rsn01 --force
> Error EINVAL: Unable to find daemon(s) ['mon.ml2rsn01']
>
> with no success.
>
> But this reminded me that it may be possible to apply a complete new set
> of configs via
> ceph orch apply mon --placement="..."
>
> and that worked out. I hope this creates no further problems down the line
> when I want to reintegrate a new sn01 node.
>
> Thanks
>
>
> Dominik
> Am 21.07.2022 um 12:01 schrieb Robert Gallop:
>
> You try ceph orch daemon rm already?
>
>
> On Thu, Jul 21, 2022 at 3:58 AM Dominik Baack <
> dominik.ba...@cs.uni-dortmund.de> wrote:
>
>> Hi,
>>
>> after removing a node from our cluster we are currently on cleanup:
>>
>> OSDs are removed and cluster is (mostly) healthy again
>>
>> mds were changed
>>
>>
>> But we still have one trailing error:
>>
>> CEPHADM_APPLY_SPEC_FAIL: Failed to apply 1 service(s): mon
>>
>>
>> with
>>
>> > ceph orch ls
>> > mon 4/4  8m ago 3M
>> > ml2rsn01;ml2rsn03;ml2rsn05;ml2rsn06;ml2rsn07
>> >
>>
>>
> you can see that it's still present somewhere.
>> How can I remove the last error until we get a replacement?
>>
>>
>> Cheers
>> Dominik Baack
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can't remove MON of failed node

2022-07-21 Thread Robert Gallop
You try ceph orch daemon rm already?


On Thu, Jul 21, 2022 at 3:58 AM Dominik Baack <
dominik.ba...@cs.uni-dortmund.de> wrote:

> Hi,
>
> after removing a node from our cluster we are currently on cleanup:
>
> OSDs are removed and cluster is (mostly) healthy again
>
> mds were changed
>
>
> But we still have one trailing error:
>
> CEPHADM_APPLY_SPEC_FAIL: Failed to apply 1 service(s): mon
>
>
> with
>
> > ceph orch ls
> > mon 4/4  8m ago 3M
> > ml2rsn01;ml2rsn03;ml2rsn05;ml2rsn06;ml2rsn07
> >
>
>
> you can see that it's still present somewhere.
> How can I remove the last error until we get a replacement?
>
>
> Cheers
> Dominik Baack
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm host maintenance

2022-07-13 Thread Robert Gallop
This brings up a good follow-on… rebooting in general for OS patching.

I have not been leveraging the maintenance mode function, as I found it was
really no different from just setting noout and doing the reboot.  I find
if the box is the active manager, the failover happens quickly, painlessly and
automatically.  All the OSDs just show as missing and come back once the
box is back from the reboot…

Am I causing issues I may not be aware of?  How is everyone handling
patching reboots?

The only place I'm careful is the active MDS nodes: since that failover
does cause a period of no I/O for the mounted clients, I generally fail it
manually, so I don't have to wait for the MDS to figure out an instance is
gone and spin up a standby…
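
A hedged sketch of that manual patching flow (the MGR name and MDS rank are
placeholders; adjust to the host being rebooted):

  # Before rebooting a host:
  ceph osd set noout          # don't start rebalancing while its OSDs are down

  # If the host runs the active MGR, fail over first.
  ceph mgr fail <active-mgr>

  # If the host runs an active MDS, fail it proactively so a standby takes
  # over on our schedule instead of after a timeout.
  ceph mds fail 0

  # ... reboot the host and wait for its daemons to rejoin ...

  ceph osd unset noout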

Any tips or techniques until there is a more holistic approach?

Thanks!


On Wed, Jul 13, 2022 at 9:49 AM Adam King  wrote:

> Hello Steven,
>
> Arguably, it should, but right now nothing is implemented to do so and
> you'd have to manually run the "ceph mgr fail
> node2-cobj2-atdev1-nvan.ghxlvw" before it would allow you to put the host
> in maintenance. It's non-trivial from a technical point of view to have it
> automatically do the switch as the cephadm instance is running on that
> active mgr, so it will have to store somewhere that we wanted this host in
> maintenance, fail over the mgr itself, then have the new cephadm instance
> pick up that we wanted the host in maintenance and do so. Possible, but not
> something anyone has had a chance to implement. FWIW, I do believe there
> are also plans to eventually have a playbook for a rolling reboot or
> something of the sort added to https://github.com/ceph/cephadm-ansible.
> But
> for now, I think some sort of intervention to cause the fail over to happen
> before running the maintenance enter command is necessary.
>
> Regards,
>  - Adam King
>
> On Wed, Jul 13, 2022 at 11:02 AM Steven Goodliff <
> steven.goodl...@globalrelay.net> wrote:
>
> >
> > Hi,
> >
> >
> > I'm trying to reboot a ceph cluster one instance at a time by running an
> > Ansible playbook which basically runs
> >
> >
> > cephadm shell ceph orch host maintenance enter <hostname>, and then
> > reboots the instance and exits maintenance
> >
> >
> > but i get
> >
> >
> > ALERT: Cannot stop active Mgr daemon, Please switch active Mgrs with
> 'ceph
> > mgr fail node2-cobj2-atdev1-nvan.ghxlvw'
> >
> >
> > on one instance.  should cephadm handle the switch ?
> >
> >
> > thanks
> >
> > Steven Goodliff
> > Global Relay
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Best value for "mds_cache_memory_limit" for large (more than 10 Po) cephfs

2022-06-29 Thread Robert Gallop
I'd say one thing to keep in mind is that the higher you have your cache, and
the more of it that is currently consumed, the LONGER it will take in the event
a standby has to take over (replay)…

While standby-replay does help to improve takeover times, it's not
significant if there are a lot of clients with a lot of open caps.

We are using a cache in the 40 GB range after ramping it up a bit at a time to
help with recalls.  But when I fail over now I'm looking at 1-3 minutes, with or
without standby-replay enabled.

Do some failover testing, if you have the ability, to ensure that your
timings are OK; too big a cache can cause issues in that area, as far as I know…
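
A hedged sketch of the knobs and checks touched on here (values and the daemon
name are placeholders):

  # Raise the MDS cache limit, e.g. to 40 GiB.
  ceph config set mds mds_cache_memory_limit 42949672960

  # Check how much cache the active MDS is actually using.
  ceph tell mds.<daemon-name> cache status

  # Time a controlled failover to see how long replay/rejoin takes with the
  # chosen cache size.
  ceph mds fail 0
  ceph fs status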

Robert

On Wed, Jun 29, 2022 at 6:54 AM Eugen Block  wrote:

> Hi,
>
> you can check how much your MDS is currently using:
>
> ceph daemon mds.<name> cache status
>
> Does it already scratch your limit? I usually start with lower values
> if it's difficult to determine how much it will actually use and
> increase it if necessary.
>
> Zitat von Arnaud M :
>
> > Hello to everyone
> >
> > I have a ceph cluster currently serving cephfs.
> >
> > The size of the ceph filesystem is around 1 Po.
> > 1 active MDS and 1 standby-replay.
> > I do not have a lot of cephfs clients for now (5), but it may increase to 20
> > or 30.
> >
> > Here is some output
> >
> > Rank | State          | Daemon                | Activity     | Dentries | Inodes  | Dirs    | Caps
> > 0    | active         | ceph-g-ssd-4-2.mxwjvd | Reqs: 130 /s | 10.2 M   | 10.1 M  | 356.8 k | 707.6 k
> > 0-s  | standby-replay | ceph-g-ssd-4-1.ixqewp | Evts: 0 /s   | 156.5 k  | 127.7 k | 47.4 k  | 0
> >
> > It is working really well
> >
> > I plan to increase this cephfs cluster up to 10 Po (for now) and even
> > more
> >
> > What would be a good value for "mds_cache_memory_limit"? I have set it
> > to 80 GB because I have enough RAM on my server to do so.
> >
> > Was it a good idea? Or is it counter-productive?
> >
> > All the best
> >
> > Arnaud
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: recovery from catastrophic mon and mds failure after reboot and ip address change

2022-06-28 Thread Robert Gallop
Thanks for sharing, hope I never need the info, but glad to know it’s here
and doable!

On Tue, Jun 28, 2022 at 10:36 AM Florian Jonas 
wrote:

> Dear all,
>
> just when we received Eugen's message, we managed (with additional help
> via Zoom from other experts) to recover our filesystem. Thank you again
> for your help. I briefly document our solution here. The monitors were
> corrupted due to repeated destruction and recreation, destroying the
> store.db of the monitors. The OSDs were intact. We followed the solution
> here to recover the monitors from the store.db collected form the OSDs:
>
>
> https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#mon-store-recovery-using-osds
>
> However, we had made one mistake during one of the steps. For anyone
> reading this: make sure that the OSD services are not running before
> running the procedure. We then stopped all ceph services and replaced
> the corrupted store.db for each node:
>
> mv $extractedstoredb/store.db /var/lib/ceph/mon/mon.foo/store.db
>
> chown -R ceph:ceph /var/lib/ceph/mon/mon.foo/store.db
>
> we then started the monitors one by one and then started the osd
> services again. At this stage we got the pools again. We then roughly
> followed the guide here:
>
> https://docs.ceph.com/en/quincy/cephfs/recover-fs-after-mon-store-loss/
>
> to restore the filesystem, while making sure that NO MDS is running.
> However, I think the exact commands depend on the ceph version, so I
> would double-check with an expert for the last step, since as far as I
> understood it can lead to erasure of files if the --recover flag is not
> properly implemented.
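
For readers following the first link above, a heavily abridged, hedged sketch of
the mon store rebuild from OSDs (paths, keyring and hostnames are placeholders,
and the exact flags vary by release, so check the documentation for yours):

  # On each OSD host, with the OSDs *stopped*, harvest cluster maps into a
  # store.db; repeat/rsync across hosts so one store accumulates all maps.
  ms=/root/mon-store
  mkdir -p $ms
  for osd in /var/lib/ceph/osd/ceph-*; do
      ceph-objectstore-tool --data-path "$osd" --no-mon-config \
          --op update-mon-db --mon-store-path "$ms"
  done

  # Rebuild the monitor store from the harvested maps, then move it into
  # /var/lib/ceph/mon/<mon>/store.db and fix ownership as described above.
  ceph-monstore-tool $ms rebuild -- --keyring /path/to/admin.keyring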
>
> Best regards,
>
> Florian
>
>
>
> On 28/06/2022 15:12, Eugen Block wrote:
> > I agree, having one MON out of quorum should not result in hanging
> > ceph commands, maybe a little delay until all clients have noticed it.
> > So the first question is, what happened there? Did you notice anything
> > else that could disturb the cluster? Do you have the logs from the
> > remaining two MONs and do they reveal anything? But this is just
> > relevant for the analysis and maybe prevent something similar from
> > happening in the future. Have you tried restarting the MGR after the
> > OSDs came back up? If not, I would restart it (do you have a second
> > MGR to be able to failover?) and then also restart a single OSD to see
> > if anything changes in the cluster status. You're right about the MDS,
> > of course. First you need the cephfs pools to be available again
> > before the MDS can start its work.
> >
> > Zitat von Florian Jonas :
> >
> >> Hi,
> >>
> >> thanks a lot for getting back to me. I will try to clarify what
> >> happened and reconstruct the timeline. For context, our computing
> >> cluster is part of a bigger network infrastructure that is managed by
> >> someone else, and for the particular node running the MON and MDS we
> >> had not assigned a static IP address due to an oversight on our part.
> >> The cluster is run semi-professionally by me and a colleague and
> >> started as a small test but quickly grew in scale, so we are still
> >> somewhat beginners. The machine got stuck due to some unrelated issue
> >> and we had to reboot, and after reboot only this one address changed
> >> (last three digits).
> >>
> >> After the reboot, the ceph status command was no longer working,
> >> which caused a bit of a panic. In principle, it should have still
> >> worked since the other two machines still should have had quorum. We
> >> quickly realized the IP address change and destroyed the monitor in
> >> question and re-created it after we had changed the mon ip in the
> >> ceph config. However, I think this was a mistake since in general the
> >> system was not in a good state (I assume due to the crashed MDS). In
> >> the rush to get things back online (second mistake), the other two
> >> monitors were also destroyed and re-created, even though their IP
> >> address did not change. At this point the ceph status command was
> >> still not available and just hanging.
> >>
> >> We proceeded following the procedure outline here:
> >>
> >>
> https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#mon-store-recovery-using-osds
> >>
> >>
> >> in order to restore the monitors  using the OSDs on each node. After
> >> following this procedure we managed to get all three monitors back
> >> online and they now all show a quorum. This is the current situation.
> >> I think this whole mess is a mix of unlucky circumstances and
> >> panicked incompetence on our part ...
> >>
> >> By restarting the MDS, do you mean restarting the MDS service on the
> >> node in question? All three of them currently show up as "inactive",
> >> I think because no filesystem is recognized and they see no reason to
> >> become active. Regarding your question why the backup MDS did not
> >> start, I do not know.  It is indeed strange!
> >>
> >> Best regards,
> >>
> >> Florian Jonas
> >>
> >>
> >> O

[ceph-users] Re: Ceph recovery network speed

2022-06-27 Thread Robert Gallop
I saw a major boost after setting sleep_hdd to 0.  Only after that did I
start staying at around 500 MiB/s to 1.2 GiB/s and 1.5k obj/s to 2.5k
obj/s.

Eventually it tapered back down, but for me sleep was the key, and
specifically in my case:

osd_recovery_sleep_hdd
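
A hedged example of checking and changing it (0 removes the per-op recovery
throttle entirely, so watch client latency):

  ceph config get osd osd_recovery_sleep_hdd
  ceph config set osd osd_recovery_sleep_hdd 0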

On Mon, Jun 27, 2022 at 11:17 AM Curt  wrote:

> On Mon, Jun 27, 2022 at 8:52 PM Frank Schilder  wrote:
>
> > I think this is just how ceph is. Maybe you should post the output of
> > "ceph status", "ceph osd pool stats" and "ceph df" so that we can get an
> > idea whether what you look at is expected or not. As I wrote before,
> object
> > recovery is throttled and the recovery bandwidth depends heavily on
> object
> > size. The interesting question is, how many objects per second are
> > recovered/rebalanced
> >
>  data:
> pools:   11 pools, 369 pgs
> objects: 2.45M objects, 9.2 TiB
> usage:   20 TiB used, 60 TiB / 80 TiB avail
> pgs: 512136/9729081 objects misplaced (5.264%)
>  343 active+clean
>  22  active+remapped+backfilling
>
>   io:
> client:   2.0 MiB/s rd, 344 KiB/s wr, 142 op/s rd, 69 op/s wr
> recovery: 34 MiB/s, 8 objects/s
>
> Pool 12 is the only one with any stats.
>
> pool EC-22-Pool id 12
>   510048/9545052 objects misplaced (5.344%)
>   recovery io 36 MiB/s, 9 objects/s
>   client io 1.8 MiB/s rd, 404 KiB/s wr, 86 op/s rd, 72 op/s wr
>
> --- RAW STORAGE ---
> CLASS  SIZE    AVAIL   USED    RAW USED  %RAW USED
> hdd    80 TiB  60 TiB  20 TiB  20 TiB    25.45
> TOTAL  80 TiB  60 TiB  20 TiB  20 TiB    25.45
>
> --- POOLS ---
> POOL                        ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
> .mgr                         1    1  152 MiB       38  457 MiB      0    9.2 TiB
> 21BadPool                    3   32    8 KiB        1   12 KiB      0     18 TiB
> .rgw.root                    4   32  1.3 KiB        4   48 KiB      0    9.2 TiB
> default.rgw.log              5   32  3.6 KiB      209  408 KiB      0    9.2 TiB
> default.rgw.control          6   32      0 B        8      0 B      0    9.2 TiB
> default.rgw.meta             7    8  6.7 KiB       20  203 KiB      0    9.2 TiB
> rbd_rep_pool                 8   32  2.0 MiB        5  5.9 MiB      0    9.2 TiB
> default.rgw.buckets.index    9    8  2.0 MiB       33  5.9 MiB      0    9.2 TiB
> default.rgw.buckets.non-ec  10   32  1.4 KiB        0  4.3 KiB      0    9.2 TiB
> default.rgw.buckets.data    11   32  232 GiB   61.02k  697 GiB   2.41    9.2 TiB
> EC-22-Pool                  12  128  9.8 TiB    2.39M   20 TiB  41.55     14 TiB
>
>
>
> > Maybe provide the output of the first two commands for
> > osd_recovery_sleep_hdd=0.05 and osd_recovery_sleep_hdd=0.1 each (wait a
> bit
> > after setting these and then collect the output). Include the applied
> > values for osd_max_backfills* and osd_recovery_max_active* for one of the
> > OSDs in the pool (ceph config show osd.ID | grep -e osd_max_backfills -e
> > osd_recovery_max_active).
> >
>
> I didn't notice any speed difference with sleep values changed, but I'll
> grab the stats between changes when I have a chance.
>
> ceph config show osd.19 | egrep 'osd_max_backfills|osd_recovery_max_active'
> osd_max_backfills            1000  override  mon[5]
> osd_recovery_max_active      1000  override
> osd_recovery_max_active_hdd  1000  override  mon[5]
> osd_recovery_max_active_ssd  1000  override
>
> >
> > I don't really know if on such a small cluster one can expect more than
> > what you see. It has nothing to do with network speed if you have a 10G
> > line. However, recovery is something completely different from a full
> > link-speed copy.
> >
> > I can tell you that boatloads of tiny objects are a huge pain for
> > recovery, even on SSD. Ceph doesn't raid up sections of disks against
> each
> > other, but object for object. This might be a feature request: that PG
> > space allocation and recovery should follow the model of LVM extents
> > (ideally match with LVM extents) to allow recovery/rebalancing larger
> > chunks of storage in one go, containing parts of a large or many small
> > objects.
> >
> > Best regards,
> > =
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > 
> > From: Curt 
> > Sent: 27 June 2022 17:35:19
> > To: Frank Schilder
> > Cc: ceph-users@ceph.io
> > Subject: Re: [ceph-users] Re: Ceph recovery network speed
> >
> > Hello,
> >
> > I had already increased/changed those variables previously.  I increased
> > the pg_num to 128, which increased the number of PGs backfilling, but
> > speed is still only at 30 MiB/s avg and it has been backfilling 23 PGs for
> > the last several hours.  Should I increase it higher than 128?
> >
> 

[ceph-users] Re: Something akin to FSIMAGE in ceph

2022-02-15 Thread Robert Gallop
Thanks William….

I’m going to mess with it, see how it does.  I hadn’t thought about
utilizing mlocate for this case, but wouldn’t be the worst thing if it can
keep up.  The major issue we are having with most solutions is just time.
It takes so long for most utilities to run through even my modest 34
million files as of today.

That’s why I was wondering if this may be another ceph super power, as
having the rbytes and rfiles has been for us.  Those features have
eliminated so many very slow processes for figuring out what directories
are using more than others (until we can get quotas in place).
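
(The rbytes/rfiles mentioned here are the CephFS recursive statistics exposed as
extended attributes; a hedged example of reading them, with a placeholder path:)

  # Recursive byte count and file count of a directory, answered by the MDS
  # without walking the tree.
  getfattr -n ceph.dir.rbytes /mnt/cephfs/projects
  getfattr -n ceph.dir.rfiles /mnt/cephfs/projects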

If this file name data was already in some DB somewhere that could be
queried “off line” we wouldn’t have to waste the cycles actively capturing
that data etc.

Appreciate the response!



On Mon, Feb 14, 2022 at 11:57 PM William Edwards 
wrote:

>
> > On 15 Feb 2022 at 02:19, Robert Gallop wrote the following:
> >
> > Had the question posed to me and couldn’t find an immediate answer.
> >
> > Is there any way we can query the MDS or some other component in the ceph
> > stack that would give essentially immediate access to all file names
> > contained in ceph?
> >
> > In HDFS we have the ability to pull the fsimage from the name nodes and
> > perform query-like operations to find a file; let's say we wanted to see all
> > *log4j*.jar files that existed in HDFS, we could run this query and have
> > 20k results in a couple of seconds.
> >
> > Right now with ceph, we are only using cephfs, kernel client mounts, so
> the
> > only “normal” way to do this is to use find, or ls, or whatever normal
> > tools could go looking for this jar across the various mount points.
>
> Can you mount / and use mlocate?
>
> >
> > So thought I’d ask if any one had some tricks that could be used to
> > basically ask the MDS or component that would know:  Show me the path of
> > every file ending in .jar that contains the letters/numbers log4j in its
> > name…
> >
> > Thanks!
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Something akin to FSIMAGE in ceph

2022-02-14 Thread Robert Gallop
Had the question posed to me and couldn’t find an immediate answer.

Is there any way we can query the MDS or some other component in the ceph
stack that would give essentially immediate access to all file names
contained in ceph?

In HDFS we have the ability to pull the fsimage from the name nodes and
perform query-like operations to find a file; let's say we wanted to see all
*log4j*.jar files that existed in HDFS, we could run this query and have
20k results in a couple of seconds.

Right now with ceph, we are only using cephfs, kernel client mounts, so the
only “normal” way to do this is to use find, or ls, or whatever normal
tools could go looking for this jar across the various mount points.

So I thought I'd ask if anyone had some tricks that could be used to
basically ask the MDS or component that would know:  Show me the path of
every file ending in .jar that contains the letters/numbers log4j in its
name…

Thanks!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io