Hello,

We are having issues with slow ops on our large ARM HPC Ceph cluster.

The cluster runs Ceph 18.2.0 on Ubuntu 20.04.
The MONs, MGRs and MDSs had to be moved to Intel servers because of poor single-core performance on our ARM servers. Our main CephFS data pool spans 54 servers in 9 racks with 1458 HDDs in total (OSDs without block.db on SSD). The CephFS data pool is erasure coded with k=6, m=2 and rack as the failure domain. The pool has about 16k PGs, with an average of ~90 PGs per OSD.
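
For reference, the pool was set up roughly along these lines (the profile, pool and filesystem names here are just placeholders, not our exact ones):

    ceph osd erasure-code-profile set ec62_rack k=6 m=2 crush-failure-domain=rack
    ceph osd pool create cephfs_data 16384 16384 erasure ec62_rack
    ceph osd pool set cephfs_data allow_ec_overwrites true
    ceph fs add_data_pool cephfs cephfs_data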

We have had good experience with EC CephFS on an Intel Ceph cluster about 3.5 times smaller, but this ARM deployment is becoming problematic. The issues started when one of the users began generating sequential read/write traffic at about 5 GiB/s. A single OSD with slow ops was enough to create a laggy PG and crash the application generating that traffic. We have even had a case where an OSD with slow ops stayed laggy for 6 hours and required a manual restart.
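
When this happens, this is roughly how we notice it on the cluster side (a sketch, not exact output):

    ceph health detail                 # SLOW_OPS warnings show up here
    ceph pg dump pgs | grep -i laggy   # PGs whose read lease has expired show the laggy state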

Now we are experiencing slow ops even at much lower, read-only traffic of ~400 MiB/s.

Here is an example of a slow op on an OSD:
{
    "ops": [
        {
"description": "osd_op(client.255949991.0:92728602 4.d22s0 4:44b3390a:::1000b640ddc.0000039b:head [read 3633152~8192] snapc 0=[] ondisk+read+known_if_redirected e1117246)",
            "initiated_at": "2024-07-08T10:19:58.469537+0000",
            "age": 507.242936848,
            "duration": 507.24298854800003,
            "type_data": {
                "flag_point": "started",
                "client_info": {
                    "client": "client.255949991",
                    "client_addr": "x.x.x.x:0/887459214",
                    "tid": 92728602
                },
                "events": [
                    {
                        "event": "initiated",
                        "time": "2024-07-08T10:19:58.469537+0000",
                        "duration": 0
                    },
                    {
                        "event": "throttled",
                        "time": "2024-07-08T10:19:58.469537+0000",
                        "duration": 0
                    },
                    {
                        "event": "header_read",
                        "time": "2024-07-08T10:19:58.469535+0000",
                        "duration": 4294967295.9999981
                    },
                    {
                        "event": "all_read",
                        "time": "2024-07-08T10:19:58.469571+0000",
                        "duration": 3.5859999999999999e-05
                    },
                    {
                        "event": "dispatched",
                        "time": "2024-07-08T10:19:58.469573+0000",
                        "duration": 2.08e-06
                    },
                    {
                        "event": "queued_for_pg",
                        "time": "2024-07-08T10:19:58.469586+0000",
                        "duration": 1.2721000000000001e-05
                    },
                    {
                        "event": "reached_pg",
                        "time": "2024-07-08T10:19:58.485132+0000",
                        "duration": 0.015546048999999999
                    },
                    {
                        "event": "started",
                        "time": "2024-07-08T10:19:58.485147+0000",
                        "duration": 1.5160000000000001e-05
                    }
                ]
            }
        },
The HDD backing this OSD is not busy. The ARM cores on these servers are slow, but no process reaches 100% usage of a single core.
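
For completeness, the dump above and the load observations come from commands along these lines, run on the OSD's host (the OSD id is just an example):

    ceph daemon osd.123 dump_ops_in_flight   # ops dumps in the JSON format shown above
    iostat -x 5                              # HDD utilisation stays low
    top                                      # no single core pinned at 100%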

I think we may have the same issue as the one described here: https://www.mail-archive.com/ceph-users@ceph.io/msg13273.html

I've tried reducing osd_pool_default_read_lease_ratio from 0.8 to 0.2 and osd_heartbeat_grace from 20 to 10. That should lower the read_lease_interval from 16 s to 2 s, but it didn't help; we still see a lot of slow ops. The commands I used are roughly the ones sketched below.
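
Concretely, this is roughly what I ran (I'm not certain "global" is the right section for the ratio option, so treat the targeting as my assumption):

    ceph config set osd osd_heartbeat_grace 10
    ceph config set global osd_pool_default_read_lease_ratio 0.2   # section targeting is my assumption
    ceph config show osd.0 osd_heartbeat_grace                     # check what a daemon actually runs with

My understanding is that read_lease_interval = osd_heartbeat_grace * osd_pool_default_read_lease_ratio, so 20 * 0.8 = 16 s before the change and 10 * 0.2 = 2 s after.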

Could you give me any tips on what I could tune to fix this issue?

Could this be an issue with a large number of EC PGs on a large cluster with weak CPUs?

Best regards
Adam Prycki
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
