Hello,

We are having issues with slow ops on our large ARM HPC Ceph cluster.

The cluster runs Ceph 18.2.0 on Ubuntu 20.04.
The MONs, MGRs and MDSs had to be moved to Intel servers because of poor single-core performance on our ARM servers. Our main CephFS data pool spans 54 servers in 9 racks with 1458 HDDs in total (OSDs without block.db on SSD). The CephFS data pool is erasure coded with k=6, m=2 and rack as the failure domain. The pool has about 16k PGs, with an average of ~90 PGs per OSD.
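
For reference, the pool was set up roughly along these lines (the profile, pool and filesystem names here are just placeholders, not our exact ones):

    ceph osd erasure-code-profile set ec62_rack k=6 m=2 crush-failure-domain=rack
    ceph osd pool create cephfs_data 16384 16384 erasure ec62_rack
    ceph osd pool set cephfs_data allow_ec_overwrites true
    ceph fs add_data_pool cephfs cephfs_data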

We have had good experience with EC CephFS on an Intel Ceph cluster about 3.5 times smaller, but this ARM deployment is becoming problematic. The issues started when one of the users began generating sequential read/write traffic at about 5 GiB/s. A single OSD with slow ops was enough to create a laggy PG and crash the application generating that traffic. We have even had a case where an OSD with slow ops stayed laggy for 6 hours and required a manual restart.
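
When this happens, this is roughly how we notice it on the cluster side (a sketch, not exact output):

    ceph health detail                 # SLOW_OPS warnings show up here
    ceph pg dump pgs | grep -i laggy   # PGs whose read lease has expired show the laggy state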

Now we are experiencing slow ops even at much lower, read-only traffic of ~400 MiB/s.

Here is an example of a slow op on an OSD:
{
    "ops": [
        {
"description": "osd_op(client.255949991.0:92728602 4.d22s0 4:44b3390a:::1000b640ddc.0000039b:head [read 3633152~8192] snapc 0=[] ondisk+read+known_if_redirected e1117246)",
            "initiated_at": "2024-07-08T10:19:58.469537+0000",
            "age": 507.242936848,
            "duration": 507.24298854800003,
            "type_data": {
                "flag_point": "started",
                "client_info": {
                    "client": "client.255949991",
                    "client_addr": "x.x.x.x:0/887459214",
                    "tid": 92728602
                },
                "events": [
                    {
                        "event": "initiated",
                        "time": "2024-07-08T10:19:58.469537+0000",
                        "duration": 0
                    },
                    {
                        "event": "throttled",
                        "time": "2024-07-08T10:19:58.469537+0000",
                        "duration": 0
                    },
                    {
                        "event": "header_read",
                        "time": "2024-07-08T10:19:58.469535+0000",
                        "duration": 4294967295.9999981
                    },
                    {
                        "event": "all_read",
                        "time": "2024-07-08T10:19:58.469571+0000",
                        "duration": 3.5859999999999999e-05
                    },
                    {
                        "event": "dispatched",
                        "time": "2024-07-08T10:19:58.469573+0000",
                        "duration": 2.08e-06
                    },
                    {
                        "event": "queued_for_pg",
                        "time": "2024-07-08T10:19:58.469586+0000",
                        "duration": 1.2721000000000001e-05
                    },
                    {
                        "event": "reached_pg",
                        "time": "2024-07-08T10:19:58.485132+0000",
                        "duration": 0.015546048999999999
                    },
                    {
                        "event": "started",
                        "time": "2024-07-08T10:19:58.485147+0000",
                        "duration": 1.5160000000000001e-05
                    }
                ]
            }
        },
The HDD backing this OSD is not busy. The ARM cores on these servers are slow, but no process reaches 100% usage of a single core.
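
For completeness, the dump above and the load observations come from commands along these lines, run on the OSD's host (the OSD id is just an example):

    ceph daemon osd.123 dump_ops_in_flight   # ops dumps in the JSON format shown above
    iostat -x 5                              # HDD utilisation stays low
    top                                      # no single core pinned at 100%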

I think we may have the same issue as the one described here: https://www.mail-archive.com/ceph-users@ceph.io/msg13273.html

I've tried reducing osd_pool_default_read_lease_ratio from 0.8 to 0.2 and osd_heartbeat_grace from 20 to 10. That should lower the read_lease_interval from 16 s to 2 s, but it didn't help; we still see a lot of slow ops. The commands I used are roughly the ones sketched below.
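
Concretely, this is roughly what I ran (I'm not certain "global" is the right section for the ratio option, so treat the targeting as my assumption):

    ceph config set osd osd_heartbeat_grace 10
    ceph config set global osd_pool_default_read_lease_ratio 0.2   # section targeting is my assumption
    ceph config show osd.0 osd_heartbeat_grace                     # check what a daemon actually runs with

My understanding is that read_lease_interval = osd_heartbeat_grace * osd_pool_default_read_lease_ratio, so 20 * 0.8 = 16 s before the change and 10 * 0.2 = 2 s after.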

Could you give me any tips on what I could tune to fix this issue?

Could this be an issue with a large number of EC PGs on a large cluster with weak CPUs?

Best regards
Adam Prycki
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
