Hi

 

We are experiencing an annoying problem where scrubs make OSDs flap down
and leave the Ceph cluster unusable for a couple of minutes.

 

Our cluster consists of three nodes connected with 40Gbit InfiniBand using
IPoIB, each with 2x 6-core X5670 CPUs and 64GB of memory.

Each node has 6 SSDs holding the journals for 12 OSDs on 2TB disks (fast
pools), plus another 12 OSDs on 4TB disks (archive pools) whose journals
are on the same disk.

 

It seems that our cluster is constantly scrubbing; we rarely see only
active+clean. Below is the status at the moment.

 

    cluster a2974742-3805-4cd3-bc79-765f2bddaefe

     health HEALTH_OK

     monmap e16: 4 mons at {lb1=10.20.60.1:6789/0,lb2=10.20.60.2:6789/0,nc1=10.20.50.2:6789/0,nc2=10.20.50.3:6789/0}

            election epoch 1838, quorum 0,1,2,3 nc1,nc2,lb1,lb2

     mdsmap e7901: 1/1/1 up {0=lb1=up:active}, 4 up:standby

     osdmap e104824: 72 osds: 72 up, 72 in

      pgmap v12941402: 5248 pgs, 9 pools, 19644 GB data, 4810 kobjects

            59067 GB used, 138 TB / 196 TB avail

                5241 active+clean

                   7 active+clean+scrubbing
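
For reference, this is roughly how we check which PGs are scrubbing at any
given moment (just grepping the state column of pg dump; ceph -w to watch
state changes live):

    # list PGs whose state includes "scrubbing", with their acting OSDs
    ceph pg dump 2>/dev/null | grep scrubbing

    # or follow cluster state changes as they happen
    ceph -w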

 

When the OSDs flap, the load on a node first goes high during scrubbing;
after that some OSDs go down, and about 30 seconds later they are back up.
They are not really going down, they are only being marked down. It then
takes a couple of minutes for everything to be OK again.

 

Any suggestions on how to fix this? We can't go to production while this
behavior exists.
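
To double-check that scrubbing is really the trigger, we were planning to
pause it cluster-wide and watch whether the flapping stops; a minimal
sketch using the standard OSD flags:

    # prevent new scrubs and deep scrubs from starting; running ones finish
    ceph osd set noscrub
    ceph osd set nodeep-scrub

    # ... watch whether OSDs still get marked down ...
    # affected OSDs log "wrongly marked me down" when they were only
    # marked down by their peers rather than actually crashing

    ceph osd unset noscrub
    ceph osd unset nodeep-scrub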

 

Our config is below:

 

[global]

fsid = a2974742-3805-4cd3-bc79-765f2bddaefe

mon_initial_members = lb1,lb2,nc1,nc2

mon_host = 10.20.60.1,10.20.60.2,10.20.50.2,10.20.50.3

auth_cluster_required = cephx

auth_service_required = cephx

auth_client_required = cephx

filestore_xattr_use_omap = true

 

osd pool default pg num = 128

osd pool default pgp num = 128

 

public network = 10.20.0.0/16

 

osd_op_threads = 12

osd_op_num_threads_per_shard = 2

osd_op_num_shards = 6

#osd_op_num_sharded_pool_threads = 25

filestore_op_threads = 12

ms_nocrc = true

filestore_fd_cache_size = 64

filestore_fd_cache_shards = 32

ms_dispatch_throttle_bytes = 0

throttler_perf_counter = false

 

mon osd min down reporters = 25

 

[osd]

osd scrub max interval = 1209600

osd scrub min interval = 604800

osd scrub load threshold = 3.0

osd max backfills = 1

osd recovery max active = 1

# IO Scheduler settings

osd scrub sleep = 1.0

osd disk thread ioprio class = idle

osd disk thread ioprio priority = 7

osd scrub chunk max = 1

osd scrub chunk min = 1

osd deep scrub stride = 1048576

filestore queue max ops = 10000

filestore max sync interval = 30

filestore min sync interval = 29

 

osd deep scrub interval = 2592000

osd heartbeat grace = 240

osd heartbeat interval = 12

osd mon report interval max = 120

osd mon report interval min = 5

osd_client_message_size_cap = 0

osd_client_message_cap = 0

osd_enable_op_tracker = false

osd crush update on start = false

 

[client]

        rbd cache = true

        rbd cache size = 67108864 # 64 MB

        rbd cache max dirty = 50331648 # 48 MB

        rbd cache target dirty = 33554432 # 32 MB

        rbd cache writethrough until flush = true # the default

        rbd cache max dirty age = 2

        admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
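
In case it matters, we apply scrub-related changes like the above to
running OSDs with injectargs before persisting them in ceph.conf; a sketch
(option names as in the [osd] section above):

    # push a setting to all OSDs at runtime, no restart needed
    ceph tell osd.* injectargs '--osd_scrub_sleep 1.0'

    # verify on one daemon (run on the host where osd.0 lives)
    ceph daemon osd.0 config show | grep scrub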

 

 

Br,

Tuomas
