ceph osd pool application enable XXX rbd
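
For example, assuming the two pools flagged in the HEALTH_WARN below are ssdpool and scbench256:

ceph osd pool application enable ssdpool rbd
ceph osd pool application enable scbench256 rbd

That should clear the "application not enabled on 2 pool(s)" warning.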

-----Original Message-----
From: Steven Vacaroaia [mailto:ste...@gmail.com] 
Sent: woensdag 24 januari 2018 19:47
To: David Turner
Cc: ceph-users
Subject: Re: [ceph-users] Luminous - bad performance

Hi ,

I have bonded the public NICs and added 2 more monitors (running on 2 of the 3 OSD hosts). This seems to improve things, but I still have high latency. Also, performance of the SSD pool is worse than that of the HDD pool, which is very confusing.

The SSD pool is using one Toshiba PX05SMB040Y per server (for a total of 3 OSDs), while the HDD pool is using 2 Seagate ST600MM0006 disks per server (for a total of 6 OSDs).

Note:
I have also disabled C-states in the BIOS and added "intel_pstate=disable intel_idle.max_cstate=0 processor.max_cstate=0 idle=poll" to GRUB.

Any hints/suggestions will be greatly appreciated 

[root@osd04 ~]# ceph status
  cluster:
    id:     37161a51-a159-4895-a7fd-3b0d857f1b66
    health: HEALTH_WARN
            noscrub,nodeep-scrub flag(s) set
            application not enabled on 2 pool(s)
            mon osd02 is low on available space

  services:
    mon:         3 daemons, quorum osd01,osd02,mon01
    mgr:         mon01(active)
    osd:         9 osds: 9 up, 9 in
                 flags noscrub,nodeep-scrub
    tcmu-runner: 6 daemons active

  data:
    pools:   2 pools, 228 pgs
    objects: 50384 objects, 196 GB
    usage:   402 GB used, 3504 GB / 3906 GB avail
    pgs:     228 active+clean

  io:
    client:   46061 kB/s rd, 852 B/s wr, 15 op/s rd, 0 op/s wr

[root@osd04 ~]# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF
 -9       4.50000 root ssds
-10       1.50000     host osd01-ssd
  6   hdd 1.50000         osd.6          up  1.00000 1.00000
-11       1.50000     host osd02-ssd
  7   hdd 1.50000         osd.7          up  1.00000 1.00000
-12       1.50000     host osd04-ssd
  8   hdd 1.50000         osd.8          up  1.00000 1.00000
 -1       2.72574 root default
 -3       1.09058     host osd01
  0   hdd 0.54529         osd.0          up  1.00000 1.00000
  4   hdd 0.54529         osd.4          up  1.00000 1.00000
 -5       1.09058     host osd02
  1   hdd 0.54529         osd.1          up  1.00000 1.00000
  3   hdd 0.54529         osd.3          up  1.00000 1.00000
 -7       0.54459     host osd04
  2   hdd 0.27229         osd.2          up  1.00000 1.00000
  5   hdd 0.27229         osd.5          up  1.00000 1.00000


rados bench -p ssdpool 300 -t 32 write --no-cleanup && rados bench -p ssdpool 300 -t 32 seq

Total time run:         302.058832
Total writes made:      4100
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     54.2941
Stddev Bandwidth:       70.3355
Max bandwidth (MB/sec): 252
Min bandwidth (MB/sec): 0
Average IOPS:           13
Stddev IOPS:            17
Max IOPS:               63
Min IOPS:               0
Average Latency(s):     2.35655
Stddev Latency(s):      4.4346
Max latency(s):         29.7027
Min latency(s):         0.045166

rados bench -p rbd 300 -t 32 write --no-cleanup && rados bench -p rbd 300 -t 32 seq
Total time run:         301.428571
Total writes made:      8753
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     116.154
Stddev Bandwidth:       71.5763
Max bandwidth (MB/sec): 320
Min bandwidth (MB/sec): 0
Average IOPS:           29
Stddev IOPS:            17
Max IOPS:               80
Min IOPS:               0
Average Latency(s):     1.10189
Stddev Latency(s):      1.80203
Max latency(s):         15.0715
Min latency(s):         0.0210309




[root@osd04 ~]# ethtool -k gth0
Features for gth0:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: on [fixed]
        tx-checksum-sctp: on
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: on [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off
hw-tc-offload: off
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]



On 22 January 2018 at 12:09, Steven Vacaroaia <ste...@gmail.com> wrote:


        Hi David,

        I noticed the public interface of the server I am running the test from is heavily used, so I will bond that one too.
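
        (A minimal CentOS 7 sketch of an LACP bond matching the BONDING_OPTS mentioned further down the thread; device names and the address are placeholders:)

        # /etc/sysconfig/network-scripts/ifcfg-bond1
        DEVICE=bond1
        TYPE=Bond
        BONDING_MASTER=yes
        BONDING_OPTS="mode=4 miimon=100 xmit_hash_policy=1 lacp_rate=1"
        IPADDR=10.10.30.x
        PREFIX=24
        ONBOOT=yes

        # /etc/sysconfig/network-scripts/ifcfg-em1  (one per slave NIC)
        DEVICE=em1
        MASTER=bond1
        SLAVE=yes
        ONBOOT=yes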

        I doubt though that this explains the poor performance

        Thanks for your advice 
        

        Steven



        On 22 January 2018 at 12:02, David Turner <drakonst...@gmail.com> wrote:
        

                I'm not speaking to anything other than your configuration.

                "I am using 2 x 10 GB bonded ( BONDING_OPTS="mode=4 miimon=100 xmit_hash_policy=1 lacp_rate=1")  for cluster and 1 x 1GB for public"

                It might not be a bad idea for you to forgo the public network on the 1Gb interfaces and either put everything on one network or use VLANs on the 10Gb connections.  I lean more towards that in particular because your public network doesn't have a bond on it.  Just as a note, communication between the OSDs and the MONs is all done on the public network.  If that interface goes down, then the OSDs are likely to be marked down/out from your cluster.  I'm a fan of VLANs, but if you don't have the equipment or expertise to go that route, then just using the same subnet for public and private is a decent way to go.
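
                (A sketch of the VLAN-on-bond variant on CentOS 7; the VLAN IDs and addresses here are placeholders, and the switch ports would need matching trunk config:)

                # /etc/sysconfig/network-scripts/ifcfg-bond0.100  (public)
                DEVICE=bond0.100
                VLAN=yes
                IPADDR=10.10.30.1
                PREFIX=24
                ONBOOT=yes

                # /etc/sysconfig/network-scripts/ifcfg-bond0.200  (cluster)
                DEVICE=bond0.200
                VLAN=yes
                IPADDR=192.168.0.1
                PREFIX=24
                ONBOOT=yes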
                

                On Mon, Jan 22, 2018 at 11:37 AM Steven Vacaroaia <ste...@gmail.com> wrote:
                

                        I did test with rados bench ... here are the results

                        rados bench -p ssdpool 300 -t 12 write --no-cleanup && rados bench -p ssdpool 300 -t 12 seq

                        Total time run:         300.322608
                        Total writes made:      10632
                        Write size:             4194304
                        Object size:            4194304
                        Bandwidth (MB/sec):     141.608
                        Stddev Bandwidth:       74.1065
                        Max bandwidth (MB/sec): 264
                        Min bandwidth (MB/sec): 0
                        Average IOPS:           35
                        Stddev IOPS:            18
                        Max IOPS:               66
                        Min IOPS:               0
                        Average Latency(s):     0.33887
                        Stddev Latency(s):      0.701947
                        Max latency(s):         9.80161
                        Min latency(s):         0.015171

                        Total time run:       300.829945
                        Total reads made:     10070
                        Read size:            4194304
                        Object size:          4194304
                        Bandwidth (MB/sec):   133.896
                        Average IOPS:         33
                        Stddev IOPS:          14
                        Max IOPS:             68
                        Min IOPS:             3
                        Average Latency(s):   0.35791
                        Max latency(s):       4.68213
                        Min latency(s):       0.0107572


                        rados bench -p scbench256 300 -t 12 write --no-cleanup && rados bench -p scbench256 300 -t 12 seq

                        Total time run:         300.747004
                        Total writes made:      10239
                        Write size:             4194304
                        Object size:            4194304
                        Bandwidth (MB/sec):     136.181
                        Stddev Bandwidth:       75.5
                        Max bandwidth (MB/sec): 272
                        Min bandwidth (MB/sec): 0
                        Average IOPS:           34
                        Stddev IOPS:            18
                        Max IOPS:               68
                        Min IOPS:               0
                        Average Latency(s):     0.352339
                        Stddev Latency(s):      0.72211
                        Max latency(s):         9.62304
                        Min latency(s):         0.00936316
                        hints = 1


                        Total time run:       300.610761
                        Total reads made:     7628
                        Read size:            4194304
                        Object size:          4194304
                        Bandwidth (MB/sec):   101.5
                        Average IOPS:         25
                        Stddev IOPS:          11
                        Max IOPS:             61
                        Min IOPS:             0
                        Average Latency(s):   0.472321
                        Max latency(s):       15.636
                        Min latency(s):       0.0188098


                        On 22 January 2018 at 11:34, Steven Vacaroaia <ste...@gmail.com> wrote:

                                Sorry ... sent the message too soon.
                                Here is more info:

                                Vendor Id          : SEAGATE
                                Product Id         : ST600MM0006
                                State              : Online
                                Disk Type          : SAS, Hard Disk Device
                                Capacity           : 558.375 GB
                                Power State        : Active

                                ( SSD is in slot 0)

                                 megacli -LDGetProp  -Cache -LALL -a0

                                Adapter 0-VD 0(target id: 0): Cache Policy:WriteThrough, ReadAheadNone, Direct, No Write Cache if bad BBU
                                Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
                                Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
                                Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
                                Adapter 0-VD 4(target id: 4): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
                                Adapter 0-VD 5(target id: 5): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU

                                [root@osd01 ~]# megacli -LDGetProp -DskCache -LALL -a0

                                Adapter 0-VD 0(target id: 0): Disk Write Cache : Disabled
                                Adapter 0-VD 1(target id: 1): Disk Write Cache : Disk's Default
                                Adapter 0-VD 2(target id: 2): Disk Write Cache : Disk's Default
                                Adapter 0-VD 3(target id: 3): Disk Write Cache : Disk's Default
                                Adapter 0-VD 4(target id: 4): Disk Write Cache : Disk's Default
                                Adapter 0-VD 5(target id: 5): Disk Write Cache : Disk's Default
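
                                (Worth noting: VD 0, the SSD, is the only VD set to WriteThrough with its disk cache disabled. If you want to try matching the HDD VDs, something like this should work with MegaCli, assuming the controller allows it; keep the on-disk cache off if the drive lacks power-loss protection:)

                                megacli -LDSetProp WB -L0 -a0           # write-back cache policy
                                megacli -LDSetProp ADRA -L0 -a0         # adaptive read-ahead
                                megacli -LDSetProp -EnDskCache -L0 -a0  # enable the drive's write cache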


                                CPU
                                Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
                                

                                CentOS 7 kernel 3.10.0-693.11.6.el7.x86_64

                                sysctl -p
                                net.ipv4.tcp_sack = 0
                                net.core.netdev_budget = 600
                                net.ipv4.tcp_window_scaling = 1
                                net.core.rmem_max = 16777216
                                net.core.wmem_max = 16777216
                                net.core.rmem_default = 16777216
                                net.core.wmem_default = 16777216
                                net.core.optmem_max = 40960
                                net.ipv4.tcp_rmem = 4096 87380 16777216
                                net.ipv4.tcp_wmem = 4096 65536 16777216
                                net.ipv4.tcp_syncookies = 0
                                net.core.somaxconn = 1024
                                net.core.netdev_max_backlog = 20000
                                net.ipv4.tcp_max_syn_backlog = 30000
                                net.ipv4.tcp_max_tw_buckets = 2000000
                                net.ipv4.tcp_tw_reuse = 1
                                net.ipv4.tcp_slow_start_after_idle = 0
                                net.ipv4.conf.all.send_redirects = 0
                                net.ipv4.conf.all.accept_redirects = 0
                                net.ipv4.conf.all.accept_source_route = 0
                                vm.min_free_kbytes = 262144
                                vm.swappiness = 0
                                vm.vfs_cache_pressure = 100
                                fs.suid_dumpable = 0
                                kernel.core_uses_pid = 1
                                kernel.msgmax = 65536
                                kernel.msgmnb = 65536
                                kernel.randomize_va_space = 1
                                kernel.sysrq = 0
                                kernel.pid_max = 4194304
                                fs.file-max = 100000


                                ceph.conf


                                public_network = 10.10.30.0/24
                                cluster_network = 192.168.0.0/24


                                osd_op_num_threads_per_shard = 2
                                osd_op_num_shards = 25
                                osd_pool_default_size = 2
                                osd_pool_default_min_size = 1 # Allow writing 1 copy in a degraded state
                                osd_pool_default_pg_num = 256
                                osd_pool_default_pgp_num = 256
                                osd_crush_chooseleaf_type = 1
                                osd_scrub_load_threshold = 0.01
                                osd_scrub_min_interval = 137438953472
                                osd_scrub_max_interval = 137438953472
                                osd_deep_scrub_interval = 137438953472
                                osd_max_scrubs = 16
                                osd_op_threads = 8
                                osd_max_backfills = 1
                                osd_recovery_max_active = 1
                                osd_recovery_op_priority = 1




                                debug_lockdep = 0/0
                                debug_context = 0/0
                                debug_crush = 0/0
                                debug_buffer = 0/0
                                debug_timer = 0/0
                                debug_filer = 0/0
                                debug_objecter = 0/0
                                debug_rados = 0/0
                                debug_rbd = 0/0
                                debug_journaler = 0/0
                                debug_objectcatcher = 0/0
                                debug_client = 0/0
                                debug_osd = 0/0
                                debug_optracker = 0/0
                                debug_objclass = 0/0
                                debug_filestore = 0/0
                                debug_journal = 0/0
                                debug_ms = 0/0
                                debug_monc = 0/0
                                debug_tp = 0/0
                                debug_auth = 0/0
                                debug_finisher = 0/0
                                debug_heartbeatmap = 0/0
                                debug_perfcounter = 0/0
                                debug_asok = 0/0
                                debug_throttle = 0/0
                                debug_mon = 0/0
                                debug_paxos = 0/0
                                debug_rgw = 0/0


                                [mon]
                                mon_allow_pool_delete = true

                                [osd]
                                osd_heartbeat_grace = 20
                                osd_heartbeat_interval = 5
                                bluestore_block_db_size = 16106127360
                                bluestore_block_wal_size = 1073741824
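                                (That is 15 GiB for the DB and 1 GiB for the WAL per OSD: 16106127360 = 15 x 2^30, 1073741824 = 2^30.)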

                                [osd.6]
                                host = osd01
                                osd_journal = /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.1d58775a-5019-42ea-8149-a126f51a2501
                                crush_location = root=ssds host=osd01-ssd

                                [osd.7]
                                host = osd02
                                osd_journal = /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.683dc52d-5d69-4ff0-b5d9-b17056a55681
                                crush_location = root=ssds host=osd02-ssd

                                [osd.8]
                                host = osd04
                                osd_journal = /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.bd7c0088-b724-441e-9b88-9457305c541d
                                crush_location = root=ssds host=osd04-ssd


                                On 22 January 2018 at 11:29, Steven Vacaroaia <ste...@gmail.com> wrote:
                                

                                        Hi David,

                                        Yes, I meant no separate partitions for WAL and DB.

                                        I am using 2 x 10 Gb bonded (BONDING_OPTS="mode=4 miimon=100 xmit_hash_policy=1 lacp_rate=1") for cluster and 1 x 1 Gb for public.
                                        Disks are:
                                        Vendor Id          : TOSHIBA
                                        Product Id         : PX05SMB040Y
                                        State              : Online
                                        Disk Type          : SAS, Solid State Device
                                        Capacity           : 372.0 GB


                                        On 22 January 2018 at 11:24, David Turner <drakonst...@gmail.com> wrote:
                                        

                                                Disk models, other hardware information including CPU, network config?  You say you're using Luminous, but then say journal on same device.  I'm assuming you mean that you just have the bluestore OSD configured without a separate WAL or DB partition?  Any more specifics you can give will be helpful.

                                                On Mon, Jan 22, 2018 at 11:20 AM Steven Vacaroaia <ste...@gmail.com> wrote:
                                                

                                                        Hi,

                                                        I'd appreciate it if you could provide some guidance / suggestions regarding performance issues on a test cluster (3 x DELL R620, 1 enterprise SSD, 3 x 600 GB enterprise HDD, 8 cores, 64 GB RAM).

                                                        I created 2 pools (replication factor 2), one with only SSDs and the other with only HDDs (journal on the same disk for both).

                                                        The performance is quite similar, although I was expecting it to be at least 5 times better.
                                                        No issues noticed using atop.

                                                        What should I check / tune?

                                                        Many thanks
                                                        Steven



                                                        HDD based pool (journal on the same disk)

                                                        ceph osd pool get scbench256 all

                                                        size: 2
                                                        min_size: 1
                                                        crash_replay_interval: 0
                                                        pg_num: 256
                                                        pgp_num: 256
                                                        crush_rule: replicated_rule
                                                        hashpspool: true
                                                        nodelete: false
                                                        nopgchange: false
                                                        nosizechange: false
                                                        write_fadvise_dontneed: false
                                                        noscrub: false
                                                        nodeep-scrub: false
                                                        use_gmt_hitset: 1
                                                        auid: 0
                                                        fast_read: 0


                                                        rbd bench --io-type write image1 --pool=scbench256
                                                        bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern sequential
                                                          SEC       OPS   OPS/SEC   BYTES/SEC
                                                            1     46816  46836.46  191842139.78
                                                            2     90658  45339.11  185709011.80
                                                            3    133671  44540.80  182439126.08
                                                            4    177341  44340.36  181618100.14
                                                            5    217300  43464.04  178028704.54
                                                            6    259595  42555.85  174308767.05
                                                        elapsed:     6  ops:   262144  ops/sec: 42694.50  bytes/sec: 174876688.23

                                                        fio /home/cephuser/write_256.fio
                                                        write-4M: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
                                                        fio-2.2.8
                                                        Starting 1 process
                                                        rbd engine: RBD version: 1.12.0
                                                        Jobs: 1 (f=1): [r(1)] [100.0% done] [66284KB/0KB/0KB /s] [16.6K/0/0 iops] [eta 00m:00s]


                                                        fio /home/cephuser/write_256.fio
                                                        write-4M: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
                                                        fio-2.2.8
                                                        Starting 1 process
                                                        rbd engine: RBD version: 1.12.0
                                                        Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/14464KB/0KB /s] [0/3616/0 iops] [eta 00m:00s]
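
                                                        (The job file itself isn't shown in the thread; judging by the output above, /home/cephuser/write_256.fio probably resembles the sketch below. The clientname and rbdname values are guesses:)

                                                        [write-4M]
                                                        ioengine=rbd
                                                        clientname=admin
                                                        pool=scbench256
                                                        rbdname=image1
                                                        rw=write
                                                        bs=4k
                                                        iodepth=32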


                                                        SSD based pool

                                                        ceph osd pool get ssdpool all

                                                        size: 2
                                                        min_size: 1
                                                        crash_replay_interval: 0
                                                        pg_num: 128
                                                        pgp_num: 128
                                                        crush_rule: ssdpool
                                                        hashpspool: true
                                                        nodelete: false
                                                        nopgchange: false
                                                        nosizechange: false
                                                        write_fadvise_dontneed: 
false
                                                        noscrub: false
                                                        nodeep-scrub: false
                                                        use_gmt_hitset: 1
                                                        auid: 0
                                                        fast_read: 0

                                                        rbd -p ssdpool create --size 52100 image2

                                                        rbd bench --io-type write image2 --pool=ssdpool
                                                        bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern sequential
                                                          SEC       OPS   OPS/SEC   BYTES/SEC
                                                            1     42412  41867.57  171489557.93
                                                            2     78343  39180.86  160484805.88
                                                            3    118082  39076.48  160057256.16
                                                            4    155164  38683.98  158449572.38
                                                            5    192825  38307.59  156907885.84
                                                            6    230701  37716.95  154488608.16
                                                        elapsed:     7  ops:   262144  ops/sec: 36862.89  bytes/sec: 150990387.29


                                                        [root@osd01 ~]# fio /home/cephuser/write_256.fio
                                                        write-4M: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
                                                        fio-2.2.8
                                                        Starting 1 process
                                                        rbd engine: RBD version: 1.12.0
                                                        Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/20224KB/0KB /s] [0/5056/0 iops] [eta 00m:00s]


                                                        fio /home/cephuser/write_256.fio
                                                        write-4M: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
                                                        fio-2.2.8
                                                        Starting 1 process
                                                        rbd engine: RBD version: 1.12.0
                                                        Jobs: 1 (f=1): [r(1)] [100.0% done] [76096KB/0KB/0KB /s] [19.3K/0/0 iops] [eta 00m:00s]
                                                        







_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
