Hi,

Setting the pool application helped - the performance is not skewed anymore (i.e. the SSD pool is now better than the HDD pool). However, latency when using more threads is still very high.

I am getting 9.91 Gbits/sec when testing with iperf.

Not sure what else I should check. As always, your help will be greatly appreciated.
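For reference, the output below is from plain rados bench write runs that differ only in the thread count - the invocations were along these lines, with the pool name and duration as placeholders:

rados bench -p <pool> 60 -t 1 write --no-cleanup
rados bench -p <pool> 60 -t 32 write --no-cleanup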
Using 1 thread (-t 1):

Total writes made:      1627
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     108.387
Stddev Bandwidth:       9.75056
Max bandwidth (MB/sec): 128
Min bandwidth (MB/sec): 92
Average IOPS:           27
Stddev IOPS:            2
Max IOPS:               32
Min IOPS:               23
Average Latency(s):     0.0369025
Stddev Latency(s):      0.0161718
Max latency(s):         0.258894
Min latency(s):         0.0133281

Using 32 threads (-t 32):

Total writes made:      2348
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     143.244
Stddev Bandwidth:       124.265
Max bandwidth (MB/sec): 420
Min bandwidth (MB/sec): 0
Average IOPS:           35
Stddev IOPS:            31
Max IOPS:               105
Min IOPS:               0
Average Latency(s):     0.892837
Stddev Latency(s):      1.97054
Max latency(s):         14.0602
Min latency(s):         0.0250363

On 24 January 2018 at 15:03, Marc Roos <m.r...@f1-outsourcing.eu> wrote:
>
> ceph osd pool application enable XXX rbd
>
> -----Original Message-----
> From: Steven Vacaroaia [mailto:ste...@gmail.com]
> Sent: Wednesday, 24 January 2018 19:47
> To: David Turner
> Cc: ceph-users
> Subject: Re: [ceph-users] Luminous - bad performance
>
> Hi,
>
> I have bundled the public NICs and added 2 more monitors (running on 2 of the 3 OSD hosts). This seems to improve things, but I still have high latency. Also, performance of the SSD pool is worse than that of the HDD pool, which is very confusing.
>
> The SSD pool is using one Toshiba PX05SMB040Y per server (for a total of 3 OSDs), while the HDD pool is using 2 Seagate ST600MM0006 disks per server (for a total of 6 OSDs).
>
> Note: I have also disabled C-states in the BIOS and added "intel_pstate=disable intel_idle.max_cstate=0 processor.max_cstate=0 idle=poll" to GRUB.
>
> Any hints/suggestions will be greatly appreciated.
>
> [root@osd04 ~]# ceph status
>   cluster:
>     id:     37161a51-a159-4895-a7fd-3b0d857f1b66
>     health: HEALTH_WARN
>             noscrub,nodeep-scrub flag(s) set
>             application not enabled on 2 pool(s)
>             mon osd02 is low on available space
>
>   services:
>     mon:         3 daemons, quorum osd01,osd02,mon01
>     mgr:         mon01(active)
>     osd:         9 osds: 9 up, 9 in
>                  flags noscrub,nodeep-scrub
>     tcmu-runner: 6 daemons active
>
>   data:
>     pools:   2 pools, 228 pgs
>     objects: 50384 objects, 196 GB
>     usage:   402 GB used, 3504 GB / 3906 GB avail
>     pgs:     228 active+clean
>
>   io:
>     client: 46061 kB/s rd, 852 B/s wr, 15 op/s rd, 0 op/s wr
>
> [root@osd04 ~]# ceph osd tree
> ID  CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF
>  -9       4.50000 root ssds
> -10       1.50000     host osd01-ssd
>   6   hdd 1.50000         osd.6          up  1.00000 1.00000
> -11       1.50000     host osd02-ssd
>   7   hdd 1.50000         osd.7          up  1.00000 1.00000
> -12       1.50000     host osd04-ssd
>   8   hdd 1.50000         osd.8          up  1.00000 1.00000
>  -1       2.72574 root default
>  -3       1.09058     host osd01
>   0   hdd 0.54529         osd.0          up  1.00000 1.00000
>   4   hdd 0.54529         osd.4          up  1.00000 1.00000
>  -5       1.09058     host osd02
>   1   hdd 0.54529         osd.1          up  1.00000 1.00000
>   3   hdd 0.54529         osd.3          up  1.00000 1.00000
>  -7       0.54459     host osd04
>   2   hdd 0.27229         osd.2          up  1.00000 1.00000
>   5   hdd 0.27229         osd.5          up  1.00000 1.00000
>
> rados bench -p ssdpool 300 -t 32 write --no-cleanup && rados bench -p ssdpool 300 -t 32 seq
>
> Total time run:         302.058832
> Total writes made:      4100
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     54.2941
> Stddev Bandwidth:       70.3355
> Max bandwidth (MB/sec): 252
> Min bandwidth (MB/sec): 0
> Average IOPS:           13
> Stddev IOPS:            17
> Max IOPS:               63
> Min IOPS:               0
> Average Latency(s):     2.35655
> Stddev Latency(s):      4.4346
> Max latency(s):         29.7027
> Min latency(s):         0.045166
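> (A quick way to see whether a single OSD is dragging these averages up while the benchmark runs is the standard Ceph CLI command
>
> ceph osd perf
>
> which prints per-OSD commit/apply latency in milliseconds.)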
> rados bench -p rbd 300 -t 32 write --no-cleanup && rados bench -p rbd 300 -t 32 seq
>
> Total time run:         301.428571
> Total writes made:      8753
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     116.154
> Stddev Bandwidth:       71.5763
> Max bandwidth (MB/sec): 320
> Min bandwidth (MB/sec): 0
> Average IOPS:           29
> Stddev IOPS:            17
> Max IOPS:               80
> Min IOPS:               0
> Average Latency(s):     1.10189
> Stddev Latency(s):      1.80203
> Max latency(s):         15.0715
> Min latency(s):         0.0210309
>
> [root@osd04 ~]# ethtool -k gth0
> Features for gth0:
> rx-checksumming: on
> tx-checksumming: on
> tx-checksum-ipv4: off [fixed]
> tx-checksum-ip-generic: on
> tx-checksum-ipv6: off [fixed]
> tx-checksum-fcoe-crc: on [fixed]
> tx-checksum-sctp: on
> scatter-gather: on
> tx-scatter-gather: on
> tx-scatter-gather-fraglist: off [fixed]
> tcp-segmentation-offload: on
> tx-tcp-segmentation: on
> tx-tcp-ecn-segmentation: off [fixed]
> tx-tcp-mangleid-segmentation: off
> tx-tcp6-segmentation: on
> udp-fragmentation-offload: off [fixed]
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: off
> rx-vlan-offload: on
> tx-vlan-offload: on
> ntuple-filters: off
> receive-hashing: on
> highdma: on [fixed]
> rx-vlan-filter: on
> vlan-challenged: off [fixed]
> tx-lockless: off [fixed]
> netns-local: off [fixed]
> tx-gso-robust: off [fixed]
> tx-fcoe-segmentation: on [fixed]
> tx-gre-segmentation: on
> tx-gre-csum-segmentation: on
> tx-ipxip4-segmentation: on
> tx-ipxip6-segmentation: on
> tx-udp_tnl-segmentation: on
> tx-udp_tnl-csum-segmentation: on
> tx-gso-partial: on
> tx-sctp-segmentation: off [fixed]
> tx-esp-segmentation: off [fixed]
> fcoe-mtu: off [fixed]
> tx-nocache-copy: off
> loopback: off [fixed]
> rx-fcs: off [fixed]
> rx-all: off
> tx-vlan-stag-hw-insert: off [fixed]
> rx-vlan-stag-hw-parse: off [fixed]
> rx-vlan-stag-filter: off [fixed]
> l2-fwd-offload: off
> hw-tc-offload: off
> esp-hw-offload: off [fixed]
> esp-tx-csum-hw-offload: off [fixed]
>
> On 22 January 2018 at 12:09, Steven Vacaroaia <ste...@gmail.com> wrote:
>
> Hi David,
>
> I noticed the public interface of the server I am running the test from is heavily used, so I will bond that one too.
> I doubt, though, that this explains the poor performance.
>
> Thanks for your advice
>
> Steven
>
> On 22 January 2018 at 12:02, David Turner <drakonst...@gmail.com> wrote:
>
> I'm not speaking to anything other than your configuration.
>
> "I am using 2 x 10 GB bonded (BONDING_OPTS="mode=4 miimon=100 xmit_hash_policy=1 lacp_rate=1") for cluster and 1 x 1GB for public"
>
> It might not be a bad idea for you to forgo the public network on the 1Gb interfaces and either put everything on one network or use VLANs on the 10Gb connections. I lean more towards that in particular because your public network doesn't have a bond on it. Just as a note, communication between the OSDs and the MONs is all done on the public network. If that interface goes down, then the OSDs are likely to be marked down/out from your cluster. I'm a fan of VLANs, but if you don't have the equipment or expertise to go that route, then just using the same subnet for public and private is a decent way to go.
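> (A minimal sketch of what that single-network layout could look like in ceph.conf, assuming the bonded 10Gb subnet shown further down in this thread carries both roles - the subnet here is illustrative only:
>
> [global]
> # client-facing and replication traffic both on the bonded 10Gb interfaces
> public_network  = 192.168.0.0/24
> cluster_network = 192.168.0.0/24
>
> If cluster_network is simply omitted, Ceph uses the public network for replication traffic as well.)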
> On Mon, Jan 22, 2018 at 11:37 AM Steven Vacaroaia <ste...@gmail.com> wrote:
>
> I did test with rados bench; here are the results:
>
> rados bench -p ssdpool 300 -t 12 write --no-cleanup && rados bench -p ssdpool 300 -t 12 seq
>
> Total time run:         300.322608
> Total writes made:      10632
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     141.608
> Stddev Bandwidth:       74.1065
> Max bandwidth (MB/sec): 264
> Min bandwidth (MB/sec): 0
> Average IOPS:           35
> Stddev IOPS:            18
> Max IOPS:               66
> Min IOPS:               0
> Average Latency(s):     0.33887
> Stddev Latency(s):      0.701947
> Max latency(s):         9.80161
> Min latency(s):         0.015171
>
> Total time run:       300.829945
> Total reads made:     10070
> Read size:            4194304
> Object size:          4194304
> Bandwidth (MB/sec):   133.896
> Average IOPS:         33
> Stddev IOPS:          14
> Max IOPS:             68
> Min IOPS:             3
> Average Latency(s):   0.35791
> Max latency(s):       4.68213
> Min latency(s):       0.0107572
>
> rados bench -p scbench256 300 -t 12 write --no-cleanup && rados bench -p scbench256 300 -t 12 seq
>
> Total time run:         300.747004
> Total writes made:      10239
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     136.181
> Stddev Bandwidth:       75.5
> Max bandwidth (MB/sec): 272
> Min bandwidth (MB/sec): 0
> Average IOPS:           34
> Stddev IOPS:            18
> Max IOPS:               68
> Min IOPS:               0
> Average Latency(s):     0.352339
> Stddev Latency(s):      0.72211
> Max latency(s):         9.62304
> Min latency(s):         0.00936316
> hints = 1
>
> Total time run:       300.610761
> Total reads made:     7628
> Read size:            4194304
> Object size:          4194304
> Bandwidth (MB/sec):   101.5
> Average IOPS:         25
> Stddev IOPS:          11
> Max IOPS:             61
> Min IOPS:             0
> Average Latency(s):   0.472321
> Max latency(s):       15.636
> Min latency(s):       0.0188098
>
> On 22 January 2018 at 11:34, Steven Vacaroaia <ste...@gmail.com> wrote:
>
> Sorry, sent the message too soon. Here is more info:
>
> Vendor Id   : SEAGATE
> Product Id  : ST600MM0006
> State       : Online
> Disk Type   : SAS, Hard Disk Device
> Capacity    : 558.375 GB
> Power State : Active
>
> (SSD is in slot 0)
>
> megacli -LDGetProp -Cache -LALL -a0
>
> Adapter 0-VD 0(target id: 0): Cache Policy:WriteThrough, ReadAheadNone, Direct, No Write Cache if bad BBU
> Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
> Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
> Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
> Adapter 0-VD 4(target id: 4): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
> Adapter 0-VD 5(target id: 5): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
>
> [root@osd01 ~]# megacli -LDGetProp -DskCache -LALL -a0
>
> Adapter 0-VD 0(target id: 0): Disk Write Cache : Disabled
> Adapter 0-VD 1(target id: 1): Disk Write Cache : Disk's Default
> Adapter 0-VD 2(target id: 2): Disk Write Cache : Disk's Default
> Adapter 0-VD 3(target id: 3): Disk Write Cache : Disk's Default
> Adapter 0-VD 4(target id: 4): Disk Write Cache : Disk's Default
> Adapter 0-VD 5(target id: 5): Disk Write Cache : Disk's Default
>
> CPU: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
>
> CentOS 7, kernel 3.10.0-693.11.6.el7.x86_64
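> (Worth noting from the cache policies above: VD 0 - the SSD, per the slot-0 remark - is WriteThrough with its disk write cache disabled, while the HDD VDs are WriteBack. If the controller BBU is healthy and the trade-offs are acceptable, the policy can be changed per VD with something along these lines - MegaCli syntax varies between versions, so treat these as illustrative only:
>
> megacli -LDSetProp WB -L0 -a0
> megacli -LDSetProp -EnDskCache -L0 -a0
> )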
> sysctl -p
>
> net.ipv4.tcp_sack = 0
> net.core.netdev_budget = 600
> net.ipv4.tcp_window_scaling = 1
> net.core.rmem_max = 16777216
> net.core.wmem_max = 16777216
> net.core.rmem_default = 16777216
> net.core.wmem_default = 16777216
> net.core.optmem_max = 40960
> net.ipv4.tcp_rmem = 4096 87380 16777216
> net.ipv4.tcp_wmem = 4096 65536 16777216
> net.ipv4.tcp_syncookies = 0
> net.core.somaxconn = 1024
> net.core.netdev_max_backlog = 20000
> net.ipv4.tcp_max_syn_backlog = 30000
> net.ipv4.tcp_max_tw_buckets = 2000000
> net.ipv4.tcp_tw_reuse = 1
> net.ipv4.tcp_slow_start_after_idle = 0
> net.ipv4.conf.all.send_redirects = 0
> net.ipv4.conf.all.accept_redirects = 0
> net.ipv4.conf.all.accept_source_route = 0
> vm.min_free_kbytes = 262144
> vm.swappiness = 0
> vm.vfs_cache_pressure = 100
> fs.suid_dumpable = 0
> kernel.core_uses_pid = 1
> kernel.msgmax = 65536
> kernel.msgmnb = 65536
> kernel.randomize_va_space = 1
> kernel.sysrq = 0
> kernel.pid_max = 4194304
> fs.file-max = 100000
>
> ceph.conf:
>
> public_network = 10.10.30.0/24
> cluster_network = 192.168.0.0/24
>
> osd_op_num_threads_per_shard = 2
> osd_op_num_shards = 25
> osd_pool_default_size = 2
> osd_pool_default_min_size = 1    # Allow writing 1 copy in a degraded state
> osd_pool_default_pg_num = 256
> osd_pool_default_pgp_num = 256
> osd_crush_chooseleaf_type = 1
> osd_scrub_load_threshold = 0.01
> osd_scrub_min_interval = 137438953472
> osd_scrub_max_interval = 137438953472
> osd_deep_scrub_interval = 137438953472
> osd_max_scrubs = 16
> osd_op_threads = 8
> osd_max_backfills = 1
> osd_recovery_max_active = 1
> osd_recovery_op_priority = 1
>
> debug_lockdep = 0/0
> debug_context = 0/0
> debug_crush = 0/0
> debug_buffer = 0/0
> debug_timer = 0/0
> debug_filer = 0/0
> debug_objecter = 0/0
> debug_rados = 0/0
> debug_rbd = 0/0
> debug_journaler = 0/0
> debug_objectcatcher = 0/0
> debug_client = 0/0
> debug_osd = 0/0
> debug_optracker = 0/0
> debug_objclass = 0/0
> debug_filestore = 0/0
> debug_journal = 0/0
> debug_ms = 0/0
> debug_monc = 0/0
> debug_tp = 0/0
> debug_auth = 0/0
> debug_finisher = 0/0
> debug_heartbeatmap = 0/0
> debug_perfcounter = 0/0
> debug_asok = 0/0
> debug_throttle = 0/0
> debug_mon = 0/0
> debug_paxos = 0/0
> debug_rgw = 0/0
>
> [mon]
> mon_allow_pool_delete = true
>
> [osd]
> osd_heartbeat_grace = 20
> osd_heartbeat_interval = 5
> bluestore_block_db_size = 16106127360
> bluestore_block_wal_size = 1073741824
>
> [osd.6]
> host = osd01
> osd_journal = /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.1d58775a-5019-42ea-8149-a126f51a2501
> crush_location = root=ssds host=osd01-ssd
>
> [osd.7]
> host = osd02
> osd_journal = /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.683dc52d-5d69-4ff0-b5d9-b17056a55681
> crush_location = root=ssds host=osd02-ssd
>
> [osd.8]
> host = osd04
> osd_journal = /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.bd7c0088-b724-441e-9b88-9457305c541d
> crush_location = root=ssds host=osd04-ssd
>
> On 22 January 2018 at 11:29, Steven Vacaroaia <ste...@gmail.com> wrote:
>
> Hi David,
>
> Yes, I meant no separate partitions for WAL and DB.
>
> I am using 2 x 10 GB bonded (BONDING_OPTS="mode=4 miimon=100 xmit_hash_policy=1 lacp_rate=1") for cluster and 1 x 1GB for public.
>
> Disks are:
> Vendor Id   : TOSHIBA
> Product Id  : PX05SMB040Y
> State       : Online
> Disk Type   : SAS, Solid State Device
> Capacity    : 372.0 GB
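> (For completeness, an illustrative ifcfg sketch for that bond - device name, file path and IP are placeholders; only BONDING_OPTS is taken from above:
>
> # /etc/sysconfig/network-scripts/ifcfg-bond0
> DEVICE=bond0
> TYPE=Bond
> BONDING_MASTER=yes
> BONDING_OPTS="mode=4 miimon=100 xmit_hash_policy=1 lacp_rate=1"
> BOOTPROTO=none
> ONBOOT=yes
> IPADDR=192.168.0.x
> PREFIX=24
> )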
> On 22 January 2018 at 11:24, David Turner <drakonst...@gmail.com> wrote:
>
> Disk models, other hardware information including CPU, network config? You say you're using Luminous, but then say journal on same device. I'm assuming you mean that you just have the bluestore OSD configured without a separate WAL or DB partition? Any more specifics you can give will be helpful.
>
> On Mon, Jan 22, 2018 at 11:20 AM Steven Vacaroaia <ste...@gmail.com> wrote:
>
> Hi,
>
> I'll appreciate it if you can provide some guidance / suggestions regarding performance issues on a test cluster (3 x DELL R620, 1 enterprise SSD, 3 x 600 GB enterprise HDD, 8 cores, 64 GB RAM).
>
> I created 2 pools (replication factor 2), one with only SSDs and the other with only HDDs (journal on the same disk for both).
>
> The performance is quite similar, although I was expecting it to be at least 5 times better. No issues noticed using atop.
>
> What should I check / tune?
>
> Many thanks
> Steven
>
> HDD based pool (journal on the same disk)
>
> ceph osd pool get scbench256 all
>
> size: 2
> min_size: 1
> crash_replay_interval: 0
> pg_num: 256
> pgp_num: 256
> crush_rule: replicated_rule
> hashpspool: true
> nodelete: false
> nopgchange: false
> nosizechange: false
> write_fadvise_dontneed: false
> noscrub: false
> nodeep-scrub: false
> use_gmt_hitset: 1
> auid: 0
> fast_read: 0
>
> rbd bench --io-type write image1 --pool=scbench256
>
> bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern sequential
>   SEC       OPS   OPS/SEC     BYTES/SEC
>     1     46816  46836.46  191842139.78
>     2     90658  45339.11  185709011.80
>     3    133671  44540.80  182439126.08
>     4    177341  44340.36  181618100.14
>     5    217300  43464.04  178028704.54
>     6    259595  42555.85  174308767.05
> elapsed: 6  ops: 262144  ops/sec: 42694.50  bytes/sec: 174876688.23
>
> fio /home/cephuser/write_256.fio
> write-4M: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
> fio-2.2.8
> Starting 1 process
> rbd engine: RBD version: 1.12.0
> Jobs: 1 (f=1): [r(1)] [100.0% done] [66284KB/0KB/0KB /s] [16.6K/0/0 iops] [eta 00m:00s]
>
> fio /home/cephuser/write_256.fio
> write-4M: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
> fio-2.2.8
> Starting 1 process
> rbd engine: RBD version: 1.12.0
> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/14464KB/0KB /s] [0/3616/0 iops] [eta 00m:00s]
>
> SSD based pool
>
> ceph osd pool get ssdpool all
>
> size: 2
> min_size: 1
> crash_replay_interval: 0
> pg_num: 128
> pgp_num: 128
> crush_rule: ssdpool
> hashpspool: true
> nodelete: false
> nopgchange: false
> nosizechange: false
> write_fadvise_dontneed: false
> noscrub: false
> nodeep-scrub: false
> use_gmt_hitset: 1
> auid: 0
> fast_read: 0
>
> rbd -p ssdpool create --size 52100 image2
>
> rbd bench --io-type write image2 --pool=ssdpool
>
> bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern sequential
>   SEC       OPS   OPS/SEC     BYTES/SEC
>     1     42412  41867.57  171489557.93
>     2     78343  39180.86  160484805.88
>     3    118082  39076.48  160057256.16
>     4    155164  38683.98  158449572.38
>     5    192825  38307.59  156907885.84
>     6    230701  37716.95  154488608.16
> elapsed: 7  ops: 262144  ops/sec: 36862.89  bytes/sec: 150990387.29
>
> [root@osd01 ~]# fio /home/cephuser/write_256.fio
> write-4M: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
> fio-2.2.8
> Starting 1 process
> rbd engine: RBD version: 1.12.0
> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/20224KB/0KB /s] [0/5056/0 iops] [eta 00m:00s]
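> (The contents of write_256.fio are not shown in this thread; based on the fio output lines above it was presumably something along these lines - the pool, image and client names are guesses:
>
> [global]
> ioengine=rbd
> clientname=admin
> pool=scbench256
> rbdname=image1
> bs=4k
> iodepth=32
> direct=1
>
> [write-4M]
> rw=write
> )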
> fio /home/cephuser/write_256.fio
> write-4M: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
> fio-2.2.8
> Starting 1 process
> rbd engine: RBD version: 1.12.0
> Jobs: 1 (f=1): [r(1)] [100.0% done] [76096KB/0KB/0KB /s] [19.3K/0/0 iops] [eta 00m:00s]
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com