Hi David,

I noticed that the public interface of the server I am running the test from is
heavily used, so I will bond that one too.
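
For reference, the public bond would reuse the same LACP settings as the
cluster bond; on CentOS 7 that is roughly the following (the bond and slave
device names here are placeholders, not my actual interfaces):

# /etc/sysconfig/network-scripts/ifcfg-bond1   (hypothetical bond for the public network)
DEVICE=bond1
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=4 miimon=100 xmit_hash_policy=1 lacp_rate=1"
IPADDR=10.10.30.x
PREFIX=24
BOOTPROTO=none
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-em3   (one file per slave interface)
DEVICE=em3
MASTER=bond1
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes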

I doubt, though, that this explains the poor performance.

Thanks for your advice

Steven



On 22 January 2018 at 12:02, David Turner <drakonst...@gmail.com> wrote:

> I'm not speaking to anything other than your configuration.
>
> "I am using 2 x 10 GB bonded ( BONDING_OPTS="mode=4 miimon=100
> xmit_hash_policy=1 lacp_rate=1")  for cluster and 1 x 1GB for public"
> It might not be a bad idea for you to forgo the public network on the 1Gb
> interfaces and either put everything on one network or use VLANs on the
> 10Gb connections.  I lean towards that in particular because your
> public network doesn't have a bond on it.  Just as a note, communication
> between the OSDs and the MONs is done entirely on the public network.  If that
> interface goes down, then the OSDs are likely to be marked down/out in
> your cluster.  I'm a fan of VLANs, but if you don't have the equipment or
> expertise to go that route, then just using the same subnet for the public and
> cluster networks is a decent way to go.
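>
> For example, putting everything on the bonded 10Gb network can be as simple
> as pointing both networks at the same subnet in ceph.conf (the subnet below
> is just an illustration; use whatever the bond sits on), then restarting the
> daemons:
>
> public_network  = 192.168.0.0/24
> cluster_network = 192.168.0.0/24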
>
> On Mon, Jan 22, 2018 at 11:37 AM Steven Vacaroaia <ste...@gmail.com>
> wrote:
>
>> I did test with rados bench; here are the results:
>>
>> rados bench -p ssdpool 300 -t 12 write --no-cleanup && rados bench -p
>> ssdpool 300 -t 12  seq
>>
>> Total time run:         300.322608
>> Total writes made:      10632
>> Write size:             4194304
>> Object size:            4194304
>> Bandwidth (MB/sec):     141.608
>> Stddev Bandwidth:       74.1065
>> Max bandwidth (MB/sec): 264
>> Min bandwidth (MB/sec): 0
>> Average IOPS:           35
>> Stddev IOPS:            18
>> Max IOPS:               66
>> Min IOPS:               0
>> Average Latency(s):     0.33887
>> Stddev Latency(s):      0.701947
>> Max latency(s):         9.80161
>> Min latency(s):         0.015171
>>
>> Total time run:       300.829945
>> Total reads made:     10070
>> Read size:            4194304
>> Object size:          4194304
>> Bandwidth (MB/sec):   133.896
>> Average IOPS:         33
>> Stddev IOPS:          14
>> Max IOPS:             68
>> Min IOPS:             3
>> Average Latency(s):   0.35791
>> Max latency(s):       4.68213
>> Min latency(s):       0.0107572
>>
>>
>> rados bench -p scbench256 300 -t 12 write --no-cleanup && rados bench -p
>> scbench256 300 -t 12  seq
>>
>> Total time run:         300.747004
>> Total writes made:      10239
>> Write size:             4194304
>> Object size:            4194304
>> Bandwidth (MB/sec):     136.181
>> Stddev Bandwidth:       75.5
>> Max bandwidth (MB/sec): 272
>> Min bandwidth (MB/sec): 0
>> Average IOPS:           34
>> Stddev IOPS:            18
>> Max IOPS:               68
>> Min IOPS:               0
>> Average Latency(s):     0.352339
>> Stddev Latency(s):      0.72211
>> Max latency(s):         9.62304
>> Min latency(s):         0.00936316
>> hints = 1
>>
>>
>> Total time run:       300.610761
>> Total reads made:     7628
>> Read size:            4194304
>> Object size:          4194304
>> Bandwidth (MB/sec):   101.5
>> Average IOPS:         25
>> Stddev IOPS:          11
>> Max IOPS:             61
>> Min IOPS:             0
>> Average Latency(s):   0.472321
>> Max latency(s):       15.636
>> Min latency(s):       0.0188098
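>>
>> Since the writes were run with --no-cleanup, the benchmark objects are still
>> in the pools; I'll remove them afterwards with something like:
>>
>> rados -p ssdpool cleanup && rados -p scbench256 cleanup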
>>
>>
>> On 22 January 2018 at 11:34, Steven Vacaroaia <ste...@gmail.com> wrote:
>>
>>> Sorry, sent the message too soon.
>>> Here is more info:
>>> Vendor Id          : SEAGATE
>>>                 Product Id         : ST600MM0006
>>>                 State              : Online
>>>                 Disk Type          : SAS,Hard Disk Device
>>>                 Capacity           : 558.375 GB
>>>                 Power State        : Active
>>>
>>> ( SSD is in slot 0)
>>>
>>>  megacli -LDGetProp  -Cache -LALL -a0
>>>
>>> Adapter 0-VD 0(target id: 0): Cache Policy:WriteThrough, ReadAheadNone,
>>> Direct, No Write Cache if bad BBU
>>> Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAdaptive,
>>> Direct, No Write Cache if bad BBU
>>> Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAdaptive,
>>> Direct, No Write Cache if bad BBU
>>> Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAdaptive,
>>> Direct, No Write Cache if bad BBU
>>> Adapter 0-VD 4(target id: 4): Cache Policy:WriteBack, ReadAdaptive,
>>> Direct, No Write Cache if bad BBU
>>> Adapter 0-VD 5(target id: 5): Cache Policy:WriteBack, ReadAdaptive,
>>> Direct, No Write Cache if bad BBU
>>>
>>> [root@osd01 ~]#  megacli -LDGetProp  -DskCache -LALL -a0
>>>
>>> Adapter 0-VD 0(target id: 0): Disk Write Cache : Disabled
>>> Adapter 0-VD 1(target id: 1): Disk Write Cache : Disk's Default
>>> Adapter 0-VD 2(target id: 2): Disk Write Cache : Disk's Default
>>> Adapter 0-VD 3(target id: 3): Disk Write Cache : Disk's Default
>>> Adapter 0-VD 4(target id: 4): Disk Write Cache : Disk's Default
>>> Adapter 0-VD 5(target id: 5): Disk Write Cache : Disk's Default
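>>>
>>> So the SSD VD (VD 0) is WriteThrough with its drive cache disabled, while the
>>> HDD VDs are WriteBack. If I want to retest with matching cache policies, I
>>> believe it can be changed with something like the commands below (to be
>>> double-checked against the MegaCli docs; enabling caches without a healthy
>>> BBU has data-safety implications):
>>>
>>> megacli -LDSetProp WB -L0 -a0
>>> megacli -LDSetProp ADRA -L0 -a0
>>> megacli -LDSetProp -EnDskCache -L0 -a0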
>>>
>>>
>>> CPU
>>> Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
>>>
>>> Centos 7 kernel 3.10.0-693.11.6.el7.x86_64
>>>
>>> sysctl -p
>>> net.ipv4.tcp_sack = 0
>>> net.core.netdev_budget = 600
>>> net.ipv4.tcp_window_scaling = 1
>>> net.core.rmem_max = 16777216
>>> net.core.wmem_max = 16777216
>>> net.core.rmem_default = 16777216
>>> net.core.wmem_default = 16777216
>>> net.core.optmem_max = 40960
>>> net.ipv4.tcp_rmem = 4096 87380 16777216
>>> net.ipv4.tcp_wmem = 4096 65536 16777216
>>> net.ipv4.tcp_syncookies = 0
>>> net.core.somaxconn = 1024
>>> net.core.netdev_max_backlog = 20000
>>> net.ipv4.tcp_max_syn_backlog = 30000
>>> net.ipv4.tcp_max_tw_buckets = 2000000
>>> net.ipv4.tcp_tw_reuse = 1
>>> net.ipv4.tcp_slow_start_after_idle = 0
>>> net.ipv4.conf.all.send_redirects = 0
>>> net.ipv4.conf.all.accept_redirects = 0
>>> net.ipv4.conf.all.accept_source_route = 0
>>> vm.min_free_kbytes = 262144
>>> vm.swappiness = 0
>>> vm.vfs_cache_pressure = 100
>>> fs.suid_dumpable = 0
>>> kernel.core_uses_pid = 1
>>> kernel.msgmax = 65536
>>> kernel.msgmnb = 65536
>>> kernel.randomize_va_space = 1
>>> kernel.sysrq = 0
>>> kernel.pid_max = 4194304
>>> fs.file-max = 100000
>>>
>>>
>>> ceph.conf
>>>
>>>
>>> public_network = 10.10.30.0/24
>>> cluster_network = 192.168.0.0/24
>>>
>>>
>>> osd_op_num_threads_per_shard = 2
>>> osd_op_num_shards = 25
>>> osd_pool_default_size = 2
>>> osd_pool_default_min_size = 1 # Allow writing 1 copy in a degraded state
>>> osd_pool_default_pg_num = 256
>>> osd_pool_default_pgp_num = 256
>>> osd_crush_chooseleaf_type = 1
>>> osd_scrub_load_threshold = 0.01
>>> osd_scrub_min_interval = 137438953472
>>> osd_scrub_max_interval = 137438953472
>>> osd_deep_scrub_interval = 137438953472
>>> osd_max_scrubs = 16
>>> osd_op_threads = 8
>>> osd_max_backfills = 1
>>> osd_recovery_max_active = 1
>>> osd_recovery_op_priority = 1
>>>
>>>
>>>
>>>
>>> debug_lockdep = 0/0
>>> debug_context = 0/0
>>> debug_crush = 0/0
>>> debug_buffer = 0/0
>>> debug_timer = 0/0
>>> debug_filer = 0/0
>>> debug_objecter = 0/0
>>> debug_rados = 0/0
>>> debug_rbd = 0/0
>>> debug_journaler = 0/0
>>> debug_objectcatcher = 0/0
>>> debug_client = 0/0
>>> debug_osd = 0/0
>>> debug_optracker = 0/0
>>> debug_objclass = 0/0
>>> debug_filestore = 0/0
>>> debug_journal = 0/0
>>> debug_ms = 0/0
>>> debug_monc = 0/0
>>> debug_tp = 0/0
>>> debug_auth = 0/0
>>> debug_finisher = 0/0
>>> debug_heartbeatmap = 0/0
>>> debug_perfcounter = 0/0
>>> debug_asok = 0/0
>>> debug_throttle = 0/0
>>> debug_mon = 0/0
>>> debug_paxos = 0/0
>>> debug_rgw = 0/0
>>>
>>>
>>> [mon]
>>> mon_allow_pool_delete = true
>>>
>>> [osd]
>>> osd_heartbeat_grace = 20
>>> osd_heartbeat_interval = 5
>>> bluestore_block_db_size = 16106127360
>>> bluestore_block_wal_size = 1073741824
>>>
>>> [osd.6]
>>> host = osd01
>>> osd_journal = /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.1d58775a-5019-42ea-8149-a126f51a2501
>>> crush_location = root=ssds host=osd01-ssd
>>>
>>> [osd.7]
>>> host = osd02
>>> osd_journal = /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.683dc52d-5d69-4ff0-b5d9-b17056a55681
>>> crush_location = root=ssds host=osd02-ssd
>>>
>>> [osd.8]
>>> host = osd04
>>> osd_journal = /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.bd7c0088-b724-441e-9b88-9457305c541d
>>> crush_location = root=ssds host=osd04-ssd
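>>>
>>> To confirm the OSDs actually picked these settings up at runtime, I check the
>>> admin socket on the OSD host with something like:
>>>
>>> ceph daemon osd.6 config show | grep -E 'osd_op_num|bluestore_block'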
>>>
>>>
>>> On 22 January 2018 at 11:29, Steven Vacaroaia <ste...@gmail.com> wrote:
>>>
>>>> Hi David,
>>>>
>>>> Yes, I meant no separate partitions for WAL and DB
>>>>
>>>> I am using 2 x 10Gb bonded ( BONDING_OPTS="mode=4 miimon=100
>>>> xmit_hash_policy=1 lacp_rate=1")  for cluster and 1 x 1Gb for public
>>>> Disks are:
>>>> Vendor Id          : TOSHIBA
>>>>                 Product Id         : PX05SMB040Y
>>>>                 State              : Online
>>>>                 Disk Type          : SAS,Solid State Device
>>>>                 Capacity           : 372.0 GB
>>>>
>>>>
>>>> On 22 January 2018 at 11:24, David Turner <drakonst...@gmail.com>
>>>> wrote:
>>>>
>>>>> Disk models, other hardware information including CPU, network
>>>>> config?  You say you're using Luminous, but then say journal on same
>>>>> device.  I'm assuming you mean that you just have the bluestore OSD
>>>>> configured without a separate WAL or DB partition?  Any more specifics you
>>>>> can give will be helpful.
>>>>>
>>>>> On Mon, Jan 22, 2018 at 11:20 AM Steven Vacaroaia <ste...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'd appreciate it if you could provide some guidance / suggestions
>>>>>> regarding performance issues on a test cluster (3 x Dell R620, 1 enterprise
>>>>>> SSD, 3 x 600 GB enterprise HDDs, 8 cores, 64 GB RAM).
>>>>>>
>>>>>> I created 2 pools (replication factor 2), one with only the SSDs and the
>>>>>> other with only the HDDs
>>>>>> (journal on the same disk for both).
>>>>>>
>>>>>> The performance is quite similar, although I was expecting it to be at
>>>>>> least 5 times better.
>>>>>> No issues were noticed using atop.
>>>>>>
>>>>>> What should I check / tune?
>>>>>>
>>>>>> Many thanks
>>>>>> Steven
>>>>>>
>>>>>>
>>>>>>
>>>>>> HDD based pool ( journal on the same disk)
>>>>>>
>>>>>> ceph osd pool get scbench256 all
>>>>>>
>>>>>> size: 2
>>>>>> min_size: 1
>>>>>> crash_replay_interval: 0
>>>>>> pg_num: 256
>>>>>> pgp_num: 256
>>>>>> crush_rule: replicated_rule
>>>>>> hashpspool: true
>>>>>> nodelete: false
>>>>>> nopgchange: false
>>>>>> nosizechange: false
>>>>>> write_fadvise_dontneed: false
>>>>>> noscrub: false
>>>>>> nodeep-scrub: false
>>>>>> use_gmt_hitset: 1
>>>>>> auid: 0
>>>>>> fast_read: 0
>>>>>>
>>>>>>
>>>>>> rbd bench --io-type write  image1 --pool=scbench256
>>>>>> bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern
>>>>>> sequential
>>>>>>   SEC       OPS   OPS/SEC   BYTES/SEC
>>>>>>     1     46816  46836.46  191842139.78
>>>>>>     2     90658  45339.11  185709011.80
>>>>>>     3    133671  44540.80  182439126.08
>>>>>>     4    177341  44340.36  181618100.14
>>>>>>     5    217300  43464.04  178028704.54
>>>>>>     6    259595  42555.85  174308767.05
>>>>>> elapsed:     6  ops:   262144  ops/sec: 42694.50  bytes/sec:
>>>>>> 174876688.23
>>>>>>
>>>>>> fio /home/cephuser/write_256.fio
>>>>>> write-4M: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd,
>>>>>> iodepth=32
>>>>>> fio-2.2.8
>>>>>> Starting 1 process
>>>>>> rbd engine: RBD version: 1.12.0
>>>>>> Jobs: 1 (f=1): [r(1)] [100.0% done] [66284KB/0KB/0KB /s] [16.6K/0/0
>>>>>> iops] [eta 00m:00s]
>>>>>>
>>>>>>
>>>>>> fio /home/cephuser/write_256.fio
>>>>>> write-4M: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd,
>>>>>> iodepth=32
>>>>>> fio-2.2.8
>>>>>> Starting 1 process
>>>>>> rbd engine: RBD version: 1.12.0
>>>>>> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/14464KB/0KB /s] [0/3616/0
>>>>>> iops] [eta 00m:00s]
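>>>>>>
>>>>>> For reference, write_256.fio is a small rbd-engine job roughly like the
>>>>>> following (exact contents approximate; I switch rw between write and
>>>>>> randread, and point pool/rbdname at ssdpool/image2 for the SSD runs):
>>>>>>
>>>>>> [global]
>>>>>> ioengine=rbd
>>>>>> clientname=admin
>>>>>> pool=scbench256
>>>>>> rbdname=image1
>>>>>> bs=4k
>>>>>> iodepth=32
>>>>>>
>>>>>> [write-4M]
>>>>>> rw=write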
>>>>>>
>>>>>>
>>>>>> SSD based pool
>>>>>>
>>>>>>
>>>>>> ceph osd pool get ssdpool all
>>>>>>
>>>>>> size: 2
>>>>>> min_size: 1
>>>>>> crash_replay_interval: 0
>>>>>> pg_num: 128
>>>>>> pgp_num: 128
>>>>>> crush_rule: ssdpool
>>>>>> hashpspool: true
>>>>>> nodelete: false
>>>>>> nopgchange: false
>>>>>> nosizechange: false
>>>>>> write_fadvise_dontneed: false
>>>>>> noscrub: false
>>>>>> nodeep-scrub: false
>>>>>> use_gmt_hitset: 1
>>>>>> auid: 0
>>>>>> fast_read: 0
>>>>>>
>>>>>>  rbd -p ssdpool create --size 52100 image2
>>>>>>
>>>>>> rbd bench --io-type write  image2 --pool=ssdpool
>>>>>> bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern
>>>>>> sequential
>>>>>>   SEC       OPS   OPS/SEC   BYTES/SEC
>>>>>>     1     42412  41867.57  171489557.93
>>>>>>     2     78343  39180.86  160484805.88
>>>>>>     3    118082  39076.48  160057256.16
>>>>>>     4    155164  38683.98  158449572.38
>>>>>>     5    192825  38307.59  156907885.84
>>>>>>     6    230701  37716.95  154488608.16
>>>>>> elapsed:     7  ops:   262144  ops/sec: 36862.89  bytes/sec:
>>>>>> 150990387.29
>>>>>>
>>>>>>
>>>>>> [root@osd01 ~]# fio /home/cephuser/write_256.fio
>>>>>> write-4M: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd,
>>>>>> iodepth=32
>>>>>> fio-2.2.8
>>>>>> Starting 1 process
>>>>>> rbd engine: RBD version: 1.12.0
>>>>>> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/20224KB/0KB /s] [0/5056/0
>>>>>> iops] [eta 00m:00s]
>>>>>>
>>>>>>
>>>>>> fio /home/cephuser/write_256.fio
>>>>>> write-4M: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd,
>>>>>> iodepth=32
>>>>>> fio-2.2.8
>>>>>> Starting 1 process
>>>>>> rbd engine: RBD version: 1.12.0
>>>>>> Jobs: 1 (f=1): [r(1)] [100.0% done] [76096KB/0KB/0KB /s] [19.3K/0/0
>>>>>> iops] [eta 00m:00s]
>>>>>> _______________________________________________
>>>>>> ceph-users mailing list
>>>>>> ceph-users@lists.ceph.com
>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>
>>>>>
>>>>
>>>
>>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
