Hello, I'm using Ceph 0.80.7 (Firefly) with Mirantis OpenStack Icehouse, with RBD backing the Nova ephemeral disks and Glance.
I have two Ceph OSD nodes with the following specifications:

2x Ceph OSD nodes (replication factor 2)
  Model: SuperMicro X8DT3
  CPU:   2x Intel E5620
  RAM:   32 GB
  Disks: 2x 480 GB SSD, RAID-1 (OS and journal)
         22x 4 TB SATA, RAID-10 (OSD)

3x controllers (Ceph monitors)
  Model: ProLiant DL180 G6
  CPU:   2x Intel E5620
  RAM:   24 GB

Network:
  Public: 1G NIC (eth0), Juniper 2200-48
  Storage/admin/management: 10G NIC (eth1), Arista 7050T-36 (32x 10GE UTP, 4x 10GE SFP+)

I'm getting very poor Ceph performance, with high I/O wait on both reads and writes, and whenever a light or deep scrub runs the load on the VMs goes through the roof. Tuning ceph.conf didn't help.

[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = xx.xx.xx.xx xx.xx.xx.xx xx.xx.xx.xx
mon_initial_members = node-xx node-xx node-xx
fsid =
osd_pool_default_size = 2
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 50
public_network = xx.xx.xx.xx
osd_journal_size = 100000
auth_supported = cephx
osd_pool_default_pgp_num = 50
osd_pool_default_flag_hashpspool = true
osd_mkfs_type = xfs
cluster_network = xx.xx.xx.xx
mon_clock_drift_allowed = 2

[osd]
osd_op_threads = 16
osd_disk_threads = 4
osd_disk_thread_ioprio_priority = 7
osd_disk_thread_ioprio_class = idle
filestore_op_threads = 8
filestore_queue_max_ops = 100000
filestore_queue_committing_max_ops = 100000
filestore_queue_max_bytes = 1073741824
filestore_queue_committing_max_bytes = 1073741824
filestore_max_sync_interval = 10
filestore_fd_cache_size = 20240
filestore_flusher = false
filestore_flush_min = 0
filestore_sync_flush = true
journal_dio = true
journal_aio = true
journal_max_write_bytes = 1073741824
journal_max_write_entries = 50000
journal_queue_max_bytes = 1073741824
journal_queue_max_ops = 100000
ms_dispatch_throttle_bytes = 1073741824
objecter_inflight_op_bytes = 1073741824
objecter_inflight_ops = 1638400
osd_recovery_threads = 16
#osd_recovery_max_active = 2
#osd_recovery_max_chunk = 8388608
#osd_recovery_op_priority = 2
#osd_max_backfills = 1

[client]
rbd_cache = true
rbd_cache_writethrough_until_flush = true
rbd_cache_size = 20 GiB
rbd_cache_max_dirty = 16 GiB
rbd_cache_target_dirty = 512 MiB

Results inside a CentOS 6 64-bit VM:

[root@vm ~]# dd if=/dev/zero of=./largefile bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 17.3417 s, 61.9 MB/s

[root@vm ~]# rm -rf /tmp/test && spew -i 50 -v -d --write -r -b 4096 10M /tmp/test
Iteration:  1  Total runtime: 00:00:00  WTR: 27753.91 KiB/s  Transfer time: 00:00:00  IOPS: 6938.48
Iteration:  2  Total runtime: 00:00:00  WTR: 29649.53 KiB/s  Transfer time: 00:00:00  IOPS: 7412.38
Iteration:  3  Total runtime: 00:00:01  WTR: 30897.44 KiB/s  Transfer time: 00:00:00  IOPS: 7724.36
Iteration:  4  Total runtime: 00:00:02  WTR:  7474.93 KiB/s  Transfer time: 00:00:01  IOPS: 1868.73
Iteration:  5  Total runtime: 00:00:02  WTR: 24810.11 KiB/s  Transfer time: 00:00:00  IOPS: 6202.53
Iteration:  6  Total runtime: 00:00:03  WTR: 28534.01 KiB/s  Transfer time: 00:00:00  IOPS: 7133.50
Iteration:  7  Total runtime: 00:00:03  WTR: 27687.95 KiB/s  Transfer time: 00:00:00  IOPS: 6921.99
Iteration:  8  Total runtime: 00:00:03  WTR: 29195.91 KiB/s  Transfer time: 00:00:00  IOPS: 7298.98
Iteration:  9  Total runtime: 00:00:04  WTR: 28315.53 KiB/s  Transfer time: 00:00:00  IOPS: 7078.88
Iteration: 10  Total runtime: 00:00:04  WTR: 27971.42 KiB/s  Transfer time: 00:00:00  IOPS: 6992.85
Iteration: 11  Total runtime: 00:00:04  WTR: 29873.39 KiB/s  Transfer time: 00:00:00  IOPS: 7468.35
Iteration: 12  Total runtime: 00:00:05  WTR: 32364.30 KiB/s  Transfer time: 00:00:00  IOPS: 8091.08
Iteration: 13  Total runtime: 00:00:05  WTR: 32619.98 KiB/s  Transfer time: 00:00:00  IOPS: 8155.00
Iteration: 14  Total runtime: 00:00:06  WTR: 18714.54 KiB/s  Transfer time: 00:00:00  IOPS: 4678.64
Iteration: 15  Total runtime: 00:00:06  WTR: 17070.37 KiB/s  Transfer time: 00:00:00  IOPS: 4267.59
Iteration: 16  Total runtime: 00:00:07  WTR: 22403.23 KiB/s  Transfer time: 00:00:00  IOPS: 5600.81
Iteration: 17  Total runtime: 00:00:07  WTR: 16076.39 KiB/s  Transfer time: 00:00:00  IOPS: 4019.10
Iteration: 18  Total runtime: 00:00:08  WTR: 26219.77 KiB/s  Transfer time: 00:00:00  IOPS: 6554.94
Iteration: 19  Total runtime: 00:00:08  WTR: 29054.01 KiB/s  Transfer time: 00:00:00  IOPS: 7263.50
Iteration: 20  Total runtime: 00:00:08  WTR: 27210.02 KiB/s  Transfer time: 00:00:00  IOPS: 6802.50
Iteration: 21  Total runtime: 00:00:09  WTR: 28502.72 KiB/s  Transfer time: 00:00:00  IOPS: 7125.68
Iteration: 22  Total runtime: 00:00:10  WTR: 11172.32 KiB/s  Transfer time: 00:00:00  IOPS: 2793.08
Iteration: 23  Total runtime: 00:00:10  WTR: 29038.44 KiB/s  Transfer time: 00:00:00  IOPS: 7259.61
Iteration: 24  Total runtime: 00:00:11  WTR: 25374.86 KiB/s  Transfer time: 00:00:00  IOPS: 6343.72
Iteration: 25  Total runtime: 00:00:11  WTR: 19123.03 KiB/s  Transfer time: 00:00:00  IOPS: 4780.76
Iteration: 26  Total runtime: 00:00:11  WTR: 27481.82 KiB/s  Transfer time: 00:00:00  IOPS: 6870.45
Iteration: 27  Total runtime: 00:00:12  WTR: 11416.62 KiB/s  Transfer time: 00:00:00  IOPS: 2854.15
Iteration: 28  Total runtime: 00:00:13  WTR: 33922.34 KiB/s  Transfer time: 00:00:00  IOPS: 8480.58
Iteration: 29  Total runtime: 00:00:13  WTR: 26893.30 KiB/s  Transfer time: 00:00:00  IOPS: 6723.32
Iteration: 30  Total runtime: 00:00:13  WTR: 27222.82 KiB/s  Transfer time: 00:00:00  IOPS: 6805.71
Iteration: 31  Total runtime: 00:00:14  WTR: 19842.92 KiB/s  Transfer time: 00:00:00  IOPS: 4960.73
Iteration: 32  Total runtime: 00:00:14  WTR: 27585.91 KiB/s  Transfer time: 00:00:00  IOPS: 6896.48
Iteration: 33  Total runtime: 00:00:15  WTR: 31579.30 KiB/s  Transfer time: 00:00:00  IOPS: 7894.83
Iteration: 34  Total runtime: 00:00:15  WTR: 26563.32 KiB/s  Transfer time: 00:00:00  IOPS: 6640.83
Iteration: 35  Total runtime: 00:00:15  WTR: 24829.90 KiB/s  Transfer time: 00:00:00  IOPS: 6207.48
Iteration: 36  Total runtime: 00:00:16  WTR: 26769.70 KiB/s  Transfer time: 00:00:00  IOPS: 6692.43
Iteration: 37  Total runtime: 00:00:16  WTR: 21256.06 KiB/s  Transfer time: 00:00:00  IOPS: 5314.01
Iteration: 38  Total runtime: 00:00:17  WTR: 14035.99 KiB/s  Transfer time: 00:00:00  IOPS: 3509.00
Iteration: 39  Total runtime: 00:00:17  WTR: 31576.48 KiB/s  Transfer time: 00:00:00  IOPS: 7894.12
Iteration: 40  Total runtime: 00:00:18  WTR: 27915.22 KiB/s  Transfer time: 00:00:00  IOPS: 6978.80
Iteration: 41  Total runtime: 00:00:18  WTR: 33392.14 KiB/s  Transfer time: 00:00:00  IOPS: 8348.03
Iteration: 42  Total runtime: 00:00:18  WTR: 27876.61 KiB/s  Transfer time: 00:00:00  IOPS: 6969.15
Iteration: 43  Total runtime: 00:00:19  WTR: 28092.05 KiB/s  Transfer time: 00:00:00  IOPS: 7023.01
Iteration: 44  Total runtime: 00:00:19  WTR: 29125.74 KiB/s  Transfer time: 00:00:00  IOPS: 7281.44
Iteration: 45  Total runtime: 00:00:19  WTR: 26937.87 KiB/s  Transfer time: 00:00:00  IOPS: 6734.47
Iteration: 46  Total runtime: 00:00:20  WTR: 23235.92 KiB/s  Transfer time: 00:00:00  IOPS: 5808.98
Iteration: 47  Total runtime: 00:00:20  WTR: 27946.07 KiB/s  Transfer time: 00:00:00  IOPS: 6986.52
Iteration: 48  Total runtime: 00:00:21  WTR: 17759.06 KiB/s  Transfer time: 00:00:00  IOPS: 4439.77
Iteration: 49  Total runtime: 00:00:23  WTR:  4779.38 KiB/s  Transfer time: 00:00:02  IOPS: 1194.84
Iteration: 50  Total runtime: 00:00:23  WTR: 27997.65 KiB/s  Transfer time: 00:00:00  IOPS: 6999.41

Total iterations: 50
Total runtime: 00:00:23
Total write transfer time (WTT): 00:00:23
Total write transfer rate (WTR): 21493.23 KiB/s
Total write IOPS: 5373.31 IOPS

I don't know whether the hardware is limiting performance, which is why I need your advice; maybe some tuning could help. If it is a hardware issue, please help me answer the following questions.

1. Is it better to have a small number of OSD nodes with many hard disks (e.g. the SuperMicro SC846TQ), or a larger number of OSD nodes with fewer disks each (e.g. the HP DL380p G8)? I need around 20 TB of storage. The SuperMicro SC846TQ can hold 24 hard disks, so I could attach 24x 960 GB SSDs (no RAID) across 3x SuperMicro servers with replication factor 3.
Or would it be better to scale out with smaller disks on more servers, e.g. the HP DL380p G8 (2x Intel Xeon E5-2650), which can hold 12 hard disks, attaching 12x 960 GB SSDs (no RAID) across 6x OSD nodes with replication factor 3?

2. I'm using Mirantis Fuel 5 for provisioning and deployment of nodes. When I attach the new Ceph OSD nodes to the environment, will data be replicated automatically from my current old SuperMicro OSD nodes to the new servers after the deployment completes?

3. I will use 2x 960 GB SSDs in RAID-1 for the OS. Is it recommended to put the SSD journal on a separate partition of the same disk as the OS?

4. After adding the new hardware nodes, is it safe to remove the old Ceph nodes while I'm currently running with a replication factor of 2?

5. Do I need RAID-1 for the journal disk? If not, what will happen if one of the journal disks fails?

6. Should I use a RAID level for the drives on the OSD nodes, or is it better to go without RAID?

Your advice is highly appreciated.

Best Regards,
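P.S. For a more controlled data point than dd/spew (which go through the page cache and, in spew's case, possibly tmpfs under /tmp), I'm also planning to run a direct-I/O 4k random-write test with fio inside a VM. The job file below is only a sketch; the test path, size, and queue depth are my own assumptions, not something I've measured with yet:

```ini
; Sketch of an fio job for 4k random writes with direct I/O.
; filename, size, runtime and iodepth are assumptions for illustration.
[global]
ioengine=libaio
direct=1
bs=4k
size=1g
runtime=60
time_based

[rbd-randwrite]
rw=randwrite
filename=/root/fio-testfile
iodepth=32
```

Run as `fio job.fio` inside the VM; the reported IOPS with direct=1 should be much closer to what RBD actually delivers than the cached dd numbers above.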
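P.S. One thing I noticed while re-reading my config: osd_pool_default_pg_num = 50 may be far too low for this many disks. A sketch of the usual rule of thumb (total PGs per pool of roughly OSDs x 100 / replicas, rounded up to the next power of two). The OSD count of 44 below is an assumption (22 disks x 2 nodes, one OSD per disk); with my RAID-10 layout the real OSD count would be lower, so the numbers are illustrative only:

```shell
#!/bin/sh
# Rule-of-thumb PG count: (number of OSDs * 100) / replica count,
# rounded up to the next power of two.
OSDS=44       # assumption: 22 disks x 2 nodes, one OSD per disk
REPLICAS=2    # osd_pool_default_size from my ceph.conf
TARGET=$(( OSDS * 100 / REPLICAS ))
PG=1
while [ "$PG" -lt "$TARGET" ]; do
    PG=$(( PG * 2 ))
done
echo "target=$TARGET pg_num=$PG"   # prints "target=2200 pg_num=4096"
```

Either way, 50 PGs spread across all OSDs would leave each disk with only a couple of PGs, which can hurt data distribution and make scrub impact very uneven.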
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com