Re: [ceph-users] osd crash
I am using Proxmox, so I guess it is Debian. I will update the kernel; there are newer versions available. But generally, if an OSD crashes like this - can it be hardware related? And how do I unmount the disk? I can't even run ps ax or lsof - it hangs because my OSD is still mounted and blocks everything... and I cannot start it either...

From: Nick Fisk [mailto:n...@fisk.me.uk]
Sent: Thursday, 01 December 2016 13:15
To: VELARTIS Philipp Dürhammer; ceph-us...@ceph.com
Subject: RE: osd crash

Are you using Ubuntu 16.04 (guessing from your kernel version)? There was a NUMA bug in early kernels, try updating to the latest in the 4.4 series.

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of VELARTIS Philipp Dürhammer
Sent: 01 December 2016 12:04
To: 'ceph-us...@ceph.com' <ceph-us...@ceph.com>
Subject: [ceph-users] osd crash

Hello! Tonight one of my OSDs crashed - see the dump below. The OSD is also still mounted. What is the cause? A bug? What should I do next? Thank you!
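For reference, a rough sketch of how one might act on Nick's suggestion on a Proxmox/Debian node. The NUMA-balancing workaround and the use of noout are assumptions added for illustration; they are not stated anywhere in this thread:

# check the running kernel (the trace above is from 4.4.8-1-pve)
uname -r

# commonly suggested stopgap for the task_numa_find_cpu divide error:
# disable automatic NUMA balancing until the kernel is upgraded
echo 0 > /proc/sys/kernel/numa_balancing

# before rebooting the node to clear the hung mount, stop rebalancing
ceph osd set noout
reboot
# ...after the node is back and its OSDs have rejoined:
ceph osd unset noout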
[ceph-users] osd crash - disk hangs
Hello! Tonight one of my OSDs crashed - see the dump below. The OSD is also still mounted. What is the cause? A bug? What should I do next? I can't run lsof or ps ax because they hang. Thank you!

Dec 1 00:31:30 ceph2 kernel: [17314369.493029] divide error: [#1] SMP
Dec 1 00:31:30 ceph2 kernel: [17314369.493062] Modules linked in: act_police cls_basic sch_ingress sch_htb vhost_net vhost macvtap macvlan 8021q garp mrp veth nfsv3 softdog ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_mac ipt_REJECT nf_reject_ipv4 xt_NFLOG nfnetlink_log xt_physdev nf_conntrack_ipv4 nf_defrag_ipv4 xt_comment xt_tcpudp xt_addrtype xt_multiport xt_conntrack xt_set xt_mark ip_set_hash_net ip_set nfnetlink iptable_filter ip_tables x_tables nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding xfs libcrc32c ipmi_ssif mxm_wmi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_pcm snd_timer snd soundcore pcspkr input_leds sb_edac shpchp edac_core mei_me ioatdma mei lpc_ich i2c_i801 ipmi_si 8250_fintek wmi ipmi_msghandler mac_hid nf_conntrack_ftp nf_conntrack autofs4 ses enclosure hid_generic usbmouse usbkbd usbhid hid ixgbe(O) vxlan ip6_udp_tunnel megaraid_sas udp_tunnel isci ahci libahci libsas igb(O) scsi_transport_sas dca ptp pps_core fjes
Dec 1 00:31:30 ceph2 kernel: [17314369.493708] CPU: 1 PID: 17291 Comm: ceph-osd Tainted: G O 4.4.8-1-pve #1
Dec 1 00:31:30 ceph2 kernel: [17314369.493754] Hardware name: Thomas-Krenn.AG X9DR3-F/X9DR3-F, BIOS 3.0a 07/31/2013
Dec 1 00:31:30 ceph2 kernel: [17314369.493799] task: 881f6ff05280 ti: 880037c4c000 task.ti: 880037c4c000
Dec 1 00:31:30 ceph2 kernel: [17314369.493843] RIP: 0010:[] [] task_numa_find_cpu+0x23d/0x710
Dec 1 00:31:30 ceph2 kernel: [17314369.493893] RSP: :880037c4fbd8 EFLAGS: 00010257
Dec 1 00:31:30 ceph2 kernel: [17314369.493919] RAX: RBX: 880037c4fc80 RCX:
Dec 1 00:31:30 ceph2 kernel: [17314369.493962] RDX: RSI: 88103fa4 RDI: 881033f50c00
Dec 1 00:31:30 ceph2 kernel: [17314369.494006] RBP: 880037c4fc48 R08: 000202046ea8 R09: 036b
Dec 1 00:31:30 ceph2 kernel: [17314369.494049] R10: 007c R11: 0540 R12: 88064fbd
Dec 1 00:31:30 ceph2 kernel: [17314369.494093] R13: 0250 R14: 0540 R15: 0009
Dec 1 00:31:30 ceph2 kernel: [17314369.494136] FS: 7ff17dd6c700() GS:88103fa4() knlGS:
Dec 1 00:31:30 ceph2 kernel: [17314369.494182] CS: 0010 DS: ES: CR0: 80050033
Dec 1 00:31:30 ceph2 kernel: [17314369.494209] CR2: 7ff17dd6aff8 CR3: 001025e4b000 CR4: 001426e0
Dec 1 00:31:30 ceph2 kernel: [17314369.494252] Stack:
Dec 1 00:31:30 ceph2 kernel: [17314369.494273] 880037c4fbe8 81038219 003f 00017180
Dec 1 00:31:30 ceph2 kernel: [17314369.494323] 881f6ff05280 00017180 0251 ffe7
Dec 1 00:31:30 ceph2 kernel: [17314369.494374] 0251 881f6ff05280 880037c4fc80 00cb
Dec 1 00:31:30 ceph2 kernel: [17314369.494424] Call Trace:
Dec 1 00:31:30 ceph2 kernel: [17314369.494449] [] ? sched_clock+0x9/0x10
Dec 1 00:31:30 ceph2 kernel: [17314369.494476] [] task_numa_migrate+0x4e6/0xa00
Dec 1 00:31:30 ceph2 kernel: [17314369.494506] [] ? copy_to_iter+0x7c/0x260
Dec 1 00:31:30 ceph2 kernel: [17314369.494534] [] numa_migrate_preferred+0x79/0x80
Dec 1 00:31:30 ceph2 kernel: [17314369.494563] [] task_numa_fault+0x848/0xd10
Dec 1 00:31:30 ceph2 kernel: [17314369.494591] [] ? should_numa_migrate_memory+0x59/0x130
Dec 1 00:31:30 ceph2 kernel: [17314369.494623] [] handle_mm_fault+0xc64/0x1a20
Dec 1 00:31:30 ceph2 kernel: [17314369.494654] [] ? SYSC_recvfrom+0x144/0x160
Dec 1 00:31:30 ceph2 kernel: [17314369.494684] [] __do_page_fault+0x19d/0x410
Dec 1 00:31:30 ceph2 kernel: [17314369.494713] [] ? exit_to_usermode_loop+0xb0/0xd0
Dec 1 00:31:30 ceph2 kernel: [17314369.494742] [] do_page_fault+0x22/0x30
Dec 1 00:31:30 ceph2 kernel: [17314369.494771] [] page_fault+0x28/0x30
Dec 1 00:31:30 ceph2 kernel: [17314369.494797] Code: 4d b0 4c 89 ef e8 b4 d0 ff ff 48 8b 4d b0 49 8b 85 b0 00 00 00 31 d2 48 0f af 81 d8 01 00 00 49 8b 4d 78 4c 8b 6b 78 48 83 c1 01 <48> f7 f1 48 8b 4b 20 49 89 c0 48 29 c1 4c 03 43 48 4c 39 75 d0
Dec 1 00:31:30 ceph2 kernel: [17314369.495005] RIP [] task_numa_find_cpu+0x23d/0x710
Dec 1 00:31:30 ceph2 kernel: [17314369.495035] RSP
Dec 1 00:31:30 ceph2 kernel: [17314369.495347] ---[ end trace 7106c9a72840cc7d ]---
[ceph-users] osd crash
Hello! Tonight one of my OSDs crashed - see the dump below. The OSD is also still mounted. What is the cause? A bug? What should I do next? Thank you!

Dec 1 00:31:30 ceph2 kernel: [17314369.493029] divide error: [#1] SMP
Dec 1 00:31:30 ceph2 kernel: [17314369.493062] Modules linked in: act_police cls_basic sch_ingress sch_htb vhost_net vhost macvtap macvlan 8021q garp mrp veth nfsv3 softdog ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_mac ipt_REJECT nf_reject_ipv4 xt_NFLOG nfnetlink_log xt_physdev nf_conntrack_ipv4 nf_defrag_ipv4 xt_comment xt_tcpudp xt_addrtype xt_multiport xt_conntrack xt_set xt_mark ip_set_hash_net ip_set nfnetlink iptable_filter ip_tables x_tables nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding xfs libcrc32c ipmi_ssif mxm_wmi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_pcm snd_timer snd soundcore pcspkr input_leds sb_edac shpchp edac_core mei_me ioatdma mei lpc_ich i2c_i801 ipmi_si 8250_fintek wmi ipmi_msghandler mac_hid nf_conntrack_ftp nf_conntrack autofs4 ses enclosure hid_generic usbmouse usbkbd usbhid hid ixgbe(O) vxlan ip6_udp_tunnel megaraid_sas udp_tunnel isci ahci libahci libsas igb(O) scsi_transport_sas dca ptp pps_core fjes
Dec 1 00:31:30 ceph2 kernel: [17314369.493708] CPU: 1 PID: 17291 Comm: ceph-osd Tainted: G O 4.4.8-1-pve #1
Dec 1 00:31:30 ceph2 kernel: [17314369.493754] Hardware name: Thomas-Krenn.AG X9DR3-F/X9DR3-F, BIOS 3.0a 07/31/2013
Dec 1 00:31:30 ceph2 kernel: [17314369.493799] task: 881f6ff05280 ti: 880037c4c000 task.ti: 880037c4c000
Dec 1 00:31:30 ceph2 kernel: [17314369.493843] RIP: 0010:[] [] task_numa_find_cpu+0x23d/0x710
Dec 1 00:31:30 ceph2 kernel: [17314369.493893] RSP: :880037c4fbd8 EFLAGS: 00010257
Dec 1 00:31:30 ceph2 kernel: [17314369.493919] RAX: RBX: 880037c4fc80 RCX:
Dec 1 00:31:30 ceph2 kernel: [17314369.493962] RDX: RSI: 88103fa4 RDI: 881033f50c00
Dec 1 00:31:30 ceph2 kernel: [17314369.494006] RBP: 880037c4fc48 R08: 000202046ea8 R09: 036b
Dec 1 00:31:30 ceph2 kernel: [17314369.494049] R10: 007c R11: 0540 R12: 88064fbd
Dec 1 00:31:30 ceph2 kernel: [17314369.494093] R13: 0250 R14: 0540 R15: 0009
Dec 1 00:31:30 ceph2 kernel: [17314369.494136] FS: 7ff17dd6c700() GS:88103fa4() knlGS:
Dec 1 00:31:30 ceph2 kernel: [17314369.494182] CS: 0010 DS: ES: CR0: 80050033
Dec 1 00:31:30 ceph2 kernel: [17314369.494209] CR2: 7ff17dd6aff8 CR3: 001025e4b000 CR4: 001426e0
Dec 1 00:31:30 ceph2 kernel: [17314369.494252] Stack:
Dec 1 00:31:30 ceph2 kernel: [17314369.494273] 880037c4fbe8 81038219 003f 00017180
Dec 1 00:31:30 ceph2 kernel: [17314369.494323] 881f6ff05280 00017180 0251 ffe7
Dec 1 00:31:30 ceph2 kernel: [17314369.494374] 0251 881f6ff05280 880037c4fc80 00cb
Dec 1 00:31:30 ceph2 kernel: [17314369.494424] Call Trace:
Dec 1 00:31:30 ceph2 kernel: [17314369.494449] [] ? sched_clock+0x9/0x10
Dec 1 00:31:30 ceph2 kernel: [17314369.494476] [] task_numa_migrate+0x4e6/0xa00
Dec 1 00:31:30 ceph2 kernel: [17314369.494506] [] ? copy_to_iter+0x7c/0x260
Dec 1 00:31:30 ceph2 kernel: [17314369.494534] [] numa_migrate_preferred+0x79/0x80
Dec 1 00:31:30 ceph2 kernel: [17314369.494563] [] task_numa_fault+0x848/0xd10
Dec 1 00:31:30 ceph2 kernel: [17314369.494591] [] ? should_numa_migrate_memory+0x59/0x130
Dec 1 00:31:30 ceph2 kernel: [17314369.494623] [] handle_mm_fault+0xc64/0x1a20
Dec 1 00:31:30 ceph2 kernel: [17314369.494654] [] ? SYSC_recvfrom+0x144/0x160
Dec 1 00:31:30 ceph2 kernel: [17314369.494684] [] __do_page_fault+0x19d/0x410
Dec 1 00:31:30 ceph2 kernel: [17314369.494713] [] ? exit_to_usermode_loop+0xb0/0xd0
Dec 1 00:31:30 ceph2 kernel: [17314369.494742] [] do_page_fault+0x22/0x30
Dec 1 00:31:30 ceph2 kernel: [17314369.494771] [] page_fault+0x28/0x30
Dec 1 00:31:30 ceph2 kernel: [17314369.494797] Code: 4d b0 4c 89 ef e8 b4 d0 ff ff 48 8b 4d b0 49 8b 85 b0 00 00 00 31 d2 48 0f af 81 d8 01 00 00 49 8b 4d 78 4c 8b 6b 78 48 83 c1 01 <48> f7 f1 48 8b 4b 20 49 89 c0 48 29 c1 4c 03 43 48 4c 39 75 d0
Dec 1 00:31:30 ceph2 kernel: [17314369.495005] RIP [] task_numa_find_cpu+0x23d/0x710
Dec 1 00:31:30 ceph2 kernel: [17314369.495035] RSP
Dec 1 00:31:30 ceph2 kernel: [17314369.495347] ---[ end trace 7106c9a72840cc7d ]---
[ceph-users] changing ceph config - but still same mount options
Hi,

Before, I tested with:

osd mount options xfs = "rw,noatime,nobarrier,inode64,logbsize=256k,logbufs=8,allocsize = 4M"

(added inode64) and then changed to:

osd mount options xfs = "rw,noatime,nobarrier,logbsize=256k,logbufs=8,allocsize = 4M"

But after a reboot it still mounts with inode64. As I only have 1 TB disks I don't need the inode64 option, and it is the only difference to the other server. This server has the same hardware but has a lot higher commit and apply rates... so maybe it is because of the inode64?

/// complete [osd] section
[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring
osd max backfills = 1
osd recovery max active = 1
osd mkfs options xfs = ""
osd mount options xfs = "rw,noatime,nobarrier,logbsize=256k,logbufs=8,allocsize = 4M"
osd op threads = 4
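As far as I know, inode64 has been the default XFS mount behaviour since kernel 3.7, so it can show up in the mount options even when it is not listed in ceph.conf. A minimal way to check what is actually applied - the OSD id and the restart commands are only examples and depend on the Ceph release and init system:

# show the options the OSD filesystems are actually mounted with
mount | grep '/var/lib/ceph/osd'
grep ceph- /proc/mounts

# a changed 'osd mount options xfs' only takes effect when the OSD remounts
# its disk, e.g. after restarting the daemon (osd.0 is just an example)
systemctl restart ceph-osd@0        # or on older setups: service ceph restart osd.0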
[ceph-users] rbd read speed only 1/4 of write speed
Hello,

Read speed inside our VMs (most of them Windows) is only about a quarter of the write speed. Write speed is about 450-500 MB/s, but read speed is only about 100 MB/s. Our network is 10 Gbit for the OSDs and 10 Gbit for the MONs. We have 3 servers with 15 OSDs each.
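One way to narrow this down is to benchmark RADOS directly, outside the VMs, so the guest and RBD layers are taken out of the picture. A rough sketch - pool name, thread count and runtime are arbitrary examples:

# write benchmark, keeping the objects so they can be read back
rados bench -p testpool 60 write -t 16 --no-cleanup

# sequential and random read benchmarks against the objects just written
rados bench -p testpool 60 seq -t 16
rados bench -p testpool 60 rand -t 16

# remove the benchmark objects afterwards
rados -p testpool cleanup

If the seq/rand results are close to the write numbers, the gap is in the guest or RBD layer (read-ahead, cache settings); if they are also around 100 MB/s, the OSD side itself is the place to look.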
[ceph-users] how to check real rados read speed
Hi,

With ceph -w I can see Ceph writes, reads and IO, but the reads seem to be only those that are not served from the OSD or monitor cache. As we have 128 GB of RAM in every Ceph server, our monitors and OSDs are set to use a lot of RAM. The monitoring only very rarely shows any Ceph reads, but a lot more writes (it should be more reads). Even when I run a 2 GB benchmark inside a virtual machine, I see 2 GB of writes but no reads. Is there any way to monitor the real reads from RADOS and not only the OSD reads? By the way, where can I check the reads of the scrub process?

Thank you
Philipp
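A sketch of where such numbers can be read off - the exact counter names vary between Ceph releases, so treat the grep pattern as an example rather than a definitive list:

# per-pool client IO, including read rates
ceph osd pool stats

# per-OSD performance counters via the admin socket
# (run on the node hosting the OSD; osd.0 is just an example)
ceph daemon osd.0 perf dump | grep -E 'op_r|op_r_out_bytes'

# scrubbing shows up as PG states in the cluster status rather than as client reads
ceph -s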
Re: [ceph-users] write performance per disk
Hi,

Yes, I did a test now with 16 instances with 16 and 32 threads each. The absolute maximum was 1100 MB/s, but the network was still not saturated. All disks had the same load of about 110 MB/s - the maximum I got from the disks using direct access was 170 MB/s writes, so that is not too bad a value. I will run more tests with 10 and 20 virtual machines at the same time. Do you think 110 MB per disk is the Ceph maximum (for 170 theoretical per disk)? The 110 per disk includes the journal writes as well.

Thanks
Philipp

From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Mark Nelson [mark.nel...@inktank.com]
Sent: Friday, 04 July 2014 16:10
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] write performance per disk

On 07/03/2014 08:11 AM, VELARTIS Philipp Dürhammer wrote:

Hi,

I have a Ceph cluster setup (with 45 SATA disks, journals on the disks) and get only 450 MB/s sequential writes (the maximum, playing around with threads in rados bench) with a replica count of 2. That is about ~20 MB of writes per disk (which is also what I see in atop). Theoretically, with replica 2 and journals on the data disks, it should be 45 x 100 MB (SATA) / 2 (replica) / 2 (journal writes), which makes 1125 MB/s - and SATA disks in reality do 120 MB/s, so the theoretical output should be even higher. I would expect between 40 and 50 MB/s for each SATA disk.

Can somebody confirm that they reach this speed with a setup with journals on the SATA disks (with journals on SSD it should be 100 MB per disk)? Or does Ceph only give about a quarter of the speed of a disk (and not the half expected because of the journals)?

My setup is 3 servers with: 2 x 2.6 GHz Xeons, 128 GB RAM, 15 SATA disks for Ceph (and SSDs for the system), 1 x 10 GbE for external traffic, 1 x 10 GbE for OSD traffic.

With reads I can saturate the network, but writes are far away from that. I would expect to at least saturate the 10 GbE with sequential writes as well.

In addition to the advice Wido is providing (which I wholeheartedly agree with!), you might want to check your controller/disk configuration. If you have journals on the same disks as the data, sometimes putting the disks into single-disk RAID0 LUNs with writeback cache enabled can help keep journal and data writes from causing seek contention. This only works if you have a controller with cache and a battery though.

Thank you
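For reference, a rough sketch of this kind of parallel rados bench run - pool name, instance count and thread count are only examples, and the --run-name/cleanup options may differ slightly between Ceph versions:

# start 16 writers in parallel, 16 threads each, keeping the objects
for i in $(seq 1 16); do
    rados bench -p testpool 60 write -t 16 --run-name "bench_$i" --no-cleanup &
done
wait

# remove the benchmark objects for each run name afterwards
for i in $(seq 1 16); do
    rados -p testpool cleanup --run-name "bench_$i"
done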
Re: [ceph-users] write performance per disk
I used between 1 and 128 threads in different steps... but 500 MB/s write is the maximum whatever I play around with. It is so hard to tune Ceph... so many people have problems... ;-)

-Original Message-
From: Wido den Hollander [mailto:w...@42on.com]
Sent: Friday, 04 July 2014 10:55
To: VELARTIS Philipp Dürhammer; ceph-users@lists.ceph.com
Subject: Re: AW: [ceph-users] write performance per disk

On 07/03/2014 04:32 PM, VELARTIS Philipp Dürhammer wrote:

Hi,

ceph.conf:
osd journal size = 15360
rbd cache = true
rbd cache size = 2147483648
rbd cache max dirty = 1073741824
rbd cache max dirty age = 100
osd recovery max active = 1
osd max backfills = 1
osd mkfs options xfs = -f -i size=2048
osd mount options xfs = rw,noatime,nobarrier,logbsize=256k,logbufs=8,inode64,allocsize=4M
osd op threads = 8

So it should be 8 threads?

How many threads are you using with rados bench? Don't touch the op threads from the start, usually the default is just fine.

All 3 machines have more or less the same disk load at the same time. Also the disks:

sdb  35.56   87.10  6849.09  617310  48540806
sdc  26.75   72.62  5148.58  514701  36488992
sdd  35.15   53.48  6802.57  378993  48211141
sde  31.04   79.04  6208.48  560141  44000710
sdf  32.79   38.35  6238.28  271805  44211891
sdg  31.67   77.84  5987.45  551680  42434167
sdh  32.95   51.29  6315.76  363533  44761001
sdi  31.67   56.93  5956.29  403478  42213336
sdj  35.83   77.82  6929.31  551501  49109354
sdk  36.86   73.84  7291.00  523345  51672704
sdl  36.02  112.90  7040.47  800177  49897132
sdm  33.25   38.02  6455.05  269446  45748178
sdn  33.52   39.10  6645.19  277101  47095696
sdo  33.26   46.22  6388.20  327541  45274394
sdp  33.38   74.12  6480.62  525325  45929369

The question is: is it poor performance to get a maximum of 500 MB/s of writes with 45 disks and replica 2, or should I expect this?

You should be able to get more as long as the I/O is done in parallel.

Wido

--
Wido den Hollander
Ceph consultant and trainer
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
[ceph-users] write performance per disk
Hi,

I have a Ceph cluster setup (with 45 SATA disks, journals on the disks) and get only 450 MB/s sequential writes (the maximum, playing around with threads in rados bench) with a replica count of 2.

That is about ~20 MB of writes per disk (which is also what I see in atop). Theoretically, with replica 2 and journals on the data disks, it should be 45 x 100 MB (SATA) / 2 (replica) / 2 (journal writes), which makes 1125 MB/s - and SATA disks in reality do 120 MB/s, so the theoretical output should be even higher. I would expect between 40 and 50 MB/s for each SATA disk.

Can somebody confirm that they reach this speed with a setup with journals on the SATA disks (with journals on SSD it should be 100 MB per disk)? Or does Ceph only give about a quarter of the speed of a disk (and not the half expected because of the journals)?

My setup is 3 servers with: 2 x 2.6 GHz Xeons, 128 GB RAM, 15 SATA disks for Ceph (and SSDs for the system), 1 x 10 GbE for external traffic, 1 x 10 GbE for OSD traffic.

With reads I can saturate the network, but writes are far away from that. I would expect to at least saturate the 10 GbE with sequential writes as well.

Thank you
Re: [ceph-users] write performance per disk
Hi,

ceph.conf:
osd journal size = 15360
rbd cache = true
rbd cache size = 2147483648
rbd cache max dirty = 1073741824
rbd cache max dirty age = 100
osd recovery max active = 1
osd max backfills = 1
osd mkfs options xfs = -f -i size=2048
osd mount options xfs = rw,noatime,nobarrier,logbsize=256k,logbufs=8,inode64,allocsize=4M
osd op threads = 8

So it should be 8 threads?

All 3 machines have more or less the same disk load at the same time. Also the disks:

sdb  35.56   87.10  6849.09  617310  48540806
sdc  26.75   72.62  5148.58  514701  36488992
sdd  35.15   53.48  6802.57  378993  48211141
sde  31.04   79.04  6208.48  560141  44000710
sdf  32.79   38.35  6238.28  271805  44211891
sdg  31.67   77.84  5987.45  551680  42434167
sdh  32.95   51.29  6315.76  363533  44761001
sdi  31.67   56.93  5956.29  403478  42213336
sdj  35.83   77.82  6929.31  551501  49109354
sdk  36.86   73.84  7291.00  523345  51672704
sdl  36.02  112.90  7040.47  800177  49897132
sdm  33.25   38.02  6455.05  269446  45748178
sdn  33.52   39.10  6645.19  277101  47095696
sdo  33.26   46.22  6388.20  327541  45274394
sdp  33.38   74.12  6480.62  525325  45929369

The question is: is it poor performance to get a maximum of 500 MB/s of writes with 45 disks and replica 2, or should I expect this?

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wido den Hollander
Sent: Thursday, 03 July 2014 15:22
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] write performance per disk

On 07/03/2014 03:11 PM, VELARTIS Philipp Dürhammer wrote:

Hi,

I have a Ceph cluster setup (with 45 SATA disks, journals on the disks) and get only 450 MB/s sequential writes (the maximum, playing around with threads in rados bench) with a replica count of 2.

How many threads?

That is about ~20 MB of writes per disk (which is also what I see in atop). Theoretically, with replica 2 and journals on the data disks, it should be 45 x 100 MB (SATA) / 2 (replica) / 2 (journal writes), which makes 1125 MB/s - and SATA disks in reality do 120 MB/s, so the theoretical output should be even higher. I would expect between 40 and 50 MB/s for each SATA disk.

Can somebody confirm that they reach this speed with a setup with journals on the SATA disks (with journals on SSD it should be 100 MB per disk)? Or does Ceph only give about a quarter of the speed of a disk (and not the half expected because of the journals)?

Did you verify how much each machine is doing? It could be that the data is not distributed evenly and that on a certain machine the drives are doing 50 MB/sec.

My setup is 3 servers with: 2 x 2.6 GHz Xeons, 128 GB RAM, 15 SATA disks for Ceph (and SSDs for the system), 1 x 10 GbE for external traffic, 1 x 10 GbE for OSD traffic.

With reads I can saturate the network, but writes are far away from that. I would expect to at least saturate the 10 GbE with sequential writes as well.

Should be possible, but with 3 servers the data distribution might not be optimal, causing a lower write performance. I've seen 10 Gbit write performance on multiple clusters without any problems.

Thank you

--
Wido den Hollander
Ceph consultant and trainer
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
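Along the lines of Wido's question about what each machine is doing, a quick way to watch the individual disks while a rados bench run is in progress (device names are examples; iostat is part of the sysstat package):

# extended per-device statistics in MB, refreshed every 5 seconds
iostat -xm 5

# or watch a single OSD data disk
iostat -xm 5 /dev/sdb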
Re: [ceph-users] Can we map OSDs from different hosts (servers) to a Pool in Ceph
Hi,

Will Ceph support mixing different disk pools (for example spinners and SSDs) a little better (more safely) in the future?

Thank you
Philipp

On Wed, Jun 11, 2014 at 5:18 AM, Davide Fanciola dfanci...@gmail.com wrote:

Hi,

we have a similar setup where we have SSD and HDD in the same hosts. Our very basic crushmap is configured as follows:

# ceph osd tree
# id  weight  type name                      up/down  reweight
-6    3       root ssd
3     1           osd.3                      up       1
4     1           osd.4                      up       1
5     1           osd.5                      up       1
-5    3       root platters
0     1           osd.0                      up       1
1     1           osd.1                      up       1
2     1           osd.2                      up       1
-1    3       root default
-2    1           host chgva-srv-stor-001
0     1               osd.0                  up       1
3     1               osd.3                  up       1
-3    1           host chgva-srv-stor-002
1     1               osd.1                  up       1
4     1               osd.4                  up       1
-4    1           host chgva-srv-stor-003
2     1               osd.2                  up       1
5     1               osd.5                  up       1

We do not seem to have problems with this setup, but I'm not sure if it's good practice to have elements appearing multiple times in different branches. On the other hand, I see no way to follow the physical hierarchy of a datacenter for pools, since a pool can be spread among servers/racks/rooms... Can someone confirm this crushmap is any good for our configuration?

If you accidentally use the default node anywhere, you'll get data scattered across both classes of device. If you try to use both the platters and ssd nodes within a single CRUSH rule, you might end up with copies of the data on the same host (reducing your data resiliency). Otherwise this is just fine.

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
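To make Greg's point concrete, a sketch of separate rules over the two roots as they might appear in a decompiled CRUSH map. This is only an illustration, not Davide's actual map; note that with OSDs sitting directly under the ssd and platters roots, the chooseleaf type has to be osd, which is exactly what allows two replicas to end up on the same host:

rule ssd {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take ssd
        step chooseleaf firstn 0 type osd
        step emit
}

rule platters {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take platters
        step chooseleaf firstn 0 type osd
        step emit
}

Putting per-host buckets of each device class under the ssd and platters roots (and then using "type host" in the rules) is the usual way to keep replicas on different servers. A pool is pointed at a rule with something like: ceph osd pool set ssd-pool crush_ruleset 1 (pool name and ruleset id are examples).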
[ceph-users] someone using btrfs with ceph
Is anyone using btrfs in production? I know people say it's still not stable, but do we really use that many of its features with Ceph? And Facebook also uses it in production. It would be a big speed gain.
[ceph-users] SSD and SATA Pool CRUSHMAP
Hi,

What is the best way to implement an SSD pool and a SATA pool (both across the same 3 servers)? We have 3 servers with 15 SATA disks and 4 SSDs each. I would prefer a small, fast SSD pool and a big SATA pool over cache tiering or using the SSDs as journals: 15 SATA disks per server with writeback cache is fine performance-wise, and then I can use the SSDs for a really fast SSD pool.

Thanks
Philipp
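A hedged sketch of the CLI route for such a split - bucket, rule and pool names, weights and PG counts are examples only, and the same layout can also be built by editing the CRUSH map by hand as discussed in the other thread:

# create separate roots for the two device classes
ceph osd crush add-bucket ssd root
ceph osd crush add-bucket sata root

# per-class host buckets, moved under the matching root (one host shown)
ceph osd crush add-bucket ceph1-ssd host
ceph osd crush move ceph1-ssd root=ssd

# place an SSD OSD under its per-class host bucket (osd.45 and weight 0.4 are examples)
ceph osd crush set osd.45 0.4 root=ssd host=ceph1-ssd

# one rule per root, then a pool that uses it
ceph osd crush rule create-simple ssd-rule ssd host
ceph osd pool create ssd-pool 512 512
ceph osd pool set ssd-pool crush_ruleset 1   # use the ruleset id from 'ceph osd crush rule dump'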