Re: [ceph-users] osd crash

2016-12-01 Thread VELARTIS Philipp Dürhammer
I am using Proxmox, so I guess it is Debian. I will update the kernel; there are
newer versions. But generally, if an OSD crashes like this, can it be hardware
related?
How do I unmount the disk? I can't even run ps ax or lsof - it hangs because my
OSD is still mounted and blocks everything... I also cannot start it...
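For what it's worth, a rough sketch of how to look at a hung OSD mount without
blocking the shell (the data path and OSD id below are assumptions; after an
oops in the kernel scheduler, a reboot of the node is usually the only reliable
way to fully recover):

  # list tasks stuck in uninterruptible sleep (D state) without touching the mount
  ps axo pid,stat,comm | awk '$2 ~ /D/'
  # check whether the kernel has logged further errors
  dmesg | tail -n 50
  # lazy unmount detaches the filesystem from the namespace; it may still block
  # if the crashed ceph-osd holds references (ceph-15 is just a placeholder)
  umount -l /var/lib/ceph/osd/ceph-15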

From: Nick Fisk [mailto:n...@fisk.me.uk]
Sent: Thursday, 01 December 2016 13:15
To: VELARTIS Philipp Dürhammer; ceph-us...@ceph.com
Subject: RE: osd crash

Are you using Ubuntu 16.04 (guessing from your kernel version)? There was a
NUMA bug in early kernels; try updating to the latest in the 4.4 series.
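If it is the known divide-by-zero in task_numa_find_cpu (the divide error in the
trace), a commonly used stopgap until the kernel is updated is to switch off
automatic NUMA balancing; a minimal sketch, assuming the sysctl is available on
this 4.4 kernel:

  # check the running kernel and whether automatic NUMA balancing is active
  uname -r
  cat /proc/sys/kernel/numa_balancing
  # disable it at runtime and persist the setting across reboots
  sysctl -w kernel.numa_balancing=0
  echo 'kernel.numa_balancing = 0' >> /etc/sysctl.conf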

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
VELARTIS Philipp Dürhammer
Sent: 01 December 2016 12:04
To: 'ceph-us...@ceph.com' <ceph-us...@ceph.com>
Subject: [ceph-users] osd crash

Hello!

Tonight I had an OSD crash; see the dump below. This OSD is also still mounted.
What is the cause? A bug? What should I do next?

Thank You!


[ceph-users] osd crash - disk hangs

2016-12-01 Thread VELARTIS Philipp Dürhammer
Hello!

Tonight I had an OSD crash; see the dump below. This OSD is also still mounted.
What is the cause? A bug? What should I do next? I can't run lsof or ps ax
because it hangs.

Thank You!



[ceph-users] osd crash

2016-12-01 Thread VELARTIS Philipp Dürhammer
Hello!

Tonight I had an OSD crash; see the dump below. This OSD is also still mounted.
What is the cause? A bug? What should I do next?

Thank You!

Dec  1 00:31:30 ceph2 kernel: [17314369.493029] divide error:  [#1] SMP
Dec  1 00:31:30 ceph2 kernel: [17314369.493062] Modules linked in: act_police 
cls_basic sch_ingress sch_htb vhost_net vhost macvtap macvlan 8021q garp mrp 
veth nfsv3 softdog ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 
ip6table_filter ip6_tables xt_mac ipt_REJECT nf_reject_ipv4 xt_NFLOG 
nfnetlink_log xt_physdev nf_conntrack_ipv4 nf_defrag_ipv4 xt_comment xt_tcpudp 
xt_addrtype xt_multiport xt_conntrack xt_set xt_mark ip_set_hash_net ip_set 
nfnetlink iptable_filter ip_tables x_tables nfsd auth_rpcgss nfs_acl nfs lockd 
grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr 
iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding xfs libcrc32c 
ipmi_ssif mxm_wmi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm 
irqbypass crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul 
glue_helper ablk_helper cryptd snd_pcm snd_timer snd soundcore pcspkr 
input_leds sb_edac shpchp edac_core mei_me ioatdma mei lpc_ich i2c_i801 ipmi_si 
8250_fintek wmi ipmi_msghandler mac_hid nf_conntrack_ftp nf_conntrack autofs4 
ses enclosure hid_generic usbmouse usbkbd usbhid hid ixgbe(O) vxlan 
ip6_udp_tunnel megaraid_sas udp_tunnel isci ahci libahci libsas igb(O) 
scsi_transport_sas dca ptp pps_core fjes
Dec  1 00:31:30 ceph2 kernel: [17314369.493708] CPU: 1 PID: 17291 Comm: 
ceph-osd Tainted: G   O4.4.8-1-pve #1
Dec  1 00:31:30 ceph2 kernel: [17314369.493754] Hardware name: Thomas-Krenn.AG 
X9DR3-F/X9DR3-F, BIOS 3.0a 07/31/2013
Dec  1 00:31:30 ceph2 kernel: [17314369.493799] task: 881f6ff05280 ti: 
880037c4c000 task.ti: 880037c4c000
Dec  1 00:31:30 ceph2 kernel: [17314369.493843] RIP: 0010:[]  
[] task_numa_find_cpu+0x23d/0x710
Dec  1 00:31:30 ceph2 kernel: [17314369.493893] RSP: :880037c4fbd8  
EFLAGS: 00010257
Dec  1 00:31:30 ceph2 kernel: [17314369.493919] RAX:  RBX: 
880037c4fc80 RCX: 
Dec  1 00:31:30 ceph2 kernel: [17314369.493962] RDX:  RSI: 
88103fa4 RDI: 881033f50c00
Dec  1 00:31:30 ceph2 kernel: [17314369.494006] RBP: 880037c4fc48 R08: 
000202046ea8 R09: 036b
Dec  1 00:31:30 ceph2 kernel: [17314369.494049] R10: 007c R11: 
0540 R12: 88064fbd
Dec  1 00:31:30 ceph2 kernel: [17314369.494093] R13: 0250 R14: 
0540 R15: 0009
Dec  1 00:31:30 ceph2 kernel: [17314369.494136] FS:  7ff17dd6c700() 
GS:88103fa4() knlGS:
Dec  1 00:31:30 ceph2 kernel: [17314369.494182] CS:  0010 DS:  ES:  
CR0: 80050033
Dec  1 00:31:30 ceph2 kernel: [17314369.494209] CR2: 7ff17dd6aff8 CR3: 
001025e4b000 CR4: 001426e0
Dec  1 00:31:30 ceph2 kernel: [17314369.494252] Stack:
Dec  1 00:31:30 ceph2 kernel: [17314369.494273]  880037c4fbe8 
81038219 003f 00017180
Dec  1 00:31:30 ceph2 kernel: [17314369.494323]  881f6ff05280 
00017180 0251 ffe7
Dec  1 00:31:30 ceph2 kernel: [17314369.494374]  0251 
881f6ff05280 880037c4fc80 00cb
Dec  1 00:31:30 ceph2 kernel: [17314369.494424] Call Trace:
Dec  1 00:31:30 ceph2 kernel: [17314369.494449]  [] ? 
sched_clock+0x9/0x10
Dec  1 00:31:30 ceph2 kernel: [17314369.494476]  [] 
task_numa_migrate+0x4e6/0xa00
Dec  1 00:31:30 ceph2 kernel: [17314369.494506]  [] ? 
copy_to_iter+0x7c/0x260
Dec  1 00:31:30 ceph2 kernel: [17314369.494534]  [] 
numa_migrate_preferred+0x79/0x80
Dec  1 00:31:30 ceph2 kernel: [17314369.494563]  [] 
task_numa_fault+0x848/0xd10
Dec  1 00:31:30 ceph2 kernel: [17314369.494591]  [] ? 
should_numa_migrate_memory+0x59/0x130
Dec  1 00:31:30 ceph2 kernel: [17314369.494623]  [] 
handle_mm_fault+0xc64/0x1a20
Dec  1 00:31:30 ceph2 kernel: [17314369.494654]  [] ? 
SYSC_recvfrom+0x144/0x160
Dec  1 00:31:30 ceph2 kernel: [17314369.494684]  [] 
__do_page_fault+0x19d/0x410
Dec  1 00:31:30 ceph2 kernel: [17314369.494713]  [] ? 
exit_to_usermode_loop+0xb0/0xd0
Dec  1 00:31:30 ceph2 kernel: [17314369.494742]  [] 
do_page_fault+0x22/0x30
Dec  1 00:31:30 ceph2 kernel: [17314369.494771]  [] 
page_fault+0x28/0x30
Dec  1 00:31:30 ceph2 kernel: [17314369.494797] Code: 4d b0 4c 89 ef e8 b4 d0 
ff ff 48 8b 4d b0 49 8b 85 b0 00 00 00 31 d2 48 0f af 81 d8 01 00 00 49 8b 4d 
78 4c 8b 6b 78 48 83 c1 01 <48> f7 f1 48 8b 4b 20 49 89 c0 48 29 c1 4c 03 43 48 
4c 39 75 d0
Dec  1 00:31:30 ceph2 kernel: [17314369.495005] RIP  [] 
task_numa_find_cpu+0x23d/0x710
Dec  1 00:31:30 ceph2 kernel: [17314369.495035]  RSP 
Dec  1 00:31:30 ceph2 kernel: [17314369.495347] ---[ end trace 7106c9a72840cc7d 
]---

[ceph-users] changing ceph config - but still same mount options

2016-03-20 Thread VELARTIS Philipp Dürhammer
Hi, before I tested with:
osd mount options xfs = 
"rw,noatime,nobarrier,inode64,logbsize=256k,logbufs=8,allocsize = 4M" (added 
inode64)

and then changed to
osd mount options xfs = "rw,noatime,nobarrier,logbsize=256k,logbufs=8,allocsize 
= 4M"

But after a reboot it still mounts with inode64.

As I have only 1 TB disks I don't need the inode64 option, and it is the only
difference to the other server. This server has the same hardware but shows a
lot higher commit and apply rates... so maybe it is because of the inode64?

/// total osd section
[osd]
 keyring = /var/lib/ceph/osd/ceph-$id/keyring
 osd max backfills = 1
 osd recovery max active = 1
 osd mkfs options xfs = ""
 osd mount options xfs = 
"rw,noatime,nobarrier,logbsize=256k,logbufs=8,allocsize = 4M"
 osd op threads = 4
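A side note on the observation above: inode64 has been the default XFS mount
behaviour on recent kernels (it became the default around 3.7), so dropping it
from ceph.conf will not remove it from the mount, and the osd mount options only
take effect when the OSD data partition is re-mounted (i.e. when the OSD is
restarted or re-activated). A quick way to check what is actually in effect (the
path and osd.0 are examples):

  # show the options each OSD data partition is currently mounted with
  grep /var/lib/ceph/osd /proc/mounts
  # ask the running daemon what it thinks the mount option setting is
  ceph daemon osd.0 config get osd_mount_options_xfs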


[ceph-users] rbd read speed only 1/4 of write speed

2014-12-16 Thread VELARTIS Philipp Dürhammer
Hello,

Read speed inside our VMs (most of them Windows) is only ¼ of the write speed.
Write speed is about 450-500 MB/s and
read is only about 100 MB/s.

Our network is 10 Gbit for the OSDs and 10 Gbit for the MONs. We have 3 servers
with 15 OSDs each.
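One way to tell whether the cluster itself or the VM/RBD path is the bottleneck
is to benchmark reads directly with rados bench; a minimal sketch (pool name,
runtime and thread count are just examples):

  # write test objects and keep them around for the read test
  rados bench -p rbd 60 write -t 16 --no-cleanup
  # sequential reads of those objects
  rados bench -p rbd 60 seq -t 16
  # remove the benchmark objects afterwards
  rados -p rbd cleanup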



[ceph-users] how to check real rados read speed

2014-10-29 Thread VELARTIS Philipp Dürhammer
Hi,

With ceph -w I can see Ceph writes, reads and IO.
But the reads shown seem to be only reads that are not served from the OSD or
monitor cache.
As we have 128 GB of RAM in every Ceph server, our monitors and OSDs are set to
use a lot of RAM.
Monitoring only very rarely shows any Ceph reads... but a lot more writes
(though it should be more reads).
Even when I run a benchmark inside a virtual machine with 2 GB, I see 2 GB of
writes but no reads...

Is there any way to monitor real reads from RADOS and not only OSD reads?
By the way, where can I check the reads of the scrub process?

Thank you
philipp
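A rough way to separate client reads from what the disks actually do (osd.0 and
the 5-second interval are examples; exact counter names vary a bit between
releases):

  # per-OSD operation counters via the admin socket; compare the op_r / op_w and
  # *_out_bytes / *_in_bytes values before and after a test
  ceph daemon osd.0 perf dump | less
  # physical reads on the data disks (this includes scrub reads)
  iostat -x 5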



Re: [ceph-users] write performance per disk

2014-07-06 Thread VELARTIS Philipp Dürhammer
Hi,

yes, I did a test now with 16 instances with 16 and 32 threads each.
The absolute maximum was 1100 MB/s, but the network was still not saturated.
All disks had the same load of about 110 MB/s - the maximum I got from the
disks using direct access was 170 MB/s of writes...

This is not too bad a value... I will run more tests with 10 and 20 virtual
machines at the same time.

Do you think 110 MB/s per disk is the Ceph maximum (for a theoretical
170 MB/s per disk)?
The 110 MB/s per disk includes the journals as well...

Thanks
philipp

From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of
Mark Nelson [mark.nel...@inktank.com]
Sent: Friday, 04 July 2014 16:10
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] write performance per disk

On 07/03/2014 08:11 AM, VELARTIS Philipp Dürhammer wrote:
 Hi,

 I have a ceph cluster setup (with 45 sata disk journal on disks) and get
 only 450mb/sec writes seq (maximum playing around with threads in rados
 bench) with replica of 2

 Which is about ~20Mb writes per disk (what y see in atop also)
 theoretically with replica2 and having journals on disk should be 45 X
 100mb (sata) / 2 (replica) / 2 (journal writes) which makes it 1125
 satas in reality have 120mb/sec so the theoretical output should be more.

 I would expect to have between 40-50mb/sec for each sata disk

 Can somebody confirm that he can reach this speed with a setup with
 journals on the satas (with journals on ssd speed should be 100mb per disk)?
 or does ceph only give about ¼ of the speed for a disk? (and not the ½
 as expected because of journals)

 My setup is 3 servers with: 2 x 2.6ghz xeons, 128gb ram 15 satas for
 ceph (and ssds for system) 1 x 10gig for external traffic, 1 x 10gig for
 osd traffic
 with reads I can saturate the network but writes is far away. And I
 would expect at least to saturate the 10gig with sequential writes also

In addition to the advice wido is providing (which I wholeheartedly
agree with!), you might want to check your controller/disk
configuration.  If you have journals on the same disks as the data, some
times putting the disks into single-disk RAID0 LUNs with writeback cache
enabled can help keep journal and data writes from causing seek
contention.  This only works if you have a controller with cache and a
battery though.
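If the controller is an LSI/MegaRAID (the megaraid_sas module does appear in the
kernel logs elsewhere in this archive), the cache policy of the logical drives
can be checked and, with a healthy BBU, switched to write-back roughly as below.
This is only a sketch; MegaCli option spelling varies between versions and the
binary may be installed as MegaCli64:

  # battery status and the current cache policy of all logical drives
  MegaCli -AdpBbuCmd -GetBbuStatus -aALL
  MegaCli -LDGetProp -Cache -LALL -aALL
  # switch all logical drives to write-back (only sensible with a working BBU)
  MegaCli -LDSetProp WB -LALL -aALL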


 Thank you



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] write performance per disk

2014-07-04 Thread VELARTIS Philipp Dürhammer
I use between 1 and 128 threads in different steps...
But 500 MB/s of writes is the maximum I get playing around.

Oof, it is so hard to tune Ceph... so many people have problems... ;-)

-----Original Message-----
From: Wido den Hollander [mailto:w...@42on.com]
Sent: Friday, 04 July 2014 10:55
To: VELARTIS Philipp Dürhammer; ceph-users@lists.ceph.com
Subject: Re: AW: [ceph-users] write performance per disk

On 07/03/2014 04:32 PM, VELARTIS Philipp Dürhammer wrote:
 HI,

 Ceph.conf:
 osd journal size = 15360
 rbd cache = true
  rbd cache size = 2147483648
  rbd cache max dirty = 1073741824
  rbd cache max dirty age = 100
  osd recovery max active = 1
   osd max backfills = 1
   osd mkfs options xfs = -f -i size=2048
   osd mount options xfs = 
 rw,noatime,nobarrier,logbsize=256k,logbufs=8,inode64,allocsize=4M
   osd op threads = 8

 so it should be 8 threads?


How many threads are you using with rados bench? Don't touch the op threads 
from the start, usually the default is just fine.

 All 3 machines have more or less the same disk load at the same time.
 also the disks:
 sdb  35.5687.10  6849.09 617310   48540806
 sdc  26.7572.62  5148.58 514701   36488992
 sdd  35.1553.48  6802.57 378993   48211141
 sde  31.0479.04  6208.48 560141   44000710
 sdf  32.7938.35  6238.28 271805   44211891
 sdg  31.6777.84  5987.45 551680   42434167
 sdh  32.9551.29  6315.76 363533   44761001
 sdi  31.6756.93  5956.29 403478   42213336
 sdj  35.8377.82  6929.31 551501   49109354
 sdk  36.8673.84  7291.00 523345   51672704
 sdl  36.02   112.90  7040.47 800177   49897132
 sdm  33.2538.02  6455.05 269446   45748178
 sdn  33.5239.10  6645.19 277101   47095696
 sdo  33.2646.22  6388.20 327541   45274394
 sdp  33.3874.12  6480.62 525325   45929369


 the question is: is this a poor performance to get max 500mb/write with 45 
 disks and replica 2 or should I expect this?


You should be able to get more as long as the I/O is done in parallel.

Wido
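For reference, a sketch of pushing rados bench harder in parallel, since a
single low-thread instance often cannot saturate 45 spindles (pool name,
runtime, thread count and run name are arbitrary examples):

  # one instance with 32 concurrent 4 MB ops; --run-name keeps the objects of
  # parallel instances apart
  rados bench -p test 60 write -t 32 --no-cleanup --run-name client1
  # start several of these at once (ideally from several client hosts) with
  # different run names and add up the reported bandwidth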


 -----Original Message-----
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 Wido den Hollander
 Sent: Thursday, 03 July 2014 15:22
 To: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] write performance per disk

 On 07/03/2014 03:11 PM, VELARTIS Philipp Dürhammer wrote:
 Hi,

 I have a ceph cluster setup (with 45 sata disk journal on disks) and 
 get only 450mb/sec writes seq (maximum playing around with threads in 
 rados
 bench) with replica of 2


 How many threads?

 Which is about ~20Mb writes per disk (what y see in atop also) 
 theoretically with replica2 and having journals on disk should be 45 
 X 100mb (sata) / 2 (replica) / 2 (journal writes) which makes it 1125 
 satas in reality have 120mb/sec so the theoretical output should be more.

 I would expect to have between 40-50mb/sec for each sata disk

 Can somebody confirm that he can reach this speed with a setup with 
 journals on the satas (with journals on ssd speed should be 100mb per disk)?
 or does ceph only give about ¼ of the speed for a disk? (and not the 
 ½ as expected because of journals)


 Did you verify how much each machine is doing? It could be that the data is 
 not distributed evenly and that on a certain machine the drives are doing 
 50MB/sec.

 My setup is 3 servers with: 2 x 2.6ghz xeons, 128gb ram 15 satas for 
 ceph (and ssds for system) 1 x 10gig for external traffic, 1 x 10gig 
 for osd traffic with reads I can saturate the network but writes is 
 far away. And I would expect at least to saturate the 10gig with 
 sequential writes also


 Should be possible, but with 3 servers the data distribution might not be 
 optimal causing a lower write performance.

 I've seen 10Gbit write performance on multiple clusters without any problems.

 Thank you






 --
 Wido den Hollander
 Ceph consultant and trainer
 42on B.V.

 Phone: +31 (0)20 700 9902
 Skype: contact42on



--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on


[ceph-users] write performance per disk

2014-07-03 Thread VELARTIS Philipp Dürhammer
Hi,

I have a Ceph cluster setup (45 SATA disks, journals on the same disks) and get
only 450 MB/s of sequential writes (the maximum when playing around with threads
in rados bench) with a replica count of 2.
That is about ~20 MB/s of writes per disk (which is also what I see in atop).
Theoretically, with replica 2 and journals on the disks, it should be 45 x
100 MB/s (SATA) / 2 (replica) / 2 (journal writes), which makes 1125 MB/s.
SATA disks in reality do 120 MB/s, so the theoretical output should be even more.

I would expect between 40 and 50 MB/s for each SATA disk.

Can somebody confirm that they can reach this speed with a setup with journals
on the SATA disks (with journals on SSD the speed should be 100 MB/s per disk)?
Or does Ceph only give about ¼ of the speed of a disk (and not the ½ expected
because of the journals)?
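To make that back-of-the-envelope explicit (assuming ~100 MB/s sustained per
SATA disk, replica 2 and journals co-located on the data disks, so every client
byte is written roughly four times):

  # theoretical aggregate ceiling for client writes
  echo $(( 45 * 100 / 2 / 2 ))   # = 1125 MB/s
  # at the observed ~475 MB/s of client writes, each disk does about
  # 4 * 475 / 45 = ~42 MB/s of physical writes, well under its 100-170 MB/s
  # streaming rate, which points at journal/data seek contention rather than a
  # hard Ceph limit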


My setup is 3 servers with: 2 x 2.6 GHz Xeons, 128 GB RAM, 15 SATA disks for
Ceph (and SSDs for the system), 1 x 10 GbE for external traffic and 1 x 10 GbE
for OSD traffic.
With reads I can saturate the network, but writes are far off. I would expect
to at least saturate the 10 GbE with sequential writes as well.

Thank you


Re: [ceph-users] write performance per disk

2014-07-03 Thread VELARTIS Philipp Dürhammer
Hi,

Ceph.conf:
 osd journal size = 15360
 rbd cache = true
 rbd cache size = 2147483648
 rbd cache max dirty = 1073741824
 rbd cache max dirty age = 100
 osd recovery max active = 1
 osd max backfills = 1
 osd mkfs options xfs = -f -i size=2048
 osd mount options xfs = rw,noatime,nobarrier,logbsize=256k,logbufs=8,inode64,allocsize=4M
 osd op threads = 8

so it should be 8 threads?

All 3 machines have more or less the same disk load at the same time.
The individual disks as well:
sdb  35.5687.10  6849.09 617310   48540806
sdc  26.7572.62  5148.58 514701   36488992
sdd  35.1553.48  6802.57 378993   48211141
sde  31.0479.04  6208.48 560141   44000710
sdf  32.7938.35  6238.28 271805   44211891
sdg  31.6777.84  5987.45 551680   42434167
sdh  32.9551.29  6315.76 363533   44761001
sdi  31.6756.93  5956.29 403478   42213336
sdj  35.8377.82  6929.31 551501   49109354
sdk  36.8673.84  7291.00 523345   51672704
sdl  36.02   112.90  7040.47 800177   49897132
sdm  33.2538.02  6455.05 269446   45748178
sdn  33.5239.10  6645.19 277101   47095696
sdo  33.2646.22  6388.20 327541   45274394
sdp  33.3874.12  6480.62 525325   45929369


The question is: is it poor performance to get at most 500 MB/s of writes with
45 disks and replica 2, or should I expect this?


-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wido
den Hollander
Sent: Thursday, 03 July 2014 15:22
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] write performance per disk

On 07/03/2014 03:11 PM, VELARTIS Philipp Dürhammer wrote:
 Hi,

 I have a ceph cluster setup (with 45 sata disk journal on disks) and 
 get only 450mb/sec writes seq (maximum playing around with threads in 
 rados
 bench) with replica of 2


How many threads?

 Which is about ~20Mb writes per disk (what y see in atop also) 
 theoretically with replica2 and having journals on disk should be 45 X 
 100mb (sata) / 2 (replica) / 2 (journal writes) which makes it 1125 
 satas in reality have 120mb/sec so the theoretical output should be more.

 I would expect to have between 40-50mb/sec for each sata disk

 Can somebody confirm that he can reach this speed with a setup with 
 journals on the satas (with journals on ssd speed should be 100mb per disk)?
 or does ceph only give about ¼ of the speed for a disk? (and not the ½ 
 as expected because of journals)


Did you verify how much each machine is doing? It could be that the data is not 
distributed evenly and that on a certain machine the drives are doing 50MB/sec.

 My setup is 3 servers with: 2 x 2.6ghz xeons, 128gb ram 15 satas for 
 ceph (and ssds for system) 1 x 10gig for external traffic, 1 x 10gig 
 for osd traffic with reads I can saturate the network but writes is 
 far away. And I would expect at least to saturate the 10gig with 
 sequential writes also


Should be possible, but with 3 servers the data distribution might not be 
optimal causing a lower write performance.

I've seen 10Gbit write performance on multiple clusters without any problems.

 Thank you






--
Wido den Hollander
Ceph consultant and trainer
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on


Re: [ceph-users] Can we map OSDs from different hosts (servers) to a Pool in Ceph

2014-06-12 Thread VELARTIS Philipp Dürhammer
Hi,

Will Ceph support mixing different disk pools (for example spinners and SSDs)
a little better (more safely) in the future?

Thank you
philipp

On Wed, Jun 11, 2014 at 5:18 AM, Davide Fanciola dfanci...@gmail.com wrote:
 Hi,

 we have a similar setup where we have SSD and HDD in the same hosts.
 Our very basic crushmap is configured as follows:

 # ceph osd tree
 # id weight type name up/down reweight
 -6 3 root ssd
 3 1 osd.3 up 1
 4 1 osd.4 up 1
 5 1 osd.5 up 1
 -5 3 root platters
 0 1 osd.0 up 1
 1 1 osd.1 up 1
 2 1 osd.2 up 1
 -1 3 root default
 -2 1 host chgva-srv-stor-001
 0 1 osd.0 up 1
 3 1 osd.3 up 1
 -3 1 host chgva-srv-stor-002
 1 1 osd.1 up 1
 4 1 osd.4 up 1
 -4 1 host chgva-srv-stor-003
 2 1 osd.2 up 1
 5 1 osd.5 up 1


 We do not seem to have problems with this setup, but i'm not sure if 
 it's a good practice to have elements appearing multiple times in 
 different branches.
 On the other hand, I see no way to follow the physical hierarchy of a 
 datacenter for pools, since a pool can be spread among 
 servers/racks/rooms...

 Can someone confirm this crushmap is any good for our configuration?

If you accidentally use the default node anywhere, you'll get data scattered 
across both classes of device. If you try and use both the platters and ssd 
nodes within a single CRUSH rule, you might end up with copies of data on the 
same host (reducing your data resiliency). Otherwise this is just fine.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com 
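For completeness, a sketch of wiring the two roots to separate pools from the
CLI, assuming host-level buckets are added under each root (which also addresses
the resiliency concern above); rule and pool names are examples, and older
releases use crush_ruleset where newer ones use crush_rule:

  # one simple replicated rule per root, choosing distinct hosts
  ceph osd crush rule create-simple ssd-rule ssd host
  ceph osd crush rule create-simple platter-rule platters host
  # look up the ruleset ids, then point each pool at its rule
  ceph osd crush rule dump
  ceph osd pool set ssd-pool crush_ruleset 3    # 3 and 4 are placeholders
  ceph osd pool set sata-pool crush_ruleset 4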


[ceph-users] someone using btrfs with ceph

2014-05-28 Thread VELARTIS Philipp Dürhammer
Is anyone using btrfs in production?
I know people say it's still not stable, but do we really use that many features
with Ceph? And Facebook also uses it in production. It would be a big speed gain.




[ceph-users] SSD and SATA Pool CRUSHMAP

2014-05-26 Thread VELARTIS Philipp Dürhammer
Hi,

What is the best way to implement an SSD pool and a SATA pool (both across the
same 3 servers)?
We have 3 servers with 15 SATA disks and 4 SSDs each.
I would prefer a small, fast SSD pool and a big SATA pool over cache tiering or
SSDs as journals, since 15 SATA disks per server with writeback cache are OK
performance-wise, and I can use the SSDs for a really fast SSD pool.

Thanks
philipp
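One possible layout, sketched with the CLI (bucket names, the OSD id and the
weight are placeholders): a separate CRUSH root per device class with per-host
buckets underneath, so replicas still spread across servers, plus one simple
rule per root as sketched in the CRUSH thread further up.

  # create a root for the SSDs and a per-host bucket under it
  ceph osd crush add-bucket ssd root
  ceph osd crush add-bucket ceph1-ssd host
  ceph osd crush move ceph1-ssd root=ssd
  # place an SSD OSD into its host bucket with an appropriate weight
  ceph osd crush create-or-move osd.45 0.5 root=ssd host=ceph1-ssd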