Hello,

Wondering if anyone else has come across an issue we're having with our POC CEPH 
Cluster at the moment.

Some details about its setup:

6 x Dell R720 (20 x 1TB Drives, 4 x SSD CacheCade), 4 x 10Gb NICs
4 x Generic white label server (24 x 2 4TB Disk Raid-0 ), 4 x 10Gb NICs
3 x Dell R620 - Acting as ISCSI Heads (targetcli / Linux kernel ISCSI) - 4 x 
10Gb NICs.  An RBD device is mapped on each head and exported via targetcli; 
a client device then mounts it to push backup data.

All machines are running Ubuntu 12.04.3 LTS and ceph 0.67.4

Machines are split over two racks (distinct layer 2 domains) using a leaf/spine 
model and we use ECMP/quagga on the ISCSI heads to reach the CEPH Cluster.

The CRUSH map has racks defined to spread data over the 2 racks. I've attached 
the ceph.conf.

The cluster performs great normally, and we only have issues when simulating 
rack failure.

The issue appears when the following steps are taken:

o) Initiate load against the cluster (backups going via ISCSI)
o) ceph osd set noout
o) Reboot 2 x Generic Servers / 3 x Dell Servers (basically all the nodes in 1 
Rack)
o) Cluster goes degraded, as expected

  cluster 55dcf929-fca5-49fe-99d0-324a19afd5b4
   health HEALTH_WARN 7056 pgs degraded; 282 pgs stale; 2842 pgs stuck unclean; 
recovery 1286582/2700870 degraded (47.636%); 108/216 in osds are down; noout 
flag(s) set
   monmap e3: 5 mons at 
{fh-ceph01-mon-01=172.17.12.224:6789/0,fh-ceph01-mon-02=172.17.12.225:6789/0,fh-ceph01-mon-03=172.17.11.224:6789/0,fh-ceph01-mon-04=172.17.11.225:6789/0,fh-ceph01-mon-05=172.17.12.226:6789/0},
 election epoch 74, quorum 0,1,2,3,4 
fh-ceph01-mon-01,fh-ceph01-mon-02,fh-ceph01-mon-03,fh-ceph01-mon-04,fh-ceph01-mon-05
   osdmap e4237: 216 osds: 108 up, 216 in
    pgmap v117686: 7328 pgs: 266 active+clean, 6 stale+active+clean, 6780 
active+degraded, 276 stale+active+degraded; 3511 GB data, 10546 GB used, 794 TB 
/ 805 TB avail; 1286582/2700870 degraded (47.636%)
   mdsmap e1: 0/0/1 up


2013-11-05 08:51:44.830393 mon.0 [INF] pgmap v117685: 7328 pgs: 1489 
active+clean, 1289 stale+active+clean, 3215 active+degraded, 1335 
stale+active+degraded; 3511 GB data, 10546 GB used, 794 TB / 805 TB avail; 
1048742/2700870 degraded (38.830%);  recovering 7 o/s, 28969KB/s

o) The OSDs start returning

2013-11-05 08:52:42.019295 mon.0 [INF] osd.165 172.17.11.9:6864/6074 boot
2013-11-05 08:52:42.023055 mon.0 [INF] osd.154 172.17.11.9:6828/5943 boot
2013-11-05 08:52:42.024226 mon.0 [INF] osd.159 172.17.11.9:6816/5820 boot
2013-11-05 08:52:42.031996 mon.0 [INF] osd.161 172.17.11.9:6856/6059 boot

o) We then see some slow requests:

2013-11-05 08:53:11.677044 osd.153 [WRN] 6 slow requests, 6 included below; 
oldest blocked for > 30.409992 secs
2013-11-05 08:53:11.677052 osd.153 [WRN] slow request 30.409992 seconds old, 
received at 2013-11-05 08:52:41.266994: osd_op(client.16010.1:13441679 
rb.0.21ec.238e1f29.00000012fa28 [write 2854912~4096] 3.516ef071 RETRY=-1 e4240) 
currently reached pg
2013-11-05 08:53:11.677056 osd.153 [WRN] slow request 30.423024 seconds old, 
received at 2013-11-05 08:52:41.253962: osd_op(client.15755.1:13437999 
rb.0.21ec.238e1f29.00000012fa28 [write 0~233472] 3.516ef071 RETRY=1 e4240) v4 
currently reached pg

o) A few minutes later, the ISCSI heads start panicking

Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664305] ------------[ cut 
here ]------------
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664313] WARNING: at 
/build/buildd/linux-lts-raring-3.8.0/kernel/watchdog.c:246 
watchdog_overflow_callback+0x9a/0xc0()
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664315] Hardware name: 
PowerEdge R620
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664317] Watchdog detected 
hard LOCKUP on cpu 6
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664318] Modules linked in: 
ib_srpt(F) tcm_qla2xxx(F) tcm_loop(F) tcm_fc(F) iscsi_target_mod(F) 
target_core_pscsi(F) target_core_file(F) target_core_iblock(F) 
target_core_mod(F) rbd(F) libceph(F) ipmi_devintf(F) ipmi_si(F) 
ipmi_msghandler(F) qla2xxx(F) libfc(F) scsi_transport_fc(F) scsi_tgt(F) 
configfs(F) dell_rbu(F) ib_iser(F) rdma_cm(F) ib_cm(F) iw_cm(F) ib_sa(F) 
ib_mad(F) ib_core(F) ib_addr(F) ext2(F) iscsi_tcp(F) libiscsi_tcp(F) 
libiscsi(F) scsi_transport_iscsi(F) coretemp(F) kvm_intel(F) kvm(F) 
ghash_clmulni_intel(F) aesni_intel(F) ablk_helper(F) cryptd(F) lrw(F) 
aes_x86_64(F) xts(F) gf128mul(F) gpio_ich(F) dcdbas(F) microcode(F) joydev(F) 
shpchp(F) sb_edac(F) wmi(F) edac_core(F) acpi_power_meter(F) mei(F) lpc_ich(F) 
mac_hid(F) 8021q(F) garp(F) stp(F) llc(F) lp(F) parport(F) hid_generic(F) 
usbhid(F) hid(F) ahci(F) libahci(F) ixgbe(F) dca(F) megaraid_sas(F) mdio(F) 
tg3(F) ptp(F) pps_core(F) btrfs(F) zlib_deflate(F) libcrc32c(F) [last 
unloaded: target_core_mod]
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664387] Pid: 460, comm: 
kworker/u:5 Tainted: GF       W    3.8.0-31-generic #46~precise1-Ubuntu
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664389] Call Trace:
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664391]  <NMI>  
[<ffffffff81059b2f>] warn_slowpath_common+0x7f/0xc0
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664402]  [<ffffffff81059c26>] 
warn_slowpath_fmt+0x46/0x50
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664406]  [<ffffffff810ed6fa>] 
watchdog_overflow_callback+0x9a/0xc0
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664410]  [<ffffffff8112c75c>] 
__perf_event_overflow+0x9c/0x220
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664415]  [<ffffffff81025aca>] 
? x86_perf_event_set_period+0xda/0x150
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664418]  [<ffffffff8112cf54>] 
perf_event_overflow+0x14/0x20
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664423]  [<ffffffff8102b8db>] 
intel_pmu_handle_irq+0x19b/0x250
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664429]  [<ffffffff816f5c71>] 
perf_event_nmi_handler+0x21/0x30
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664432]  [<ffffffff816f53a9>] 
nmi_handle.isra.0+0x59/0x80
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664435]  [<ffffffff816f54b8>] 
default_do_nmi+0x58/0x240
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664438]  [<ffffffff816f5730>] 
do_nmi+0x90/0xd0
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664442]  [<ffffffff816f49c1>] 
end_repeat_nmi+0x1e/0x2e
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664447]  [<ffffffff81045c42>] 
? __ticket_spin_lock+0x22/0x30
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664450]  [<ffffffff81045c42>] 
? __ticket_spin_lock+0x22/0x30
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664453]  [<ffffffff81045c42>] 
? __ticket_spin_lock+0x22/0x30
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664454]  <<EOE>>  
[<ffffffff81045cd9>] default_spin_lock_flags+0x9/0x10
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664460]  [<ffffffff816f400e>] 
_raw_spin_lock_irqsave+0x2e/0x40
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664466]  [<ffffffff8108a442>] 
__wake_up+0x32/0x70
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664477]  [<ffffffffa03687d2>] 
iscsit_add_cmd_to_response_queue+0xb2/0xe0 [iscsi_target_mod]
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664483]  [<ffffffffa0374e28>] 
lio_queue_tm_rsp+0x28/0x30 [iscsi_target_mod]
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664492]  [<ffffffffa03a79ee>] 
target_tmr_work+0x6e/0xf0 [target_core_mod]
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664496]  [<ffffffff81078d01>] 
process_one_work+0x141/0x490
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664500]  [<ffffffff81079cc8>] 
worker_thread+0x168/0x400
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664503]  [<ffffffff81079b60>] 
? manage_workers+0x120/0x120
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664506]  [<ffffffff8107f1d0>] 
kthread+0xc0/0xd0
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664510]  [<ffffffff8107f110>] 
? flush_kthread_worker+0xb0/0xb0
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664514]  [<ffffffff816fcb2c>] 
ret_from_fork+0x7c/0xb0
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664517]  [<ffffffff8107f110>] 
? flush_kthread_worker+0xb0/0xb0
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664519] ---[ end trace 
9b6438fe7cb20627 ]---
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602470] INFO: rcu_sched 
detected stalls on CPUs/tasks: { 6} (detected by 21, t=15002 jiffies, 
g=18619, c=18618, q=7934)
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602570] sending NMI to all 
CPUs:
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602574] NMI backtrace for cpu 0
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602577] CPU 0 
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602581] Pid: 0, comm: 
swapper/0 Tainted: GF       W    3.8.0-31-generic #46~precise1-Ubuntu 
Dell Inc. PowerEdge R620/0PXXHP
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602583] RIP: 
0010:[<ffffffff8158b2e2>]  [<ffffffff8158b2e2>] poll_idle+0x42/0x90
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602590] RSP: 
0018:ffffffff81c01db8  EFLAGS: 00000246
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602592] RAX: 0000000400000000 
RBX: 0000000000000000 RCX: 0000000000000018
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602594] RDX: ffffffff81c01fd8 
RSI: ffffffff81c7a380 RDI: ffffffff81c1af00
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602596] RBP: ffffffff81c01df8 
R08: 0000000000000020 R09: 0000000000000000
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602598] R10: 0000000000002000 
R11: 0000000000000000 R12: ffff880fffc1b370
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602599] R13: 00003ef672bf9148 
R14: ffffffff81c7a380 R15: 0000000000000000
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602602] FS:  
0000000000000000(0000) GS:ffff880fffc00000(0000) knlGS:0000000000000000
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602604] CS:  0010 DS: 0000 
ES: 0000 CR0: 0000000080050033
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602605] CR2: 00007fffd8f0f000 
CR3: 0000000fda277000 CR4: 00000000000407f0
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602607] DR0: 0000000000000000 
DR1: 0000000000000000 DR2: 0000000000000000
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602609] DR3: 0000000000000000 
DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602611] Process swapper/0 
(pid: 0, threadinfo ffffffff81c00000, task ffffffff81c15440)
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602613] Stack:
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602614]  0000000000000000 
0000000006e86f73 ffffffff81c01dd8 ffffffff8101ba59
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602620]  ffffffff81c01e08 
ffff880fffc1b370 00003ef672bf9129 ffffffff8158ae40
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602625]  ffffffff81c01e08 
ffffffff8158ae59 ffffffff81c01e68 ffffffff8158b8d1
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602629] Call Trace:
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602634]  [<ffffffff8101ba59>] 
? read_tsc+0x9/0x20
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602638]  [<ffffffff8158ae40>] 
? centrino_target+0x360/0x360
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602641]  [<ffffffff8158ae59>] 
cpuidle_enter+0x19/0x20
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602645]  [<ffffffff8158b8d1>] 
cpuidle_wrap_enter+0x41/0xa0
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602648]  [<ffffffff8158cd1e>] 
? menu_select+0x16e/0x2b0
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602651]  [<ffffffff8158b940>] 
cpuidle_enter_tk+0x10/0x20
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602654]  [<ffffffff8158b4ff>] 
cpuidle_idle_call+0xaf/0x2b0
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602658]  [<ffffffff8101da8f>] 
cpu_idle+0xcf/0x120
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602664]  [<ffffffff816c7082>] 
rest_init+0x72/0x80
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602669]  [<ffffffff81d05c4f>] 
start_kernel+0x3d1/0x3de
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602673]  [<ffffffff81d057ff>] 
? pass_bootoption.constprop.2+0xd3/0xd3
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602676]  [<ffffffff81d05397>] 
x86_64_start_reservations+0x131/0x135
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602680]  [<ffffffff81d05120>] 
? early_idt_handlers+0x120/0x120
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602683]  [<ffffffff81d05468>] 
x86_64_start_kernel+0xcd/0xdc
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602685] Code: e8 84 36 b2 ff 
49 89 c5 fb 66 66 90 66 66 90 65 48 8b 14 25 30 c7 00
 00 48 8b 82 38 e0 ff ff a8 08 75 13 66 0f 1f 44 00 00 f3 90 <48> 8b 82 38 e0 
ff ff a8 08 74 f3 e8 4e 36 b2 ff 4c 29 e8 48 89 
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602718] NMI backtrace for cpu 
1
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602721] CPU 1 
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602725] Pid: 0, comm: 
swapper/1 Tainted: GF       W    3.8.0-31-generic #46~precise1-Ubuntu 
Dell Inc. PowerEdge R620/0PXXHP
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602728] RIP: 
0010:[<ffffffff8158b2e9>]  [<ffffffff8158b2e9>] poll_idle+0x49/0x90
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602735] RSP: 
0018:ffff880fe9545de8  EFLAGS: 00000246
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602737] RAX: 0000000400000000 
RBX: 0000000000000000 RCX: 0000000000000018
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602739] RDX: ffff880fe9545fd8 
RSI: ffffffff81c7a380 RDI: ffffffff81c1af00
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602741] RBP: ffff880fe9545e28 
R08: 0000000000000020 R09: 0000000000000000
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602743] R10: 0000000000002000 
R11: 0000000000000000 R12: ffff88203fc1b370
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602745] R13: 00003ef672bf7e9f 
R14: ffffffff81c7a380 R15: 0000000000000000
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602747] FS:  
0000000000000000(0000) GS:ffff88203fc00000(0000) knlGS:0000000000000000
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602749] CS:  0010 DS: 0000 
ES: 0000 CR0: 0000000080050033
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602751] CR2: 00007f1dc4053038 
CR3: 0000001fd97aa000 CR4: 00000000000407e0
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602753] DR0: 0000000000000000 
DR1: 0000000000000000 DR2: 0000000000000000
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602755] DR3: 0000000000000000 
DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602757] Process swapper/1 
(pid: 0, threadinfo ffff880fe9544000, task ffff880fe9549740)
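
To follow how the degraded ratio moves while the OSDs come back, the 
"N/M degraded (P%)" fragment in the status and pgmap lines above can be pulled 
out with a few lines of Python. This is only a rough sketch: the function name 
degraded_ratio is made up for illustration, and the regex assumes the 0.67-era 
log wording shown in this mail.

```python
import re

# Matches the "<degraded>/<total> degraded (<pct>%)" fragment seen in the
# "ceph -s" health line and in the mon pgmap log lines quoted above.
DEGRADED_RE = re.compile(r"(\d+)/(\d+) degraded \((\d+\.\d+)%\)")


def degraded_ratio(line):
    """Return (degraded, total, percent) parsed from one line, or None.

    The percentage is recomputed from the two counters rather than taken
    from the text, so it can be compared against what Ceph printed.
    """
    match = DEGRADED_RE.search(line)
    if match is None:
        return None
    degraded = int(match.group(1))
    total = int(match.group(2))
    return degraded, total, 100.0 * degraded / total


if __name__ == "__main__":
    sample = ("794 TB / 805 TB avail; "
              "1048742/2700870 degraded (38.830%);  recovering 7 o/s, 28969KB/s")
    print(degraded_ratio(sample))
```

Feeding it the output of "ceph -w" line by line would give a simple time 
series to compare against the moment the slow requests begin.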

I'm in the process of building a separate iscsi head on different hardware, 
just in case it's something related to the R620s, but wanted to see if anyone 
had any ideas?  Any input greatly appreciated.  I can pretty much replicate 
this on demand at the moment by following the above steps.
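
For anyone wanting to try the same sequence, here is a minimal sketch of the 
first steps as a Python helper. The host names are placeholders rather than our 
real node names, and rebooting over ssh with sudo is an assumption about how 
the nodes are reached. It defaults to a dry run that only prints the commands.

```python
import subprocess


def rack_failure_plan(rack_hosts):
    """Build the rack-failure test as a list of argv lists.

    First set the noout flag so the rebooted OSDs are not marked out,
    then reboot every node in the chosen rack.
    """
    plan = [["ceph", "osd", "set", "noout"]]
    for host in rack_hosts:
        # Assumption: passwordless ssh and sudo are available on each node.
        plan.append(["ssh", host, "sudo", "reboot"])
    return plan


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(" ".join(argv))
        if not dry_run:
            # ssh usually exits non-zero when the reboot drops the session,
            # so a failing exit status is not treated as an error here.
            subprocess.run(argv, check=False)


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    run_plan(rack_failure_plan(["rack2-osd-01", "rack2-osd-02"]))
```

The backup load still has to be started separately from the ISCSI client 
before running it, and "ceph osd unset noout" clears the flag afterwards.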

Cheers,



Attachment: ceph.conf