Re: [ceph-users] Cluster health_warn 1 active+undersized+degraded/1 active+remapped

2015-08-17 Thread Steve Dainard
 pg_stat state up up_primary acting acting_primary
 2.e7f active+remapped [58] 58 [58,5] 58
 2.143 active+remapped [16] 16 [16,76] 16
 2.968 active+remapped [44] 44 [44,76] 44
 2.5f8 active+remapped [17] 17 [17,76] 17
 2.81c active+remapped [25] 25 [25,76] 25
 2.1a3 active+remapped [16] 16 [16,76] 16
 2.2cb active+remapped [14] 14 [14,76] 14
 2.d41 active+remapped [27] 27 [27,76] 27
 2.3f9 active+remapped [35] 35 [35,76] 35
 2.a62 active+remapped [2] 2 [2,38] 2
 2.1b0 active+remapped [3] 3 [3,76] 3

 All of the OSD filesystems are below 85% full.

 I then compared against a new 0.94.2 cluster that had never been upgraded
 (the current cluster is also 0.94.2 but has been upgraded a couple of times)
 and noticed its crush map had 'tunable straw_calc_version 1', so I added it
 to the current cluster.
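
 For reference, the usual decompile/edit/recompile cycle for adding that
 tunable by hand looks roughly like this (file names are placeholders):

 # ceph osd getcrushmap -o crush.bin
 # crushtool -d crush.bin -o crush.txt
     (add "tunable straw_calc_version 1" next to the other tunable lines)
 # crushtool -c crush.txt -o crush.new
 # ceph osd setcrushmap -i crush.new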

 After the data moved around for about 8 hours or so I'm left with this 
 state:

 # ceph health detail
 HEALTH_WARN 2 pgs stuck unclean; recovery 16357/66089446 objects
 misplaced (0.025%)
 pg 2.e7f is stuck unclean for 149422.331848, current state
 active+remapped, last acting [58,5]
 pg 2.782 is stuck unclean for 64878.002464, current state
 active+remapped, last acting [76,31]
 recovery 16357/66089446 objects misplaced (0.025%)

 I attempted a pg repair on both of the PGs listed above, but it
 doesn't look like anything is happening. The docs reference an
 inconsistent state as the use case for the repair command, so that's
 likely why.

 These two PGs have been the issue throughout this process, so how can I
 dig deeper to figure out what the problem is?

 # ceph pg 2.e7f query: http://pastebin.com/jMMsbsjS
 # ceph pg 2.782 query: http://pastebin.com/0ntBfFK5
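
 A few commands that usually help when digging into a stuck PG like these
 (the OSD id below is just the current primary of 2.e7f, as an example):

 # ceph pg map 2.e7f        <- shows the up set vs. the acting set
 # ceph pg 2.e7f query      <- full peering/recovery detail (as in the pastebins)
 # ceph osd down 58         <- marks the primary down so the PG re-peers; the OSD rejoins on its own

 Whether the re-peer nudge helps depends on why CRUSH is only mapping one OSD
 into the up set in the first place.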


 On Wed, Aug 12, 2015 at 6:52 PM, yangyongp...@bwstor.com.cn
 yangyongp...@bwstor.com.cn wrote:
 You can try 'ceph pg repair pg_id' to repair the unhealthy PG. The 'ceph
 health detail' command is very useful for detecting unhealthy PGs.

 
 yangyongp...@bwstor.com.cn


 From: Steve Dainard
 Date: 2015-08-12 23:48
 To: ceph-users
 Subject: [ceph-users] Cluster health_warn 1 active+undersized+degraded/1
 active+remapped
 I ran a ceph osd reweight-by-utilization yesterday and partway through
 had a network interruption. After the network was restored the cluster
 continued to rebalance, but this morning the cluster has stopped
 rebalancing and the status will not change from:

 # ceph status
 cluster af859ff1-c394-4c9a-95e2-0e0e4c87445c
 health HEALTH_WARN
 1 pgs degraded
 1 pgs stuck degraded
 2 pgs stuck unclean
 1 pgs stuck undersized
 1 pgs undersized
 recovery 8163/66089054 objects degraded (0.012%)
 recovery 8194/66089054 objects misplaced (0.012%)
 monmap e24: 3 mons at
 {mon1=10.0.231.53:6789/0,mon2=10.0.231.54:6789/0,mon3=10.0.231.55:6789/0}
 election epoch 250, quorum 0,1,2 mon1,mon2,mon3
 osdmap e184486: 100 osds: 100 up, 100 in; 1 remapped pgs
 pgmap v3010985: 4144 pgs, 7 pools, 125 TB data, 32270 kobjects
 251 TB used, 111 TB / 363 TB avail
 8163/66089054 objects degraded (0.012%)
 8194/66089054 objects misplaced (0.012%)
 4142 active+clean
 1 active+undersized+degraded
 1 active+remapped


 # ceph health detail
 HEALTH_WARN 1 pgs degraded; 1 pgs stuck degraded; 2 pgs stuck unclean;
 1 pgs stuck undersized; 1 pgs undersized; recovery 8163/66089054
 objects degraded (0.012%); recovery 8194/66089054 objects misplaced
 (0.012%)
 pg 2.e7f is stuck unclean for 65125.554509, current state
 active+remapped, last acting [58,5]
 pg 2.782 is stuck unclean for 65140.681540, current state
 active+undersized+degraded, last acting [76]
 pg 2.782 is stuck undersized for 60568.221461, current state
 active+undersized+degraded, last acting [76]
 pg 2.782 is stuck degraded for 60568.221549, current state
 active+undersized+degraded, last acting [76]
 pg 2.782 is active+undersized+degraded, acting [76]
 recovery 8163/66089054 objects degraded (0.012%)
 recovery 8194/66089054 objects misplaced (0.012%)

 # ceph pg 2.e7f query
 recovery_state: [
 {
 name: Started\/Primary\/Active,
 enter_time: 2015-08-11 15:43:09.190269,
 might_have_unfound: [],
 recovery_progress: {
 backfill_targets: [],
 waiting_on_backfill: [],
 last_backfill_started: 0\/\/0\/\/-1,
 backfill_info: {
 begin: 0\/\/0\/\/-1,
 end: 0\/\/0\/\/-1,
 objects: []
 },
 peer_backfill_info: [],
 backfills_in_flight: [],
 recovering: [],
 pg_backend: {
 pull_from_peer: [],
 pushing: []
 }
 },
 scrub: {
 scrubber.epoch_start: 0,
 scrubber.active: 0,
 scrubber.waiting_on: 0,
 scrubber.waiting_on_whom: []
 }
 },
 {
 name: Started,
 enter_time: 2015-08-11 15:43:04.955796
 }
 ],


 # ceph pg 2.782 query
 recovery_state: [
 {
 name: Started\/Primary\/Active,
 enter_time: 2015-08-11 15:42:42.178042,
 might_have_unfound: [
 {
 osd: 5,
 status: not queried
 }
 ],
 recovery_progress: {
 backfill_targets: [],
 waiting_on_backfill: [],
 last_backfill_started: 0\/\/0\/\/-1,
 backfill_info: {
 begin: 0\/\/0\/\/-1,
 end: 0\/\/0\/\/-1,
 objects: []
 },
 peer_backfill_info: [],
 backfills_in_flight: [],
 recovering: [],
 pg_backend: {
 pull_from_peer: [],
 pushing

Re: [ceph-users] Cluster health_warn 1 active+undersized+degraded/1 active+remapped

2015-08-13 Thread Steve Dainard
I decided to set OSD 76 out and let the cluster shuffle the data off
that disk and then brought the OSD back in. For the most part this
seemed to be working, but then I had 1 object degraded and 88xxx
objects misplaced:

# ceph health detail
HEALTH_WARN 11 pgs stuck unclean; recovery 1/66089446 objects degraded
(0.000%); recovery 88844/66089446 objects misplaced (0.134%)
pg 2.e7f is stuck unclean for 88398.251351, current state
active+remapped, last acting [58,5]
pg 2.143 is stuck unclean for 13892.364101, current state
active+remapped, last acting [16,76]
pg 2.968 is stuck unclean for 13892.363521, current state
active+remapped, last acting [44,76]
pg 2.5f8 is stuck unclean for 13892.377245, current state
active+remapped, last acting [17,76]
pg 2.81c is stuck unclean for 13892.363443, current state
active+remapped, last acting [25,76]
pg 2.1a3 is stuck unclean for 13892.364400, current state
active+remapped, last acting [16,76]
pg 2.2cb is stuck unclean for 13892.374390, current state
active+remapped, last acting [14,76]
pg 2.d41 is stuck unclean for 13892.373636, current state
active+remapped, last acting [27,76]
pg 2.3f9 is stuck unclean for 13892.373147, current state
active+remapped, last acting [35,76]
pg 2.a62 is stuck unclean for 86283.741920, current state
active+remapped, last acting [2,38]
pg 2.1b0 is stuck unclean for 13892.363268, current state
active+remapped, last acting [3,76]
recovery 1/66089446 objects degraded (0.000%)
recovery 88844/66089446 objects misplaced (0.134%)

I say it only seemed to be working because, with one object degraded,
none of the PGs show as degraded:
# ceph pg dump_stuck degraded
ok

# ceph pg dump_stuck unclean
ok
pg_stat state up up_primary acting acting_primary
2.e7f active+remapped [58] 58 [58,5] 58
2.143 active+remapped [16] 16 [16,76] 16
2.968 active+remapped [44] 44 [44,76] 44
2.5f8 active+remapped [17] 17 [17,76] 17
2.81c active+remapped [25] 25 [25,76] 25
2.1a3 active+remapped [16] 16 [16,76] 16
2.2cb active+remapped [14] 14 [14,76] 14
2.d41 active+remapped [27] 27 [27,76] 27
2.3f9 active+remapped [35] 35 [35,76] 35
2.a62 active+remapped [2] 2 [2,38] 2
2.1b0 active+remapped [3] 3 [3,76] 3

All of the OSD filesystems are below 85% full.

I then compared against a new 0.94.2 cluster that had never been upgraded
(the current cluster is also 0.94.2 but has been upgraded a couple of times)
and noticed its crush map had 'tunable straw_calc_version 1', so I added it
to the current cluster.

After the data moved around for about 8 hours or so I'm left with this state:

# ceph health detail
HEALTH_WARN 2 pgs stuck unclean; recovery 16357/66089446 objects
misplaced (0.025%)
pg 2.e7f is stuck unclean for 149422.331848, current state
active+remapped, last acting [58,5]
pg 2.782 is stuck unclean for 64878.002464, current state
active+remapped, last acting [76,31]
recovery 16357/66089446 objects misplaced (0.025%)

I attempted a pg repair on both of the PGs listed above, but it
doesn't look like anything is happening. The docs reference an
inconsistent state as the use case for the repair command, so that's
likely why.

These two PGs have been the issue throughout this process, so how can I
dig deeper to figure out what the problem is?

# ceph pg 2.e7f query: http://pastebin.com/jMMsbsjS
# ceph pg 2.782 query: http://pastebin.com/0ntBfFK5


On Wed, Aug 12, 2015 at 6:52 PM, yangyongp...@bwstor.com.cn
yangyongp...@bwstor.com.cn wrote:
 You can try 'ceph pg repair pg_id' to repair the unhealthy PG. The 'ceph
 health detail' command is very useful for detecting unhealthy PGs.

 
 yangyongp...@bwstor.com.cn


 From: Steve Dainard
 Date: 2015-08-12 23:48
 To: ceph-users
 Subject: [ceph-users] Cluster health_warn 1 active+undersized+degraded/1
 active+remapped
 I ran a ceph osd reweight-by-utilization yesterday and partway through
 had a network interruption. After the network was restored the cluster
 continued to rebalance, but this morning the cluster has stopped
 rebalancing and the status will not change from:

 # ceph status
 cluster af859ff1-c394-4c9a-95e2-0e0e4c87445c
  health HEALTH_WARN
 1 pgs degraded
 1 pgs stuck degraded
 2 pgs stuck unclean
 1 pgs stuck undersized
 1 pgs undersized
 recovery 8163/66089054 objects degraded (0.012%)
 recovery 8194/66089054 objects misplaced (0.012%)
  monmap e24: 3 mons at
 {mon1=10.0.231.53:6789/0,mon2=10.0.231.54:6789/0,mon3=10.0.231.55:6789/0}
 election epoch 250, quorum 0,1,2 mon1,mon2,mon3
  osdmap e184486: 100 osds: 100 up, 100 in; 1 remapped pgs
   pgmap v3010985: 4144 pgs, 7 pools, 125 TB data, 32270 kobjects
 251 TB used, 111 TB / 363 TB avail
 8163/66089054 objects degraded (0.012%)
 8194/66089054 objects misplaced (0.012%)
 4142 active+clean
1 active+undersized+degraded
1 active+remapped


 # ceph health detail
 HEALTH_WARN 1 pgs degraded

Re: [ceph-users] Cluster health_warn 1 active+undersized+degraded/1 active+remapped

2015-08-13 Thread Steve Dainard
OSD tree: http://pastebin.com/3z333DP4
Crushmap: http://pastebin.com/DBd9k56m

I realize these nodes are quite large, I have plans to break them out
into 12 OSD's/node.

On Thu, Aug 13, 2015 at 9:02 AM, GuangYang yguan...@outlook.com wrote:
 Could you share the 'ceph osd tree' dump and the CRUSH map dump?

 Thanks,
 Guang


 
 Date: Thu, 13 Aug 2015 08:16:09 -0700
 From: sdain...@spd1.com
 To: yangyongp...@bwstor.com.cn; ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] Cluster health_warn 1 active+undersized+degraded/1 
 active+remapped

 I decided to set OSD 76 out and let the cluster shuffle the data off
 that disk and then brought the OSD back in. For the most part this
 seemed to be working, but then I had 1 object degraded and 88xxx
 objects misplaced:

 # ceph health detail
 HEALTH_WARN 11 pgs stuck unclean; recovery 1/66089446 objects degraded
 (0.000%); recovery 88844/66089446 objects misplaced (0.134%)
 pg 2.e7f is stuck unclean for 88398.251351, current state
 active+remapped, last acting [58,5]
 pg 2.143 is stuck unclean for 13892.364101, current state
 active+remapped, last acting [16,76]
 pg 2.968 is stuck unclean for 13892.363521, current state
 active+remapped, last acting [44,76]
 pg 2.5f8 is stuck unclean for 13892.377245, current state
 active+remapped, last acting [17,76]
 pg 2.81c is stuck unclean for 13892.363443, current state
 active+remapped, last acting [25,76]
 pg 2.1a3 is stuck unclean for 13892.364400, current state
 active+remapped, last acting [16,76]
 pg 2.2cb is stuck unclean for 13892.374390, current state
 active+remapped, last acting [14,76]
 pg 2.d41 is stuck unclean for 13892.373636, current state
 active+remapped, last acting [27,76]
 pg 2.3f9 is stuck unclean for 13892.373147, current state
 active+remapped, last acting [35,76]
 pg 2.a62 is stuck unclean for 86283.741920, current state
 active+remapped, last acting [2,38]
 pg 2.1b0 is stuck unclean for 13892.363268, current state
 active+remapped, last acting [3,76]
 recovery 1/66089446 objects degraded (0.000%)
 recovery 88844/66089446 objects misplaced (0.134%)

 I say it only seemed to be working because, with one object degraded,
 none of the PGs show as degraded:
 # ceph pg dump_stuck degraded
 ok

 # ceph pg dump_stuck unclean
 ok
 pg_stat state up up_primary acting acting_primary
 2.e7f active+remapped [58] 58 [58,5] 58
 2.143 active+remapped [16] 16 [16,76] 16
 2.968 active+remapped [44] 44 [44,76] 44
 2.5f8 active+remapped [17] 17 [17,76] 17
 2.81c active+remapped [25] 25 [25,76] 25
 2.1a3 active+remapped [16] 16 [16,76] 16
 2.2cb active+remapped [14] 14 [14,76] 14
 2.d41 active+remapped [27] 27 [27,76] 27
 2.3f9 active+remapped [35] 35 [35,76] 35
 2.a62 active+remapped [2] 2 [2,38] 2
 2.1b0 active+remapped [3] 3 [3,76] 3

 All of the OSD filesystems are below 85% full.

 I then compared against a new 0.94.2 cluster that had never been upgraded
 (the current cluster is also 0.94.2 but has been upgraded a couple of times)
 and noticed its crush map had 'tunable straw_calc_version 1', so I added it
 to the current cluster.

 After the data moved around for about 8 hours or so I'm left with this state:

 # ceph health detail
 HEALTH_WARN 2 pgs stuck unclean; recovery 16357/66089446 objects
 misplaced (0.025%)
 pg 2.e7f is stuck unclean for 149422.331848, current state
 active+remapped, last acting [58,5]
 pg 2.782 is stuck unclean for 64878.002464, current state
 active+remapped, last acting [76,31]
 recovery 16357/66089446 objects misplaced (0.025%)

 I attempted a pg repair on both of the PGs listed above, but it
 doesn't look like anything is happening. The docs reference an
 inconsistent state as the use case for the repair command, so that's
 likely why.

 These two PGs have been the issue throughout this process, so how can I
 dig deeper to figure out what the problem is?

 # ceph pg 2.e7f query: http://pastebin.com/jMMsbsjS
 # ceph pg 2.782 query: http://pastebin.com/0ntBfFK5


 On Wed, Aug 12, 2015 at 6:52 PM, yangyongp...@bwstor.com.cn
 yangyongp...@bwstor.com.cn wrote:
 You can try 'ceph pg repair pg_id' to repair the unhealthy PG. The 'ceph
 health detail' command is very useful for detecting unhealthy PGs.

 
 yangyongp...@bwstor.com.cn


 From: Steve Dainard
 Date: 2015-08-12 23:48
 To: ceph-users
 Subject: [ceph-users] Cluster health_warn 1 active+undersized+degraded/1
 active+remapped
 I ran a ceph osd reweight-by-utilization yesterday and partway through
 had a network interruption. After the network was restored the cluster
 continued to rebalance, but this morning the cluster has stopped
 rebalancing and the status will not change from:

 # ceph status
 cluster af859ff1-c394-4c9a-95e2-0e0e4c87445c
 health HEALTH_WARN
 1 pgs degraded
 1 pgs stuck degraded
 2 pgs stuck unclean
 1 pgs stuck undersized
 1 pgs undersized
 recovery 8163/66089054 objects degraded (0.012%)
 recovery 8194/66089054 objects misplaced (0.012%)
 monmap e24: 3 mons at
 {mon1=10.0.231.53:6789/0,mon2

[ceph-users] Cluster health_warn 1 active+undersized+degraded/1 active+remapped

2015-08-12 Thread Steve Dainard
I ran a ceph osd reweight-by-utilization yesterday and partway through
had a network interruption. After the network was restored the cluster
continued to rebalance, but this morning the cluster has stopped
rebalancing and the status will not change from:

# ceph status
cluster af859ff1-c394-4c9a-95e2-0e0e4c87445c
 health HEALTH_WARN
1 pgs degraded
1 pgs stuck degraded
2 pgs stuck unclean
1 pgs stuck undersized
1 pgs undersized
recovery 8163/66089054 objects degraded (0.012%)
recovery 8194/66089054 objects misplaced (0.012%)
 monmap e24: 3 mons at
{mon1=10.0.231.53:6789/0,mon2=10.0.231.54:6789/0,mon3=10.0.231.55:6789/0}
election epoch 250, quorum 0,1,2 mon1,mon2,mon3
 osdmap e184486: 100 osds: 100 up, 100 in; 1 remapped pgs
  pgmap v3010985: 4144 pgs, 7 pools, 125 TB data, 32270 kobjects
251 TB used, 111 TB / 363 TB avail
8163/66089054 objects degraded (0.012%)
8194/66089054 objects misplaced (0.012%)
4142 active+clean
   1 active+undersized+degraded
   1 active+remapped


# ceph health detail
HEALTH_WARN 1 pgs degraded; 1 pgs stuck degraded; 2 pgs stuck unclean;
1 pgs stuck undersized; 1 pgs undersized; recovery 8163/66089054
objects degraded (0.012%); recovery 8194/66089054 objects misplaced
(0.012%)
pg 2.e7f is stuck unclean for 65125.554509, current state
active+remapped, last acting [58,5]
pg 2.782 is stuck unclean for 65140.681540, current state
active+undersized+degraded, last acting [76]
pg 2.782 is stuck undersized for 60568.221461, current state
active+undersized+degraded, last acting [76]
pg 2.782 is stuck degraded for 60568.221549, current state
active+undersized+degraded, last acting [76]
pg 2.782 is active+undersized+degraded, acting [76]
recovery 8163/66089054 objects degraded (0.012%)
recovery 8194/66089054 objects misplaced (0.012%)

# ceph pg 2.e7f query
recovery_state: [
{
name: Started\/Primary\/Active,
enter_time: 2015-08-11 15:43:09.190269,
might_have_unfound: [],
recovery_progress: {
backfill_targets: [],
waiting_on_backfill: [],
last_backfill_started: 0\/\/0\/\/-1,
backfill_info: {
begin: 0\/\/0\/\/-1,
end: 0\/\/0\/\/-1,
objects: []
},
peer_backfill_info: [],
backfills_in_flight: [],
recovering: [],
pg_backend: {
pull_from_peer: [],
pushing: []
}
},
scrub: {
scrubber.epoch_start: 0,
scrubber.active: 0,
scrubber.waiting_on: 0,
scrubber.waiting_on_whom: []
}
},
{
name: Started,
enter_time: 2015-08-11 15:43:04.955796
}
],


# ceph pg 2.782 query
  recovery_state: [
{
name: Started\/Primary\/Active,
enter_time: 2015-08-11 15:42:42.178042,
might_have_unfound: [
{
osd: 5,
status: not queried
}
],
recovery_progress: {
backfill_targets: [],
waiting_on_backfill: [],
last_backfill_started: 0\/\/0\/\/-1,
backfill_info: {
begin: 0\/\/0\/\/-1,
end: 0\/\/0\/\/-1,
objects: []
},
peer_backfill_info: [],
backfills_in_flight: [],
recovering: [],
pg_backend: {
pull_from_peer: [],
pushing: []
}
},
scrub: {
scrubber.epoch_start: 0,
scrubber.active: 0,
scrubber.waiting_on: 0,
scrubber.waiting_on_whom: []
}
},
{
name: Started,
enter_time: 2015-08-11 15:42:41.139709
}
],
agent_state: {}

I tried restarting osd.5/58/76 but no change.

Any suggestions?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph tell not persistent through reboots?

2015-08-06 Thread Steve Dainard
Hello,

Version 0.94.1

I'm passing settings to the admin socket, i.e.:
ceph tell osd.* injectargs '--osd_deep_scrub_begin_hour 20'
ceph tell osd.* injectargs '--osd_deep_scrub_end_hour 4'
ceph tell osd.* injectargs '--osd_deep_scrub_interval 1209600'

Then I check to see if they're in the configs now:
# ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show |
egrep -i 'scrub_interval|hour'
osd_scrub_begin_hour: 4,
osd_scrub_end_hour: 20,
osd_deep_scrub_interval: 1.2096e+06,

Then I restart that host and check again and the values have returned
to default:
# ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show |
egrep -i 'scrub_interval|hour'
osd_scrub_begin_hour: 0,
osd_scrub_end_hour: 24,
osd_deep_scrub_interval: 604800,

If I check on another host the values are correct:
# ceph --admin-daemon /var/run/ceph/ceph-osd.90.asok config show |
egrep -i 'scrub_interval|hour'
osd_scrub_begin_hour: 20,
osd_scrub_end_hour: 4,
osd_deep_scrub_interval: 1.2096e+06,

If I check on a mon the values are default:
# ceph --admin-daemon /var/run/ceph/ceph-mon.mon1.asok config show |
egrep -i 'scrub_interval|hour'
osd_scrub_begin_hour: 0,
osd_scrub_end_hour: 24,
osd_deep_scrub_interval: 604800,

If I try to pass a config to mon1 via an OSD host it appears to do something:
# ceph tell mon.1 injectargs --osd_deep_scrub_interval 1209600
injectargs:osd_deep_scrub_interval = '1.2096e+06'

And then I check on mon1 and it's still the default value:
# ceph --admin-daemon /var/run/ceph/ceph-mon.mon1.asok config show |
egrep -i scrub_interval
osd_deep_scrub_interval: 604800,


And if I pass a config on mon1 it looks like it's being updated, but
the default remains:
# ceph tell mon.1 injectargs --osd_deep_scrub_interval 1209600
injectargs:osd_deep_scrub_interval = '1.2096e+06'
# ceph --admin-daemon /var/run/ceph/ceph-mon.mon1.asok config show |
egrep -i scrub_interval
osd_deep_scrub_interval: 604800,

I don't know if this is a bug, or if I'm doing something wrong here...
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Direct IO tests on RBD device vary significantly

2015-08-06 Thread Steve Dainard
Trying to get an understanding why direct IO would be so slow on my cluster.

Ceph 0.94.1
1 Gig public network
10 Gig public network
10 Gig cluster network

100 OSDs, 4TB disks, 5GB SSD journals.

As of this morning I had no SSD journal and was finding direct IO was
sub 10MB/s so I decided to add journals today.

Afterwards I started running tests again and wasn't very impressed.
Then for no apparent reason the write speeds increased significantly.
But I'm finding they vary wildly.

Currently there is a bit of background ceph activity, but only my
testing client has an rbd mapped/mounted:
   election epoch 144, quorum 0,1,2 mon1,mon3,mon2
 osdmap e181963: 100 osds: 100 up, 100 in
flags noout
  pgmap v2852566: 4144 pgs, 7 pools, 113 TB data, 29179 kobjects
227 TB used, 135 TB / 363 TB avail
4103 active+clean
  40 active+clean+scrubbing
   1 active+clean+scrubbing+deep

Tests:
1M block size: http://pastebin.com/LKtsaHrd throughput has no consistency
4k block size: http://pastebin.com/ib6VW9eB throughput is amazingly consistent

Thoughts?
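
For anyone trying to reproduce this, a direct-I/O test of roughly this shape
is what produces numbers like the above (mount point and sizes are assumptions,
not the exact commands behind the pastebins):

# dd if=/dev/zero of=/mnt/rbd/ddtest bs=1M count=4096 oflag=direct
# dd if=/dev/zero of=/mnt/rbd/ddtest bs=4k count=262144 oflag=direct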
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph tell not persistent through reboots?

2015-08-06 Thread Steve Dainard
That would make sense..

Thanks!

On Thu, Aug 6, 2015 at 6:29 PM, Wang, Warren
warren_w...@cable.comcast.com wrote:
 Injecting args into the running procs is not meant to be persistent. You'll 
 need to modify /etc/ceph/ceph.conf for that.
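
 A minimal sketch of the persistent form (option names follow the 'config show'
 output earlier in the thread; add to /etc/ceph/ceph.conf on each OSD host and
 restart the daemons, or keep using injectargs for the already-running processes):

 [osd]
 osd_scrub_begin_hour = 20
 osd_scrub_end_hour = 4
 osd_deep_scrub_interval = 1209600    # 14 days, in seconds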

 Warren

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
 Steve Dainard
 Sent: Thursday, August 06, 2015 9:16 PM
 To: ceph-users@lists.ceph.com
 Subject: [ceph-users] ceph tell not persistent through reboots?

 Hello,

 Version 0.94.1

 I'm passing settings to the admin socket, i.e.:
 ceph tell osd.* injectargs '--osd_deep_scrub_begin_hour 20'
 ceph tell osd.* injectargs '--osd_deep_scrub_end_hour 4'
 ceph tell osd.* injectargs '--osd_deep_scrub_interval 1209600'

 Then I check to see if they're in the configs now:
 # ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | egrep -i 
 'scrub_interval|hour'
 osd_scrub_begin_hour: 4,
 osd_scrub_end_hour: 20,
 osd_deep_scrub_interval: 1.2096e+06,

 Then I restart that host and check again and the values have returned to 
 default:
 # ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | egrep -i 
 'scrub_interval|hour'
 osd_scrub_begin_hour: 0,
 osd_scrub_end_hour: 24,
 osd_deep_scrub_interval: 604800,

 If I check on another host the values are correct:
 # ceph --admin-daemon /var/run/ceph/ceph-osd.90.asok config show | egrep -i 
 'scrub_interval|hour'
 osd_scrub_begin_hour: 20,
 osd_scrub_end_hour: 4,
 osd_deep_scrub_interval: 1.2096e+06,

 If I check on a mon the values are default:
 # ceph --admin-daemon /var/run/ceph/ceph-mon.mon1.asok config show | egrep -i 
 'scrub_interval|hour'
 osd_scrub_begin_hour: 0,
 osd_scrub_end_hour: 24,
 osd_deep_scrub_interval: 604800,

 If I try to pass a config to mon1 via an OSD host it appears to do something:
 # ceph tell mon.1 injectargs --osd_deep_scrub_interval 1209600
 injectargs:osd_deep_scrub_interval = '1.2096e+06'

 And then I check on mon1 and it's still the default value:
 # ceph --admin-daemon /var/run/ceph/ceph-mon.mon1.asok config show | egrep -i 
 scrub_interval
 osd_deep_scrub_interval: 604800,


 And if I pass a config on mon1 it looks like it's being updated, but the
 default remains:
 # ceph tell mon.1 injectargs --osd_deep_scrub_interval 1209600
 injectargs:osd_deep_scrub_interval = '1.2096e+06'
 # ceph --admin-daemon /var/run/ceph/ceph-mon.mon1.asok config show | egrep -i 
 scrub_interval
 osd_deep_scrub_interval: 604800,

 I don't know if this is a bug, or if I'm doing something wrong here...
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Meanning of ceph perf dump

2015-07-24 Thread Steve Dainard
Hi Somnath,

Do you have a link with the definitions of all the perf counters?

Thanks,
Steve
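
One place to start, short of reading the source, is the admin socket itself;
a 'perf schema' dump should at least enumerate the counters and their types
(socket path assumed):

# ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf schema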

On Sun, Jul 5, 2015 at 11:23 AM, Somnath Roy somnath@sandisk.com wrote:
 Hi Ray,

 Here is the description of the different latencies under filestore perf
 counters.



 Journal_latency :

 --



 This is the latency of putting the ops in the journal. The write is acknowledged
 after that (well, a bit after that; there is one context switch after this).



 commitcycle_latency:

 --



 While carrying out a transaction, the filestore backend does a buffered write. In a
 separate thread it calls syncfs() to persist the data to the disk and
 updates the persistent commit number in a separate file. This thread runs at a
 5-second interval by default.

 This latency measures the time taken to carry out this job once the timer
 expires, i.e. the actual persisting cycle.



 apply_latency:

 



 This is the entire latency until the transaction finishes, i.e. journal write +
 transaction time. It does a buffered write here.



 queue_transaction_latency_avg:

 

 This is the latency of putting the op in the journal queue. This will give
 you an idea of how much throttling is going on in the first place. This depends
 on the following two parameters if you are using XFS.



 filestore_queue_max_ops

 filestore_queue_max_bytes





 All the latency numbers are represented by avgcount (the number of ops within
 this range) and sum (the total latency in seconds). sum/avgcount
 will give you the average latency per op.
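
 For example, taking journal_latency from the dump quoted below:

 # echo "1213.560761279 / 35893" | bc -l    <- ~0.0338 s, i.e. about 34 ms per journal write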



 Hope this is helpful,



 Thanks & Regards

 Somnath





 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ray
 Sun
 Sent: Sunday, July 05, 2015 7:28 AM
 To: ceph-users@lists.ceph.com
 Subject: [ceph-users] Meanning of ceph perf dump



 Cephers,

 Is there any documentation or code definition that explains 'ceph perf dump'? I am a
 little confused about the output; for example, under filestore there are
 journal_latency and apply_latency, and each of them has avgcount and sum. I
 am not quite sure what the unit and meaning of the numbers are. How can I use
 these numbers to tune my ceph cluster? Thanks a lot.



 "filestore": {
     "journal_queue_max_ops": 300,
     "journal_queue_ops": 0,
     "journal_ops": 35893,
     "journal_queue_max_bytes": 33554432,
     "journal_queue_bytes": 0,
     "journal_bytes": 20579009432,
     "journal_latency": {
         "avgcount": 35893,
         "sum": 1213.560761279
     },
     "journal_wr": 34228,
     "journal_wr_bytes": {
         "avgcount": 34228,
         "sum": 20657713152
     },
     "journal_full": 0,
     "committing": 0,
     "commitcycle": 3207,
     "commitcycle_interval": {
         "avgcount": 3207,
         "sum": 16157.379852152
     },
     "commitcycle_latency": {
         "avgcount": 3207,
         "sum": 121.892109010
     },
     "op_queue_max_ops": 50,
     "op_queue_ops": 0,
     "ops": 35893,
     "op_queue_max_bytes": 104857600,
     "op_queue_bytes": 0,
     "bytes": 20578506930,
     "apply_latency": {
         "avgcount": 35893,
         "sum": 1327.974596287
     },
     "queue_transaction_latency_avg": {
         "avgcount": 35893,
         "sum": 0.025993727
     }
 },



 Best Regards
 -- Ray


 



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deadly slow Ceph cluster revisited

2015-07-17 Thread Steve Dainard
Disclaimer: I'm relatively new to ceph, and haven't moved into
production with it.

Did you run your bench for 30 seconds?

For reference my bench from a VM bridged to a 10Gig card with 90x4TB
at 30 seconds is:

 Total time run: 30.766596
Total writes made:  1979
Write size: 4194304
Bandwidth (MB/sec): 257.292

Stddev Bandwidth:   106.78
Max bandwidth (MB/sec): 420
Min bandwidth (MB/sec): 0
Average Latency:0.248238
Stddev Latency: 0.723444
Max latency:10.5275
Min latency:0.0346015

Seems like latency is a huge factor if your 30 second test took 52 seconds.

What kind of 10Gig NICs are you using? I have Mellanox ConnectX-3 cards and
one node was using an older driver version. I started to experience the
OSD "in..out..in.." flapping and "incorrectly marked out from..." messages
mentioned by Quentin, as well as poor performance. Installing the newest
version of the Mellanox driver got everything running well again.
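
A quick way to confirm which driver and firmware a node is actually running
(the interface name is a placeholder):

# ethtool -i p2p1                  <- driver, version, firmware-version
# modinfo mlx4_en | grep -i version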

On Fri, Jul 17, 2015 at 7:55 AM, J David j.david.li...@gmail.com wrote:
 On Fri, Jul 17, 2015 at 10:21 AM, Mark Nelson mnel...@redhat.com wrote:
 rados -p <pool> bench 30 write

 just to see how it handles 4MB object writes.

 Here's that, from the VM host:

  Total time run: 52.062639
 Total writes made:  66
 Write size: 4194304
 Bandwidth (MB/sec): 5.071

 Stddev Bandwidth:   11.6312
 Max bandwidth (MB/sec): 80
 Min bandwidth (MB/sec): 0
 Average Latency:12.436
 Stddev Latency: 13.6272
 Max latency:51.6924
 Min latency:0.073353

 Unfortunately I don't know much about how to parse this (other than
 that 5MB/sec writes match up with our best-case performance in the VM
 guest).

 If rados bench is
 also terribly slow, then you might want to start looking for evidence of IO
 getting hung up on a specific disk or node.

 Thus far, no evidence of that has presented itself. iostat looks good
 on every drive and the nodes are all equally loaded.

 Thanks!
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Workaround for RHEL/CentOS 7.1 rbdmap service start warnings?

2015-07-17 Thread Steve Dainard
Other than those errors, do you find RBDs will not be unmapped on
system restart/shutdown on a machine using systemd, leaving the system
hanging without network connections while trying to unmap RBDs?

That's been my experience thus far, so I wrote an (overly simple)
systemd unit file to handle this on a per-RBD basis.
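
A minimal per-image sketch of that kind of unit (pool, image, cephx user and
mount point are all placeholders):

[Unit]
Description=Map and mount RBD image rbd/myimage
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/rbd map rbd/myimage --id admin
ExecStart=/usr/bin/mount /dev/rbd/rbd/myimage /mnt/myimage
ExecStop=/usr/bin/umount /mnt/myimage
ExecStop=/usr/bin/rbd unmap /dev/rbd/rbd/myimage

[Install]
WantedBy=multi-user.target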

On Tue, Jul 14, 2015 at 1:15 PM, Bruce McFarland
bruce.mcfarl...@taec.toshiba.com wrote:
 When starting the rbdmap.service to provide map/unmap of rbd devices across
 boot/shutdown cycles, the /etc/init.d/rbdmap script includes
 /lib/lsb/init-functions. This is not a problem except that the rbdmap script
 makes calls to the log_daemon_*, log_progress_*, and log_action_* functions
 that are included in the Ubuntu 14.04 distro, but are not in the RHEL 7.1/RHCS
 1.3 distro. Is there any recommended workaround for boot-time startup on
 RHEL/CentOS 7.1 clients?


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Unsetting osd_crush_chooseleaf_type = 0

2015-07-16 Thread Steve Dainard
I originally built a single-node cluster, and added
'osd_crush_chooseleaf_type = 0 #0 is for one node cluster' to ceph.conf
(which is now commented out).

I've now added a 2nd node; where can I set this value to 1? I see in the
crush map that the OSDs are under 'host' buckets and don't see any
reference to 'leaf'.

Would the cluster automatically rebalance when the 2nd host was added? How
can I verify this?

The issue right now is that with two hosts, copies = 2, and min copies = 1,
I cannot access data from client machines when one of the two hosts goes down.

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host ceph1 {
id -2   # do not change unnecessarily
# weight 163.350
alg straw
hash 0  # rjenkins1
item osd.0 weight 3.630
item osd.1 weight 3.630
}
host ceph2 {
id -3   # do not change unnecessarily
# weight 163.350
alg straw
hash 0  # rjenkins1
item osd.2 weight 3.630
item osd.3 weight 3.630
}
root default {
id -1   # do not change unnecessarily
# weight 326.699
alg straw
hash 0  # rjenkins1
item ceph1 weight 163.350
item ceph2 weight 163.350
}

# rules
rule replicated_ruleset {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step choose firstn 0 type osd  -- should this line be 'step chooseleaf firstn 0 type host'?
step emit
}

# end crush map
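
On the inline question above: for the two copies to land on different hosts,
the rule does need 'step chooseleaf firstn 0 type host'; with 'step choose
firstn 0 type osd' both replicas can be chosen from the same host, which
matches the failure behaviour described. A sketch of the edit-and-verify
cycle (file names are placeholders):

# ceph osd getcrushmap -o crush.bin
# crushtool -d crush.bin -o crush.txt
    (in rule replicated_ruleset, change "step choose firstn 0 type osd"
     to "step chooseleaf firstn 0 type host")
# crushtool -c crush.txt -o crush.new
# crushtool -i crush.new --test --rule 0 --num-rep 2 --show-mappings | head
# ceph osd setcrushmap -i crush.new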
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Health WARN, ceph errors looping

2015-07-07 Thread Steve Dainard
Hello,

Ceph 0.94.1
2 hosts, Centos 7

I have two hosts, one of which ran out of / (root) disk space, which crashed
all the OSD daemons. After cleaning up the OS disk storage and restarting
ceph on that node, I'm seeing multiple errors, then health OK, then
back into the errors:

# ceph -w
http://pastebin.com/mSKwNzYp

Any help is appreciated.

Thanks,
Steve
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Health WARN, ceph errors looping

2015-07-07 Thread Steve Dainard
The errors keep coming back, with the status eventually changing to OK,
then back into the errors.

I thought it looked like a connectivity issue as well, given the
"wrongly marked me down" messages, but firewall rules are allowing all
traffic on the cluster network.

Syslog is being flooded with messages like:
Jul  7 10:52:17 ceph1 bash: 2015-07-07 10:52:17.609870 7f2055192700 -1
osd.21 129936 heartbeat_check: no reply from osd.89 ever on either
front or back, first ping sent 2015-07-07 10:51:50.995374 (cutoff
2015-07-07 10:51:57.609817)
Jul  7 10:52:17 ceph1 bash: 2015-07-07 10:52:17.611302 7f203ba5b700 -1
osd.21 129936 heartbeat_check: no reply from osd.50 ever on either
front or back, first ping sent 2015-07-07 10:51:44.691270 (cutoff
2015-07-07 10:51:57.611297)
Jul  7 10:52:17 ceph1 bash: 2015-07-07 10:52:17.611309 7f203ba5b700 -1
osd.21 129936 heartbeat_check: no reply from osd.61 ever on either
front or back, first ping sent 2015-07-07 10:51:50.995374 (cutoff
2015-07-07 10:51:57.611297)
Jul  7 10:52:17 ceph1 bash: 2015-07-07 10:52:17.611315 7f203ba5b700 -1
osd.21 129936 heartbeat_check: no reply from osd.69 ever on either
front or back, first ping sent 2015-07-07 10:51:54.998259 (cutoff
2015-07-07 10:51:57.611297)

That's just a small section, but multiple OSDs are listed. Eventually
the logs are rate-limited because they're coming in so fast.
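
When heartbeats fail like this while the daemons themselves stay up, the cause
is often the cluster-network path between specific hosts (MTU mismatch, a bad
bond member, a flaky port). A couple of checks worth running from the host
carrying osd.21 (the MTU value below is an assumption):

# ceph osd dump | grep '^osd.89 '            <- shows public, cluster and heartbeat addresses
# ping -M do -s 8972 CLUSTER_IP_OF_OSD_89    <- full-size, don't-fragment ping for a 9000 MTU network (use -s 1472 for 1500 MTU)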

On Tue, Jul 7, 2015 at 10:13 AM, Abhishek L
abhishek.lekshma...@gmail.com wrote:

 Steve Dainard writes:

 Hello,

 Ceph 0.94.1
 2 hosts, Centos 7

 I have two hosts, one which ran out of / disk space which crashed all
 the osd daemons. After cleaning up the OS disk storage and restarting
 ceph on that node, I'm seeing multiple errors, then health OK, then
 back into the errors:

 # ceph -w
 http://pastebin.com/mSKwNzYp

 Is the error still consistently happening? (the last lines show
 active+clean) Wild guess, but is it possible some sort of
 iptables/firewall rule is preventing communication between the OSDs?


 Any help is appreciated.

 Thanks,
 Steve
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 --
 Abhishek
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Can't mount btrfs volume on rbd

2015-06-11 Thread Steve Dainard
Hello,

I'm getting an error when attempting to mount a volume on a host that was
forcibly powered off:

# mount /dev/rbd4 climate-downscale-CMIP5/
mount: mount /dev/rbd4 on /mnt/climate-downscale-CMIP5 failed: Stale file
handle

/var/log/messages:
Jun 10 15:31:07 node1 kernel: rbd4: unknown partition table

# parted /dev/rbd4 print
Model: Unknown (unknown)
Disk /dev/rbd4: 36.5TB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End SizeFile system  Flags
 1  0.00B  36.5TB  36.5TB  btrfs

# btrfs check --repair /dev/rbd4
enabling repair mode
Checking filesystem on /dev/rbd4
UUID: dfe6b0c8-2866-4318-abc2-e1e75c891a5e
checking extents
cmds-check.c:2274: check_owner_ref: Assertion `rec->is_root` failed.
btrfs[0x4175cc]
btrfs[0x41b873]
btrfs[0x41c3fe]
btrfs[0x41dc1d]
btrfs[0x406922]


OS: CentOS 7.1
btrfs-progs: 3.16.2
Ceph: version: 0.94.1/CentOS 7.1

I haven't found any references to 'stale file handle' on btrfs.

The underlying block device is a ceph RBD, so I've posted to both lists for
any feedback. Also, once I reformatted the btrfs volume I didn't get a mount error.

The btrfs volume has been reformatted, so I won't be able to do much
post-mortem, but I'm wondering if anyone has some insight.

Thanks,
Steve
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com