[ceph-users] ask about "recovery optimazation: recovery what is really modified"

2017-07-28 Thread donglifec...@gmail.com
yaoning, haomai, Json

What about the "recovery what is really modified" feature? I didn't see any
updates on GitHub recently; will it be developed further?

https://github.com/ceph/ceph/pull/3837 (PG:: recovery optimazation: recovery 
what is really modified)
Thanks a lot.



donglifec...@gmail.com


Re: [ceph-users] CRC mismatch detection on read (XFS OSD)

2017-07-28 Thread Gregory Farnum
On Fri, Jul 28, 2017 at 8:16 AM Дмитрий Глушенок  wrote:

> Hi!
>
> Just found a strange thing while testing deep-scrub on 10.2.7.
> 1. Stop OSD
> 2. Change primary copy's contents (using vi)
> 3. Start OSD
>
> Then 'rados get' returns "No such file or directory". No error messages
> seen in OSD log, cluster status "HEALTH_OK".
>
> 4. ceph pg repair 
>
> Then 'rados get' works as expected, "corrupted" data repaired.
>
> One time (I was unable to reproduce this) the error was detected on the fly
> (without an OSD restart):
>
> 2017-07-28 17:34:22.362968 7ff8bfa27700 -1 log_channel(cluster) log [ERR]
> : 16.d full-object read crc 0x78fcc738 != expected 0x5fd86d3e on
> 16:b36845b2:::testobject1:head
>
> Have I missed that CRC storing/verifying started to work on XFS? If so,
> where are they stored? In an xattr? I thought it was only implemented in BlueStore.
>

FileStore maintains CRC checksums opportunistically, such as when you do a
full-object write. So in some circumstances it can detect objects with the
wrong data and do repairs on its own. (And the checksum is stored in the
object_info, which is written down in an xattr, yes.)
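
If you want to poke at the on-disk state yourself: on a FileStore OSD the
object metadata lives in filesystem xattrs on the object file inside the PG
directory. A rough, untested sketch for listing them (the path and the exact
attribute name, e.g. "user.ceph._" for the serialized object_info, are
assumptions on my part and may differ by version):

import os

# hypothetical object file path inside the OSD data dir; adjust to your layout
obj = "/var/lib/ceph/osd/ceph-0/current/16.d_head/testobject1__head_B36845B2__10"

for name in os.listxattr(obj):
    value = os.getxattr(obj, name)
    # "user.ceph._" should be the serialized object_info_t (where the CRC lives)
    print("%-30s %d bytes" % (name, len(value)))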

I'm not certain why overwriting the file with vi made it return ENOENT, but
probably because it lost the xattrs storing metadata. (...though I'd expect
that to return an error on the primary that either prompts it to repair, or
else incorrectly returns that raw error to the client. Can you create a
ticket with exactly what steps you followed and what outcome you saw?)
-Greg


Re: [ceph-users] ceph osd safe to remove

2017-07-28 Thread Peter Maloney
Hello Dan,

Based on what I know and what people told me on IRC, this basically means
the condition that the OSD is neither acting nor up for any PG. One person
(fusl on IRC) said he hit an unfound-objects bug when he had size = 1; he
also said that if reweight (and, I assume, crush weight) is 0 it will surely
be safe, but possibly it won't be otherwise.

So here I took my bc-ceph-reweight-by-utilization.py script, which already
parses `ceph pg dump --format=json` (for up, acting, bytes and the count of
PGs) and `ceph osd df --format=json` (for weight and reweight), gutted out
the unneeded parts, and changed the report to show the condition I described
as True or False per OSD. The ceph auth therefore needs to allow ceph pg dump
and ceph osd df. The script is attached.

The script doesn't assume you're OK with acting lower than size, doesn't care
about min_size, and just assumes you want the OSD completely empty.
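
The core of the check is small; stripped down it looks roughly like this
(illustrative sketch, not the exact code from the attached script):

import json, subprocess

def ceph_json(*cmd):
    # run a ceph command and parse its JSON output
    out = subprocess.check_output(("ceph",) + cmd + ("--format=json",))
    return json.loads(out.decode())

pg_stats = ceph_json("pg", "dump")["pg_stats"]
osd_df   = ceph_json("osd", "df")["nodes"]

used = set()
for pg in pg_stats:
    used.update(pg["acting"])   # where the data is right now
    used.update(pg["up"])       # where the data will be after recovery

for osd in osd_df:
    # empty = no PG maps here now or after recovery
    # (optionally also require osd["crush_weight"] == 0 or osd["reweight"] == 0)
    print(osd["id"], "empty" if osd["id"] not in used else "NOT empty")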

Sample output:

Real cluster:
> root@cephtest:~ # ./bc-ceph-empty-osds.py -a
> osd_id weight  reweight pgs_old bytes_old  pgs_new bytes_new  empty
>  0 4.00099  0.61998  38  1221853911536  38  1221853911536  False
>  1 4.00099  0.59834  43  1168531341347  43  1168531341347  False
>  2 4.00099  0.79213  44  1155260814435  44  1155260814435  False
> 27 4.00099  0.69459  39  1210145117377  39  1210145117377  False
> 30 6.00099  0.73933  56  1691992924542  56  1691992924542  False
> 31 6.00099  0.81180  64  1810503842054  64  1810503842054  False
> ...

Test cluster with some -nan and 0's in crush map:
> root@tceph1:~ # ceph osd df
> ID WEIGHT  REWEIGHT SIZE   USEAVAIL  %USE VAR  PGS
>  4 1.00  0  0  0 -nan -nan   0
>  1 0.06439  1.0 61409M 98860k 61313M 0.16 0.93  47
>  0 0.06438  1.0 61409M   134M 61275M 0.22 1.29  59
>  2 0.06439  1.0 61409M 82300k 61329M 0.13 0.77  46
>  3   00  0  0  0 -nan -nan   0
>   TOTAL   179G   311M   179G 0.17  
> MIN/MAX VAR: 0.77/1.29  STDDEV: 0.04

> root@tceph1:~ # ./bc-ceph-empty-osds.py 
> osd_id weight  reweight pgs_old bytes_old  pgs_new bytes_new  empty
>  3 0.0  0.0   0  0   0  0 True
>  4 1.0  0.0   0  0   0  0 True
> root@tceph1:~ # ./bc-ceph-empty-osds.py -a
> osd_id weight  reweight pgs_old bytes_old  pgs_new bytes_new  empty
>  0 0.06438  1.0  59   46006167  59   46006167  False
>  1 0.06439  1.0  47   28792306  47   28792306  False
>  2 0.06439  1.0  46   17623485  46   17623485  False
>  3 0.0  0.0   0  0   0  0 True
>  4 1.0  0.0   0  0   0  0 True

The "old" vs "new" suffixes refer to the position of data now and after
recovery is complete, respectively. (the magic that made my reweight
script efficient compared to the official reweight script)

And I have not used such a method in the past... my cluster is small, so
I have always just let recovery completely finish instead. I hope you
find it useful and it develops from there.

Peter

On 07/28/17 15:36, Dan van der Ster wrote:
> Hi all,
>
> We are trying to outsource the disk replacement process for our ceph
> clusters to some non-expert sysadmins.
> We could really use a tool that reports if a Ceph OSD *would* or
> *would not* be safe to stop, e.g.
>
> # ceph-osd-safe-to-stop osd.X
> Yes it would be OK to stop osd.X
>
> (which of course means that no PGs would go inactive if osd.X were to
> be stopped).
>
> Does anyone have such a script that they'd like to share?
>
> Thanks!
>
> Dan


#!/usr/bin/env python3
#
# tells you if an osd is empty (no pgs up or acting, and no weight)
# (most of the code here was copied from bc-ceph-reweight-by-utilization.py)
#
# Author: Peter Maloney
# Licensed GNU GPLv2; if you did not receive a copy of the license, get one at http://www.gnu.org/licenses/gpl-2.0.html

import sys
import subprocess
import re
import argparse
import time
import logging
import json

#
# global variables
#

osds = {}
health = ""
json_nan_regex = None

#
# logging
#

logging.VERBOSE = 15
def log_verbose(self, message, *args, **kws):
    if self.isEnabledFor(logging.VERBOSE):
        self.log(logging.VERBOSE, message, *args, **kws)

logging.addLevelName(logging.VERBOSE, "VERBOSE")
logging.Logger.verbose = log_verbose

formatter = logging.Formatter(
    fmt='%(asctime)-15s.%(msecs)03d %(levelname)s: %(message)s',
    datefmt="%Y-%m-%d %H:%M:%S"
)

handler = logging.StreamHandler()
handler.setFormatter(formatter)

l

Re: [ceph-users] ceph osd safe to remove

2017-07-28 Thread Alexandre Germain
Hello Dan,

Something like this maybe?

https://github.com/CanonicalLtd/ceph_safe_disk

Cheers,

Alex

2017-07-28 9:36 GMT-04:00 Dan van der Ster :

> Hi all,
>
> We are trying to outsource the disk replacement process for our ceph
> clusters to some non-expert sysadmins.
> We could really use a tool that reports if a Ceph OSD *would* or
> *would not* be safe to stop, e.g.
>
> # ceph-osd-safe-to-stop osd.X
> Yes it would be OK to stop osd.X
>
> (which of course means that no PGs would go inactive if osd.X were to
> be stopped).
>
> Does anyone have such a script that they'd like to share?
>
> Thanks!
>
> Dan


[ceph-users] CRC mismatch detection on read (XFS OSD)

2017-07-28 Thread Дмитрий Глушенок
Hi!

Just found a strange thing while testing deep-scrub on 10.2.7.
1. Stop OSD
2. Change primary copy's contents (using vi)
3. Start OSD

Then 'rados get' returns "No such file or directory". No error messages seen in 
OSD log, cluster status "HEALTH_OK".

4. ceph pg repair 

Then 'rados get' works as expected, "corrupted" data repaired.

One time (I was unable to reproduce this) the error was detected on the fly
(without an OSD restart):

2017-07-28 17:34:22.362968 7ff8bfa27700 -1 log_channel(cluster) log [ERR] : 
16.d full-object read crc 0x78fcc738 != expected 0x5fd86d3e on 
16:b36845b2:::testobject1:head

Have I missed that CRC storing/verifying started to work on XFS? If so, where
are they stored? In an xattr? I thought it was only implemented in BlueStore.

--
Dmitry Glushenok
Jet Infosystems



Re: [ceph-users] Networking/naming doubt

2017-07-28 Thread Oscar Segarra
Hi David,

Thanks a lot for your comments!

I just want to use a different network than the public one (where DNS
resolves the names) for ceph-deploy and client connections.

For example with 3 NICs:

Nic1: Public (internet access)
Nic2: Ceph-mon (clients and ceph-deploy)
Nic3: Ceph-osd
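
Roughly, what I think this translates to in ceph.conf terms would be
something like the following (illustrative only, not tested; note that what
Ceph calls the "public network" is the mon/client network, not necessarily
the internet-facing NIC):

[global]
public network  = <Nic2 subnet>   # mons, clients and ceph-deploy traffic
cluster network = <Nic3 subnet>   # OSD replication / heartbeat traffic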

Thanks a lot for your help!

On 28 Jul 2017 at 2:25 a.m., "David Turner"  wrote:

The only thing that is supposed to use the cluster network are the OSDs.
Not even the MONs access the cluster network. I am sure that if you have a
need to make this work that you can find a way, but I don't know that one
exists in the standard tool set.

You might try temporarily setting the /etc/hosts reference for vdicnode02
and vdicnode03 to the cluster network and use the proper hosts name in the
ceph-deploy command. Ceph cluster operations do not use dns at all, so you
could probably leave your /etc/hosts in this state. I don't know if it
would work though. It's really not intended for any communication to happen
on this subnet other than inter-OSD traffic.
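
For example (purely illustrative, with made-up addresses) on the admin node:

# /etc/hosts on the ceph-deploy/admin node (temporary, untested idea)
192.168.100.2   vdicnode02
192.168.100.3   vdicnode03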



On Thu, Jul 27, 2017 at 6:31 PM Oscar Segarra 
wrote:

> Sorry! I'd like to add that I want to use the cluster network for both
> purposes:
>
> ceph-deploy --username vdicceph new vdicnode01 --cluster-network
> 192.168.100.0/24 --public-network 192.168.100.0/24
>
> Thanks a lot
>
>
> 2017-07-28 0:29 GMT+02:00 Oscar Segarra :
>
>> Hi,
>>
>> Do you mean that for security reasons ceph-deploy can only be executed
>> from the public interface?
>>
>> It looks strange that one cannot decide which network to use for ceph-deploy...
>> I could have a dedicated network for ceph-deploy... :S
>>
>> Thanks a lot
>>
>> 2017-07-28 0:03 GMT+02:00 Roger Brown :
>>
>>> I could be wrong, but I think you cannot achieve this objective. If you
>>> declare a cluster network, OSDs will route heartbeat, object replication
>>> and recovery traffic over the cluster network. We prefer that the cluster
>>> network is NOT reachable from the public network or the Internet for added
>>> security. Therefore it will not work with ceph-deploy actions.
>>> Source: http://docs.ceph.com/docs/master/rados/configuration/network-config-ref/
>>>
>>>
>>> On Thu, Jul 27, 2017 at 3:53 PM Oscar Segarra 
>>> wrote:
>>>
 Hi,

 In my environment I have 3 hosts, every host has 2 network interfaces:

 public: 192.168.2.0/24
 cluster: 192.168.100.0/24

 The hostname "vdicnode01", "vdicnode02" and "vdicnode03" are resolved
 by public DNS through the public interface, that means the "ping
 vdicnode01" will resolve 192.168.2.1.

 In my environment the "admin" node is the first node, vdicnode01, and I'd
 like all the ceph-deploy deployment traffic and all OSD traffic to go over
 the cluster network.

 1) To begin with, I create the cluster and I want all traffic to go
 over the cluster network:
 ceph-deploy --username vdicceph new vdicnode01 --cluster-network
 192.168.100.0/24 --public-network 192.168.100.0/24

 2) The problem comes when I have to launch commands against the other
 hosts; for example, from node vdicnode01 I execute:

 2.1) ceph-deploy --username vdicceph osd create vdicnode02:sdb
 --> Finishes Ok but communication goes through the public interface

 2.2) ceph-deploy --username vdicceph osd create vdicnode02.local:sdb
 --> vdicnode02.local is added manually in /etc/hosts (assigned a
 cluster IP)
 --> It raises some errors/warnings because vdicnode02.local is not the
 real hostname. Some files are created with vdicnode02.local in the middle
 of the file name and some errors appear when starting up the osd
 service related to "file does not exist".

 2.3) ceph-deploy --username vdicceph osd create vdicnode02-priv:sdb
 --> vdicnode02-priv is added manually in /etc/hosts (assigned a cluster
 IP)
 --> It raises some errors/warnings because vdicnode02-priv is not the
 real hostname. Some files are created with vdicnode02-priv in the middle of
 the file name and some errors appear when starting up the osd
 service related to "file does not exist".

 What would be the right way to achieve my objective?

 If there is any documentation I have not found, please redirect me...

 Thanks a lot for your help in advance.

>>>
>>


[ceph-users] ceph osd safe to remove

2017-07-28 Thread Dan van der Ster
Hi all,

We are trying to outsource the disk replacement process for our ceph
clusters to some non-expert sysadmins.
We could really use a tool that reports if a Ceph OSD *would* or
*would not* be safe to stop, e.g.

# ceph-osd-safe-to-stop osd.X
Yes it would be OK to stop osd.X

(which of course means that no PGs would go inactive if osd.X were to
be stopped).
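
In pseudo-code, what I have in mind is roughly this (untested sketch; it only
checks that every PG would keep at least min_size acting replicas without
osd.X, and ignores corner cases):

import json, subprocess, sys

def ceph_json(*cmd):
    out = subprocess.check_output(("ceph",) + cmd + ("--format=json",))
    return json.loads(out.decode())

osd = int(sys.argv[1])
min_size = {p["pool"]: p["min_size"] for p in ceph_json("osd", "dump")["pools"]}

unsafe = []
for pg in ceph_json("pg", "dump")["pg_stats"]:
    pool = int(pg["pgid"].split(".")[0])
    acting = [o for o in pg["acting"] if o != osd]
    if len(acting) < min_size[pool]:
        unsafe.append(pg["pgid"])

if unsafe:
    print("NOT safe to stop osd.%d, %d PGs would drop below min_size" % (osd, len(unsafe)))
else:
    print("Yes it would be OK to stop osd.%d" % osd)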

Does anyone have such a script that they'd like to share?

Thanks!

Dan


[ceph-users] Unable to remove osd from crush map - leads to remapped pg's, v11.2.0

2017-07-28 Thread nokia ceph
Hello,

Recently we hit an underlying issue with osd.10, which mapped to /dev/sde,
so we tried to remove it from the CRUSH map:

===
#systemctl stop ceph-osd@10.service

#for x in {10..10}; do ceph osd out $x;ceph osd crush remove osd.$x;ceph
auth del osd.$x;ceph osd rm osd.$x ;done
marked out osd.10.
removed item id 10 name 'osd.10' from crush map
updated
removed osd.10


Still I can see the entry in the CRUSH map:
#ceph osd crush dump
<<..>>
{
"id": 10,
"name": "device10"
},
<<..>>

Then I tried to remove it manually using crushtool, with the steps below.

#ceph osd getcrushmap -o /tmp/test.map
#crushtool -d /tmp/test.map -o /tmp/test1.map

Opened /tmp/test1.map and removed the entry:
#vim /tmp/test1.map
<..>
device 9 osd.9
device 10 device10  --< removed this entry
device 11 osd.11
<..>

#crushtool -c /tmp/test1.map -o /tmp/test2.map
#ceph osd setcrushmap -i /tmp/test2.map  -- Reinject to crush


Still I can see the device10 info in the CRUSH map:
#ceph osd crush dump 2> /dev/null | grep device10
"name": "device10"

I even tried the commands below, with no luck...
#ceph osd crush rm osd.10
#ceph osd crush rm 10
#ceph osd crush rm device0


Due to this issue, 9 PGs were affected and landed in the "remapped+incomplete"
state.

# for i in `cat test`; do  ceph pg map $i 2> /dev/null; done
osdmap e2443 pg 3.d9 (3.d9) -> up [8,63,77,35,117] acting
[2147483647,63,77,2147483647,117]
osdmap e2443 pg 3.9f (3.9f) -> up [80,47,116,19,3] acting
[80,2147483647,116,2147483647,3]
osdmap e2443 pg 3.7fe (3.7fe) -> up [17,27,93,23,102] acting
[17,27,93,2147483647,102]
osdmap e2443 pg 3.32f (3.32f) -> up [64,69,94,111,20] acting
[2147483647,69,94,111,2147483647]
osdmap e2443 pg 3.34f (3.34f) -> up [102,25,90,1,24] acting
[102,2147483647,90,2147483647,24]
osdmap e2443 pg 3.176 (3.176) -> up [9,2,107,13,91] acting
[9,2,107,2147483647,91]
osdmap e2443 pg 3.10e (3.10e) -> up [88,61,21,59,100] acting
[2147483647,2147483647,21,2147483647,2147483647]
osdmap e2443 pg 3.48 (3.48) -> up [114,18,32,90,8] acting
[114,18,2147483647,90,8]
osdmap e2443 pg 3.71a (3.71a) -> up [3,78,58,71,116] acting
[3,78,58,2147483647,116]

#  ceph pg  $i query 2> /dev/null| grep -w -A1 "blocked_by\"\: \[" | grep
-v -
"blocked_by": [
10  ==>>>

#ceph pg  $i query 2> /dev/null| grep -w -A1 down_osds_we_would_probe
"down_osds_we_would_probe": [  -->>
10
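
For reference, the PGs with a hole in their acting set (2147483647 means no
OSD is mapped to that slot) can also be listed in one go with something like
this (rough, untested sketch):

import json, subprocess

NONE = 2147483647  # hole in the acting set: no OSD currently mapped

pgs = json.loads(subprocess.check_output(
    ["ceph", "pg", "dump", "--format=json"]).decode())["pg_stats"]

for pg in pgs:
    if NONE in pg["acting"]:
        print(pg["pgid"], pg["state"], "acting:", pg["acting"])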

Then I tried to recreate the PG using the command below:

#ceph pg force_create_pg id

Still no luck ...

Here osd.10 is still present in the CRUSH map, which is why I'm unable to
recover these 9 PGs. Whenever we reboot the affected osd.10 node, the OSD
joins back into the cluster again, which is weird.

Please comment on how to forcefully remove the device10 / osd.10 info from
the CRUSH map.

Attached crushmap file.

Thanks
Jayaram
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 device10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15
device 16 osd.16
device 17 osd.17
device 18 osd.18
device 19 osd.19
device 20 osd.20
device 21 osd.21
device 22 osd.22
device 23 osd.23
device 24 osd.24
device 25 osd.25
device 26 osd.26
device 27 osd.27
device 28 osd.28
device 29 osd.29
device 30 osd.30
device 31 osd.31
device 32 osd.32
device 33 osd.33
device 34 osd.34
device 35 osd.35
device 36 osd.36
device 37 osd.37
device 38 osd.38
device 39 osd.39
device 40 osd.40
device 41 osd.41
device 42 osd.42
device 43 osd.43
device 44 osd.44
device 45 osd.45
device 46 osd.46
device 47 osd.47
device 48 osd.48
device 49 osd.49
device 50 osd.50
device 51 osd.51
device 52 osd.52
device 53 osd.53
device 54 osd.54
device 55 osd.55
device 56 osd.56
device 57 osd.57
device 58 osd.58
device 59 osd.59
device 60 osd.60
device 61 osd.61
device 62 osd.62
device 63 osd.63
device 64 osd.64
device 65 osd.65
device 66 osd.66
device 67 osd.67
device 68 osd.68
device 69 osd.69
device 70 osd.70
device 71 osd.71
device 72 osd.72
device 73 osd.73
device 74 osd.74
device 75 osd.75
device 76 osd.76
device 77 osd.77
device 78 osd.78
device 79 osd.79
device 80 osd.80
device 81 osd.81
device 82 osd.82
device 83 osd.83
device 84 osd.84
device 85 osd.85
device 86 osd.86
device 87 osd.87
device 88 osd.88
device 89 osd.89
device 90 osd.90
device 91 osd.91
device 92 osd.92
device 93 osd.93
device 94 osd.94
device 95 osd.95
device 96 osd.96
device 97 osd.97
device 98 osd.98
device 99 osd.99
device 100 osd.100
device 101 osd.101
device 102 osd.102
device 103 osd.103
device 104 osd.104
device 105 osd.105
device 106 osd.106
device 107 osd.107
device 108 osd.108
device 109 osd.109
device 110 o

Re: [ceph-users] jewel - recovery keeps stalling (continues after restarting OSDs)

2017-07-28 Thread linghucongsong


1. You have a size 3 pool; I do not know why you set min_size to 1. That is too dangerous.

2. You had better use the same disk sizes and the same number of OSDs on each host in CRUSH.

For now you can try the ceph osd reweight-by-utilization command, when there
are no users on your cluster.
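
For example (run the dry-run first; the threshold value is just illustrative):

ceph osd test-reweight-by-utilization 120
ceph osd reweight-by-utilization 120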

and I will go home.






At 2017-07-28 17:57:11, "Nikola Ciprich"  wrote:
>On Fri, Jul 28, 2017 at 05:52:29PM +0800, linghucongsong wrote:
>> 
>> 
>> 
>> You have two crush rule? One is ssd the other is hdd?
>yes, exactly..
>
>> 
>> Can you show ceph osd dump|grep pool
>> 
>
>pool 3 'vm' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins 
>pg_num 1024 pgp_num 1024 last_change 69955 flags hashpspool 
>min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
>pool 4 'cephfs_data' replicated size 3 min_size 1 crush_ruleset 0 object_hash 
>rjenkins pg_num 1024 pgp_num 1024 last_change 74682 flags hashpspool 
>crash_replay_interval 45 min_write_recency_for_promote 1 stripe_width 0
>pool 5 'cephfs_metadata' replicated size 3 min_size 1 crush_ruleset 0 
>object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 74667 flags 
>hashpspool min_write_recency_for_promote 1 stripe_width 0
>pool 11 'ssd' replicated size 3 min_size 1 crush_ruleset 1 object_hash 
>rjenkins pg_num 128 pgp_num 128 last_change 46119 flags hashpspool 
>min_write_recency_for_promote 1 stripe_width 0
>
>
>> ceph osd crush dump
>
>{
>"devices": [
>{
>"id": 0,
>"name": "osd.0"
>},
>{
>"id": 1,
>"name": "osd.1"
>},
>{
>"id": 2,
>"name": "osd.2"
>},
>{
>"id": 3,
>"name": "osd.3"
>},
>{
>"id": 4,
>"name": "osd.4"
>},
>{
>"id": 5,
>"name": "osd.5"
>},
>{
>"id": 6,
>"name": "osd.6"
>},
>{
>"id": 7,
>"name": "device7"
>},
>{
>"id": 8,
>"name": "osd.8"
>},
>{
>"id": 9,
>"name": "osd.9"
>},
>{
>"id": 10,
>"name": "osd.10"
>},
>{
>"id": 11,
>"name": "osd.11"
>},
>{
>"id": 12,
>"name": "osd.12"
>},
>{
>"id": 13,
>"name": "osd.13"
>},
>{
>"id": 14,
>"name": "osd.14"
>},
>{
>"id": 15,
>"name": "osd.15"
>},
>{
>"id": 16,
>"name": "osd.16"
>},
>{
>"id": 17,
>"name": "osd.17"
>},
>{
>"id": 18,
>"name": "osd.18"
>},
>{
>"id": 19,
>"name": "osd.19"
>},
>{
>"id": 20,
>"name": "osd.20"
>},
>{
>"id": 21,
>"name": "osd.21"
>},
>{
>"id": 22,
>"name": "osd.22"
>},
>{
>"id": 23,
>"name": "osd.23"
>},
>{
>"id": 24,
>"name": "osd.24"
>},
>{
>"id": 25,
>"name": "osd.25"
>},
>{
>"id": 26,
>"name": "osd.26"
>}
>],
>"types": [
>{
>"type_id": 0,
>"name": "osd"
>},
>{
>"type_id": 1,
>"name": "host"
>},
>{
>"type_id": 2,
>"name": "chassis"
>},
>{
>"type_id": 3,
>"name": "rack"
>},
>{
>"type_id": 4,
>"name": "row"
>},
>{
>"type_id": 5,
>"name": "pdu"
>},
>{
>"type_id": 6,
>"name": "pod"
>},
>{
>"type_id": 7,
>"name": "room"
>},
>{
>"type_id": 8,
>"name": "datacenter"
>},
>{
>"type_id": 9,
>"name": "region"
>},
>{
>"type_id": 10,
>"name": "root"
>}
>],
>"buckets": [
>{
>"id": -1,
>"name": "default",
>"type_id": 10,
>"type_name": "root",
>"weight": 2575553,
>"alg": "straw2",
>"hash": "rjenkins1",
>"items": [
>{
>"id": -4,
>"weight": 779875,
>"pos": 0
>},
>{
>"id": -5,
>"weight": 681571,
>"pos": 1
>},
>{
>"id": -6,
>"weight": 511178,
> 

Re: [ceph-users] jewel - recovery keeps stalling (continues after restarting OSDs)

2017-07-28 Thread Nikola Ciprich
On Fri, Jul 28, 2017 at 05:52:29PM +0800, linghucongsong wrote:
> 
> 
> 
> You have two crush rule? One is ssd the other is hdd?
yes, exactly..

> 
> Can you show ceph osd dump|grep pool
> 

pool 3 'vm' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins 
pg_num 1024 pgp_num 1024 last_change 69955 flags hashpspool 
min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
pool 4 'cephfs_data' replicated size 3 min_size 1 crush_ruleset 0 object_hash 
rjenkins pg_num 1024 pgp_num 1024 last_change 74682 flags hashpspool 
crash_replay_interval 45 min_write_recency_for_promote 1 stripe_width 0
pool 5 'cephfs_metadata' replicated size 3 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 74667 flags 
hashpspool min_write_recency_for_promote 1 stripe_width 0
pool 11 'ssd' replicated size 3 min_size 1 crush_ruleset 1 object_hash rjenkins 
pg_num 128 pgp_num 128 last_change 46119 flags hashpspool 
min_write_recency_for_promote 1 stripe_width 0


> ceph osd crush dump

{
"devices": [
{
"id": 0,
"name": "osd.0"
},
{
"id": 1,
"name": "osd.1"
},
{
"id": 2,
"name": "osd.2"
},
{
"id": 3,
"name": "osd.3"
},
{
"id": 4,
"name": "osd.4"
},
{
"id": 5,
"name": "osd.5"
},
{
"id": 6,
"name": "osd.6"
},
{
"id": 7,
"name": "device7"
},
{
"id": 8,
"name": "osd.8"
},
{
"id": 9,
"name": "osd.9"
},
{
"id": 10,
"name": "osd.10"
},
{
"id": 11,
"name": "osd.11"
},
{
"id": 12,
"name": "osd.12"
},
{
"id": 13,
"name": "osd.13"
},
{
"id": 14,
"name": "osd.14"
},
{
"id": 15,
"name": "osd.15"
},
{
"id": 16,
"name": "osd.16"
},
{
"id": 17,
"name": "osd.17"
},
{
"id": 18,
"name": "osd.18"
},
{
"id": 19,
"name": "osd.19"
},
{
"id": 20,
"name": "osd.20"
},
{
"id": 21,
"name": "osd.21"
},
{
"id": 22,
"name": "osd.22"
},
{
"id": 23,
"name": "osd.23"
},
{
"id": 24,
"name": "osd.24"
},
{
"id": 25,
"name": "osd.25"
},
{
"id": 26,
"name": "osd.26"
}
],
"types": [
{
"type_id": 0,
"name": "osd"
},
{
"type_id": 1,
"name": "host"
},
{
"type_id": 2,
"name": "chassis"
},
{
"type_id": 3,
"name": "rack"
},
{
"type_id": 4,
"name": "row"
},
{
"type_id": 5,
"name": "pdu"
},
{
"type_id": 6,
"name": "pod"
},
{
"type_id": 7,
"name": "room"
},
{
"type_id": 8,
"name": "datacenter"
},
{
"type_id": 9,
"name": "region"
},
{
"type_id": 10,
"name": "root"
}
],
"buckets": [
{
"id": -1,
"name": "default",
"type_id": 10,
"type_name": "root",
"weight": 2575553,
"alg": "straw2",
"hash": "rjenkins1",
"items": [
{
"id": -4,
"weight": 779875,
"pos": 0
},
{
"id": -5,
"weight": 681571,
"pos": 1
},
{
"id": -6,
"weight": 511178,
"pos": 2
},
{
"id": -3,
"weight": 602929,
"pos": 3
}
]
},
{
"id": -2,
"name": "ssd",
"type_id": 10,
"type_name": "root",
"weight": 102233,
"alg": "straw2",
"hash": "rjenkins1",
"items": [
{
"id": -9,
"weight": 26214,
"pos": 0

Re: [ceph-users] jewel - recovery keeps stalling (continues after restarting OSDs)

2017-07-28 Thread linghucongsong



You have two crush rules? One is SSD and the other is HDD?

Can you show ceph osd dump|grep pool

ceph osd crush dump






At 2017-07-28 17:47:48, "Nikola Ciprich"  wrote:
>
>On Fri, Jul 28, 2017 at 05:43:14PM +0800, linghucongsong wrote:
>> 
>> 
>> It look like the osd in your cluster is not all the same size.
>> 
>> can you show ceph osd df output?
>
>you're right, they're not..  here's the output:
>
>[root@v1b ~]# ceph osd df tree
>ID  WEIGHT   REWEIGHT SIZE   USE   AVAIL  %USE  VAR  PGS TYPE NAME 
> -2  1.55995-  1706G  883G   805G 51.78 2.55   0 root ssd  
> -9  0.3-   393G  221G   171G 56.30 2.78   0 host v1c-ssd  
> 10  0.3  1.0   393G  221G   171G 56.30 2.78  98 osd.10
>-10  0.59998-   683G  275G   389G 40.39 1.99   0 host v1a-ssd  
>  5  0.2  1.0   338G  151G   187G 44.77 2.21  65 osd.5 
> 26  0.2  1.0   344G  124G   202G 36.07 1.78  52 osd.26
>-11  0.34000-   338G  219G   119G 64.68 3.19   0 host v1b-ssd  
> 13  0.34000  1.0   338G  219G   119G 64.68 3.19  96 osd.13
> -7  0.21999-   290G  166G   123G 57.43 2.83   0 host v1d-ssd  
> 19  0.21999  1.0   290G  166G   123G 57.43 2.83  73 osd.19
> -1 39.29982- 43658G 8312G 34787G 19.04 0.94   0 root default  
> -4 11.89995- 12806G 2422G 10197G 18.92 0.93   0 host v1a  
>  6  1.5  1.0  1833G  358G  1475G 19.53 0.96 366 osd.6 
>  8  1.7  1.0  1833G  313G  1519G 17.11 0.84 370 osd.8 
>  2  1.5  1.0  1833G  320G  1513G 17.46 0.86 331 osd.2 
>  0  1.7  1.0  1804G  431G  1373G 23.90 1.18 359 osd.0 
>  4  1.5  1.0  1833G  294G  1539G 16.07 0.79 360 osd.4 
> 25  3.5  1.0  3667G  704G  2776G 19.22 0.95 745 osd.25
> -5 10.39995- 10914G 2154G  8573G 19.74 0.97   0 host v1b  
>  1  1.5  1.0  1804G  350G  1454G 19.42 0.96 409 osd.1 
>  3  1.7  1.0  1804G  360G  1444G 19.98 0.99 412 osd.3 
>  9  1.5  1.0  1804G  331G  1473G 18.37 0.91 363 osd.9 
> 11  1.7  1.0  1833G  367G  1465G 20.06 0.99 415 osd.11
> 24  3.5  1.0  3667G  744G  2736G 20.30 1.00 834 osd.24
> -6  7.79996-  9051G 1769G  7282G 19.54 0.96   0 host v1c  
> 14  1.5  1.0  1804G  370G  1433G 20.54 1.01 442 osd.14
> 15  1.7  1.0  1833G  383G  1450G 20.92 1.03 447 osd.15
> 16  1.3  1.0  1804G  295G  1508G 16.38 0.81 355 osd.16
> 18  1.3  1.0  1804G  366G  1438G 20.29 1.00 381 osd.18
> 17  1.5  1.0  1804G  353G  1451G 19.57 0.97 429 osd.17
> -3  9.19997- 10885G 1965G  8733G 18.06 0.89   0 host v1d-sata 
> 12  1.3  1.0  1804G  348G  1455G 19.32 0.95 365 osd.12
> 20  1.3  1.0  1804G  335G  1468G 18.60 0.92 371 osd.20
> 21  3.5  1.0  3667G  695G  2785G 18.97 0.94 871 osd.21
> 22  1.3  1.0  1804G  281G  1522G 15.63 0.77 326 osd.22
> 23  1.3  1.0  1804G  303G  1500G 16.83 0.83 321 osd.23
>TOTAL 45365G 9195G 35592G 20.27
>MIN/MAX VAR: 0.77/3.19  STDDEV: 14.69
>
>
>
>apart from replacing OSDs, how can I help it?
>
>
>
>
>> 
>> 
>> At 2017-07-28 17:24:29, "Nikola Ciprich"  wrote:
>> >I forgot to add that OSD daemons really seem to be idle, no disk
>> >activity, no CPU usage.. it just looks to me like  some kind of
>> >deadlock, as they were waiting for each other..
>> >
>> >and so I'm trying to get last 1.5% of misplaced / degraded PGs
>> >for almost a week..
>> >
>> >
>> >On Fri, Jul 28, 2017 at 10:56:02AM +0200, Nikola Ciprich wrote:
>> >> Hi,
>> >> 
>> >> I'm trying to find reason for strange recovery issues I'm seeing on
>> >> our cluster..
>> >> 
>> >> it's mostly idle, 4 node cluster with 26 OSDs evenly distributed
>> >> across nodes. jewel 10.2.9
>> >> 
>> >> the problem is that after some disk replaces and data moves, recovery
>> >> is progressing extremely slowly.. pgs seem to be stuck in 
>> >> active+recovering+degraded
>> >> state:
>> >> 
>> >> [root@v1d ~]# ceph -s
>> >> cluster a5efbc87-3900-4c42-a977-8c93f7aa8c33
>> >>  health HEALTH_WARN
>> >> 159 pgs backfill_wait
>> >> 4 pgs backfilling
>> >> 259 pgs degraded
>> >> 12 pgs recovering
>> >> 113 pgs recovery_wait
>> >> 215 pgs stuck degraded
>> >> 266 pgs stuck unclean
>> >> 140 pgs stuck undersized
>> >> 151 pgs undersized
>> >> recovery 37788/2327775 objects degraded (1.623%)
>> >> recovery 23854/2327775 objects misplaced (1.025%)
>> >> noout,noin flag(s) set
>> >>  monmap e21: 3 mons at 
>> >> {v1a=10.0.0.1

Re: [ceph-users] jewel - recovery keeps stalling (continues after restarting OSDs)

2017-07-28 Thread Nikola Ciprich

On Fri, Jul 28, 2017 at 05:43:14PM +0800, linghucongsong wrote:
> 
> 
> It look like the osd in your cluster is not all the same size.
> 
> can you show ceph osd df output?

you're right, they're not..  here's the output:

[root@v1b ~]# ceph osd df tree
ID  WEIGHT   REWEIGHT SIZE   USE   AVAIL  %USE  VAR  PGS TYPE NAME 
 -2  1.55995-  1706G  883G   805G 51.78 2.55   0 root ssd  
 -9  0.3-   393G  221G   171G 56.30 2.78   0 host v1c-ssd  
 10  0.3  1.0   393G  221G   171G 56.30 2.78  98 osd.10
-10  0.59998-   683G  275G   389G 40.39 1.99   0 host v1a-ssd  
  5  0.2  1.0   338G  151G   187G 44.77 2.21  65 osd.5 
 26  0.2  1.0   344G  124G   202G 36.07 1.78  52 osd.26
-11  0.34000-   338G  219G   119G 64.68 3.19   0 host v1b-ssd  
 13  0.34000  1.0   338G  219G   119G 64.68 3.19  96 osd.13
 -7  0.21999-   290G  166G   123G 57.43 2.83   0 host v1d-ssd  
 19  0.21999  1.0   290G  166G   123G 57.43 2.83  73 osd.19
 -1 39.29982- 43658G 8312G 34787G 19.04 0.94   0 root default  
 -4 11.89995- 12806G 2422G 10197G 18.92 0.93   0 host v1a  
  6  1.5  1.0  1833G  358G  1475G 19.53 0.96 366 osd.6 
  8  1.7  1.0  1833G  313G  1519G 17.11 0.84 370 osd.8 
  2  1.5  1.0  1833G  320G  1513G 17.46 0.86 331 osd.2 
  0  1.7  1.0  1804G  431G  1373G 23.90 1.18 359 osd.0 
  4  1.5  1.0  1833G  294G  1539G 16.07 0.79 360 osd.4 
 25  3.5  1.0  3667G  704G  2776G 19.22 0.95 745 osd.25
 -5 10.39995- 10914G 2154G  8573G 19.74 0.97   0 host v1b  
  1  1.5  1.0  1804G  350G  1454G 19.42 0.96 409 osd.1 
  3  1.7  1.0  1804G  360G  1444G 19.98 0.99 412 osd.3 
  9  1.5  1.0  1804G  331G  1473G 18.37 0.91 363 osd.9 
 11  1.7  1.0  1833G  367G  1465G 20.06 0.99 415 osd.11
 24  3.5  1.0  3667G  744G  2736G 20.30 1.00 834 osd.24
 -6  7.79996-  9051G 1769G  7282G 19.54 0.96   0 host v1c  
 14  1.5  1.0  1804G  370G  1433G 20.54 1.01 442 osd.14
 15  1.7  1.0  1833G  383G  1450G 20.92 1.03 447 osd.15
 16  1.3  1.0  1804G  295G  1508G 16.38 0.81 355 osd.16
 18  1.3  1.0  1804G  366G  1438G 20.29 1.00 381 osd.18
 17  1.5  1.0  1804G  353G  1451G 19.57 0.97 429 osd.17
 -3  9.19997- 10885G 1965G  8733G 18.06 0.89   0 host v1d-sata 
 12  1.3  1.0  1804G  348G  1455G 19.32 0.95 365 osd.12
 20  1.3  1.0  1804G  335G  1468G 18.60 0.92 371 osd.20
 21  3.5  1.0  3667G  695G  2785G 18.97 0.94 871 osd.21
 22  1.3  1.0  1804G  281G  1522G 15.63 0.77 326 osd.22
 23  1.3  1.0  1804G  303G  1500G 16.83 0.83 321 osd.23
TOTAL 45365G 9195G 35592G 20.27
MIN/MAX VAR: 0.77/3.19  STDDEV: 14.69



apart from replacing OSDs, how can I help it?




> 
> 
> At 2017-07-28 17:24:29, "Nikola Ciprich"  wrote:
> >I forgot to add that OSD daemons really seem to be idle, no disk
> >activity, no CPU usage.. it just looks to me like  some kind of
> >deadlock, as they were waiting for each other..
> >
> >and so I'm trying to get last 1.5% of misplaced / degraded PGs
> >for almost a week..
> >
> >
> >On Fri, Jul 28, 2017 at 10:56:02AM +0200, Nikola Ciprich wrote:
> >> Hi,
> >> 
> >> I'm trying to find reason for strange recovery issues I'm seeing on
> >> our cluster..
> >> 
> >> it's mostly idle, 4 node cluster with 26 OSDs evenly distributed
> >> across nodes. jewel 10.2.9
> >> 
> >> the problem is that after some disk replaces and data moves, recovery
> >> is progressing extremely slowly.. pgs seem to be stuck in 
> >> active+recovering+degraded
> >> state:
> >> 
> >> [root@v1d ~]# ceph -s
> >> cluster a5efbc87-3900-4c42-a977-8c93f7aa8c33
> >>  health HEALTH_WARN
> >> 159 pgs backfill_wait
> >> 4 pgs backfilling
> >> 259 pgs degraded
> >> 12 pgs recovering
> >> 113 pgs recovery_wait
> >> 215 pgs stuck degraded
> >> 266 pgs stuck unclean
> >> 140 pgs stuck undersized
> >> 151 pgs undersized
> >> recovery 37788/2327775 objects degraded (1.623%)
> >> recovery 23854/2327775 objects misplaced (1.025%)
> >> noout,noin flag(s) set
> >>  monmap e21: 3 mons at 
> >> {v1a=10.0.0.1:6789/0,v1b=10.0.0.2:6789/0,v1c=10.0.0.3:6789/0}
> >> election epoch 6160, quorum 0,1,2 v1a,v1b,v1c
> >>   fsmap e817: 1/1/1 up {0=v1a=up:active}, 1 up:standby
> >>  osdmap e76002: 26 osds: 26 up, 26 in; 185 remapped pgs
> >> flags noout,n

Re: [ceph-users] jewel - recovery keeps stalling (continues after restarting OSDs)

2017-07-28 Thread linghucongsong


It looks like the OSDs in your cluster are not all the same size.

Can you show the ceph osd df output?


At 2017-07-28 17:24:29, "Nikola Ciprich"  wrote:
>I forgot to add that OSD daemons really seem to be idle, no disk
>activity, no CPU usage.. it just looks to me like  some kind of
>deadlock, as they were waiting for each other..
>
>and so I'm trying to get last 1.5% of misplaced / degraded PGs
>for almost a week..
>
>
>On Fri, Jul 28, 2017 at 10:56:02AM +0200, Nikola Ciprich wrote:
>> Hi,
>> 
>> I'm trying to find reason for strange recovery issues I'm seeing on
>> our cluster..
>> 
>> it's mostly idle, 4 node cluster with 26 OSDs evenly distributed
>> across nodes. jewel 10.2.9
>> 
>> the problem is that after some disk replaces and data moves, recovery
>> is progressing extremely slowly.. pgs seem to be stuck in 
>> active+recovering+degraded
>> state:
>> 
>> [root@v1d ~]# ceph -s
>> cluster a5efbc87-3900-4c42-a977-8c93f7aa8c33
>>  health HEALTH_WARN
>> 159 pgs backfill_wait
>> 4 pgs backfilling
>> 259 pgs degraded
>> 12 pgs recovering
>> 113 pgs recovery_wait
>> 215 pgs stuck degraded
>> 266 pgs stuck unclean
>> 140 pgs stuck undersized
>> 151 pgs undersized
>> recovery 37788/2327775 objects degraded (1.623%)
>> recovery 23854/2327775 objects misplaced (1.025%)
>> noout,noin flag(s) set
>>  monmap e21: 3 mons at 
>> {v1a=10.0.0.1:6789/0,v1b=10.0.0.2:6789/0,v1c=10.0.0.3:6789/0}
>> election epoch 6160, quorum 0,1,2 v1a,v1b,v1c
>>   fsmap e817: 1/1/1 up {0=v1a=up:active}, 1 up:standby
>>  osdmap e76002: 26 osds: 26 up, 26 in; 185 remapped pgs
>> flags noout,noin,sortbitwise,require_jewel_osds
>>   pgmap v80995844: 3200 pgs, 4 pools, 2876 GB data, 757 kobjects
>> 9215 GB used, 35572 GB / 45365 GB avail
>> 37788/2327775 objects degraded (1.623%)
>> 23854/2327775 objects misplaced (1.025%)
>> 2912 active+clean
>>  130 active+undersized+degraded+remapped+wait_backfill
>>   97 active+recovery_wait+degraded
>>   29 active+remapped+wait_backfill
>>   12 active+recovery_wait+undersized+degraded+remapped
>>6 active+recovering+degraded
>>5 active+recovering+undersized+degraded+remapped
>>4 active+undersized+degraded+remapped+backfilling
>>4 active+recovery_wait+degraded+remapped
>>1 active+recovering+degraded+remapped
>>   client io 2026 B/s rd, 146 kB/s wr, 9 op/s rd, 21 op/s wr
>> 
>> 
>>  when I restart affected OSDs, it bumps the recovery, but then another
>> PGs get stuck.. All OSDs were restarted multiple times, none are even close 
>> to
>> nearfull, I just cant find what I'm doing wrong..
>> 
>> possibly related OSD options:
>> 
>> osd max backfills = 4
>> osd recovery max active = 15
>> debug osd = 0/0
>> osd op threads = 4
>> osd backfill scan min = 4
>> osd backfill scan max = 16
>> 
>> Any hints would be greatly appreciated
>> 
>> thanks
>> 
>> nik
>> 
>> 
>> -- 
>> -
>> Ing. Nikola CIPRICH
>> LinuxBox.cz, s.r.o.
>> 28.rijna 168, 709 00 Ostrava
>> 
>> tel.:   +420 591 166 214
>> fax:+420 596 621 273
>> mobil:  +420 777 093 799
>> www.linuxbox.cz
>> 
>> mobil servis: +420 737 238 656
>> email servis: ser...@linuxbox.cz
>> -
>> 
>
>-- 
>-
>Ing. Nikola CIPRICH
>LinuxBox.cz, s.r.o.
>28.rijna 168, 709 00 Ostrava
>
>tel.:   +420 591 166 214
>fax:+420 596 621 273
>mobil:  +420 777 093 799
>www.linuxbox.cz
>
>mobil servis: +420 737 238 656
>email servis: ser...@linuxbox.cz
>-


Re: [ceph-users] jewel - recovery keeps stalling (continues after restarting OSDs)

2017-07-28 Thread Nikola Ciprich
I forgot to add that the OSD daemons really seem to be idle: no disk
activity, no CPU usage. It just looks to me like some kind of deadlock, as
if they were waiting for each other.

And so I have been trying to get the last 1.5% of misplaced/degraded PGs
recovered for almost a week.


On Fri, Jul 28, 2017 at 10:56:02AM +0200, Nikola Ciprich wrote:
> Hi,
> 
> I'm trying to find reason for strange recovery issues I'm seeing on
> our cluster..
> 
> it's mostly idle, 4 node cluster with 26 OSDs evenly distributed
> across nodes. jewel 10.2.9
> 
> the problem is that after some disk replaces and data moves, recovery
> is progressing extremely slowly.. pgs seem to be stuck in 
> active+recovering+degraded
> state:
> 
> [root@v1d ~]# ceph -s
> cluster a5efbc87-3900-4c42-a977-8c93f7aa8c33
>  health HEALTH_WARN
> 159 pgs backfill_wait
> 4 pgs backfilling
> 259 pgs degraded
> 12 pgs recovering
> 113 pgs recovery_wait
> 215 pgs stuck degraded
> 266 pgs stuck unclean
> 140 pgs stuck undersized
> 151 pgs undersized
> recovery 37788/2327775 objects degraded (1.623%)
> recovery 23854/2327775 objects misplaced (1.025%)
> noout,noin flag(s) set
>  monmap e21: 3 mons at 
> {v1a=10.0.0.1:6789/0,v1b=10.0.0.2:6789/0,v1c=10.0.0.3:6789/0}
> election epoch 6160, quorum 0,1,2 v1a,v1b,v1c
>   fsmap e817: 1/1/1 up {0=v1a=up:active}, 1 up:standby
>  osdmap e76002: 26 osds: 26 up, 26 in; 185 remapped pgs
> flags noout,noin,sortbitwise,require_jewel_osds
>   pgmap v80995844: 3200 pgs, 4 pools, 2876 GB data, 757 kobjects
> 9215 GB used, 35572 GB / 45365 GB avail
> 37788/2327775 objects degraded (1.623%)
> 23854/2327775 objects misplaced (1.025%)
> 2912 active+clean
>  130 active+undersized+degraded+remapped+wait_backfill
>   97 active+recovery_wait+degraded
>   29 active+remapped+wait_backfill
>   12 active+recovery_wait+undersized+degraded+remapped
>6 active+recovering+degraded
>5 active+recovering+undersized+degraded+remapped
>4 active+undersized+degraded+remapped+backfilling
>4 active+recovery_wait+degraded+remapped
>1 active+recovering+degraded+remapped
>   client io 2026 B/s rd, 146 kB/s wr, 9 op/s rd, 21 op/s wr
> 
> 
>  when I restart affected OSDs, it bumps the recovery, but then another
> PGs get stuck.. All OSDs were restarted multiple times, none are even close to
> nearfull, I just cant find what I'm doing wrong..
> 
> possibly related OSD options:
> 
> osd max backfills = 4
> osd recovery max active = 15
> debug osd = 0/0
> osd op threads = 4
> osd backfill scan min = 4
> osd backfill scan max = 16
> 
> Any hints would be greatly appreciated
> 
> thanks
> 
> nik
> 
> 
> -- 
> -
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28.rijna 168, 709 00 Ostrava
> 
> tel.:   +420 591 166 214
> fax:+420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
> 
> mobil servis: +420 737 238 656
> email servis: ser...@linuxbox.cz
> -
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


[ceph-users] jewel - recovery keeps stalling (continues after restarting OSDs)

2017-07-28 Thread Nikola Ciprich
Hi,

I'm trying to find the reason for strange recovery issues I'm seeing on
our cluster.

It's a mostly idle, 4-node cluster with 26 OSDs evenly distributed
across the nodes, running jewel 10.2.9.

The problem is that after some disk replacements and data moves, recovery
progresses extremely slowly; PGs seem to be stuck in the
active+recovering+degraded state:

[root@v1d ~]# ceph -s
cluster a5efbc87-3900-4c42-a977-8c93f7aa8c33
 health HEALTH_WARN
159 pgs backfill_wait
4 pgs backfilling
259 pgs degraded
12 pgs recovering
113 pgs recovery_wait
215 pgs stuck degraded
266 pgs stuck unclean
140 pgs stuck undersized
151 pgs undersized
recovery 37788/2327775 objects degraded (1.623%)
recovery 23854/2327775 objects misplaced (1.025%)
noout,noin flag(s) set
 monmap e21: 3 mons at 
{v1a=10.0.0.1:6789/0,v1b=10.0.0.2:6789/0,v1c=10.0.0.3:6789/0}
election epoch 6160, quorum 0,1,2 v1a,v1b,v1c
  fsmap e817: 1/1/1 up {0=v1a=up:active}, 1 up:standby
 osdmap e76002: 26 osds: 26 up, 26 in; 185 remapped pgs
flags noout,noin,sortbitwise,require_jewel_osds
  pgmap v80995844: 3200 pgs, 4 pools, 2876 GB data, 757 kobjects
9215 GB used, 35572 GB / 45365 GB avail
37788/2327775 objects degraded (1.623%)
23854/2327775 objects misplaced (1.025%)
2912 active+clean
 130 active+undersized+degraded+remapped+wait_backfill
  97 active+recovery_wait+degraded
  29 active+remapped+wait_backfill
  12 active+recovery_wait+undersized+degraded+remapped
   6 active+recovering+degraded
   5 active+recovering+undersized+degraded+remapped
   4 active+undersized+degraded+remapped+backfilling
   4 active+recovery_wait+degraded+remapped
   1 active+recovering+degraded+remapped
  client io 2026 B/s rd, 146 kB/s wr, 9 op/s rd, 21 op/s wr


 When I restart the affected OSDs, it bumps the recovery, but then other
PGs get stuck. All OSDs have been restarted multiple times, and none are even
close to nearfull; I just can't find what I'm doing wrong.

possibly related OSD options:

osd max backfills = 4
osd recovery max active = 15
debug osd = 0/0
osd op threads = 4
osd backfill scan min = 4
osd backfill scan max = 16
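
(These can also be adjusted at runtime, e.g. with something like
ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 15',
in case anyone wants me to try different values.)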

Any hints would be greatly appreciated

thanks

nik


-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


[ceph-users] Bluestore wal / block db size

2017-07-28 Thread Tobias Rehn
Hey,

I am just playing around with the Luminous RC. As far as I can see it works nicely.

While reading around I found the following discussion about WAL and block DB size: 
http://marc.info/?l=ceph-devel&m=149978799900866&w=2


Creating an OSD with the following command:
ceph-deploy osd create --bluestore --block-db=/dev/sdj --block-wal=/dev/sdj osd01:/dev/sdb

creates a WAL of 576 MB and a block DB of 1 GB. In my scenario /dev/sdj is an SSD.
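
If I understand it correctly those sizes come from the defaults, and I guess
they could be overridden in ceph.conf before creating the OSD, along these
lines (option names to the best of my knowledge, values purely illustrative):

[global]
bluestore block db size  = 16106127360   # 15 GB, illustrative only
bluestore block wal size = 1073741824    # 1 GB, illustrative only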


In the discussion mentioned above it is said that BlueStore automatically
rolls RocksDB data over to the HDD when the block DB gets full, and
performance decreases.

So what are good values for the WAL and block DB? Is there any documentation
on this? I can hardly find information on this topic.


Thank you.
Tobias
