Re: [ceph-users] a big cluster or several small

2018-05-15 Thread Piotr Dałek

On 18-05-14 06:49 PM, Marc Boisis wrote:


Hi,

Hello,
Currently we have a 294 OSD (21 hosts/3 racks) cluster with RBD clients 
only, 1 single pool (size=3).


We want to divide this cluster into several to minimize the risk in case of 
failure/crash.
For example, a cluster for the mail, another for the file servers, a test 
cluster ...

Do you think it's a good idea?


If reliability and data availability are your main concerns, and you don't 
share data between clusters - yes.


Do you have any feedback from experience with multiple clusters in production on the 
same hardware:

- containers (LXD or Docker)
- multiple clusters on the same host without virtualization (with ceph-deploy 
... --cluster ...)

- multiple pools
...

Do you have any advice?


We're using containers to host OSDs, but we don't host multiple clusters on 
the same machine (in other words, a single physical machine hosts containers for 
one and the same cluster). We're using Ceph for RBD images, so having 
multiple clusters isn't a problem for us.


Our main reason for using multiple clusters is that Ceph has a bad 
reliability history when scaling up, and even now there are many unresolved 
issues (https://tracker.ceph.com/issues/21761 for example). By dividing a 
single large cluster into a few smaller ones, we reduce the impact on 
customers when things go fatally wrong - when one cluster goes down, or its 
performance drops to the level of a single ESDI drive due to recovery, the 
other clusters - and their users - are unaffected. This has already proved 
useful for us in the past.


--
Piotr Dałek
piotr.da...@corp.ovh.com
https://www.ovhcloud.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph's UID/GID 65045 in conflict with user's UID/GID in a ldap

2018-05-15 Thread John Hearns
Hello Yoann. I am working with similar issues at the moment in a biotech
company in Denmark.

First of all what authentication setup are you using?
If you are using sssd there is a very simple and useful utility called
sss_override.
You can 'override' the UID which you get from LDAP with the genuine one.

Oops. On reading your email more closely.
Why not just add ceph to your /etc/group  file?
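
Something along these lines might be enough (a sketch; the GID 64045 is taken
from the /etc/passwd output quoted further down - adjust it if that GID is
already taken in LDAP):

# create the missing local group, then re-run the failed package configuration
sudo groupadd --system --gid 64045 ceph
sudo dpkg --configure ceph-common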





On 15 May 2018 at 08:58, Yoann Moulin  wrote:

> Hello,
>
> I'm facing an issue with ceph's UID/GID 65045 on an LDAPized server: I
> have to install ceph-common to mount a CephFS filesystem, but ceph-common
> fails because a user with UID 65045 already exists, with a group also set at
> 65045.
>
> Server under Ubuntu 16.04.4 LTS
>
> > Setting up ceph-common (12.2.5-1xenial) ...
> > Adding system user ceph...done
> > Setting system user ceph properties..usermod: group 'ceph' does not exist
> > dpkg: error processing package ceph-common (--configure):
> >  subprocess installed post-installation script returned error exit
> status 6
>
> The user is created correctly, but the group is not.
>
> > # grep ceph /etc/passwd
> > ceph:x:64045:64045::/home/ceph:/bin/false
> > # grep ceph /etc/group
> > #
> Is there a workaround for that?
>
> --
> Yoann Moulin
> EPFL IC-IT
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph's UID/GID 65045 in conflict with user's UID/GID in a ldap

2018-05-15 Thread Yoann Moulin
Hello John,

> Hello Yoann. I am working with similar issues at the moment in a biotech 
> company in Denmark.
> 
> First of all what authentication setup are you using?

ldap with sssd

> If you are using sssd there is a very simple and useful utility called 
> sss_override
> You can 'override' the UID which you get from LDAP with the genuine one.

That's one of the options; I'm just asking if there are other or simpler 
solutions.

> Oops. On reading your email more closely.
> Why not just add ceph to your /etc/group  file?

I tried, but there are some side effects.

I took a look at the postinst script in ceph-common and I may have found a way to fix 
this issue:

> # Let the admin override these distro-specified defaults.  This is NOT
> # recommended!
> [ -f "/etc/default/ceph" ] && . /etc/default/ceph
> 
> [ -z "$SERVER_HOME" ] && SERVER_HOME=/var/lib/ceph
> [ -z "$SERVER_USER" ] && SERVER_USER=ceph
> [ -z "$SERVER_NAME" ] && SERVER_NAME="Ceph storage service"
> [ -z "$SERVER_GROUP" ] && SERVER_GROUP=ceph
> [ -z "$SERVER_UID" ] && SERVER_UID=64045  # alloc by Debian base-passwd 
> maintainer
> [ -z "$SERVER_GID" ] && SERVER_GID=$SERVER_UID

I can change the SERVER_UID / SERVER_GID and/or SERVER_USER.

I'm going to try to create a specific ceph user in LDAP and use it for the ceph 
install.
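
Or, if the /etc/default/ceph route turns out to be simpler, something like this
hypothetical override (values are placeholders, not recommendations):

# /etc/default/ceph -- sourced by the postinst script quoted above
SERVER_USER=ceph
SERVER_GROUP=ceph
SERVER_UID=64046   # placeholder: pick a UID free both locally and in LDAP
SERVER_GID=64046   # placeholder: matching GID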

Yoann


> On 15 May 2018 at 08:58, Yoann Moulin  > wrote:
> 
> Hello,
> 
> I'm facing an issue with ceph's UID/GID 65045 on an LDAPized server, I 
> have to install ceph-common to mount a cephfs filesystem but ceph-common
> fails because a user with uid 65045 already exist with a group also set 
> at 65045.
> 
> Server under Ubuntu 16.04.4 LTS
> 
> > Setting up ceph-common (12.2.5-1xenial) ...
> > Adding system user cephdone
> > Setting system user ceph properties..usermod: group 'ceph' does not 
> exist
> > dpkg: error processing package ceph-common (--configure):
> >  subprocess installed post-installation script returned error exit 
> status 6
> 
> The user is correctly created but the group not.
> 
> > # grep ceph /etc/passwd           
> > ceph:x:64045:64045::/home/ceph:/bin/false
> > # grep ceph /etc/group
> > #
> Is there a workaround for that?
> 
> -- 
> Yoann Moulin
> EPFL IC-IT
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 
> 
> 


-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd feature map fail

2018-05-15 Thread xiang . dai
Hi, all! 

I am using RBD and ran into the issue below: 

When I create an RBD image with the features: 
layering,exclusive-lock,object-map,fast-diff 

it fails to map: 
rbd: sysfs write failed 
RBD image feature set mismatch. Try disabling features unsupported by the 
kernel with "rbd feature disable". 
In some cases useful info is found in syslog - try "dmesg | tail". 
rbd: map failed: (6) No such device or address 

dmesg | tail: 
[960284.869596] rbd: rbd0: capacity 107374182400 features 0x5 
[960310.908615] libceph: mon1 10.0.10.12:6789 session established 
[960310.908916] libceph: client21459 fsid fe308030-ae94-471a-8d52-2c12151262fc 
[960310.911729] rbd: image foo: image uses unsupported features: 0x18 
[960337.946856] libceph: mon1 10.0.10.12:6789 session established 
[960337.947320] libceph: client21465 fsid fe308030-ae94-471a-8d52-2c12151262fc 
[960337.950116] rbd: image foo: image uses unsupported features: 0x8 
[960346.248676] libceph: mon0 10.0.10.11:6789 session established 
[960346.249077] libceph: client21866 fsid fe308030-ae94-471a-8d52-2c12151262fc 
[960346.254145] rbd: rbd0: capacity 107374182400 features 0x5 

If I create the image with only the layering feature, mapping works fine. 

*The question is here:* 

Then I enable the features: 
exclusive-lock,object-map,fast-diff 

It works. 

And rbd info shows all the features I set. 

I think it is a bug: 

Why does creating the image with those features make the map fail, while 
enabling them after creation is OK? 
I think it is more than a question of ordering. 

My OS is CentOS Linux release 7.4.1708 (Core), kernel is 3.10.0-693.el7.x86_64. 

Ceph version is 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous 
(stable) 

Thanks 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cache Tiering not flushing and evicting due to missing scrub

2018-05-15 Thread Micha Krause

Hi,

Increasing pg_num for a cache pool gives you a warning that the pool must be 
scrubbed afterwards.

It turns out that if you ignore this, flushing and evicting will not work.

You really should do something like this:

for pg in $(ceph pg dump | awk '$1 ~ "^." { print $1 }'); do ceph pg 
scrub $pg; done
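
(As rendered above, the awk pattern "^." matches every PG in the cluster - the
pool prefix looks like it was mangled by the list archive. A hedged
reconstruction that limits the scrub to one pool, with the pool id as a
placeholder:)

# scrub only the PGs of the cache pool whose pg_num was increased
POOL_ID=5   # placeholder: the cache pool's numeric id
for pg in $(ceph pg dump 2>/dev/null | awk -v p="$POOL_ID" '$1 ~ ("^" p "\\.") { print $1 }'); do
    ceph pg scrub "$pg"
done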

After just a few seconds my pool started flushing and evicting again.

Micha Krause
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] which kernel support object-map, fast-diff

2018-05-15 Thread xiang . dai
Hi, all! 

I use CentOS 7.4 and want to use Ceph RBD. 

I found that object-map and fast-diff do not work. 

rbd image 'app': 
size 500 GB in 128000 objects 
order 22 (4096 kB objects) 
block_name_prefix: rbd_data.10a2643c9869 
format: 2 
features: layering, exclusive-lock, object-map, fast-diff <=== 
flags: object map invalid, fast diff invalid 

Ceph version is 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous 
(stable) 
Kernel: 3.10.0-693.el7.x86_64 

So which kernel version supports those features? 

I cannot find the answer in the Ceph docs. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which kernel support object-map, fast-diff

2018-05-15 Thread Konstantin Shalygin

So which kernel version supports those features?



No kernel supports these features yet.
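
A workaround until then is to create the image with only the features this
kernel's rbd client can map - layering and exclusive-lock mapped fine in the
"rbd feature map fail" thread on this list. A sketch (pool, image name and size
are placeholders):

# image limited to kernel-mappable features
rbd create mypool/app --size 500G --image-feature layering,exclusive-lock
rbd map mypool/app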



k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which kernel support object-map, fast-diff

2018-05-15 Thread xiang....@sky-data.cn
Could you give a list of which features are supported and which are not?

- Original Message -
From: "Konstantin Shalygin" 
To: "ceph-users" 
Cc: "xiang dai" 
Sent: Tuesday, May 15, 2018 4:57:00 PM
Subject: Re: [ceph-users] which kernel support object-map, fast-diff

> So which kernel version support those feature?


No kernel supports these features yet.



k
-- 
Dai Xiang 
Nanjing Sky-Data Information Technology Co., Ltd. 
Phone: +86 1 3382776490 
Company website: www.sky-data.cn 
Try the Sky-Data SkyDiscovery intelligent computing platform for free
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs

2018-05-15 Thread Wido den Hollander


On 05/14/2018 04:46 PM, Nick Fisk wrote:
> Hi Wido,
> 
> Are you trying this setting?
> 
> /sys/devices/system/cpu/intel_pstate/min_perf_pct
> 

Yes, but that doesn't help. I can set it to 80, 100 or any value I like;
the CPUs keep clocking down to 800 MHz.

At first I was having some issues with getting intel_pstate loaded, but
with 4.16 it loaded without any problems, but still, CPUs keep clocking
down.

Wido

> 
> 
> -Original Message-
> From: ceph-users  On Behalf Of Wido den
> Hollander
> Sent: 14 May 2018 14:14
> To: n...@fisk.me.uk; 'Blair Bethwaite' 
> Cc: 'ceph-users' 
> Subject: Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on
> NVMe/SSD Ceph OSDs
> 
> 
> 
> On 05/01/2018 10:19 PM, Nick Fisk wrote:
>> 4.16 required?
>> https://www.phoronix.com/scan.php?page=news_item&px=Skylake-X-P-State-
>> Linux-
>> 4.16
>>
> 
> I've been trying with the 4.16 kernel for the last few days, but still, it's
> not working.
> 
> The CPU's keep clocking down to 800Mhz
> 
> I've set scaling_min_freq=scaling_max_freq in /sys, but that doesn't change
> a thing. The CPUs keep scaling down.
> 
> Still not close to the 1ms latency with these CPUs :(
> 
> Wido
> 
>>
>> -Original Message-
>> From: ceph-users  On Behalf Of 
>> Blair Bethwaite
>> Sent: 01 May 2018 16:46
>> To: Wido den Hollander 
>> Cc: ceph-users ; Nick Fisk 
>> 
>> Subject: Re: [ceph-users] Intel Xeon Scalable and CPU frequency 
>> scaling on NVMe/SSD Ceph OSDs
>>
>> Also curious about this over here. We've got a rack's worth of R740XDs 
>> with Xeon 4114's running RHEL 7.4 and intel-pstate isn't even active 
>> on them, though I don't believe they are any different at the OS level 
>> to our Broadwell nodes (where it is loaded).
>>
>> Have you tried poking the kernel's pmqos interface for your use-case?
>>
>> On 2 May 2018 at 01:07, Wido den Hollander  wrote:
>>> Hi,
>>>
>>> I've been trying to get the lowest latency possible out of the new 
>>> Xeon Scalable CPUs and so far I got down to 1.3ms with the help of Nick.
>>>
>>> However, I can't seem to pin the CPUs to always run at their maximum 
>>> frequency.
>>>
>>> If I disable power saving in the BIOS they stay at 2.1Ghz (Silver 
>>> 4110), but that disables the boost.
>>>
>>> With the Power Saving enabled in the BIOS and when giving the OS all 
>>> control for some reason the CPUs keep scaling down.
>>>
>>> $ echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>>
>>> cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009 Report 
>>> errors and bugs to cpuf...@vger.kernel.org, please.
>>> analyzing CPU 0:
>>>   driver: intel_pstate
>>>   CPUs which run at the same hardware frequency: 0
>>>   CPUs which need to have their frequency coordinated by software: 0
>>>   maximum transition latency: 0.97 ms.
>>>   hardware limits: 800 MHz - 3.00 GHz
>>>   available cpufreq governors: performance, powersave
>>>   current policy: frequency should be within 800 MHz and 3.00 GHz.
>>>   The governor "performance" may decide which speed to
> use
>>>   within this range.
>>>   current CPU frequency is 800 MHz.
>>>
>>> I do see the CPUs scale up to 2.1Ghz, but they quickly scale down 
>>> again to 800Mhz and that hurts latency. (50% difference!)
>>>
>>> With the CPUs scaling down to 800Mhz my latency jumps from 1.3ms to 
>>> 2.4ms on avg. With turbo enabled I hope to get down to 1.1~1.2ms on avg.
>>>
>>> $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>>> performance
>>>
>>> Everything seems to be OK and I would expect the CPUs to stay at 
>>> 2.10Ghz, but they aren't.
>>>
>>> C-States are also pinned to 0 as a boot parameter for the kernel:
>>>
>>> processor.max_cstate=1 intel_idle.max_cstate=0
>>>
>>> Running Ubuntu 16.04.4 with the 4.13 kernel from the HWE from Ubuntu.
>>>
>>> Has anybody tried this yet with the recent Intel Xeon Scalable CPUs?
>>>
>>> Thanks,
>>>
>>> Wido
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>> --
>> Cheers,
>> ~Blairo
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Luminous - OSD constantly crashing caused by corrupted placement group

2018-05-15 Thread Siegfried Höllrigl



Hi !

We have upgraded our Ceph cluster (3 Mon Servers, 9 OSD Servers, 190 
OSDs total) From 10.2.10 to Ceph 12.2.4 and then to 12.2.5.
(A mixture of Ubuntu 14 and 16 with the Repos from 
https://download.ceph.com/debian-luminous/)


Now we have the problem that one OSD is crashing again and again 
(approx. once per day); systemd restarts it.


We could now probably identify the problem. It looks like one placement 
group (5.9b) causes the crash.
It seems like it doesn't matter whether it is running on a filestore or a 
bluestore OSD.

We could even break it down to some RBDs that were in this pool.
They are already deleted, but it looks like there are some objects left on 
the OSD, and we can't delete them:



rados -p rbd ls > radosrbdls.txt
cat radosrbdls.txt | grep -vE "($(rados -p rbd ls | grep rbd_header | 
grep -o "\.[0-9a-f]*" | sed -e :a -e '$!N; s/\n/|/; ta' -e 
's/\./\\./g'))" | grep -E '(rbd_data|journal|rbd_object_map)'

rbd_data.112913b238e1f29.0e3f
rbd_data.112913b238e1f29.09d2
rbd_data.112913b238e1f29.0ba3

rados -p rbd rm rbd_data.112913b238e1f29.0e3f
error removing rbd>rbd_data.112913b238e1f29.0e3f: (2) No 
such file or directory

rados -p rbd rm rbd_data.112913b238e1f29.09d2
error removing rbd>rbd_data.112913b238e1f29.09d2: (2) No 
such file or directory

rados -p rbd rm rbd_data.112913b238e1f29.0ba3
error removing rbd>rbd_data.112913b238e1f29.0ba3: (2) No 
such file or directory


In the "current" directory of the OSD there are a lot more files with 
this RBD prefix.
Is there any chance to delete this obviously orphaned stuff before the 
PG becomes healthy?

(The PG is currently active on only 2 of 3 OSDs.)

What else could cause such a crash?


We attach (hopefully all of) the relevant logs.



  -103> 2018-05-14 13:01:50.514850 7f389894c700  5 -- 10.7.2.141:6801/139719 >> 
10.7.2.49:0/2866 conn(0x55a13fd0d000 :6801 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=453 cs=1 l=1). rx osd.60 seq 
2720 0x55a13e7bac00 osd_ping(ping e502962 stamp 2018-05-14 13:01:50.511610) v4
  -102> 2018-05-14 13:01:50.514878 7f389894c700  1 -- 10.7.2.141:6801/139719 
<== osd.60 10.7.2.49:0/2866 2720  osd_ping(ping e502962 stamp 2018-05-14 
13:01:50.511610) v4  2004+0+0 (1134770966 0 0) 0x55a13e7bac00 con 
0x55a13fd0d000
  -101> 2018-05-14 13:01:50.514896 7f389894c700  1 -- 10.7.2.141:6801/139719 
--> 10.7.2.49:0/2866 -- osd_ping(ping_reply e502962 stamp 2018-05-14 
13:01:50.511610) v4 -- 0x55a13fd27200 con 0
  -100> 2018-05-14 13:01:50.525876 7f389894c700  5 -- 10.7.2.141:6801/139719 >> 
10.7.2.144:0/2988 conn(0x55a13f2dd000 :6801 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=865 cs=1 l=1). rx osd.179 seq 
2652 0x55a13e442600 osd_ping(ping e502962 stamp 2018-05-14 13:01:50.531899) v4
   -99> 2018-05-14 13:01:50.525902 7f389894c700  1 -- 10.7.2.141:6801/139719 
<== osd.179 10.7.2.144:0/2988 2652  osd_ping(ping e502962 stamp 2018-05-14 
13:01:50.531899) v4  2004+0+0 (3454691771 0 0) 0x55a13e442600 con 
0x55a13f2dd000
   -98> 2018-05-14 13:01:50.525917 7f389894c700  1 -- 10.7.2.141:6801/139719 
--> 10.7.2.144:0/2988 -- osd_ping(ping_reply e502962 stamp 2018-05-14 
13:01:50.531899) v4 -- 0x55a13fd27200 con 0
   -97> 2018-05-14 13:01:50.526649 7f389914d700  5 -- 10.0.0.28:6801/139719 >> 
10.0.0.24:0/2988 conn(0x55a13f2de800 :6801 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=869 cs=1 l=1). rx osd.179 seq 
2652 0x55a17bd8a200 osd_ping(ping e502962 stamp 2018-05-14 13:01:50.531899) v4
   -96> 2018-05-14 13:01:50.526675 7f389914d700  1 -- 10.0.0.28:6801/139719 <== 
osd.179 10.0.0.24:0/2988 2652  osd_ping(ping e502962 stamp 2018-05-14 
13:01:50.531899) v4  2004+0+0 (3454691771 0 0) 0x55a17bd8a200 con 
0x55a13f2de800
   -95> 2018-05-14 13:01:50.526688 7f389914d700  1 -- 10.0.0.28:6801/139719 --> 
10.0.0.24:0/2988 -- osd_ping(ping_reply e502962 stamp 2018-05-14 
13:01:50.531899) v4 -- 0x55a13e43ec00 con 0
   -94> 2018-05-14 13:01:50.546508 7f389994e700  5 -- 10.7.2.141:6800/139719 >> 
10.7.2.50:6802/2519 conn(0x55a13e724000 :-1 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=18716 cs=1 l=0). rx osd.47 
seq 4894 0x55a13ec9d000 MOSDScrubReserve(3.111 REQUEST e502962) v1
   -93> 2018-05-14 13:01:50.546537 7f389994e700  1 -- 10.7.2.141:6800/139719 
<== osd.47 10.7.2.50:6802/2519 4894  MOSDScrubReserve(3.111 REQUEST 
e502962) v1  43+0+0 (327031511 0 0) 0x55a13ec9d000 con 0x55a13e724000
   -92> 2018-05-14 13:01:50.546655 7f3883138700  1 -- 10.7.2.141:6800/139719 
--> 10.7.2.50:6802/2519 -- MOSDScrubReserve(3.111 REJECT e502962) v1 -- 
0x55a13e8fd200 con 0
   -91> 2018-05-14 13:01:50.547685 7f389994e700  5 -- 10.7.2.141:6800/139719 >> 
10.7.2.50:6802/2519 conn(0x55a13e724000 :-1 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=18716 cs=1 l=0). rx osd.47 
seq 4895 0x55a13e8fd200 MOSDScrubReserve(3.111 RELEASE e502962) v1
   -90> 2018-05-14 13:01:50.547714 7f389994e700  1 -- 10.7

Re: [ceph-users] rbd feature map fail

2018-05-15 Thread Jason Dillaman
I believe this is documented by this tracker ticket [1].

[1] http://tracker.ceph.com/issues/11418

On Tue, May 15, 2018 at 1:07 AM,   wrote:
> Hi, all!
>
> I use rbd to do something and find below issue:
>
> when i create a rbd image with feature:
> layering,exclusive-lock,object-map,fast-diff
>
> failed to map:
> rbd: sysfs write failed
> RBD image feature set mismatch. Try disabling features unsupported by the
> kernel with "rbd feature disable".
> In some cases useful info is found in syslog - try "dmesg | tail".
> rbd: map failed: (6) No such device or address
>
> dmesg | tail:
> [960284.869596] rbd: rbd0: capacity 107374182400 features 0x5
> [960310.908615] libceph: mon1 10.0.10.12:6789 session established
> [960310.908916] libceph: client21459 fsid
> fe308030-ae94-471a-8d52-2c12151262fc
> [960310.911729] rbd: image foo: image uses unsupported features: 0x18
> [960337.946856] libceph: mon1 10.0.10.12:6789 session established
> [960337.947320] libceph: client21465 fsid
> fe308030-ae94-471a-8d52-2c12151262fc
> [960337.950116] rbd: image foo: image uses unsupported features: 0x8
> [960346.248676] libceph: mon0 10.0.10.11:6789 session established
> [960346.249077] libceph: client21866 fsid
> fe308030-ae94-471a-8d52-2c12151262fc
> [960346.254145] rbd: rbd0: capacity 107374182400 features 0x5
>
> If i just create layering image, map is ok.
>
> *The question is here:*
>
> Then i enable feature:
> exclusive-lock,object-map,fast-diff
>
> It works.
>
> And rbd info shows all feature i set.
>
> I think it is a bug:
>
> why create with those feature then map failed but map after create is ok?
> I think it is more than order question.
>
> My OS is CentOS Linux release 7.4.1708 (Core), kernel is
> 3.10.0-693.el7.x86_64.
>
> Ceph version is 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous
> (stable)
>
> Thanks
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd feature map fail

2018-05-15 Thread Ilya Dryomov
On Tue, May 15, 2018 at 10:07 AM,   wrote:
> Hi, all!
>
> I use rbd to do something and find below issue:
>
> when i create a rbd image with feature:
> layering,exclusive-lock,object-map,fast-diff
>
> failed to map:
> rbd: sysfs write failed
> RBD image feature set mismatch. Try disabling features unsupported by the
> kernel with "rbd feature disable".
> In some cases useful info is found in syslog - try "dmesg | tail".
> rbd: map failed: (6) No such device or address
>
> dmesg | tail:
> [960284.869596] rbd: rbd0: capacity 107374182400 features 0x5
> [960310.908615] libceph: mon1 10.0.10.12:6789 session established
> [960310.908916] libceph: client21459 fsid
> fe308030-ae94-471a-8d52-2c12151262fc
> [960310.911729] rbd: image foo: image uses unsupported features: 0x18
> [960337.946856] libceph: mon1 10.0.10.12:6789 session established
> [960337.947320] libceph: client21465 fsid
> fe308030-ae94-471a-8d52-2c12151262fc
> [960337.950116] rbd: image foo: image uses unsupported features: 0x8
> [960346.248676] libceph: mon0 10.0.10.11:6789 session established
> [960346.249077] libceph: client21866 fsid
> fe308030-ae94-471a-8d52-2c12151262fc
> [960346.254145] rbd: rbd0: capacity 107374182400 features 0x5
>
> If i just create layering image, map is ok.
>
> *The question is here:*
>
> Then i enable feature:
> exclusive-lock,object-map,fast-diff
>
> It works.
>
> And rbd info shows all feature i set.
>
> I think it is a bug:
>
> why create with those feature then map failed but map after create is ok?

Yes, it is a bug.  There is a patch pending from Dongsheng, so it will
be fixed in 4.18.

If you are told that these features are unsupported, you shouldn't be
looking for backdoor ways to enable them ;)
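
The supported route is what the error message itself suggests: turn the
unsupported features off before mapping. A rough sketch, assuming the default
'rbd' pool (the image name "foo" is taken from the dmesg output quoted above):

# disable the features the kernel rejected (object-map 0x8, fast-diff 0x10),
# then map with what is left (layering + exclusive-lock)
rbd feature disable foo fast-diff object-map
rbd map foo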

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs

2018-05-15 Thread Blair Bethwaite
Sorry, bit late to get back to this...

On Wed., 2 May 2018, 06:19 Nick Fisk,  wrote:

> 4.16 required?
>

Looks like it - thanks for pointing that out.

Wido, I don't think you are doing anything wrong here, maybe this is a
bug...

I've got RHEL7 + Broadwell based Ceph nodes here for which the same tuning
appears to be working fine:

-bash-4.2$ lsb_release -a
LSB Version::core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description:Red Hat Enterprise Linux Server release 7.3 (Maipo)
Release:7.3
Codename:   Maipo

-bash-4.2$ lscpu
Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):20
On-line CPU(s) list:   0-19
Thread(s) per core:2
Core(s) per socket:10
Socket(s): 1
NUMA node(s):  1
Vendor ID: GenuineIntel
CPU family:6
Model: 79
Model name:Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
Stepping:  1
CPU MHz:   2745.960
BogoMIPS:  4399.83
Virtualization:VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache:  256K
L3 cache:  25600K
NUMA node0 CPU(s): 0-19

-bash-4.2$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-514.2.2.el7.x86_64
root=/dev/mapper/vg00-LogVol00 ro nofb splash=quiet crashkernel=auto
rd.lvm.lv=vg00/LogVol00 rd.lvm.lv=vg00/LogVol01 rhgb quiet LANG=en_US.UTF-8

-bash-4.2$ sudo cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 1.20 GHz - 3.10 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 1.20 GHz and 3.10 GHz.
  The governor "performance" may decide which speed to use
  within this range.
  current CPU frequency: 2.40 GHz (asserted by call to hardware)
  boost state support:
Supported: yes
Active: yes

-bash-4.2$ sudo cpupower -c 0-19 monitor
|Nehalem|| Mperf  || Idle_Stats
CPU | C3   | C6   | PC3  | PC6  || C0   | Cx   | Freq || POLL | C1-B | C1E-
| C3-B | C6-B
   0|  0.00|  0.00|  0.00|  0.00|| 20.93| 79.07|  2398||  1.00| 79.08|
0.00|  0.00|  0.00
  10|  0.00|  0.00|  0.00|  0.00||  1.81| 98.19|  2398||  0.00| 98.23|
0.00|  0.00|  0.00
   1|  0.00|  0.00|  0.00|  0.00||  3.80| 96.20|  2398||  2.10| 96.21|
0.00|  0.00|  0.00
  11|  0.00|  0.00|  0.00|  0.00||  7.95| 92.05|  2398||  7.59| 92.06|
0.00|  0.00|  0.00
   2|  0.00|  0.00|  0.00|  0.00||  1.99| 98.01|  2398||  0.00| 98.04|
0.00|  0.00|  0.00
  12|  0.00|  0.00|  0.00|  0.00||  1.59| 98.41|  2398||  0.64| 98.42|
0.00|  0.00|  0.00
   3|  0.00|  0.00|  0.00|  0.00|| 24.58| 75.42|  2398||  0.00| 75.43|
0.00|  0.00|  0.00
  13|  0.00|  0.00|  0.00|  0.00||  1.66| 98.34|  2399||  0.24| 98.35|
0.00|  0.00|  0.00
   4|  0.00|  0.00|  0.00|  0.00||  1.36| 98.64|  2398||  0.00| 98.65|
0.00|  0.00|  0.00
  14|  0.00|  0.00|  0.00|  0.00||  1.95| 98.05|  2398||  0.77| 98.06|
0.00|  0.00|  0.00
   5|  0.00|  0.00|  0.00|  0.00||  1.39| 98.61|  2398||  0.00| 98.64|
0.00|  0.00|  0.00
  15|  0.00|  0.00|  0.00|  0.00||  8.33| 91.67|  2398||  7.80| 91.68|
0.00|  0.00|  0.00
   6|  0.00|  0.00|  0.00|  0.00||  1.48| 98.52|  2398||  0.00| 98.54|
0.00|  0.00|  0.00
  16|  0.00|  0.00|  0.00|  0.00||  2.44| 97.56|  2398||  1.73| 97.57|
0.00|  0.00|  0.00
   7|  0.00|  0.00|  0.00|  0.00||  2.13| 97.87|  2398||  0.64| 97.88|
0.00|  0.00|  0.00
  17|  0.00|  0.00|  0.00|  0.00||  1.03| 98.97|  2398||  0.24| 98.93|
0.00|  0.00|  0.00
   8|  0.00|  0.00|  0.00|  0.00||  1.43| 98.57|  2398||  0.00| 98.61|
0.00|  0.00|  0.00
  18|  0.00|  0.00|  0.00|  0.00||  1.58| 98.42|  2398||  0.00| 98.45|
0.00|  0.00|  0.00
   9|  0.00|  0.00|  0.00|  0.00||  1.66| 98.34|  2398||  0.00| 98.35|
0.00|  0.00|  0.00
  19|  0.00|  0.00|  0.00|  0.00||  1.04| 98.96|  2398||  0.00| 98.93|
0.00|  0.00|  0.00

-bash-4.2$ sudo /opt/dell/srvadmin/bin/omreport chassis biossetup | egrep
-i "c state|turbo"
Dell Controlled Turbo   : Disabled
Turbo Boost : Enabled
Energy Efficient Turbo  : Disabled
C States: Disabled
Number of Turbo Boost Enabled Cores for Processor 1 : All

-bash-4.2$ sudo tail /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
| grep ^2
2399976
2399890
2399976
2399976
2399976
2399804
2399976
2399976
2400062
2399976
2399976
2399890
2399976
2400062
2399976
2399976
2399804
2399890
2399976
2399890

We didn't manage to get this level of consistency until we
used /dev/cpu_dma_latency (see
https://www.kernel.org/doc/Documentation/power/pm_qos_interface.txt) via
tuned:

-bash-4.2$ sudo tuned-adm active
Current activ
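
(Aside: the same /dev/cpu_dma_latency hold can be sketched without tuned -
this is an assumption about what the profile does for us, not our exact setup:)

# rough sketch (run as root): request 0us CPU DMA latency for as long as
# this shell keeps fd 3 open; the kernel drops the request on close
exec 3> /dev/cpu_dma_latency
printf '0x00000000' >&3      # 10-char hex string form accepted by the pmqos node
sleep infinity               # keep the fd, and therefore the request, alive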

Re: [ceph-users] Cephfs write fail when node goes down

2018-05-15 Thread Josef Zelenka
Client's kernel is 4.4.0. Regarding the hung OSD request, I'll have to 
check; the issue is gone now, so I'm not sure I'll find what you are 
suggesting. It's rather odd, because Ceph's failover has worked for us every 
time, so I'm trying to figure out whether it is a Ceph or an application issue.
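
Next time it happens, a quick check along these lines should show whether any
requests are stuck (a sketch of the check Zheng suggests in the quoted mail
below):

# run as root; requires debugfs mounted at /sys/kernel/debug
for d in /sys/kernel/debug/ceph/*/; do
    echo "== $d"
    cat "$d/osdc"    # non-empty output means requests are stuck on an OSD
done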



On 15/05/18 02:57, Yan, Zheng wrote:

On Mon, May 14, 2018 at 5:37 PM, Josef Zelenka
 wrote:

Hi everyone, we've encountered an unusual thing in our setup(4 nodes, 48
OSDs, 3 monitors - ceph Jewel, Ubuntu 16.04 with kernel 4.4.0). Yesterday,
we were doing a HW upgrade of the nodes, so they went down one by one - the
cluster was in good shape during the upgrade, as we've done this numerous
times and we're quite sure that the redundancy wasn't screwed up while doing
this. However, during this upgrade one of the clients that does backups to
cephfs(mounted via the kernel driver) failed to write the backup file
correctly to the cluster with the following trace after we turned off one of
the nodes:

[2585732.529412]  8800baa279a8 813fb2df 880236230e00
8802339c
[2585732.529414]  8800baa28000 88023fc96e00 7fff
8800baa27b20
[2585732.529415]  81840ed0 8800baa279c0 818406d5

[2585732.529417] Call Trace:
[2585732.529505]  [] ? cpumask_next_and+0x2f/0x40
[2585732.529558]  [] ? bit_wait+0x60/0x60
[2585732.529560]  [] schedule+0x35/0x80
[2585732.529562]  [] schedule_timeout+0x1b5/0x270
[2585732.529607]  [] ? kvm_clock_get_cycles+0x1e/0x20
[2585732.529609]  [] ? bit_wait+0x60/0x60
[2585732.529611]  [] io_schedule_timeout+0xa4/0x110
[2585732.529613]  [] bit_wait_io+0x1b/0x70
[2585732.529614]  [] __wait_on_bit_lock+0x4e/0xb0
[2585732.529652]  [] __lock_page+0xbb/0xe0
[2585732.529674]  [] ? autoremove_wake_function+0x40/0x40
[2585732.529676]  [] pagecache_get_page+0x17d/0x1c0
[2585732.529730]  [] ? ceph_pool_perm_check+0x48/0x700
[ceph]
[2585732.529732]  [] grab_cache_page_write_begin+0x26/0x40
[2585732.529738]  [] ceph_write_begin+0x48/0xe0 [ceph]
[2585732.529739]  [] generic_perform_write+0xce/0x1c0
[2585732.529763]  [] ? file_update_time+0xc9/0x110
[2585732.529769]  [] ceph_write_iter+0xf89/0x1040 [ceph]
[2585732.529792]  [] ? __alloc_pages_nodemask+0x159/0x2a0
[2585732.529808]  [] new_sync_write+0x9b/0xe0
[2585732.529811]  [] __vfs_write+0x26/0x40
[2585732.529812]  [] vfs_write+0xa9/0x1a0
[2585732.529814]  [] SyS_write+0x55/0xc0
[2585732.529817]  [] entry_SYSCALL_64_fastpath+0x16/0x71



is there any hang osd request in /sys/kernel/debug/ceph//osdc?


I have encountered this behavior on Luminous, but not on Jewel. Anyone who
has a clue why the write fails? As far as i'm concerned, it should always
work if all the PGs are available. Thanks
Josef

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD bench read performance vs rados bench

2018-05-15 Thread Jorge Pinilla López
rbd bench --io-type read 2tb/test --io-size 4M
bench  type read io_size 4194304 io_threads 16 bytes 1073741824 pattern 
sequential
  SEC   OPS   OPS/SEC   BYTES/SEC
    1    23     36.13  151560621.45
    2    43     28.61  119988170.65
    3    54     23.02   96555723.10
    4    76     22.35   93748581.57
    5    86     20.31   85202745.83
6   102 15.73  65987931.41
7   113 13.72  57564529.85
8   115 12.13  50895409.80
9   138 12.62  52950797.01
   10   144 11.37  47688526.04
   11   154  9.59  40232628.73
   12   161  9.51  39882023.45
   13   167 10.30  43195718.39
   14   172  6.57  27570654.19
   15   181  7.21  30224357.89
   16   186  7.08  29692318.46
   17   192  6.31  26457629.12
   18   197  6.03  25286212.14
   19   202  6.22  26097739.41
   20   210  5.82  24406336.22
   21   217  6.05  25354976.24
   22   224  6.15  25785754.73
   23   231  6.84  28684892.86
   24   237  6.86  28760546.77
elapsed:26  ops:  256  ops/sec: 9.58  bytes/sec: 40195235.45


rados -p 2tb bench 10 seq
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0      0         0         0         0         0           -           0
    1     16        58        42   167.965       168    0.164905     0.25338
    2     16        97        81   161.969       156   0.0317369    0.315989
    3     16       135       119    158.64       152    0.133847    0.349598
    4     16       180       164   163.975       180   0.0511805    0.354751
    5     16       229       213   170.375       196    0.245727    0.342972
    6     16       276       260   173.268       188    0.032029    0.344167
    7     16       326       310   177.082       200    0.489663    0.336684
    8     16       376       360   179.944       200   0.0458536    0.330955
    9     16       422       406   180.391       184    0.247551    0.336771
   10     16       472       456   182.349       200     1.28901    0.334343
Total time run:   10.522668
Total reads made: 473
Read size:4194304
Object size:  4194304
Bandwidth (MB/sec):   179.802
Average IOPS: 44
Stddev IOPS:  4
Max IOPS: 50
Min IOPS: 38
Average Latency(s):   0.350895
Max latency(s):   1.61122
Min latency(s):   0.0317369

rados bench -p 2tb 10 rand
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0      0         0         0         0         0           -           0
    1     15       127       112   447.903       448    0.109742    0.104891
    2     16       240       224   447.897       448    0.334096    0.131679
    3     15       327       312   415.918       352    0.109333    0.146387
    4     15       479       464   463.913       608    0.179371    0.133533
    5     15       640       625   499.912       644   0.0589528    0.124145
    6     15       808       793   528.576       672    0.148173    0.117483
    7     16       975       959   547.909       664   0.0119322    0.112975
    8     15      1129      1114    556.91       620     0.13646    0.111279
    9     15      1294      1279   568.353       660   0.0820129    0.109182
   10     15      1456      1441   576.307       648     0.11007    0.107887
Total time run:   10.106389
Total reads made: 1457
Read size:4194304
Object size:  4194304
Bandwidth (MB/sec):   576.665
Average IOPS: 144
Stddev IOPS:  28
Max IOPS: 168
Min IOPS: 88
Average Latency(s):   0.108051
Max latency(s):   0.998451
Min latency(s):   0.00858933


Total time run:   3.478728
Total reads made: 582
Read size:4194304
Object size:  4194304
Bandwidth (MB/sec):   669.21
Average IOPS: 167
Stddev IOPS:  6
Max IOPS: 176
Min IOPS: 163
Average Latency(s):   0.0919296
Max latency(s):   0.297326
Min latency(s):   0.0090395

Just to give some context: I have a 3-node cluster, replica 3, min_size 2.
There are only 3 OSDs in the pool, one on each node (for the benchmark).
All nodes are connected through 4x10Gbps (2 for the public network and 2 for 
the private network).
There are no other clients running.
The configuration is the default.
The image is 20GB, the disks are 2TB, and there are 125 PGs in the pool.


I wonder why there is such a huge difference between the RBD sequential benchmark 
with 4M I/O size and 16 threads and the rados sequential benchmark with the same 
object size. The rados benchmark makes a lot of sense when you can read from 
multiple OSDs simultaneously, but RBD read performance is really bad.

On writes, both rbd and rados reach similar speeds.

Any advice?

Another question: why are random reads faster than sequential reads?

Thanks a lot.
Jorge Pinilla López

[ceph-users] RBD imagen-level permissions

2018-05-15 Thread Jorge Pinilla López
Hey, I would like to know if there is any way on Luminous to set 
image-level permissions per user instead of pool-level. If I only have 
pool-level permissions, then I either have one unsecured pool with clients 
accessing any RBD, or hundreds of little pools, which are a mess.

I have read that previously some people used object_prefix to allow the 
user to read and write only the image's objects; is that still possible?

In the official master documentation about user permissions, namespaces 
are mentioned but not object_prefix. I have also seen that namespaces for 
RBD are a future feature; what is the current status of that feature? Is 
there any release date or version?

Until the namespaces feature is implemented for RBD, I would like to know if 
there is any workaround to achieve the same functionality.

Thanks
Jorge Pinilla López


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Single ceph cluster for the object storage service of 2 OpenStack clouds

2018-05-15 Thread Massimo Sgaravatto
Hi

I have been using for a while a single ceph cluster for the image and block
storage services of two Openstack clouds.

Now I want to use this ceph cluster also for the object storage services of
the two OpenStack clouds and I want to implement that having a clear
separation between the two clouds. In particular I want different ceph
pools for the two Clouds.

My understanding is that this can be done:

- creating 2 realms (one for each cloud)
- creating one zonegroup for each realm
- creating one zone for each zonegroup
- having 1 or more rgw instances for each zone

Did I get it right ?

Thanks, Massimo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD imagen-level permissions

2018-05-15 Thread Jason Dillaman
On Tue, May 15, 2018 at 6:27 AM, Jorge Pinilla López  wrote:
> Hey, I would like to know if there is any way on luminous to set
> imagen-level permissions per user instead of pool-level. If I only have
> pool level, then I could have 1 not-secured pool with clients accession
> any rbd or hundreds of little pools which are a mess.

If you search the mailing list, there are some examples of per-image
caps where a user is only granted access to "rbd_header.XYZ",
"rbd_data.XYZ", and "rbd_id.IMAGENAME" objects using the object_prefix
restriction (requires v2 image format -- as you should already be
using). It's not really a scalable solution given the manual nature of
generating the caps and the linear search nature in which objects are
validated against a user's caps.
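
A sketch of what such caps look like (XYZ is the image's internal id and
IMAGENAME the image name, both placeholders; the 'rbd' pool and the exact
access bits are assumptions):

ceph auth get-or-create client.imageuser mon 'allow r' \
  osd 'allow rwx pool=rbd object_prefix rbd_header.XYZ, allow rwx pool=rbd object_prefix rbd_data.XYZ, allow r pool=rbd object_prefix rbd_id.IMAGENAME'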

> I have read than previously some people used object_prefix to allow the
> user only to read and write the imagen objects, is that still possible?
>
> On the official master documentation about users permissions, namespaces
> are mention but not object_prefix, I have also seen that namespaces on
> rbd is a future feature, what is the current status of the feature?, is
> there any release date or version?
>
> Until namespaces feature is implemented on rbd, I would like to know if
> there is any work-around to achive the same functionality.

Adding support for namespaces to librbd/krbd is currently one of our
high-priority items for the next release (Nautilus).

> Thanks
> Jorge Pinilla López
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD bench read performance vs rados bench

2018-05-15 Thread Jason Dillaman
On Tue, May 15, 2018 at 6:23 AM, Jorge Pinilla López  wrote:
> rbd bench --io-type read 2tb/test --io-size 4M
> bench  type read io_size 4194304 io_threads 16 bytes 1073741824 pattern
> sequential
>   SEC   OPS   OPS/SEC   BYTES/SEC
> 123 36.13  151560621.45
> 243 28.61  119988170.65
> 354 23.02  96555723.10
> 476 22.35  93748581.57
> 586 20.31  85202745.83
> 6   102 15.73  65987931.41
> 7   113 13.72  57564529.85
> 8   115 12.13  50895409.80
> 9   138 12.62  52950797.01
>10   144 11.37  47688526.04
>11   154  9.59  40232628.73
>12   161  9.51  39882023.45
>13   167 10.30  43195718.39
>14   172  6.57  27570654.19
>15   181  7.21  30224357.89
>16   186  7.08  29692318.46
>17   192  6.31  26457629.12
>18   197  6.03  25286212.14
>19   202  6.22  26097739.41
>20   210  5.82  24406336.22
>21   217  6.05  25354976.24
>22   224  6.15  25785754.73
>23   231  6.84  28684892.86
>24   237  6.86  28760546.77
> elapsed:26  ops:  256  ops/sec: 9.58  bytes/sec: 40195235.45

What are your results if you re-run with the in-memory cache disabled
(i.e. 'rbd bench --rbd-cache=false ')?
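
For example (just combining the flags from your run above; a sketch):

rbd bench --io-type read --io-size 4M --io-threads 16 --rbd-cache=false 2tb/test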

> rados -p 2tb bench 10 seq
> hints = 1
>   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
> 0   0 0 0 0 0   -   0
> 1  165842   167.965   1680.164905 0.25338
> 2  169781   161.969   156   0.03173690.315989
> 3  16   135   119158.64   1520.1338470.349598
> 4  16   180   164   163.975   180   0.05118050.354751
> 5  16   229   213   170.375   1960.2457270.342972
> 6  16   276   260   173.268   1880.0320290.344167
> 7  16   326   310   177.082   2000.4896630.336684
> 8  16   376   360   179.944   200   0.04585360.330955
> 9  16   422   406   180.391   1840.2475510.336771
>10  16   472   456   182.349   200 1.289010.334343
> Total time run:   10.522668
> Total reads made: 473
> Read size:4194304
> Object size:  4194304
> Bandwidth (MB/sec):   179.802
> Average IOPS: 44
> Stddev IOPS:  4
> Max IOPS: 50
> Min IOPS: 38
> Average Latency(s):   0.350895
> Max latency(s):   1.61122
> Min latency(s):   0.0317369
>
> rados bench -p 2tb 10 rand
> hints = 1
>   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
> 0   0 0 0 0 0   -   0
> 1  15   127   112   447.903   4480.1097420.104891
> 2  16   240   224   447.897   4480.3340960.131679
> 3  15   327   312   415.918   3520.1093330.146387
> 4  15   479   464   463.913   6080.1793710.133533
> 5  15   640   625   499.912   644   0.05895280.124145
> 6  15   808   793   528.576   6720.1481730.117483
> 7  16   975   959   547.909   664   0.01193220.112975
> 8  15  1129  1114556.91   620 0.136460.111279
> 9  15  1294  1279   568.353   660   0.08201290.109182
>10  15  1456  1441   576.307   648 0.110070.107887
> Total time run:   10.106389
> Total reads made: 1457
> Read size:4194304
> Object size:  4194304
> Bandwidth (MB/sec):   576.665
> Average IOPS: 144
> Stddev IOPS:  28
> Max IOPS: 168
> Min IOPS: 88
> Average Latency(s):   0.108051
> Max latency(s):   0.998451
> Min latency(s):   0.00858933
>
>
> Total time run:   3.478728
> Total reads made: 582
> Read size:4194304
> Object size:  4194304
> Bandwidth (MB/sec):   669.21
> Average IOPS: 167
> Stddev IOPS:  6
> Max IOPS: 176
> Min IOPS: 163
> Average Latency(s):   0.0919296
> Max latency(s):   0.297326
> Min latency(s):   0.0090395
>
> Just to get in context, I have a 3 node cluster, replica 3 min size 2.
> There are only 3 OSDs in the pool, one on each cluster (for benchmark)
> All nodes are connected through 4x10Gbps (2 for public network and 2 for
> private network)
> There are no other clients running
> Configuration is the default
> Imagen is 20GB big, the disks are 2TB big, there are 125 PGs in the pool
>
>
> I wonder why there is such a huge difference between RBD seq benchmark with 4M
> io siz

Re: [ceph-users] Cephfs write fail when node goes down

2018-05-15 Thread David C
I've seen similar behavior with cephfs client around that age, try 4.14+

On 15 May 2018 1:57 p.m., "Josef Zelenka" 
wrote:

Client's kernel is 4.4.0. Regarding the hung osd request, i'll have to
check, the issue is gone now, so i'm not sure if i'll find what you are
suggesting. It's rather odd, because Ceph's failover worked for us every
time, so i'm trying to figure out whether it is a ceph or app issue.



On 15/05/18 02:57, Yan, Zheng wrote:
> On Mon, May 14, 2018 at 5:37 PM, Josef Zelenka
>  wrote:
>> Hi everyone, we've encountered an unusual thing in our setup(4 nodes, 48
>> OSDs, 3 monitors - ceph Jewel, Ubuntu 16.04 with kernel 4.4.0).
Yesterday,
>> we were doing a HW upgrade of the nodes, so they went down one by one -
the
>> cluster was in good shape during the upgrade, as we've done this numerous
>> times and we're quite sure that the redundancy wasn't screwed up while
doing
>> this. However, during this upgrade one of the clients that does backups
to
>> cephfs(mounted via the kernel driver) failed to write the backup file
>> correctly to the cluster with the following trace after we turned off
one of
>> the nodes:
>>
>> [2585732.529412]  8800baa279a8 813fb2df 880236230e00
>> 8802339c
>> [2585732.529414]  8800baa28000 88023fc96e00 7fff
>> 8800baa27b20
>> [2585732.529415]  81840ed0 8800baa279c0 818406d5
>> 
>> [2585732.529417] Call Trace:
>> [2585732.529505]  [] ? cpumask_next_and+0x2f/0x40
>> [2585732.529558]  [] ? bit_wait+0x60/0x60
>> [2585732.529560]  [] schedule+0x35/0x80
>> [2585732.529562]  [] schedule_timeout+0x1b5/0x270
>> [2585732.529607]  [] ? kvm_clock_get_cycles+0x1e/0x20
>> [2585732.529609]  [] ? bit_wait+0x60/0x60
>> [2585732.529611]  [] io_schedule_timeout+0xa4/0x110
>> [2585732.529613]  [] bit_wait_io+0x1b/0x70
>> [2585732.529614]  [] __wait_on_bit_lock+0x4e/0xb0
>> [2585732.529652]  [] __lock_page+0xbb/0xe0
>> [2585732.529674]  [] ?
autoremove_wake_function+0x40/0x40
>> [2585732.529676]  [] pagecache_get_page+0x17d/0x1c0
>> [2585732.529730]  [] ? ceph_pool_perm_check+0x48/0x700
>> [ceph]
>> [2585732.529732]  []
grab_cache_page_write_begin+0x26/0x40
>> [2585732.529738]  [] ceph_write_begin+0x48/0xe0 [ceph]
>> [2585732.529739]  [] generic_perform_write+0xce/0x1c0
>> [2585732.529763]  [] ? file_update_time+0xc9/0x110
>> [2585732.529769]  [] ceph_write_iter+0xf89/0x1040
[ceph]
>> [2585732.529792]  [] ?
__alloc_pages_nodemask+0x159/0x2a0
>> [2585732.529808]  [] new_sync_write+0x9b/0xe0
>> [2585732.529811]  [] __vfs_write+0x26/0x40
>> [2585732.529812]  [] vfs_write+0xa9/0x1a0
>> [2585732.529814]  [] SyS_write+0x55/0xc0
>> [2585732.529817]  []
entry_SYSCALL_64_fastpath+0x16/0x71
>>
>>
> is there any hang osd request in /sys/kernel/debug/ceph//osdc?
>
>> I have encountered this behavior on Luminous, but not on Jewel. Anyone
who
>> has a clue why the write fails? As far as i'm concerned, it should always
>> work if all the PGs are available. Thanks
>> Josef
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs

2018-05-15 Thread Wido den Hollander


On 05/15/2018 02:51 PM, Blair Bethwaite wrote:
> Sorry, bit late to get back to this...
> 
> On Wed., 2 May 2018, 06:19 Nick Fisk,  > wrote:
> 
> 4.16 required?
> 
> 
> Looks like it - thanks for pointing that out.
> 
> Wido, I don't think you are doing anything wrong here, maybe this is a
> bug...
> 
> I've got RHEL7 + Broadwell based Ceph nodes here for which the same
> tuning appears to be working fine:
> 

Odd indeed. Keep in mind that I indeed have the newer Intel Scalable
CPUs with Ubuntu 16.04 and a 4.16 kernel.

My main goal is the lowest possible latency with NVMe and for that you
need higher clock speeds.

(more down)

> -bash-4.2$ lsb_release -a
> LSB Version:    :core-4.1-amd64:core-4.1-noarch
> Distributor ID: RedHatEnterpriseServer
> Description:    Red Hat Enterprise Linux Server release 7.3 (Maipo)
> Release:        7.3
> Codename:       Maipo
> 
> -bash-4.2$ lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                20
> On-line CPU(s) list:   0-19
> Thread(s) per core:    2
> Core(s) per socket:    10
> Socket(s):             1
> NUMA node(s):          1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 79
> Model name:            Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
> Stepping:              1
> CPU MHz:               2745.960
> BogoMIPS:              4399.83
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              25600K
> NUMA node0 CPU(s):     0-19
> 
> -bash-4.2$ cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-3.10.0-514.2.2.el7.x86_64
> root=/dev/mapper/vg00-LogVol00 ro nofb splash=quiet crashkernel=auto
> rd.lvm.lv =vg00/LogVol00 rd.lvm.lv
> =vg00/LogVol01 rhgb quiet LANG=en_US.UTF-8
> 
> -bash-4.2$ sudo cpupower frequency-info
> analyzing CPU 0:
>   driver: intel_pstate
>   CPUs which run at the same hardware frequency: 0
>   CPUs which need to have their frequency coordinated by software: 0
>   maximum transition latency:  Cannot determine or is not supported.
>   hardware limits: 1.20 GHz - 3.10 GHz
>   available cpufreq governors: performance powersave
>   current policy: frequency should be within 1.20 GHz and 3.10 GHz.
>                   The governor "performance" may decide which speed to use
>                   within this range.
>   current CPU frequency: 2.40 GHz (asserted by call to hardware)
>   boost state support:
>     Supported: yes
>     Active: yes
> 
> -bash-4.2$ sudo cpupower -c 0-19 monitor
>     |Nehalem                    || Mperf              || Idle_Stats
> CPU | C3   | C6   | PC3  | PC6  || C0   | Cx   | Freq || POLL | C1-B |
> C1E- | C3-B | C6-B
>    0|  0.00|  0.00|  0.00|  0.00|| 20.93| 79.07|  2398||  1.00| 79.08| 
> 0.00|  0.00|  0.00
>   10|  0.00|  0.00|  0.00|  0.00||  1.81| 98.19|  2398||  0.00| 98.23| 
> 0.00|  0.00|  0.00
>    1|  0.00|  0.00|  0.00|  0.00||  3.80| 96.20|  2398||  2.10| 96.21| 
> 0.00|  0.00|  0.00
>   11|  0.00|  0.00|  0.00|  0.00||  7.95| 92.05|  2398||  7.59| 92.06| 
> 0.00|  0.00|  0.00
>    2|  0.00|  0.00|  0.00|  0.00||  1.99| 98.01|  2398||  0.00| 98.04| 
> 0.00|  0.00|  0.00
>   12|  0.00|  0.00|  0.00|  0.00||  1.59| 98.41|  2398||  0.64| 98.42| 
> 0.00|  0.00|  0.00
>    3|  0.00|  0.00|  0.00|  0.00|| 24.58| 75.42|  2398||  0.00| 75.43| 
> 0.00|  0.00|  0.00
>   13|  0.00|  0.00|  0.00|  0.00||  1.66| 98.34|  2399||  0.24| 98.35| 
> 0.00|  0.00|  0.00
>    4|  0.00|  0.00|  0.00|  0.00||  1.36| 98.64|  2398||  0.00| 98.65| 
> 0.00|  0.00|  0.00
>   14|  0.00|  0.00|  0.00|  0.00||  1.95| 98.05|  2398||  0.77| 98.06| 
> 0.00|  0.00|  0.00
>    5|  0.00|  0.00|  0.00|  0.00||  1.39| 98.61|  2398||  0.00| 98.64| 
> 0.00|  0.00|  0.00
>   15|  0.00|  0.00|  0.00|  0.00||  8.33| 91.67|  2398||  7.80| 91.68| 
> 0.00|  0.00|  0.00
>    6|  0.00|  0.00|  0.00|  0.00||  1.48| 98.52|  2398||  0.00| 98.54| 
> 0.00|  0.00|  0.00
>   16|  0.00|  0.00|  0.00|  0.00||  2.44| 97.56|  2398||  1.73| 97.57| 
> 0.00|  0.00|  0.00
>    7|  0.00|  0.00|  0.00|  0.00||  2.13| 97.87|  2398||  0.64| 97.88| 
> 0.00|  0.00|  0.00
>   17|  0.00|  0.00|  0.00|  0.00||  1.03| 98.97|  2398||  0.24| 98.93| 
> 0.00|  0.00|  0.00
>    8|  0.00|  0.00|  0.00|  0.00||  1.43| 98.57|  2398||  0.00| 98.61| 
> 0.00|  0.00|  0.00
>   18|  0.00|  0.00|  0.00|  0.00||  1.58| 98.42|  2398||  0.00| 98.45| 
> 0.00|  0.00|  0.00
>    9|  0.00|  0.00|  0.00|  0.00||  1.66| 98.34|  2398||  0.00| 98.35| 
> 0.00|  0.00|  0.00
>   19|  0.00|  0.00|  0.00|  0.00||  1.04| 98.96|  2398||  0.00| 98.93| 
> 0.00|  0.00|  0.00
> 
> -bash-4.2$ sudo /opt/dell/srvadmin/bin/omreport chassis biossetup |
> egrep -i "c state|turbo"
> Dell Controlled Turbo                               : Disabled
> Turbo Boost                                         : Enabled
> Energy Efficient Turbo                              : 

Re: [ceph-users] RBD bench read performance vs rados bench

2018-05-15 Thread Jorge Pinilla López
rbd bench --io-type read 2tb/test --io-size 4M
bench  type read io_size 4194304 io_threads 16 bytes 1073741824 pattern 
sequential
  SEC   OPS   OPS/SEC   BYTES/SEC
    1     8     22.96  96306849.22
    2    12     11.74  49250368.05
    3    14      9.85  41294366.71
    4    20      8.99  37697084.35
    5    24      7.92  33218488.44
    6    29      4.18  17544268.15
    7    35      4.79  20108554.48
    8    38      4.82  20223375.77
    9    44      4.78  20028118.00
   10    50      4.98  20901154.36
   11    56      4.70  19714997.57
   12    59      4.96  20783869.38
   13    67      5.79  24280067.45
   14    78      6.71  28133962.20
   15    86      7.51  31512326.28
   16    98      9.92  41613289.49
   17   107  8.87  37189698.96
   18   113  9.25  38787843.71
   19   118  8.02  33630441.91
   20   127  8.08  33879605.71
   21   133  7.02  29448630.29
   22   139  6.80  28522695.29
   23   146  6.46  27102585.08
   24   150  6.50  27275014.72
   25   157  6.01  25205422.98
   26   164  5.73  24026089.08
   27   166  5.13  21526120.39
   28   173  5.18  21711129.16
   29   185  6.72  28192258.47
   30   191  6.92  29018511.32
   31   201  7.95  33342772.10
   32   207  8.76  36732760.58
   33   213  8.54  35823482.59
   34   218  6.89  28883406.39
   35   225  5.87  24627670.76
   36   226  5.03  21078626.70
   37   235  5.22  21894384.04
   38   237  4.12  17279968.87
   39   238  4.00  16760880.87
elapsed:42  ops:  256  ops/sec: 6.09  bytes/sec: 25539951.50

Without the RBD cache, performance is even worse - roughly half.

So are only random reads distributed across OSDs, while sequential reads are sent to 
only one OSD at a time?


On Tuesday, 15 May 2018 15:42:44 (CEST), you wrote:
> On Tue, May 15, 2018 at 6:23 AM, Jorge Pinilla López  
wrote:
> > rbd bench --io-type read 2tb/test --io-size 4M
> > bench  type read io_size 4194304 io_threads 16 bytes 1073741824 pattern
> > sequential
> > 
> >   SEC   OPS   OPS/SEC   BYTES/SEC
> >   
> > 123 36.13  151560621.45
> > 243 28.61  119988170.65
> > 354 23.02  96555723.10
> > 476 22.35  93748581.57
> > 586 20.31  85202745.83
> > 6   102 15.73  65987931.41
> > 7   113 13.72  57564529.85
> > 8   115 12.13  50895409.80
> > 9   138 12.62  52950797.01
> >
> >10   144 11.37  47688526.04
> >11   154  9.59  40232628.73
> >12   161  9.51  39882023.45
> >13   167 10.30  43195718.39
> >14   172  6.57  27570654.19
> >15   181  7.21  30224357.89
> >16   186  7.08  29692318.46
> >17   192  6.31  26457629.12
> >18   197  6.03  25286212.14
> >19   202  6.22  26097739.41
> >20   210  5.82  24406336.22
> >21   217  6.05  25354976.24
> >22   224  6.15  25785754.73
> >23   231  6.84  28684892.86
> >24   237  6.86  28760546.77
> > 
> > elapsed:26  ops:  256  ops/sec: 9.58  bytes/sec: 40195235.45
> 
> What are your results if you re-run with the in-memory cache disabled
> (i.e. 'rbd bench --rbd-cache=false ')?
> 
> > rados -p 2tb bench 10 seq
> > hints = 1
> > 
> >   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg
> >   lat(s)
> >   
> > 0   0 0 0 0 0   - 
> >  0
> > 1  165842   167.965   1680.164905
> > 0.25338
> > 2  169781   161.969   156   0.0317369   
> > 0.315989
> > 3  16   135   119158.64   1520.133847   
> > 0.349598
> > 4  16   180   164   163.975   180   0.0511805   
> > 0.354751
> > 5  16   229   213   170.375   1960.245727   
> > 0.342972
> > 6  16   276   260   173.268   1880.032029   
> > 0.344167
> > 7  16   326   310   177.082   2000.489663   
> > 0.336684
> > 8  16   376   360   179.944   200   0.0458536   
> > 0.330955
> > 9  16   422   406   180.391   1840.247551   
> > 0.336771
> >
> >10  16   472   456   182.349   200 1.28901   
> >0.334343
> > 
> > Total time run:   10.522668
> > Total reads made: 473
> > Read size:4194304
> > Object size:  4194304
> > Bandwidth (MB/sec):   179.802
> > Average IOPS: 44
> > Stddev IOPS:  4
> > Max IOPS: 50
> > Min IOPS: 38
> > Average Latency(s):   0.350895
> > Max latency(s):   1.61122
> > Min latency(s):   0

Re: [ceph-users] slow requests are blocked

2018-05-15 Thread Grigory Murashov

Hello guys!

I collected output of ceph daemon osd.16 dump_ops_in_flight and ceph 
daemon osd.16 dump_historic_ops.


Here is the output of ceph health detail at the moment of the problem:

HEALTH_WARN 20 slow requests are blocked > 32 sec
REQUEST_SLOW 20 slow requests are blocked > 32 sec
    20 ops are blocked > 65.536 sec
    osds 16,27,29 have blocked requests > 65.536 sec

So I grab logs from osd.16.

The file is attached. Could you please help me interpret it?

Thanks in advance.

Grigory Murashov
Voximplant

On 14.05.2018 18:14, Grigory Murashov wrote:


Hello David!

2. I set it up 10/10

3. Thanks, my problem was that I ran it on a host where there was no osd.15 daemon.

Could you please help me read the OSD logs?

Here is a part from ceph.log

2018-05-14 13:46:32.644323 mon.storage-ru1-osd1 mon.0 
185.164.149.2:6789/0 553895 : cluster [INF] Cluster is now healthy
2018-05-14 13:46:43.741921 mon.storage-ru1-osd1 mon.0 
185.164.149.2:6789/0 553896 : cluster [WRN] Health check failed: 21 
slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-05-14 13:46:49.746994 mon.storage-ru1-osd1 mon.0 
185.164.149.2:6789/0 553897 : cluster [WRN] Health check update: 23 
slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-05-14 13:46:55.752314 mon.storage-ru1-osd1 mon.0 
185.164.149.2:6789/0 553900 : cluster [WRN] Health check update: 3 
slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-05-14 13:47:01.030686 mon.storage-ru1-osd1 mon.0 
185.164.149.2:6789/0 553901 : cluster [WRN] Health check update: 4 
slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-05-14 13:47:07.764236 mon.storage-ru1-osd1 mon.0 
185.164.149.2:6789/0 553903 : cluster [WRN] Health check update: 32 
slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-05-14 13:47:13.770833 mon.storage-ru1-osd1 mon.0 
185.164.149.2:6789/0 553904 : cluster [WRN] Health check update: 21 
slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-05-14 13:47:17.774530 mon.storage-ru1-osd1 mon.0 
185.164.149.2:6789/0 553905 : cluster [INF] Health check cleared: 
REQUEST_SLOW (was: 12 slow requests are blocked > 32 sec)
2018-05-14 13:47:17.774582 mon.storage-ru1-osd1 mon.0 
185.164.149.2:6789/0 553906 : cluster [INF] Cluster is now healthy


At 13:47 I had a problem with osd.21

1. Ceph Health (storage-ru1-osd1.voximplant.com:ceph.health): HEALTH_WARN
{u'REQUEST_SLOW': {u'severity': u'HEALTH_WARN', u'summary': {u'message': u'4 slow 
requests are blocked > 32 sec'}}}
HEALTH_WARN 4 slow requests are blocked > 32 sec
REQUEST_SLOW 4 slow requests are blocked > 32 sec
 2 ops are blocked > 65.536 sec
 2 ops are blocked > 32.768 sec
 osd.21 has blocked requests > 65.536 sec

Here is a part from ceph-osd.21.log

2018-05-14 13:47:06.891399 7fb806dd6700 10 osd.21 pg_epoch: 236 
pg[2.0( v 236'297 (0'0,236'297] local-lis/les=223/224 n=1 ec=119/119 
lis/c 223/223 les/c/f 224/224/0 223/223/212) [21,29,15]
r=0 lpr=223 crt=236'297 lcod 236'296 mlcod 236'296 active+clean]  
dropping ondisk_read_lock
2018-05-14 13:47:06.891435 7fb806dd6700 10 osd.21 236 dequeue_op 
0x56453b753f80 finish

2018-05-14 13:47:07.111388 7fb8185f9700 10 osd.21 236 tick
2018-05-14 13:47:07.111398 7fb8185f9700 10 osd.21 236 do_waiters -- start
2018-05-14 13:47:07.111401 7fb8185f9700 10 osd.21 236 do_waiters -- finish
2018-05-14 13:47:07.800421 7fb817df8700 10 osd.21 236 
tick_without_osd_lock
2018-05-14 13:47:07.800444 7fb817df8700 10 osd.21 236 
promote_throttle_recalibrate 0 attempts, promoted 0 objects and 0  
bytes; target 25 obj/sec or 5120 k bytes/sec
2018-05-14 13:47:07.800449 7fb817df8700 10 osd.21 236 
promote_throttle_recalibrate  actual 0, actual/prob ratio 1, adjusted 
new_prob 1000, prob 1000 -> 1000

2018-05-14 13:47:08.111470 7fb8185f9700 10 osd.21 236 tick
2018-05-14 13:47:08.111483 7fb8185f9700 10 osd.21 236 do_waiters -- start
2018-05-14 13:47:08.111485 7fb8185f9700 10 osd.21 236 do_waiters -- finish
2018-05-14 13:47:08.181070 7fb8055d3700 10 osd.21 236 dequeue_op 
0x564539651000 prio 63 cost 0 latency 0.000143 
osd_op(client.2597258.0:213844298 6.1d4 6.4079fd4 (undecoded) 
ondisk+read+kno
wn_if_redirected e236) v8 pg pg[6.1d4( v 236'20882 
(236'19289,236'20882] local-lis/les=223/224 n=20791 ec=145/132 lis/c 
223/223 les/c/f 224/224/0 223/223/212) [21,29,17] r=0 lpr=223 crt=236

'20882 lcod 236'20881 mlcod 236'20881 active+clean]
2018-05-14 13:47:08.181112 7fb8055d3700 10 osd.21 pg_epoch: 236 
pg[6.1d4( v 236'20882 (236'19289,236'20882] local-lis/les=223/224 
n=20791 ec=145/132 lis/c 223/223 les/c/f 224/224/0 223/223/
212) [21,29,17] r=0 lpr=223 crt=236'20882 lcod 236'20881 mlcod 
236'20881 active+clean] _handle_message: 0x564539651000
2018-05-14 13:47:08.181141 7fb8055d3700 10 osd.21 pg_epoch: 236 
pg[6.1d4( v 236'20882 (236'19289,236'20882] local-lis/les=223/224 
n=20791 ec=145/132 lis/c 223/223 les/c/f 224/224/0 223/223/
212) [21,29,17] r=0 lpr=223 crt=236'20882 lcod 236'20881 mlcod 
236'20881 active+clean] do_op osd_op(client.2597258.0:213844298 6.1d4 
6:2bf9e020:::eb359f44-3316-4cd3-9006-d

Re: [ceph-users] Single ceph cluster for the object storage service of 2 OpenStack clouds

2018-05-15 Thread David Turner
Yeah, that's how we do multiple zones.  I find following the documentation
for multi-site (but not actually setting up a second site) to work well for
setting up multiple realms in a single cluster.
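
(A rough sketch of that for one of the clouds; the realm/zonegroup/zone names
are just placeholders, the same steps are repeated with different names for
the second cloud, and each cloud's rgw instances are then pointed at their own
zone via rgw_realm / rgw_zonegroup / rgw_zone in ceph.conf:)

  radosgw-admin realm create --rgw-realm=cloud1
  radosgw-admin zonegroup create --rgw-zonegroup=cloud1-zg --rgw-realm=cloud1 --master
  radosgw-admin zone create --rgw-zonegroup=cloud1-zg --rgw-zone=cloud1-zone --master
  radosgw-admin period update --rgw-realm=cloud1 --commit

Each zone gets its own set of pools, which gives the per-cloud pool separation
asked about below.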

On Tue, May 15, 2018 at 9:29 AM Massimo Sgaravatto <
massimo.sgarava...@gmail.com> wrote:

> Hi
>
> I have been using for a while a single ceph cluster for the image and
> block storage services of two Openstack clouds.
>
> Now I want to use this ceph cluster also for the object storage services
> of the two OpenStack clouds and I want to implement that having a clear
> separation between the two clouds. In particular I want different ceph
> pools for the two Clouds.
>
> My understanding is that this can be done:
>
> - creating 2 realms (one for each cloud)
> - creating one zonegroup for each realm
> - creating one zone for each zonegroup
> - having 1 ore more rgw instances for each zone
>
> Did I get it right ?
>
> Thanks, Massimo
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests are blocked

2018-05-15 Thread LOPEZ Jean-Charles
Hi Grigory,

It looks like osd.16 is having a hard time acknowledging the write request
(apparently for bucket resharding operations): it takes about 15 seconds for
osd.16 to receive the commit confirmation from osd.21 over the subop
communication.

Have a look at the journal device for osd.21, and check whether the machine
where osd.21 is running is overloaded or has a network issue.

Regards
JC
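
(A quick first pass on that host could look something like this; the osd id
and peer hostname are examples:)

  # disk / journal load on the osd.21 host
  iostat -x 1 5
  # per-osd op and journal latency counters
  ceph daemon osd.21 perf dump | grep -i latency
  # network sanity towards the host carrying osd.16
  ping -c 5 <osd16-host>
  sar -n DEV 1 5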

> On 15 May 2018, at 19:49, Grigory Murashov  wrote:
> 
> Hello guys!
> 
> I collected output of ceph daemon osd.16 dump_ops_in_flight and ceph daemon 
> osd.16 dump_historic_ops.
> 
> Here is the output of ceph heath details in the moment of problem
> 
> HEALTH_WARN 20 slow requests are blocked > 32 sec
> REQUEST_SLOW 20 slow requests are blocked > 32 sec
> 20 ops are blocked > 65.536 sec
> osds 16,27,29 have blocked requests > 65.536 sec
> So I grab logs from osd.16.
> 
> The file is attached.  Could you please help to translate?
> 
> Thanks in advance.
> Grigory Murashov
> Voximplant
> 14.05.2018 18:14, Grigory Murashov пишет:
>> Hello David!
>> 
>> 2. I set it up 10/10
>> 
>> 3. Thanks, my problem was I did it on host where was no osd.15 daemon.
>> 
>> Could you please help to read osd logs?
>> 
>> Here is a part from ceph.log
>> 
>> 2018-05-14 13:46:32.644323 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 
>> 553895 : cluster [INF] Cluster is now healthy
>> 2018-05-14 13:46:43.741921 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 
>> 553896 : cluster [WRN] Health check failed: 21 slow requests are blocked > 
>> 32 sec (REQUEST_SLOW)
>> 2018-05-14 13:46:49.746994 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 
>> 553897 : cluster [WRN] Health check update: 23 slow requests are blocked > 
>> 32 sec (REQUEST_SLOW)
>> 2018-05-14 13:46:55.752314 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 
>> 553900 : cluster [WRN] Health check update: 3 slow requests are blocked > 32 
>> sec (REQUEST_SLOW)
>> 2018-05-14 13:47:01.030686 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 
>> 553901 : cluster [WRN] Health check update: 4 slow requests are blocked > 32 
>> sec (REQUEST_SLOW)
>> 2018-05-14 13:47:07.764236 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 
>> 553903 : cluster [WRN] Health check update: 32 slow requests are blocked > 
>> 32 sec (REQUEST_SLOW)
>> 2018-05-14 13:47:13.770833 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 
>> 553904 : cluster [WRN] Health check update: 21 slow requests are blocked > 
>> 32 sec (REQUEST_SLOW)
>> 2018-05-14 13:47:17.774530 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 
>> 553905 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 12 slow 
>> requests are blocked > 32 sec)
>> 2018-05-14 13:47:17.774582 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 
>> 553906 : cluster [INF] Cluster is now healthy
>> At 13-47 I had a problem with osd.21
>> 
>> 1. Ceph Health (storage-ru1-osd1.voximplant.com:ceph.health): HEALTH_WARN
>> {u'REQUEST_SLOW': {u'severity': u'HEALTH_WARN', u'summary': {u'message': u'4 
>> slow requests are blocked > 32 sec'}}}
>> HEALTH_WARN 4 slow requests are blocked > 32 sec
>> REQUEST_SLOW 4 slow requests are blocked > 32 sec
>> 2 ops are blocked > 65.536 sec
>> 2 ops are blocked > 32.768 sec
>> osd.21 has blocked requests > 65.536 sec
>> Here is a part from ceph-osd.21.log
>> 2018-05-14 13:47:06.891399 7fb806dd6700 10 osd.21 pg_epoch: 236 pg[2.0( v 
>> 236'297 (0'0,236'297] local-lis/les=223/224 n=1 ec=119/119 lis/c 223/223 
>> les/c/f 224/224/0 223/223/212) [21,29,15]
>> r=0 lpr=223 crt=236'297 lcod 236'296 mlcod 236'296 active+clean]  dropping 
>> ondisk_read_lock
>> 2018-05-14 13:47:06.891435 7fb806dd6700 10 osd.21 236 dequeue_op 
>> 0x56453b753f80 finish
>> 2018-05-14 13:47:07.111388 7fb8185f9700 10 osd.21 236 tick
>> 2018-05-14 13:47:07.111398 7fb8185f9700 10 osd.21 236 do_waiters -- start
>> 2018-05-14 13:47:07.111401 7fb8185f9700 10 osd.21 236 do_waiters -- finish
>> 2018-05-14 13:47:07.800421 7fb817df8700 10 osd.21 236 tick_without_osd_lock
>> 2018-05-14 13:47:07.800444 7fb817df8700 10 osd.21 236 
>> promote_throttle_recalibrate 0 attempts, promoted 0 objects and 0  bytes; 
>> target 25 obj/sec or 5120 k bytes/sec
>> 2018-05-14 13:47:07.800449 7fb817df8700 10 osd.21 236 
>> promote_throttle_recalibrate  actual 0, actual/prob ratio 1, adjusted 
>> new_prob 1000, prob 1000 -> 1000
>> 2018-05-14 13:47:08.111470 7fb8185f9700 10 osd.21 236 tick
>> 2018-05-14 13:47:08.111483 7fb8185f9700 10 osd.21 236 do_waiters -- start
>> 2018-05-14 13:47:08.111485 7fb8185f9700 10 osd.21 236 do_waiters -- finish
>> 2018-05-14 13:47:08.181070 7fb8055d3700 10 osd.21 236 dequeue_op 
>> 0x564539651000 prio 63 cost 0 latency 0.000143 
>> osd_op(client.2597258.0:213844298 6.1d4 6.4079fd4 (undecoded) ondisk+read+kno
>> wn_if_redirected e236) v8 pg pg[6.1d4( v 236'20882 (236'19289,236'20882] 
>> local-lis/les=223/224 n=20791 ec=145/132 lis/c 223/223 les/c/f 224/224/0 
>> 223/223/212) [21,29,17] r=0 lpr=223 crt=236
>> '20882 lcod 236'20881 

Re: [ceph-users] Node crash, filesytem not usable

2018-05-15 Thread Webert de Souza Lima
I'm sorry, I wouldn't know; I'm on Jewel.
Is your cluster HEALTH_OK now?

Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*


On Sun, May 13, 2018 at 6:29 AM Marc Roos  wrote:

>
> In luminous
> osd_recovery_threads = osd_disk_threads ?
> osd_recovery_sleep = osd_recovery_sleep_hdd ?
>
> Or is this speeding up recovery, a lot different in luminous?
>
> [@~]# ceph daemon osd.0 config show | grep osd | grep thread
> "osd_command_thread_suicide_timeout": "900",
> "osd_command_thread_timeout": "600",
> "osd_disk_thread_ioprio_class": "",
> "osd_disk_thread_ioprio_priority": "-1",
> "osd_disk_threads": "1",
> "osd_op_num_threads_per_shard": "0",
> "osd_op_num_threads_per_shard_hdd": "1",
> "osd_op_num_threads_per_shard_ssd": "2",
> "osd_op_thread_suicide_timeout": "150",
> "osd_op_thread_timeout": "15",
> "osd_peering_wq_threads": "2",
> "osd_recovery_thread_suicide_timeout": "300",
> "osd_recovery_thread_timeout": "30",
> "osd_remove_thread_suicide_timeout": "36000",
> "osd_remove_thread_timeout": "3600",
>
> -Original Message-
> From: Webert de Souza Lima [mailto:webert.b...@gmail.com]
> Sent: vrijdag 11 mei 2018 20:34
> To: ceph-users
> Subject: Re: [ceph-users] Node crash, filesytem not usable
>
> This message seems to be very concerning:
>  >mds0: Metadata damage detected
>
>
> but for the rest, the cluster seems still to be recovering. you could
> try to seep thing up with ceph tell, like:
>
> ceph tell osd.* injectargs --osd_max_backfills=10
>
> ceph tell osd.* injectargs --osd_recovery_sleep=0.0
>
> ceph tell osd.* injectargs --osd_recovery_threads=2
>
>
>
> Regards,
>
> Webert Lima
> DevOps Engineer at MAV Tecnologia
> Belo Horizonte - Brasil
> IRC NICK - WebertRLZ
>
>
> On Fri, May 11, 2018 at 3:06 PM Daniel Davidson
>  wrote:
>
>
> Below id the information you were asking for.  I think they are
> size=2, min size=1.
>
> Dan
>
> # ceph status
> cluster 7bffce86-9d7b-4bdf-a9c9-67670e68ca77
>
>
>
>
>  health HEALTH_ERR
>
>
>
>
> 140 pgs are stuck inactive for more than 300 seconds
> 64 pgs backfill_wait
> 76 pgs backfilling
> 140 pgs degraded
> 140 pgs stuck degraded
> 140 pgs stuck inactive
> 140 pgs stuck unclean
> 140 pgs stuck undersized
> 140 pgs undersized
> 210 requests are blocked > 32 sec
> recovery 38725029/695508092 objects degraded (5.568%)
> recovery 10844554/695508092 objects misplaced (1.559%)
> mds0: Metadata damage detected
> mds0: Behind on trimming (71/30)
> noscrub,nodeep-scrub flag(s) set
>  monmap e3: 4 mons at
> {ceph-0=172.16.31.1:6789/0,ceph-1=172.16.31.2:6789/0,ceph-2=172.16.31.3:
> 6789/0,ceph-3=172.16.31.4:6789/0}
> election epoch 824, quorum 0,1,2,3
> ceph-0,ceph-1,ceph-2,ceph-3
>   fsmap e144928: 1/1/1 up {0=ceph-0=up:active}, 1 up:standby
>  osdmap e35814: 32 osds: 30 up, 30 in; 140 remapped pgs
> flags
> noscrub,nodeep-scrub,sortbitwise,require_jewel_osds
>   pgmap v43142427: 1536 pgs, 2 pools, 762 TB data, 331 Mobjects
> 1444 TB used, 1011 TB / 2455 TB avail
> 38725029/695508092 objects degraded (5.568%)
> 10844554/695508092 objects misplaced (1.559%)
> 1396 active+clean
>   76
> undersized+degraded+remapped+backfilling+peered
>   64
> undersized+degraded+remapped+wait_backfill+peered
> recovery io 1244 MB/s, 1612 keys/s, 705 objects/s
>
> ID  WEIGHT TYPE NAMEUP/DOWN REWEIGHT PRIMARY-AFFINITY
>  -1 2619.54541 root default
>  -2  163.72159 host ceph-0
>   0   81.86079 osd.0 up  1.0  1.0
>   1   81.86079 osd.1 up  1.0  1.0
>  -3  163.72159 host ceph-1
>   2   81.86079 osd.2 up  1.0  1.0
>   3   81.86079 osd.3 up  1.0  1.0
>  -4  163.72159 host ceph-2
>   8   81.86079 osd.8 up  1.0  1.0
>   9   81.86079 osd.9 up  1.0  1.0
>  -5  163.72159 host ceph-3
>  10   81.86079 osd.10up  1.0  1.0
>  11   81.86079 osd.11up  1.0  1.0
>  -6  163.72159 host ceph-4
>   4   81.86079 osd.4 up  1.0  1.0
>   5   81.86079 osd.5 up  1.0

Re: [ceph-users] Cephfs write fail when node goes down

2018-05-15 Thread Paul Emmerich
Kernel 4.4 is ancient in terms of Ceph support; we've also encountered a
lot of similar hangs with older kernels and cephfs.


Paul
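
(If it happens again: as Zheng suggests in the quoted thread below, hung OSD
requests from the kernel client show up in debugfs, e.g.:)

  uname -r
  sudo cat /sys/kernel/debug/ceph/*/osdc
  sudo cat /sys/kernel/debug/ceph/*/mdsc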

2018-05-15 16:56 GMT+02:00 David C :

> I've seen similar behavior with cephfs client around that age, try 4.14+
>
> On 15 May 2018 1:57 p.m., "Josef Zelenka" 
> wrote:
>
> Client's kernel is 4.4.0. Regarding the hung osd request, I'll have to
> check; the issue is gone now, so I'm not sure I'll find what you are
> suggesting. It's rather odd, because Ceph's failover has worked for us every
> time, so I'm trying to figure out whether it is a ceph or an app issue.
>
>
>
> On 15/05/18 02:57, Yan, Zheng wrote:
> > On Mon, May 14, 2018 at 5:37 PM, Josef Zelenka
> >  wrote:
> >> Hi everyone, we've encountered an unusual thing in our setup(4 nodes, 48
> >> OSDs, 3 monitors - ceph Jewel, Ubuntu 16.04 with kernel 4.4.0).
> Yesterday,
> >> we were doing a HW upgrade of the nodes, so they went down one by one -
> the
> >> cluster was in good shape during the upgrade, as we've done this
> numerous
> >> times and we're quite sure that the redundancy wasn't screwed up while
> doing
> >> this. However, during this upgrade one of the clients that does backups
> to
> >> cephfs(mounted via the kernel driver) failed to write the backup file
> >> correctly to the cluster with the following trace after we turned off
> one of
> >> the nodes:
> >>
> >> [2585732.529412]  8800baa279a8 813fb2df 880236230e00
> >> 8802339c
> >> [2585732.529414]  8800baa28000 88023fc96e00 7fff
> >> 8800baa27b20
> >> [2585732.529415]  81840ed0 8800baa279c0 818406d5
> >> 
> >> [2585732.529417] Call Trace:
> >> [2585732.529505]  [] ? cpumask_next_and+0x2f/0x40
> >> [2585732.529558]  [] ? bit_wait+0x60/0x60
> >> [2585732.529560]  [] schedule+0x35/0x80
> >> [2585732.529562]  [] schedule_timeout+0x1b5/0x270
> >> [2585732.529607]  [] ? kvm_clock_get_cycles+0x1e/0x20
> >> [2585732.529609]  [] ? bit_wait+0x60/0x60
> >> [2585732.529611]  [] io_schedule_timeout+0xa4/0x110
> >> [2585732.529613]  [] bit_wait_io+0x1b/0x70
> >> [2585732.529614]  [] __wait_on_bit_lock+0x4e/0xb0
> >> [2585732.529652]  [] __lock_page+0xbb/0xe0
> >> [2585732.529674]  [] ? autoremove_wake_function+0x40/
> 0x40
> >> [2585732.529676]  [] pagecache_get_page+0x17d/0x1c0
> >> [2585732.529730]  [] ? ceph_pool_perm_check+0x48/
> 0x700
> >> [ceph]
> >> [2585732.529732]  [] grab_cache_page_write_begin+
> 0x26/0x40
> >> [2585732.529738]  [] ceph_write_begin+0x48/0xe0 [ceph]
> >> [2585732.529739]  [] generic_perform_write+0xce/0x1c0
> >> [2585732.529763]  [] ? file_update_time+0xc9/0x110
> >> [2585732.529769]  [] ceph_write_iter+0xf89/0x1040
> [ceph]
> >> [2585732.529792]  [] ? __alloc_pages_nodemask+0x159/
> 0x2a0
> >> [2585732.529808]  [] new_sync_write+0x9b/0xe0
> >> [2585732.529811]  [] __vfs_write+0x26/0x40
> >> [2585732.529812]  [] vfs_write+0xa9/0x1a0
> >> [2585732.529814]  [] SyS_write+0x55/0xc0
> >> [2585732.529817]  [] entry_SYSCALL_64_fastpath+
> 0x16/0x71
> >>
> >>
> > is there any hang osd request in /sys/kernel/debug/ceph//osdc?
> >
> >> I have encountered this behavior on Luminous, but not on Jewel. Anyone
> who
> >> has a clue why the write fails? As far as i'm concerned, it should
> always
> >> work if all the PGs are available. Thanks
> >> Josef
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests are blocked

2018-05-15 Thread Paul Emmerich
Looks like it's mostly RGW metadata stuff; are you running your non-data
RGW pools on SSDs (you should, that can help *a lot*)?


Paul
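
(On Luminous one way to do that is a device-class CRUSH rule; the pool names
below are the zone defaults and may differ, and moving existing pools to the
new rule will trigger data movement:)

  ceph osd crush rule create-replicated rgw-meta-ssd default host ssd
  ceph osd pool set default.rgw.buckets.index crush_rule rgw-meta-ssd
  ceph osd pool set default.rgw.meta crush_rule rgw-meta-ssd
  ceph osd pool set default.rgw.log crush_rule rgw-meta-ssd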

2018-05-15 18:49 GMT+02:00 Grigory Murashov :

> Hello guys!
>
> I collected output of ceph daemon osd.16 dump_ops_in_flight and ceph
> daemon osd.16 dump_historic_ops.
>
> Here is the output of ceph heath details in the moment of problem
>
> HEALTH_WARN 20 slow requests are blocked > 32 sec
> REQUEST_SLOW 20 slow requests are blocked > 32 sec
> 20 ops are blocked > 65.536 sec
> osds 16,27,29 have blocked requests > 65.536 sec
>
> So I grab logs from osd.16.
>
> The file is attached.  Could you please help to translate?
>
> Thanks in advance.
>
> Grigory Murashov
> Voximplant
>
> 14.05.2018 18:14, Grigory Murashov пишет:
>
> Hello David!
>
> 2. I set it up 10/10
>
> 3. Thanks, my problem was I did it on host where was no osd.15 daemon.
>
> Could you please help to read osd logs?
>
> Here is a part from ceph.log
>
> 2018-05-14 13:46:32.644323 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0
> 553895 : cluster [INF] Cluster is now healthy
> 2018-05-14 13:46:43.741921 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0
> 553896 : cluster [WRN] Health check failed: 21 slow requests are blocked >
> 32 sec (REQUEST_SLOW)
> 2018-05-14 13:46:49.746994 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0
> 553897 : cluster [WRN] Health check update: 23 slow requests are blocked >
> 32 sec (REQUEST_SLOW)
> 2018-05-14 13:46:55.752314 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0
> 553900 : cluster [WRN] Health check update: 3 slow requests are blocked >
> 32 sec (REQUEST_SLOW)
> 2018-05-14 13:47:01.030686 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0
> 553901 : cluster [WRN] Health check update: 4 slow requests are blocked >
> 32 sec (REQUEST_SLOW)
> 2018-05-14 13:47:07.764236 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0
> 553903 : cluster [WRN] Health check update: 32 slow requests are blocked >
> 32 sec (REQUEST_SLOW)
> 2018-05-14 13:47:13.770833 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0
> 553904 : cluster [WRN] Health check update: 21 slow requests are blocked >
> 32 sec (REQUEST_SLOW)
> 2018-05-14 13:47:17.774530 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0
> 553905 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 12 slow
> requests are blocked > 32 sec)
> 2018-05-14 13:47:17.774582 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0
> 553906 : cluster [INF] Cluster is now healthy
>
> At 13-47 I had a problem with osd.21
>
> 1. Ceph Health (storage-ru1-osd1.voximplant.com:ceph.health): HEALTH_WARN
> {u'REQUEST_SLOW': {u'severity': u'HEALTH_WARN', u'summary': {u'message': u'4 
> slow requests are blocked > 32 sec'}}}
> HEALTH_WARN 4 slow requests are blocked > 32 sec
> REQUEST_SLOW 4 slow requests are blocked > 32 sec
> 2 ops are blocked > 65.536 sec
> 2 ops are blocked > 32.768 sec
> osd.21 has blocked requests > 65.536 sec
>
> Here is a part from ceph-osd.21.log
>
> 2018-05-14 13:47:06.891399 7fb806dd6700 10 osd.21 pg_epoch: 236 pg[2.0( v
> 236'297 (0'0,236'297] local-lis/les=223/224 n=1 ec=119/119 lis/c 223/223
> les/c/f 224/224/0 223/223/212) [21,29,15]
> r=0 lpr=223 crt=236'297 lcod 236'296 mlcod 236'296 active+clean]  dropping
> ondisk_read_lock
> 2018-05-14 13:47:06.891435 7fb806dd6700 10 osd.21 236 dequeue_op
> 0x56453b753f80 finish
> 2018-05-14 13:47:07.111388 7fb8185f9700 10 osd.21 236 tick
> 2018-05-14 13:47:07.111398 7fb8185f9700 10 osd.21 236 do_waiters -- start
> 2018-05-14 13:47:07.111401 7fb8185f9700 10 osd.21 236 do_waiters -- finish
> 2018-05-14 13:47:07.800421 7fb817df8700 10 osd.21 236 tick_without_osd_lock
> 2018-05-14 13:47:07.800444 7fb817df8700 10 osd.21 236
> promote_throttle_recalibrate 0 attempts, promoted 0 objects and 0  bytes;
> target 25 obj/sec or 5120 k bytes/sec
> 2018-05-14 13:47:07.800449 7fb817df8700 10 osd.21 236
> promote_throttle_recalibrate  actual 0, actual/prob ratio 1, adjusted
> new_prob 1000, prob 1000 -> 1000
> 2018-05-14 13:47:08.111470 7fb8185f9700 10 osd.21 236 tick
> 2018-05-14 13:47:08.111483 7fb8185f9700 10 osd.21 236 do_waiters -- start
> 2018-05-14 13:47:08.111485 7fb8185f9700 10 osd.21 236 do_waiters -- finish
> 2018-05-14 13:47:08.181070 7fb8055d3700 10 osd.21 236 dequeue_op
> 0x564539651000 prio 63 cost 0 latency 0.000143 
> osd_op(client.2597258.0:213844298
> 6.1d4 6.4079fd4 (undecoded) ondisk+read+kno
> wn_if_redirected e236) v8 pg pg[6.1d4( v 236'20882 (236'19289,236'20882]
> local-lis/les=223/224 n=20791 ec=145/132 lis/c 223/223 les/c/f 224/224/0
> 223/223/212) [21,29,17] r=0 lpr=223 crt=236
> '20882 lcod 236'20881 mlcod 236'20881 active+clean]
> 2018-05-14 13:47:08.181112 7fb8055d3700 10 osd.21 pg_epoch: 236 pg[6.1d4(
> v 236'20882 (236'19289,236'20882] local-lis/les=223/224 n=20791 ec=145/132
> lis/c 223/223 les/c/f 224/224/0 223/223/
> 212) [21,29,17] r=0 lpr=223 crt=236'20882 lcod 236'20881 mlcod 236'20881
> active+clean] _handle_message: 0x564539651000
> 2018-05-14 13:47:08.181141 7fb8

Re: [ceph-users] which kernel support object-map, fast-diff

2018-05-15 Thread Paul Emmerich
The following RBD features are supported since these kernel versions:

Kernel 3.8: RBD_FEATURE_LAYERING
https://github.com/ceph/ceph-client/commit/d889140c4a1c5edb6a7bd90392b9d878bfaccfb6
Kernel 3.10: RBD_FEATURE_STRIPINGV2
https://github.com/ceph/ceph-client/commit/5cbf6f12c48121199cc214c93dea98cce719343b
Kernel 4.9: RBD_FEATURE_EXCLUSIVE_LOCK
https://github.com/ceph/ceph-client/commit/ed95b21a4b0a71ef89306cdeb427d53cc9cb343f
Kernel 4.11: RBD_FEATURE_DATA_POOL
https://github.com/ceph/ceph-client/commit/7e97332ea9caad3b7c6d86bc3b982e17eda2f736

Try using rbd-nbd if you need other features.
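
For a krbd client the usual workaround is to disable the unsupported features
on the image, or map it through rbd-nbd instead (pool/image below is just an
example):

  rbd feature disable mypool/myimage object-map fast-diff deep-flatten
  rbd map mypool/myimage
  # or, keeping all features:
  rbd-nbd map mypool/myimage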

Paul


2018-05-15 11:06 GMT+02:00 xiang@sky-data.cn :

> Could you give a list of which features are supported by which kernels?
>
> - Original Message -
> From: "Konstantin Shalygin" 
> To: "ceph-users" 
> Cc: "xiang dai" 
> Sent: Tuesday, May 15, 2018 4:57:00 PM
> Subject: Re: [ceph-users] which kernel support object-map, fast-diff
>
> > So which kernel version support those feature?
>
>
> No kernel supports these features yet.
>
>
>
> k
> --
> Dai Xiang
> Nanjing Sky-Data Information Technology Co., Ltd.
> Phone: +86 1 3382776490
> Company website: www.sky-data.cn
> Try the Sky-Data SkyDiscovery intelligent computing platform for free
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests are blocked

2018-05-15 Thread David Turner
I've been running into slow requests with my rgw metadata pools just this
week. I tracked it down because the slow requests were on my nvme osds. I
haven't solved the issue yet, but I can confirm that no resharding was
taking place and that the auto-resharder is working as all of my larger
buckets have different amounts of index shards.

I'm interested to see if there are some settings we should try, or things
we should check on, to resolve the rgw meta-pools slow requests.
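
(A couple of quick checks along those lines; they show any pending reshards,
per-bucket shard counts, and which CRUSH rule the index pool uses. The pool
name is the zone default and may differ:)

  radosgw-admin reshard list
  radosgw-admin bucket limit check
  ceph osd pool get default.rgw.buckets.index crush_rule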

On Tue, May 15, 2018, 4:42 PM Paul Emmerich  wrote:

> Looks like it's mostly RGW metadata stuff; are you running your non-data
> RGW pools on SSDs (you should, that can help *a lot*)?
>
>
> Paul
>
> 2018-05-15 18:49 GMT+02:00 Grigory Murashov :
>
>> Hello guys!
>>
>> I collected output of ceph daemon osd.16 dump_ops_in_flight and ceph
>> daemon osd.16 dump_historic_ops.
>>
>> Here is the output of ceph heath details in the moment of problem
>>
>> HEALTH_WARN 20 slow requests are blocked > 32 sec
>> REQUEST_SLOW 20 slow requests are blocked > 32 sec
>> 20 ops are blocked > 65.536 sec
>> osds 16,27,29 have blocked requests > 65.536 sec
>>
>> So I grab logs from osd.16.
>>
>> The file is attached.  Could you please help to translate?
>>
>> Thanks in advance.
>>
>> Grigory Murashov
>> Voximplant
>>
>> 14.05.2018 18:14, Grigory Murashov пишет:
>>
>> Hello David!
>>
>> 2. I set it up 10/10
>>
>> 3. Thanks, my problem was I did it on host where was no osd.15 daemon.
>>
>> Could you please help to read osd logs?
>>
>> Here is a part from ceph.log
>>
>> 2018-05-14 13:46:32.644323 mon.storage-ru1-osd1 mon.0
>> 185.164.149.2:6789/0 553895 : cluster [INF] Cluster is now healthy
>> 2018-05-14 13:46:43.741921 mon.storage-ru1-osd1 mon.0
>> 185.164.149.2:6789/0 553896 : cluster [WRN] Health check failed: 21 slow
>> requests are blocked > 32 sec (REQUEST_SLOW)
>> 2018-05-14 13:46:49.746994 mon.storage-ru1-osd1 mon.0
>> 185.164.149.2:6789/0 553897 : cluster [WRN] Health check update: 23 slow
>> requests are blocked > 32 sec (REQUEST_SLOW)
>> 2018-05-14 13:46:55.752314 mon.storage-ru1-osd1 mon.0
>> 185.164.149.2:6789/0 553900 : cluster [WRN] Health check update: 3 slow
>> requests are blocked > 32 sec (REQUEST_SLOW)
>> 2018-05-14 13:47:01.030686 mon.storage-ru1-osd1 mon.0
>> 185.164.149.2:6789/0 553901 : cluster [WRN] Health check update: 4 slow
>> requests are blocked > 32 sec (REQUEST_SLOW)
>> 2018-05-14 13:47:07.764236 mon.storage-ru1-osd1 mon.0
>> 185.164.149.2:6789/0 553903 : cluster [WRN] Health check update: 32 slow
>> requests are blocked > 32 sec (REQUEST_SLOW)
>> 2018-05-14 13:47:13.770833 mon.storage-ru1-osd1 mon.0
>> 185.164.149.2:6789/0 553904 : cluster [WRN] Health check update: 21 slow
>> requests are blocked > 32 sec (REQUEST_SLOW)
>> 2018-05-14 13:47:17.774530 mon.storage-ru1-osd1 mon.0
>> 185.164.149.2:6789/0 553905 : cluster [INF] Health check cleared:
>> REQUEST_SLOW (was: 12 slow requests are blocked > 32 sec)
>> 2018-05-14 13:47:17.774582 mon.storage-ru1-osd1 mon.0
>> 185.164.149.2:6789/0 553906 : cluster [INF] Cluster is now healthy
>>
>> At 13-47 I had a problem with osd.21
>>
>> 1. Ceph Health (storage-ru1-osd1.voximplant.com:ceph.health): HEALTH_WARN
>> {u'REQUEST_SLOW': {u'severity': u'HEALTH_WARN', u'summary': {u'message': u'4 
>> slow requests are blocked > 32 sec'}}}
>> HEALTH_WARN 4 slow requests are blocked > 32 sec
>> REQUEST_SLOW 4 slow requests are blocked > 32 sec
>> 2 ops are blocked > 65.536 sec
>> 2 ops are blocked > 32.768 sec
>> osd.21 has blocked requests > 65.536 sec
>>
>> Here is a part from ceph-osd.21.log
>>
>> 2018-05-14 13:47:06.891399 7fb806dd6700 10 osd.21 pg_epoch: 236 pg[2.0( v
>> 236'297 (0'0,236'297] local-lis/les=223/224 n=1 ec=119/119 lis/c 223/223
>> les/c/f 224/224/0 223/223/212) [21,29,15]
>> r=0 lpr=223 crt=236'297 lcod 236'296 mlcod 236'296 active+clean]
>> dropping ondisk_read_lock
>> 2018-05-14 13:47:06.891435 7fb806dd6700 10 osd.21 236 dequeue_op
>> 0x56453b753f80 finish
>> 2018-05-14 13:47:07.111388 7fb8185f9700 10 osd.21 236 tick
>> 2018-05-14 13:47:07.111398 7fb8185f9700 10 osd.21 236 do_waiters -- start
>> 2018-05-14 13:47:07.111401 7fb8185f9700 10 osd.21 236 do_waiters -- finish
>> 2018-05-14 13:47:07.800421 7fb817df8700 10 osd.21 236
>> tick_without_osd_lock
>> 2018-05-14 13:47:07.800444 7fb817df8700 10 osd.21 236
>> promote_throttle_recalibrate 0 attempts, promoted 0 objects and 0  bytes;
>> target 25 obj/sec or 5120 k bytes/sec
>> 2018-05-14 13:47:07.800449 7fb817df8700 10 osd.21 236
>> promote_throttle_recalibrate  actual 0, actual/prob ratio 1, adjusted
>> new_prob 1000, prob 1000 -> 1000
>> 2018-05-14 13:47:08.111470 7fb8185f9700 10 osd.21 236 tick
>> 2018-05-14 13:47:08.111483 7fb8185f9700 10 osd.21 236 do_waiters -- start
>> 2018-05-14 13:47:08.111485 7fb8185f9700 10 osd.21 236 do_waiters -- finish
>> 2018-05-14 13:47:08.181070 7fb8055d3700 10 osd.21 236 dequeue_op
>> 0x564539651000 prio 63 cost 0 latency 0.000143
>> osd_op(client.2597258.0:213844298 6.1d4

[ceph-users] Too many active mds servers

2018-05-15 Thread Thomas Bennett
Hi,

I'm running Luminous 12.2.5 and I'm testing cephfs.

However, I seem to have too many active mds servers on my test cluster.

How do I set one of my mds servers to become standby?

I've run ceph fs set cephfs max_mds 2, which set max_mds from 3 to 2 but had
no effect on my running configuration.

$ ceph status
  cluster:
id:
health: HEALTH_WARN
*insufficient standby MDS daemons available*

  services:
mon: 3 daemons, quorum mon1-c2-vm,mon2-c2-vm,mon3-c2-vm
mgr: mon2-c2-vm(active), standbys: mon1-c2-vm
mds: *cephfs-3/3/2 up
{0=mon1-c2-vm=up:active,1=mon3-c2-vm=up:active,2=mon2-c2-vm=up:active}*
osd: 250 osds: 250 up, 250 in
rgw: 2 daemons active

  data:
pools:   4 pools, 8456 pgs
objects: 13492 objects, 53703 MB
usage:   427 GB used, 1750 TB / 1751 TB avail
pgs: 8456 active+clean

$ ceph fs get cephfs
Filesystem 'cephfs' (1)
fs_name cephfs
epoch 187
flags c
created 2018-05-03 10:25:21.733597
modified 2018-05-03 10:25:21.733597
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
last_failure 0
last_failure_osd_epoch 1369
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
uses versioned encoding,6=dirfrag is stored in omap,8=no anchor
table,9=file layout v2}
*max_mds 2*
*in 0,1,2*
*up {0=43808,1=43955,2=27318}*
failed
damaged
stopped
data_pools [1,11]
metadata_pool 2
inline_data disabled
balancer
standby_count_wanted 1
43808: xx.xx.xx.xx:6800/3009065437 'mon1-c2-vm' mds.0.171 up:active seq 45
43955: xx.xx.xx.xx:6800/2947700655 'mon2-c2-vm' mds.1.174 up:active seq 28
27318: xx.xx.xx.xx:6800/652878628 'mon3-c2-vm' mds.2.177 up:active seq 8

Thanks,
Tom
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Too many active mds servers

2018-05-15 Thread Patrick Donnelly
Hello Thomas,

On Tue, May 15, 2018 at 2:35 PM, Thomas Bennett  wrote:
> Hi,
>
> I'm running Luminous 12.2.5 and I'm testing cephfs.
>
> However, I seem to have too many active mds servers on my test cluster.
>
> How do I set one of my mds servers to become standby?
>
> I've run ceph fs set cephfs max_mds 2 which set the max_mds from 3 to 2 but
> has no effect on my running configuration.

http://docs.ceph.com/docs/luminous/cephfs/multimds/#decreasing-the-number-of-ranks

Note: the behavior is changing in Mimic to be automatic after reducing max_mds.
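
(The short version of that page for Luminous, if I read it right: after
lowering max_mds the surplus rank still has to be deactivated by hand,
roughly:)

  ceph fs set cephfs max_mds 2
  ceph mds deactivate cephfs:2   # highest rank first; the daemon then returns to standby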

-- 
Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Too many active mds servers

2018-05-15 Thread Thomas Bennett
Hi Patrick,

Thanks! Much appreciated.

On Tue, 15 May 2018 at 14:52, Patrick Donnelly  wrote:

> Hello Thomas,
>
> On Tue, May 15, 2018 at 2:35 PM, Thomas Bennett  wrote:
> > Hi,
> >
> > I'm running Luminous 12.2.5 and I'm testing cephfs.
> >
> > However, I seem to have too many active mds servers on my test cluster.
> >
> > How do I set one of my mds servers to become standby?
> >
> > I've run ceph fs set cephfs max_mds 2 which set the max_mds from 3 to 2
> but
> > has no effect on my running configuration.
>
>
> http://docs.ceph.com/docs/luminous/cephfs/multimds/#decreasing-the-number-of-ranks
>
> Note: the behavior is changing in Mimic to be automatic after reducing
> max_mds.
>
> --
> Patrick Donnelly
>
-- 
Thomas Bennett

SKA South Africa
Science Processing Team

Office: +27 21 5067341
Mobile: +27 79 5237105
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Luminous - OSD constantly crashing caused by corrupted placement group

2018-05-15 Thread Gregory Farnum
Looks like something went a little wrong with the snapshot metadata in that
PG. If the PG is still going active from the other copies, you're probably
best off using the ceph-objectstore-tool to remove it on the OSD that is
crashing. You could either replace it with an export from one of the other
nodes, or let Ceph do the backfilling on its own.
-Greg
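
(The ceph-objectstore-tool workflow sketched above looks roughly like the
following; the osd ids and paths are examples, the OSD being operated on must
be stopped while the tool runs, and filestore OSDs may additionally need
--journal-path:)

  # optionally take an export from a healthy replica first
  systemctl stop ceph-osd@12
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --pgid 5.9b --op export --file /tmp/pg5.9b.export
  systemctl start ceph-osd@12

  # on the crashing OSD: remove the bad copy, then either import the export
  # or simply restart and let backfill rebuild it from the other replicas
  systemctl stop ceph-osd@7
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --pgid 5.9b --op remove
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --pgid 5.9b --op import --file /tmp/pg5.9b.export
  systemctl start ceph-osd@7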

On Tue, May 15, 2018 at 2:13 AM Siegfried Höllrigl <
siegfried.hoellr...@xidras.com> wrote:

>
>
> Hi !
>
> We have upgraded our Ceph cluster (3 Mon Servers, 9 OSD Servers, 190
> OSDs total) from 10.2.10 to Ceph 12.2.4 and then to 12.2.5.
> (A mixture of Ubuntu 14 and 16 with the Repos from
> https://download.ceph.com/debian-luminous/)
>
> Now we have the problem that one OSD is crashing again and again
> (approx. once per day); systemd restarts it.
>
> We could now probably identify the problem. It looks like one placement
> group (5.9b) causes the crash.
> It seems like it doesn't matter whether it is running on a filestore or a
> bluestore osd.
> We could even break it down to some RBDs that were in this pool.
> They are already deleted, but it looks like there are some objects left on
> the osd, but we can't delete them:
>
>
> rados -p rbd ls > radosrbdls.txt
> echo radosrbdls.txt | grep -vE "($(rados -p rbd ls | grep rbd_header |
> grep -o "\.[0-9a-f]*" | sed -e :a -e '$!N; s/\n/|/; ta' -e
> 's/\./\\./g'))" | grep -E '(rbd_data|journal|rbd_object_map)'
> rbd_data.112913b238e1f29.0e3f
> rbd_data.112913b238e1f29.09d2
> rbd_data.112913b238e1f29.0ba3
>
> rados -p rbd rm rbd_data.112913b238e1f29.0e3f
> error removing rbd>rbd_data.112913b238e1f29.0e3f: (2) No
> such file or directory
> rados -p rbd rm rbd_data.112913b238e1f29.09d2
> error removing rbd>rbd_data.112913b238e1f29.09d2: (2) No
> such file or directory
> rados -p rbd rm rbd_data.112913b238e1f29.0ba3
> error removing rbd>rbd_data.112913b238e1f29.0ba3: (2) No
> such file or directory
>
> In the "current" directory of the osd there are a lot more files with
> this rbd prefix.
> Is there any chance to delete this obviously orphaned data before the
> pg becomes healthy?
> (it is running now at only 2 of 3 osds)
>
> What else could cause such a crash ?
>
>
> We attach (hopefully all of) the relevant logs.
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] jewel to luminous upgrade, chooseleaf_vary_r and chooseleaf_stable

2018-05-15 Thread Adrian
Thanks Dan,

After talking it through we've decided to adopt your approach too and leave
the tunables till after the upgrade.

Regards,
Adrian.

On Mon, May 14, 2018 at 5:14 PM, Dan van der Ster 
wrote:

> Hi Adrian,
>
> Is there a strict reason why you *must* upgrade the tunables?
>
> It is normally OK to run with old (e.g. hammer) tunables on a luminous
> cluster. The crush placement won't be state of the art, but that's not
> a huge problem.
>
> We have a lot of data in a jewel cluster with hammer tunables. We'll
> upgrade that to luminous soon, but don't plan to set chooseleaf_stable
> until there's less disruptive procedure, e.g.  [1].
>
> Cheers, Dan
>
> [1] One idea I had to make this much less disruptive would be to
> script something that uses upmap's to lock all PGs into their current
> placement, then set chooseleaf_stable, then gradually remove the
> upmap's. There are some details to work out, and it requires all
> clients to be running luminous, but I think something like this could
> help...
>
>
>
>
> On Mon, May 14, 2018 at 9:01 AM, Adrian  wrote:
> > Hi all,
> >
> > We recently upgraded our old ceph cluster to jewel (5xmon, 21xstorage
> hosts
> > with 9x6tb filestore osds and 3xssd's with 3 journals on each) - mostly
> used
> > for openstack compute/cinder.
> >
> > In order to get there we had to go with chooseleaf_vary_r = 4 in order to
> > minimize client impact and save time. We now need to get to luminous (on
> a
> > deadline and time is limited).
> >
> > Current tunables are:
> >   {
> >   "choose_local_tries": 0,
> >   "choose_local_fallback_tries": 0,
> >   "choose_total_tries": 50,
> >   "chooseleaf_descend_once": 1,
> >   "chooseleaf_vary_r": 4,
> >   "chooseleaf_stable": 0,
> >   "straw_calc_version": 1,
> >   "allowed_bucket_algs": 22,
> >   "profile": "unknown",
> >   "optimal_tunables": 0,
> >   "legacy_tunables": 0,
> >   "minimum_required_version": "firefly",
> >   "require_feature_tunables": 1,
> >   "require_feature_tunables2": 1,
> >   "has_v2_rules": 0,
> >   "require_feature_tunables3": 1,
> >   "has_v3_rules": 0,
> >   "has_v4_buckets": 0,
> >   "require_feature_tunables5": 0,
> >   "has_v5_rules": 0
> >   }
> >
> > Setting chooseleaf_stable to 1, the crush compare tool says:
> >Replacing the crushmap specified with --origin with the crushmap
> >   specified with --destination will move 8774 PGs (59.08417508417509% of
> the
> > total)
> >   from one item to another.
> >
> > Current tunings we have in ceph.conf are:
> >   #THROTTLING CEPH
> >   osd_max_backfills = 1
> >   osd_recovery_max_active = 1
> >   osd_recovery_op_priority = 1
> >   osd_client_op_priority = 63
> >
> >   #PERFORMANCE TUNING
> >   osd_op_threads = 6
> >   filestore_op_threads = 10
> >   filestore_max_sync_interval = 30
> >
> > I was wondering if anyone has any advice as to anything else we can do
> > balancing client impact and speed of recovery or war stories of other
> things
> > to consider.
> >
> > I'm also wondering about the interplay between chooseleaf_vary_r and
> > chooseleaf_stable.
> > Are we better with
> > 1) sticking with choosleaf_vary_r = 4, setting chooseleaf_stable =1,
> > upgrading and then setting chooseleaf_vary_r incrementally to 1 when more
> > time is available
> > or
> > 2) setting chooseleaf_vary_r incrementally first, then chooseleaf_stable
> and
> > finally upgrade
> >
> > All this bearing in mind we'd like to keep the time it takes us to get to
> > luminous as short as possible ;-) (guestimating a 59% rebalance to take
> many
> > days)
> >
> > Any advice/thoughts gratefully received.
> >
> > Regards,
> > Adrian.
> >
> > --
> > ---
> > Adrian : aussie...@gmail.com
> > If violence doesn't solve your problem, you're not using enough of it.
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>



-- 
---
Adrian : aussie...@gmail.com
If violence doesn't solve your problem, you're not using enough of it.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com