Re: [ceph-users] Ceph 10.1.1 rbd map fail

2016-06-21 Thread 王海涛
I find this message in dmesg:
[83090.212918] libceph: mon0 192.168.159.128:6789 feature set mismatch, my 
4a042a42 < server's 2004a042a42, missing 200


According to
"http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client",
this could mean that I need to upgrade the kernel client to 3.15 or disable the
tunables3 features.
Upgrading the kernel is not convenient for us.
Could you tell me how to disable the tunables3 features?
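For reference, disabling the tunables3 requirement generally means either reverting to an
older tunables profile or clearing chooseleaf_vary_r in the CRUSH map; a rough sketch
(untested here, and it will trigger data movement, so check before running it):

# Option 1: revert to a tunables profile that predates tunables3
ceph osd crush tunables bobtail

# Option 2: clear only the offending tunable in a decompiled CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
#   in crushmap.txt set:  tunable chooseleaf_vary_r 0
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new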


Thanks!


Kind Regards,
Haitao Wang

At 2016-06-22 12:33:42, "Brad Hubbard"  wrote:
>On Wed, Jun 22, 2016 at 1:35 PM, 王海涛  wrote:
>> Hi All
>>
>> I'm using ceph-10.1.1 to map an RBD image, but it doesn't work. The error
>> messages are:
>>
>> root@heaven:~#rbd map rbd/myimage --id admin
>> 2016-06-22 11:16:34.546623 7fc87ca53d80 -1 WARNING: the following dangerous
>> and experimental features are enabled: bluestore,rocksdb
>> 2016-06-22 11:16:34.547166 7fc87ca53d80 -1 WARNING: the following dangerous
>> and experimental features are enabled: bluestore,rocksdb
>> 2016-06-22 11:16:34.549018 7fc87ca53d80 -1 WARNING: the following dangerous
>> and experimental features are enabled: bluestore,rocksdb
>> rbd: sysfs write failed
>> rbd: map failed: (5) Input/output error
>
>Anything in dmesg, or anywhere, about "feature set mismatch" ?
>
>http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client
>
>>
>> Could someone tell me what's wrong?
>> Thanks!
>>
>> Kind Regards,
>> Haitao Wang
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
>-- 
>Cheers,
>Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph 10.1.1 rbd map fail

2016-06-21 Thread Brad Hubbard
On Wed, Jun 22, 2016 at 1:35 PM, 王海涛  wrote:
> Hi All
>
> I'm using ceph-10.1.1 to map an RBD image, but it doesn't work. The error
> messages are:
>
> root@heaven:~#rbd map rbd/myimage --id admin
> 2016-06-22 11:16:34.546623 7fc87ca53d80 -1 WARNING: the following dangerous
> and experimental features are enabled: bluestore,rocksdb
> 2016-06-22 11:16:34.547166 7fc87ca53d80 -1 WARNING: the following dangerous
> and experimental features are enabled: bluestore,rocksdb
> 2016-06-22 11:16:34.549018 7fc87ca53d80 -1 WARNING: the following dangerous
> and experimental features are enabled: bluestore,rocksdb
> rbd: sysfs write failed
> rbd: map failed: (5) Input/output error

Anything in dmesg, or anywhere, about "feature set mismatch" ?

http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client

>
> Could someone tell me what's wrong?
> Thanks!
>
> Kind Regards,
> Haitao Wang
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] performance issue with jewel on ubuntu xenial (kernel)

2016-06-21 Thread Florian Haas
Hi Yoann,

On Tue, Jun 21, 2016 at 3:11 PM, Yoann Moulin  wrote:
> Hello,
>
> I found a performance drop between kernel 3.13.0-88 (default kernel on Ubuntu
> Trusty 14.04) and kernel 4.4.0.24.14 (default kernel on Ubuntu Xenial 16.04)
>
> ceph version is Jewel (10.2.2).
> All tests have been done under Ubuntu 14.04

Knowing that you also have an Infernalis cluster on almost identical
hardware, can you please let the list know whether you see the same
behavior (severely reduced throughput on a 4.4 kernel, vs. 3.13) on
that cluster as well?

Thank you.

Cheers,
Florian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph 10.1.1 rbd map fail

2016-06-21 Thread 王海涛
Hi All

I'm using ceph-10.1.1 to map an RBD image, but it doesn't work. The error
messages are:
root@heaven:~# rbd map rbd/myimage --id admin
2016-06-22 11:16:34.546623 7fc87ca53d80 -1 WARNING: the following dangerous and 
experimental features are enabled: bluestore,rocksdb
2016-06-22 11:16:34.547166 7fc87ca53d80 -1 WARNING: the following dangerous and 
experimental features are enabled: bluestore,rocksdb
2016-06-22 11:16:34.549018 7fc87ca53d80 -1 WARNING: the following dangerous and 
experimental features are enabled: bluestore,rocksdb
rbd: sysfs write failed
rbd: map failed: (5) Input/output error
Could someone tell me what's wrong?
Thanks!

Kind Regards,
Haitao Wang

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Performance vs Entry Level San Arrays

2016-06-21 Thread Christian Balzer

Hello,

On Wed, 22 Jun 2016 11:09:46 +1200 Denver Williams wrote:

> Hi All
> 
> 
> I'm planning an OpenStack private cloud deployment and I'm trying to
> decide what would be the better option.
> 
> What would the performance advantages/disadvantages be when comparing a
> 3-node Ceph setup with 15K/12G SAS drives in an HP DL380p Gen8 server with
> SSDs for write cache, compared to something like an HP MSA 2040 10GbE
> iSCSI array? All network connections would be 10GbE.
>
It's a very complex question, and it's easy to compare apples and oranges
here as well.

For starters, I have no experience with that (or any similar) SAN and all
my iSCSI experiences are purely based on tests, not production
environments (and thus performance numbers).

I'd also pit non-HP boxes (unless you get a massive discount from them of
course) against the SAN, both for cost and design flexibility.
And 15k or not, 12Gb/s SAS is overkill in my book for anything but SSDs.

That all being said, I'd venture the SAN will win performance-wise: its 4GB
of HW cache on the RAID controllers can mask RAID6 performance drops, and if
you deploy RAID 10 and tiering to SSDs with it, that should only get better.
There's a reason I deploy my mailbox servers as DRBD/Pacemaker cluster
pairs and not with Ceph as backing storage.

3 Ceph storage nodes will give you the capacity of just one due to
replication and you incur the latency penalty associated with that as
well. 

Ceph could outgrow and potentially outperform that SAN (in its maximum
configuration), but clearly you're not looking for that.

Ceph also has potentially more resilience, but that's not a performance
question either.

It would be helpful to put a little more meat on that question, as in:

- What are your needs (space, IOPS)?
- What are the costs for either solution? (get a quote from HP)


Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Performance vs Entry Level San Arrays

2016-06-21 Thread Denver Williams
Hi All


I'm planning an OpenStack private cloud deployment and I'm trying to
decide what would be the better option.

What would the performance advantages/disadvantages be when comparing a
3-node Ceph setup with 15K/12G SAS drives in an HP DL380p Gen8 server with
SSDs for write cache, compared to something like an HP MSA 2040 10GbE
iSCSI array? All network connections would be 10GbE.


Kind Regards,
Denver Williams

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow request, waiting for rw locks / subops from osd doing deep scrub of pg in rgw.buckets.index

2016-06-21 Thread Samuel Just
.rgw.bucket.index.pool is the pool with rgw's index objects, right?
The actual on-disk directory for one of those pgs would contain only
empty files -- the actual index data is stored in the osd's leveldb
instance.  I suspect your index objects are very large (because the
buckets contain many objects) and are taking a long time to scrub.
iirc, there is a way to make rgw split up those index objects into
smaller ones.
-Sam
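For reference, a sketch of the knobs usually meant here (names and values are
illustrative, and the config option only affects buckets created after it is set):

# in ceph.conf on the RGW hosts (section name depends on the deployment),
# then restart radosgw; applies to newly created buckets only
[client.radosgw.gateway]
    rgw override bucket index max shards = 16

# later releases also ship an offline reshard for existing buckets; check
# whether your version has it before relying on it:
#   radosgw-admin bucket reshard --bucket=<name> --num-shards=16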

On Tue, Jun 21, 2016 at 11:58 AM, Trygve Vea
 wrote:
> Hi,
>
> I believe I've stumbled on a bug in Ceph, and I'm currently trying to figure 
> out if this is a new bug, some behaviour caused by our cluster being in the 
> midst of a hammer(0.94.6)->jewel(10.2.2) upgrade, or other factors.
>
> The state of the cluster at the time of the incident:
>
> - All monitor nodes are running 10.2.2.
> - One OSD-server (4 osds) is up with 10.2.2 and with all pg's in active+clean.
> - One OSD-server (4 osds) is up with 10.2.2 and undergoing backfills 
> (however: nobackfill was set, as we try to keep backfills running during 
> night time).
>
> We have 4 OSD-servers with 4 osds each with 0.94.6.
> We have 3 OSD-servers with 2 osds each with 0.94.6.
>
>
> We experienced something that heavily affected our RGW-users.  Some requests 
> interfacing with 0.94.6 nodes were slow.
>
> During a 10 minute window, our RGW-nodes ran out of available workers and 
> ceased to respond.
>
> Some nodes logged some lines like these (only 0.94.6 nodes):
>
> 2016-06-21 09:51:08.053886 7f54610d8700  0 log_channel(cluster) log [WRN] : 2 
> slow requests, 1 included below; oldest blocked for > 74.368036 secs
> 2016-06-21 09:51:08.053951 7f54610d8700  0 log_channel(cluster) log [WRN] : 
> slow request 30.056333 seconds old, received at 2016-06-21 09:50:37.997327: 
> osd_op(client.9433496.0:1089298249 somergwuser.buckets [call 
> user.set_buckets_info] 12.da8df901 ondisk+write+known_if_redirected e9906) 
> currently waiting for rw locks
>
>
> Some nodes logged some lines like these (there were some, but not 100% 
> overlap between osds that logged these and the aforementioned lines - only 
> 0.94.6 nodes):
>
> 2016-06-21 09:51:48.677474 7f8cb6628700  0 log_channel(cluster) log [WRN] : 2 
> slow requests, 1 included below; oldest blocked for > 42.033650 secs
> 2016-06-21 09:51:48.677565 7f8cb6628700  0 log_channel(cluster) log [WRN] : 
> slow request 30.371173 seconds old, received at 2016-06-21 09:51:18.305770: 
> osd_op(client.9525441.0:764274789 gc.1164 [call lock.lock] 7.7b4f1779 
> ondisk+write+known_if_redirected e9906) currently waiting for subops from 
> 40,50
>
> All of the osds that logged these lines, were waiting for subops from osd.50
>
>
> Investigating what's going on this osd during that window:
>
> 2016-06-21 09:48:22.064630 7f1cbb41d700  0 log_channel(cluster) log [INF] : 
> 5.b5 deep-scrub starts
> 2016-06-21 09:59:56.640012 7f1c90163700  0 -- 10.21.9.22:6800/2003521 >> 
> 10.20.9.21:6805/7755 pipe(0x1e47a000 sd=298 :39448 s=2 pgs=23 cs=1 l=0 
> c=0x1033ba20).fault with nothing to send, going to standby
> 2016-06-21 09:59:56.997763 7f1c700f8700  0 -- 10.21.9.22:6808/3521 >> 
> 10.21.9.12:0/1028533 pipe(0x1f30f000 sd=87 :6808 s=0 pgs=0 cs=0 l=1 
> c=0x743c840).accept replacing existing (lossy) channel (new one lossy=1)
> 2016-06-21 10:00:39.938700 7f1cd9828700  0 log_channel(cluster) log [WRN] : 
> 33 slow requests, 33 included below; oldest blocked for > 727.862759 secs
> 2016-06-21 10:00:39.938708 7f1cd9828700  0 log_channel(cluster) log [WRN] : 
> slow request 670.918857 seconds old, received at 2016-06-21 09:49:29.019653: 
> osd_op(client.9403437.0:1209613500 TZ1A91MYDE1LO63AQCM3 [getxattrs,stat] 
> 9.442585e6 ack+read+known_if_redirected e9906) currently no flag points 
> reached
> 2016-06-21 10:00:39.938800 7f1cd9828700  0 log_channel(cluster) log [WRN] : 
> slow request 689.815851 seconds old, received at 2016-06-21 09:49:10.122660: 
> osd_op(client.9403437.0:1209611533 TZ1A91MYDE1LO63AQCM3 [getxattrs,stat] 
> 9.442585e6 ack+read+known_if_redirected e9906) currently no flag points 
> reached
> 2016-06-21 10:00:39.938807 7f1cd9828700  0 log_channel(cluster) log [WRN] : 
> slow request 670.895353 seconds old, received at 2016-06-21 09:49:29.043158: 
> osd_op(client.9403437.0:1209613505 prod.arkham [call 
> version.read,getxattrs,stat] 2.4da23de6 ack+read+known_if_redirected e9906) 
> currently no flag points reached
> 2016-06-21 10:00:39.938810 7f1cd9828700  0 log_channel(cluster) log [WRN] : 
> slow request 688.612303 seconds old, received at 2016-06-21 09:49:11.326207: 
> osd_op(client.20712623.0:137251515 TZ1A91MYDE1LO63AQCM3 [getxattrs,stat] 
> 9.442585e6 ack+read+known_if_redirected e9906) currently no flag points 
> reached
> 2016-06-21 10:00:39.938813 7f1cd9828700  0 log_channel(cluster) log [WRN] : 
> slow request 658.605163 seconds old, received at 2016-06-21 09:49:41.48: 
> osd_op(client.20712623.0:137254412 TZ1A91MYDE1LO63AQCM3 [getxattrs,stat] 
> 

[ceph-users] Bluestore Backend Tech Talk

2016-06-21 Thread Patrick McGarry
Hey cephers,

Just a reminder, the Bluestore backend Ceph Tech Talk by Sage is going
to be starting in ~10m. Feel free to dial in and ask questions.
Thanks.

http://ceph.com/ceph-tech-talks/



-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] slow request, waiting for rw locks / subops from osd doing deep scrub of pg in rgw.buckets.index

2016-06-21 Thread Trygve Vea
Hi,

I believe I've stumbled on a bug in Ceph, and I'm currently trying to figure 
out if this is a new bug, some behaviour caused by our cluster being in the 
midst of a hammer(0.94.6)->jewel(10.2.2) upgrade, or other factors.

The state of the cluster at the time of the incident:

- All monitor nodes are running 10.2.2.
- One OSD-server (4 osds) is up with 10.2.2 and with all pg's in active+clean.
- One OSD-server (4 osds) is up with 10.2.2 and undergoing backfills (however: 
nobackfill was set, as we try to keep backfills running during night time).

We have 4 OSD-servers with 4 osds each with 0.94.6.
We have 3 OSD-servers with 2 osds each with 0.94.6.


We experienced something that heavily affected our RGW-users.  Some requests 
interfacing with 0.94.6 nodes were slow.  

During a 10 minute window, our RGW-nodes ran out of available workers and 
ceased to respond.

Some nodes logged some lines like these (only 0.94.6 nodes):

2016-06-21 09:51:08.053886 7f54610d8700  0 log_channel(cluster) log [WRN] : 2 
slow requests, 1 included below; oldest blocked for > 74.368036 secs
2016-06-21 09:51:08.053951 7f54610d8700  0 log_channel(cluster) log [WRN] : 
slow request 30.056333 seconds old, received at 2016-06-21 09:50:37.997327: 
osd_op(client.9433496.0:1089298249 somergwuser.buckets [call 
user.set_buckets_info] 12.da8df901 ondisk+write+known_if_redirected e9906) 
currently waiting for rw locks


Some nodes logged some lines like these (there were some, but not 100% overlap 
between osds that logged these and the aforementioned lines - only 0.94.6 
nodes):

2016-06-21 09:51:48.677474 7f8cb6628700  0 log_channel(cluster) log [WRN] : 2 
slow requests, 1 included below; oldest blocked for > 42.033650 secs
2016-06-21 09:51:48.677565 7f8cb6628700  0 log_channel(cluster) log [WRN] : 
slow request 30.371173 seconds old, received at 2016-06-21 09:51:18.305770: 
osd_op(client.9525441.0:764274789 gc.1164 [call lock.lock] 7.7b4f1779 
ondisk+write+known_if_redirected e9906) currently waiting for subops from 40,50

All of the osds that logged these lines were waiting for subops from osd.50


Investigating what's going on this osd during that window:

2016-06-21 09:48:22.064630 7f1cbb41d700  0 log_channel(cluster) log [INF] : 
5.b5 deep-scrub starts
2016-06-21 09:59:56.640012 7f1c90163700  0 -- 10.21.9.22:6800/2003521 >> 
10.20.9.21:6805/7755 pipe(0x1e47a000 sd=298 :39448 s=2 pgs=23 cs=1 l=0 
c=0x1033ba20).fault with nothing to send, going to standby
2016-06-21 09:59:56.997763 7f1c700f8700  0 -- 10.21.9.22:6808/3521 >> 
10.21.9.12:0/1028533 pipe(0x1f30f000 sd=87 :6808 s=0 pgs=0 cs=0 l=1 
c=0x743c840).accept replacing existing (lossy) channel (new one lossy=1)
2016-06-21 10:00:39.938700 7f1cd9828700  0 log_channel(cluster) log [WRN] : 33 
slow requests, 33 included below; oldest blocked for > 727.862759 secs
2016-06-21 10:00:39.938708 7f1cd9828700  0 log_channel(cluster) log [WRN] : 
slow request 670.918857 seconds old, received at 2016-06-21 09:49:29.019653: 
osd_op(client.9403437.0:1209613500 TZ1A91MYDE1LO63AQCM3 [getxattrs,stat] 
9.442585e6 ack+read+known_if_redirected e9906) currently no flag points reached
2016-06-21 10:00:39.938800 7f1cd9828700  0 log_channel(cluster) log [WRN] : 
slow request 689.815851 seconds old, received at 2016-06-21 09:49:10.122660: 
osd_op(client.9403437.0:1209611533 TZ1A91MYDE1LO63AQCM3 [getxattrs,stat] 
9.442585e6 ack+read+known_if_redirected e9906) currently no flag points reached
2016-06-21 10:00:39.938807 7f1cd9828700  0 log_channel(cluster) log [WRN] : 
slow request 670.895353 seconds old, received at 2016-06-21 09:49:29.043158: 
osd_op(client.9403437.0:1209613505 prod.arkham [call 
version.read,getxattrs,stat] 2.4da23de6 ack+read+known_if_redirected e9906) 
currently no flag points reached
2016-06-21 10:00:39.938810 7f1cd9828700  0 log_channel(cluster) log [WRN] : 
slow request 688.612303 seconds old, received at 2016-06-21 09:49:11.326207: 
osd_op(client.20712623.0:137251515 TZ1A91MYDE1LO63AQCM3 [getxattrs,stat] 
9.442585e6 ack+read+known_if_redirected e9906) currently no flag points reached
2016-06-21 10:00:39.938813 7f1cd9828700  0 log_channel(cluster) log [WRN] : 
slow request 658.605163 seconds old, received at 2016-06-21 09:49:41.48: 
osd_op(client.20712623.0:137254412 TZ1A91MYDE1LO63AQCM3 [getxattrs,stat] 
9.442585e6 ack+read+known_if_redirected e9906) currently no flag points reached
2016-06-21 10:00:39.960300 7f1cbb41d700  0 log_channel(cluster) log [INF] : 
5.b5 deep-scrub ok

Looking at the contents of 5.b5 (which is in our .rgw.buckets.index pool, if
relevant), it's almost empty (12KB of files on the disk), so I find it unlikely
for a scrub to take that long.  Which is why I suspect we've run into a bug.

With the information I've provided here, can anyone shed some light on what
this may be? And if it's a bug that is not fixed in HEAD, what information
would be useful to include in a bug report?


Regards
-- 
Trygve Vea

Re: [ceph-users] Issue installing ceph with ceph-deploy

2016-06-21 Thread Vasu Kulkarni
On Tue, Jun 21, 2016 at 8:16 AM, shane  wrote:
> Fran Barrera  writes:
>
>>
>> Hi all,
>> I have a problem installing ceph jewel with ceph-deploy (1.5.33) on ubuntu
> 14.04.4 (openstack instance).
>>
>> This is my setup:
>>
>>
>> ceph-admin
>>
>> ceph-mon
>> ceph-osd-1
>> ceph-osd-2
>>
>>
>> I've followed these steps from the ceph-admin node:
>>
>> I have the user "ceph" created on all nodes and access via SSH key.
>>
>>
>> 1. # wget -q -O-
> 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key
> add -
>>
>> 2. # echo deb http://download.ceph.com/debian-jewel/ $(lsb_release -sc)
> main | tee /etc/apt/sources.list.d/ceph.list
>> 3. # apt-get update
>> 4. # apt-get install ceph-deploy
>> 5. $ ceph-deploy new ceph-mon
>> 6. Modify ceph.conf and add "osd_pool_default_size = 2"
>> 7. $ ceph-deploy install ceph-admin ceph-mon ceph-osd-1 ceph-osd-2
>>
>> And this is the output:
>>
>> [ceph-admin][DEBUG ] Setting up ceph-common (10.2.1-1trusty) ...
>> [ceph-admin][DEBUG ] Setting system user ceph properties..Processing
> triggers for libc-bin (2.19-0ubuntu6.9) ...
>> [ceph-admin][WARNIN] usermod: user ceph is currently used by process 1303
>> [ceph-admin][WARNIN] dpkg: error processing package ceph-common 
>> (--configure):
>> [ceph-admin][WARNIN]  subprocess installed post-installation script
> returned error exit status 8
>> [ceph-admin][WARNIN] dpkg: dependency problems prevent configuration of
> ceph-base:
>> [ceph-admin][WARNIN]  ceph-base depends on ceph-common (= 10.2.1-1trusty);
> however:
>> [ceph-admin][WARNIN]   Package ceph-common is not configured yet.
>> [ceph-admin][WARNIN]
>> [ceph-admin][WARNIN] dpkg: error processing package ceph-base (--configure):
>> [ceph-admin][WARNIN]  dependency problems - leaving unconfigured
>> [ceph-admin][WARNIN] dpkg: dependency problems prevent configuration of
> ceph-mon:
>> [ceph-admin][WARNIN]  ceph-mon depends on ceph-base (= 10.2.1-1trusty);
> however:
>> [ceph-admin][WARNIN]   Package ceph-base is not configured yet.
>> [ceph-admin][WARNIN]
>> [ceph-admin][WARNIN] dpkg: error processing package ceph-mon (--configure):
>> [ceph-admin][WARNIN]  dependency problems - leaving unconfigured
>> [ceph-admin][WARNIN] dpkg: dependency problems prevent configuration of
> ceph-osd:
>> [ceph-admin][WARNIN]  ceph-osd depends on ceph-base (= 10.2.1-1trusty);
> however:
>> [ceph-admin][WARNIN]   Package ceph-base is not configured yet.
>> [ceph-admin][WARNIN]
>> [ceph-admin][WARNIN] dpkg: error processing package ceph-osd (--configure):
>> [ceph-admin][WARNIN]  dependency problems - leaving unconfigured
>> [ceph-admin][WARNIN] dpkg: dependency problems prevent configuration of ceph:
>> [ceph-admin][WARNIN]  ceph depends on ceph-mon (= 10.2.1-1trusty); however:
>> [ceph-admin][WARNIN]   Package ceph-mon is not configured yet.
>> [ceph-admin][WARNIN]  ceph depends on ceph-osd (= 10.2.1-1trusty); however:
>> [ceph-admin][WARNIN]   Package ceph-osd is not configured yet.
>> [ceph-admin][WARNIN]
>> [ceph-admin][WARNIN] dpkg: error processing package ceph (--configure):
>> [ceph-admin][WARNIN]  dependency problems - leaving unconfigured
>> [ceph-admin][WARNIN] dpkg: dependency problems prevent configuration of
> ceph-mds:
>> [ceph-admin][WARNIN]  ceph-mds depends on ceph-base (= 10.2.1-1trusty);
> however:
>> [ceph-admin][WARNIN]   Package ceph-base is not configured yet.
>> [ceph-admin][WARNIN]
>> [ceph-admin][WARNIN] dpkg: error processing package ceph-mds (--configure):
>> [ceph-admin][WARNIN]  dependency problems - leaving unconfigured
>> [ceph-admin][WARNIN] dpkg: dependency problems prevent configuration of
> radosgw:
>> [ceph-admin][WARNIN]  radosgw depends on ceph-common (= 10.2.1-1trusty);
> however:
>> [ceph-admin][WARNIN]   Package ceph-common is not configured yet.
>> [ceph-admin][WARNIN]
>> [ceph-admin][WARNIN] dpkg: error processing package radosgw (--configure):
>> [ceph-admin][WARNIN]  dependency problems - leaving unconfigured
>> [ceph-admin][WARNIN] No apport report written because the error message
> indicates its a followup error from a previous failure.
>> [ceph-admin][WARNIN] No apport report written because the error message
> indicates its a followup error from a previous failure.
>> [ceph-admin][WARNIN] No apport report written because MaxReports is
> reached already
>> [ceph-admin][WARNIN] No apport report written because MaxReports is
> reached already
>> [ceph-admin][WARNIN] No apport report written because MaxReports is
> reached already
>> [ceph-admin][WARNIN] No apport report written because MaxReports is
> reached already
>> [ceph-admin][DEBUG ] Processing triggers for ureadahead (0.100.0-16) ...
>> [ceph-admin][WARNIN] Errors were encountered while processing:
>> [ceph-admin][WARNIN]  ceph-common
>> [ceph-admin][WARNIN]  ceph-base
>> [ceph-admin][WARNIN]  ceph-mon
>> [ceph-admin][WARNIN]  ceph-osd
>> [ceph-admin][WARNIN]  ceph
>> [ceph-admin][WARNIN]  ceph-mds
>> [ceph-admin][WARNIN]  

Re: [ceph-users] Chown / symlink issues on download.ceph.com

2016-06-21 Thread Dan Mick
On 06/20/2016 12:54 AM, Wido den Hollander wrote:
> Hi Dan,
> 
> There seems to be a symlink issue on download.ceph.com:
> 
> # rsync -4 -avrn download.ceph.com::ceph /tmp|grep 'rpm-hammer/rhel7'
> rpm-hammer/rhel7 -> /home/dhc-user/repos/rpm-hammer/el7
> 
> Could you take a quick look at that? It breaks the syncs for all the other 
> mirrors who sync from download.ceph.com
> 
> Maybe do a chown (automated, cron?) as well to make sure all the files are 
> readable by rsync?
> 
> Thanks!
> 
> Wido
> 

I've just removed the symlink.  It probably was doing no good.

If there are further perm issues I don't see them, but let me know.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issue installing ceph with ceph-deploy

2016-06-21 Thread shane
Fran Barrera  writes:

> 
> Hi all,
> I have a problem installing ceph jewel with ceph-deploy (1.5.33) on ubuntu
14.04.4 (openstack instance).
> 
> This is my setup:
> 
> 
> ceph-admin
> 
> ceph-mon
> ceph-osd-1
> ceph-osd-2
> 
> 
> I've followed these steps from the ceph-admin node:
> 
> I have the user "ceph" created on all nodes and access via SSH key.
> 
> 
> 1. # wget -q -O-
'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key
add -
> 
> 2. # echo deb http://download.ceph.com/debian-jewel/ $(lsb_release -sc)
main | tee /etc/apt/sources.list.d/ceph.list
> 3. # apt-get update
> 4. # apt-get install ceph-deploy
> 5. $ ceph-deploy new ceph-mon
> 6. Modify ceph.conf and add "osd_pool_default_size = 2"
> 7. $ ceph-deploy install ceph-admin ceph-mon ceph-osd-1 ceph-osd-2
> 
> And this is the output:
> 
> [ceph-admin][DEBUG ] Setting up ceph-common (10.2.1-1trusty) ...
> [ceph-admin][DEBUG ] Setting system user ceph properties..Processing
triggers for libc-bin (2.19-0ubuntu6.9) ...
> [ceph-admin][WARNIN] usermod: user ceph is currently used by process 1303
> [ceph-admin][WARNIN] dpkg: error processing package ceph-common (--configure):
> [ceph-admin][WARNIN]  subprocess installed post-installation script
returned error exit status 8
> [ceph-admin][WARNIN] dpkg: dependency problems prevent configuration of
ceph-base:
> [ceph-admin][WARNIN]  ceph-base depends on ceph-common (= 10.2.1-1trusty);
however:
> [ceph-admin][WARNIN]   Package ceph-common is not configured yet.
> [ceph-admin][WARNIN]
> [ceph-admin][WARNIN] dpkg: error processing package ceph-base (--configure):
> [ceph-admin][WARNIN]  dependency problems - leaving unconfigured
> [ceph-admin][WARNIN] dpkg: dependency problems prevent configuration of
ceph-mon:
> [ceph-admin][WARNIN]  ceph-mon depends on ceph-base (= 10.2.1-1trusty);
however:
> [ceph-admin][WARNIN]   Package ceph-base is not configured yet.
> [ceph-admin][WARNIN]
> [ceph-admin][WARNIN] dpkg: error processing package ceph-mon (--configure):
> [ceph-admin][WARNIN]  dependency problems - leaving unconfigured
> [ceph-admin][WARNIN] dpkg: dependency problems prevent configuration of
ceph-osd:
> [ceph-admin][WARNIN]  ceph-osd depends on ceph-base (= 10.2.1-1trusty);
however:
> [ceph-admin][WARNIN]   Package ceph-base is not configured yet.
> [ceph-admin][WARNIN]
> [ceph-admin][WARNIN] dpkg: error processing package ceph-osd (--configure):
> [ceph-admin][WARNIN]  dependency problems - leaving unconfigured
> [ceph-admin][WARNIN] dpkg: dependency problems prevent configuration of ceph:
> [ceph-admin][WARNIN]  ceph depends on ceph-mon (= 10.2.1-1trusty); however:
> [ceph-admin][WARNIN]   Package ceph-mon is not configured yet.
> [ceph-admin][WARNIN]  ceph depends on ceph-osd (= 10.2.1-1trusty); however:
> [ceph-admin][WARNIN]   Package ceph-osd is not configured yet.
> [ceph-admin][WARNIN]
> [ceph-admin][WARNIN] dpkg: error processing package ceph (--configure):
> [ceph-admin][WARNIN]  dependency problems - leaving unconfigured
> [ceph-admin][WARNIN] dpkg: dependency problems prevent configuration of
ceph-mds:
> [ceph-admin][WARNIN]  ceph-mds depends on ceph-base (= 10.2.1-1trusty);
however:
> [ceph-admin][WARNIN]   Package ceph-base is not configured yet.
> [ceph-admin][WARNIN]
> [ceph-admin][WARNIN] dpkg: error processing package ceph-mds (--configure):
> [ceph-admin][WARNIN]  dependency problems - leaving unconfigured
> [ceph-admin][WARNIN] dpkg: dependency problems prevent configuration of
radosgw:
> [ceph-admin][WARNIN]  radosgw depends on ceph-common (= 10.2.1-1trusty);
however:
> [ceph-admin][WARNIN]   Package ceph-common is not configured yet.
> [ceph-admin][WARNIN]
> [ceph-admin][WARNIN] dpkg: error processing package radosgw (--configure):
> [ceph-admin][WARNIN]  dependency problems - leaving unconfigured
> [ceph-admin][WARNIN] No apport report written because the error message
indicates its a followup error from a previous failure.
> [ceph-admin][WARNIN] No apport report written because the error message
indicates its a followup error from a previous failure.
> [ceph-admin][WARNIN] No apport report written because MaxReports is
reached already
> [ceph-admin][WARNIN] No apport report written because MaxReports is
reached already
> [ceph-admin][WARNIN] No apport report written because MaxReports is
reached already
> [ceph-admin][WARNIN] No apport report written because MaxReports is
reached already
> [ceph-admin][DEBUG ] Processing triggers for ureadahead (0.100.0-16) ...
> [ceph-admin][WARNIN] Errors were encountered while processing:
> [ceph-admin][WARNIN]  ceph-common
> [ceph-admin][WARNIN]  ceph-base
> [ceph-admin][WARNIN]  ceph-mon
> [ceph-admin][WARNIN]  ceph-osd
> [ceph-admin][WARNIN]  ceph
> [ceph-admin][WARNIN]  ceph-mds
> [ceph-admin][WARNIN]  radosgw
> [ceph-admin][WARNIN] E: Sub-process /usr/bin/dpkg returned an error code (1)
> [ceph-admin][ERROR ] RuntimeError: command returned non-zero exit status: 100
> [ceph_deploy][ERROR ] 
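A generic recovery sketch for this class of dpkg failure (the log above shows
usermod being blocked by something still running as the "ceph" user, PID 1303):

ps -u ceph                # see what is still running as the "ceph" user
# stop or kill that process (or reboot the node), then let dpkg finish:
dpkg --configure -a
apt-get -f install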

[ceph-users] performance issue with jewel on ubuntu xenial (kernel)

2016-06-21 Thread Yoann Moulin
Hello,

I found a performance drop between kernel 3.13.0-88 (default kernel on Ubuntu
Trusty 14.04) and kernel 4.4.0.24.14 (default kernel on Ubuntu Xenial 16.04)

ceph version is Jewel (10.2.2).
All tests have been done under Ubuntu 14.04

Kernel 4.4 has a drop of 50% compared to 4.2
Kernel 4.4 has a drop of 40% compared to 3.13

details below:

With the 3 kernels I have the same performance on disks:

Raw benchmark:
dd if=/dev/zero of=/dev/sdX bs=1M count=1024 oflag=direct => average ~230MB/s
dd if=/dev/zero of=/dev/sdX bs=1G count=1 oflag=direct   => average ~220MB/s

Filesystem mounted benchmark:
dd if=/dev/zero of=/sdX1/test.img bs=1G count=1  => average ~205MB/s
dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=direct => average ~214MB/s
dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=sync   => average ~190MB/s

Ceph osd Benchmark:
Kernel 3.13.0-88-generic : ceph tell osd.ID => average  ~81MB/s
Kernel 4.2.0-38-generic  : ceph tell osd.ID => average ~109MB/s
Kernel 4.4.0-24-generic  : ceph tell osd.ID => average  ~50MB/s
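The per-OSD numbers above look like output of the OSD bench command; the exact
arguments used aren't shown, but a typical invocation is along these lines:

ceph tell osd.0 bench                      # default: 1 GB written in 4 MB blocks
ceph tell osd.0 bench 1073741824 4194304   # total bytes and block size given explicitly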

Does anyone get a similar behaviour on their cluster?

Best regards

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osds udev rules not triggered on reboot (jewel, jessie)

2016-06-21 Thread Loic Dachary


On 16/06/2016 18:01, stephane.d...@orange.com wrote:
> Hi,
> 
> Same issue with Centos 7, I also put back this file in /etc/udev/rules.d. 

Hi Stephane,

Could you please detail which version of CentOS 7 you are using? I tried to 
reproduce the problem with CentOS 7.2 as found on the CentOS cloud images 
repository ( 
http://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud-1511.qcow2 
) but it "works for me".

Thanks !

> 
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Alexandre DERUMIER
> Sent: Thursday, June 16, 2016 17:53
> To: Karsten Heymann; Loris Cuoghi
> Cc: Loic Dachary; ceph-users
> Subject: Re: [ceph-users] osds udev rules not triggered on reboot (jewel, 
> jessie)
> 
> Hi,
> 
> I have the same problem with osd disks not mounted at boot on jessie with 
> ceph jewel
> 
> workaround is to re-add 60-ceph-partuuid-workaround.rules file to udev
> 
> http://tracker.ceph.com/issues/16351
> 
> 
> - Original Message -
> From: "aderumier" 
> To: "Karsten Heymann" , "Loris Cuoghi" 
> 
> Cc: "Loic Dachary" , "ceph-users" 
> 
> Sent: Thursday, 28 April 2016 07:42:04
> Subject: Re: [ceph-users] osds udev rules not triggered on reboot (jewel,
> jessie)
> 
Hi, 
there are target files missing in the debian packages 
> 
> http://tracker.ceph.com/issues/15573 
> https://github.com/ceph/ceph/pull/8700 
> 
> I have also done some other trackers about packaging bug 
> 
> jewel: debian package: wrong /etc/default/ceph/ceph location 
> http://tracker.ceph.com/issues/15587 
> 
> debian/ubuntu : TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES not specified in 
> /etc/default/cep 
> http://tracker.ceph.com/issues/15588 
> 
> jewel: debian package: init.d script bug 
> http://tracker.ceph.com/issues/15585 
> 
> 
> @CC loic dachary, maybe he could help to speed up packaging fixes 
> 
> - Original Message - 
> From: "Karsten Heymann"  
> To: "Loris Cuoghi"  
> Cc: "ceph-users"  
> Sent: Wednesday, 27 April 2016 15:20:29 
> Subject: Re: [ceph-users] osds udev rules not triggered on reboot (jewel, 
> jessie) 
> 
> 2016-04-27 15:18 GMT+02:00 Loris Cuoghi : 
>> On 27/04/2016 14:45, Karsten Heymann wrote: 
>>> one workaround I found was to add 
>>>
>>> [Install] 
>>> WantedBy=ceph-osd.target 
>>>
>>> to /lib/systemd/system/ceph-disk@.service and then manually enable my 
>>> disks with 
>>>
>>> # systemctl enable ceph-disk\@dev-sdi1 
>>> # systemctl start ceph-disk\@dev-sdi1 
>>>
>>> That way they at least are started at boot time. 
> 
>> Great! But only if the disks keep their device names, right ? 
> 
> Exactly. It's just a little workaround until the real issue is fixed. 
> 
> +Karsten 
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> _
> 
> This message and its attachments may contain confidential or privileged 
> information that may be protected by law;
> they should not be distributed, used or copied without authorisation.
> If you have received this email in error, please notify the sender and delete 
> this message and its attachments.
> As emails may be altered, Orange is not liable for messages that have been 
> modified, changed or falsified.
> Thank you.
> 

-- 
Loïc Dachary, Artisan Logiciel Libre
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Observations after upgrading to latest Hammer (0.94.7)

2016-06-21 Thread Kostis Fardelas
Hello,
I upgraded a staging ceph cluster from latest Firefly to latest Hammer
last week. Everything went fine overall and I would like to share my
observations so far:
a. every OSD upgrade lasts approx. 3 minutes. I doubt there is any way
to speed this up though
b. rados bench with different block sizes and different numbers of
threads (a representative invocation is sketched after this list) produces
consistently 15-20% better write/read IOPS/throughput compared to Firefly.
At the same time, CPU load on the OSD nodes was lower during the bench
c. OSD apply latency increased 2x-3x for all OSDs. No clue though why
this is happening. Commitcycle/journal latencies are at the same
level. You can see the effect on apply latency in the uploaded image [1]
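For anyone wanting to reproduce the comparison in (b), a representative rados
bench sequence looks roughly like this (pool name, runtime, block size and
thread count are placeholders):

rados bench -p testpool 60 write -b 4096 -t 16 --no-cleanup   # 4 KB writes, keep objects
rados bench -p testpool 60 seq -t 16                          # sequential reads
rados bench -p testpool 60 rand -t 16                         # random reads
rados -p testpool cleanup                                     # remove benchmark objects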

It would be nice if someone else also shares his/her experience after
upgrading to Hammer and/or proposes more core metrics that should be
looked at after major version upgrades.

[1] https://up1.ca/#V5vcso6i8IQ01Se62NJqng

Regards,
Kostis
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Does flushbufs on a rbd-nbd invalidate librbd cache?

2016-06-21 Thread Nick Fisk
Hi All,

Does anybody know if calling blockdev --flushbufs on an rbd-nbd device
causes the librbd read cache to be invalidated?

I've done a quick test and the invalidate_cache counter doesn't increment
like when you send the invalidate command via the admin socket.
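For comparison, this is roughly how the admin-socket path can be exercised (a
sketch; the socket path, image name and the exact command strings as reported
by "help" vary per setup):

# list the commands the librbd client registers on its socket
ceph --admin-daemon /var/run/ceph/ceph-client.admin.asok help
# explicit invalidation via the admin socket (the counter does increment here)
ceph --admin-daemon /var/run/ceph/ceph-client.admin.asok rbd cache invalidate rbd/myimage
# flush via the block layer, then compare the librbd counters
blockdev --flushbufs /dev/nbd0
ceph --admin-daemon /var/run/ceph/ceph-client.admin.asok perf dump | grep -A 2 invalidate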

Thanks,
Nick

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Regarding executing COSBench onto a specific pool

2016-06-21 Thread Venkata Manojawa Paritala
Hi,

In our Ceph cluster, we are currently seeing that COSBench writes IO to the
default pools that are created while configuring the RADOS gateway. Can you
please let me know if there is a way to execute IO (using COSBench) against a
specific pool.
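For reference, since COSBench goes through the S3/Swift API, the target pool is
decided by the RGW zone placement rather than by COSBench itself; a sketch of
redirecting the default placement target (assuming a single zone named
"default"; back up the JSON first and restart the gateways afterwards):

radosgw-admin zone get --rgw-zone=default > zone.json
# edit zone.json so the default placement target's data pool points at the
# pool COSBench should write to, then load it back:
radosgw-admin zone set --rgw-zone=default --infile zone.json
# on realm/period-based (multisite) setups a period update/commit may also be needed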

Thanks & Regards,
Manoj
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Inconsistent PGs

2016-06-21 Thread Paweł Sadowski
Thanks for the response.

All OSDs seem to be OK; they have been restarted, rejoined the cluster after
that, and there is nothing weird in the logs.

# ceph pg dump_stuck stale
ok

# ceph pg dump_stuck inactive
ok
pg_stat  state       up             up_primary  acting         acting_primary
3.2929   incomplete  [109,272,83]   109         [109,272,83]   109
3.1683   incomplete  [166,329,281]  166         [166,329,281]  166

# ceph pg dump_stuck unclean
ok
pg_stat  state       up             up_primary  acting         acting_primary
3.2929   incomplete  [109,272,83]   109         [109,272,83]   109
3.1683   incomplete  [166,329,281]  166         [166,329,281]  166


On OSD 166 there are 100 blocked ops (on 109 too); they all end on
"event": "reached_pg"

# ceph --admin-daemon /var/run/ceph/ceph-osd.166.asok dump_ops_in_flight
...
{
"description": "osd_op(client.958764031.0:18137113
rbd_data.392585982ae8944a.0ad4 [set-alloc-hint object_size
4194304 write_size 4194304,write 2641920~8192] 3.d6195683 RETRY=15
ack+ondisk+retry+write+known_if_redirected e613241)",
"initiated_at": "2016-06-21 10:19:59.894393",
"age": 828.025527,
"duration": 600.020809,
"type_data": [
"reached pg",
{
"client": "client.958764031",
"tid": 18137113
},
[
{
"time": "2016-06-21 10:19:59.894393",
"event": "initiated"
},
{
"time": "2016-06-21 10:29:59.915202",
"event": "reached_pg"
}
]
]
}
],
"num_ops": 100
}



On 06/21/2016 12:27 PM, M Ranga Swami Reddy wrote:
> you can use the below cmds:
> ==
>
> ceph pg dump_stuck stale
> ceph pg dump_stuck inactive
> ceph pg dump_stuck unclean
> ===
>
> And the query the PG, which are in unclean or stale state, check for
> any issue with a specific OSD.
>
> Thanks
> Swami
>
> On Tue, Jun 21, 2016 at 3:02 PM, Paweł Sadowski  wrote:
>> Hello,
>>
>> We have an issue on one of our clusters. One node with 9 OSD was down
>> for more than 12 hours. During that time the cluster recovered without
>> problems. When the host came back to the cluster we got two PGs in incomplete
>> state. We decided to mark the OSDs on this host as out but the two PGs are
>> still in incomplete state. Trying to query those PGs hangs forever. We
>> already tried restarting OSDs. Is there any way to solve this issue
>> without losing data? Any help appreciated :)
>>
>> # ceph health detail | grep incomplete
>> HEALTH_WARN 2 pgs incomplete; 2 pgs stuck inactive; 2 pgs stuck unclean;
>> 200 requests are blocked > 32 sec; 2 osds have slow requests;
>> noscrub,nodeep-scrub flag(s) set
>> pg 3.2929 is stuck inactive since forever, current state incomplete,
>> last acting [109,272,83]
>> pg 3.1683 is stuck inactive since forever, current state incomplete,
>> last acting [166,329,281]
>> pg 3.2929 is stuck unclean since forever, current state incomplete, last
>> acting [109,272,83]
>> pg 3.1683 is stuck unclean since forever, current state incomplete, last
>> acting [166,329,281]
>> pg 3.1683 is incomplete, acting [166,329,281] (reducing pool vms
>> min_size from 2 may help; search ceph.com/docs for 'incomplete')
>> pg 3.2929 is incomplete, acting [109,272,83] (reducing pool vms min_size
>> from 2 may help; search ceph.com/docs for 'incomplete')
>>
>> Directory for PG 3.1683 is present on OSD 166 and contains ~8GB.
>>
>> We didn't try setting min_size to 1 yet (we treat it as a last resort).
>>
>>
>>
>> Some cluster info:
>> # ceph --version
>>
>> ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>>
>> # ceph -s
>>  health HEALTH_WARN
>> 2 pgs incomplete
>> 2 pgs stuck inactive
>> 2 pgs stuck unclean
>> 200 requests are blocked > 32 sec
>> noscrub,nodeep-scrub flag(s) set
>>  monmap e7: 5 mons at
>> {mon-03=*.2:6789/0,mon-04=*.36:6789/0,mon-05=*.81:6789/0,mon-06=*.0:6789/0,mon-07=*.40:6789/0}
>> election epoch 3250, quorum 0,1,2,3,4
>> mon-06,mon-07,mon-04,mon-03,mon-05
>>  osdmap e613040: 346 osds: 346 up, 337 in
>> flags noscrub,nodeep-scrub
>>   pgmap v27163053: 18624 pgs, 6 pools, 138 TB data, 39062 kobjects
>> 415 TB used, 186 TB / 601 TB avail
>>18622 active+clean
>>2 incomplete
>>   client io 9992 kB/s rd, 64867 kB/s wr, 8458 op/s
>>
>>
>> # ceph osd pool get vms pg_num
>> pg_num: 16384
>>
>> # ceph osd pool get vms size
>> size: 3
>>
>> # ceph osd pool get vms min_size
>> min_size: 2

-- 
PS
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Bucket index question

2016-06-21 Thread Василий Ангапов
Hello,

I have a questions regarding the bucket index:

1) As far as I know, the index of a given bucket is a single RADOS object
and it lives in the OSD omap. But does it get replicated or not?

2) When trying to copy bucket index pool to some other pool i get the
following error:
$ rados cppool ed-1.rgw.buckets.index test
ed-1.rgw.buckets.index:.dir.06ee966c-5b48-4c53-8ed8-36bbf53204f5.171499.1
=> test:.dir.06ee966c-5b48-4c53-8ed8-36bbf53204f5.171499.1
error copying object: (2) No such file or directory
error copying pool ed-1.rgw.buckets.index => test: (34) Numerical
result out of range

and the object is not getting copied. Btw, this particular index
serves as the index of a bucket with almost 19 million objects and it is
not sharded.
$ ceph df | grep -e ed-1.rgw.buckets.data -e NAME
NAME                    ID  USED    %USED  MAX AVAIL  OBJECTS
ed-1.rgw.buckets.data   16  52998G  4.96   769T       38414882
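For the first question: since the index content lives in omap rather than in
object data, one quick way to look at it directly is listomapkeys (a sketch,
reusing the index object named in the error above; a large bucket will print a
lot of keys):

rados -p ed-1.rgw.buckets.index listomapkeys \
    .dir.06ee966c-5b48-4c53-8ed8-36bbf53204f5.171499.1 | wc -l    # entry count
rados -p ed-1.rgw.buckets.index listomapkeys \
    .dir.06ee966c-5b48-4c53-8ed8-36bbf53204f5.171499.1 | head     # sample keys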
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD out/down detection

2016-06-21 Thread Adrien Gillard
Regarding your original issue, you may want to configure kdump on one of
the machines to get more insight on what is happening when the box
hangs/crashes.
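A minimal kdump setup sketch for CentOS 7 (the crashkernel reservation may need
tuning on large-memory machines):

yum install -y kexec-tools
grubby --update-kernel=ALL --args="crashkernel=auto"
systemctl enable kdump
reboot
# after the next hang/panic the vmcore should land under /var/crash
# (see /etc/kdump.conf for the dump target and filters)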

I faced a similar issue when trying 4.4.8 on my Infernalis cluster (box
hangs, black screen, OSD down and out), and as it happens, there were cases
with similar traces [0][1].

I didn't have the time at the moment to run more tests so I went back to
using stock 3.10.

Also note that the default kdump behavior on kernel panic is to dump the
kernel and restart the server.


[0] https://lkml.org/lkml/2016/3/17/570
[1] https://lkml.org/lkml/2016/5/17/136

On Mon, Jun 20, 2016 at 4:12 AM, Adrian Saul 
wrote:

> Hi All,
>  We have a Jewel (10.2.1) cluster on Centos 7 - I am using an  elrepo
> 4.4.1 kernel on all machines and we have an issue where some of the
> machines hang - not sure if its hardware or OS but essentially the host
> including the console is unresponsive and can only be recovered with a
> hardware reset.  Unfortunately nothing useful is logged so I am still
> trying to figure out what is going on to cause this.   But the result for
> ceph is that if an OSD host goes down like this we have run into an issue
> where only some of its OSDs are marked down.In the instance on the
> weekend, the host had 8 OSDs and only 5 got marked as down - this lead to
> the kRBD devices jamming up trying to send IO to non-responsive OSDs that
> stayed marked up.
>
> The machine went into a slow death - lots of reports of slow or blocked
> requests:
>
> 2016-06-19 09:37:49.070810 osd.36 10.145.2.15:6802/31359 65 : cluster
> [WRN] 2 slow requests, 2 included below; oldest blocked for > 30.297258 secs
> 2016-06-19 09:37:54.071542 osd.36 10.145.2.15:6802/31359 82 : cluster
> [WRN] 112 slow requests, 5 included below; oldest blocked for > 35.297988
> secs
> 2016-06-19 09:37:54.071737 osd.6 10.145.2.15:6801/21836 221 : cluster
> [WRN] 253 slow requests, 5 included below; oldest blocked for > 35.325155
> secs
> 2016-06-19 09:37:59.072570 osd.6 10.145.2.15:6801/21836 251 : cluster
> [WRN] 262 slow requests, 5 included below; oldest blocked for > 40.325986
> secs
>
> And then when the monitors did report them down the OSDs disputed that:
>
> 2016-06-19 09:38:35.821716 mon.0 10.145.2.13:6789/0 244970 : cluster
> [INF] osd.6 10.145.2.15:6801/21836 failed (2 reporters from different
> host after 20.000365 >= grace 20.00)
> 2016-06-19 09:38:36.950556 mon.0 10.145.2.13:6789/0 244978 : cluster
> [INF] osd.22 10.145.2.15:6806/21826 failed (2 reporters from different
> host after 21.613336 >= grace 20.00)
> 2016-06-19 09:38:36.951133 mon.0 10.145.2.13:6789/0 244980 : cluster
> [INF] osd.31 10.145.2.15:6812/21838 failed (2 reporters from different
> host after 21.613781 >= grace 20.836511)
> 2016-06-19 09:38:36.951636 mon.0 10.145.2.13:6789/0 244982 : cluster
> [INF] osd.36 10.145.2.15:6802/31359 failed (2 reporters from different
> host after 21.614259 >= grace 20.00)
>
> 2016-06-19 09:38:37.156088 osd.36 10.145.2.15:6802/31359 346 : cluster
> [WRN] map e28730 wrongly marked me down
> 2016-06-19 09:38:36.002076 osd.6 10.145.2.15:6801/21836 473 : cluster
> [WRN] map e28729 wrongly marked me down
> 2016-06-19 09:38:37.046885 osd.22 10.145.2.15:6806/21826 374 : cluster
> [WRN] map e28730 wrongly marked me down
> 2016-06-19 09:38:37.050635 osd.31 10.145.2.15:6812/21838 351 : cluster
> [WRN] map e28730 wrongly marked me down
>
> But shortly after
>
> 2016-06-19 09:43:39.940985 mon.0 10.145.2.13:6789/0 245305 : cluster
> [INF] osd.6 out (down for 303.951251)
> 2016-06-19 09:43:39.941061 mon.0 10.145.2.13:6789/0 245306 : cluster
> [INF] osd.22 out (down for 302.908528)
> 2016-06-19 09:43:39.941099 mon.0 10.145.2.13:6789/0 245307 : cluster
> [INF] osd.31 out (down for 302.908527)
> 2016-06-19 09:43:39.941152 mon.0 10.145.2.13:6789/0 245308 : cluster
> [INF] osd.36 out (down for 302.908527)
>
> 2016-06-19 10:09:10.648924 mon.0 10.145.2.13:6789/0 247076 : cluster
> [INF] osd.23 10.145.2.15:6814/21852 failed (2 reporters from different
> host after 20.000378 >= grace 20.00)
> 2016-06-19 10:09:10.887220 osd.23 10.145.2.15:6814/21852 176 : cluster
> [WRN] map e28848 wrongly marked me down
> 2016-06-19 10:14:15.160513 mon.0 10.145.2.13:6789/0 247422 : cluster
> [INF] osd.23 out (down for 304.288018)
>
> By the time the issue was eventually escalated and I was able to do
> something about it I manual marked the remaining host OSDs down (which
> seemed to unclog RBD):
>
> 2016-06-19 15:25:06.171395 mon.0 10.145.2.13:6789/0 267212 : cluster
> [INF] osd.7 10.145.2.15:6808/21837 failed (2 reporters from different
> host after 22.000367 >= grace 20.00)
> 2016-06-19 15:25:06.171905 mon.0 10.145.2.13:6789/0 267214 : cluster
> [INF] osd.24 10.145.2.15:6800/21813 failed (2 reporters from different
> host after 22.000748 >= grace 20.710981)
> 2016-06-19 15:25:06.172426 mon.0 10.145.2.13:6789/0 267216 : cluster
> [INF] osd.37 10.145.2.15:6810/31936 

Re: [ceph-users] Inconsistent PGs

2016-06-21 Thread M Ranga Swami Reddy
you can use the below cmds:
==

ceph pg dump_stuck stale
ceph pg dump_stuck inactive
ceph pg dump_stuck unclean
===

And then query the PGs which are in an unclean or stale state, and check for
any issue with a specific OSD.

Thanks
Swami

On Tue, Jun 21, 2016 at 3:02 PM, Paweł Sadowski  wrote:
> Hello,
>
> We have an issue on one of our clusters. One node with 9 OSD was down
> for more than 12 hours. During that time cluster recovered without
> problems. When host back to the cluster we got two PGs in incomplete
> state. We decided to mark OSDs on this host as out but the two PGs are
> still in incomplete state. Trying to query those pg hangs forever. We
> already tried restarting OSDs. Is there any way to solve this issue
> without losing data? Any help appreciated :)
>
> # ceph health detail | grep incomplete
> HEALTH_WARN 2 pgs incomplete; 2 pgs stuck inactive; 2 pgs stuck unclean;
> 200 requests are blocked > 32 sec; 2 osds have slow requests;
> noscrub,nodeep-scrub flag(s) set
> pg 3.2929 is stuck inactive since forever, current state incomplete,
> last acting [109,272,83]
> pg 3.1683 is stuck inactive since forever, current state incomplete,
> last acting [166,329,281]
> pg 3.2929 is stuck unclean since forever, current state incomplete, last
> acting [109,272,83]
> pg 3.1683 is stuck unclean since forever, current state incomplete, last
> acting [166,329,281]
> pg 3.1683 is incomplete, acting [166,329,281] (reducing pool vms
> min_size from 2 may help; search ceph.com/docs for 'incomplete')
> pg 3.2929 is incomplete, acting [109,272,83] (reducing pool vms min_size
> from 2 may help; search ceph.com/docs for 'incomplete')
>
> Directory for PG 3.1683 is present on OSD 166 and contains ~8GB.
>
> We didn't try setting min_size to 1 yet (we treat it as a last resort).
>
>
>
> Some cluster info:
> # ceph --version
>
> ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>
> # ceph -s
>  health HEALTH_WARN
> 2 pgs incomplete
> 2 pgs stuck inactive
> 2 pgs stuck unclean
> 200 requests are blocked > 32 sec
> noscrub,nodeep-scrub flag(s) set
>  monmap e7: 5 mons at
> {mon-03=*.2:6789/0,mon-04=*.36:6789/0,mon-05=*.81:6789/0,mon-06=*.0:6789/0,mon-07=*.40:6789/0}
> election epoch 3250, quorum 0,1,2,3,4
> mon-06,mon-07,mon-04,mon-03,mon-05
>  osdmap e613040: 346 osds: 346 up, 337 in
> flags noscrub,nodeep-scrub
>   pgmap v27163053: 18624 pgs, 6 pools, 138 TB data, 39062 kobjects
> 415 TB used, 186 TB / 601 TB avail
>18622 active+clean
>2 incomplete
>   client io 9992 kB/s rd, 64867 kB/s wr, 8458 op/s
>
>
> # ceph osd pool get vms pg_num
> pg_num: 16384
>
> # ceph osd pool get vms size
> size: 3
>
> # ceph osd pool get vms min_size
> min_size: 2
>
>
> --
> PS
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] librbd compatibility

2016-06-21 Thread Jason Dillaman
The librbd API is stable between releases.  While new API methods
might be added, the older API methods are kept for backwards
compatibility.  For example, qemu-kvm under RHEL 7 is built against a
librbd from Firefly but can function using a librbd from Jewel.

On Tue, Jun 21, 2016 at 1:47 AM, min fang  wrote:
> Hi, is there a document describing librbd compatibility?  For example,
> something like this: librbd from Ceph 0.88 can also be applied to
> 0.90,0.91..
>
> I hope to keep librbd relatively stable, so we can avoid more code iteration
> and testing.
>
> Thanks.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Inconsistent PGs

2016-06-21 Thread Paweł Sadowski
Already restarted those OSDs and then the whole cluster (rack by rack; the
failure domain is rack in this setup).
We would like to try the *ceph-objectstore-tool mark-complete* operation. Is
there any way (other than checking mtime on files and querying PGs) to
determine which replica has the most up-to-date data?
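One way that is sometimes used for this comparison is to dump each candidate
replica's PG metadata with ceph-objectstore-tool and compare the
last_update/log ranges (a sketch; paths assume the default OSD layout and the
OSD must be stopped while the tool runs):

# on the node holding osd.166 (repeat for the other replicas, e.g. 329 and 281)
service ceph stop osd.166
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-166 \
    --journal-path /var/lib/ceph/osd/ceph-166/journal \
    --op info --pgid 3.1683
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-166 \
    --journal-path /var/lib/ceph/osd/ceph-166/journal \
    --op log --pgid 3.1683
service ceph start osd.166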

On 06/21/2016 12:37 PM, M Ranga Swami Reddy wrote:
> Try to restart OSD 109 and 166? check if it help?
>
>
> On Tue, Jun 21, 2016 at 4:05 PM, Paweł Sadowski  wrote:
>> Thanks for response.
>>
>> All OSDs seems to be ok, they have been restarted, joined cluster after
>> that, nothing weird in the logs.
>>
>> # ceph pg dump_stuck stale
>> ok
>>
>> # ceph pg dump_stuck inactive
>> ok
>> pg_stat  state       up             up_primary  acting         acting_primary
>> 3.2929   incomplete  [109,272,83]   109         [109,272,83]   109
>> 3.1683   incomplete  [166,329,281]  166         [166,329,281]  166
>>
>> # ceph pg dump_stuck unclean
>> ok
>> pg_stat  state       up             up_primary  acting         acting_primary
>> 3.2929   incomplete  [109,272,83]   109         [109,272,83]   109
>> 3.1683   incomplete  [166,329,281]  166         [166,329,281]  166
>>
>>
>> On OSD 166 there is 100 blocked ops (on 109 too), they all end on
>> "event": "reached_pg"
>>
>> # ceph --admin-daemon /var/run/ceph/ceph-osd.166.asok dump_ops_in_flight
>> ...
>> {
>> "description": "osd_op(client.958764031.0:18137113
>> rbd_data.392585982ae8944a.0ad4 [set-alloc-hint object_size
>> 4194304 write_size 4194304,write 2641920~8192] 3.d6195683 RETRY=15
>> ack+ondisk+retry+write+known_if_redirected e613241)",
>> "initiated_at": "2016-06-21 10:19:59.894393",
>> "age": 828.025527,
>> "duration": 600.020809,
>> "type_data": [
>> "reached pg",
>> {
>> "client": "client.958764031",
>> "tid": 18137113
>> },
>> [
>> {
>> "time": "2016-06-21 10:19:59.894393",
>> "event": "initiated"
>> },
>> {
>> "time": "2016-06-21 10:29:59.915202",
>> "event": "reached_pg"
>> }
>> ]
>> ]
>> }
>> ],
>> "num_ops": 100
>> }
>>
>>
>>
>> On 06/21/2016 12:27 PM, M Ranga Swami Reddy wrote:
>>> you can use the below cmds:
>>> ==
>>>
>>> ceph pg dump_stuck stale
>>> ceph pg dump_stuck inactive
>>> ceph pg dump_stuck unclean
>>> ===
>>>
>>> And the query the PG, which are in unclean or stale state, check for
>>> any issue with a specific OSD.
>>>
>>> Thanks
>>> Swami
>>>
>>> On Tue, Jun 21, 2016 at 3:02 PM, Paweł Sadowski  wrote:
 Hello,

 We have an issue on one of our clusters. One node with 9 OSD was down
 for more than 12 hours. During that time cluster recovered without
 problems. When host back to the cluster we got two PGs in incomplete
 state. We decided to mark OSDs on this host as out but the two PGs are
 still in incomplete state. Trying to query those pg hangs forever. We
 already tried restarting OSDs. Is there any way to solve this issue
 without losing data? Any help appreciated :)

 # ceph health detail | grep incomplete
 HEALTH_WARN 2 pgs incomplete; 2 pgs stuck inactive; 2 pgs stuck unclean;
 200 requests are blocked > 32 sec; 2 osds have slow requests;
 noscrub,nodeep-scrub flag(s) set
 pg 3.2929 is stuck inactive since forever, current state incomplete,
 last acting [109,272,83]
 pg 3.1683 is stuck inactive since forever, current state incomplete,
 last acting [166,329,281]
 pg 3.2929 is stuck unclean since forever, current state incomplete, last
 acting [109,272,83]
 pg 3.1683 is stuck unclean since forever, current state incomplete, last
 acting [166,329,281]
 pg 3.1683 is incomplete, acting [166,329,281] (reducing pool vms
 min_size from 2 may help; search ceph.com/docs for 'incomplete')
 pg 3.2929 is incomplete, acting [109,272,83] (reducing pool vms min_size
 from 2 may help; search ceph.com/docs for 'incomplete')

 Directory for PG 3.1683 is present on OSD 166 and contains ~8GB.

 We didn't try setting min_size to 1 yet (we treat it as a last resort).



 Some cluster info:
 # ceph --version

 ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)

 # ceph -s
  health HEALTH_WARN
 2 pgs incomplete
 2 pgs stuck inactive
 2 pgs stuck unclean
 200 requests are blocked > 32 sec
 noscrub,nodeep-scrub flag(s) set
  monmap e7: 5 mons at
 {mon-03=*.2:6789/0,mon-04=*.36:6789/0,mon-05=*.81:6789/0,mon-06=*.0:6789/0,mon-07=*.40:6789/0}
 election epoch 3250, quorum 

[ceph-users] performance issue with jewel on ubuntu xenial (kernel)

2016-06-21 Thread Yoann Moulin
Hello,

I found a performance drop between kernel 3.13.0-88 (default kernel on Ubuntu
Trusty 14.04) and kernel 4.4.0.24.14 (default kernel on Ubuntu Xenial 16.04)

ceph version is Jewel (10.2.2).
All tests have been done under Ubuntu 14.04

Kernel 4.4 has a drop of 50% compared to 4.2
Kernel 4.4 has a drop of 40% compared to 3.13

details below:

With the 3 kernels I have the same performance on disks:

Raw benchmark:
dd if=/dev/zero of=/dev/sdX bs=1M count=1024 oflag=direct => average ~230MB/s
dd if=/dev/zero of=/dev/sdX bs=1G count=1 oflag=direct   => average ~220MB/s

Filesystem mounted benchmark:
dd if=/dev/zero of=/sdX1/test.img bs=1G count=1  => average ~205MB/s
dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=direct => average ~214MB/s
dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=sync   => average ~190MB/s

Ceph osd Benchmark:
Kernel 3.13.0-88-generic : ceph tell osd.ID => average  ~81MB/s
Kernel 4.2.0-38-generic  : ceph tell osd.ID => average ~109MB/s
Kernel 4.4.0-24-generic  : ceph tell osd.ID => average  ~50MB/s

Does anyone get a similar behaviour on their cluster?

Best regards

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Inconsistent PGs

2016-06-21 Thread M Ranga Swami Reddy
Try to restart OSDs 109 and 166, and check if it helps?


On Tue, Jun 21, 2016 at 4:05 PM, Paweł Sadowski  wrote:
> Thanks for response.
>
> All OSDs seem to be OK; they have been restarted, joined the cluster after
> that, and there is nothing weird in the logs.
>
> # ceph pg dump_stuck stale
> ok
>
> # ceph pg dump_stuck inactive
> ok
> pg_stat  state       up             up_primary  acting         acting_primary
> 3.2929   incomplete  [109,272,83]   109         [109,272,83]   109
> 3.1683   incomplete  [166,329,281]  166         [166,329,281]  166
>
> # ceph pg dump_stuck unclean
> ok
> pg_stat  state       up             up_primary  acting         acting_primary
> 3.2929   incomplete  [109,272,83]   109         [109,272,83]   109
> 3.1683   incomplete  [166,329,281]  166         [166,329,281]  166
>
>
> On OSD 166 there are 100 blocked ops (on 109 too); they all end on
> "event": "reached_pg"
>
> # ceph --admin-daemon /var/run/ceph/ceph-osd.166.asok dump_ops_in_flight
> ...
> {
> "description": "osd_op(client.958764031.0:18137113
> rbd_data.392585982ae8944a.0ad4 [set-alloc-hint object_size
> 4194304 write_size 4194304,write 2641920~8192] 3.d6195683 RETRY=15
> ack+ondisk+retry+write+known_if_redirected e613241)",
> "initiated_at": "2016-06-21 10:19:59.894393",
> "age": 828.025527,
> "duration": 600.020809,
> "type_data": [
> "reached pg",
> {
> "client": "client.958764031",
> "tid": 18137113
> },
> [
> {
> "time": "2016-06-21 10:19:59.894393",
> "event": "initiated"
> },
> {
> "time": "2016-06-21 10:29:59.915202",
> "event": "reached_pg"
> }
> ]
> ]
> }
> ],
> "num_ops": 100
> }
>
>
>
> On 06/21/2016 12:27 PM, M Ranga Swami Reddy wrote:
>> You can use the commands below:
>> ==
>>
>> ceph pg dump_stuck stale
>> ceph pg dump_stuck inactive
>> ceph pg dump_stuck unclean
>> ===
>>
>> And then query the PGs which are in an unclean or stale state, and check for
>> any issue with a specific OSD.
>>
>> Thanks
>> Swami
>>
>> On Tue, Jun 21, 2016 at 3:02 PM, Paweł Sadowski  wrote:
>>> Hello,
>>>
>>> We have an issue on one of our clusters. One node with 9 OSDs was down
>>> for more than 12 hours. During that time the cluster recovered without
>>> problems. When the host came back to the cluster we got two PGs in incomplete
>>> state. We decided to mark the OSDs on this host as out, but the two PGs are
>>> still in incomplete state. Trying to query those PGs hangs forever. We
>>> already tried restarting OSDs. Is there any way to solve this issue
>>> without losing data? Any help is appreciated :)
>>>
>>> # ceph health detail | grep incomplete
>>> HEALTH_WARN 2 pgs incomplete; 2 pgs stuck inactive; 2 pgs stuck unclean;
>>> 200 requests are blocked > 32 sec; 2 osds have slow requests;
>>> noscrub,nodeep-scrub flag(s) set
>>> pg 3.2929 is stuck inactive since forever, current state incomplete,
>>> last acting [109,272,83]
>>> pg 3.1683 is stuck inactive since forever, current state incomplete,
>>> last acting [166,329,281]
>>> pg 3.2929 is stuck unclean since forever, current state incomplete, last
>>> acting [109,272,83]
>>> pg 3.1683 is stuck unclean since forever, current state incomplete, last
>>> acting [166,329,281]
>>> pg 3.1683 is incomplete, acting [166,329,281] (reducing pool vms
>>> min_size from 2 may help; search ceph.com/docs for 'incomplete')
>>> pg 3.2929 is incomplete, acting [109,272,83] (reducing pool vms min_size
>>> from 2 may help; search ceph.com/docs for 'incomplete')
>>>
>>> The directory for PG 3.1683 is present on OSD 166 and contains ~8GB.
>>>
>>> We didn't try setting min_size to 1 yet (we treat it as a last resort).
>>>
>>>
>>>
>>> Some cluster info:
>>> # ceph --version
>>>
>>> ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>>>
>>> # ceph -s
>>>  health HEALTH_WARN
>>> 2 pgs incomplete
>>> 2 pgs stuck inactive
>>> 2 pgs stuck unclean
>>> 200 requests are blocked > 32 sec
>>> noscrub,nodeep-scrub flag(s) set
>>>  monmap e7: 5 mons at
>>> {mon-03=*.2:6789/0,mon-04=*.36:6789/0,mon-05=*.81:6789/0,mon-06=*.0:6789/0,mon-07=*.40:6789/0}
>>> election epoch 3250, quorum 0,1,2,3,4
>>> mon-06,mon-07,mon-04,mon-03,mon-05
>>>  osdmap e613040: 346 osds: 346 up, 337 in
>>> flags noscrub,nodeep-scrub
>>>   pgmap v27163053: 18624 pgs, 6 pools, 138 TB data, 39062 kobjects
>>> 415 TB used, 186 TB / 601 TB avail
>>>18622 active+clean
>>>2 incomplete
>>>   client io 9992 kB/s rd, 64867 kB/s wr, 8458 op/s
>>>
>>>
>>> # ceph osd pool get vms pg_num
>>> pg_num: 16384
>>>
>>> # ceph osd pool get 

[ceph-users] Inconsistent PGs

2016-06-21 Thread Paweł Sadowski
Hello,

We have an issue on one of our clusters. One node with 9 OSDs was down
for more than 12 hours. During that time the cluster recovered without
problems. When the host came back to the cluster we got two PGs in incomplete
state. We decided to mark the OSDs on this host as out, but the two PGs are
still in incomplete state. Trying to query those PGs hangs forever. We
already tried restarting OSDs. Is there any way to solve this issue
without losing data? Any help is appreciated :)

# ceph health detail | grep incomplete
HEALTH_WARN 2 pgs incomplete; 2 pgs stuck inactive; 2 pgs stuck unclean;
200 requests are blocked > 32 sec; 2 osds have slow requests;
noscrub,nodeep-scrub flag(s) set
pg 3.2929 is stuck inactive since forever, current state incomplete,
last acting [109,272,83]
pg 3.1683 is stuck inactive since forever, current state incomplete,
last acting [166,329,281]
pg 3.2929 is stuck unclean since forever, current state incomplete, last
acting [109,272,83]
pg 3.1683 is stuck unclean since forever, current state incomplete, last
acting [166,329,281]
pg 3.1683 is incomplete, acting [166,329,281] (reducing pool vms
min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 3.2929 is incomplete, acting [109,272,83] (reducing pool vms min_size
from 2 may help; search ceph.com/docs for 'incomplete')

The directory for PG 3.1683 is present on OSD 166 and contains ~8GB.

We didn't try setting min_size to 1 yet (we treat it as a last resort).
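
For reference, a minimal sketch of the last-resort step hinted at in the health output, assuming the pool is named vms as above and that min_size is put back to 2 as soon as the PGs have peered:

# ceph osd pool set vms min_size 1   # allow the incomplete PGs to peer/go active with fewer copies
# ceph pg 3.1683 query               # re-check the PG state once it starts responding
# ceph osd pool set vms min_size 2   # restore the original durability requirement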



Some cluster info:
# ceph --version

ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)

# ceph -s
 health HEALTH_WARN
2 pgs incomplete
2 pgs stuck inactive
2 pgs stuck unclean
200 requests are blocked > 32 sec
noscrub,nodeep-scrub flag(s) set
 monmap e7: 5 mons at
{mon-03=*.2:6789/0,mon-04=*.36:6789/0,mon-05=*.81:6789/0,mon-06=*.0:6789/0,mon-07=*.40:6789/0}
election epoch 3250, quorum 0,1,2,3,4
mon-06,mon-07,mon-04,mon-03,mon-05
 osdmap e613040: 346 osds: 346 up, 337 in
flags noscrub,nodeep-scrub
  pgmap v27163053: 18624 pgs, 6 pools, 138 TB data, 39062 kobjects
415 TB used, 186 TB / 601 TB avail
   18622 active+clean
   2 incomplete
  client io 9992 kB/s rd, 64867 kB/s wr, 8458 op/s


# ceph osd pool get vms pg_num
pg_num: 16384

# ceph osd pool get vms size
size: 3

# ceph osd pool get vms min_size
min_size: 2


-- 
PS
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS failover, how to speed it up?

2016-06-21 Thread Brian Lagoni
I plan to add more logging and gather the other info you have asked for at the
next MDS restart.

As this cluster is being used in production, I have a limited maintenance
window, so unless I find a time outside this window you will have to wait
until Sunday/Monday to get the logs.

@John, yes I have used the "ceph mds fail " but I would like to do it
again with a bit more logging, just to be sure.
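
A rough sketch of that plan, assuming the active MDS daemon is mds.a, rank 0 is the one being failed over, and injectargs is used rather than the admin socket; the debug levels are only a starting point:

# ceph daemon mds.a session ls > sessions-before.json        # capture client sessions, as requested
# ceph tell mds.a injectargs '--debug_mds 10 --debug_ms 1'   # raise MDS logging for the failover window
# ceph mds fail 0                                            # hand the rank over to the standby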

@Zheng, it might be due to pressure on the MDS server, but I don't see a
critically high load on the MDS server (~0.4) and I see ~90 Mbit of traffic
to and from the MDS on average.
Also an extra question: when doing a "df -i" on the cephfs mountpoint, I get
a high inode count which looks like it is all the inodes on all the OSDs
combined, divided by the number of replicas; is this assumption correct?
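
For reference, a quick way to sanity-check that assumption (the /mnt/cephfs path is only an example for wherever the kernel client is mounted) is to compare the kernel client's inode figure with the object counts the cluster itself reports:

# df -i /mnt/cephfs   # inode totals as seen by the kernel client
# ceph df             # per-pool object counts (OBJECTS column)
# rados df            # the same per-pool object counts from the rados side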

Please let me know if any more info is needed.

Regards

On 20 June 2016 at 14:09, Yan, Zheng  wrote:

> On Mon, Jun 20, 2016 at 7:04 PM, Brian Lagoni  wrote:
> > Is anyone here able to help us with a question about MDS failover?
> >
> > The case is that we are hitting a bug in ceph which requires us to
> restart
> > the mds every week.
> > There is a bug and PR for it here -
> https://github.com/ceph/ceph/pull/9456
> > but until this has been resolved we need to do a restart. Unless there
> is
> > a better workaround for this bug?
> >
> > The issue we are having is that when we do a failover, the time it takes for
> the
> > cephfs kernel client to recover is high enough that the VM guests
> using
> > this cephfs are having timeouts to their storage and therefore enter
> readonly
> > mode.
> >
> > We have tried making a failover to another mds or restarting the mds
> > while it's the only mds in the cluster, and in both cases our cephfs kernel
> > client is taking too long to recover.
> > We have also tried to set the failover MDS into "MDS_STANDBY_REPLAY" mode
> > which didn't help on this matter.
> >
> > When doing a failover all IOPS against ceph are being blocked for 2-5 min
> > until the kernel cephfs clients recover after some timeout messages
> like
> > these:
> > "2016-06-19 19:09:55.573739 7faaf8f48700  0 log_channel(cluster) log
> [WRN] :
> > slow request 75.141028 seconds old, received at 2016-06-19
> 19:08:40.432655:
> > client_request(client.4283066:4164703242 getattr pAsLsXsFs #1fe
> > 2016-06-19 19:08:40.429496) currently failed to rdlock, waiting"
> > After this there is a huge spike in IOPS and data starts being processed
> > again.
> >
> > I'm not sure if any of this can be related to this warning which is
> present
> > 90% of the day.
> > "mds0: Behind on trimming (94/30)"?
> > I have searched the mailing list for clues and answers on what to do
> about
> > this but haven't found anything which has helped us.
> > We have moved/isolated the MDS service to its own VM with the fastest
> > processor we have, without any real change to this warning.
> >
> >  Our infrastructure is the following:
> >  - We use CEPH/CEPHFS (10.2.1)
> >  - We have 3 mons and 6 storage servers with a total of 36 OSDs (~4160
> PGs).
> >  - We have one main mds and one standby mds.
> >  - The primary MDS is a virtual machine with 8 core E5-2643 v3 @
> > 3.40GHz (steal time=0), 16G mem
> >  - We are using ceph kernel client to mount cephfs.
> >  - Ubuntu 16.04 (4.4.0-22-generic kernel)
> >  - The OSD's are physical machines with 8 cores & 32GB memory
> >  - All networking is 10Gb
> >
> > So in the end, is there anything we can do to make the failover and
> recovery
> > go faster?
>
> I guess your MDS is very busy. There are lots of inodes in client
> cache. Please run 'ceph daemon mds.xxx session ls' before restarting
> the MDS, and send the output to us.
>
> Regards
> Yan, Zheng
>
>
> >
> > Regards,
> > Brian Lagoni
> > System administrator, Engineering Tools
> > Unity Technologies
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com