[ceph-users] RGW buckets sync to AWS?

2015-03-31 Thread Henrik Korkuc

Hello,

can anyone recommend script/program to periodically synchronize RGW 
buckets with Amazon's S3?
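One rough sketch of the kind of thing I mean (just an illustration, not a tool
recommendation; the config files, staging directory and bucket names are
placeholders): use two s3cmd configs, one pointing at the RGW endpoint and one
at AWS, and sync through a local staging directory, e.g. from cron:

    # ~/.s3cfg-rgw sets host_base/host_bucket to the local RGW endpoint,
    # ~/.s3cfg-aws holds the AWS credentials
    s3cmd -c ~/.s3cfg-rgw sync s3://backups/ /var/spool/rgw-mirror/
    s3cmd -c ~/.s3cfg-aws sync /var/spool/rgw-mirror/ s3://backups-mirror/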


--
Sincerely
Henrik
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] One host failure bring down the whole cluster

2015-03-31 Thread Kai KH Huang
1) But Ceph says "...You can run a cluster with 1 monitor." 
(http://ceph.com/docs/master/rados/operations/add-or-rm-mons/), so I assumed it 
should work. And split brain is not my current concern.
2) I've written objects to Ceph; now I just want to get them back.

Anyway, I tried to reduce the mon count to 1, but after removing it with the 
following steps, the cluster just cannot start up any more:

1. [root~]  service ceph -a stop mon.serverB
2. [root~]  ceph mon remove serverB ## hang here forever
3. #Remove the monitor entry from ceph.conf.
4. Restart ceph service


[root@serverA~]# systemctl status ceph.service -l
ceph.service - LSB: Start Ceph distributed file system daemons at boot time
   Loaded: loaded (/etc/rc.d/init.d/ceph)
   Active: failed (Result: timeout) since Tue 2015-03-31 15:46:25 CST; 3min 15s 
ago
  Process: 2937 ExecStop=/etc/rc.d/init.d/ceph stop (code=exited, 
status=0/SUCCESS)
  Process: 3670 ExecStart=/etc/rc.d/init.d/ceph start (code=killed, signal=TERM)

Mar 31 15:44:26 serverA ceph[3670]: === osd.6 ===
Mar 31 15:44:56 serverA ceph[3670]: failed: 'timeout 30 /usr/bin/ceph -c 
/etc/ceph/ceph.conf --name=osd.6 --keyring=/var/lib/ceph/osd/ceph-6/keyring osd 
crush create-or-move -- 6 3.64 host=serverA root=default'
Mar 31 15:44:56 serverA ceph[3670]: === osd.7 ===
Mar 31 15:45:26 serverA ceph[3670]: failed: 'timeout 30 /usr/bin/ceph -c 
/etc/ceph/ceph.conf --name=osd.7 --keyring=/var/lib/ceph/osd/ceph-7/keyring osd 
crush create-or-move -- 7 3.64 host=serverA root=default'
Mar 31 15:45:26 serverA ceph[3670]: === osd.8 ===
Mar 31 15:45:57 serverA ceph[3670]: failed: 'timeout 30 /usr/bin/ceph -c 
/etc/ceph/ceph.conf --name=osd.8 --keyring=/var/lib/ceph/osd/ceph-8/keyring osd 
crush create-or-move -- 8 3.64 host=serverA root=default'
Mar 31 15:45:57 serverA ceph[3670]: === osd.9 ===
Mar 31 15:46:25 serverA systemd[1]: ceph.service operation timed out. 
Terminating.
Mar 31 15:46:25 serverA systemd[1]: Failed to start LSB: Start Ceph distributed 
file system daemons at boot time.
Mar 31 15:46:25 serverA systemd[1]: Unit ceph.service entered failed state.

/var/log/ceph/ceph.log says:
2015-03-31 15:55:57.648800 mon.0 10.???.78:6789/0 1048 : cluster [INF] osd.21 
10.???.78:6855/25598 failed (39 reports from 9 peers after 20.118062 >= grace 
20.00)
2015-03-31 15:55:57.931889 mon.0 10.???.78:6789/0 1055 : cluster [INF] osd.15 
10..78:6825/23894 failed (39 reports from 9 peers after 20.401379 >= grace 
20.00)

Obviously serverB is down, but that should not prevent serverA from 
functioning, right?

From: Gregory Farnum [g...@gregs42.com]
Sent: Tuesday, March 31, 2015 11:53 AM
To: Lindsay Mathieson; Kai KH Huang
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] One host failure bring down the whole cluster

On Mon, Mar 30, 2015 at 8:02 PM, Lindsay Mathieson
lindsay.mathie...@gmail.com wrote:
 On Tue, 31 Mar 2015 02:42:27 AM Kai KH Huang wrote:
 Hi, all
 I have a two-node Ceph cluster, and both nodes are monitors and OSDs. When
 they're both up, the OSDs are all up and in, and everything is fine... almost:



 Two things.

 1 -  You *really* need a min of three monitors. Ceph cannot form a quorum with
 just two monitors and you run a risk of split brain.

You can form quorums with an even number of monitors, and Ceph does so
— there's no risk of split brain.

The problem with 2 monitors is that a quorum is always 2 — which is
exactly what you're seeing right now. You can't run with only one
monitor up (assuming you have a non-zero number of them).

 2 - You also probably have a min size of two set (the default). This means
 that you need a minimum  of two copies of each data object for writes to work.
 So with just two nodes, if one goes down you can't write to the other.

Also this.



 So:
 - Install an extra monitor node - it doesn't have to be powerful, we just use an
 Intel Celeron NUC for that.

 - reduce your minimum size to 1 (One).

Yep.
-Greg
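
For reference, a minimal sketch of the min_size change discussed above (the
pool name "rbd" is an assumption):

    ceph osd pool get rbd min_size      # check the current value first
    ceph osd pool set rbd min_size 1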
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw authorization failed

2015-03-31 Thread Neville

 
 Date: Mon, 30 Mar 2015 12:17:48 -0400
 From: yeh...@redhat.com
 To: neville.tay...@hotmail.co.uk
 CC: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] Radosgw authorization failed
 
 
 
 - Original Message -
  From: Neville neville.tay...@hotmail.co.uk
  To: Yehuda Sadeh-Weinraub yeh...@redhat.com
  Cc: ceph-users@lists.ceph.com
  Sent: Monday, March 30, 2015 6:49:29 AM
  Subject: Re: [ceph-users] Radosgw authorization failed
  
  
   Date: Wed, 25 Mar 2015 11:43:44 -0400
   From: yeh...@redhat.com
   To: neville.tay...@hotmail.co.uk
   CC: ceph-users@lists.ceph.com
   Subject: Re: [ceph-users] Radosgw authorization failed
   
   
   
   - Original Message -
From: Neville neville.tay...@hotmail.co.uk
To: ceph-users@lists.ceph.com
Sent: Wednesday, March 25, 2015 8:16:39 AM
Subject: [ceph-users] Radosgw authorization failed

Hi all,

I'm testing a backup product which supports Amazon S3 as a target for archive
storage, and I'm trying to set up a Ceph cluster configured with the S3 API to
use as an internal target for backup archives instead of AWS.

I've followed the online guide for setting up Radosgw and created a default
region and zone based on the AWS naming convention US-East-1. I'm not sure if
this is relevant, but since I was having issues I thought it might need to be
the same.

I've tested the radosgw using boto.s3 and it seems to work ok, i.e. I can
create a bucket, create a folder, list buckets etc. The problem is that when
the backup software tries to create an object I get an authorization failure.
It's using the same user/access/secret as I'm using from boto.s3, and I'm sure
the creds are right as it lets me create the initial connection; it just fails
when trying to create an object (backup folder).

Here's the extract from the radosgw log:

-
2015-03-25 15:07:26.449227 7f1050dc7700 2 req 5:0.000419:s3:GET
/:list_bucket:init op
2015-03-25 15:07:26.449232 7f1050dc7700 2 req 5:0.000424:s3:GET
/:list_bucket:verifying op mask
2015-03-25 15:07:26.449234 7f1050dc7700 20 required_mask= 1
user.op_mask=7
2015-03-25 15:07:26.449235 7f1050dc7700 2 req 5:0.000427:s3:GET
/:list_bucket:verifying op permissions
2015-03-25 15:07:26.449237 7f1050dc7700 5 Searching permissions for
uid=test
mask=49
2015-03-25 15:07:26.449238 7f1050dc7700 5 Found permission: 15
2015-03-25 15:07:26.449239 7f1050dc7700 5 Searching permissions for
group=1
mask=49
2015-03-25 15:07:26.449240 7f1050dc7700 5 Found permission: 15
2015-03-25 15:07:26.449241 7f1050dc7700 5 Searching permissions for
group=2
mask=49
2015-03-25 15:07:26.449242 7f1050dc7700 5 Found permission: 15
2015-03-25 15:07:26.449243 7f1050dc7700 5 Getting permissions id=test
owner=test perm=1
2015-03-25 15:07:26.449244 7f1050dc7700 10 uid=test requested perm
(type)=1,
policy perm=1, user_perm_mask=1, acl perm=1
2015-03-25 15:07:26.449245 7f1050dc7700 2 req 5:0.000437:s3:GET
/:list_bucket:verifying op params
2015-03-25 15:07:26.449247 7f1050dc7700 2 req 5:0.000439:s3:GET
/:list_bucket:executing
2015-03-25 15:07:26.449252 7f1050dc7700 10 cls_bucket_list
test1(@{i=.us-east.rgw.buckets.index}.us-east.rgw.buckets[us-east.280959.2])
start num 1001
2015-03-25 15:07:26.450828 7f1050dc7700 2 req 5:0.002020:s3:GET
/:list_bucket:http status=200
2015-03-25 15:07:26.450832 7f1050dc7700 1 == req done
req=0x7f107000e2e0
http_status=200 ==
2015-03-25 15:07:26.516999 7f1069df9700 20 enqueued request
req=0x7f107000f0e0
2015-03-25 15:07:26.517006 7f1069df9700 20 RGWWQ:
2015-03-25 15:07:26.517007 7f1069df9700 20 req: 0x7f107000f0e0
2015-03-25 15:07:26.517010 7f1069df9700 10 allocated request
req=0x7f107000f6b0
2015-03-25 15:07:26.517021 7f1058dd7700 20 dequeued request
req=0x7f107000f0e0
2015-03-25 15:07:26.517023 7f1058dd7700 20 RGWWQ: empty
2015-03-25 15:07:26.517081 7f1058dd7700 20 CONTENT_LENGTH=88
2015-03-25 15:07:26.517084 7f1058dd7700 20
CONTENT_TYPE=application/octet-stream
2015-03-25 15:07:26.517085 7f1058dd7700 20 
CONTEXT_DOCUMENT_ROOT=/var/www
2015-03-25 15:07:26.517086 7f1058dd7700 20 CONTEXT_PREFIX=
2015-03-25 15:07:26.517087 7f1058dd7700 20 DOCUMENT_ROOT=/var/www
2015-03-25 15:07:26.517088 7f1058dd7700 20 FCGI_ROLE=RESPONDER
2015-03-25 15:07:26.517089 7f1058dd7700 20 GATEWAY_INTERFACE=CGI/1.1
2015-03-25 15:07:26.517090 7f1058dd7700 20 HTTP_AUTHORIZATION=AWS
F79L68W19B3GCLOSE3F8:AcXqtvlBzBMpwdL+WuhDRoLT/Bs=
2015-03-25 15:07:26.517091 7f1058dd7700 20 HTTP_CONNECTION=Keep-Alive
2015-03-25 15:07:26.517092 7f1058dd7700 20 HTTP_DATE=Wed, 25 Mar 2015
  

[ceph-users] Radosgw multi-region user creation question

2015-03-31 Thread Abhishek L
Hi

I'm trying to set up a POC multi-region radosgw configuration (with
different ceph clusters). Following the official docs [1], the part about
creating the zone system users was not very clear. Take an example
configuration of 2 regions, US (master zone us-dc1) and EU (master zone
eu-dc1), with secondary zones of the other region also created in each
region.

If I create the zone users separately in the 2 regions, i.e. a us-dc1 zone user
& an eu-dc1 zone user, the metadata sync does occur, but if I try to create a
bucket with the location set to the secondary region it fails with a 403,
access denied, because the system user of the secondary region is unknown to
the master region. I was able to bypass this by creating a system user for the
secondary zone of the secondary region in the master region (i.e. creating a
system user for the eu secondary zone in the us region) and then recreating
the user in the secondary region, passing the --access and --secret-key
parameters to recreate the same user with the same keys. This seemed to work,
however I'm not sure whether this is the right direction to proceed, as the
docs do not mention a step like this.
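
A hedged sketch of the workaround described above, i.e. creating the same
system user with identical keys on both sides (the uid, display name and keys
are placeholders):

    # run once against the master region, then repeat unchanged against the
    # secondary region so both sides know the same system user and keys
    radosgw-admin user create --uid="eu-secondary-system" \
        --display-name="EU secondary zone system user" \
        --access-key="SYSTEM_ACCESS_KEY" --secret="SYSTEM_SECRET_KEY" --system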


[1] 
http://ceph.com/docs/master/radosgw/federated-config/#configure-a-secondary-region

-- 
Abhishek


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] One host failure bring down the whole cluster

2015-03-31 Thread Henrik Korkuc

On 3/31/15 11:27, Kai KH Huang wrote:

1) But Ceph says "...You can run a cluster with 1 monitor." 
(http://ceph.com/docs/master/rados/operations/add-or-rm-mons/), so I assumed it should work. 
And split brain is not my current concern.

The point is that you must have a majority of monitors up:
* In a one-monitor setup you need that one monitor running.
* In a two-monitor setup you need both monitors running, because if one goes 
down you no longer have a majority up.
* In a three-monitor setup you need at least two monitors up, because if 
one goes down you still have a majority up.

* 4 monitors - at least 3
* 5 monitors - at least 3
* etc
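
To see which monitors are currently in quorum, something like this can be run
from any node with admin credentials (a quick sketch):

    ceph mon stat
    ceph quorum_status --format json-pretty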




2) I've written objects to Ceph; now I just want to get them back.

Anyway, I tried to reduce the mon count to 1, but after removing it with the 
following steps, the cluster just cannot start up any more:

1. [root~]  service ceph -a stop mon.serverB
2. [root~]  ceph mon remove serverB ## hang here forever
3. #Remove the monitor entry from ceph.conf.
4. Restart ceph service
It is a grey area for me, but I think you failed to remove that 
monitor because you didn't have a quorum for the operation to succeed. I 
think you'll need to modify the monmap manually and remove the second monitor 
from it.
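
A hedged sketch of that manual edit, following the usual extract/modify/inject
procedure (host and monitor names follow this thread; the /tmp path is an
assumption):

    # on the surviving monitor host, with its monitor stopped
    service ceph stop mon.serverA
    ceph-mon -i serverA --extract-monmap /tmp/monmap
    monmaptool /tmp/monmap --rm serverB     # drop the dead monitor
    ceph-mon -i serverA --inject-monmap /tmp/monmap
    service ceph start mon.serverA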




[root@serverA~]# systemctl status ceph.service -l
ceph.service - LSB: Start Ceph distributed file system daemons at boot time
Loaded: loaded (/etc/rc.d/init.d/ceph)
Active: failed (Result: timeout) since Tue 2015-03-31 15:46:25 CST; 3min 
15s ago
   Process: 2937 ExecStop=/etc/rc.d/init.d/ceph stop (code=exited, 
status=0/SUCCESS)
   Process: 3670 ExecStart=/etc/rc.d/init.d/ceph start (code=killed, 
signal=TERM)

Mar 31 15:44:26 serverA ceph[3670]: === osd.6 ===
Mar 31 15:44:56 serverA ceph[3670]: failed: 'timeout 30 /usr/bin/ceph -c 
/etc/ceph/ceph.conf --name=osd.6 --keyring=/var/lib/ceph/osd/ceph-6/keyring osd 
crush create-or-move -- 6 3.64 host=serverA root=default'
Mar 31 15:44:56 serverA ceph[3670]: === osd.7 ===
Mar 31 15:45:26 serverA ceph[3670]: failed: 'timeout 30 /usr/bin/ceph -c 
/etc/ceph/ceph.conf --name=osd.7 --keyring=/var/lib/ceph/osd/ceph-7/keyring osd 
crush create-or-move -- 7 3.64 host=serverA root=default'
Mar 31 15:45:26 serverA ceph[3670]: === osd.8 ===
Mar 31 15:45:57 serverA ceph[3670]: failed: 'timeout 30 /usr/bin/ceph -c 
/etc/ceph/ceph.conf --name=osd.8 --keyring=/var/lib/ceph/osd/ceph-8/keyring osd 
crush create-or-move -- 8 3.64 host=serverA root=default'
Mar 31 15:45:57 serverA ceph[3670]: === osd.9 ===
Mar 31 15:46:25 serverA systemd[1]: ceph.service operation timed out. 
Terminating.
Mar 31 15:46:25 serverA systemd[1]: Failed to start LSB: Start Ceph distributed 
file system daemons at boot time.
Mar 31 15:46:25 serverA systemd[1]: Unit ceph.service entered failed state.

/var/log/ceph/ceph.log says:
2015-03-31 15:55:57.648800 mon.0 10.???.78:6789/0 1048 : cluster [INF] osd.21 
10.???.78:6855/25598 failed (39 reports from 9 peers after 20.118062 >= grace 
20.00)
2015-03-31 15:55:57.931889 mon.0 10.???.78:6789/0 1055 : cluster [INF] osd.15 
10..78:6825/23894 failed (39 reports from 9 peers after 20.401379 >= grace 
20.00)

Obviously serverB is down, but that should not prevent serverA from 
functioning, right?

From: Gregory Farnum [g...@gregs42.com]
Sent: Tuesday, March 31, 2015 11:53 AM
To: Lindsay Mathieson; Kai KH Huang
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] One host failure bring down the whole cluster

On Mon, Mar 30, 2015 at 8:02 PM, Lindsay Mathieson
lindsay.mathie...@gmail.com wrote:

On Tue, 31 Mar 2015 02:42:27 AM Kai KH Huang wrote:

Hi, all
 I have a two-node Ceph cluster, and both are monitor and osd. When
they're both up, osd are all up and in, everything is fine... almost:



Two things.

1 -  You *really* need a min of three monitors. Ceph cannot form a quorum with
just two monitors and you run a risk of split brain.

You can form quorums with an even number of monitors, and Ceph does so
— there's no risk of split brain.

The problem with 2 monitors is that a quorum is always 2 — which is
exactly what you're seeing right now. You can't run with only one
monitor up (assuming you have a non-zero number of them).


2 - You also probably have a min size of two set (the default). This means
that you need a minimum  of two copies of each data object for writes to work.
So with just two nodes, if one goes down you can't write to the other.

Also this.



So:
- Install an extra monitor node - it doesn't have to be powerful, we just use an
Intel Celeron NUC for that.

- reduce your minimum size to 1 (One).

Yep.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cannot add OSD node into crushmap or all writes fail

2015-03-31 Thread Henrik Korkuc

Check firewall rules and network connectivity.
Can all nodes and clients reach each other? Can you telnet to the OSD ports 
(note that multiple OSDs may listen on different ports)?
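
For example, something along these lines (the hostname and port are examples
only):

    # on the OSD host: list the ports the ceph-osd processes listen on
    ss -tlnp | grep ceph-osd
    # from a client or another node: check that one of those ports is reachable
    telnet osd-host 6800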


On 3/31/15 8:44, Tyler Bishop wrote:
I have this ceph node that will correctly recover into my ceph pool 
and performance looks to be normal for the rbd clients.  However, a few 
minutes after finishing recovery, the rbd clients begin to fall over and 
cannot write data to the pool.


I've been trying to figure this out for weeks! None of the logs 
contain anything relevant at all.


If I disable the node in the crushmap the rbd clients immediately 
begin writing to the other nodes.


Ideas?




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] One of three monitors can not be started

2015-03-31 Thread 张皓宇
Who can help me? 

One monitor in my ceph cluster can not be started. 
Before that, I added '[mon] mon_compact_on_start = true' to /etc/ceph/ceph.conf 
on three monitor hosts. Then I did 'ceph tell mon.computer05 compact ' on 
computer05, which has a monitor on it. 
When store.db of computer05 changed from 108G to 1G, mon.computer06 stopped, 
and it cannot be started since then.

If I start mon.computer06, it gets stuck in this state:
# /etc/init.d/ceph start mon.computer06

=== mon.computer06 ===

Starting Ceph mon.computer06 on computer06...

The process info is like this:
root 12149  3807  0 20:46 pts/27   00:00:00 /bin/sh /etc/init.d/ceph start 
mon.computer06

root 12308 12149  0 20:46 pts/27   00:00:00 bash -c ulimit -n 32768;
  /usr/bin/ceph-mon -i computer06 --pid-file 
/var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf

root 12309 12308  0 20:46 pts/27   00:00:00 /usr/bin/ceph-mon -i 
computer06 --pid-file /var/run/ceph/mon.computer06.pid -c 
/etc/ceph/ceph.conf

root 12313 12309 19 20:46 pts/27   00:00:01 /usr/bin/ceph-mon -i 
computer06 --pid-file /var/run/ceph/mon.computer06.pid -c 
/etc/ceph/ceph.conf  

Log on computer06 is like this:
2015-03-30 20:46:54.152956 7fc5379d07a0  0 ceph version 0.72.2 
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 12309
...
2015-03-30 20:46:54.759791 7fc5379d07a0  1 mon.computer06@-1(probing) e4 
preinit clean up potentially inconsistent store state

 Sorry, my English is not good.
  ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Hardware recommendation

2015-03-31 Thread f...@univ-lr.fr

Hi,

in our quest to get the right SSD for OSD journals, I managed to 
benchmark two kinds of 10 DWPD SSDs:

- Toshiba M2 PX02SMF020
- Samsung 845DC PRO

I want to determine whether a disk is appropriate given its absolute 
performance, and the optimal number of ceph-osd processes that can use the SSD 
as a journal.
The benchmark consists of a fio command with SYNC and DIRECT access 
options and 4k block write accesses.


fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k \
    --runtime=60 --time_based --group_reporting --name=journal-test \
    --iodepth={1 or 16} --numjobs={1..16}


I think numjobs can represent the number of OSDs concurrently served by 
this SSD. Am I right about this?


   
http://www.4shared.com/download/WOvooKVXce/Fio-Direct-Sync-ToshibaM2-Sams.png?lgfp=3000


My understanding of that data is that the 845DC Pro cannot be used for 
more than 4 OSDs.

The M2 is very consistent in its behaviour.
The iodepth has almost no impact on performance here.

Could someone with other SSD types run the same test to consolidate 
the data?


Among the short list that could be considered for that task (for their 
price/performance/DWPD/...):

- Seagate 1200 SSD 200GB, SAS 12Gb/s ST200FM0053
- Hitachi SSD800MM MLC HUSMM8020ASS200
- Intel DC3700

I've not yet considered the write amplification mentioned in other posts.

Frederic

Josef Johansson jose...@gmail.com a écrit le 20/03/15 10:29 :



The 845DC Pro does look really nice, comparable with s3700 with TDW even.
The price is what really does it, as it’s almost a third compared with s3700..

  


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Creating and deploying OSDs in parallel

2015-03-31 Thread Dan van der Ster
Hi Somnath,
We have deployed many machines in parallel and it generally works.
Keep in mind that if you deploy many many (>1000) then this will
create so many osdmap incrementals, so quickly, that the memory usage
on the OSDs will increase substantially (until you reboot).
Best Regards, Dan
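
For what it's worth, a minimal sketch of kicking off several runs in parallel
from the admin node (the hostnames and the /dev/sdb device are assumptions):

    for host in ceph-node1 ceph-node2 ceph-node3; do
        ceph-deploy osd create "${host}:/dev/sdb" &
    done
    wait    # let all background ceph-deploy runs finish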

On Mon, Mar 30, 2015 at 5:29 PM, Somnath Roy somnath@sandisk.com wrote:
 Hi,

 I am planning to modify our deployment script so that it can create and
 deploy multiple OSDs in parallel to the same host as well as on different
 hosts.

 Just wanted to check if there is any problem with running, say, ‘ceph-deploy osd
 create’ etc. in parallel while deploying the cluster.



 Thanks & Regards

 Somnath


 



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] One of three monitors can not be started

2015-03-31 Thread 张皓宇

There is an asok on computer06. 
I tried to start mon.computer06; maybe two hours later, mon.computer06 
still had not started,
but there are some different processes on computer06, and I don't know how to 
handle them:
root  7812 1  0 11:39 pts/400:00:00 python 
/usr/sbin/ceph-create-keys -i computer06
root 11025 1 12 09:02 pts/400:32:13 /usr/bin/ceph-mon -i computer06 
--pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
root 35692  7812  0 12:59 pts/400:00:00 python /usr/bin/ceph 
--cluster=ceph --admin-daemon=/var/run/ceph/ceph-mon.computer06.asok mon_status


I got the quorum_status from another running monitor:
{ "election_epoch": 508,
  "quorum": [
        0,
        1],
  "quorum_names": [
        "computer05",
        "computer04"],
  "quorum_leader_name": "computer04",
  "monmap": { "epoch": 4,
      "fsid": "471483e5-493f-41f6-b6f4-0187c13d156d",
      "modified": "2014-07-26 09:52:02.411967",
      "created": "0.00",
      "mons": [
        { "rank": 0,
          "name": "computer04",
          "addr": "192.168.1.60:6789\/0"},
        { "rank": 1,
          "name": "computer05",
          "addr": "192.168.1.65:6789\/0"},
        { "rank": 2,
          "name": "computer06",
          "addr": "192.168.1.66:6789\/0"}]}}

 Date: Tue, 31 Mar 2015 12:30:22 -0700
 Subject: Re: [ceph-users] One of three monitors can not be started
 From: g...@gregs42.com
 To: zhanghaoyu1...@hotmail.com
 CC: ceph-users@lists.ceph.com
 
 On Tue, Mar 31, 2015 at 2:50 AM, 张皓宇 zhanghaoyu1...@hotmail.com wrote:
  Who can help me?
 
  One monitor in my ceph cluster can not be started.
  Before that, I added '[mon] mon_compact_on_start = true' to
  /etc/ceph/ceph.conf on three monitor hosts. Then I did 'ceph tell
  mon.computer05 compact ' on computer05, which has a monitor on it.
  When store.db of computer05 changed from 108G to 1G,  mon.computer06 stoped,
  and it can not be started since that.
 
  If I start mon.computer06, it will stop on this state:
  # /etc/init.d/ceph start mon.computer06
  === mon.computer06 ===
  Starting Ceph mon.computer06 on computer06...
 
  The process info is like this:
  root 12149 3807 0 20:46 pts/27 00:00:00 /bin/sh /etc/init.d/ceph start
  mon.computer06
  root 12308 12149 0 20:46 pts/27 00:00:00 bash -c ulimit -n 32768;
  /usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid
  -c /etc/ceph/ceph.conf
  root 12309 12308 0 20:46 pts/27 00:00:00 /usr/bin/ceph-mon -i computer06
  --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
  root 12313 12309 19 20:46 pts/27 00:00:01 /usr/bin/ceph-mon -i computer06
  --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
 
  Log on computer06 is like this:
  2015-03-30 20:46:54.152956 7fc5379d07a0  0 ceph version 0.72.2
  (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 12309
  ...
  2015-03-30 20:46:54.759791 7fc5379d07a0  1 mon.computer06@-1(probing) e4
  preinit clean up potentially inconsistent store state
 
 So I haven't looked at this code in a while, but I think the monitor
 is trying to validate that it's consistent with the others. You
 probably want to dig around the monitor admin sockets and see what
 state each monitor is in, plus its perception of the others.
 
 In this case, I think maybe mon.computer06 is trying to examine its
 whole store, but 100GB is a lot (way too much, in fact), so this can
 take a long time.
 
 
  Sorry, my English is not good.
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

  ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Journaling

2015-03-31 Thread Garg, Pankaj
Hi Mark,

Yes, my reads are consistently slower. I have tested both random and sequential 
and various block sizes.

Thanks
Pankaj

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mark 
Nelson
Sent: Monday, March 30, 2015 1:07 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] SSD Journaling

On 03/30/2015 03:01 PM, Garg, Pankaj wrote:
 Hi,

 I'm benchmarking my small cluster with HDDs vs HDDs with SSD Journaling.
 I am using both RADOS bench and Block device (using fio) for testing.

 I am seeing significant Write performance improvements, as expected. I 
 am however seeing the Reads coming out a bit slower on the SSD 
 Journaling side. They are not terribly different, but sometimes 10% slower.

 Is that something other folks have also seen, or do I need some 
 settings to be tuned properly? I'm wondering if accessing 2 drives for 
 reads, adds latency and hence the throughput suffers.

Hi,

What kind of reads are you seeing the degradation with?  Is it consistent with 
different sizes and random/seq?  Any interesting spikes or valleys during the 
tests?


 Thanks

 Pankaj



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Hardware recommendation

2015-03-31 Thread Adam Tygart
Speaking of SSD IOPs. Running the same tests on my SSDs (LiteOn
ECT-480N9S 480GB SSDs):
The lines at the bottom are a single 6TB spinning disk for comparison's sake.

http://imgur.com/a/fD0Mh

Based on these numbers, there is a minimum latency per operation, but
multiple operations can be performed simultaneously. The sweet spot
for my SSDs is ~8 journals per SSD to maximize IOPs on a per journal
basis. Unfortunately, at 8 journals, the overall IOPs is much less
than the stated IOPs for the SSD. (~5000 vs 9000 IOPs). Better than
spinning disks, but not what I was expecting.

The spreadsheet is available here:
https://people.beocat.cis.ksu.edu/~mozes/hobbit-ssd-vs-std-iops.ods

--
Adam

On Tue, Mar 31, 2015 at 7:09 AM, f...@univ-lr.fr f...@univ-lr.fr wrote:
 Hi,

 in our quest to get the right SSD for OSD journals, I managed to benchmark
 two kind of 10 DWPD SSDs :
 - Toshiba M2 PX02SMF020
 - Samsung 845DC PRO

 I want to determine if a disk is appropriate considering its absolute
 performances, and the optimal number of ceph-osd processes using the SSD as
 a journal.
 The benchmark consists of a fio command, with SYNC and DIRECT access
 options, and 4k blocks write accesses.

 fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k --runtime=60
 --time_based --group_reporting --name=journal-test --iodepth=1 or 16
 --numjobs= ranging from 1 to 16

 I think numjobs can represent the concurrent number of OSD served by this
 SSD. Am I right on this ?


 http://www.4shared.com/download/WOvooKVXce/Fio-Direct-Sync-ToshibaM2-Sams.png?lgfp=3000

 My understanding of that data is that the 845DC Pro cannot be used for more
 than 4 OSDs.
 The M2 is very constant in its comportment.
 The iodepth has almost no impact on perfs here.

 Could someone having other SSD types make the same test to consolidate the
 data ?

 Among the short list that could be considered for that task (for their
 price/perfs/DWPD/...) :
 - Seagate 1200 SSD 200GB, SAS 12Gb/s ST200FM0053
 - Hitachi SSD800MM MLC HUSMM8020ASS200
 - Intel DC3700

 I've not yet considered write amplification mentionned in other posts.

 Frederic

 Josef Johansson jose...@gmail.com a écrit le 20/03/15 10:29 :


 The 845DC Pro does look really nice, comparable with s3700 with TDW even.
 The price is what really does it, as it’s almost a third compared with
 s3700..





 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Force an OSD to try to peer

2015-03-31 Thread Robert LeBlanc
Turns out jumbo frames was not set on all the switch ports. Once that
was resolved the cluster quickly became healthy.

On Mon, Mar 30, 2015 at 8:15 PM, Robert LeBlanc rob...@leblancnet.us wrote:
 I've been working at this peering problem all day. I've done a lot of
 testing at the network layer and I just don't believe that we have a problem
 that would prevent OSDs from peering. When looking though osd_debug 20/20
 logs, it just doesn't look like the OSDs are trying to peer. I don't know if
 it is because there are so many outstanding creations or what. OSDs will
 peer with OSDs on other hosts, but for some reason only choose a certain number
 and not the ones they need to finish the peering process.

 I've checked: firewall, open files, number of threads allowed. These usually
 have given me an error in the logs that helped me fix the problem.

 I can't find a configuration item that specifies how many peers an OSD
 should contact or anything that would be artificially limiting the peering
 connections. I've restarted the OSDs a number of times, as well as rebooting
 the hosts. I believe if the OSDs finish peering everything will clear up. I
 can't find anything in pg query that would help me figure out what is
 blocking it (peering blocked by is empty). The PGs are scattered across all
 the hosts so we can't pin it down to a specific host.

 Any ideas on what to try would be appreciated.

 [ulhglive-root@ceph9 ~]# ceph --version
 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 [ulhglive-root@ceph9 ~]# ceph status
 cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
  health HEALTH_WARN 1 pgs down; 1321 pgs peering; 1321 pgs stuck
 inactive; 1321 pgs stuck unclean; too few pgs per osd (17 < min 20)
  monmap e2: 3 mons at
 {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.29:6789/0},
 election epoch 30, quorum 0,1,2 mon1,mon2,mon3
  osdmap e704: 120 osds: 120 up, 120 in
   pgmap v1895: 2048 pgs, 1 pools, 0 bytes data, 0 objects
 11447 MB used, 436 TB / 436 TB avail
  727 active+clean
  990 peering
   37 creating+peering
1 down+peering
  290 remapped+peering
3 creating+remapped+peering

 { state: peering,
   epoch: 707,
   up: [
 40,
 92,
 48,
 91],
   acting: [
 40,
 92,
 48,
 91],
   info: { pgid: 7.171,
   last_update: 0'0,
   last_complete: 0'0,
   log_tail: 0'0,
   last_user_version: 0,
   last_backfill: MAX,
   purged_snaps: [],
   history: { epoch_created: 293,
   last_epoch_started: 343,
   last_epoch_clean: 343,
   last_epoch_split: 0,
   same_up_since: 688,
   same_interval_since: 688,
   same_primary_since: 608,
   last_scrub: 0'0,
   last_scrub_stamp: 2015-03-30 11:11:18.872851,
   last_deep_scrub: 0'0,
   last_deep_scrub_stamp: 2015-03-30 11:11:18.872851,
   last_clean_scrub_stamp: 0.00},
   stats: { version: 0'0,
   reported_seq: 326,
   reported_epoch: 707,
   state: peering,
   last_fresh: 2015-03-30 20:10:39.509855,
   last_change: 2015-03-30 19:44:17.361601,
   last_active: 2015-03-30 11:37:56.956417,
   last_clean: 2015-03-30 11:37:56.956417,
   last_became_active: 0.00,
   last_unstale: 2015-03-30 20:10:39.509855,
   mapping_epoch: 683,
   log_start: 0'0,
   ondisk_log_start: 0'0,
   created: 293,
   last_epoch_clean: 343,
   parent: 0.0,
   parent_split_bits: 0,
   last_scrub: 0'0,
   last_scrub_stamp: 2015-03-30 11:11:18.872851,
   last_deep_scrub: 0'0,
   last_deep_scrub_stamp: 2015-03-30 11:11:18.872851,
   last_clean_scrub_stamp: 0.00,
   log_size: 0,
   ondisk_log_size: 0,
   stats_invalid: 0,
   stat_sum: { num_bytes: 0,
   num_objects: 0,
   num_object_clones: 0,
   num_object_copies: 0,
   num_objects_missing_on_primary: 0,
   num_objects_degraded: 0,
   num_objects_unfound: 0,
   num_objects_dirty: 0,
   num_whiteouts: 0,
   num_read: 0,
   num_read_kb: 0,
   num_write: 0,
   num_write_kb: 0,
   num_scrub_errors: 0,
   num_shallow_scrub_errors: 0,
   num_deep_scrub_errors: 0,
   num_objects_recovered: 0,
   num_bytes_recovered: 0,
   num_keys_recovered: 0,
   num_objects_omap: 0,
   num_objects_hit_set_archive: 0},
   stat_cat_sum: {},
   up: [
 40,
 92,
 48,
 91],
   acting: [
 40,
 

Re: [ceph-users] Weird cluster restart behavior

2015-03-31 Thread Gregory Farnum
On Tue, Mar 31, 2015 at 7:50 AM, Quentin Hartman
qhart...@direwolfdigital.com wrote:
 I'm working on redeploying a 14-node cluster. I'm running giant 0.87.1. Last
 friday I got everything deployed and all was working well, and I set noout
 and shut all the OSD nodes down over the weekend. Yesterday when I spun it
 back up, the OSDs were behaving very strangely, incorrectly marking each
 other down because of missed heartbeats, even though they were up. It looked like
 some kind of low-level networking problem, but I couldn't find any.

 After much work, I narrowed the apparent source of the problem down to the
 OSDs running on the first host I started in the morning. They were the ones
 that were logging the most messages about not being able to ping other OSDs,
 and the other OSDs were mostly complaining about them. After running out of
 other ideas to try, I restarted them, and then everything started working.
 It's still working happily this morning. It seems as though when that set of
 OSDs started they got stale OSD map information from the MON boxes, which
 failed to be updated as the other OSDs came up. Does that make sense? I
 still don't consider myself an expert on ceph architecture and would
 appreciate any corrections or other possible interpretations of events (I'm
 happy to provide whatever additional information I can) so I can get a
 deeper understanding of things. If my interpretation of events is correct,
 it seems that might point at a bug.

I can't find the ticket now, but I think we did indeed have a bug
around heartbeat failures when restarting nodes. This has been fixed
in other branches but might have been missed for giant. (Did you by
any chance set the nodown flag as well as noout?)

In general Ceph isn't very happy with being shut down completely like
that and its behaviors aren't validated, so nothing will go seriously
wrong but you might find little irritants like this. It's particularly
likely when you're prohibiting state changes with the noout/nodown
flags.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] One of three monitors can not be started

2015-03-31 Thread Gregory Farnum
On Tue, Mar 31, 2015 at 2:50 AM, 张皓宇 zhanghaoyu1...@hotmail.com wrote:
 Who can help me?

 One monitor in my ceph cluster can not be started.
 Before that, I added '[mon] mon_compact_on_start = true' to
 /etc/ceph/ceph.conf on three monitor hosts. Then I did 'ceph tell
 mon.computer05 compact ' on computer05, which has a monitor on it.
 When store.db of computer05 changed from 108G to 1G,  mon.computer06 stoped,
 and it can not be started since that.

 If I start mon.computer06, it will stop on this state:
 # /etc/init.d/ceph start mon.computer06
 === mon.computer06 ===
 Starting Ceph mon.computer06 on computer06...

 The process info is like this:
 root 12149 3807 0 20:46 pts/27 00:00:00 /bin/sh /etc/init.d/ceph start
 mon.computer06
 root 12308 12149 0 20:46 pts/27 00:00:00 bash -c ulimit -n 32768;
 /usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid
 -c /etc/ceph/ceph.conf
 root 12309 12308 0 20:46 pts/27 00:00:00 /usr/bin/ceph-mon -i computer06
 --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
 root 12313 12309 19 20:46 pts/27 00:00:01 /usr/bin/ceph-mon -i computer06
 --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf

 Log on computer06 is like this:
 2015-03-30 20:46:54.152956 7fc5379d07a0  0 ceph version 0.72.2
 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 12309
 ...
 2015-03-30 20:46:54.759791 7fc5379d07a0  1 mon.computer06@-1(probing) e4
 preinit clean up potentially inconsistent store state

So I haven't looked at this code in a while, but I think the monitor
is trying to validate that it's consistent with the others. You
probably want to dig around the monitor admin sockets and see what
state each monitor is in, plus its perception of the others.
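
A hedged example of querying those admin sockets (the socket path is the
default location; adjust the monitor name per host):

    ceph --admin-daemon /var/run/ceph/ceph-mon.computer06.asok mon_status
    ceph --admin-daemon /var/run/ceph/ceph-mon.computer06.asok quorum_status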

In this case, I think maybe mon.computer06 is trying to examine its
whole store, but 100GB is a lot (way too much, in fact), so this can
take a long time.


 Sorry, my English is not good.

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Force an OSD to try to peer

2015-03-31 Thread koukou73gr

On 03/31/2015 09:23 PM, Sage Weil wrote:


It's nothing specific to peering (or ceph).  The symptom we've seen is
just that bytes stop passing across a TCP connection, usually when there are
some largish messages being sent.  The ping/heartbeat messages get through
because they are small and we disable nagle so they never end up in large
frames.


Is there any special route one should take in order to transition a live 
cluster to use jumbo frames and avoid such pitfalls with OSD peering?


-K.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Weird cluster restart behavior

2015-03-31 Thread Quentin Hartman
Thanks for the extra info Gregory. I did not also set nodown.

I expect that I will be very rarely shutting everything down in the normal
course of things, but it has come up a couple times when having to do some
physical re-organizing of racks. Little irritants like this aren't a big
deal if people know to expect them, but as it is I lost quite a lot of time
troubleshooting a non-existant problem. What's the best way to get notes to
that effect added to the docs? It seems something in
http://ceph.com/docs/master/rados/operations/operating/ would save some
people some headache. I'm happy to propose edits, but a quick look doesn't
reveal a process for submitting that sort of thing.

My understanding is that the right method to take an entire cluster
offline is to set noout and then shut everything down. Is there a
better way?
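
Concretely, what I have been doing is something like this (a sketch):

    ceph osd set noout      # keep OSDs from being marked out while they are down
    # ... shut the nodes down, do the physical work, power everything back on ...
    ceph osd unset noout    # return to normal failure handling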

QH

On Tue, Mar 31, 2015 at 1:35 PM, Gregory Farnum g...@gregs42.com wrote:

 On Tue, Mar 31, 2015 at 7:50 AM, Quentin Hartman
 qhart...@direwolfdigital.com wrote:
  I'm working on redeploying a 14-node cluster. I'm running giant 0.87.1.
 Last
  friday I got everything deployed and all was working well, and I set
 noout
  and shut all the OSD nodes down over the weekend. Yesterday when I spun
 it
  back up, the OSDs were behaving very strangely, incorrectly marking each
  other because of missed heartbeats, even though they were up. It looked
 like
  some kind of low-level networking problem, but I couldn't find any.
 
  After much work, I narrowed the apparent source of the problem down to
 the
  OSDs running on the first host I started in the morning. They were the
 ones
  that were logged the most messages about not being able to ping other
 OSDs,
  and the other OSDs were mostly complaining about them. After running out
 of
  other ideas to try, I restarted them, and then everything started
 working.
  It's still working happily this morning. It seems as though when that
 set of
  OSDs started they got stale OSD map information from the MON boxes, which
  failed to be updated as the other OSDs came up. Does that make sense? I
  still don't consider myself an expert on ceph architecture and would
  appreciate and corrections or other possible interpretations of events
 (I'm
  happy to provide whatever additional information I can) so I can get a
  deeper understanding of things. If my interpretation of events is
 correct,
  it seems that might point at a bug.

 I can't find the ticket now, but I think we did indeed have a bug
 around heartbeat failures when restarting nodes. This has been fixed
 in other branches but might have been missed for giant. (Did you by
 any chance set the nodown flag as well as noout?)

 In general Ceph isn't very happy with being shut down completely like
 that and its behaviors aren't validated, so nothing will go seriously
 wrong but you might find little irritants like this. It's particularly
 likely when you're prohibiting state changes with the noout/nodown
 flags.
 -Greg

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Force an OSD to try to peer

2015-03-31 Thread Robert LeBlanc
At the L2 level, if the hosts and switches don't accept jumbo frames,
they just drop them because they are too big. They are not fragmented
because they don't go through a router. My problem is that OSDs were
able to peer with other OSDs on the host, but my guess is that they
never sent/received packets larger than 1500 bytes. Then other OSD
processes tried to peer but sent packets larger than 1500 bytes
causing the packets to be dropped and peering to stall.
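
One quick way to test for exactly this condition is a do-not-fragment ping
sized for jumbo frames (the peer IP is a placeholder; 8972 bytes = 9000 MTU
minus IP and ICMP headers):

    # succeeds only if every hop between the two hosts passes 9000-byte frames
    ping -M do -s 8972 -c 3 <peer-ip>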

On Tue, Mar 31, 2015 at 12:10 PM, Somnath Roy somnath@sandisk.com wrote:
 But, do we know why Jumbo frames may have an impact on peering ?
 In our setup so far, we haven't enabled jumbo frames other than performance 
 reason (if at all).

 Thanks & Regards
 Somnath

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
 Robert LeBlanc
 Sent: Tuesday, March 31, 2015 11:08 AM
 To: Sage Weil
 Cc: ceph-devel; Ceph-User
 Subject: Re: [ceph-users] Force an OSD to try to peer

 I was desperate for anything after exhausting every other possibility I could 
 think of. Maybe I should put a checklist in the Ceph docs of things to look 
 for.

 Thanks,

 On Tue, Mar 31, 2015 at 11:36 AM, Sage Weil s...@newdream.net wrote:
 On Tue, 31 Mar 2015, Robert LeBlanc wrote:
 Turns out jumbo frames was not set on all the switch ports. Once that
 was resolved the cluster quickly became healthy.

 I always hesitate to point the finger at the jumbo frames
 configuration but almost every time that is the culprit!

 Thanks for the update.  :)
 sage




 On Mon, Mar 30, 2015 at 8:15 PM, Robert LeBlanc rob...@leblancnet.us 
 wrote:
  I've been working at this peering problem all day. I've done a lot
  of testing at the network layer and I just don't believe that we
  have a problem that would prevent OSDs from peering. When looking
  though osd_debug 20/20 logs, it just doesn't look like the OSDs are
  trying to peer. I don't know if it is because there are so many
  outstanding creations or what. OSDs will peer with OSDs on other
  hosts, but for reason only chooses a certain number and not one that it 
  needs to finish the peering process.
 
  I've check: firewall, open files, number of threads allowed. These
  usually have given me an error in the logs that helped me fix the problem.
 
  I can't find a configuration item that specifies how many peers an
  OSD should contact or anything that would be artificially limiting
  the peering connections. I've restarted the OSDs a number of times,
  as well as rebooting the hosts. I beleive if the OSDs finish
  peering everything will clear up. I can't find anything in pg query
  that would help me figure out what is blocking it (peering blocked
  by is empty). The PGs are scattered across all the hosts so we can't pin 
  it down to a specific host.
 
  Any ideas on what to try would be appreciated.
 
  [ulhglive-root@ceph9 ~]# ceph --version ceph version 0.80.7
  (6c0127fcb58008793d3c8b62d925bc91963672a3)
  [ulhglive-root@ceph9 ~]# ceph status
  cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
   health HEALTH_WARN 1 pgs down; 1321 pgs peering; 1321 pgs
  stuck inactive; 1321 pgs stuck unclean; too few pgs per osd (17 < min 20)
   monmap e2: 3 mons at
  {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.2
  9:6789/0}, election epoch 30, quorum 0,1,2 mon1,mon2,mon3
   osdmap e704: 120 osds: 120 up, 120 in
pgmap v1895: 2048 pgs, 1 pools, 0 bytes data, 0 objects
  11447 MB used, 436 TB / 436 TB avail
   727 active+clean
   990 peering
37 creating+peering
 1 down+peering
   290 remapped+peering
 3 creating+remapped+peering
 
  { state: peering,
epoch: 707,
up: [
  40,
  92,
  48,
  91],
acting: [
  40,
  92,
  48,
  91],
info: { pgid: 7.171,
last_update: 0'0,
last_complete: 0'0,
log_tail: 0'0,
last_user_version: 0,
last_backfill: MAX,
purged_snaps: [],
history: { epoch_created: 293,
last_epoch_started: 343,
last_epoch_clean: 343,
last_epoch_split: 0,
same_up_since: 688,
same_interval_since: 688,
same_primary_since: 608,
last_scrub: 0'0,
last_scrub_stamp: 2015-03-30 11:11:18.872851,
last_deep_scrub: 0'0,
last_deep_scrub_stamp: 2015-03-30 11:11:18.872851,
last_clean_scrub_stamp: 0.00},
stats: { version: 0'0,
reported_seq: 326,
reported_epoch: 707,
state: peering,
last_fresh: 2015-03-30 20:10:39.509855,
last_change: 2015-03-30 19:44:17.361601,
last_active: 2015-03-30 11:37:56.956417,
last_clean: 2015-03-30 11:37:56.956417,
last_became_active: 0.00,

Re: [ceph-users] Force an OSD to try to peer

2015-03-31 Thread Sage Weil
On Tue, 31 Mar 2015, Robert LeBlanc wrote:
 Turns out jumbo frames was not set on all the switch ports. Once that
 was resolved the cluster quickly became healthy.

I always hesitate to point the finger at the jumbo frames configuration 
but almost every time that is the culprit!

Thanks for the update.  :)
sage



 
 On Mon, Mar 30, 2015 at 8:15 PM, Robert LeBlanc rob...@leblancnet.us wrote:
  I've been working at this peering problem all day. I've done a lot of
  testing at the network layer and I just don't believe that we have a problem
  that would prevent OSDs from peering. When looking though osd_debug 20/20
  logs, it just doesn't look like the OSDs are trying to peer. I don't know if
  it is because there are so many outstanding creations or what. OSDs will
  peer with OSDs on other hosts, but for reason only chooses a certain number
  and not one that it needs to finish the peering process.
 
  I've check: firewall, open files, number of threads allowed. These usually
  have given me an error in the logs that helped me fix the problem.
 
  I can't find a configuration item that specifies how many peers an OSD
  should contact or anything that would be artificially limiting the peering
  connections. I've restarted the OSDs a number of times, as well as rebooting
  the hosts. I beleive if the OSDs finish peering everything will clear up. I
  can't find anything in pg query that would help me figure out what is
  blocking it (peering blocked by is empty). The PGs are scattered across all
  the hosts so we can't pin it down to a specific host.
 
  Any ideas on what to try would be appreciated.
 
  [ulhglive-root@ceph9 ~]# ceph --version
  ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
  [ulhglive-root@ceph9 ~]# ceph status
  cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
   health HEALTH_WARN 1 pgs down; 1321 pgs peering; 1321 pgs stuck
 inactive; 1321 pgs stuck unclean; too few pgs per osd (17 < min 20)
   monmap e2: 3 mons at
  {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.29:6789/0},
  election epoch 30, quorum 0,1,2 mon1,mon2,mon3
   osdmap e704: 120 osds: 120 up, 120 in
pgmap v1895: 2048 pgs, 1 pools, 0 bytes data, 0 objects
  11447 MB used, 436 TB / 436 TB avail
   727 active+clean
   990 peering
37 creating+peering
 1 down+peering
   290 remapped+peering
 3 creating+remapped+peering
 
  { state: peering,
epoch: 707,
up: [
  40,
  92,
  48,
  91],
acting: [
  40,
  92,
  48,
  91],
info: { pgid: 7.171,
last_update: 0'0,
last_complete: 0'0,
log_tail: 0'0,
last_user_version: 0,
last_backfill: MAX,
purged_snaps: [],
history: { epoch_created: 293,
last_epoch_started: 343,
last_epoch_clean: 343,
last_epoch_split: 0,
same_up_since: 688,
same_interval_since: 688,
same_primary_since: 608,
last_scrub: 0'0,
last_scrub_stamp: 2015-03-30 11:11:18.872851,
last_deep_scrub: 0'0,
last_deep_scrub_stamp: 2015-03-30 11:11:18.872851,
last_clean_scrub_stamp: 0.00},
stats: { version: 0'0,
reported_seq: 326,
reported_epoch: 707,
state: peering,
last_fresh: 2015-03-30 20:10:39.509855,
last_change: 2015-03-30 19:44:17.361601,
last_active: 2015-03-30 11:37:56.956417,
last_clean: 2015-03-30 11:37:56.956417,
last_became_active: 0.00,
last_unstale: 2015-03-30 20:10:39.509855,
mapping_epoch: 683,
log_start: 0'0,
ondisk_log_start: 0'0,
created: 293,
last_epoch_clean: 343,
parent: 0.0,
parent_split_bits: 0,
last_scrub: 0'0,
last_scrub_stamp: 2015-03-30 11:11:18.872851,
last_deep_scrub: 0'0,
last_deep_scrub_stamp: 2015-03-30 11:11:18.872851,
last_clean_scrub_stamp: 0.00,
log_size: 0,
ondisk_log_size: 0,
stats_invalid: 0,
stat_sum: { num_bytes: 0,
num_objects: 0,
num_object_clones: 0,
num_object_copies: 0,
num_objects_missing_on_primary: 0,
num_objects_degraded: 0,
num_objects_unfound: 0,
num_objects_dirty: 0,
num_whiteouts: 0,
num_read: 0,
num_read_kb: 0,
num_write: 0,
num_write_kb: 0,
num_scrub_errors: 0,
num_shallow_scrub_errors: 0,
num_deep_scrub_errors: 0,
num_objects_recovered: 0,
num_bytes_recovered: 

Re: [ceph-users] Force an OSD to try to peer

2015-03-31 Thread Somnath Roy
But, do we know why jumbo frames may have an impact on peering?
In our setup so far, we haven't enabled jumbo frames other than for performance 
reasons (if at all).

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Robert 
LeBlanc
Sent: Tuesday, March 31, 2015 11:08 AM
To: Sage Weil
Cc: ceph-devel; Ceph-User
Subject: Re: [ceph-users] Force an OSD to try to peer

I was desperate for anything after exhausting every other possibility I could 
think of. Maybe I should put a checklist in the Ceph docs of things to look for.

Thanks,

On Tue, Mar 31, 2015 at 11:36 AM, Sage Weil s...@newdream.net wrote:
 On Tue, 31 Mar 2015, Robert LeBlanc wrote:
 Turns out jumbo frames was not set on all the switch ports. Once that
 was resolved the cluster quickly became healthy.

 I always hesitate to point the finger at the jumbo frames
 configuration but almost every time that is the culprit!

 Thanks for the update.  :)
 sage




 On Mon, Mar 30, 2015 at 8:15 PM, Robert LeBlanc rob...@leblancnet.us wrote:
  I've been working at this peering problem all day. I've done a lot
  of testing at the network layer and I just don't believe that we
  have a problem that would prevent OSDs from peering. When looking
  though osd_debug 20/20 logs, it just doesn't look like the OSDs are
  trying to peer. I don't know if it is because there are so many
  outstanding creations or what. OSDs will peer with OSDs on other
  hosts, but for reason only chooses a certain number and not one that it 
  needs to finish the peering process.
 
  I've check: firewall, open files, number of threads allowed. These
  usually have given me an error in the logs that helped me fix the problem.
 
  I can't find a configuration item that specifies how many peers an
  OSD should contact or anything that would be artificially limiting
  the peering connections. I've restarted the OSDs a number of times,
  as well as rebooting the hosts. I beleive if the OSDs finish
  peering everything will clear up. I can't find anything in pg query
  that would help me figure out what is blocking it (peering blocked
  by is empty). The PGs are scattered across all the hosts so we can't pin 
  it down to a specific host.
 
  Any ideas on what to try would be appreciated.
 
  [ulhglive-root@ceph9 ~]# ceph --version ceph version 0.80.7
  (6c0127fcb58008793d3c8b62d925bc91963672a3)
  [ulhglive-root@ceph9 ~]# ceph status
  cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
   health HEALTH_WARN 1 pgs down; 1321 pgs peering; 1321 pgs
  stuck inactive; 1321 pgs stuck unclean; too few pgs per osd (17 < min 20)
   monmap e2: 3 mons at
  {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.2
  9:6789/0}, election epoch 30, quorum 0,1,2 mon1,mon2,mon3
   osdmap e704: 120 osds: 120 up, 120 in
pgmap v1895: 2048 pgs, 1 pools, 0 bytes data, 0 objects
  11447 MB used, 436 TB / 436 TB avail
   727 active+clean
   990 peering
37 creating+peering
 1 down+peering
   290 remapped+peering
 3 creating+remapped+peering
 
  { state: peering,
epoch: 707,
up: [
  40,
  92,
  48,
  91],
acting: [
  40,
  92,
  48,
  91],
info: { pgid: 7.171,
last_update: 0'0,
last_complete: 0'0,
log_tail: 0'0,
last_user_version: 0,
last_backfill: MAX,
purged_snaps: [],
history: { epoch_created: 293,
last_epoch_started: 343,
last_epoch_clean: 343,
last_epoch_split: 0,
same_up_since: 688,
same_interval_since: 688,
same_primary_since: 608,
last_scrub: 0'0,
last_scrub_stamp: 2015-03-30 11:11:18.872851,
last_deep_scrub: 0'0,
last_deep_scrub_stamp: 2015-03-30 11:11:18.872851,
last_clean_scrub_stamp: 0.00},
stats: { version: 0'0,
reported_seq: 326,
reported_epoch: 707,
state: peering,
last_fresh: 2015-03-30 20:10:39.509855,
last_change: 2015-03-30 19:44:17.361601,
last_active: 2015-03-30 11:37:56.956417,
last_clean: 2015-03-30 11:37:56.956417,
last_became_active: 0.00,
last_unstale: 2015-03-30 20:10:39.509855,
mapping_epoch: 683,
log_start: 0'0,
ondisk_log_start: 0'0,
created: 293,
last_epoch_clean: 343,
parent: 0.0,
parent_split_bits: 0,
last_scrub: 0'0,
last_scrub_stamp: 2015-03-30 11:11:18.872851,
last_deep_scrub: 0'0,
last_deep_scrub_stamp: 2015-03-30 11:11:18.872851,
last_clean_scrub_stamp: 0.00,
log_size: 0,
ondisk_log_size: 0,

Re: [ceph-users] Force an OSD to try to peer

2015-03-31 Thread Robert LeBlanc
I was desperate for anything after exhausting every other possibility
I could think of. Maybe I should put a checklist in the Ceph docs of
things to look for.

Thanks,

On Tue, Mar 31, 2015 at 11:36 AM, Sage Weil s...@newdream.net wrote:
 On Tue, 31 Mar 2015, Robert LeBlanc wrote:
 Turns out jumbo frames was not set on all the switch ports. Once that
 was resolved the cluster quickly became healthy.

 I always hesitate to point the finger at the jumbo frames configuration
 but almost every time that is the culprit!

 Thanks for the update.  :)
 sage




 On Mon, Mar 30, 2015 at 8:15 PM, Robert LeBlanc rob...@leblancnet.us wrote:
  I've been working at this peering problem all day. I've done a lot of
  testing at the network layer and I just don't believe that we have a 
  problem
  that would prevent OSDs from peering. When looking though osd_debug 20/20
  logs, it just doesn't look like the OSDs are trying to peer. I don't know 
  if
  it is because there are so many outstanding creations or what. OSDs will
  peer with OSDs on other hosts, but for reason only chooses a certain number
  and not one that it needs to finish the peering process.
 
  I've check: firewall, open files, number of threads allowed. These usually
  have given me an error in the logs that helped me fix the problem.
 
  I can't find a configuration item that specifies how many peers an OSD
  should contact or anything that would be artificially limiting the peering
  connections. I've restarted the OSDs a number of times, as well as 
  rebooting
  the hosts. I beleive if the OSDs finish peering everything will clear up. I
  can't find anything in pg query that would help me figure out what is
  blocking it (peering blocked by is empty). The PGs are scattered across all
  the hosts so we can't pin it down to a specific host.
 
  Any ideas on what to try would be appreciated.
 
  [ulhglive-root@ceph9 ~]# ceph --version
  ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
  [ulhglive-root@ceph9 ~]# ceph status
  cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
   health HEALTH_WARN 1 pgs down; 1321 pgs peering; 1321 pgs stuck
 inactive; 1321 pgs stuck unclean; too few pgs per osd (17 < min 20)
   monmap e2: 3 mons at
  {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.29:6789/0},
  election epoch 30, quorum 0,1,2 mon1,mon2,mon3
   osdmap e704: 120 osds: 120 up, 120 in
pgmap v1895: 2048 pgs, 1 pools, 0 bytes data, 0 objects
  11447 MB used, 436 TB / 436 TB avail
   727 active+clean
   990 peering
37 creating+peering
 1 down+peering
   290 remapped+peering
 3 creating+remapped+peering
 
  { state: peering,
epoch: 707,
up: [
  40,
  92,
  48,
  91],
acting: [
  40,
  92,
  48,
  91],
info: { pgid: 7.171,
last_update: 0'0,
last_complete: 0'0,
log_tail: 0'0,
last_user_version: 0,
last_backfill: MAX,
purged_snaps: [],
history: { epoch_created: 293,
last_epoch_started: 343,
last_epoch_clean: 343,
last_epoch_split: 0,
same_up_since: 688,
same_interval_since: 688,
same_primary_since: 608,
last_scrub: 0'0,
last_scrub_stamp: 2015-03-30 11:11:18.872851,
last_deep_scrub: 0'0,
last_deep_scrub_stamp: 2015-03-30 11:11:18.872851,
last_clean_scrub_stamp: 0.00},
stats: { version: 0'0,
reported_seq: 326,
reported_epoch: 707,
state: peering,
last_fresh: 2015-03-30 20:10:39.509855,
last_change: 2015-03-30 19:44:17.361601,
last_active: 2015-03-30 11:37:56.956417,
last_clean: 2015-03-30 11:37:56.956417,
last_became_active: 0.00,
last_unstale: 2015-03-30 20:10:39.509855,
mapping_epoch: 683,
log_start: 0'0,
ondisk_log_start: 0'0,
created: 293,
last_epoch_clean: 343,
parent: 0.0,
parent_split_bits: 0,
last_scrub: 0'0,
last_scrub_stamp: 2015-03-30 11:11:18.872851,
last_deep_scrub: 0'0,
last_deep_scrub_stamp: 2015-03-30 11:11:18.872851,
last_clean_scrub_stamp: 0.00,
log_size: 0,
ondisk_log_size: 0,
stats_invalid: 0,
stat_sum: { num_bytes: 0,
num_objects: 0,
num_object_clones: 0,
num_object_copies: 0,
num_objects_missing_on_primary: 0,
num_objects_degraded: 0,
num_objects_unfound: 0,
num_objects_dirty: 0,
num_whiteouts: 0,
num_read: 0,
num_read_kb: 0,

Re: [ceph-users] Force an OSD to try to peer

2015-03-31 Thread Sage Weil
On Tue, 31 Mar 2015, Somnath Roy wrote:
 But, do we know why jumbo frames may have an impact on peering?
 In our setup so far, we haven't enabled jumbo frames for anything other than
 performance reasons (if at all).

It's nothing specific to peering (or ceph).  The symptom we've seen is 
just that bytes stop passing across a TCP connection, usually when there are 
some largish messages being sent.  The ping/heartbeat messages get through 
because they are small and we disable nagle so they never end up in large 
frames.

It's a pain to diagnose.

sage
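
(For anyone who hits this later: a quick way to expose this kind of MTU
blackhole is to force full-size frames with the don't-fragment bit set
between two hosts. A sketch; the target address is just one of the monitors
from this thread, and 8972 bytes assumes a 9000-byte MTU minus IP and ICMP
headers.)

    # small packets pass even through a port that is still at MTU 1500
    ping -c 3 10.217.72.28
    # full-size jumbo frames with DF set fail if any hop drops them
    ping -M do -s 8972 -c 3 10.217.72.28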


 
 Thanks & Regards
 Somnath
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
 Robert LeBlanc
 Sent: Tuesday, March 31, 2015 11:08 AM
 To: Sage Weil
 Cc: ceph-devel; Ceph-User
 Subject: Re: [ceph-users] Force an OSD to try to peer
 
 I was desperate for anything after exhausting every other possibility I could 
 think of. Maybe I should put a checklist in the Ceph docs of things to look 
 for.
 
 Thanks,
 
 On Tue, Mar 31, 2015 at 11:36 AM, Sage Weil s...@newdream.net wrote:
  On Tue, 31 Mar 2015, Robert LeBlanc wrote:
  Turns out jumbo frames was not set on all the switch ports. Once that
  was resolved the cluster quickly became healthy.
 
  I always hesitate to point the finger at the jumbo frames
  configuration but almost every time that is the culprit!
 
  Thanks for the update.  :)
  sage
 
 
 
 
  On Mon, Mar 30, 2015 at 8:15 PM, Robert LeBlanc rob...@leblancnet.us 
  wrote:
   I've been working at this peering problem all day. I've done a lot
   of testing at the network layer and I just don't believe that we
   have a problem that would prevent OSDs from peering. When looking
    through osd_debug 20/20 logs, it just doesn't look like the OSDs are
   trying to peer. I don't know if it is because there are so many
    outstanding creations or what. OSDs will peer with OSDs on other
    hosts, but for some reason each one only chooses a certain number of
    peers, and not the ones it needs to finish the peering process.
  
    I've checked: firewall, open files, number of threads allowed. These
    usually give me an error in the logs that helps me fix the
    problem.
  
   I can't find a configuration item that specifies how many peers an
   OSD should contact or anything that would be artificially limiting
   the peering connections. I've restarted the OSDs a number of times,
    as well as rebooting the hosts. I believe if the OSDs finish
   peering everything will clear up. I can't find anything in pg query
   that would help me figure out what is blocking it (peering blocked
   by is empty). The PGs are scattered across all the hosts so we can't pin 
   it down to a specific host.
  
   Any ideas on what to try would be appreciated.
  
   [ulhglive-root@ceph9 ~]# ceph --version ceph version 0.80.7
   (6c0127fcb58008793d3c8b62d925bc91963672a3)
   [ulhglive-root@ceph9 ~]# ceph status
   cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
health HEALTH_WARN 1 pgs down; 1321 pgs peering; 1321 pgs
    stuck inactive; 1321 pgs stuck unclean; too few pgs per osd (17 < min 20)
monmap e2: 3 mons at
   {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.2
   9:6789/0}, election epoch 30, quorum 0,1,2 mon1,mon2,mon3
osdmap e704: 120 osds: 120 up, 120 in
 pgmap v1895: 2048 pgs, 1 pools, 0 bytes data, 0 objects
   11447 MB used, 436 TB / 436 TB avail
727 active+clean
990 peering
 37 creating+peering
  1 down+peering
290 remapped+peering
  3 creating+remapped+peering
  
   { state: peering,
 epoch: 707,
 up: [
   40,
   92,
   48,
   91],
 acting: [
   40,
   92,
   48,
   91],
 info: { pgid: 7.171,
 last_update: 0'0,
 last_complete: 0'0,
 log_tail: 0'0,
 last_user_version: 0,
 last_backfill: MAX,
 purged_snaps: [],
 history: { epoch_created: 293,
 last_epoch_started: 343,
 last_epoch_clean: 343,
 last_epoch_split: 0,
 same_up_since: 688,
 same_interval_since: 688,
 same_primary_since: 608,
 last_scrub: 0'0,
 last_scrub_stamp: 2015-03-30 11:11:18.872851,
 last_deep_scrub: 0'0,
 last_deep_scrub_stamp: 2015-03-30 11:11:18.872851,
 last_clean_scrub_stamp: 0.00},
 stats: { version: 0'0,
 reported_seq: 326,
 reported_epoch: 707,
 state: peering,
 last_fresh: 2015-03-30 20:10:39.509855,
 last_change: 2015-03-30 19:44:17.361601,
 last_active: 2015-03-30 11:37:56.956417,
 last_clean: 2015-03-30 11:37:56.956417,
 last_became_active: 0.00,
 last_unstale: 2015-03-30 20:10:39.509855,
 

Re: [ceph-users] Weird cluster restart behavior

2015-03-31 Thread Quentin Hartman
On Tue, Mar 31, 2015 at 2:05 PM, Gregory Farnum g...@gregs42.com wrote:

 Github pull requests. :)


Ah, well that's easy:

https://github.com/ceph/ceph/pull/4237


QH
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Weird cluster restart behavior

2015-03-31 Thread Jeffrey Ollie
On Tue, Mar 31, 2015 at 3:05 PM, Gregory Farnum g...@gregs42.com wrote:

 On Tue, Mar 31, 2015 at 12:56 PM, Quentin Hartman
 
  My understanding is that the right method to take an entire cluster
   offline is to set noout and then shut everything down. Is there a better
   way?

 That's probably the best way to do it. Like I said, there was also a
 bug here that I think is fixed for Hammer but that might not have been
 backported to Giant. Unfortunately I don't remember the right keywords
 as I wasn't involved in the fix.


I'd hope that the complete shutdown scenario would get some more testing in
the future...  I know that Ceph is targeted more at enterprise situations
where things like generators and properly sized battery backups aren't
extravagant luxuries, but there are probably a lot of clusters out there
that will get shut down completely, planned or unplanned.

-- 
Jeff Ollie
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Weird cluster restart behavior

2015-03-31 Thread Quentin Hartman
I'm working on redeploying a 14-node cluster. I'm running giant 0.87.1.
Last Friday I got everything deployed and all was working well, and I set
noout and shut all the OSD nodes down over the weekend. Yesterday when I
spun it back up, the OSDs were behaving very strangely, incorrectly marking
each other down because of missed heartbeats, even though they were up. It
looked like some kind of low-level networking problem, but I couldn't find
any.

After much work, I narrowed the apparent source of the problem down to the
OSDs running on the first host I started in the morning. They were the ones
that logged the most messages about not being able to ping other OSDs,
and the other OSDs were mostly complaining about them. After running out of
other ideas to try, I restarted them, and then everything started working.
It's still working happily this morning. It seems as though when that set
of OSDs started they got stale OSD map information from the MON boxes,
which failed to be updated as the other OSDs came up. Does that make sense?
I still don't consider myself an expert on ceph architecture and would
appreciate any corrections or other possible interpretations of events (I'm
happy to provide whatever additional information I can) so I can get a
deeper understanding of things. If my interpretation of events is correct,
it seems that might point at a bug.

QH
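
(A way to test that theory after a restart is to compare each OSD's view of
the osdmap with the monitors' current epoch. A sketch, assuming the admin
socket "status" command is available on this release and that osd.0 runs on
the local host:)

    # current epoch according to the monitors
    ceph osd dump | head -1
    # oldest/newest osdmap epochs this OSD has actually seen
    ceph daemon osd.0 status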
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Weird cluster restart behavior

2015-03-31 Thread Gregory Farnum
On Tue, Mar 31, 2015 at 12:56 PM, Quentin Hartman
qhart...@direwolfdigital.com wrote:
 Thanks for the extra info Gregory. I did not also set nodown.

 I expect that I will be very rarely shutting everything down in the normal
 course of things, but it has come up a couple times when having to do some
 physical re-organizing of racks. Little irritants like this aren't a big
 deal if people know to expect them, but as it is I lost quite a lot of time
 troubleshooting a non-existent problem. What's the best way to get notes to
 that effect added to the docs? It seems something in
 http://ceph.com/docs/master/rados/operations/operating/ would save some
 people some headache. I'm happy to propose edits, but a quick look doesn't
 reveal a process for submitting that sort of thing.

Github pull requests. :)


 My understanding is that the right method to take an entire cluster
 offline is to set noout and then shut everything down. Is there a better
 way?

That's probably the best way to do it. Like I said, there was also a
bug here that I think is fixed for Hammer but that might not have been
backported to Giant. Unfortunately I don't remember the right keywords
as I wasn't involved in the fix.
-Greg
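
(For the archives, the sequence that matches this advice looks roughly like
the following; a sketch only, using the sysvinit-style service commands seen
elsewhere in this thread:)

    ceph osd set noout          # keep down OSDs from being marked out
    service ceph -a stop        # stop daemons cluster-wide
    # ... power off, do the physical work, power back on; mons first, then OSDs ...
    service ceph -a start
    ceph -s                     # wait until all OSDs report up
    ceph osd unset noout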


 QH

 On Tue, Mar 31, 2015 at 1:35 PM, Gregory Farnum g...@gregs42.com wrote:

 On Tue, Mar 31, 2015 at 7:50 AM, Quentin Hartman
 qhart...@direwolfdigital.com wrote:
  I'm working on redeploying a 14-node cluster. I'm running giant 0.87.1.
  Last Friday I got everything deployed and all was working well, and I set
  noout and shut all the OSD nodes down over the weekend. Yesterday when I
  spun it back up, the OSDs were behaving very strangely, incorrectly marking
  each other down because of missed heartbeats, even though they were up. It
  looked like some kind of low-level networking problem, but I couldn't find
  any.
 
  After much work, I narrowed the apparent source of the problem down to
  the
  OSDs running on the first host I started in the morning. They were the
  ones that logged the most messages about not being able to ping other
  OSDs,
  and the other OSDs were mostly complaining about them. After running out
  of
  other ideas to try, I restarted them, and then everything started
  working.
  It's still working happily this morning. It seems as though when that
  set of
  OSDs started they got stale OSD map information from the MON boxes,
  which
  failed to be updated as the other OSDs came up. Does that make sense? I
  still don't consider myself an expert on ceph architecture and would
  appreciate any corrections or other possible interpretations of events
  (I'm
  happy to provide whatever additional information I can) so I can get a
  deeper understanding of things. If my interpretation of events is
  correct,
  it seems that might point at a bug.

 I can't find the ticket now, but I think we did indeed have a bug
 around heartbeat failures when restarting nodes. This has been fixed
 in other branches but might have been missed for giant. (Did you by
 any chance set the nodown flag as well as noout?)

 In general Ceph isn't very happy with being shut down completely like
 that and its behaviors aren't validated, so nothing will go seriously
 wrong but you might find little irritants like this. It's particularly
 likely when you're prohibiting state changes with the noout/nodown
 flags.
 -Greg
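
(If it helps anyone searching the archives later: the heartbeat complaints
described above show up in the OSD logs, so it is possible to see which hosts
are being blamed. A sketch, assuming the default log location and osd.0 as an
example:)

    # which OSDs does the cluster currently consider down?
    ceph osd tree | grep down
    # which peers is this OSD failing to hear from?
    grep 'heartbeat_check: no reply from' /var/log/ceph/ceph-osd.0.log | tail -20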


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cascading Failure of OSDs

2015-03-31 Thread Francois Lafont
Hi,

Quentin Hartman wrote:

 Since I have been in ceph-land today, it reminded me that I needed to close
 the loop on this. I was finally able to isolate this problem to a
 faulty NIC on the ceph cluster network. It worked, but it was
 accumulating a huge number of Rx errors. My best guess is some receive
 buffer cache failed? Anyway, having a NIC go weird like that is totally
 consistent with all the weird problems I was seeing, the corrupted PGs, and
 the inability for the cluster to settle down.
 
 As a result we've added NIC error rates to our monitoring suite on the
 cluster so we'll hopefully see this coming if it ever happens again.

Good for you. ;)

Could you post here the command that you use to get NIC error rates?

-- 
François Lafont
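
(Not speaking for Quentin, but the counters usually come from the kernel's
per-interface statistics and from the NIC driver; eth0 is an assumption:)

    # kernel per-interface counters, including RX errors and drops
    ip -s link show eth0
    # driver-level counters, often more detailed (CRC errors, missed frames, ...)
    ethtool -S eth0 | grep -iE 'err|drop'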
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com