Re: [ceph-users] One of three monitors can not be started

2015-03-31 Thread 张皓宇

There is an asok (admin socket) on computer06.
I tried to start mon.computer06. About two hours later, mon.computer06 had
still not started, but there are now some additional processes on computer06
that I don't know how to handle:
root  7812 1  0 11:39 pts/400:00:00 python 
/usr/sbin/ceph-create-keys -i computer06
root 11025 1 12 09:02 pts/400:32:13 /usr/bin/ceph-mon -i computer06 
--pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
root 35692  7812  0 12:59 pts/400:00:00 python /usr/bin/ceph 
--cluster=ceph --admin-daemon=/var/run/ceph/ceph-mon.computer06.asok mon_status
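
A minimal way to check what state each monitor thinks it is in, and whether
its store is still being worked on (assuming the default asok and mon data
paths; adjust for your setup), would be something like:

# run on each monitor host, substituting that host's monitor name
ceph --admin-daemon /var/run/ceph/ceph-mon.computer06.asok mon_status
# watch whether the on-disk store is still shrinking/compacting
du -sh /var/lib/ceph/mon/ceph-computer06/store.db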


I got the quorum_status from another running monitor:
{ "election_epoch": 508,
  "quorum": [
0,
1],
  "quorum_names": [
"computer05",
"computer04"],
  "quorum_leader_name": "computer04",
  "monmap": { "epoch": 4,
  "fsid": "471483e5-493f-41f6-b6f4-0187c13d156d",
  "modified": "2014-07-26 09:52:02.411967",
  "created": "0.00",
  "mons": [
{ "rank": 0,
  "name": "computer04",
  "addr": "192.168.1.60:6789\/0"},
{ "rank": 1,
  "name": "computer05",
  "addr": "192.168.1.65:6789\/0"},
{ "rank": 2,
  "name": "computer06",
  "addr": "192.168.1.66:6789\/0"}]}} 

> Date: Tue, 31 Mar 2015 12:30:22 -0700
> Subject: Re: [ceph-users] One of three monitors can not be started
> From: g...@gregs42.com
> To: zhanghaoyu1...@hotmail.com
> CC: ceph-users@lists.ceph.com
> 
> On Tue, Mar 31, 2015 at 2:50 AM, 张皓宇  wrote:
> > Who can help me?
> >
> > One monitor in my ceph cluster can not be started.
> > Before that, I added '[mon] mon_compact_on_start = true' to
> > /etc/ceph/ceph.conf on three monitor hosts. Then I did 'ceph tell
> > mon.computer05 compact ' on computer05, which has a monitor on it.
> > When store.db of computer05 changed from 108G to 1G, mon.computer06 stopped,
> > and it has not been able to start since then.
> >
> > If I start mon.computer06, it gets stuck in this state:
> > # /etc/init.d/ceph start mon.computer06
> > === mon.computer06 ===
> > Starting Ceph mon.computer06 on computer06...
> >
> > The process info is like this:
> > root 12149 3807 0 20:46 pts/27 00:00:00 /bin/sh /etc/init.d/ceph start
> > mon.computer06
> > root 12308 12149 0 20:46 pts/27 00:00:00 bash -c ulimit -n 32768;
> > /usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid
> > -c /etc/ceph/ceph.conf
> > root 12309 12308 0 20:46 pts/27 00:00:00 /usr/bin/ceph-mon -i computer06
> > --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
> > root 12313 12309 19 20:46 pts/27 00:00:01 /usr/bin/ceph-mon -i computer06
> > --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
> >
> > Log on computer06 is like this:
> > 2015-03-30 20:46:54.152956 7fc5379d07a0  0 ceph version 0.72.2
> > (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 12309
> > ...
> > 2015-03-30 20:46:54.759791 7fc5379d07a0  1 mon.computer06@-1(probing) e4
> > preinit clean up potentially inconsistent store state
> 
> So I haven't looked at this code in a while, but I think the monitor
> is trying to validate that it's consistent with the others. You
> probably want to dig around the monitor admin sockets and see what
> state each monitor is in, plus its perception of the others.
> 
> In this case, I think maybe mon.computer06 is trying to examine its
> whole store, but 100GB is a lot (way too much, in fact), so this can
> take a long time.
> 
> >
> > Sorry, my English is not good.
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >

  ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cascading Failure of OSDs

2015-03-31 Thread Francois Lafont
Hi,

Quentin Hartman wrote:

> Since I have been in ceph-land today, it reminded me that I needed to close
> the loop on this. I was finally able to isolate this problem down to a
> faulty NIC on the ceph cluster network. It "worked", but it was
> accumulating a huge number of Rx errors. My best guess is some receive
> buffer cache failed? Anyway, having a NIC go weird like that is totally
> consistent with all the weird problems I was seeing, the corrupted PGs, and
> the inability of the cluster to settle down.
> 
> As a result we've added NIC error rates to our monitoring suite on the
> cluster so we'll hopefully see this coming if it ever happens again.

Good for you. ;)

Could you post here the command that you use to get NIC error rates?
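
For example, I imagine something along these lines (eth0 is just a placeholder
interface name):

ip -s link show dev eth0        # RX/TX errors, dropped and overrun counters
ethtool -S eth0 | grep -i err   # NIC/driver-specific error statistics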

-- 
François Lafont
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Weird cluster restart behavior

2015-03-31 Thread Jeffrey Ollie
On Tue, Mar 31, 2015 at 3:05 PM, Gregory Farnum  wrote:

> On Tue, Mar 31, 2015 at 12:56 PM, Quentin Hartman
> >
> > My understanding is that the "right" method to take an entire cluster
> > offline is to set noout and then shut everything down. Is there a
> better
> > way?
>
> That's probably the best way to do it. Like I said, there was also a
> bug here that I think is fixed for Hammer but that might not have been
> backported to Giant. Unfortunately I don't remember the right keywords
> as I wasn't involved in the fix.


I'd hope that the complete shutdown scenario would get some more testing in
the future...  I know that Ceph is targeted more at "enterprise" situations
where things like generators and properly sized battery backups aren't
extravagant luxuries, but there are probably a lot of clusters out there
that will get shut down completely, planned or unplanned.

-- 
Jeff Ollie
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Weird cluster restart behavior

2015-03-31 Thread Quentin Hartman
On Tue, Mar 31, 2015 at 2:05 PM, Gregory Farnum  wrote:

> Github pull requests. :)
>

Ah, well that's easy:

https://github.com/ceph/ceph/pull/4237


QH
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Weird cluster restart behavior

2015-03-31 Thread Gregory Farnum
On Tue, Mar 31, 2015 at 12:56 PM, Quentin Hartman
 wrote:
> Thanks for the extra info Gregory. I did not also set nodown.
>
> I expect that I will be very rarely shutting everything down in the normal
> course of things, but it has come up a couple times when having to do some
> physical re-organizing of racks. Little irritants like this aren't a big
> deal if people know to expect them, but as it is I lost quite a lot of time
> troubleshooting a non-existent problem. What's the best way to get notes to
> that effect added to the docs? It seems something in
> http://ceph.com/docs/master/rados/operations/operating/ would save some
> people some headache. I'm happy to propose edits, but a quick look doesn't
> reveal a process for submitting that sort of thing.

Github pull requests. :)

>
> My understanding is that the "right" method to take an entire cluster
> offline is to set noout and then shut everything down. Is there a better
> way?

That's probably the best way to do it. Like I said, there was also a
bug here that I think is fixed for Hammer but that might not have been
backported to Giant. Unfortunately I don't remember the right keywords
as I wasn't involved in the fix.
-Greg

>
> QH
>
> On Tue, Mar 31, 2015 at 1:35 PM, Gregory Farnum  wrote:
>>
>> On Tue, Mar 31, 2015 at 7:50 AM, Quentin Hartman
>>  wrote:
>> > I'm working on redeploying a 14-node cluster. I'm running giant 0.87.1.
>> > Last
>> > friday I got everything deployed and all was working well, and I set
>> > noout
>> > and shut all the OSD nodes down over the weekend. Yesterday when I spun
>> > it
>> > back up, the OSDs were behaving very strangely, incorrectly marking each
>> > other down because of missed heartbeats, even though they were up. It looked
>> > like
>> > some kind of low-level networking problem, but I couldn't find any.
>> >
>> > After much work, I narrowed the apparent source of the problem down to
>> > the
>> > OSDs running on the first host I started in the morning. They were the
>> > ones
>> > that were logging the most messages about not being able to ping other
>> > OSDs,
>> > and the other OSDs were mostly complaining about them. After running out
>> > of
>> > other ideas to try, I restarted them, and then everything started
>> > working.
>> > It's still working happily this morning. It seems as though when that
>> > set of
>> > OSDs started they got stale OSD map information from the MON boxes,
>> > which
>> > failed to be updated as the other OSDs came up. Does that make sense? I
>> > still don't consider myself an expert on ceph architecture and would
>> > appreciate any corrections or other possible interpretations of events
>> > (I'm
>> > happy to provide whatever additional information I can) so I can get a
>> > deeper understanding of things. If my interpretation of events is
>> > correct,
>> > it seems that might point at a bug.
>>
>> I can't find the ticket now, but I think we did indeed have a bug
>> around heartbeat failures when restarting nodes. This has been fixed
>> in other branches but might have been missed for giant. (Did you by
>> any chance set the nodown flag as well as noout?)
>>
>> In general Ceph isn't very happy with being shut down completely like
>> that and its behaviors aren't validated, so nothing will go seriously
>> wrong but you might find little irritants like this. It's particularly
>> likely when you're prohibiting state changes with the noout/nodown
>> flags.
>> -Greg
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Weird cluster restart behavior

2015-03-31 Thread Quentin Hartman
Thanks for the extra info Gregory. I did not also set nodown.

I expect that I will be very rarely shutting everything down in the normal
course of things, but it has come up a couple times when having to do some
physical re-organizing of racks. Little irritants like this aren't a big
deal if people know to expect them, but as it is I lost quite a lot of time
troubleshooting a non-existent problem. What's the best way to get notes to
that effect added to the docs? It seems something in
http://ceph.com/docs/master/rados/operations/operating/ would save some
people some headache. I'm happy to propose edits, but a quick look doesn't
reveal a process for submitting that sort of thing.

My understanding is that the "right" method to take an entire cluster
offline is to set noout and then shut everything down. Is there a
better way?
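
Concretely, the sequence I have in mind is roughly this (just a sketch; the
exact service stop/start commands depend on the init system in use):

ceph osd set noout     # keep stopped OSDs from being marked out and rebalanced
# stop the ceph services and power down the OSD/MON hosts ...
# ... later, power everything back on, wait for the OSDs to come up, then:
ceph osd unset noout
ceph -s                # confirm the cluster settles back to HEALTH_OK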

QH

On Tue, Mar 31, 2015 at 1:35 PM, Gregory Farnum  wrote:

> On Tue, Mar 31, 2015 at 7:50 AM, Quentin Hartman
>  wrote:
> > I'm working on redeploying a 14-node cluster. I'm running giant 0.87.1.
> Last
> > friday I got everything deployed and all was working well, and I set
> noout
> > and shut all the OSD nodes down over the weekend. Yesterday when I spun
> it
> > back up, the OSDs were behaving very strangely, incorrectly marking each
> > other down because of missed heartbeats, even though they were up. It looked
> like
> > some kind of low-level networking problem, but I couldn't find any.
> >
> > After much work, I narrowed the apparent source of the problem down to
> the
> > OSDs running on the first host I started in the morning. They were the
> ones
> > that were logging the most messages about not being able to ping other
> OSDs,
> > and the other OSDs were mostly complaining about them. After running out
> of
> > other ideas to try, I restarted them, and then everything started
> working.
> > It's still working happily this morning. It seems as though when that
> set of
> > OSDs started they got stale OSD map information from the MON boxes, which
> > failed to be updated as the other OSDs came up. Does that make sense? I
> > still don't consider myself an expert on ceph architecture and would
> > appreciate any corrections or other possible interpretations of events
> (I'm
> > happy to provide whatever additional information I can) so I can get a
> > deeper understanding of things. If my interpretation of events is
> correct,
> > it seems that might point at a bug.
>
> I can't find the ticket now, but I think we did indeed have a bug
> around heartbeat failures when restarting nodes. This has been fixed
> in other branches but might have been missed for giant. (Did you by
> any chance set the nodown flag as well as noout?)
>
> In general Ceph isn't very happy with being shut down completely like
> that and its behaviors aren't validated, so nothing will go seriously
> wrong but you might find little irritants like this. It's particularly
> likely when you're prohibiting state changes with the noout/nodown
> flags.
> -Greg
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Force an OSD to try to peer

2015-03-31 Thread koukou73gr

On 03/31/2015 09:23 PM, Sage Weil wrote:


It's nothing specific to peering (or ceph).  The symptom we've seen is
just that bytes stop passing across a TCP connection, usually when there are
some largish messages being sent.  The ping/heartbeat messages get through
because they are small and we disable Nagle, so they never end up in large
frames.


Is there any special route one should take in order to transition a live 
cluster to use jumbo frames and avoid such pitfalls with OSD peering?


-K.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Weird cluster restart behavior

2015-03-31 Thread Gregory Farnum
On Tue, Mar 31, 2015 at 7:50 AM, Quentin Hartman
 wrote:
> I'm working on redeploying a 14-node cluster. I'm running giant 0.87.1. Last
> friday I got everything deployed and all was working well, and I set noout
> and shut all the OSD nodes down over the weekend. Yesterday when I spun it
> back up, the OSDs were behaving very strangely, incorrectly marking each
> other down because of missed heartbeats, even though they were up. It looked like
> some kind of low-level networking problem, but I couldn't find any.
>
> After much work, I narrowed the apparent source of the problem down to the
> OSDs running on the first host I started in the morning. They were the ones
> that were logging the most messages about not being able to ping other OSDs,
> and the other OSDs were mostly complaining about them. After running out of
> other ideas to try, I restarted them, and then everything started working.
> It's still working happily this morning. It seems as though when that set of
> OSDs started they got stale OSD map information from the MON boxes, which
> failed to be updated as the other OSDs came up. Does that make sense? I
> still don't consider myself an expert on ceph architecture and would
> appreciate any corrections or other possible interpretations of events (I'm
> happy to provide whatever additional information I can) so I can get a
> deeper understanding of things. If my interpretation of events is correct,
> it seems that might point at a bug.

I can't find the ticket now, but I think we did indeed have a bug
around heartbeat failures when restarting nodes. This has been fixed
in other branches but might have been missed for giant. (Did you by
any chance set the nodown flag as well as noout?)

In general Ceph isn't very happy with being shut down completely like
that and its behaviors aren't validated, so nothing will go seriously
wrong but you might find little irritants like this. It's particularly
likely when you're prohibiting state changes with the noout/nodown
flags.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] One of three monitors can not be started

2015-03-31 Thread Gregory Farnum
On Tue, Mar 31, 2015 at 2:50 AM, 张皓宇  wrote:
> Who can help me?
>
> One monitor in my ceph cluster can not be started.
> Before that, I added '[mon] mon_compact_on_start = true' to
> /etc/ceph/ceph.conf on three monitor hosts. Then I did 'ceph tell
> mon.computer05 compact ' on computer05, which has a monitor on it.
> When store.db of computer05 changed from 108G to 1G, mon.computer06 stopped,
> and it has not been able to start since then.
>
> If I start mon.computer06, it gets stuck in this state:
> # /etc/init.d/ceph start mon.computer06
> === mon.computer06 ===
> Starting Ceph mon.computer06 on computer06...
>
> The process info is like this:
> root 12149 3807 0 20:46 pts/27 00:00:00 /bin/sh /etc/init.d/ceph start
> mon.computer06
> root 12308 12149 0 20:46 pts/27 00:00:00 bash -c ulimit -n 32768;
> /usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid
> -c /etc/ceph/ceph.conf
> root 12309 12308 0 20:46 pts/27 00:00:00 /usr/bin/ceph-mon -i computer06
> --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
> root 12313 12309 19 20:46 pts/27 00:00:01 /usr/bin/ceph-mon -i computer06
> --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
>
> Log on computer06 is like this:
> 2015-03-30 20:46:54.152956 7fc5379d07a0  0 ceph version 0.72.2
> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 12309
> ...
> 2015-03-30 20:46:54.759791 7fc5379d07a0  1 mon.computer06@-1(probing) e4
> preinit clean up potentially inconsistent store state

So I haven't looked at this code in a while, but I think the monitor
is trying to validate that it's consistent with the others. You
probably want to dig around the monitor admin sockets and see what
state each monitor is in, plus its perception of the others.

In this case, I think maybe mon.computer06 is trying to examine its
whole store, but 100GB is a lot (way too much, in fact), so this can
take a long time.

>
> Sorry, my English is not good.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Force an OSD to try to peer

2015-03-31 Thread Sage Weil
On Tue, 31 Mar 2015, Somnath Roy wrote:
> But, do we know why Jumbo frames may have an impact on peering ?
> In our setup so far, we haven't enabled jumbo frames for anything other than
> performance reasons (if at all).

It's nothing specific to peering (or ceph).  The symptom we've seen is 
just that bytes stop passing across a TCP connection, usually when there are 
some largish messages being sent.  The ping/heartbeat messages get through 
because they are small and we disable Nagle, so they never end up in large 
frames.

It's a pain to diagnose.

sage


> 
> Thanks & Regards
> Somnath
> 
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Robert LeBlanc
> Sent: Tuesday, March 31, 2015 11:08 AM
> To: Sage Weil
> Cc: ceph-devel; Ceph-User
> Subject: Re: [ceph-users] Force an OSD to try to peer
> 
> I was desperate for anything after exhausting every other possibility I could 
> think of. Maybe I should put a checklist in the Ceph docs of things to look 
> for.
> 
> Thanks,
> 
> On Tue, Mar 31, 2015 at 11:36 AM, Sage Weil  wrote:
> > On Tue, 31 Mar 2015, Robert LeBlanc wrote:
> >> Turns out jumbo frames was not set on all the switch ports. Once that
> >> was resolved the cluster quickly became healthy.
> >
> > I always hesitate to point the finger at the jumbo frames
> > configuration but almost every time that is the culprit!
> >
> > Thanks for the update.  :)
> > sage
> >
> >
> >
> >>
> >> On Mon, Mar 30, 2015 at 8:15 PM, Robert LeBlanc  
> >> wrote:
> >> > I've been working at this peering problem all day. I've done a lot
> >> > of testing at the network layer and I just don't believe that we
> >> > have a problem that would prevent OSDs from peering. When looking
> >> > though osd_debug 20/20 logs, it just doesn't look like the OSDs are
> >> > trying to peer. I don't know if it is because there are so many
> >> > outstanding creations or what. OSDs will peer with OSDs on other
> >> > hosts, but for some reason only chooses a certain number and not one that it 
> >> > needs to finish the peering process.
> >> >
> >> > I've checked: firewall, open files, number of threads allowed. These
> >> > usually have given me an error in the logs that helped me fix the 
> >> > problem.
> >> >
> >> > I can't find a configuration item that specifies how many peers an
> >> > OSD should contact or anything that would be artificially limiting
> >> > the peering connections. I've restarted the OSDs a number of times,
> >> > as well as rebooting the hosts. I believe if the OSDs finish
> >> > peering everything will clear up. I can't find anything in pg query
> >> > that would help me figure out what is blocking it (peering blocked
> >> > by is empty). The PGs are scattered across all the hosts so we can't pin 
> >> > it down to a specific host.
> >> >
> >> > Any ideas on what to try would be appreciated.
> >> >
> >> > [ulhglive-root@ceph9 ~]# ceph --version ceph version 0.80.7
> >> > (6c0127fcb58008793d3c8b62d925bc91963672a3)
> >> > [ulhglive-root@ceph9 ~]# ceph status
> >> > cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
> >> >  health HEALTH_WARN 1 pgs down; 1321 pgs peering; 1321 pgs
> >> > stuck inactive; 1321 pgs stuck unclean; too few pgs per osd (17 < min 20)
> >> >  monmap e2: 3 mons at
> >> > {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.2
> >> > 9:6789/0}, election epoch 30, quorum 0,1,2 mon1,mon2,mon3
> >> >  osdmap e704: 120 osds: 120 up, 120 in
> >> >   pgmap v1895: 2048 pgs, 1 pools, 0 bytes data, 0 objects
> >> > 11447 MB used, 436 TB / 436 TB avail
> >> >  727 active+clean
> >> >  990 peering
> >> >   37 creating+peering
> >> >1 down+peering
> >> >  290 remapped+peering
> >> >3 creating+remapped+peering
> >> >
> >> > { "state": "peering",
> >> >   "epoch": 707,
> >> >   "up": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91],
> >> >   "acting": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91],
> >> >   "info": { "pgid": "7.171",
> >> >   "last_update": "0'0",
> >> >   "last_complete": "0'0",
> >> >   "log_tail": "0'0",
> >> >   "last_user_version": 0,
> >> >   "last_backfill": "MAX",
> >> >   "purged_snaps": "[]",
> >> >   "history": { "epoch_created": 293,
> >> >   "last_epoch_started": 343,
> >> >   "last_epoch_clean": 343,
> >> >   "last_epoch_split": 0,
> >> >   "same_up_since": 688,
> >> >   "same_interval_since": 688,
> >> >   "same_primary_since": 608,
> >> >   "last_scrub": "0'0",
> >> >   "last_scrub_stamp": "2015-03-30 11:11:18.872851",
> >> >   "last_deep_scrub": "0'0",
> >> >   "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
> >> >   "last_clean_scrub_stamp": "0.00"},
> >> >   "stats": { "version": "0'0",
> >> >   "r

Re: [ceph-users] Force an OSD to try to peer

2015-03-31 Thread Robert LeBlanc
At the L2 level, if the hosts and switches don't accept jumbo frames,
they just drop them because they are too big. They are not fragmented
because they don't go through a router. My problem was that OSDs were
able to peer with other OSDs on the same host, but my guess is that they
never sent/received packets larger than 1500 bytes. Then other OSD
processes tried to peer with packets larger than 1500 bytes, causing
those packets to be dropped and peering to stall.
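
A quick way to catch that kind of mismatch end to end (a generic sketch;
interface name and MTU below are only examples) is to send non-fragmentable
pings at jumbo size between the OSD hosts:

ip link show eth0 | grep mtu       # confirm the configured MTU (e.g. 9000)
ping -M do -s 8972 <peer-osd-ip>   # 8972 bytes payload + 28 bytes headers = 9000;
                                   # fails if any hop silently drops jumbo frames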

On Tue, Mar 31, 2015 at 12:10 PM, Somnath Roy  wrote:
> But, do we know why Jumbo frames may have an impact on peering ?
> In our setup so far, we haven't enabled jumbo frames for anything other than
> performance reasons (if at all).
>
> Thanks & Regards
> Somnath
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Robert LeBlanc
> Sent: Tuesday, March 31, 2015 11:08 AM
> To: Sage Weil
> Cc: ceph-devel; Ceph-User
> Subject: Re: [ceph-users] Force an OSD to try to peer
>
> I was desperate for anything after exhausting every other possibility I could 
> think of. Maybe I should put a checklist in the Ceph docs of things to look 
> for.
>
> Thanks,
>
> On Tue, Mar 31, 2015 at 11:36 AM, Sage Weil  wrote:
>> On Tue, 31 Mar 2015, Robert LeBlanc wrote:
>>> Turns out jumbo frames was not set on all the switch ports. Once that
>>> was resolved the cluster quickly became healthy.
>>
>> I always hesitate to point the finger at the jumbo frames
>> configuration but almost every time that is the culprit!
>>
>> Thanks for the update.  :)
>> sage
>>
>>
>>
>>>
>>> On Mon, Mar 30, 2015 at 8:15 PM, Robert LeBlanc  
>>> wrote:
>>> > I've been working at this peering problem all day. I've done a lot
>>> > of testing at the network layer and I just don't believe that we
>>> > have a problem that would prevent OSDs from peering. When looking
>>> > though osd_debug 20/20 logs, it just doesn't look like the OSDs are
>>> > trying to peer. I don't know if it is because there are so many
>>> > outstanding creations or what. OSDs will peer with OSDs on other
>>> > hosts, but for some reason only chooses a certain number and not one that it 
>>> > needs to finish the peering process.
>>> >
>>> > I've checked: firewall, open files, number of threads allowed. These
>>> > usually have given me an error in the logs that helped me fix the problem.
>>> >
>>> > I can't find a configuration item that specifies how many peers an
>>> > OSD should contact or anything that would be artificially limiting
>>> > the peering connections. I've restarted the OSDs a number of times,
>>> > as well as rebooting the hosts. I believe if the OSDs finish
>>> > peering everything will clear up. I can't find anything in pg query
>>> > that would help me figure out what is blocking it (peering blocked
>>> > by is empty). The PGs are scattered across all the hosts so we can't pin 
>>> > it down to a specific host.
>>> >
>>> > Any ideas on what to try would be appreciated.
>>> >
>>> > [ulhglive-root@ceph9 ~]# ceph --version ceph version 0.80.7
>>> > (6c0127fcb58008793d3c8b62d925bc91963672a3)
>>> > [ulhglive-root@ceph9 ~]# ceph status
>>> > cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
>>> >  health HEALTH_WARN 1 pgs down; 1321 pgs peering; 1321 pgs
>>> > stuck inactive; 1321 pgs stuck unclean; too few pgs per osd (17 < min 20)
>>> >  monmap e2: 3 mons at
>>> > {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.2
>>> > 9:6789/0}, election epoch 30, quorum 0,1,2 mon1,mon2,mon3
>>> >  osdmap e704: 120 osds: 120 up, 120 in
>>> >   pgmap v1895: 2048 pgs, 1 pools, 0 bytes data, 0 objects
>>> > 11447 MB used, 436 TB / 436 TB avail
>>> >  727 active+clean
>>> >  990 peering
>>> >   37 creating+peering
>>> >1 down+peering
>>> >  290 remapped+peering
>>> >3 creating+remapped+peering
>>> >
>>> > { "state": "peering",
>>> >   "epoch": 707,
>>> >   "up": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91],
>>> >   "acting": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91],
>>> >   "info": { "pgid": "7.171",
>>> >   "last_update": "0'0",
>>> >   "last_complete": "0'0",
>>> >   "log_tail": "0'0",
>>> >   "last_user_version": 0,
>>> >   "last_backfill": "MAX",
>>> >   "purged_snaps": "[]",
>>> >   "history": { "epoch_created": 293,
>>> >   "last_epoch_started": 343,
>>> >   "last_epoch_clean": 343,
>>> >   "last_epoch_split": 0,
>>> >   "same_up_since": 688,
>>> >   "same_interval_since": 688,
>>> >   "same_primary_since": 608,
>>> >   "last_scrub": "0'0",
>>> >   "last_scrub_stamp": "2015-03-30 11:11:18.872851",
>>> >   "last_deep_scrub": "0'0",
>>> >   "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
>>> >   "last_clean_scrub_stamp": "0.00"},
>>> >   "stats": { "v

Re: [ceph-users] Force an OSD to try to peer

2015-03-31 Thread Somnath Roy
But, do we know why Jumbo frames may have an impact on peering ?
In our setup so far, we haven't enabled jumbo frames for anything other than 
performance reasons (if at all).

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Robert 
LeBlanc
Sent: Tuesday, March 31, 2015 11:08 AM
To: Sage Weil
Cc: ceph-devel; Ceph-User
Subject: Re: [ceph-users] Force an OSD to try to peer

I was desperate for anything after exhausting every other possibility I could 
think of. Maybe I should put a checklist in the Ceph docs of things to look for.

Thanks,

On Tue, Mar 31, 2015 at 11:36 AM, Sage Weil  wrote:
> On Tue, 31 Mar 2015, Robert LeBlanc wrote:
>> Turns out jumbo frames was not set on all the switch ports. Once that
>> was resolved the cluster quickly became healthy.
>
> I always hesitate to point the finger at the jumbo frames
> configuration but almost every time that is the culprit!
>
> Thanks for the update.  :)
> sage
>
>
>
>>
>> On Mon, Mar 30, 2015 at 8:15 PM, Robert LeBlanc  wrote:
>> > I've been working at this peering problem all day. I've done a lot
>> > of testing at the network layer and I just don't believe that we
>> > have a problem that would prevent OSDs from peering. When looking
>> > though osd_debug 20/20 logs, it just doesn't look like the OSDs are
>> > trying to peer. I don't know if it is because there are so many
>> > outstanding creations or what. OSDs will peer with OSDs on other
>> > hosts, but for some reason only chooses a certain number and not one that it 
>> > needs to finish the peering process.
>> >
>> > I've checked: firewall, open files, number of threads allowed. These
>> > usually have given me an error in the logs that helped me fix the problem.
>> >
>> > I can't find a configuration item that specifies how many peers an
>> > OSD should contact or anything that would be artificially limiting
>> > the peering connections. I've restarted the OSDs a number of times,
>> > as well as rebooting the hosts. I believe if the OSDs finish
>> > peering everything will clear up. I can't find anything in pg query
>> > that would help me figure out what is blocking it (peering blocked
>> > by is empty). The PGs are scattered across all the hosts so we can't pin 
>> > it down to a specific host.
>> >
>> > Any ideas on what to try would be appreciated.
>> >
>> > [ulhglive-root@ceph9 ~]# ceph --version ceph version 0.80.7
>> > (6c0127fcb58008793d3c8b62d925bc91963672a3)
>> > [ulhglive-root@ceph9 ~]# ceph status
>> > cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
>> >  health HEALTH_WARN 1 pgs down; 1321 pgs peering; 1321 pgs
>> > stuck inactive; 1321 pgs stuck unclean; too few pgs per osd (17 < min 20)
>> >  monmap e2: 3 mons at
>> > {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.2
>> > 9:6789/0}, election epoch 30, quorum 0,1,2 mon1,mon2,mon3
>> >  osdmap e704: 120 osds: 120 up, 120 in
>> >   pgmap v1895: 2048 pgs, 1 pools, 0 bytes data, 0 objects
>> > 11447 MB used, 436 TB / 436 TB avail
>> >  727 active+clean
>> >  990 peering
>> >   37 creating+peering
>> >1 down+peering
>> >  290 remapped+peering
>> >3 creating+remapped+peering
>> >
>> > { "state": "peering",
>> >   "epoch": 707,
>> >   "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> >   "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> >   "info": { "pgid": "7.171",
>> >   "last_update": "0'0",
>> >   "last_complete": "0'0",
>> >   "log_tail": "0'0",
>> >   "last_user_version": 0,
>> >   "last_backfill": "MAX",
>> >   "purged_snaps": "[]",
>> >   "history": { "epoch_created": 293,
>> >   "last_epoch_started": 343,
>> >   "last_epoch_clean": 343,
>> >   "last_epoch_split": 0,
>> >   "same_up_since": 688,
>> >   "same_interval_since": 688,
>> >   "same_primary_since": 608,
>> >   "last_scrub": "0'0",
>> >   "last_scrub_stamp": "2015-03-30 11:11:18.872851",
>> >   "last_deep_scrub": "0'0",
>> >   "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
>> >   "last_clean_scrub_stamp": "0.00"},
>> >   "stats": { "version": "0'0",
>> >   "reported_seq": "326",
>> >   "reported_epoch": "707",
>> >   "state": "peering",
>> >   "last_fresh": "2015-03-30 20:10:39.509855",
>> >   "last_change": "2015-03-30 19:44:17.361601",
>> >   "last_active": "2015-03-30 11:37:56.956417",
>> >   "last_clean": "2015-03-30 11:37:56.956417",
>> >   "last_became_active": "0.00",
>> >   "last_unstale": "2015-03-30 20:10:39.509855",
>> >   "mapping_epoch": 683,
>> >   "log_start": "0'0",
>> >   "ondisk_log_start": "0'0",
>> >   "created": 293,
>> >   "last_epoch_cle

Re: [ceph-users] Force an OSD to try to peer

2015-03-31 Thread Robert LeBlanc
I was desperate for anything after exhausting every other possibility
I could think of. Maybe I should put a checklist in the Ceph docs of
things to look for.

Thanks,

On Tue, Mar 31, 2015 at 11:36 AM, Sage Weil  wrote:
> On Tue, 31 Mar 2015, Robert LeBlanc wrote:
>> Turns out jumbo frames was not set on all the switch ports. Once that
>> was resolved the cluster quickly became healthy.
>
> I always hesitate to point the finger at the jumbo frames configuration
> but almost every time that is the culprit!
>
> Thanks for the update.  :)
> sage
>
>
>
>>
>> On Mon, Mar 30, 2015 at 8:15 PM, Robert LeBlanc  wrote:
>> > I've been working at this peering problem all day. I've done a lot of
>> > testing at the network layer and I just don't believe that we have a 
>> > problem
>> > that would prevent OSDs from peering. When looking though osd_debug 20/20
>> > logs, it just doesn't look like the OSDs are trying to peer. I don't know 
>> > if
>> > it is because there are so many outstanding creations or what. OSDs will
>> > peer with OSDs on other hosts, but for some reason only chooses a certain number
>> > and not one that it needs to finish the peering process.
>> >
>> > I've checked: firewall, open files, number of threads allowed. These usually
>> > have given me an error in the logs that helped me fix the problem.
>> >
>> > I can't find a configuration item that specifies how many peers an OSD
>> > should contact or anything that would be artificially limiting the peering
>> > connections. I've restarted the OSDs a number of times, as well as 
>> > rebooting
>> > the hosts. I believe if the OSDs finish peering everything will clear up. I
>> > can't find anything in pg query that would help me figure out what is
>> > blocking it (peering blocked by is empty). The PGs are scattered across all
>> > the hosts so we can't pin it down to a specific host.
>> >
>> > Any ideas on what to try would be appreciated.
>> >
>> > [ulhglive-root@ceph9 ~]# ceph --version
>> > ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>> > [ulhglive-root@ceph9 ~]# ceph status
>> > cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
>> >  health HEALTH_WARN 1 pgs down; 1321 pgs peering; 1321 pgs stuck
>> > inactive; 1321 pgs stuck unclean; too few pgs per osd (17 < min 20)
>> >  monmap e2: 3 mons at
>> > {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.29:6789/0},
>> > election epoch 30, quorum 0,1,2 mon1,mon2,mon3
>> >  osdmap e704: 120 osds: 120 up, 120 in
>> >   pgmap v1895: 2048 pgs, 1 pools, 0 bytes data, 0 objects
>> > 11447 MB used, 436 TB / 436 TB avail
>> >  727 active+clean
>> >  990 peering
>> >   37 creating+peering
>> >1 down+peering
>> >  290 remapped+peering
>> >3 creating+remapped+peering
>> >
>> > { "state": "peering",
>> >   "epoch": 707,
>> >   "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> >   "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> >   "info": { "pgid": "7.171",
>> >   "last_update": "0'0",
>> >   "last_complete": "0'0",
>> >   "log_tail": "0'0",
>> >   "last_user_version": 0,
>> >   "last_backfill": "MAX",
>> >   "purged_snaps": "[]",
>> >   "history": { "epoch_created": 293,
>> >   "last_epoch_started": 343,
>> >   "last_epoch_clean": 343,
>> >   "last_epoch_split": 0,
>> >   "same_up_since": 688,
>> >   "same_interval_since": 688,
>> >   "same_primary_since": 608,
>> >   "last_scrub": "0'0",
>> >   "last_scrub_stamp": "2015-03-30 11:11:18.872851",
>> >   "last_deep_scrub": "0'0",
>> >   "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
>> >   "last_clean_scrub_stamp": "0.00"},
>> >   "stats": { "version": "0'0",
>> >   "reported_seq": "326",
>> >   "reported_epoch": "707",
>> >   "state": "peering",
>> >   "last_fresh": "2015-03-30 20:10:39.509855",
>> >   "last_change": "2015-03-30 19:44:17.361601",
>> >   "last_active": "2015-03-30 11:37:56.956417",
>> >   "last_clean": "2015-03-30 11:37:56.956417",
>> >   "last_became_active": "0.00",
>> >   "last_unstale": "2015-03-30 20:10:39.509855",
>> >   "mapping_epoch": 683,
>> >   "log_start": "0'0",
>> >   "ondisk_log_start": "0'0",
>> >   "created": 293,
>> >   "last_epoch_clean": 343,
>> >   "parent": "0.0",
>> >   "parent_split_bits": 0,
>> >   "last_scrub": "0'0",
>> >   "last_scrub_stamp": "2015-03-30 11:11:18.872851",
>> >   "last_deep_scrub": "0'0",
>> >   "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
>> >   "last_clean_scrub_stamp": "0.00",
>> >   "log_size": 0,
>> >   "ondisk_log_size": 0,
>> >   "stats_

Re: [ceph-users] Force an OSD to try to peer

2015-03-31 Thread Sage Weil
On Tue, 31 Mar 2015, Robert LeBlanc wrote:
> Turns out jumbo frames was not set on all the switch ports. Once that
> was resolved the cluster quickly became healthy.

I always hesitate to point the finger at the jumbo frames configuration 
but almost every time that is the culprit!

Thanks for the update.  :)
sage



> 
> On Mon, Mar 30, 2015 at 8:15 PM, Robert LeBlanc  wrote:
> > I've been working at this peering problem all day. I've done a lot of
> > testing at the network layer and I just don't believe that we have a problem
> > that would prevent OSDs from peering. When looking though osd_debug 20/20
> > logs, it just doesn't look like the OSDs are trying to peer. I don't know if
> > it is because there are so many outstanding creations or what. OSDs will
> > peer with OSDs on other hosts, but for some reason only chooses a certain number
> > and not one that it needs to finish the peering process.
> >
> > I've checked: firewall, open files, number of threads allowed. These usually
> > have given me an error in the logs that helped me fix the problem.
> >
> > I can't find a configuration item that specifies how many peers an OSD
> > should contact or anything that would be artificially limiting the peering
> > connections. I've restarted the OSDs a number of times, as well as rebooting
> > the hosts. I believe if the OSDs finish peering everything will clear up. I
> > can't find anything in pg query that would help me figure out what is
> > blocking it (peering blocked by is empty). The PGs are scattered across all
> > the hosts so we can't pin it down to a specific host.
> >
> > Any ideas on what to try would be appreciated.
> >
> > [ulhglive-root@ceph9 ~]# ceph --version
> > ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
> > [ulhglive-root@ceph9 ~]# ceph status
> > cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
> >  health HEALTH_WARN 1 pgs down; 1321 pgs peering; 1321 pgs stuck
> > inactive; 1321 pgs stuck unclean; too few pgs per osd (17 < min 20)
> >  monmap e2: 3 mons at
> > {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.29:6789/0},
> > election epoch 30, quorum 0,1,2 mon1,mon2,mon3
> >  osdmap e704: 120 osds: 120 up, 120 in
> >   pgmap v1895: 2048 pgs, 1 pools, 0 bytes data, 0 objects
> > 11447 MB used, 436 TB / 436 TB avail
> >  727 active+clean
> >  990 peering
> >   37 creating+peering
> >1 down+peering
> >  290 remapped+peering
> >3 creating+remapped+peering
> >
> > { "state": "peering",
> >   "epoch": 707,
> >   "up": [
> > 40,
> > 92,
> > 48,
> > 91],
> >   "acting": [
> > 40,
> > 92,
> > 48,
> > 91],
> >   "info": { "pgid": "7.171",
> >   "last_update": "0'0",
> >   "last_complete": "0'0",
> >   "log_tail": "0'0",
> >   "last_user_version": 0,
> >   "last_backfill": "MAX",
> >   "purged_snaps": "[]",
> >   "history": { "epoch_created": 293,
> >   "last_epoch_started": 343,
> >   "last_epoch_clean": 343,
> >   "last_epoch_split": 0,
> >   "same_up_since": 688,
> >   "same_interval_since": 688,
> >   "same_primary_since": 608,
> >   "last_scrub": "0'0",
> >   "last_scrub_stamp": "2015-03-30 11:11:18.872851",
> >   "last_deep_scrub": "0'0",
> >   "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
> >   "last_clean_scrub_stamp": "0.00"},
> >   "stats": { "version": "0'0",
> >   "reported_seq": "326",
> >   "reported_epoch": "707",
> >   "state": "peering",
> >   "last_fresh": "2015-03-30 20:10:39.509855",
> >   "last_change": "2015-03-30 19:44:17.361601",
> >   "last_active": "2015-03-30 11:37:56.956417",
> >   "last_clean": "2015-03-30 11:37:56.956417",
> >   "last_became_active": "0.00",
> >   "last_unstale": "2015-03-30 20:10:39.509855",
> >   "mapping_epoch": 683,
> >   "log_start": "0'0",
> >   "ondisk_log_start": "0'0",
> >   "created": 293,
> >   "last_epoch_clean": 343,
> >   "parent": "0.0",
> >   "parent_split_bits": 0,
> >   "last_scrub": "0'0",
> >   "last_scrub_stamp": "2015-03-30 11:11:18.872851",
> >   "last_deep_scrub": "0'0",
> >   "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
> >   "last_clean_scrub_stamp": "0.00",
> >   "log_size": 0,
> >   "ondisk_log_size": 0,
> >   "stats_invalid": "0",
> >   "stat_sum": { "num_bytes": 0,
> >   "num_objects": 0,
> >   "num_object_clones": 0,
> >   "num_object_copies": 0,
> >   "num_objects_missing_on_primary": 0,
> >   "num_objects_degraded": 0,
> >   "num_objects_unfound": 0,
> >   "num_object

Re: [ceph-users] Force an OSD to try to peer

2015-03-31 Thread Robert LeBlanc
Turns out jumbo frames was not set on all the switch ports. Once that
was resolved the cluster quickly became healthy.

On Mon, Mar 30, 2015 at 8:15 PM, Robert LeBlanc  wrote:
> I've been working at this peering problem all day. I've done a lot of
> testing at the network layer and I just don't believe that we have a problem
> that would prevent OSDs from peering. When looking though osd_debug 20/20
> logs, it just doesn't look like the OSDs are trying to peer. I don't know if
> it is because there are so many outstanding creations or what. OSDs will
> peer with OSDs on other hosts, but for some reason only chooses a certain number
> and not one that it needs to finish the peering process.
>
> I've checked: firewall, open files, number of threads allowed. These usually
> have given me an error in the logs that helped me fix the problem.
>
> I can't find a configuration item that specifies how many peers an OSD
> should contact or anything that would be artificially limiting the peering
> connections. I've restarted the OSDs a number of times, as well as rebooting
> the hosts. I believe if the OSDs finish peering everything will clear up. I
> can't find anything in pg query that would help me figure out what is
> blocking it (peering blocked by is empty). The PGs are scattered across all
> the hosts so we can't pin it down to a specific host.
>
> Any ideas on what to try would be appreciated.
>
> [ulhglive-root@ceph9 ~]# ceph --version
> ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
> [ulhglive-root@ceph9 ~]# ceph status
> cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
>  health HEALTH_WARN 1 pgs down; 1321 pgs peering; 1321 pgs stuck
> inactive; 1321 pgs stuck unclean; too few pgs per osd (17 < min 20)
>  monmap e2: 3 mons at
> {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.29:6789/0},
> election epoch 30, quorum 0,1,2 mon1,mon2,mon3
>  osdmap e704: 120 osds: 120 up, 120 in
>   pgmap v1895: 2048 pgs, 1 pools, 0 bytes data, 0 objects
> 11447 MB used, 436 TB / 436 TB avail
>  727 active+clean
>  990 peering
>   37 creating+peering
>1 down+peering
>  290 remapped+peering
>3 creating+remapped+peering
>
> { "state": "peering",
>   "epoch": 707,
>   "up": [
> 40,
> 92,
> 48,
> 91],
>   "acting": [
> 40,
> 92,
> 48,
> 91],
>   "info": { "pgid": "7.171",
>   "last_update": "0'0",
>   "last_complete": "0'0",
>   "log_tail": "0'0",
>   "last_user_version": 0,
>   "last_backfill": "MAX",
>   "purged_snaps": "[]",
>   "history": { "epoch_created": 293,
>   "last_epoch_started": 343,
>   "last_epoch_clean": 343,
>   "last_epoch_split": 0,
>   "same_up_since": 688,
>   "same_interval_since": 688,
>   "same_primary_since": 608,
>   "last_scrub": "0'0",
>   "last_scrub_stamp": "2015-03-30 11:11:18.872851",
>   "last_deep_scrub": "0'0",
>   "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
>   "last_clean_scrub_stamp": "0.00"},
>   "stats": { "version": "0'0",
>   "reported_seq": "326",
>   "reported_epoch": "707",
>   "state": "peering",
>   "last_fresh": "2015-03-30 20:10:39.509855",
>   "last_change": "2015-03-30 19:44:17.361601",
>   "last_active": "2015-03-30 11:37:56.956417",
>   "last_clean": "2015-03-30 11:37:56.956417",
>   "last_became_active": "0.00",
>   "last_unstale": "2015-03-30 20:10:39.509855",
>   "mapping_epoch": 683,
>   "log_start": "0'0",
>   "ondisk_log_start": "0'0",
>   "created": 293,
>   "last_epoch_clean": 343,
>   "parent": "0.0",
>   "parent_split_bits": 0,
>   "last_scrub": "0'0",
>   "last_scrub_stamp": "2015-03-30 11:11:18.872851",
>   "last_deep_scrub": "0'0",
>   "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
>   "last_clean_scrub_stamp": "0.00",
>   "log_size": 0,
>   "ondisk_log_size": 0,
>   "stats_invalid": "0",
>   "stat_sum": { "num_bytes": 0,
>   "num_objects": 0,
>   "num_object_clones": 0,
>   "num_object_copies": 0,
>   "num_objects_missing_on_primary": 0,
>   "num_objects_degraded": 0,
>   "num_objects_unfound": 0,
>   "num_objects_dirty": 0,
>   "num_whiteouts": 0,
>   "num_read": 0,
>   "num_read_kb": 0,
>   "num_write": 0,
>   "num_write_kb": 0,
>   "num_scrub_errors": 0,
>   "num_shallow_scrub_errors": 0,
>   "num_deep_scrub_errors": 0,
>   "num_objects_recovered": 0,
>   "num_bytes_recovered": 0,
>  

Re: [ceph-users] SSD Hardware recommendation

2015-03-31 Thread Adam Tygart
Speaking of SSD IOPs. Running the same tests on my SSDs (LiteOn
ECT-480N9S 480GB SSDs):
The lines at the bottom are a single 6TB spinning disk for comparison's sake.

http://imgur.com/a/fD0Mh

Based on these numbers, there is a minimum latency per operation, but
multiple operations can be performed simultaneously. The sweet spot
for my SSDs is ~8 journals per SSD to maximize IOPs on a per journal
basis. Unfortunately, at 8 journals, the overall IOPs is much less
than the stated IOPs for the SSD. (~5000 vs 9000 IOPs). Better than
spinning disks, but not what I was expecting.

The spreadsheet is available here:
https://people.beocat.cis.ksu.edu/~mozes/hobbit-ssd-vs-std-iops.ods

--
Adam

On Tue, Mar 31, 2015 at 7:09 AM, f...@univ-lr.fr  wrote:
> Hi,
>
> in our quest to get the right SSD for OSD journals, I managed to benchmark
> two kind of "10 DWPD" SSDs :
> - Toshiba M2 PX02SMF020
> - Samsung 845DC PRO
>
> I want to determine if a disk is appropriate considering its absolute
> performances, and the optimal number of ceph-osd processes using the SSD as
> a journal.
> The benchmark consists of a fio command, with SYNC and DIRECT access
> options, and 4k blocks write accesses.
>
> fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k --runtime=60
> --time_based --group_reporting --name=journal-test --iodepth=<1 or 16>
> --numjobs=< ranging from 1 to 16>
>
> I think numjobs can represent the number of concurrent OSDs served by this
> SSD. Am I right about this?
>
>
> http://www.4shared.com/download/WOvooKVXce/Fio-Direct-Sync-ToshibaM2-Sams.png?lgfp=3000
>
> My understanding of that data is that the 845DC Pro cannot be used for more
> than 4 OSDs.
> The M2 is very consistent in its behavior.
> The iodepth has almost no impact on perfs here.
>
> Could someone having other SSD types make the same test to consolidate the
> data ?
>
> Among the short list that could be considered for that task (for their
> price/perfs/DWPD/...) :
> - Seagate 1200 SSD 200GB, SAS 12Gb/s ST200FM0053
> - Hitachi SSD800MM MLC HUSMM8020ASS200
> - Intel DC3700
>
> I've not yet considered the write amplification mentioned in other posts.
>
> Frederic
>
> Josef Johansson  a écrit le 20/03/15 10:29 :
>
>
> The 845DC Pro does look really nice, comparable with s3700 with TDW even.
> The price is what really does it, as it’s almost a third compared with
> s3700..
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Journaling

2015-03-31 Thread Garg, Pankaj
Hi Mark,

Yes, my reads are consistently slower. I have tested both random and sequential 
workloads and various block sizes.

Thanks
Pankaj

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mark 
Nelson
Sent: Monday, March 30, 2015 1:07 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] SSD Journaling

On 03/30/2015 03:01 PM, Garg, Pankaj wrote:
> Hi,
>
> I'm benchmarking my small cluster with HDDs vs HDDs with SSD Journaling.
> I am using both RADOS bench and Block device (using fio) for testing.
>
> I am seeing significant Write performance improvements, as expected. I 
> am however seeing the Reads coming out a bit slower on the SSD 
> Journaling side. They are not terribly different, but sometimes 10% slower.
>
> Is that something other folks have also seen, or do I need some 
> settings to be tuned properly? I'm wondering if accessing 2 drives for 
> reads adds latency and hence the throughput suffers.

Hi,

What kind of reads are you seeing the degradation with?  Is it consistent with 
different sizes and random/seq?  Any interesting spikes or valleys during the 
tests?

>
> Thanks
>
> Pankaj
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Weird cluster restart behavior

2015-03-31 Thread Quentin Hartman
I'm working on redeploying a 14-node cluster. I'm running giant 0.87.1.
Last friday I got everything deployed and all was working well, and I set
noout and shut all the OSD nodes down over the weekend. Yesterday when I
spun it back up, the OSDs were behaving very strangely, incorrectly marking
each other down because of missed heartbeats, even though they were up. It
looked like some kind of low-level networking problem, but I couldn't find
any.

After much work, I narrowed the apparent source of the problem down to the
OSDs running on the first host I started in the morning. They were the ones
that were logging the most messages about not being able to ping other OSDs,
and the other OSDs were mostly complaining about them. After running out of
other ideas to try, I restarted them, and then everything started working.
It's still working happily this morning. It seems as though when that set
of OSDs started they got stale OSD map information from the MON boxes,
which failed to be updated as the other OSDs came up. Does that make sense?
I still don't consider myself an expert on ceph architecture and would
appreciate any corrections or other possible interpretations of events (I'm
happy to provide whatever additional information I can) so I can get a
deeper understanding of things. If my interpretation of events is correct,
it seems that might point at a bug.

QH
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Creating and deploying OSDs in parallel

2015-03-31 Thread Dan van der Ster
Hi Somnath,
We have deployed many machines in parallel and it generally works.
Keep in mind that if you deploy very many (>1000) at once, this will
create so many osdmap incrementals, so quickly, that the memory usage
on the OSDs will increase substantially (until you reboot).
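
For example, something along these lines from the admin node generally works
(hostnames and device paths below are placeholders):

for host in osd01 osd02 osd03; do
  # data disk : journal device, same syntax as a single ceph-deploy run
  ceph-deploy osd create ${host}:/dev/sdb:/dev/sdc &
done
wait   # let all the background ceph-deploy runs finish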
Best Regards, Dan

On Mon, Mar 30, 2015 at 5:29 PM, Somnath Roy  wrote:
> Hi,
>
> I am planning to modify our deployment script so that it can create and
> deploy multiple OSDs in parallel to the same host as well as on different
> hosts.
>
> Just wanted to check if there is any problem to run say ‘ceph-deploy osd
> create’ etc. in parallel while deploying cluster.
>
>
>
> Thanks & Regards
>
> Somnath
>
>
> 
>
> PLEASE NOTE: The information contained in this electronic mail message is
> intended only for the use of the designated recipient(s) named above. If the
> reader of this message is not the intended recipient, you are hereby
> notified that you have received this message in error and that any review,
> dissemination, distribution, or copying of this message is strictly
> prohibited. If you have received this communication in error, please notify
> the sender by telephone or e-mail (as shown above) immediately and destroy
> any and all copies of this message in your possession (whether hard copies
> or electronically stored copies).
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Hardware recommendation

2015-03-31 Thread f...@univ-lr.fr

Hi,

in our quest to get the right SSD for OSD journals, I managed to 
benchmark two kind of "10 DWPD" SSDs :

- Toshiba M2 PX02SMF020
- Samsung 845DC PRO

I want to determine if a disk is appropriate considering its absolute 
performances, and the optimal number of ceph-osd processes using the SSD 
as a journal.
The benchmark consists of a fio command, with SYNC and DIRECT access 
options, and 4k blocks write accesses.


fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k 
--runtime=60 --time_based --group_reporting --name=journal-test 
--iodepth=<1 or 16> --numjobs=< ranging from 1 to 16>
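
For example, the numjobs sweep can be scripted roughly like this (same
parameters as above; note the test writes directly to the raw device and
destroys its contents):

for jobs in 1 2 4 8 16; do
  fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k \
      --runtime=60 --time_based --group_reporting --name=journal-test \
      --iodepth=1 --numjobs=${jobs}
done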


I think numjobs can represent the number of concurrent OSDs served by 
this SSD. Am I right about this?


   
http://www.4shared.com/download/WOvooKVXce/Fio-Direct-Sync-ToshibaM2-Sams.png?lgfp=3000


My understanding of that data is that the 845DC Pro cannot be used for 
more than 4 OSDs.

The M2 is very consistent in its behavior.
The iodepth has almost no impact on perfs here.

Could someone having other SSD types make the same test to consolidate 
the data ?


Among the short list that could be considered for that task (for their 
price/perfs/DWPD/...) :

- Seagate 1200 SSD 200GB, SAS 12Gb/s ST200FM0053
- Hitachi SSD800MM MLC HUSMM8020ASS200
- Intel DC3700

I've not yet considered the write amplification mentioned in other posts.

Frederic

Josef Johansson  a écrit le 20/03/15 10:29 :



The 845DC Pro does look really nice, comparable with s3700 with TDW even.
The price is what really does it, as it’s almost a third compared with s3700..

  


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Radosgw multi-region user creation question

2015-03-31 Thread Abhishek L
Hi

I'm trying to set up a POC multi-region radosgw configuration (with
different ceph clusters). Following the official docs[1], the part about
creating the zone system users was not very clear. Take an example
configuration of 2 regions, US (master zone us-dc1) and EU (master zone
eu-dc1), with the secondary zones of the other region also created in
each region.

If I create the zone users separately in the 2 regions, i.e. a us-dc1
zone user & an eu-dc1 zone user, the metadata sync does occur, but if I
try to create a bucket with the location set to the secondary region, it
fails with a 403 Access Denied, as the system user of the secondary
region is unknown to the master region. I was able to bypass this by
creating a system user for the secondary zone of the secondary region in
the master region (i.e. creating a system user for the EU secondary zone
in the US region) and then recreating the user in the secondary region
by passing the --access & --secret-key parameters to recreate the same
user with the same keys. This seemed to work, but I'm not sure whether
this is the right way to proceed, as the docs do not mention a step like
this.
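
Roughly, the workaround looked like the following (uid, display name and keys
below are placeholders):

# in the master (US) region: create a system user for the EU secondary zone
radosgw-admin user create --uid=eu-system --display-name="EU system user" \
  --access-key=EUACCESSKEY --secret=EUSECRETKEY --system

# in the secondary (EU) region: recreate the same user with the same keys
radosgw-admin user create --uid=eu-system --display-name="EU system user" \
  --access-key=EUACCESSKEY --secret=EUSECRETKEY --system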


[1] 
http://ceph.com/docs/master/radosgw/federated-config/#configure-a-secondary-region

-- 
Abhishek


signature.asc
Description: PGP signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw authorization failed

2015-03-31 Thread Neville

 
> Date: Mon, 30 Mar 2015 12:17:48 -0400
> From: yeh...@redhat.com
> To: neville.tay...@hotmail.co.uk
> CC: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Radosgw authorization failed
> 
> 
> 
> - Original Message -
> > From: "Neville" 
> > To: "Yehuda Sadeh-Weinraub" 
> > Cc: ceph-users@lists.ceph.com
> > Sent: Monday, March 30, 2015 6:49:29 AM
> > Subject: Re: [ceph-users] Radosgw authorization failed
> > 
> > 
> > > Date: Wed, 25 Mar 2015 11:43:44 -0400
> > > From: yeh...@redhat.com
> > > To: neville.tay...@hotmail.co.uk
> > > CC: ceph-users@lists.ceph.com
> > > Subject: Re: [ceph-users] Radosgw authorization failed
> > > 
> > > 
> > > 
> > > - Original Message -
> > > > From: "Neville" 
> > > > To: ceph-users@lists.ceph.com
> > > > Sent: Wednesday, March 25, 2015 8:16:39 AM
> > > > Subject: [ceph-users] Radosgw authorization failed
> > > > 
> > > > Hi all,
> > > > 
> > > > I'm testing a backup product which supports Amazon S3 as a target for
> > > > Archive storage, and I'm trying to set up a Ceph cluster configured
> > > > with the S3 API to use as an internal target for backup archives
> > > > instead of AWS.
> > > > 
> > > > I've followed the online guide for setting up Radosgw and created a
> > > > default region and zone based on the AWS naming convention US-East-1.
> > > > I'm not sure if this is relevant, but since I was having issues I
> > > > thought it might need to be the same.
> > > > 
> > > > I've tested the radosgw using boto.s3 and it seems to work OK, i.e. I
> > > > can create a bucket, create a folder, list buckets etc. The problem is
> > > > that when the backup software tries to create an object I get an
> > > > authorization failure. It's using the same user/access/secret as I'm
> > > > using from boto.s3, and I'm sure the creds are right as it lets me
> > > > create the initial connection; it just fails when trying to create an
> > > > object (backup folder).
> > > > 
> > > > Here's the extract from the radosgw log:
> > > > 
> > > > -
> > > > 2015-03-25 15:07:26.449227 7f1050dc7700 2 req 5:0.000419:s3:GET
> > > > /:list_bucket:init op
> > > > 2015-03-25 15:07:26.449232 7f1050dc7700 2 req 5:0.000424:s3:GET
> > > > /:list_bucket:verifying op mask
> > > > 2015-03-25 15:07:26.449234 7f1050dc7700 20 required_mask= 1
> > > > user.op_mask=7
> > > > 2015-03-25 15:07:26.449235 7f1050dc7700 2 req 5:0.000427:s3:GET
> > > > /:list_bucket:verifying op permissions
> > > > 2015-03-25 15:07:26.449237 7f1050dc7700 5 Searching permissions for
> > > > uid=test
> > > > mask=49
> > > > 2015-03-25 15:07:26.449238 7f1050dc7700 5 Found permission: 15
> > > > 2015-03-25 15:07:26.449239 7f1050dc7700 5 Searching permissions for
> > > > group=1
> > > > mask=49
> > > > 2015-03-25 15:07:26.449240 7f1050dc7700 5 Found permission: 15
> > > > 2015-03-25 15:07:26.449241 7f1050dc7700 5 Searching permissions for
> > > > group=2
> > > > mask=49
> > > > 2015-03-25 15:07:26.449242 7f1050dc7700 5 Found permission: 15
> > > > 2015-03-25 15:07:26.449243 7f1050dc7700 5 Getting permissions id=test
> > > > owner=test perm=1
> > > > 2015-03-25 15:07:26.449244 7f1050dc7700 10 uid=test requested perm
> > > > (type)=1,
> > > > policy perm=1, user_perm_mask=1, acl perm=1
> > > > 2015-03-25 15:07:26.449245 7f1050dc7700 2 req 5:0.000437:s3:GET
> > > > /:list_bucket:verifying op params
> > > > 2015-03-25 15:07:26.449247 7f1050dc7700 2 req 5:0.000439:s3:GET
> > > > /:list_bucket:executing
> > > > 2015-03-25 15:07:26.449252 7f1050dc7700 10 cls_bucket_list
> > > > test1(@{i=.us-east.rgw.buckets.index}.us-east.rgw.buckets[us-east.280959.2])
> > > > start num 1001
> > > > 2015-03-25 15:07:26.450828 7f1050dc7700 2 req 5:0.002020:s3:GET
> > > > /:list_bucket:http status=200
> > > > 2015-03-25 15:07:26.450832 7f1050dc7700 1 == req done
> > > > req=0x7f107000e2e0
> > > > http_status=200 ==
> > > > 2015-03-25 15:07:26.516999 7f1069df9700 20 enqueued request
> > > > req=0x7f107000f0e0
> > > > 2015-03-25 15:07:26.517006 7f1069df9700 20 RGWWQ:
> > > > 2015-03-25 15:07:26.517007 7f1069df9700 20 req: 0x7f107000f0e0
> > > > 2015-03-25 15:07:26.517010 7f1069df9700 10 allocated request
> > > > req=0x7f107000f6b0
> > > > 2015-03-25 15:07:26.517021 7f1058dd7700 20 dequeued request
> > > > req=0x7f107000f0e0
> > > > 2015-03-25 15:07:26.517023 7f1058dd7700 20 RGWWQ: empty
> > > > 2015-03-25 15:07:26.517081 7f1058dd7700 20 CONTENT_LENGTH=88
> > > > 2015-03-25 15:07:26.517084 7f1058dd7700 20
> > > > CONTENT_TYPE=application/octet-stream
> > > > 2015-03-25 15:07:26.517085 7f1058dd7700 20 
> > > > CONTEXT_DOCUMENT_ROOT=/var/www
> > > > 2015-03-25 15:07:26.517086 7f1058dd7700 20 CONTEXT_PREFIX=
> > > > 2015-03-25 15:07:26.517087 7f1058dd7700 20 DOCUMENT_ROOT=/var/www
> > > > 2015-03-25 15:07:26.517088 7f1058dd7700 20 FCGI_ROLE=RESPONDER
> > > > 

Re: [ceph-users] Cannot add OSD node into crushmap or all writes fail

2015-03-31 Thread Henrik Korkuc

Check firewall rules and network connectivity.
Can all nodes and clients reach each other? Can you telnet to the OSD ports 
(note that multiple OSDs may listen on different ports)?
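
For example, something along these lines (the address and port below are 
made-up examples; use the ones your OSDs actually report):

# list the address:port each OSD is currently bound to
ceph osd dump | grep '^osd\.'
# then probe one of those ports from a client or another node
nc -zv 10.0.0.11 6800      # or: telnet 10.0.0.11 6800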


On 3/31/15 8:44, Tyler Bishop wrote:
I have a ceph node that recovers correctly into my ceph pool, and 
performance looks normal for the rbd clients.  However, a few minutes 
after recovery finishes, the rbd clients begin to fall over and cannot 
write data to the pool.


I've been trying to figure this out for weeks! None of the logs 
contain anything relevant at all.


If I disable the node in the crushmap the rbd clients immediately 
begin writing to the other nodes.


Ideas?





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] One of three monitors can not be started

2015-03-31 Thread 张皓宇
Who can help me? 

One monitor in my ceph cluster can not be started. 
Before that, I added '[mon] mon_compact_on_start = true' to /etc/ceph/ceph.conf 
on three monitor hosts. Then I did 'ceph tell mon.computer05 compact ' on 
computer05, which has a monitor on it. 
When the store.db of computer05 shrank from 108G to 1G, mon.computer06 stopped, 
and it cannot be started since then. 

If I start mon.computer06, it will stop on this state:
# /etc/init.d/ceph start mon.computer06

=== mon.computer06 ===

Starting Ceph mon.computer06 on computer06...

The process info is like this:
root 12149  3807  0 20:46 pts/27   00:00:00 /bin/sh /etc/init.d/ceph start 
mon.computer06

root 12308 12149  0 20:46 pts/27   00:00:00 bash -c ulimit -n 32768;
  /usr/bin/ceph-mon -i computer06 --pid-file 
/var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf

root 12309 12308  0 20:46 pts/27   00:00:00 /usr/bin/ceph-mon -i 
computer06 --pid-file /var/run/ceph/mon.computer06.pid -c 
/etc/ceph/ceph.conf

root 12313 12309 19 20:46 pts/27   00:00:01 /usr/bin/ceph-mon -i 
computer06 --pid-file /var/run/ceph/mon.computer06.pid -c 
/etc/ceph/ceph.conf  

Log on computer06 is like this:
2015-03-30 20:46:54.152956 7fc5379d07a0  0 ceph version 0.72.2 
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 12309
...
2015-03-30 20:46:54.759791 7fc5379d07a0  1 mon.computer06@-1(probing) e4 
preinit clean up potentially inconsistent store state

 Sorry, my English is not good.
  ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] One host failure bring down the whole cluster

2015-03-31 Thread Henrik Korkuc

On 3/31/15 11:27, Kai KH Huang wrote:

1) But Ceph says "...You can run a cluster with 1 monitor." 
(http://ceph.com/docs/master/rados/operations/add-or-rm-mons/), so I assume it 
should work. And split brain is not my current concern

The point is that you must have a majority of monitors up:
* in a one-monitor setup you need that one monitor running;
* in a two-monitor setup you need both monitors running, because if one 
goes down you no longer have a majority up;
* in a three-monitor setup you need at least two monitors up, because if 
one goes down you still have a majority up;

* 4 - at least 3
* 5 - at least 3
* etc (see the general rule below)
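
As I understand it, the general rule for N monitors in the monmap is

    required = floor(N/2) + 1

i.e. 1-of-1, 2-of-2, 2-of-3, 3-of-4, 3-of-5, and so on.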




2) I've written objects to Ceph, now I just want to get them back

Anyway, I tried to reduce the number of mons to 1, but after removing it following 
the steps below, the cluster just cannot start up any more:

1. [root~]  service ceph -a stop mon.serverB
2. [root~]  ceph mon remove serverB ## hang here forever
3. #Remove the monitor entry from ceph.conf.
4. Restart ceph service
This is a grey area for me, but I think the monitor removal failed because 
you didn't have a quorum for the operation to succeed. I think you'll need 
to modify the monmap manually and remove the second monitor from it; see 
the sketch below.
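
Roughly (this is my recollection of the documented procedure for removing a 
monitor from an unhealthy cluster; monitor names and paths here follow your 
example and should be double-checked against the docs):

# stop the surviving monitor, then edit its copy of the monmap
service ceph stop mon.serverA
ceph-mon -i serverA --extract-monmap /tmp/monmap
monmaptool /tmp/monmap --rm serverB
ceph-mon -i serverA --inject-monmap /tmp/monmap
service ceph start mon.serverA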




[root@serverA~]# systemctl status ceph.service -l
ceph.service - LSB: Start Ceph distributed file system daemons at boot time
Loaded: loaded (/etc/rc.d/init.d/ceph)
Active: failed (Result: timeout) since Tue 2015-03-31 15:46:25 CST; 3min 
15s ago
   Process: 2937 ExecStop=/etc/rc.d/init.d/ceph stop (code=exited, 
status=0/SUCCESS)
   Process: 3670 ExecStart=/etc/rc.d/init.d/ceph start (code=killed, 
signal=TERM)

Mar 31 15:44:26 serverA ceph[3670]: === osd.6 ===
Mar 31 15:44:56 serverA ceph[3670]: failed: 'timeout 30 /usr/bin/ceph -c 
/etc/ceph/ceph.conf --name=osd.6 --keyring=/var/lib/ceph/osd/ceph-6/keyring osd 
crush create-or-move -- 6 3.64 host=serverA root=default'
Mar 31 15:44:56 serverA ceph[3670]: === osd.7 ===
Mar 31 15:45:26 serverA ceph[3670]: failed: 'timeout 30 /usr/bin/ceph -c 
/etc/ceph/ceph.conf --name=osd.7 --keyring=/var/lib/ceph/osd/ceph-7/keyring osd 
crush create-or-move -- 7 3.64 host=serverA root=default'
Mar 31 15:45:26 serverA ceph[3670]: === osd.8 ===
Mar 31 15:45:57 serverA ceph[3670]: failed: 'timeout 30 /usr/bin/ceph -c 
/etc/ceph/ceph.conf --name=osd.8 --keyring=/var/lib/ceph/osd/ceph-8/keyring osd 
crush create-or-move -- 8 3.64 host=serverA root=default'
Mar 31 15:45:57 serverA ceph[3670]: === osd.9 ===
Mar 31 15:46:25 serverA systemd[1]: ceph.service operation timed out. 
Terminating.
Mar 31 15:46:25 serverA systemd[1]: Failed to start LSB: Start Ceph distributed 
file system daemons at boot time.
Mar 31 15:46:25 serverA systemd[1]: Unit ceph.service entered failed state.

/var/log/ceph/ceph.log says:
2015-03-31 15:55:57.648800 mon.0 10.???.78:6789/0 1048 : cluster [INF] osd.21 
10.???.78:6855/25598 failed (39 reports from 9 peers after 20.118062 >= grace 
20.00)
2015-03-31 15:55:57.931889 mon.0 10.???.78:6789/0 1055 : cluster [INF] osd.15 
10..78:6825/23894 failed (39 reports from 9 peers after 20.401379 >= grace 
20.00)

Obviously serverB is down, but that should not prevent serverA from 
functioning, right?

From: Gregory Farnum [g...@gregs42.com]
Sent: Tuesday, March 31, 2015 11:53 AM
To: Lindsay Mathieson; Kai KH Huang
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] One host failure bring down the whole cluster

On Mon, Mar 30, 2015 at 8:02 PM, Lindsay Mathieson
 wrote:

On Tue, 31 Mar 2015 02:42:27 AM Kai KH Huang wrote:

Hi, all
 I have a two-node Ceph cluster, and both are monitor and osd. When
they're both up, osd are all up and in, everything is fine... almost:



Two things.

1 -  You *really* need a min of three monitors. Ceph cannot form a quorum with
just two monitors and you run a risk of split brain.

You can form quorums with an even number of monitors, and Ceph does so
— there's no risk of split brain.

The problem with 2 monitors is that a quorum is always 2 — which is
exactly what you're seeing right now. You can't run with only one
monitor up (assuming you have a non-zero number of them).


2 - You also probably have a min size of two set (the default). This means
that you need a minimum  of two copies of each data object for writes to work.
So with just two nodes, if one goes down you can't write to the other.

Also this.



So:
- Install an extra monitor node - it doesn't have to be powerful; we just use an
Intel Celeron NUC for that.

- reduce your minimum size to 1 (One).

Yep.
-Greg
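
For the second point, a command along these lines should do it ("rbd" is 
just an example pool name):

ceph osd pool set rbd min_size 1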


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] One host failure bring down the whole cluster

2015-03-31 Thread Kai KH Huang
1) But Ceph says "...You can run a cluster with 1 monitor." 
(http://ceph.com/docs/master/rados/operations/add-or-rm-mons/), so I assume it 
should work. And split brain is not my current concern
2) I've written objects to Ceph, now I just want to get them back

Anyway, I tried to reduce the number of mons to 1, but after removing it following 
the steps below, the cluster just cannot start up any more:

1. [root~]  service ceph -a stop mon.serverB
2. [root~]  ceph mon remove serverB ## hang here forever
3. #Remove the monitor entry from ceph.conf.
4. Restart ceph service


[root@serverA~]# systemctl status ceph.service -l
ceph.service - LSB: Start Ceph distributed file system daemons at boot time
   Loaded: loaded (/etc/rc.d/init.d/ceph)
   Active: failed (Result: timeout) since Tue 2015-03-31 15:46:25 CST; 3min 15s 
ago
  Process: 2937 ExecStop=/etc/rc.d/init.d/ceph stop (code=exited, 
status=0/SUCCESS)
  Process: 3670 ExecStart=/etc/rc.d/init.d/ceph start (code=killed, signal=TERM)

Mar 31 15:44:26 serverA ceph[3670]: === osd.6 ===
Mar 31 15:44:56 serverA ceph[3670]: failed: 'timeout 30 /usr/bin/ceph -c 
/etc/ceph/ceph.conf --name=osd.6 --keyring=/var/lib/ceph/osd/ceph-6/keyring osd 
crush create-or-move -- 6 3.64 host=serverA root=default'
Mar 31 15:44:56 serverA ceph[3670]: === osd.7 ===
Mar 31 15:45:26 serverA ceph[3670]: failed: 'timeout 30 /usr/bin/ceph -c 
/etc/ceph/ceph.conf --name=osd.7 --keyring=/var/lib/ceph/osd/ceph-7/keyring osd 
crush create-or-move -- 7 3.64 host=serverA root=default'
Mar 31 15:45:26 serverA ceph[3670]: === osd.8 ===
Mar 31 15:45:57 serverA ceph[3670]: failed: 'timeout 30 /usr/bin/ceph -c 
/etc/ceph/ceph.conf --name=osd.8 --keyring=/var/lib/ceph/osd/ceph-8/keyring osd 
crush create-or-move -- 8 3.64 host=serverA root=default'
Mar 31 15:45:57 serverA ceph[3670]: === osd.9 ===
Mar 31 15:46:25 serverA systemd[1]: ceph.service operation timed out. 
Terminating.
Mar 31 15:46:25 serverA systemd[1]: Failed to start LSB: Start Ceph distributed 
file system daemons at boot time.
Mar 31 15:46:25 serverA systemd[1]: Unit ceph.service entered failed state.

/var/log/ceph/ceph.log says:
2015-03-31 15:55:57.648800 mon.0 10.???.78:6789/0 1048 : cluster [INF] osd.21 
10.???.78:6855/25598 failed (39 reports from 9 peers after 20.118062 >= grace 
20.00)
2015-03-31 15:55:57.931889 mon.0 10.???.78:6789/0 1055 : cluster [INF] osd.15 
10..78:6825/23894 failed (39 reports from 9 peers after 20.401379 >= grace 
20.00)

Obviously serverB is down, but that should not prevent serverA from 
functioning, right?

From: Gregory Farnum [g...@gregs42.com]
Sent: Tuesday, March 31, 2015 11:53 AM
To: Lindsay Mathieson; Kai KH Huang
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] One host failure bring down the whole cluster

On Mon, Mar 30, 2015 at 8:02 PM, Lindsay Mathieson
 wrote:
> On Tue, 31 Mar 2015 02:42:27 AM Kai KH Huang wrote:
>> Hi, all
>> I have a two-node Ceph cluster, and both are monitor and osd. When
>> they're both up, osd are all up and in, everything is fine... almost:
>
>
>
> Two things.
>
> 1 -  You *really* need a min of three monitors. Ceph cannot form a quorum with
> just two monitors and you run a risk of split brain.

You can form quorums with an even number of monitors, and Ceph does so
— there's no risk of split brain.

The problem with 2 monitors is that a quorum is always 2 — which is
exactly what you're seeing right now. You can't run with only one
monitor up (assuming you have a non-zero number of them).

> 2 - You also probably have a min size of two set (the default). This means
> that you need a minimum  of two copies of each data object for writes to work.
> So with just two nodes, if one goes down you can't write to the other.

Also this.

>
>
> So:
> - Install an extra monitor node - it doesn't have to be powerful, we just use an
> Intel Celeron NUC for that.
>
> - reduce your minimum size to 1 (One).

Yep.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW buckets sync to AWS?

2015-03-31 Thread Henrik Korkuc

Hello,

can anyone recommend a script or program to periodically synchronize RGW 
buckets with Amazon's S3? (I've sketched one idea below.)
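
One low-tech approach could be s3cmd with two config files and a local 
staging directory, roughly like this (bucket names and config paths are 
made up, and I haven't verified this end to end):

# pull from the local RGW endpoint (endpoint/credentials in rgw.cfg) ...
s3cmd -c ~/rgw.cfg sync s3://mybucket/ /var/spool/rgw-mirror/mybucket/
# ... then push the staged copy to AWS (credentials in aws.cfg)
s3cmd -c ~/aws.cfg sync /var/spool/rgw-mirror/mybucket/ s3://mybucket/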


--
Sincerely
Henrik
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com