Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-03 Thread Cameron . Scrace
It is most likely the model of switch. In its settings the minimum frame 
size you can set is 1518 and the default MTU is 1500, so it seems the 
switch wants the 18-byte difference.

We are using a pair of Netgear XS712T switches and bonded pairs of Intel 
10-Gigabit X540-AT2 (rev 01) NICs with 3 VLANs. 

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085 
Email  cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz



From: Somnath Roy somnath@sandisk.com
To: cameron.scr...@solnet.co.nz, Jan Schermer j...@schermer.cz
Cc: ceph-users@lists.ceph.com, ceph-users ceph-users-boun...@lists.ceph.com
Date: 04/06/2015 11:13 a.m.
Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, 
IPtables off, can see tcp traffic)



Hmm… Thanks for sharing this.
Any chance it depends on the switch?
Could you please share which NIC and switch you are using?
 
Thanks & Regards
Somnath
 
From: cameron.scr...@solnet.co.nz [mailto:cameron.scr...@solnet.co.nz] 
Sent: Wednesday, June 03, 2015 4:07 PM
To: Somnath Roy; Jan Schermer
Cc: ceph-users@lists.ceph.com; ceph-users
Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, 
IPtables off, can see tcp traffic)
 
The interface MTU has to be 18 or more bytes lower than the switch MTU or 
it just stops working. As far as I know the monitor communication is not 
being encapsulated by any SDN. 




From: Somnath Roy somnath@sandisk.com
To: Jan Schermer j...@schermer.cz, cameron.scr...@solnet.co.nz
Cc: ceph-users@lists.ceph.com, ceph-users ceph-users-boun...@lists.ceph.com
Date: 04/06/2015 02:58 a.m.
Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, 
IPtables off, can see tcp traffic)




The TCP_NODELAY issue was with kernel rbd, *not* with the OSD. The Ceph 
messenger code sets it by default. 
BTW, I doubt TCP_NODELAY has anything to do with it. 
  
Thanks & Regards 
Somnath 
  
From: Jan Schermer [mailto:j...@schermer.cz] 
Sent: Wednesday, June 03, 2015 1:37 AM
To: cameron.scr...@solnet.co.nz
Cc: Somnath Roy; ceph-users@lists.ceph.com; ceph-users
Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, 
IPtables off, can see tcp traffic) 
 
Interface and switch should have the same MTU and that should not cause 
any issues (setting switch MTU higher is always safe, though). 
Aren’t you encapsulating the mon communication in some SDN like 
Open vSwitch? Is that a straight L2 connection? 
 
I think this is worth investigating. For example are mons properly setting 
TCP_NODELAY on the sockets that are latency sensitive? (I just tried 
finding out and lsof/netstat doesn’t report that to me, I’d need to 
restart and strace it… I vaguely remember there was an issue with NODELAY 
that was fixed on OSD side.) 
 
Jan 
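
As a side note on what the option itself does: the sketch below is plain Python, not Ceph code, and only shows TCP_NODELAY being set and read back on a socket the process owns. It says nothing about what ceph-mon does on its own sockets, and the helper names enable_nodelay and nodelay_enabled are made up for illustration.

```python
import socket


def enable_nodelay(sock):
    """Disable Nagle's algorithm on a TCP socket."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def nodelay_enabled(sock):
    """Return True if TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    # The socket is never connected; the option can be set before connect().
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("before:", nodelay_enabled(s))
        enable_nodelay(s)
        print("after: ", nodelay_enabled(s))
```

getsockopt only works from inside the owning process, which is consistent with lsof/netstat not reporting the flag and with strace being the practical way to observe the setsockopt call from outside.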
 
 
On 03 Jun 2015, at 06:30, cameron.scr...@solnet.co.nz wrote: 
  
Seems to be something to do with our switch. If the interface MTU is too 
close to the switch MTU it stops working. Thanks for all your help :) 




From: Somnath Roy somnath@sandisk.com
To: cameron.scr...@solnet.co.nz
Cc: ceph-users@lists.ceph.com, ceph-users ceph-users-boun...@lists.ceph.com, 
Joao Eduardo Luis j...@suse.de
Date: 03/06/2015 11:49 a.m.
Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, 
IPtables off, can see tcp traffic)





I doubt it has anything to do with Ceph. I hope you have checked that your 
switch supports jumbo frames and that MTU 9000 is set on all the devices 
in between. It’s better to ping your devices (all the devices participating 
in the cluster) the way it is described in the following articles, just 
in case you are not sure. 
 
http://www.mylesgray.com/hardware/test-jumbo-frames-working/ 
http://serverfault.com/questions/234311/testing-whether-jumbo-frames-are-actually-working
 

 
Hope this helps, 
 
Thanks & Regards 
Somnath 
 
From: cameron.scr...@solnet.co.nz [mailto:cameron.scr...@solnet.co.nz] 
Sent: Tuesday, June 02, 2015 4:32 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com; ceph-users; Joao Eduardo Luis
Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, 
IPtables off, can see tcp traffic) 
 
Setting the MTU

Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-03 Thread Alex Moore
Surely this is to be expected... 1500 is the IP MTU, and 1518 is the 
maximum Ethernet frame size: 1500 bytes of payload plus the 14-byte 
Ethernet header and the 4-byte FCS (an optional 802.1q VLAN tag adds 
another 4 bytes, for 1522). Interface MTU typically means the IP MTU, 
whereas a layer 2 switch cares more about layer 2 Ethernet frames, and 
so MTU in that context means the Ethernet frame size.
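
For reference, the arithmetic behind the 18-byte difference can be sketched as below. The header, FCS and VLAN tag sizes are the standard Ethernet values; the function names are made up for illustration, and whether a given switch counts the FCS or the VLAN tag in its configured limit is vendor-specific, so treat that part as an assumption.

```python
# Standard Ethernet framing overhead, in bytes.
ETH_HEADER = 14  # destination MAC (6) + source MAC (6) + EtherType (2)
ETH_FCS = 4      # frame check sequence trailer
VLAN_TAG = 4     # optional 802.1Q tag


def frame_size(ip_mtu, vlan_tagged=False):
    """Ethernet frame size needed to carry ip_mtu bytes of payload."""
    size = ip_mtu + ETH_HEADER + ETH_FCS
    if vlan_tagged:
        size += VLAN_TAG
    return size


def max_ip_mtu(switch_frame_limit, vlan_tagged=False):
    """Largest interface (IP) MTU that fits in the switch's frame limit."""
    overhead = ETH_HEADER + ETH_FCS + (VLAN_TAG if vlan_tagged else 0)
    return switch_frame_limit - overhead


if __name__ == "__main__":
    for mtu in (1500, 8192, 9000):
        print(mtu, frame_size(mtu), frame_size(mtu, vlan_tagged=True))
```

With these numbers an interface MTU of 1500 needs a frame size of 1518 untagged, which matches the minimum frame size setting reported for the switch.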



Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-03 Thread Somnath Roy
Hmm… Thanks for sharing this.
Any chance it depends on the switch?
Could you please share which NIC and switch you are using?

Thanks & Regards
Somnath


Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-03 Thread Somnath Roy
The TCP_NODELAY issue was with kernel rbd, *not* with the OSD. The Ceph 
messenger code sets it by default.
BTW, I doubt TCP_NODELAY has anything to do with it.

Thanks & Regards
Somnath


Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Joao Eduardo Luis
On 06/02/2015 01:42 AM, cameron.scr...@solnet.co.nz wrote:
> I am trying to deploy a new ceph cluster and my monitors are not
> reaching quorum. SELinux is off, firewalls are off, I can see traffic
> between the nodes on port 6789 but when I use the admin socket to force
> a re-election only the monitor I send the request to shows the new
> election in its logs. My logs are filled entirely of the following two
> lines:
>
> 2015-06-02 11:31:56.447975 7f795b17a700  0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd='mon_status' args=[]: dispatch
> 2015-06-02 11:31:56.448272 7f795b17a700  0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd=mon_status args=[]: finished

You are running on default debug levels, so you'll hardly get anything
more than that.  I suggest setting 'debug mon = 10' and 'debug ms = 1'
for added verbosity and come back to us with the logs.

There are many possible reasons for this, but the most common is the
monitors not being able to communicate with each other.  Given you see
traffic between the monitors, I'm inclined to assume that the other two
monitors do not have each other on the monmap or, if they do know each
other, either 1) the monitors' auth keys do not match, or 2) the probe
timeout is being triggered before they successfully manage to find
enough monitors to trigger an election -- which may be due to latency.

Logs will tell us more.

  -Joao

> Querying the admin socket with mon_status (the other two are similar
> but with their hostnames and rank):
>
> {
>     "name": "wcm1",
>     "rank": 0,
>     "state": "probing",
>     "election_epoch": 1,
>     "quorum": [],
>     "outside_quorum": [
>         "wcm1"
>     ],
>     "extra_probe_peers": [],
>     "sync_provider": [],
>     "monmap": {
>         "epoch": 0,
>         "fsid": "adb8c500-122e-49fd-9c1e-a99af7832307",
>         "modified": "2015-06-02 10:43:41.467811",
>         "created": "2015-06-02 10:43:41.467811",
>         "mons": [
>             {
>                 "rank": 0,
>                 "name": "wcm1",
>                 "addr": "10.1.226.64:6789\/0"
>             },
>             {
>                 "rank": 1,
>                 "name": "wcm2",
>                 "addr": "10.1.226.65:6789\/0"
>             },
>             {
>                 "rank": 2,
>                 "name": "wcm3",
>                 "addr": "10.1.226.66:6789\/0"
>             }
>         ]
>     }
> }
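
A rough way to read that mon_status output programmatically is sketched below. It is only an illustration: the JSON is a trimmed copy of the output above, the helper unseen_peers is hypothetical, and it assumes that "quorum" holds monitor ranks while "outside_quorum" holds the names of monitors this one currently knows about outside a quorum.

```python
import json

# Trimmed copy of the mon_status output quoted above.
MON_STATUS = """
{
  "name": "wcm1",
  "rank": 0,
  "state": "probing",
  "quorum": [],
  "outside_quorum": ["wcm1"],
  "monmap": {
    "mons": [
      {"rank": 0, "name": "wcm1", "addr": "10.1.226.64:6789/0"},
      {"rank": 1, "name": "wcm2", "addr": "10.1.226.65:6789/0"},
      {"rank": 2, "name": "wcm3", "addr": "10.1.226.66:6789/0"}
    ]
  }
}
"""


def unseen_peers(status):
    """Names of monmap members that appear in neither quorum nor outside_quorum."""
    seen_names = set(status["outside_quorum"])
    quorum_ranks = set(status["quorum"])  # assumption: ranks, not names
    return [
        mon["name"]
        for mon in status["monmap"]["mons"]
        if mon["name"] not in seen_names and mon["rank"] not in quorum_ranks
    ]


if __name__ == "__main__":
    status = json.loads(MON_STATUS)
    print(status["state"], unseen_peers(status))
```

On the output above this reports wcm2 and wcm3: the monmap lists all three monitors, but wcm1 is stuck probing without having heard from either peer.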

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Cameron . Scrace
Thanks for the links. Jumbo frames are definitely working, although we had 
to set the MTU to 8192 because one of the components doesn't support an 
MTU higher than that. 

Thanks for the help. Looks like we may just have to deal with jumbo frames 
being off.




From: Somnath Roy somnath@sandisk.com
To: cameron.scr...@solnet.co.nz
Cc: ceph-users@lists.ceph.com, ceph-users ceph-users-boun...@lists.ceph.com, 
Joao Eduardo Luis j...@suse.de
Date: 03/06/2015 10:34 a.m.
Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, 
IPtables off, can see tcp traffic)




We have seen some communication issues with that. Try setting the MTU on 
all the servers to 1500 and try it out… 
  
From: cameron.scr...@solnet.co.nz [mailto:cameron.scr...@solnet.co.nz] 
Sent: Tuesday, June 02, 2015 3:31 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com; ceph-users; Joao Eduardo Luis
Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, 
IPtables off, can see tcp traffic) 
  
We are running with Jumbo Frames turned on. Is that likely to be the 
issue? Do I need to configure something in ceph? 

The mon maps are fine and after setting debug mon to 10 and debug ms to 1, 
I see probe timeouts in the logs: http://pastebin.com/44M1uJZc 
I just set probe timeout to 10 (up from 2) and it still times out. 

Thanks! 




From: Somnath Roy somnath@sandisk.com
To: Joao Eduardo Luis j...@suse.de, ceph-users@lists.ceph.com
Date: 03/06/2015 03:49 a.m.
Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, 
IPtables off, can see tcp traffic)
Sent by: ceph-users ceph-users-boun...@lists.ceph.com





By any chance are you running with jumbo frames turned on?

Thanks & Regards
Somnath


Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Cameron . Scrace
Setting the MTU to 1500 worked; the monitors reach quorum right away. 
Unfortunately we really want jumbo frames to be on. Any ideas on how to 
get Ceph to work with them on?

Thanks!


Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Somnath Roy
I doubt it has anything to do with Ceph. I hope you have checked that your 
switch supports jumbo frames and that you have set MTU 9000 on all the devices 
in between. It's better to ping your devices (all the devices participating in 
the cluster) the way it is described in the following articles, just in case 
you are not sure.

http://www.mylesgray.com/hardware/test-jumbo-frames-working/
http://serverfault.com/questions/234311/testing-whether-jumbo-frames-are-actually-working
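
The usual form of this test, and presumably what those articles describe, is a ping that fills the MTU exactly with fragmentation forbidden. A minimal sketch, assuming Linux iputils `ping` (where `-M do` sets the don't-fragment bit, `-s` sets the ICMP payload size and `-c` the packet count) and IPv4 without IP options; the function names are illustrative only:

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def jumbo_ping_argv(host, mtu=9000, count=3):
    """Build the argv for a don't-fragment ping sized to fill the MTU."""
    size = icmp_payload_for_mtu(mtu)
    return ["ping", "-M", "do", "-s", str(size), "-c", str(count), host]


if __name__ == "__main__":
    # Monitor addresses taken from the monmap posted in this thread.
    for host in ["10.1.226.64", "10.1.226.65", "10.1.226.66"]:
        print(" ".join(jumbo_ping_argv(host)))
```

If the full-size ping fails between two nodes while the same command with `mtu=1500` succeeds, something on the path is not passing jumbo frames.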

Hope this helps,

Thanks & Regards
Somnath

From: cameron.scr...@solnet.co.nz [mailto:cameron.scr...@solnet.co.nz]
Sent: Tuesday, June 02, 2015 4:32 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com; ceph-users; Joao Eduardo Luis
Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables 
off, can see tcp traffic)

Setting the MTU to 1500 worked; the monitors reach quorum right away. 
Unfortunately, we really want jumbo frames to be on. Any ideas on how to get 
Ceph to work with them on?

Thanks!

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085
Email  cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz




Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Somnath Roy
We have seen some communication issues with that. Try setting the MTU to 1500 
on all the servers and see whether that helps.


Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Nigel Williams
On Wed, Jun 3, 2015 at 8:30 AM,  cameron.scr...@solnet.co.nz wrote:
 We are running with Jumbo Frames turned on. Is that likely to be the issue?

I got caught by this previously:

http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-October/043955.html

The problem is Ceph almost-but-not-quite works, leading you down
lots of fruitless paths.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Cameron . Scrace
We are running with jumbo frames turned on. Is that likely to be the 
issue? Do I need to configure something in Ceph?

The mon maps are fine, and after setting debug mon to 10 and debug ms to 1 
I see probe timeouts in the logs: http://pastebin.com/44M1uJZc
I just set the probe timeout to 10 (up from 2) and it still times out.

Thanks!

Cameron Scrace
Infrastructure Engineer





Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Cameron . Scrace
It seems to have something to do with our switch. If the interface MTU is 
too close to the switch MTU, it stops working. Thanks for all your help :)
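
One plausible reading of "too close" is ordinary Ethernet framing overhead: the interface MTU counts only the IP packet, while a switch's maximum frame size typically also counts the 14-byte Ethernet header and the 4-byte frame check sequence, and an 802.1Q VLAN tag adds 4 more bytes. The sketch below works through that arithmetic. Whether a particular switch includes the FCS or the VLAN tag in its limit is an assumption to check against its documentation, and the 9216 figure is only an example value, not something reported in this thread.

```python
ETH_HEADER = 14  # destination MAC + source MAC + EtherType
ETH_FCS = 4      # frame check sequence (CRC)
VLAN_TAG = 4     # a single 802.1Q tag


def frame_overhead(vlan_tagged=False):
    """Bytes a frame carries on top of the IP packet (the MTU)."""
    return ETH_HEADER + ETH_FCS + (VLAN_TAG if vlan_tagged else 0)


def max_interface_mtu(switch_max_frame, vlan_tagged=False):
    """Largest interface MTU that still fits the switch's frame limit."""
    return switch_max_frame - frame_overhead(vlan_tagged)


def min_switch_frame(interface_mtu, vlan_tagged=False):
    """Smallest switch frame size needed to carry a given interface MTU."""
    return interface_mtu + frame_overhead(vlan_tagged)


if __name__ == "__main__":
    print(min_switch_frame(1500))        # classic Ethernet frame size
    print(min_switch_frame(9000))        # untagged jumbo frames
    print(min_switch_frame(9000, True))  # jumbo frames on a tagged VLAN
    print(max_interface_mtu(9216))       # example switch limit (assumed)
```

Under these assumptions, an interface MTU of 9000 needs the switch to accept frames of at least 9018 bytes, or 9022 bytes on a tagged VLAN.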

Cameron Scrace
Infrastructure Engineer





Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Somnath Roy
By any chance, are you running with jumbo frames turned on?

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Joao 
Eduardo Luis
Sent: Tuesday, June 02, 2015 12:52 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables 
off, can see tcp traffic)

On 06/02/2015 01:42 AM, cameron.scr...@solnet.co.nz wrote:
 I am trying to deploy a new ceph cluster and my monitors are not
 reaching quorum. SELinux is off, firewalls are off, I can see traffic
 between the nodes on port 6789 but when I use the admin socket to
 force a re-election only the monitor I send the request to shows the
 new election in its logs. My logs are filled entirely with the following
 two lines:

 2015-06-02 11:31:56.447975 7f795b17a700  0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd='mon_status' args=[]: dispatch
 2015-06-02 11:31:56.448272 7f795b17a700  0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd=mon_status args=[]: finished

You are running on default debug levels, so you'll hardly get anything more 
than that.  I suggest setting 'debug mon = 10' and 'debug ms = 1'
for added verbosity and come back to us with the logs.

There are many possible reasons for this, but the most common ones come down 
to the monitors not being able to communicate with each other.  Given you see 
traffic between the monitors, I'm inclined to assume that the other two 
monitors do not have each other on the monmap or, if they do know each other, 
either 1) the monitors' auth keys do not match, or 2) the probe timeout is 
being triggered before they successfully manage to find enough monitors to 
trigger an election -- which may be due to latency.

Logs will tell us more.

  -Joao

 Querying the admin socket with mon_status (the other two are
 similar but with their hostnames and rank):

 {
 name: wcm1,
 rank: 0,
 state: probing,
 election_epoch: 1,
 quorum: [],
 outside_quorum: [
 wcm1
 ],
 extra_probe_peers: [],
 sync_provider: [],
 monmap: {
 epoch: 0,
 fsid: adb8c500-122e-49fd-9c1e-a99af7832307,
 modified: 2015-06-02 10:43:41.467811,
 created: 2015-06-02 10:43:41.467811,
 mons: [
 {
 rank: 0,
 name: wcm1,
 addr: 10.1.226.64:6789\/0
 },
 {
 rank: 1,
 name: wcm2,
 addr: 10.1.226.65:6789\/0
 },
 {
 rank: 2,
 name: wcm3,
 addr: 10.1.226.66:6789\/0
 }
 ]
 }
 }




PLEASE NOTE: The information contained in this electronic mail message is 
intended only for the use of the designated recipient(s) named above. If the 
reader of this message is not the intended recipient, you are hereby notified 
that you have received this message in error and that any review, 
dissemination, distribution, or copying of this message is strictly prohibited. 
If you have received this communication in error, please notify the sender by 
telephone or e-mail (as shown above) immediately and destroy any and all copies 
of this message in your possession (whether hard copies or electronically 
stored copies).



[ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-01 Thread Cameron . Scrace
I am trying to deploy a new Ceph cluster and my monitors are not reaching 
quorum. SELinux is off, firewalls are off, and I can see traffic between the 
nodes on port 6789, but when I use the admin socket to force a re-election, 
only the monitor I send the request to shows the new election in its logs. 
My logs are filled entirely with the following two lines:

2015-06-02 11:31:56.447975 7f795b17a700  0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd='mon_status' args=[]: dispatch
2015-06-02 11:31:56.448272 7f795b17a700  0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd=mon_status args=[]: finished

Querying the admin socket with mon_status gives the following (the other two 
are similar, but with their own hostnames and ranks):

{
    "name": "wcm1",
    "rank": 0,
    "state": "probing",
    "election_epoch": 1,
    "quorum": [],
    "outside_quorum": [
        "wcm1"
    ],
    "extra_probe_peers": [],
    "sync_provider": [],
    "monmap": {
        "epoch": 0,
        "fsid": "adb8c500-122e-49fd-9c1e-a99af7832307",
        "modified": "2015-06-02 10:43:41.467811",
        "created": "2015-06-02 10:43:41.467811",
        "mons": [
            {
                "rank": 0,
                "name": "wcm1",
                "addr": "10.1.226.64:6789\/0"
            },
            {
                "rank": 1,
                "name": "wcm2",
                "addr": "10.1.226.65:6789\/0"
            },
            {
                "rank": 2,
                "name": "wcm3",
                "addr": "10.1.226.66:6789\/0"
            }
        ]
    }
}
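
To compare the three monitors' views quickly, the mon_status output can be summarised with a few lines of Python. This is only a sketch: it assumes the output has been saved as valid JSON (the sample is a trimmed copy of the output above), that `quorum` lists monitor ranks, and the function name `summarise` is made up for illustration.

```python
import json

# Trimmed copy of the mon_status output posted above, as valid JSON.
SAMPLE = r'''
{
  "name": "wcm1",
  "rank": 0,
  "state": "probing",
  "election_epoch": 1,
  "quorum": [],
  "outside_quorum": ["wcm1"],
  "monmap": {
    "epoch": 0,
    "mons": [
      {"rank": 0, "name": "wcm1", "addr": "10.1.226.64:6789\/0"},
      {"rank": 1, "name": "wcm2", "addr": "10.1.226.65:6789\/0"},
      {"rank": 2, "name": "wcm3", "addr": "10.1.226.66:6789\/0"}
    ]
  }
}
'''


def summarise(status_text):
    """Reduce one monitor's mon_status output to the fields that matter here."""
    status = json.loads(status_text)
    mons = status["monmap"]["mons"]
    known = [m["name"] for m in mons]
    # Map the ranks listed in "quorum" back to monitor names.
    in_quorum = sorted(m["name"] for m in mons if m["rank"] in status["quorum"])
    return {
        "name": status["name"],
        "state": status["state"],
        "known_mons": known,
        "in_quorum": in_quorum,
        # Monitors need a strict majority of the monmap to form a quorum.
        "needed_for_quorum": len(known) // 2 + 1,
    }


if __name__ == "__main__":
    print(summarise(SAMPLE))
```

For the output above, this shows a monitor that knows all three peers and needs two of them for a quorum, yet has nobody in quorum and is still probing, which points towards the monitors failing to hear each other rather than a wrong monmap.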

Any suggestions on what could be the issue?

Regards,

Cameron Scrace
Infrastructure Engineer

Attention:
This email may contain information intended for the sole use of
the original recipient. Please respect this when sharing or
disclosing this email's contents with any third party. If you
believe you have received this email in error, please delete it
and notify the sender or postmas...@solnetsolutions.co.nz as
soon as possible. The content of this email does not necessarily
reflect the views of Solnet Solutions Ltd.
