Re: [ceph-users] HEALTH WARN: clock skew detected

2013-05-07 Thread Joao Eduardo Luis

On 05/07/2013 03:20 PM, Mike Lowe wrote:

> You've learned one of the three computer science facts you need to know about
> distributed systems, and I'm glad I could pass something on:
>
> 1. Consistent, Available, Distributed - pick any two

To some degree of Consistent, Available and Distributed. :-P

> 2. To completely guard against k failures where you don't know which one failed
> just by looking you need 2k+1 redundant copies
> 3. Fault tolerant systems must all agree on what time it is
>
> On May 7, 2013, at 6:29 AM, Varun Chandramouli  wrote:
>
>> Hi All,
>>
>> Thanks for the replies. I started the ntp daemon and the warnings as well as
>> the crashes seem to have gone. This is the first time I set up a cluster (of
>> physical machines), and was unaware of the need to synchronize the clocks.
>> Probably should have googled it more :). Pardon my ignorance.
>>
>> Thanks Again,
>> Varun



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com


Re: [ceph-users] HEALTH WARN: clock skew detected

2013-05-07 Thread Mike Lowe
You've learned one of the three computer science facts you need to know about 
distributed systems, and I'm glad I could pass something on:

1. Consistent, Available, Distributed - pick any two
2. To completely guard against k failures where you don't know which one failed 
just by looking you need 2k+1 redundant copies
3. Fault tolerant systems must all agree on what time it is
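The 2k+1 rule in point 2 is plain majority voting: if at most k of 2k+1 copies are wrong, the k+1 correct ones still out-vote them. The same majority arithmetic sizes a monitor quorum, since Ceph's monitors use Paxos and need a strict majority to make progress, so the three monitors in this thread can lose one. A minimal Python sketch of both calculations; the function names are illustrative only, and the voting model assumes a faulty copy returns a wrong value rather than nothing.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def copies_needed(k: int) -> int:
    """Copies needed so that k silently wrong ones can always be out-voted."""
    return 2 * k + 1


def majority_vote(copies: Sequence[Hashable], k: int) -> Hashable:
    """Return the value held by at least k+1 of the 2k+1 copies.

    Assumes at most k copies are faulty; then the k+1 correct copies agree
    and always win the vote.
    """
    if len(copies) != copies_needed(k):
        raise ValueError(f"expected {copies_needed(k)} copies, got {len(copies)}")
    value, count = Counter(copies).most_common(1)[0]
    if count < k + 1:
        raise RuntimeError("no majority: more than k copies are faulty")
    return value


def monitor_quorum(n_mons: int) -> Tuple[int, int]:
    """Majority quorum size for n monitors, and how many may fail without losing it."""
    quorum = n_mons // 2 + 1
    return quorum, n_mons - quorum


print(majority_vote(["v1", "v2", "v1"], k=1))  # v1
print(monitor_quorum(3))  # (2, 1)
```

An even count buys nothing here: `monitor_quorum(4)` still tolerates only one failure, which is why odd numbers of monitors are the usual choice.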

On May 7, 2013, at 6:29 AM, Varun Chandramouli  wrote:

> Hi All,
> 
> Thanks for the replies. I started the ntp daemon and the warnings as well as 
> the crashes seem to have gone. This is the first time I set up a cluster (of 
> physical machines), and was unaware of the need to synchronize the clocks. 
> Probably should have googled it more :). Pardon my ignorance.
> 
> Thanks Again,
> Varun
> 



Re: [ceph-users] HEALTH WARN: clock skew detected

2013-05-07 Thread Varun Chandramouli
Hi All,

Thanks for the replies. I started the ntp daemon and the warnings as well
as the crashes seem to have gone. This is the first time I set up a cluster
(of physical machines), and was unaware of the need to synchronize the
clocks. Probably should have googled it more :). Pardon my ignorance.

Thanks Again,
Varun


Re: [ceph-users] HEALTH WARN: clock skew detected

2013-05-07 Thread Joao Eduardo Luis

On 05/06/2013 01:07 PM, Michael Lowe wrote:

> Um, start it? You must have synchronized clocks in a fault tolerant system
> (google Byzantine generals clock) and the way to do that is ntp, therefore ntp
> is required.
>
> On May 6, 2013, at 1:34 AM, Varun Chandramouli  wrote:
>
>> Hi Michael,
>>
>> Thanks for your response. No, the ntp daemon is not running. Any other
>> suggestions?
>>
>> Regards
>> Varun







The monitors have a low tolerance for clock skew. Unsynchronized clocks
commonly led to strange behaviour, which can manifest itself in so many
weird ways that we decided to introduce these warning messages for when
the monitors' clocks drift too far apart.
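The check can be pictured as comparing each monitor's clock with a reference and warning when the gap exceeds a threshold. A simplified Python model follows. It assumes the leader monitor's clock is the reference and uses a 0.05 s threshold, modelled on Ceph's `mon clock drift allowed` option; the real monitor code also accounts for message round-trip times and may differ between releases, so treat this as an illustration only.

```python
from typing import Dict, List

# Assumed threshold in seconds, modelled on Ceph's "mon clock drift allowed"
# option (0.05 s by default); check the documentation for your release.
DRIFT_ALLOWED_S = 0.05


def skewed_monitors(clock_readings: Dict[str, float], reference: str,
                    allowed: float = DRIFT_ALLOWED_S) -> List[str]:
    """Return monitors whose clock differs from the reference monitor's by more than `allowed`.

    `clock_readings` maps monitor name -> wall-clock time in seconds, all
    sampled at the same instant. Network delay is ignored in this model.
    """
    ref_time = clock_readings[reference]
    return sorted(
        name
        for name, t in clock_readings.items()
        if name != reference and abs(t - ref_time) > allowed
    )


# Hypothetical readings: mon b is 300 ms ahead, mon c 10 ms behind.
readings = {"a": 1000.000, "b": 1000.300, "c": 999.990}
print(skewed_monitors(readings, reference="a"))  # ['b']
```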


You should run ntpd (or something of the sort), as Michael and others
have suggested. Failing to keep the clocks synchronized on a monitor
cluster will cause all sorts of weirdness. Keep your clocks
synchronized, people!


  -Joao

--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com


Re: [ceph-users] HEALTH WARN: clock skew detected

2013-05-06 Thread Michael Lowe
Um, start it? You must have synchronized clocks in a fault tolerant system 
(google Byzantine generals clock) and the way to do that is ntp, therefore ntp 
is required.
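NTP estimates a clock's error from the four timestamps of a single request/response exchange with a server: client send (t1), server receive (t2), server send (t3) and client receive (t4). The offset and round-trip delay formulas below are the standard ones from the NTP specification (RFC 5905); a real ntpd combines many such samples from several servers before it adjusts the clock. This is a minimal Python sketch of the arithmetic only, with timestamps in milliseconds for readability.

```python
from typing import Tuple


def ntp_offset_and_delay(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """Offset of the server clock relative to the local clock, and round-trip delay.

    t1: request leaves the client (client clock)
    t2: request reaches the server (server clock)
    t3: reply leaves the server (server clock)
    t4: reply reaches the client (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2  # positive: local clock is behind the server
    delay = (t4 - t1) - (t3 - t2)         # network time, excluding server processing
    return offset, delay


# Client clock 500 ms behind the server, 100 ms one-way latency, 10 ms server processing.
offset, delay = ntp_offset_and_delay(t1=10_000, t2=10_600, t3=10_610, t4=10_210)
print(offset, delay)  # 500.0 200
```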


On May 6, 2013, at 1:34 AM, Varun Chandramouli  wrote:

> Hi Michael,
> 
> Thanks for your response. No, the ntp daemon is not running. Any other 
> suggestions?
> 
> Regards
> Varun
> 
> 


Re: [ceph-users] HEALTH WARN: clock skew detected

2013-05-06 Thread Wolfgang Hennerbichler
On 05/06/2013 07:34 AM, Varun Chandramouli wrote:

> No, the ntp daemon is not running. Any other
> suggestions?

How do you sync your clocks then?


Re: [ceph-users] HEALTH WARN: clock skew detected

2013-05-05 Thread Varun Chandramouli
Hi Michael,

Thanks for your response. No, the ntp daemon is not running. Any other
suggestions?

Regards
Varun


Re: [ceph-users] HEALTH WARN: clock skew detected

2013-05-05 Thread Matthew Roy
Instead of installing ntpdate, you can just tell ntpd that it's okay
to shift the clock a lot using the '-g' parameter. See:
http://superuser.com/questions/518694/ubuntu-12-10-clock-is-wrong

I also find it useful to configure the NTP daemons on my monitors as NTP
peers so they exchange time directly with each other as well as
reference servers.
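That setup can be sketched as generating, for each monitor, an ntp.conf that lists some upstream servers plus the other monitors as `peer` entries. The Python below is a hypothetical helper, not part of Ceph or ntpd: the `pool.ntp.org` names are placeholders for whatever reference servers you use, and the addresses come from the ceph.conf posted in this thread. `server`, `peer` and `iburst` are standard ntpd configuration keywords; a production ntp.conf would usually also carry `driftfile` and `restrict` lines.

```python
from typing import Dict, Sequence

# Placeholder upstream servers; substitute your site's reference servers.
REFERENCE_SERVERS = ("0.pool.ntp.org", "1.pool.ntp.org", "2.pool.ntp.org")


def ntp_conf_for(host: str, mon_addrs: Dict[str, str],
                 servers: Sequence[str] = REFERENCE_SERVERS) -> str:
    """Return ntp.conf lines for one monitor: upstream servers plus the other monitors as peers."""
    lines = [f"server {s} iburst" for s in servers]
    # Every monitor except this host becomes a peer, in a stable order.
    lines += [f"peer {addr}" for addr in sorted(a for h, a in mon_addrs.items() if h != host)]
    return "\n".join(lines) + "\n"


# Monitor hosts and addresses from the ceph.conf in this thread.
mons = {"lnx147-73": "10.72.147.73", "lnx148-20": "10.72.148.20", "lnx-148-27": "10.72.148.27"}
print(ntp_conf_for("lnx147-73", mons), end="")
```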

Matthew


On 05/05/2013 09:29 AM, Michael Lowe wrote:
> Are you running ntpd?  If so you may need to stop, run ntpdate, and
> restart ntpd.  Sometimes if the clock is too far out of sync ntp won't
> update the time.
> 
> On May 5, 2013, at 8:52 AM, Varun Chandramouli  wrote:
> 
>> Hi All,
>>
>> I have a cluster of 4 nodes with 1 mds, 3 mons and 4 osds. Whenever I
>> do ceph health or ceph -s, it shows a health warning saying clock skew
>> detected in 2 of the 3 mons. When I run a mapreduce application on the
>> cluster, one of the monitors crashes (the one in which the skew is not
>> detected) soon after the application is started. Sometimes the
>> application completes, sometimes it fails. I would like to know what
>> this warning means. Is it responsible for the failing of the
>> application? If yes, how do I remove the warning?
>>
>> Here is my ceph.conf:
>>
>> [global]
>> auth client required = none
>> auth cluster required = none
>> auth service required = none
>>
>> [osd]
>> osd journal data = 1000
>> filestore xattr use omap = true
>>
>> [mon.a]
>> host = lnx147-73
>> mon addr = 10.72.147.73:6789 
>>
>> [mon.b]
>> host = lnx148-20
>> mon addr = 10.72.148.20:6789 
>>
>> [mon.c]
>> host = lnx-148-27
>> mon addr = 10.72.148.27:6789 
>>
>> [mds.a]
>> host = lnx147-73
>>
>> [osd.0]
>> host = lnx147-73
>>
>> [osd.1]
>> host = lnx148-20
>>
>> [osd.2]
>> host = lnx-148-27
>>
>> [osd.3]
>> host = ln148-28
>>
>> I can mail the mon logs and the output of ceph -w for the duration of
>> the application. 
>>
>> Regards
>> Varun
> 


-- 
Matthew


Re: [ceph-users] HEALTH WARN: clock skew detected

2013-05-05 Thread Michael Lowe
Are you running ntpd?  If so you may need to stop, run ntpdate, and restart 
ntpd.  Sometimes if the clock is too far out of sync ntp won't update the time.
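The reason ntpd sometimes refuses to fix a badly wrong clock is its panic threshold. By default ntpd slews (gradually adjusts) offsets up to 128 ms, steps the clock for larger offsets, and exits instead of correcting anything beyond 1000 s; the `-g` option Matthew mentions lifts that 1000 s limit, but only for the first correction after startup. The Python function below is a rough, hypothetical model of that decision, not ntpd's code; it ignores details such as the stepout interval and the `-x` option.

```python
STEP_THRESHOLD_S = 0.128    # ntpd default step threshold
PANIC_THRESHOLD_S = 1000.0  # ntpd default panic threshold


def ntpd_action(offset_s: float, first_update: bool, g_flag: bool) -> str:
    """Rough model of how ntpd reacts to a measured clock offset (in seconds).

    Returns "exit", "step" or "slew". Simplified: ignores the stepout
    interval, the -x option and 'tinker' overrides.
    """
    magnitude = abs(offset_s)
    # -g only waives the panic check for the very first correction.
    if magnitude > PANIC_THRESHOLD_S and not (first_update and g_flag):
        return "exit"
    if magnitude > STEP_THRESHOLD_S:
        return "step"
    return "slew"


for offset in (0.02, 5.0, 3600.0):
    print(offset, ntpd_action(offset, first_update=True, g_flag=False))
```

In this model an hour-long offset without `-g` gives "exit", matching the "ntp won't update the time" symptom; stopping ntpd and running ntpdate, or starting ntpd with `-g`, are two ways around it.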

On May 5, 2013, at 8:52 AM, Varun Chandramouli  wrote:

> Hi All,
> 
> I have a cluster of 4 nodes with 1 mds, 3 mons and 4 osds. Whenever I do ceph 
> health or ceph -s, it shows a health warning saying clock skew detected in 2 
> of the 3 mons. When I run a mapreduce application on the cluster, one of the 
> monitors crashes (the one in which the skew is not detected) soon after the 
> application is started. Sometimes the application completes, sometimes it
> fails. I would like to know what this warning means. Is it responsible for
> the failing of the application? If yes, how do I remove the warning?
> 
> Here is my ceph.conf:
> 
> [global]
> auth client required = none
> auth cluster required = none
> auth service required = none
> 
> [osd]
> osd journal data = 1000
> filestore xattr use omap = true
> 
> [mon.a]
> host = lnx147-73
> mon addr = 10.72.147.73:6789
> 
> [mon.b]
> host = lnx148-20
> mon addr = 10.72.148.20:6789
> 
> [mon.c]
> host = lnx-148-27
> mon addr = 10.72.148.27:6789
> 
> [mds.a]
> host = lnx147-73
> 
> [osd.0]
> host = lnx147-73
> 
> [osd.1]
> host = lnx148-20
> 
> [osd.2]
> host = lnx-148-27
> 
> [osd.3]
> host = ln148-28
> 
> I can mail the mon logs and the output of ceph -w for the duration of the 
> application. 
> 
> Regards
> Varun


[ceph-users] HEALTH WARN: clock skew detected

2013-05-05 Thread Varun Chandramouli
Hi All,

I have a cluster of 4 nodes with 1 mds, 3 mons and 4 osds. Whenever I do
ceph health or ceph -s, it shows a health warning saying clock skew
detected in 2 of the 3 mons. When I run a mapreduce application on the
cluster, one of the monitors crashes (the one in which the skew is not
detected) soon after the application is started. Sometimes the application
completes, sometimes it fails. I would like to know what this warning
means. Is it responsible for the failing of the application? If yes, how do I
remove the warning?

Here is my ceph.conf:

[global]
auth client required = none
auth cluster required = none
auth service required = none

[osd]
osd journal data = 1000
filestore xattr use omap = true

[mon.a]
host = lnx147-73
mon addr = 10.72.147.73:6789

[mon.b]
host = lnx148-20
mon addr = 10.72.148.20:6789

[mon.c]
host = lnx-148-27
mon addr = 10.72.148.27:6789

[mds.a]
host = lnx147-73

[osd.0]
host = lnx147-73

[osd.1]
host = lnx148-20

[osd.2]
host = lnx-148-27

[osd.3]
host = ln148-28
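To see which machines need synchronized clocks, a ceph.conf like this one can be read with Python's standard `configparser`, which accepts option names containing spaces such as `mon addr`. The sketch below is a hypothetical helper that lists the monitor sections with their host and IP address, plus every host named in the file. It only handles simple files like this excerpt; Ceph itself also treats spaces and underscores in option names as equivalent, which this sketch does not.

```python
import configparser
from typing import Dict, List, Tuple

# Excerpt of the ceph.conf posted in this thread.
CEPH_CONF = """\
[global]
auth client required = none

[mon.a]
host = lnx147-73
mon addr = 10.72.147.73:6789

[mon.b]
host = lnx148-20
mon addr = 10.72.148.20:6789

[mon.c]
host = lnx-148-27
mon addr = 10.72.148.27:6789

[mds.a]
host = lnx147-73

[osd.3]
host = ln148-28
"""


def load(conf_text: str) -> configparser.ConfigParser:
    # interpolation=None so a literal '%' in a value is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser


def monitors(parser: configparser.ConfigParser) -> Dict[str, Tuple[str, str]]:
    """Map each [mon.X] section to (host, IP address without the port)."""
    result = {}
    for section in parser.sections():
        if section.startswith("mon."):
            host = parser.get(section, "host")
            addr = parser.get(section, "mon addr", fallback="")
            result[section] = (host, addr.rsplit(":", 1)[0])
    return result


def all_hosts(parser: configparser.ConfigParser) -> List[str]:
    """Every distinct host named anywhere in the file."""
    return sorted({parser.get(s, "host") for s in parser.sections() if parser.has_option(s, "host")})


conf = load(CEPH_CONF)
for name, (host, ip) in sorted(monitors(conf).items()):
    print(name, host, ip)
print(all_hosts(conf))
```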

I can mail the mon logs and the output of ceph -w for the duration of the
application.

Regards
Varun