Interesting,

So if I change 'auth service ticket ttl' to 172800 (seconds), in theory I could go 
without a monitor for 48 hours?
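A quick check of the arithmetic, assuming the option is given in seconds (the 1 hour default mentioned in the reply below is 3600 s). The helper name `ttl_to_hours` is hypothetical, for illustration only.

```python
SECONDS_PER_HOUR = 3600

def ttl_to_hours(ttl_seconds: int) -> float:
    """Convert an 'auth service ticket ttl' value (assumed seconds) to hours."""
    return ttl_seconds / SECONDS_PER_HOUR

default_ttl = 3600      # 1 hour, the default cited in the reply below
proposed_ttl = 172800   # the value proposed above

print(ttl_to_hours(default_ttl))   # 1.0
print(ttl_to_hours(proposed_ttl))  # 48.0
```

This only checks the unit conversion; whether the cluster keeps working that long also depends on the conditions in the reply (no new processes authenticating, no nodes going up or down).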


-----Original Message-----
From: Sage Weil [mailto:s...@inktank.com] 
Sent: Monday, August 12, 2013 9:50 PM
To: Jeppesen, Nelson
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Why is my mon store.db is 220GB?

On Mon, 12 Aug 2013, Jeppesen, Nelson wrote:
> Joao,
> 
> (log file uploaded to http://pastebin.com/Ufrxn6fZ)
> 
> I had some good luck and some bad luck. I copied the store.db to a new 
> monitor, injected a modified monmap and started it up (this is all on the 
> same host). Very quickly it reached quorum (as far as I can tell) but didn't 
> respond. Running 'ceph -w' just hung, with no timeouts or errors. The same 
> thing happened when restarting an OSD.
> 
> The last lines of the log file ('...ms_verify_authorizer...') are from the 
> 'ceph -w' attempts.
> 
> I restarted everything again and it sat there synchronizing. iostat reported 
> about 100MB/s, but only reads. I let it sit there for 7 minutes, but nothing 
> happened.

Can you do this again with --debug-mon 20 --debug-ms 1? It looks as though the 
main dispatch thread is blocked (7f71a1aa5700 does nothing after winning the 
election). It would also be helpful to attach gdb to the running ceph-mon and 
capture the output of 'thread apply all bt'.
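A minimal Python sketch that only assembles the two requested commands as argument lists, assuming the monitor is started by hand with `ceph-mon -i <id>` and that `gdb` is installed. The monitor id, the PID and the helper names are hypothetical placeholders; nothing is executed here.

```python
import shlex
from typing import List

def mon_debug_cmd(mon_id: str) -> List[str]:
    """ceph-mon command line with the debug levels requested above."""
    return ["ceph-mon", "-i", mon_id, "--debug-mon", "20", "--debug-ms", "1"]

def gdb_backtrace_cmd(pid: int) -> List[str]:
    """Non-interactive gdb run: attach to `pid`, print a backtrace of every
    thread, then exit (-batch)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]

if __name__ == "__main__":
    # Placeholder values: substitute the real monitor id and ceph-mon PID.
    print(shlex.join(mon_debug_cmd("a")))
    print(shlex.join(gdb_backtrace_cmd(12345)))
```

Either list could be passed to `subprocess.run(..., check=True)`; attaching gdb to another process typically requires root or ptrace permission.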

> Side question: how long can a ceph cluster run without a monitor? I 
> was able to upload files via the rados gateway without issue even while the 
> monitor was down.

Quite a while, as long as no new processes need to authenticate and no nodes 
go up or down. Eventually the authentication keys will time out, 
though (the default is 1 hour).
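A simplified model of the rule described above, for illustration only: it assumes every running daemon and client last renewed its tickets at a known time and cannot renew them while no monitor is reachable, and it ignores Ceph's actual renewal and key-rotation details. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ClusterState:
    seconds_since_last_auth: float      # time since tickets were last renewed
    ticket_ttl: float = 3600.0          # 'auth service ticket ttl', default 1 hour
    new_process_needs_auth: bool = False
    node_went_up_or_down: bool = False

def can_operate_without_mon(state: ClusterState) -> bool:
    """True if, under this simplified model, I/O can continue with no monitor."""
    if state.new_process_needs_auth or state.node_went_up_or_down:
        # Per the reply above, either case needs a working monitor.
        return False
    # Otherwise existing tickets remain usable until the TTL runs out.
    return state.seconds_since_last_auth < state.ticket_ttl

print(can_operate_without_mon(ClusterState(seconds_since_last_auth=1800)))  # True
print(can_operate_without_mon(ClusterState(seconds_since_last_auth=7200)))  # False
print(can_operate_without_mon(ClusterState(7200, ticket_ttl=172800)))       # True
```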

sage
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com