Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Jeppesen, Nelson
Thanks Joao,

Is there a doc somewhere on the dependencies? I assume I'll need to setup the 
tool chain to compile?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Jeppesen, Nelson
Is there an easy way I can find the age and/or expiration of the service ticket 
on a particular osd? Is that a file or just kept in ram?


-Original Message-
From: Sage Weil [mailto:s...@inktank.com] 
Sent: Tuesday, August 13, 2013 9:01 AM
To: Jeppesen, Nelson
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Why is my mon store.db is 220GB?

On Tue, 13 Aug 2013, Jeppesen, Nelson wrote:
 Interesting,
 
 So if I change 'auth service ticket ttl' to 172,800, in theory I could go 
 without a monitor for 48 hours?

If there are no up/down events, no new clients need to start, no osd recovery 
going on, then I *think* so.  I may be forgetting something.

sage


 
 
 -Original Message-
 From: Sage Weil [mailto:s...@inktank.com]
 Sent: Monday, August 12, 2013 9:50 PM
 To: Jeppesen, Nelson
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] Why is my mon store.db is 220GB?
 
 On Mon, 12 Aug 2013, Jeppesen, Nelson wrote:
  Joao,
  
  (log file uploaded to http://pastebin.com/Ufrxn6fZ)
  
  I had some good luck and some bad luck. I copied the store.db to a new 
  monitor, injected a modified monmap and started it up (This is all on the 
  same host.) Very quickly it reached quorum (as far as I can tell) but 
  didn't respond. Running 'ceph -w' just hung, no timeouts or errors. Same 
  thing when restarting an OSD.
  
 The last lines of the log file ('...ms_verify_authorizer...') are from the 
 'ceph -w' attempts.
 
 I restarted everything again and it sat there synchronizing. iostat 
 reported about 100MB/s, but just reads. I let it sit there for 7 min but 
 nothing happened.
 
 Can you do this again with --debug-mon 20 --debug-ms 1?  It looks as though 
 the main dispatch thread is blocked (7f71a1aa5700 does nothing after winning 
 the election).  It would also be helpful to gdb attach to the running 
 ceph-mon and capture the output from 'thread apply all bt'.
 
  Side question, how long can a ceph cluster run without a monitor? I 
  was able to upload files via rados gateway without issue even when 
  the monitor was down.
 
 Quite a while, as long as no new processes need to authenticate, and no nodes 
 go up or down.  Eventually the authentication keys are going to time out, 
 though (1 hour is the default).
 
 sage
 
 


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Sage Weil
On Tue, 13 Aug 2013, Jeppesen, Nelson wrote:
 Is there an easy way I can find the age and/or expiration of the service 
 ticket on a particular osd? Is that a file or just kept in ram?

It's just in ram.  If you crank up debug auth = 10 you will periodically 
see it dump the rotating keys and expirations.  Ideally the middle one 
will remain valid, but things won't grind to a halt until they are all 
expired.
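
As a config sketch, the TTL change discussed above and the rotating-key dump could look like this in ceph.conf (section placement and formatting are assumptions; the option names are the ones used in this thread):

```ini
; Sketch only -- option names are the ones discussed in this thread;
; section placement is an assumption, adjust for your deployment.
[global]
    ; extend the service ticket lifetime from the 1 hour default to 48 hours
    auth service ticket ttl = 172800

[osd]
    ; periodically dump the rotating keys and their expirations to the log
    debug auth = 10
```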

sage

  
 
 -Original Message-
 From: Sage Weil [mailto:s...@inktank.com] 
 Sent: Tuesday, August 13, 2013 9:01 AM
 To: Jeppesen, Nelson
 Cc: ceph-users@lists.ceph.com
 Subject: RE: [ceph-users] Why is my mon store.db is 220GB?
 
 On Tue, 13 Aug 2013, Jeppesen, Nelson wrote:
  Interesting,
  
  So if I change 'auth service ticket ttl' to 172,800, in theory I could go 
  without a monitor for 48 hours?
 
 If there are no up/down events, no new clients need to start, no osd recovery 
 going on, then I *think* so.  I may be forgetting something.
 
 sage
 
 
  
  
  -Original Message-
  From: Sage Weil [mailto:s...@inktank.com]
  Sent: Monday, August 12, 2013 9:50 PM
  To: Jeppesen, Nelson
  Cc: ceph-users@lists.ceph.com
  Subject: Re: [ceph-users] Why is my mon store.db is 220GB?
  
  On Mon, 12 Aug 2013, Jeppesen, Nelson wrote:
   Joao,
   
   (log file uploaded to http://pastebin.com/Ufrxn6fZ)
   
   I had some good luck and some bad luck. I copied the store.db to a new 
   monitor, injected a modified monmap and started it up (This is all on the 
   same host.) Very quickly it reached quorum (as far as I can tell) but 
   didn't respond. Running 'ceph -w' just hung, no timeouts or errors. Same 
   thing when restarting an OSD.
   
   The last lines of the log file ('...ms_verify_authorizer...') are from the 
   'ceph -w' attempts.
   
   I restarted everything again and it sat there synchronizing. iostat 
   reported about 100MB/s, but just reads. I let it sit there for 7 min but 
   nothing happened.
  
  Can you do this again with --debug-mon 20 --debug-ms 1?  It looks as though 
  the main dispatch thread is blocked (7f71a1aa5700 does nothing after 
  winning the election).  It would also be helpful to gdb attach to the 
  running ceph-mon and capture the output from 'thread apply all bt'.
  
   Side question, how long can a ceph cluster run without a monitor? I 
   was able to upload files via rados gateway without issue even when 
   the monitor was down.
  
  Quite a while, as long as no new processes need to authenticate, and no 
  nodes go up or down.  Eventually the authentication keys are going to time 
  out, though (1 hour is the default).
  
  sage
  
  
 
 


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Joao Eduardo Luis

On 13/08/13 09:19, Jeppesen, Nelson wrote:

Thanks Joao,

Is there a doc somewhere on the dependencies? I assume I’ll need to
setup the tool chain to compile?




The README in the ceph repo lists the dependencies.

You could also try getting it from the gitbuilders [1], but I'm not sure 
how you'd go about doing that without installing other packages.


[1] - http://gitbuilder.ceph.com/

--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Jeppesen, Nelson
I built the wip-monstore-copy branch with './configure --with-rest-bench 
--with-debug' and 'make'. It worked and I got all the usual binaries, but 
ceph-monstore-tool is missing. I see the code in ./src/tools/. Did I miss something?


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Mandell Degerness
Hmm.  This sounds very similar to the problem I reported (with
debug-mon = 20 and debug ms = 1 logs as of today) on our support site
(ticket #438) - Sage, please take a look.

On Mon, Aug 12, 2013 at 9:49 PM, Sage Weil s...@inktank.com wrote:
 On Mon, 12 Aug 2013, Jeppesen, Nelson wrote:
 Joao,

 (log file uploaded to http://pastebin.com/Ufrxn6fZ)

 I had some good luck and some bad luck. I copied the store.db to a new 
 monitor, injected a modified monmap and started it up (This is all on the 
 same host.) Very quickly it reached quorum (as far as I can tell) but didn't 
 respond. Running 'ceph -w' just hung, no timeouts or errors. Same thing when 
 restarting an OSD.

 The last lines of the log file   '...ms_verify_authorizer..' are from 'ceph 
 -w' attempts.

 I restarted everything again and it sat there synchronizing. IO stat 
 reported about 100MB/s, but just reads. I let it sit there for 7 min but 
 nothing happened.

 Can you do this again with --debug-mon 20 --debug-ms 1?  It looks as
 though the main dispatch thread is blocked (7f71a1aa5700 does nothing
 after winning the election).  It would also be helpful to gdb attach to
 the running ceph-mon and capture the output from 'thread apply all bt'.
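
The capture Sage asks for above might be done roughly like this (the mon id, paths, and exact invocation are placeholders, and this assumes gdb is installed; sketch only):

```shell
# Sketch: restart the mon with the requested debug levels.
# Mon id "2" is a placeholder.
ceph-mon -i 2 --debug-mon 20 --debug-ms 1

# In another terminal, attach gdb to the running ceph-mon and dump
# every thread's backtrace to a file to attach to the reply:
gdb -p "$(pidof ceph-mon)" -batch -ex 'thread apply all bt' > mon-backtraces.txt
```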

 Side question, how long can a ceph cluster run without a monitor? I was
 able to upload files via rados gateway without issue even when the
 monitor was down.

 Quite a while, as long as no new processes need to authenticate, and no
 nodes go up or down.  Eventually the authentication keys are going to time
 out, though (1 hour is the default).

 sage


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Jeppesen, Nelson
Never mind, I removed --with-rest-bench and it worked.

 I built the wip-monstore-copy branch with './configure --with-rest-bench 
 --with-debug' and 'make'. It worked and I got all the usual binaries, but 
 ceph-monstore-tool is missing. I see the code in ./src/tools/. Did I miss something?


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Jeppesen, Nelson
Joao,

ceph-monstore-tool --mon-store-path /var/lib/ceph/mon/ceph-2 --out 
/var/lib/ceph/mon/ceph-1  --command store-copy
is running now. It hit 52MB very quickly, then nothing but lots of disk reads, 
which is what I'd expect. It's reading fast and I expect it to finish in about 35 min.

Just to make sure, this won't add a new monitor, just clean it up. So, when 
it's done I should do the following:

mv /var/lib/ceph/mon/ceph-2 /var/lib/ceph/mon/ceph-2.old
mv /var/lib/ceph/mon/ceph-1 /var/lib/ceph/mon/ceph-2
service ceph start mon.2





Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Joao Eduardo Luis

On 13/08/13 14:46, Jeppesen, Nelson wrote:

Joao,

ceph-monstore-tool --mon-store-path /var/lib/ceph/mon/ceph-2 --out
/var/lib/ceph/mon/ceph-1  --command store-copy

is running now. It hit 52MB very quickly, then nothing but lots of disk
reads, which is what I'd expect. It's reading fast and I expect it to finish
in about 35 min.

Just to make sure, this won’t add a new monitor, just clean it up. So,
when it’s done I should do the following:

mv /var/lib/ceph/mon/ceph-2 /var/lib/ceph/mon/ceph-2.old

mv /var/lib/ceph/mon/ceph-1 /var/lib/ceph/mon/ceph-2

service ceph start mon.2


Correct.  The tool just extracts whatever is in one mon store and copies 
it to another.  The contents should be the same, and the monitor should 
come back to life as if nothing had happened.


If for some reason that is not the case, you'll still have the original 
store ready to be used.  Let me know if that happens and I'll be happy 
to help.


  -Joao


--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Joao Eduardo Luis

On 13/08/13 14:46, Jeppesen, Nelson wrote:

Joao,

ceph-monstore-tool --mon-store-path /var/lib/ceph/mon/ceph-2 --out
/var/lib/ceph/mon/ceph-1  --command store-copy

is running now. It hit 52MB very quickly, then nothing but lots of disk
reads, which is what I'd expect. It's reading fast and I expect it to finish
in about 35 min.

Just to make sure, this won’t add a new monitor, just clean it up. So,
when it’s done I should do the following:

mv /var/lib/ceph/mon/ceph-2 /var/lib/ceph/mon/ceph-2.old

mv /var/lib/ceph/mon/ceph-1 /var/lib/ceph/mon/ceph-2

service ceph start mon.2


Sage pointed out that you'll also need to copy the 'keyring' file from 
the original mon data dir to the new mon data dir.


So that would be 'cp /var/lib/ceph/mon/ceph-2/keyring 
/var/lib/ceph/mon/ceph-1/'


You should be good to go then.
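
Putting the steps from this thread together, the whole swap might look like the following (a sketch only: the stop command and ordering are assumptions; the mv/cp/start commands and paths are the ones quoted above):

```shell
# Sketch of the full swap discussed above: stop the mon, keep the old
# store, restore the keyring into the copied store, move it into place,
# and restart. Mon id and paths are the ones used in this thread.
service ceph stop mon.2
cp /var/lib/ceph/mon/ceph-2/keyring /var/lib/ceph/mon/ceph-1/
mv /var/lib/ceph/mon/ceph-2 /var/lib/ceph/mon/ceph-2.old
mv /var/lib/ceph/mon/ceph-1 /var/lib/ceph/mon/ceph-2
service ceph start mon.2
```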

  -Joao


--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Jeppesen, Nelson
Success! It was pretty quick too, maybe 20-30min. It’s now at 100MB.

In a matter of minutes I was able to add two monitors, and now I'm back to three 
monitors.

Thank you again, Joao and Sage! I can sleep at night now knowing that a single 
node won't take down the cluster anymore ☺


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Joao Eduardo Luis

On 13/08/13 16:13, Jeppesen, Nelson wrote:

Success! It was pretty quick too, maybe 20-30min. It’s now at 100MB.

In a matter of minutes I was able to add two monitors, and now I'm back to three 
monitors.

Thank you again, Joao and Sage! I can sleep at night now knowing that a single 
node won't take down the cluster anymore ☺


Hooray!  Glad to know everything worked out! :-)

  -Joao


--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-12 Thread Joao Eduardo Luis
Following a discussion we had today on #ceph, I've added some extra 
functionality to 'ceph-monstore-tool' to allow copying the data out of one 
store into a new mon store; it can be found on the wip-monstore-copy branch.


Using it as

ceph-monstore-tool --mon-store-path mon-data-dir --out mon-data-out 
--command store-copy


with mon-data-dir being the mon data dir where the current monitor lives 
(say, /var/lib/ceph/mon/ceph-a), and mon-data-out being another 
directory.  This last directory should be empty so the tool can create a 
new store; if a store already exists there, the tool will not error out 
and will instead copy the keys from the first store into the existing 
one, so beware!


Also, bear in mind that you must stop the monitor while doing this -- 
the tool won't work otherwise.
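
Concretely, a run might look like this (a sketch: the stop command is an assumption, paths are examples, and the out directory should be empty or absent as noted above):

```shell
# Sketch: stop the monitor first, then copy the store into a fresh
# directory. Mon id "a" and the paths are examples only.
service ceph stop mon.a
ceph-monstore-tool --mon-store-path /var/lib/ceph/mon/ceph-a \
    --out /var/lib/ceph/mon/mon-data-out --command store-copy

# The copied store should be dramatically smaller than the original:
du -sh /var/lib/ceph/mon/ceph-a /var/lib/ceph/mon/mon-data-out
```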


Anyway, this should allow you to grab all your data from the current 
monitor.  You'll be presented with a few stats when the store finishes 
being copied, and hopefully you'll see that the tool didn't copy 220GB 
worth of data -- should be considerably less!


Let me know if this works out for you.

  -Joao

On 07/08/13 15:14, Jeppesen, Nelson wrote:

Joao,

Have you had a chance to look at my monitor issues? I ran 'ceph-mon -i FOO 
--compact' last week but it did not improve disk usage.

Let me know if there's anything else I should dig up. The monitor is still at 
0.67-rc2 with the OSDs at 0.61.7.


On 08/02/2013 12:15 AM, Jeppesen, Nelson wrote:

Thanks for the reply, but how can I fix this without an outage?

I tried adding 'mon compact on start = true' but the monitor just hung. 
Unfortunately this is a production cluster and can't take the outage (I'm 
assuming the cluster will fail without a monitor). I had three monitors, was 
hit with the store.db bug, and lost two of the three.

I have tried running 0.61.5, 0.61.7 and 0.67-rc2. None of them seem to 
shrink the DB.


My guess is that the compaction policies we are enforcing won't cover
the portions of the store that haven't been compacted *prior* to the
upgrade.

Even today we still know of users with stores growing over dozens of
GBs, requiring occasional restarts to compact (which is far from an
acceptable fix).  Some of these stores can take several minutes to
compact when the monitors are restarted, although these guys can often
mitigate any down time by restarting monitors one at a time while
maintaining quorum.  Unfortunately you don't have that luxury. :-\

If however you are willing to manually force a compaction, you should be
able to do so with 'ceph-mon -i FOO --compact'.

Now, there is a possibility this is why you've been unable to add other
monitors to the cluster.  Chances are that the iterators used to
synchronize the store get stuck, or move slowly enough to make all sorts
of funny timeouts to be triggered.

I intend to look into your issue (especially the problems with adding
new monitors) in the morning to better assess what's happening.

-Joao



-Original Message-
From: Mike Dawson [mailto:mike.dawson at cloudapt.com]
Sent: Thursday, August 01, 2013 4:10 PM
To: Jeppesen, Nelson
Cc: ceph-users at lists.ceph.com
Subject: Re: [ceph-users] Why is my mon store.db is 220GB?

220GB is way, way too big. I suspect your monitors need to go through a 
successful leveldb compaction. The early releases of Cuttlefish suffered 
several issues with store.db growing unbounded. Most were fixed by 0.61.5, I 
believe.

You may have luck stopping all Ceph daemons, then starting the monitor by itself. 
When there were bugs, leveldb compaction tended to work better without OSD traffic 
hitting the monitors. Also, there are some settings to force a compaction on startup, 
like 'mon compact on start = true' and 'mon compact on trim = true'. I don't 
think either is required anymore, though. See some history here:

http://tracker.ceph.com/issues/4895


Thanks,

Mike Dawson
Co-Founder & Director of Cloud Architecture, Cloudapt LLC
6330 East 75th Street, Suite 170
Indianapolis, IN 46250

On 8/1/2013 6:52 PM, Jeppesen, Nelson wrote:

My Mon store.db has been at 220GB for a few months now. Why is this
and how can I fix it? I have one monitor in this cluster and I suspect
that I can't  add monitors to the cluster because it is too big. Thank you.










--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-12 Thread Jeppesen, Nelson
Joao, 

(log file uploaded to http://pastebin.com/Ufrxn6fZ)

I had some good luck and some bad luck. I copied the store.db to a new monitor, 
injected a modified monmap and started it up (This is all on the same host.) 
Very quickly it reached quorum (as far as I can tell) but didn't respond. 
Running 'ceph -w' just hung, no timeouts or errors. Same thing when restarting 
an OSD.

The last lines of the log file ('...ms_verify_authorizer...') are from the 
'ceph -w' attempts.

I restarted everything again and it sat there synchronizing. iostat reported 
about 100MB/s, but just reads. I let it sit there for 7 min but nothing 
happened.

Side question, how long can a ceph cluster run without a monitor? I was able to 
upload files via rados gateway without issue even when the monitor was down.






Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-09 Thread Joao Eduardo Luis

On 07/08/13 15:14, Jeppesen, Nelson wrote:

Joao,

Have you had a chance to look at my monitor issues? I ran 'ceph-mon -i FOO 
--compact' last week but it did not improve disk usage.

Let me know if there's anything else I should dig up. The monitor is still at 
0.67-rc2 with the OSDs at 0.61.7.


Hi Nelson,

It's been a crazy week, and I haven't had the opportunity to dive into the 
compaction issues -- we've been tying up the last loose ends for the 
dumpling release.


Btw, I just noticed that you mentioned in your previous email that the 
'mon compact on start = true' flag made your monitor hang.  Well, that 
was not a hang per se.  If you try that again and take a look at IO on 
the mon store, you should see the monitor doing loads of it.  That's 
leveldb compacting.  It should take a while.  A considerable while.  As 
I previously mentioned, 10G stores can take a while to compact -- a 
220GB store will take even longer.


However, regardless of how we eventually fix this whole thing, you'll 
need to compact your store.  I seriously doubt there's a way out of it. 
 Well, there may be another way out of it, but that would involve a bit 
of trickery to get the leveldb contents out of the store and into a new, 
fresh store, which would seem a lot like a last resort.


But feel free to ping me on IRC and we'll try to figure something out.

  -Joao





On 08/02/2013 12:15 AM, Jeppesen, Nelson wrote:

Thanks for the reply, but how can I fix this without an outage?

I tried adding 'mon compact on start = true' but the monitor just hung. 
Unfortunately this is a production cluster and can't take the outage (I'm 
assuming the cluster will fail without a monitor). I had three monitors, was 
hit with the store.db bug, and lost two of the three.

I have tried running 0.61.5, 0.61.7 and 0.67-rc2. None of them seem to 
shrink the DB.


My guess is that the compaction policies we are enforcing won't cover
the portions of the store that haven't been compacted *prior* to the
upgrade.

Even today we still know of users with stores growing over dozens of
GBs, requiring occasional restarts to compact (which is far from an
acceptable fix).  Some of these stores can take several minutes to
compact when the monitors are restarted, although these guys can often
mitigate any down time by restarting monitors one at a time while
maintaining quorum.  Unfortunately you don't have that luxury. :-\

If however you are willing to manually force a compaction, you should be
able to do so with 'ceph-mon -i FOO --compact'.

Now, there is a possibility this is why you've been unable to add other
monitors to the cluster.  Chances are that the iterators used to
synchronize the store get stuck, or move slowly enough to make all sorts
of funny timeouts to be triggered.

I intend to look into your issue (especially the problems with adding
new monitors) in the morning to better assess what's happening.

-Joao



-Original Message-
From: Mike Dawson [mailto:mike.dawson at cloudapt.com]
Sent: Thursday, August 01, 2013 4:10 PM
To: Jeppesen, Nelson
Cc: ceph-users at lists.ceph.com
Subject: Re: [ceph-users] Why is my mon store.db is 220GB?

220GB is way, way too big. I suspect your monitors need to go through a 
successful leveldb compaction. The early releases of Cuttlefish suffered 
several issues with store.db growing unbounded. Most were fixed by 0.61.5, I 
believe.

You may have luck stopping all Ceph daemons, then starting the monitor by itself. 
When there were bugs, leveldb compaction tended to work better without OSD traffic 
hitting the monitors. Also, there are some settings to force a compaction on startup, 
like 'mon compact on start = true' and 'mon compact on trim = true'. I don't 
think either is required anymore, though. See some history here:

http://tracker.ceph.com/issues/4895


Thanks,

Mike Dawson
Co-Founder & Director of Cloud Architecture, Cloudapt LLC
6330 East 75th Street, Suite 170
Indianapolis, IN 46250

On 8/1/2013 6:52 PM, Jeppesen, Nelson wrote:

My Mon store.db has been at 220GB for a few months now. Why is this
and how can I fix it? I have one monitor in this cluster and I suspect
that I can't  add monitors to the cluster because it is too big. Thank you.










--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com


[ceph-users] Why is my mon store.db is 220GB?

2013-08-01 Thread Jeppesen, Nelson
My Mon store.db has been at 220GB for a few months now. Why is this and how can 
I fix it? I have one monitor in this cluster and I suspect that I can't add 
monitors to the cluster because it is too big. Thank you.



Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-01 Thread Mike Dawson
220GB is way, way too big. I suspect your monitors need to go through a 
successful leveldb compaction. The early releases of Cuttlefish suffered 
several issues with store.db growing unbounded. Most were fixed by 
0.61.5, I believe.


You may have luck stopping all Ceph daemons, then starting the monitor by 
itself. When there were bugs, leveldb compaction tended to work better 
without OSD traffic hitting the monitors. Also, there are some settings 
to force a compaction on startup, like 'mon compact on start = true' and 
'mon compact on trim = true'. I don't think either is required anymore, 
though. See some history here:


http://tracker.ceph.com/issues/4895
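
As a config sketch, the settings mentioned above would look like this in ceph.conf (a sketch only; whether you still need them depends on your version, as noted):

```ini
; Sketch of the compaction-related options mentioned above.
[mon]
    ; compact the leveldb store every time the monitor starts
    mon compact on start = true
    ; compact whenever the monitor trims old states
    mon compact on trim = true
```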


Thanks,

Mike Dawson
Co-Founder & Director of Cloud Architecture
Cloudapt LLC
6330 East 75th Street, Suite 170
Indianapolis, IN 46250

On 8/1/2013 6:52 PM, Jeppesen, Nelson wrote:

My Mon store.db has been at 220GB for a few months now. Why is this and
how can I fix it? I have one monitor in this cluster and I suspect that
I can’t  add monitors to the cluster because it is too big. Thank you.





Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-01 Thread Jeppesen, Nelson
Thanks for the reply, but how can I fix this without an outage?

I tried adding 'mon compact on start = true' but the monitor just hung. 
Unfortunately this is a production cluster and can't take the outage (I'm 
assuming the cluster will fail without a monitor). I had three monitors, was 
hit with the store.db bug, and lost two of the three.

I have tried running 0.61.5, 0.61.7 and 0.67-rc2. None of them seem to 
shrink the DB.

Nelson Jeppesen
   Disney Technology Solutions and Services
   Phone 206-588-5001

-Original Message-
From: Mike Dawson [mailto:mike.daw...@cloudapt.com] 
Sent: Thursday, August 01, 2013 4:10 PM
To: Jeppesen, Nelson
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Why is my mon store.db is 220GB?

220GB is way, way too big. I suspect your monitors need to go through a 
successful leveldb compaction. The early releases of Cuttlefish suffered 
several issues with store.db growing unbounded. Most were fixed by 0.61.5, I 
believe.

You may have luck stopping all Ceph daemons, then starting the monitor by 
itself. When there were bugs, leveldb compaction tended to work better without OSD 
traffic hitting the monitors. Also, there are some settings to force a compaction 
on startup, like 'mon compact on start = true' and 'mon compact on trim = true'. 
I don't think either is required anymore, though. See some history here:

http://tracker.ceph.com/issues/4895


Thanks,

Mike Dawson
Co-Founder & Director of Cloud Architecture, Cloudapt LLC
6330 East 75th Street, Suite 170
Indianapolis, IN 46250

On 8/1/2013 6:52 PM, Jeppesen, Nelson wrote:
 My Mon store.db has been at 220GB for a few months now. Why is this 
 and how can I fix it? I have one monitor in this cluster and I suspect 
 that I can't  add monitors to the cluster because it is too big. Thank you.





Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-01 Thread Joao Eduardo Luis

On 08/02/2013 12:15 AM, Jeppesen, Nelson wrote:

Thanks for the reply, but how can I fix this without an outage?

I tried adding 'mon compact on start = true' but the monitor just hung. 
Unfortunately this is a production cluster and can't take the outage (I'm 
assuming the cluster will fail without a monitor). I had three monitors, was 
hit with the store.db bug, and lost two of the three.

I have tried running 0.61.5, 0.61.7 and 0.67-rc2. None of them seem to 
shrink the DB.


My guess is that the compaction policies we are enforcing won't cover 
the portions of the store that haven't been compacted *prior* to the 
upgrade.


Even today we still know of users with stores growing over dozens of 
GBs, requiring occasional restarts to compact (which is far from an 
acceptable fix).  Some of these stores can take several minutes to 
compact when the monitors are restarted, although these guys can often 
mitigate any down time by restarting monitors one at a time while 
maintaining quorum.  Unfortunately you don't have that luxury. :-\


If however you are willing to manually force a compaction, you should be 
able to do so with 'ceph-mon -i FOO --compact'.


Now, there is a possibility this is why you've been unable to add other 
monitors to the cluster.  Chances are that the iterators used to 
synchronize the store get stuck, or move slowly enough to make all sorts 
of funny timeouts to be triggered.


I intend to look into your issue (especially the problems with adding 
new monitors) in the morning to better assess what's happening.


  -Joao




Nelson Jeppesen
Disney Technology Solutions and Services
Phone 206-588-5001

-Original Message-
From: Mike Dawson [mailto:mike.daw...@cloudapt.com]
Sent: Thursday, August 01, 2013 4:10 PM
To: Jeppesen, Nelson
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Why is my mon store.db is 220GB?

220GB is way, way too big. I suspect your monitors need to go through a 
successful leveldb compaction. The early releases of Cuttlefish suffered 
several issues with store.db growing unbounded. Most were fixed by 0.61.5, I 
believe.

You may have luck stopping all Ceph daemons, then starting the monitor by itself. 
When there were bugs, leveldb compaction tended to work better without OSD traffic 
hitting the monitors. Also, there are some settings to force a compaction on startup, 
like 'mon compact on start = true' and 'mon compact on trim = true'. I don't 
think either is required anymore, though. See some history here:

http://tracker.ceph.com/issues/4895


Thanks,

Mike Dawson
Co-Founder & Director of Cloud Architecture, Cloudapt LLC
6330 East 75th Street, Suite 170
Indianapolis, IN 46250

On 8/1/2013 6:52 PM, Jeppesen, Nelson wrote:

My Mon store.db has been at 220GB for a few months now. Why is this
and how can I fix it? I have one monitor in this cluster and I suspect
that I can't  add monitors to the cluster because it is too big. Thank you.







--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-01 Thread Jeppesen, Nelson
Thank you Joao,

I'll get you any information you need.

I can tell you that I've restarted the mon a few times and it doesn't seem to 
change disk usage.

I just ran 'ceph-mon -i 2 --compact' on my monitor; I'll see how that looks in 
the morning.


On 08/02/2013 12:15 AM, Jeppesen, Nelson wrote:
 Thanks for the reply, but how can I fix this without an outage?

 I tried adding 'mon compact on start = true' but the monitor just hung. 
 Unfortunately this is a production cluster and can't take the outage (I'm 
 assuming the cluster will fail without a monitor). I had three monitors, was 
 hit with the store.db bug, and lost two of the three.

 I have tried running 0.61.5, 0.61.7 and 0.67-rc2. None of them seem to 
 shrink the DB.

My guess is that the compaction policies we are enforcing won't cover
the portions of the store that haven't been compacted *prior* to the
upgrade.

Even today we still know of users with stores growing over dozens of
GBs, requiring occasional restarts to compact (which is far from an
acceptable fix).  Some of these stores can take several minutes to
compact when the monitors are restarted, although these guys can often
mitigate any down time by restarting monitors one at a time while
maintaining quorum.  Unfortunately you don't have that luxury. :-\

If however you are willing to manually force a compaction, you should be
able to do so with 'ceph-mon -i FOO --compact'.

Now, there is a possibility this is why you've been unable to add other
monitors to the cluster.  Chances are that the iterators used to
synchronize the store get stuck, or move slowly enough to make all sorts
of funny timeouts to be triggered.

I intend to look into your issue (especially the problems with adding
new monitors) in the morning to better assess what's happening.

   -Joao



 Nelson Jeppesen
 Disney Technology Solutions and Services
 Phone 206-588-5001

 -Original Message-
 From: Mike Dawson [mailto:mike.dawson at cloudapt.com]
 Sent: Thursday, August 01, 2013 4:10 PM
 To: Jeppesen, Nelson
 Cc: ceph-users at lists.ceph.com
 Subject: Re: [ceph-users] Why is my mon store.db is 220GB?

 220GB is way, way too big. I suspect your monitors need to go through a 
 successful leveldb compaction. The early releases of Cuttlefish suffered 
 several issues with store.db growing unbounded. Most were fixed by 0.61.5, I 
 believe.

 You may have luck stopping all Ceph daemons, then starting the monitor by 
 itself. When there were bugs, leveldb compaction tended to work better without 
 OSD traffic hitting the monitors. Also, there are some settings to force a 
 compaction on startup, like 'mon compact on start = true' and 'mon compact on 
 trim = true'. I don't think either is required anymore, though. See some 
 history here:

 http://tracker.ceph.com/issues/4895


 Thanks,

 Mike Dawson
 Co-Founder & Director of Cloud Architecture, Cloudapt LLC
 6330 East 75th Street, Suite 170
 Indianapolis, IN 46250

 On 8/1/2013 6:52 PM, Jeppesen, Nelson wrote:
 My Mon store.db has been at 220GB for a few months now. Why is this
 and how can I fix it? I have one monitor in this cluster and I suspect
 that I can't  add monitors to the cluster because it is too big. Thank you.






--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
