Re: [ceph-users] redundancy with 2 nodes

2015-01-01 Thread Christian Balzer

Hello,

On Thu, 01 Jan 2015 18:25:47 +1300 Mark Kirkwood wrote:

 The number of monitors recommended and the fact that a voting quorum is 
 the way it works is covered here:
 
 http://ceph.com/docs/master/rados/deployment/ceph-deploy-mon/
 
 but I agree that you should probably not get a HEALTH OK status when you
 have just set up 2 (or in fact any even number of) monitors... HEALTH WARN
 would make more sense, with a wee message suggesting adding at least one
 more!
 
 

I think what Jiri meant is that when the whole cluster goes into a deadlock
due to losing monitor quorum, "ceph -s" etc. won't work anymore either.

And while the cluster rightfully shouldn't be doing anything in such a
state, querying the surviving/reachable monitor and being told as much
would indeed be a nice feature, as opposed to deafening silence.
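
For what it's worth, a monitor that has dropped out of quorum can still be
interrogated locally via its admin socket, which is about the only channel
left in that state. Roughly, assuming the default socket path and a monitor
named "a":

  # Works even when quorum is lost and "ceph -s" just hangs,
  # because it talks to the local daemon directly.
  ceph daemon mon.a mon_status

  # Long form, if the "daemon" shorthand isn't available:
  ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok mon_status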

As for your suggestion, while certainly helpful, it is my not so humble
opinion that the WARN state right now is totally overloaded and quite
frankly bogus.
This is particularly a problem with monitoring plugins that just pick up
the WARN state without further discrimination.

And some WARN states, like slow requests, are pretty much an ERR state for
most people: requests stalled for more than 30 seconds (or days!) are a
sign of something massively wrong and likely to have customer/client
impact.
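
A workaround on the monitoring side is to not key off the bare WARN status
at all, but to parse "ceph health detail" and escalate selectively. A rough
sketch of such a check; the severity mapping and match strings are my own
choice and need adjusting to whatever your version actually emits:

  #!/bin/sh
  # Nagios-style check that escalates blocked/slow requests
  # (nominally just HEALTH_WARN) to CRITICAL.
  DETAIL=$(ceph health detail 2>/dev/null)
  case "$DETAIL" in
    *HEALTH_ERR*)             echo "CRITICAL: $DETAIL"; exit 2 ;;
    *"requests are blocked"*) echo "CRITICAL: $DETAIL"; exit 2 ;;
    *HEALTH_WARN*)            echo "WARNING: $DETAIL";  exit 1 ;;
    *)                        echo "OK: $DETAIL";       exit 0 ;;
  esac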

I think a neat solution would be the ability to assign each possible
problem state a severity like ERR, WARN or NOTE.

A cluster with just 1 or 2 monitors, or with scrub disabled, is (for me)
worth a NOTE, but not a WARN.

Christian

 Regards
 
 Mark
 
 
 On 01/01/15 18:06, Jiri Kanicky wrote:
  Hi,
 
  I think you are right. I was too focused on the following line in the
  docs: "A cluster will run fine with a single monitor; however, *a single
  monitor is a single-point-of-failure*." I will try to add another
  monitor. Hopefully, this will fix my issue.
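
  (For reference, if I am reading the ceph-deploy docs right, adding a
  monitor should be as simple as the following; "ceph3" is just a
  placeholder for whichever host gets the new one:)

    # Add another monitor to get to an odd number.
    ceph-deploy mon add ceph3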
 
  Anyway, I think that "ceph status" or "ceph health" should report at
  least something in such a state. It's quite weird that everything stops...
 
  Thank you
  Jiri
 
  On 1/01/2015 15:51, Lindsay Mathieson wrote:
  On Thu, 1 Jan 2015 03:46:33 PM Jiri Kanicky wrote:
  Hi,
 
  I have:
  - 2 monitors, one on each node
  - 4 OSDs, two on each node
  - 2 MDS, one on each node
  POOMA U here, but I don't think you can reach quorum with one out of
  two monitors; you need an odd number:
 
  http://ceph.com/docs/master/rados/configuration/mon-config-ref/#monitor-quorum
 
  Perhaps try removing one monitor, so you only have one left, then
  take the node without a monitor down.
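
  (Untested, but removal should be along these lines, assuming the
  monitor being dropped has the id "b":)

    # Remove the second monitor so the remaining one can form
    # quorum (a majority of one) by itself.
    ceph mon remove b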
 
 
 
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RadosGW slow gc

2015-01-01 Thread Aaron Bassett
I’m doing some load testing on radosgw to get ready for production and I had a 
problem with it stalling out. I had 100 cores from several nodes doing 
multipart uploads in parallel. This ran great for about two days, managing to 
upload about 2000 objects with an average size of 100GB. Then it stalled out 
and stopped. Ever since then, the gw has been gc’ing very slowly. During the 
upload run, it was creating objects at ~100/s; now it’s cleaning them at ~3/s. 
At this rate it won’t be done for nearly a year, and this is only a fraction of 
the data I need to put in. 

The pool I’m writing to is a cache pool at size 2 with an EC pool at 10+2 
behind it. (This data is not mission critical so we are trying to save space). 
I don’t know if this will affect the slow gc or not. 
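
(For reference, the pools were created roughly along these lines; the
names and PG counts here are approximations:)

  # EC profile and backing pool (k=10, m=2), with a size-2
  # replicated cache pool in front as the writeback tier.
  ceph osd erasure-code-profile set ec-10-2 k=10 m=2
  ceph osd pool create ecpool 1024 1024 erasure ec-10-2
  ceph osd pool create cachepool 1024
  ceph osd pool set cachepool size 2
  ceph osd tier add ecpool cachepool
  ceph osd tier cache-mode cachepool writeback
  ceph osd tier set-overlay ecpool cachepool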

I tried turning up rgw gc max objs to 256, but it didn’t seem to make a 
difference.
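
(These are the other gc knobs I’m aware of; the values shown below are
the defaults as I understand them, corrections welcome:)

  # ceph.conf, radosgw client section:
  rgw gc max objs = 32              # number of gc queue shards
  rgw gc obj min wait = 7200        # seconds before deleted tails are eligible
  rgw gc processor period = 3600    # how often a gc cycle starts
  rgw gc processor max time = 3600  # max runtime of a single cycle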

I’m working under the assumption that my uploads started stalling because too 
many un-gc’ed parts accumulated, but I may be way off base there. 
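
(To sanity-check that assumption I’ve been eyeballing the gc queue with
radosgw-admin; something like the following shows the backlog and forces a
collection pass right away instead of waiting for the timer:)

  # Rough count of pending gc entries (output is JSON, so this
  # only approximates the number of entries).
  radosgw-admin gc list --include-all | wc -l

  # Trigger a gc pass immediately.
  radosgw-admin gc process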

Any thoughts would be much appreciated, Aaron 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Weighting question

2015-01-01 Thread Lindsay Mathieson
On Thu, 1 Jan 2015 08:27:33 AM Dyweni - Ceph-Users wrote:
 I suspect a better configuration would be to leave your weights alone and
 to change your primary affinity so that the osd with the ssd is used first.

Interesting 

 You might see a little improvement on the writes (since the spinners have
 to work too), but the reads should have the most improvement (since ceph
 only has to read from the ssd).

Couple of things:
- The SSD will be partitioned for each OSD to have a journal

- I thought Journals were for writes only, not reads?
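
Noting the commands down anyway in case I do end up trying the affinity
approach (assuming osd.0 were the SSD-backed OSD and 1-3 the spinners;
the mon flag name is from memory, so double-check it):

  # Primary affinity is ignored unless the monitors allow it;
  # set "mon osd allow primary affinity = true" in ceph.conf, or:
  ceph tell mon.\* injectargs '--mon_osd_allow_primary_affinity 1'

  # Prefer the SSD-backed OSD as primary, so reads come from it:
  ceph osd primary-affinity osd.0 1.0
  ceph osd primary-affinity osd.1 0.5
  ceph osd primary-affinity osd.2 0.5
  ceph osd primary-affinity osd.3 0.5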

-- 
Lindsay

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com