[ceph-users] Remove RBD Image

2015-07-29 Thread Christian Eichelmann
Hi all,

I am trying to remove several rbd images from the cluster.
Unfortunately, that doesn't work:

$ rbd info foo
rbd image 'foo':
size 1024 GB in 262144 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.919443.238e1f29
format: 1


$ rbd rm foo
2015-07-29 10:25:01.438296 7f868d330760 -1 librbd: image has watchers -
not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try
again after closing/unmapping it or waiting 30s for the crashed client
to timeout.

$ rados -p rbd listwatchers foo
error listing watchers rbd/foo: (2) No such file or directory

Well, that is quite frustrating. The image was mapped on one host, where
I have already unmapped it. What do I have to do to get rid of it?

We are using ceph version 0.87.2

Regards,
Christian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Remove RBD Image

2015-07-29 Thread Christian Eichelmann
Hi Ilya,

that worked for me and actually revealed that one of my colleagues
currently had the rbd pool locally mounted via rbd-fuse, which obviously
holds watches on all images in that pool. Problem solved! Thanks!
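
For reference, a minimal sketch of how such a watcher can be tracked down
for both image formats (the pool and image names are the ones from the
example below; the format 2 id is made up):

# format 1: the header object is "<name>.rbd"
$ rados -p rbd listwatchers foo.rbd
# format 2: the header object is "rbd_header.<id>", where <id> is the part
# after "rbd_data." in the block_name_prefix shown by "rbd info"
$ rados -p rbd listwatchers rbd_header.102a74b0dc51
# the watcher entry includes the client address, which points to the host
# that still has the image mapped or open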

Regards,
Christian

Am 29.07.2015 um 11:48 schrieb Ilya Dryomov:
 On Wed, Jul 29, 2015 at 11:30 AM, Christian Eichelmann
 christian.eichelm...@1und1.de wrote:
 Hi all,

 I am trying to remove several rbd images from the cluster.
 Unfortunately, that doesn't work:

 $ rbd info foo
 rbd image 'foo':
 size 1024 GB in 262144 objects
 order 22 (4096 kB objects)
 block_name_prefix: rb.0.919443.238e1f29
 format: 1


 $ rbd rm foo
 2015-07-29 10:25:01.438296 7f868d330760 -1 librbd: image has watchers -
 not removing
 Removing image: 0% complete...failed.
 rbd: error: image still has watchers
 This means the image is still open or the client using it crashed. Try
 again after closing/unmapping it or waiting 30s for the crashed client
 to timeout.

 $ rados -p rbd listwatchers foo
 error listing watchers rbd/foo: (2) No such file or directory
 
 For a format 1 image, you need to do
 
 $ rados -p rbd listwatchers foo.rbd
 
 The "rbd status" command was recently introduced to abstract this, but it's
 not in 0.87.
 

 Well, that is quite frustrating. The image was mapped on one host, where
 I have already unmapped it. What do I have to do to get rid of it?
 
 Did you unmap the image?  What is the output of rbd showmapped on the
 host where you had it mapped?  Is there anything rbd- or ceph-related in
 dmesg on that host?
 
 Thanks,
 
 Ilya
 


-- 
Christian Eichelmann
Systemadministrator

11 Internet AG - IT Operations Mail  Media Advertising  Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Telefon: +49 721 91374-8026
christian.eichelm...@1und1.de

Amtsgericht Montabaur / HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
Aufsichtsratsvorsitzender: Michael Scheeren
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Scrub Error / How does ceph pg repair work?

2015-05-12 Thread Christian Eichelmann
Hi Christian, Hi Robert,

thank you for your replies!
I was already expecting something like this, but I am seriously worried
about it!

Just assume that this happens at night. Our on-call shift does not
necessarily have enough knowledge to perform all the steps in Sebastien's
article. And if we always have to do that when a scrub error appears, we
will be putting several hours per week into fixing such problems.

It is also very misleading that a command called "ceph pg repair" might
do quite the opposite and overwrite the good data in your cluster with
corrupt data. I don't know much about the internals of Ceph, but if the
cluster can already recognize that the checksums do not match, why can't
it simply build a quorum from the existing replicas where possible?

And again the question:
Are these placement groups (scrub error, inconsistent) blocking
read/write requests? Because if so, we have a serious problem here...

Regards,
Christian

Am 12.05.2015 um 08:20 schrieb Christian Balzer:
 
 Hello,
 
 I can only nod emphatically to what Robert said, don't issue repairs
 unless you 
 a) don't care about the data or 
 b) have verified that your primary OSD is good.
 
 See this for some details on how to establish which replica(s) are actually
 good or not:
 http://www.sebastien-han.fr/blog/2015/04/27/ceph-manually-repair-object/
 
 Of course if you somehow wind up with more subtle data corruption and are
 faced with 3 slightly differing data sets, you may have to resort to
 rolling a die after all.
 
 A word from the devs about the state of checksums and automatic repairs we
 can trust would be appreciated.
 
 Christian
 
 On Mon, 11 May 2015 10:19:08 -0600 Robert LeBlanc wrote:
 
 Personally I would not just run this command automatically because, as you
 stated, it only copies the primary PG to the replicas, and if the primary
 is corrupt, you will corrupt your secondaries. I think the monitor log
 shows which OSD has the problem, so if it is not your primary, then just
 issue the repair command.

 There was talk, and I believe work towards, Ceph storing a hash of the
 object so that it can be smarter about which replica has the correct data
 and automatically replicate the good data no matter where it is. I think
 the first part, creating the hash and storing it, has been included in
 Hammer. I'm not an authority on this so take it with a grain of salt.

 Right now our procedure is to find the PG files on the OSDs, perform an
 MD5 on all of them, and overwrite the one that doesn't match, either by
 issuing the PG repair command, or by removing the bad PG files, rsyncing
 them over with the -X argument and then instructing a deep-scrub on the
 PG to clear it up in Ceph.
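
 A rough sketch of that comparison step (paths assume the default Filestore
 layout; the pg id 3.10f and the osd ids are made up):

 # on each OSD host that holds a replica of the inconsistent pg
 $ cd /var/lib/ceph/osd/ceph-21/current/3.10f_head/
 $ find . -type f -print0 | xargs -0 md5sum | sort -k2 > /tmp/pg-3.10f-osd21.md5
 # repeat on the hosts carrying osd.35, osd.70, ... then diff the lists;
 # the replica whose checksums differ is the suspect one
 $ diff /tmp/pg-3.10f-osd21.md5 /tmp/pg-3.10f-osd35.md5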

 I've only tested this on an idle cluster, so I don't know how well it
 will work on an active cluster. Since we issue a deep-scrub, if the PGs
 of the replicas change during the rsync, it should come up with an
 error. The idea is to keep rsyncing until the deep-scrub is clean. Be
 warned that you may be aiming your gun at your foot with this!

 
 Robert LeBlanc
 GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

 On Mon, May 11, 2015 at 2:09 AM, Christian Eichelmann 
 christian.eichelm...@1und1.de wrote:

 Hi all!

 We are experiencing approximately 1 scrub error / inconsistent pg every
 two days. As far as I know, to fix this you can issue a "ceph pg
 repair", which works fine for us. I have a few questions regarding the
 behavior of the ceph cluster in such a case:

 1. After ceph detects the scrub error, the pg is marked as
 inconsistent. Does that mean that any IO to this pg is blocked until
 it is repaired?

 2. Is this amount of scrub errors normal? We currently have only 150TB
 in our cluster, distributed over 720 2TB disks.

 3. As far as I know, a ceph pg repair just copies the content of the
 primary pg to all replicas. Is this still the case? What if the primary
 copy is the one having errors? We have a 4x replication level and it
 would be cool if ceph used for recovery one of the copies whose checksum
 matches the majority of the replicas.

 4. Some of these errors happen at night. Since ceph reports this as a
 critical error, our shift is called and woken up, just to issue a
 single command. Do you see any problem in triggering this command
 automatically via a monitoring event? Is there a reason why ceph isn't
 resolving these errors itself when it has enough replicas to do so?

 Regards,
 Christian
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 
 


-- 
Christian Eichelmann
Systemadministrator

11 Internet AG - IT Operations Mail  Media Advertising  Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Telefon: +49 721 91374-8026
christian.eichelm...@1und1.de

Amtsgericht Montabaur / HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr

[ceph-users] Scrub Error / How does ceph pg repair work?

2015-05-11 Thread Christian Eichelmann
Hi all!

We are experiencing approximately 1 scrub error / inconsistent pg every
two days. As far as I know, to fix this you can issue a "ceph pg
repair", which works fine for us. I have a few questions regarding the
behavior of the ceph cluster in such a case:

1. After ceph detects the scrub error, the pg is marked as inconsistent.
Does that mean that any IO to this pg is blocked until it is repaired?

2. Is this amount of scrub errors normal? We currently have only 150TB
in our cluster, distributed over 720 2TB disks.

3. As far as I know, a ceph pg repair just copies the content of the
primary pg to all replicas. Is this still the case? What if the primary
copy is the one having errors? We have a 4x replication level and it
would be cool if ceph used for recovery one of the copies whose checksum
matches the majority of the replicas.

4. Some of these errors happen at night. Since ceph reports this as a
critical error, our shift is called and woken up, just to issue a single
command. Do you see any problem in triggering this command automatically
via a monitoring event? Is there a reason why ceph isn't resolving these
errors itself when it has enough replicas to do so?
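
For reference, a rough sketch of the commands involved (the pg id 3.10f is
made up; as the replies in this thread point out, the repair should only be
issued after the primary copy has been verified to be the good one):

$ ceph health detail | grep inconsistent      # find the affected pg and its acting set
$ grep -H ERR /var/log/ceph/ceph-osd.*.log    # on the acting OSD hosts: which object failed the scrub
$ ceph pg repair 3.10f                        # overwrites the other replicas with the primary copy
$ ceph pg deep-scrub 3.10f                    # confirm afterwards that the pg is clean again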

Regards,
Christian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-21 Thread Christian Eichelmann
Hi Dan,

we are already back on the kernel module, since the same problems were
happening with fuse. I had no special ulimit settings for the fuse
process, so that could have been an issue there.
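
A quick sketch of how the limits of a running client could be checked the
next time this happens (assuming the process is called rbd-fuse):

$ cat /proc/$(pidof rbd-fuse)/limits | grep -i 'open files'
$ ls /proc/$(pidof rbd-fuse)/fd | wc -l    # open fds/sockets right now, to compare against the limit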

I have pasted the kernel messages from such an incident here:
http://pastebin.com/X5JRe1v3

I have never debugged the kernel client. Can you give me a short hint on
how to increase the debug level and where the logs will be written?
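
A minimal sketch of one way to do this, assuming debugfs is available and
the kernel was built with CONFIG_DYNAMIC_DEBUG (the messages then show up
in dmesg/syslog):

$ mount -t debugfs none /sys/kernel/debug          # if not already mounted
$ echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control
$ echo 'module rbd +p'     > /sys/kernel/debug/dynamic_debug/control
# switch the extra messages off again with '-p' once the hang has been captured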

Regards,
Christian

Am 20.04.2015 um 15:50 schrieb Dan van der Ster:
 Hi,
 This is similar to what you would observe if you hit the ulimit on
 open files/sockets in a Ceph client. Though that normally only affects
 clients in user mode, not the kernel. What are the ulimits of your
 rbd-fuse client? Also, you could increase the client logging debug
 levels to see why the client is hanging. When the kernel rbd client
 was hanging, was there anything printed to dmesg ?
 Cheers, Dan
 
 On Mon, Apr 20, 2015 at 9:29 AM, Christian Eichelmann
 christian.eichelm...@1und1.de wrote:
 Hi Ceph-Users!

 We currently have a problem where I am not sure whether its cause lies in
 Ceph or somewhere else. First, some information about our ceph setup:

 * ceph version 0.87.1
 * 5 MON
 * 12 OSD servers with 60x2TB disks each
 * 2 RSYNC Gateways with 2x10G Ethernet (Kernel: 3.16.3-2~bpo70+1, Debian
 Wheezy)

 Our cluster is mainly used to store log files from numerous servers via
 RSync and to make them available via RSync as well. For about two weeks
 we have been seeing very strange behaviour on our RSync gateways (they just
 map several rbd devices and export them via rsyncd): the IO wait on the
 systems increases until some of the cores get stuck with an IO wait of
 100%. RSync processes become zombies (defunct) and/or cannot be killed
 even with SIGKILL. After the system has reached a load of about 1400, it
 becomes totally unresponsive and the only way to fix the problem is to
 reboot the system.

 I tried to reproduce the problem manually by simultaneously reading and
 writing from several machines, but the problem didn't appear.

 I have no idea where the error can be. I ran a "ceph tell osd.* bench"
 during the problem and all OSDs showed normal benchmark results. Does
 anyone have an idea how this can happen? If you need any more
 information, please let me know.

 Regards,
 Christian

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
Christian Eichelmann
Systemadministrator

11 Internet AG - IT Operations Mail  Media Advertising  Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Telefon: +49 721 91374-8026
christian.eichelm...@1und1.de

Amtsgericht Montabaur / HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
Aufsichtsratsvorsitzender: Michael Scheeren
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-21 Thread Christian Eichelmann
Hi Onur,

actual 50, ideal 330128, fragmentation factor 0.97%

so fragmentation is not an issue here.

Regards,
Christian

Am 20.04.2015 um 16:41 schrieb Onur BEKTAS:
 Hi,
 
 Check the xfs fragmentation factor for the rbd disks, i.e.
 
 xfs_db -c frag -r /dev/sdX
 
 if it is high, try defragmenting:
 
 xfs_fsr /dev/sdX
 
 
 Regards,
 
 Onur.
 
 
 On 4/20/2015 4:41 PM, Nick Fisk wrote:
 If possible, it might be worth trying an EXT4-formatted RBD. I've had
 problems with XFS hanging in the past on simple LVM volumes and never
 really got to the bottom of it, whereas the same volumes formatted with
 EXT4 have been running for years without a problem.

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 Christian Eichelmann
 Sent: 20 April 2015 14:41
 To: Nick Fisk; ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

 I'm using xfs on the rbd disks.
 They are between 1 and 10TB in size.

 Am 20.04.2015 um 14:32 schrieb Nick Fisk:
 Ah ok, good point

 What FS are you using on the RBD?

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
 Of Christian Eichelmann
 Sent: 20 April 2015 13:16
 To: Nick Fisk; ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

 Hi Nick,

 I forgot to mention that I was also trying a workaround using the
 userland (rbd-fuse). The behaviour was exactly the same (worked fine
 for several hours, testing parallel reading and writing, then IO Wait
 and system load increased).

 This is why I don't think it is an issue with the rbd kernel module.

 Regards,
 Christian

 Am 20.04.2015 um 11:37 schrieb Nick Fisk:
 Hi Christian,

 A very non-technical answer but as the problem seems related to the
 RBD client it might be worth trying the latest Kernel if possible.
 The RBD client is Kernel based and so there may be a fix which might
 stop this from happening.

 Nick

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
 Behalf Of Christian Eichelmann
 Sent: 20 April 2015 08:29
 To: ceph-users@lists.ceph.com
 Subject: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

 Hi Ceph-Users!

 We currently have a problem where I am not sure whether its cause lies in
 Ceph or somewhere else. First, some information about our ceph setup:

 * ceph version 0.87.1
 * 5 MON
 * 12 OSD servers with 60x2TB disks each
 * 2 RSYNC Gateways with 2x10G Ethernet (Kernel: 3.16.3-2~bpo70+1, Debian
 Wheezy)

 Our cluster is mainly used to store log files from numerous servers via
 RSync and to make them available via RSync as well. For about two weeks
 we have been seeing very strange behaviour on our RSync gateways (they just
 map several rbd devices and export them via rsyncd): the IO wait on the
 systems increases until some of the cores get stuck with an IO wait of
 100%. RSync processes become zombies (defunct) and/or cannot be killed
 even with SIGKILL. After the system has reached a load of about 1400, it
 becomes totally unresponsive and the only way to fix the problem is to
 reboot the system.

 I tried to reproduce the problem manually by simultaneously reading and
 writing from several machines, but the problem didn't appear.

 I have no idea where the error can be. I ran a "ceph tell osd.* bench"
 during the problem and all OSDs showed normal benchmark results. Does
 anyone have an idea how this can happen? If you need any more
 information, please let me know.

 Regards,
 Christian

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 -- 
 Christian Eichelmann
 Systemadministrator

 11 Internet AG - IT Operations Mail  Media Advertising  Targeting
 Brauerstraße 48 · DE-76135 Karlsruhe
 Telefon: +49 721 91374-8026
 christian.eichelm...@1und1.de

 Amtsgericht Montabaur / HRB 6484
 Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
 Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan
 Oetjen
 Aufsichtsratsvorsitzender: Michael Scheeren
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 -- 
 Christian Eichelmann
 Systemadministrator

 11 Internet AG - IT Operations Mail  Media Advertising  Targeting
 Brauerstraße 48 · DE-76135 Karlsruhe
 Telefon: +49 721 91374-8026
 christian.eichelm...@1und1.de

 Amtsgericht Montabaur / HRB 6484
 Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
 Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan
 Oetjen
 Aufsichtsratsvorsitzender: Michael Scheeren
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 ___
 ceph-users mailing list

Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-21 Thread Christian Eichelmann
Hi Dan,

nope, we have no iptables rules on those hosts and the gateway is on the
same subnet as the ceph cluster.

I will see if I can find some information on how to debug the rbd
kernel module (any suggestions are appreciated :))

Regards,
Christian

Am 21.04.2015 um 10:20 schrieb Dan van der Ster:
 Hi Christian,
 
 I've never debugged the kernel client either, so I don't know how to
 increase debugging. (I don't see any useful parms on the kernel
 modules).
 
 Your log looks like the client just stops communicating with the ceph
 cluster. Is iptables getting in the way ?
 
 Cheers, Dan
 
 On Tue, Apr 21, 2015 at 9:13 AM, Christian Eichelmann
 christian.eichelm...@1und1.de wrote:
 Hi Dan,

 we are already back on the kernel module, since the same problems were
 happening with fuse. I had no special ulimit settings for the fuse
 process, so that could have been an issue there.

 I have pasted the kernel messages from such an incident here:
 http://pastebin.com/X5JRe1v3

 I have never debugged the kernel client. Can you give me a short hint on
 how to increase the debug level and where the logs will be written?

 Regards,
 Christian

 Am 20.04.2015 um 15:50 schrieb Dan van der Ster:
 Hi,
 This is similar to what you would observe if you hit the ulimit on
 open files/sockets in a Ceph client. Though that normally only affects
 clients in user mode, not the kernel. What are the ulimits of your
 rbd-fuse client? Also, you could increase the client logging debug
 levels to see why the client is hanging. When the kernel rbd client
 was hanging, was there anything printed to dmesg ?
 Cheers, Dan

 On Mon, Apr 20, 2015 at 9:29 AM, Christian Eichelmann
 christian.eichelm...@1und1.de wrote:
 Hi Ceph-Users!

 We currently have a problem where I am not sure whether its cause lies in
 Ceph or somewhere else. First, some information about our ceph setup:

 * ceph version 0.87.1
 * 5 MON
 * 12 OSD servers with 60x2TB disks each
 * 2 RSYNC Gateways with 2x10G Ethernet (Kernel: 3.16.3-2~bpo70+1, Debian
 Wheezy)

 Our cluster is mainly used to store log files from numerous servers via
 RSync and to make them available via RSync as well. For about two weeks
 we have been seeing very strange behaviour on our RSync gateways (they just
 map several rbd devices and export them via rsyncd): the IO wait on the
 systems increases until some of the cores get stuck with an IO wait of
 100%. RSync processes become zombies (defunct) and/or cannot be killed
 even with SIGKILL. After the system has reached a load of about 1400, it
 becomes totally unresponsive and the only way to fix the problem is to
 reboot the system.

 I tried to reproduce the problem manually by simultaneously reading and
 writing from several machines, but the problem didn't appear.

 I have no idea where the error can be. I ran a "ceph tell osd.* bench"
 during the problem and all OSDs showed normal benchmark results. Does
 anyone have an idea how this can happen? If you need any more
 information, please let me know.

 Regards,
 Christian

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


 --
 Christian Eichelmann
 Systemadministrator

 11 Internet AG - IT Operations Mail  Media Advertising  Targeting
 Brauerstraße 48 · DE-76135 Karlsruhe
 Telefon: +49 721 91374-8026
 christian.eichelm...@1und1.de

 Amtsgericht Montabaur / HRB 6484
 Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
 Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
 Aufsichtsratsvorsitzender: Michael Scheeren


-- 
Christian Eichelmann
Systemadministrator

11 Internet AG - IT Operations Mail  Media Advertising  Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Telefon: +49 721 91374-8026
christian.eichelm...@1und1.de

Amtsgericht Montabaur / HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
Aufsichtsratsvorsitzender: Michael Scheeren
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-20 Thread Christian Eichelmann
I'm using xfs on the rbd disks.
They are between 1 and 10TB in size.

Am 20.04.2015 um 14:32 schrieb Nick Fisk:
 Ah ok, good point
 
 What FS are you using on the RBD?
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 Christian Eichelmann
 Sent: 20 April 2015 13:16
 To: Nick Fisk; ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

 Hi Nick,

 I forgot to mention that I was also trying a workaround using the userland
 (rbd-fuse). The behaviour was exactly the same (worked fine for several
 hours, testing parallel reading and writing, then IO Wait and system load
 increased).

 This is why I don't think it is an issue with the rbd kernel module.

 Regards,
 Christian

 Am 20.04.2015 um 11:37 schrieb Nick Fisk:
 Hi Christian,

 A very non-technical answer but as the problem seems related to the
 RBD client it might be worth trying the latest Kernel if possible. The
 RBD client is Kernel based and so there may be a fix which might stop
 this from happening.

 Nick

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
 Of Christian Eichelmann
 Sent: 20 April 2015 08:29
 To: ceph-users@lists.ceph.com
 Subject: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

 Hi Ceph-Users!

 We currently have a problem where I am not sure whether its cause lies in
 Ceph or somewhere else. First, some information about our ceph setup:

 * ceph version 0.87.1
 * 5 MON
 * 12 OSD servers with 60x2TB disks each
 * 2 RSYNC Gateways with 2x10G Ethernet (Kernel: 3.16.3-2~bpo70+1, Debian
 Wheezy)

 Our cluster is mainly used to store log files from numerous servers via
 RSync and to make them available via RSync as well. For about two weeks
 we have been seeing very strange behaviour on our RSync gateways (they just
 map several rbd devices and export them via rsyncd): the IO wait on the
 systems increases until some of the cores get stuck with an IO wait of
 100%. RSync processes become zombies (defunct) and/or cannot be killed
 even with SIGKILL. After the system has reached a load of about 1400, it
 becomes totally unresponsive and the only way to fix the problem is to
 reboot the system.

 I tried to reproduce the problem manually by simultaneously reading and
 writing from several machines, but the problem didn't appear.

 I have no idea where the error can be. I ran a "ceph tell osd.* bench"
 during the problem and all OSDs showed normal benchmark results. Does
 anyone have an idea how this can happen? If you need any more
 information, please let me know.

 Regards,
 Christian

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






 --
 Christian Eichelmann
 Systemadministrator

 11 Internet AG - IT Operations Mail  Media Advertising  Targeting
 Brauerstraße 48 · DE-76135 Karlsruhe
 Telefon: +49 721 91374-8026
 christian.eichelm...@1und1.de

 Amtsgericht Montabaur / HRB 6484
 Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
 Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
 Aufsichtsratsvorsitzender: Michael Scheeren
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 
 


-- 
Christian Eichelmann
Systemadministrator

11 Internet AG - IT Operations Mail  Media Advertising  Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Telefon: +49 721 91374-8026
christian.eichelm...@1und1.de

Amtsgericht Montabaur / HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
Aufsichtsratsvorsitzender: Michael Scheeren
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-20 Thread Christian Eichelmann
Hi Nick,

I forgot to mention that I also tried a workaround using the userland
client (rbd-fuse). The behaviour was exactly the same (it worked fine for
several hours of parallel read and write testing, then the IO wait and
system load increased).

This is why I don't think it is an issue with the rbd kernel module.

Regards,
Christian

Am 20.04.2015 um 11:37 schrieb Nick Fisk:
 Hi Christian,
 
 A very non-technical answer but as the problem seems related to the RBD
 client it might be worth trying the latest Kernel if possible. The RBD
 client is Kernel based and so there may be a fix which might stop this from
 happening.
 
 Nick 
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 Christian Eichelmann
 Sent: 20 April 2015 08:29
 To: ceph-users@lists.ceph.com
 Subject: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

 Hi Ceph-Users!

 We currently have a problem where I am not sure whether its cause lies in
 Ceph or somewhere else. First, some information about our ceph setup:

 * ceph version 0.87.1
 * 5 MON
 * 12 OSD servers with 60x2TB disks each
 * 2 RSYNC Gateways with 2x10G Ethernet (Kernel: 3.16.3-2~bpo70+1, Debian
 Wheezy)

 Our cluster is mainly used to store log files from numerous servers via
 RSync and to make them available via RSync as well. For about two weeks
 we have been seeing very strange behaviour on our RSync gateways (they just
 map several rbd devices and export them via rsyncd): the IO wait on the
 systems increases until some of the cores get stuck with an IO wait of
 100%. RSync processes become zombies (defunct) and/or cannot be killed
 even with SIGKILL. After the system has reached a load of about 1400, it
 becomes totally unresponsive and the only way to fix the problem is to
 reboot the system.

 I tried to reproduce the problem manually by simultaneously reading and
 writing from several machines, but the problem didn't appear.

 I have no idea where the error can be. I ran a "ceph tell osd.* bench"
 during the problem and all OSDs showed normal benchmark results. Does
 anyone have an idea how this can happen? If you need any more
 information, please let me know.

 Regards,
 Christian

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 
 


-- 
Christian Eichelmann
Systemadministrator

11 Internet AG - IT Operations Mail  Media Advertising  Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Telefon: +49 721 91374-8026
christian.eichelm...@1und1.de

Amtsgericht Montabaur / HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
Aufsichtsratsvorsitzender: Michael Scheeren
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-20 Thread Christian Eichelmann
Hi Ceph-Users!

We currently have a problem where I am not sure whether its cause lies in
Ceph or somewhere else. First, some information about our ceph setup:

* ceph version 0.87.1
* 5 MON
* 12 OSD servers with 60x2TB disks each
* 2 RSYNC Gateways with 2x10G Ethernet (Kernel: 3.16.3-2~bpo70+1, Debian
Wheezy)

Our cluster is mainly used to store log files from numerous servers via
RSync and to make them available via RSync as well. For about two weeks
we have been seeing very strange behaviour on our RSync gateways (they just
map several rbd devices and export them via rsyncd): the IO wait on the
systems increases until some of the cores get stuck with an IO wait of
100%. RSync processes become zombies (defunct) and/or cannot be killed
even with SIGKILL. After the system has reached a load of about 1400, it
becomes totally unresponsive and the only way to fix the problem is to
reboot the system.

I tried to reproduce the problem manually by simultaneously reading and
writing from several machines, but the problem didn't appear.

I have no idea where the error can be. I ran a "ceph tell osd.* bench"
during the problem and all OSDs showed normal benchmark results. Does
anyone have an idea how this can happen? If you need any more
information, please let me know.

Regards,
Christian

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread

2015-03-10 Thread Christian Eichelmann

Hi Sage,

we hit this problem a few months ago as well and it took us quite a
while to figure out what was wrong.


As a system administrator I don't like the idea of daemons or even init
scripts changing system-wide configuration parameters, so I wouldn't
like to see the OSDs do it themselves.


I've noticed that running ceph on high-density hardware is a totally
different thing, with totally different problems and solutions, than on
common hardware. I would like to see a special section in the
documentation regarding problems with that kind of hardware and with ceph
clusters at a larger scale.


So I vote for the documentation. Sysctls are something I want to set
myself.
The idea of a warning is on the one hand a good hint; on the other hand
it may also confuse people, since changing this setting is not required
for common hardware.
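
For anyone who wants to check how close their OSD hosts already are to the
limit, a minimal sketch (the process name assumes a standard package
install):

$ cat /proc/sys/kernel/pid_max               # current limit on pids/threads
$ ps -eLf | grep '[c]eph-osd' | wc -l        # threads currently used by the OSDs on this host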


Regards,
Christian

On 03/09/2015 08:01 PM, Sage Weil wrote:

On Mon, 9 Mar 2015, Karan Singh wrote:

Thanks Guys kernel.pid_max=4194303 did the trick.

Great to hear!  Sorry we missed that you only had it at 65536.

This is a really common problem that people hit when their clusters start
to grow.  Is there somewhere in the docs we can put this to catch more
users?  Or maybe a warning issued by the osds themselves or something if
they see limits that are low?

sage


- Karan -

   On 09 Mar 2015, at 14:48, Christian Eichelmann
   christian.eichelm...@1und1.de wrote:

Hi Karan,

as you actually write in your own book, the problem is the sysctl
setting kernel.pid_max. I've seen in your bug report that you were
setting it to 65536, which is still too low for high-density hardware.

In our cluster, one OSD server has about 66,000 threads in an idle
situation (60 OSDs per server). The number of threads increases when you
increase the number of placement groups in the cluster, which I think
is what triggered your problem.

Set kernel.pid_max to 4194303 (the maximum), as Azad Aliyar
suggested, and the problem should be gone.

Regards,
Christian

Am 09.03.2015 11:41, schrieb Karan Singh:
   Hello Community need help to fix a long going Ceph
   problem.

   Cluster is unhealthy , Multiple OSDs are DOWN. When i am
   trying to
   restart OSD?s i am getting this error


    2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function
    'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
    common/Thread.cc: 129: FAILED assert(ret == 0)


   *Environment *:  4 Nodes , OSD+Monitor , Firefly latest ,
   CentOS6.5
   , 3.17.2-1.el6.elrepo.x86_64

   Tried upgrading from 0.80.7 to 0.80.8  but no Luck

   Tried centOS stock kernel 2.6.32  but no Luck

   Memory is not a problem more then 150+GB is free


   Did any one every faced this problem ??

   *Cluster status *
   *
   *
   / cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33/
   / health HEALTH_WARN 7334 pgs degraded; 1185 pgs down;
   1 pgs
   incomplete; 1735 pgs peering; 8938 pgs stale; 1/
   /736 pgs stuck inactive; 8938 pgs stuck stale; 10320 pgs
   stuck unclean;
   recovery 6061/31080 objects degraded (19/
   /.501%); 111/196 in osds are down; clock skew detected on
   mon.pouta-s02,
   mon.pouta-s03/
   / monmap e3: 3 mons at
{pouta-s01=10.XXX.50.1:6789/0,pouta-s02=10.XXX.50.2:6789/0,pouta-s03=10.XXX
   .50.3:6789/
   //0}, election epoch 1312, quorum 0,1,2
   pouta-s01,pouta-s02,pouta-s03/
   /   * osdmap e26633: 239 osds: 85 up, 196 in*/
   /  pgmap v60389: 17408 pgs, 13 pools, 42345 MB data,
   10360 objects/
   /4699 GB used, 707 TB / 711 TB avail/
   /6061/31080 objects degraded (19.501%)/
   /  14 down+remapped+peering/
   /  39 active/
   /3289 active+clean/
   / 547 peering/
   / 663 stale+down+peering/
   / 705 stale+active+remapped/
   /   1 active+degraded+remapped/
   /   1 stale+down+incomplete/
   / 484 down+peering/
   / 455 active+remapped/
   /3696 stale+active+degraded/
   /   4 remapped+peering/
   /  23 stale+down+remapped+peering/
   /  51 stale+active/
   /3637 active+degraded/
   /3799 stale+active+clean/

   *OSD :  Logs *

    2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function
    'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
    common/Thread.cc: 129: FAILED assert(ret == 0)
   /
   /
   / ceph version 0.80.8

Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread

2015-03-09 Thread Christian Eichelmann
Hi Karan,

as you actually write in your own book, the problem is the sysctl
setting kernel.pid_max. I've seen in your bug report that you were
setting it to 65536, which is still too low for high-density hardware.

In our cluster, one OSD server has about 66,000 threads in an idle
situation (60 OSDs per server). The number of threads increases when you
increase the number of placement groups in the cluster, which I think
is what triggered your problem.

Set kernel.pid_max to 4194303 (the maximum), as Azad Aliyar suggested,
and the problem should be gone.
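
A minimal sketch of how it can be applied and persisted (the file name
under /etc/sysctl.d is just an example):

$ sysctl -w kernel.pid_max=4194303
$ echo 'kernel.pid_max = 4194303' > /etc/sysctl.d/90-pid-max.conf   # keep it across reboots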

Regards,
Christian

Am 09.03.2015 11:41, schrieb Karan Singh:
 Hello community, I need help fixing a long-running Ceph problem.
 
 The cluster is unhealthy and multiple OSDs are DOWN. When I am trying to
 restart OSDs I am getting this error:
 
 
 2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function
 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
 common/Thread.cc: 129: FAILED assert(ret == 0)
 
 
 Environment: 4 nodes, OSD+Monitor, Firefly latest, CentOS 6.5,
 kernel 3.17.2-1.el6.elrepo.x86_64
 
 Tried upgrading from 0.80.7 to 0.80.8, but no luck.
 
 Tried the CentOS stock kernel 2.6.32, but no luck.
 
 Memory is not a problem; more than 150 GB is free.
 
 Has anyone ever faced this problem?
 
 Cluster status:
 
  cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
  health HEALTH_WARN 7334 pgs degraded; 1185 pgs down; 1 pgs incomplete;
 1735 pgs peering; 8938 pgs stale; 1736 pgs stuck inactive; 8938 pgs stuck
 stale; 10320 pgs stuck unclean; recovery 6061/31080 objects degraded
 (19.501%); 111/196 in osds are down; clock skew detected on mon.pouta-s02,
 mon.pouta-s03
  monmap e3: 3 mons at
 {pouta-s01=10.XXX.50.1:6789/0,pouta-s02=10.XXX.50.2:6789/0,pouta-s03=10.XXX.50.3:6789/0},
 election epoch 1312, quorum 0,1,2 pouta-s01,pouta-s02,pouta-s03
  osdmap e26633: 239 osds: 85 up, 196 in
  pgmap v60389: 17408 pgs, 13 pools, 42345 MB data, 10360 objects
        4699 GB used, 707 TB / 711 TB avail
        6061/31080 objects degraded (19.501%)
          14 down+remapped+peering
          39 active
        3289 active+clean
         547 peering
         663 stale+down+peering
         705 stale+active+remapped
           1 active+degraded+remapped
           1 stale+down+incomplete
         484 down+peering
         455 active+remapped
        3696 stale+active+degraded
           4 remapped+peering
          23 stale+down+remapped+peering
          51 stale+active
        3637 active+degraded
        3799 stale+active+clean
 
 OSD logs:
 
 2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function
 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
 common/Thread.cc: 129: FAILED assert(ret == 0)
 
  ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
  1: (Thread::create(unsigned long)+0x8a) [0xaf41da]
  2: (SimpleMessenger::add_accept_pipe(int)+0x6a) [0xae84fa]
  3: (Accepter::entry()+0x265) [0xb5c635]
  4: /lib64/libpthread.so.0() [0x3c8a6079d1]
  5: (clone()+0x6d) [0x3c8a2e89dd]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
 needed to interpret this.
 
 More information at the Ceph tracker issue:
 http://tracker.ceph.com/issues/10988#change-49018
 
 
 
 Karan Singh 
 Systems Specialist , Storage Platforms
 CSC - IT Center for Science,
 Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
 mobile: +358 503 812758
 tel. +358 9 4572001
 fax +358 9 4572302
 http://www.csc.fi/
 
 
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 


-- 
Christian Eichelmann
Systemadministrator

11 Internet AG - IT Operations Mail  Media Advertising  Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Telefon: +49 721 91374-8026
christian.eichelm...@1und1.de

Amtsgericht Montabaur / HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
Aufsichtsratsvorsitzender: Michael Scheeren
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitor Restart triggers half of our OSDs marked down

2015-02-05 Thread Christian Eichelmann
Am 05.02.2015 10:10, schrieb Dan van der Ster:
 
 But then when I restarted the (peon) monitor:
 
 2015-01-29 11:29:18.250750 mon.0 128.142.35.220:6789/0 10570 : [INF]
 pgmap v35847068: 24608 pgs: 1 active+clean+scrubbing+deep, 24602
 active+clean, 5 active+clean+scrubbing; 125 T
 B data, 377 TB used, 2021 TB / 2399 TB avail; 193 MB/s rd, 238 MB/s
 wr, 7410 op/s
 2015-01-29 11:29:28.844678 mon.3 128.142.39.77:6789/0 1 : [INF] mon.2
 calling new monitor election
 2015-01-29 11:29:33.846946 mon.2 128.142.36.229:6789/0 9 : [INF] mon.4
 calling new monitor election
 2015-01-29 11:29:33.847022 mon.4 128.142.39.144:6789/0 7 : [INF] mon.3
 calling new monitor election
 2015-01-29 11:29:33.847085 mon.1 128.142.36.227:6789/0 24 : [INF]
 mon.1 calling new monitor election
 2015-01-29 11:29:33.853498 mon.3 128.142.39.77:6789/0 2 : [INF] mon.2
 calling new monitor election
 2015-01-29 11:29:33.895660 mon.0 128.142.35.220:6789/0 10860 : [INF]
 mon.0 calling new monitor election
 2015-01-29 11:29:33.901335 mon.0 128.142.35.220:6789/0 10861 : [INF]
 mon.0@0 won leader election with quorum 0,1,2,3,4
 2015-01-29 11:29:34.004028 mon.0 128.142.35.220:6789/0 10862 : [INF]
 monmap e5: 5 mons at
 {0=128.142.35.220:6789/0,1=128.142.36.227:6789/0,2=128.142.39.77:6789/0,3=128.142.39.144:6789/0,4=128.142.36.229:6789/0}
 2015-01-29 11:29:34.005808 mon.0 128.142.35.220:6789/0 10863 : [INF]
 pgmap v35847069: 24608 pgs: 1 active+clean+scrubbing+deep, 24602
 active+clean, 5 active+clean+scrubbing; 125 TB data, 377 TB used, 2021
 TB / 2399 TB avail; 54507 kB/s rd, 85412 kB/s wr, 1967 op/s
 2015-01-29 11:29:34.006111 mon.0 128.142.35.220:6789/0 10864 : [INF]
 mdsmap e157: 1/1/1 up {0=0=up:active}
 2015-01-29 11:29:34.007165 mon.0 128.142.35.220:6789/0 10865 : [INF]
 osdmap e132055: 880 osds: 880 up, 880 in
 2015-01-29 11:29:34.037367 mon.0 128.142.35.220:6789/0 11055 : [INF]
 osd.1202 128.142.23.104:6801/98353 failed (4 reports from 3 peers
 after 29.673699 = grace 28.948726)
 2015-01-29 11:29:34.050478 mon.0 128.142.35.220:6789/0 11139 : [INF]
 osd.1164 128.142.23.102:6850/22486 failed (3 reports from 2 peers
 after 30.685537 = grace 28.946983)
 
 
 and then just after:
 
 2015-01-29 11:29:35.210184 osd.1202 128.142.23.104:6801/98353 59 :
 [WRN] map e132056 wrongly marked me down
 2015-01-29 11:29:35.441922 osd.1164 128.142.23.102:6850/22486 25 :
 [WRN] map e132056 wrongly marked me down

The behaviour is exactly the same on our system, so it looks like the
same issue.
We are currently running Giant (0.87), by the way.

 plus many other OSDs like that.



-- 
Christian Eichelmann
Systemadministrator

11 Internet AG - IT Operations Mail  Media Advertising  Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Telefon: +49 721 91374-8026
christian.eichelm...@1und1.de

Amtsgericht Montabaur / HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
Aufsichtsratsvorsitzender: Michael Scheeren
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Monitor Restart triggers half of our OSDs marked down

2015-02-03 Thread Christian Eichelmann
Hi all,

during some failover and configuration tests, we are currently observing
a strange phenomenon:

Restarting one of our monitors (5 in total) triggers about 300 of the
following events:

osd.669 10.76.28.58:6935/149172 failed (20 reports from 20 peers after
22.005858 = grace 20.00)

The OSDs come back up shortly after they have been marked down. What I
don't understand is: how can a restart of one monitor prevent the OSDs
from talking to each other, so that they end up being marked down?

FYI:
We are currently using the following settings:
mon osd adjust heartbeat grace = false
mon osd min down reporters = 20
mon osd adjust down out interval = false
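
For completeness, a sketch of how these look in ceph.conf and how the
running monitors can be checked (the [mon] section placement and the
"hostname -s" mon id are assumptions about our layout):

[mon]
    mon osd adjust heartbeat grace = false
    mon osd min down reporters = 20
    mon osd adjust down out interval = false

# on a monitor host, verify what the running daemon actually uses:
$ ceph daemon mon.$(hostname -s) config show | grep mon_osd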

Regards,
Christian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Behaviour of Ceph while OSDs are down

2015-01-21 Thread Christian Eichelmann

Hi Samuel, Hi Gregory,

we are using Giant (0.87).

Sure, I was checking on these PGs. The strange thing was that they
reported a bad state (state: inactive), but looking at the recovery
state, everything seemed to be fine. That would point to the mentioned
bug. Do you have a link to that bug, so I can have a look at it and
confirm that we are having the same issue?


Here is a pg_query (slightly older and with only 3x replication, so 
don't be confused):

http://pastebin.com/fyC8Qepv

Regards,
Christian

On 01/20/2015 10:57 PM, Samuel Just wrote:

Version?
-Sam

On Tue, Jan 20, 2015 at 9:45 AM, Gregory Farnum g...@gregs42.com wrote:

On Tue, Jan 20, 2015 at 2:40 AM, Christian Eichelmann
christian.eichelm...@1und1.de wrote:

Hi all,

 I want to understand what Ceph does if several OSDs are down. First of all,
 some words about our setup:
 
 We have 5 Monitors and 12 OSD servers, each with 60x2TB disks. These servers
 are spread across 4 racks in our datacenter. Every rack holds 3 OSD servers.
 We have a replication factor of 4 and a crush rule applied that says step
 chooseleaf firstn 0 type rack. So, in my opinion, every rack should hold a
 copy of all the data in our ceph cluster. Is that more or less correct?

So, our cluster is in state health OK and I am rebooting one of our OSD
servers. That means 60 of 720 OSDs are going down. Since this hardware takes
quite some time to boot up, we are using mon osd down out subtree limit =
host to avoid rebalancing when a whole server goes down. Ceph show this
output of ceph -s while the OSDs are down:

  health HEALTH_WARN 7 pgs degraded; 1 pgs peering; 7 pgs stuck
degraded; 1 pgs stuck inactive; 8 pgs stuck unclean; 7 pgs stuck und
ersized; 7 pgs undersized; recovery 623/7420 objects degraded (8.396%);
60/720 in osds are down
  monmap e5: 5 mons at
{mon-bs01=10.76.28.160:6789/0,mon-bs02=10.76.28.161:6789/0,mon-bs03=10.76.28.162:6789/0,mon-bs04=10.76.28.8:6789/0,mon-bs05=1
0.76.28.9:6789/0}, election epoch 228, quorum 0,1,2,3,4
mon-bs04,mon-bs05,mon-bs01,mon-bs02,mon-bs03
  osdmap e60390: 720 osds: 660 up, 720 in
   pgmap v15427437: 67584 pgs, 2 pools, 7253 MB data, 1855 objects
 3948 GB used, 1304 TB / 1308 TB avail
 623/7420 objects degraded (8.396%)
45356 active+clean
1 peering
7 active+undersized+degraded

The pgs that are degraded and undersized are not a problem, since this
behaviour is expected. I am worried about the peering pg (it stays in this
state until all osds are up again) since this would cause I/O to hang if I
am not mistaken.

After the host is back up and all OSDs are up and running again, I see this:

  health HEALTH_WARN 2 pgs stuck unclean
  monmap e5: 5 mons at
{mon-bs01=10.76.28.160:6789/0,mon-bs02=10.76.28.161:6789/0,mon-bs03=10.76.28.162:6789/0,mon-bs04=10.76.28.8:6789/0,mon-bs05=10.76.28.9:6789/0},
election epoch 228, quorum 0,1,2,3,4
mon-bs04,mon-bs05,mon-bs01,mon-bs02,mon-bs03
  osdmap e60461: 720 osds: 720 up, 720 in
   pgmap v15427555: 67584 pgs, 2 pools, 7253 MB data, 1855 objects
 3972 GB used, 1304 TB / 1308 TB avail
2 inactive
67582 active+clean

Without any interaction, it will stay in this state. I guess these two
inactive pgs will also cause I/O to hang? Some more information:

ceph health detail
HEALTH_WARN 2 pgs stuck unclean
pg 9.f765 is stuck unclean for 858.298811, current state inactive, last
acting [91,362,484,553]
pg 9.ea0f is stuck unclean for 963.441117, current state inactive, last
acting [91,233,485,524]

I was trying to give osd.91 a kick with ceph osd down 91

After the osd is back in the cluster:
health HEALTH_WARN 3 pgs peering; 54 pgs stuck inactive; 57 pgs stuck
unclean

So even worse. I decided to take the osd out. The cluster goes back to
HEALTH_OK. Bringing the OSD back in, the cluster does some rebalancing,
ending with the cluster in an OK state again.

 That actually happens every time some OSDs go down. I don't
 understand why the cluster is not able to get back to a healthy state
 without admin interaction. In a setup with several hundred OSDs it is normal
 business that some of them go down from time to time. Are there any ideas why
 this is happening? Right now, we do not have much data in our cluster, so I
 can do some tests. Any suggestions would be appreciated.

Have you done any digging into the state of the PGs reported as
peering or inactive or whatever when this pops up? Running pg_query,
looking at their calculated and acting sets, etc.

I suspect it's more likely you're exposing a reporting bug with stale
data, rather than actually stuck PGs, but it would take more
information to check that out.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users

[ceph-users] Behaviour of Ceph while OSDs are down

2015-01-20 Thread Christian Eichelmann

Hi all,

I want to understand what Ceph does if several OSDs are down. First of
all, some words about our setup:


We have 5 Monitors and 12 OSD servers, each with 60x2TB disks. These
servers are spread across 4 racks in our datacenter. Every rack holds 3
OSD servers. We have a replication factor of 4 and a crush rule applied
that says step chooseleaf firstn 0 type rack. So, in my opinion,
every rack should hold a copy of all the data in our ceph cluster. Is
that more or less correct?
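
For illustration, such a rule looks roughly like this in the decompiled
crushmap (the rule name, ruleset number and the root "default" are
assumptions; only the chooseleaf step is taken from the setup described
above):

rule replicated_racks {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type rack
        step emit
}

# with size = 4 and four racks, this places exactly one replica per rack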


So, our cluster is in state health OK and I am rebooting one of our OSD 
servers. That means 60 of 720 OSDs are going down. Since this hardware 
takes quite some time to boot up, we are using mon osd down out subtree 
limit = host to avoid rebalancing when a whole server goes down. Ceph
shows this output of ceph -s while the OSDs are down:


 health HEALTH_WARN 7 pgs degraded; 1 pgs peering; 7 pgs stuck degraded;
1 pgs stuck inactive; 8 pgs stuck unclean; 7 pgs stuck undersized; 7 pgs
undersized; recovery 623/7420 objects degraded (8.396%); 60/720 in osds
are down
 monmap e5: 5 mons at
{mon-bs01=10.76.28.160:6789/0,mon-bs02=10.76.28.161:6789/0,mon-bs03=10.76.28.162:6789/0,mon-bs04=10.76.28.8:6789/0,mon-bs05=10.76.28.9:6789/0},
election epoch 228, quorum 0,1,2,3,4 mon-bs04,mon-bs05,mon-bs01,mon-bs02,mon-bs03

 osdmap e60390: 720 osds: 660 up, 720 in
  pgmap v15427437: 67584 pgs, 2 pools, 7253 MB data, 1855 objects
        3948 GB used, 1304 TB / 1308 TB avail
        623/7420 objects degraded (8.396%)
           45356 active+clean
               1 peering
               7 active+undersized+degraded

The pgs that are degraded and undersized are not a problem, since this 
behaviour is expected. I am worried about the peering pg (it stays in 
this state until all osds are up again) since this would cause I/O to 
hang if I am not mistaken.


After the host is back up and all OSDs are up and running again, I see this:

 health HEALTH_WARN 2 pgs stuck unclean
 monmap e5: 5 mons at
{mon-bs01=10.76.28.160:6789/0,mon-bs02=10.76.28.161:6789/0,mon-bs03=10.76.28.162:6789/0,mon-bs04=10.76.28.8:6789/0,mon-bs05=10.76.28.9:6789/0},
election epoch 228, quorum 0,1,2,3,4 mon-bs04,mon-bs05,mon-bs01,mon-bs02,mon-bs03

 osdmap e60461: 720 osds: 720 up, 720 in
  pgmap v15427555: 67584 pgs, 2 pools, 7253 MB data, 1855 objects
        3972 GB used, 1304 TB / 1308 TB avail
            2 inactive
        67582 active+clean

Without any interaction, it will stay in this state. I guess these two 
inactive pgs will also cause I/O to hang? Some more information:


ceph health detail
HEALTH_WARN 2 pgs stuck unclean
pg 9.f765 is stuck unclean for 858.298811, current state inactive, last 
acting [91,362,484,553]
pg 9.ea0f is stuck unclean for 963.441117, current state inactive, last 
acting [91,233,485,524]


I tried to give osd.91 a kick with "ceph osd down 91".

After the osd is back in the cluster:
health HEALTH_WARN 3 pgs peering; 54 pgs stuck inactive; 57 pgs stuck 
unclean


So even worse. I decided to take the osd out. The cluster goes back to 
HEALTH_OK. Bringing the OSD back in, the cluster does some rebalancing, 
ending with the cluster in an OK state again.


That actually happens every time some OSDs go down. I
don't understand why the cluster is not able to get back to a healthy
state without admin interaction. In a setup with several hundred OSDs it
is normal business that some of them go down from time to time. Are there
any ideas why this is happening? Right now, we do not have much data in
our cluster, so I can do some tests. Any suggestions would be appreciated.


Regards,
Christian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Placementgroups stuck peering

2015-01-14 Thread Christian Eichelmann
Hi all,

after our cluster problems with incomplete placement groups, we've
decided to remove our pools and create new ones. This was going fine in
the beginning. After adding an additional OSD server, we now have 2 PGs
that are stuck in the peering state:

HEALTH_WARN 2 pgs peering; 2 pgs stuck inactive; 2 pgs stuck unclean
pg 9.2e41 is stuck inactive for 52540.202628, current state peering,
last acting [91,240,273]
pg 9.bad5 is stuck inactive for 52540.077013, current state peering,
last acting [335,64,273]
pg 9.2e41 is stuck unclean for 65683.195508, current state peering, last
acting [91,240,273]
pg 9.bad5 is stuck unclean for 65683.218581, current state peering, last
acting [335,64,273]
pg 9.bad5 is peering, acting [335,64,273]
pg 9.2e41 is peering, acting [91,240,273]

I checked the placement groups with "ceph pg query", but I found no
reason why the peering cannot be completed.

The output of "ceph pg 9.2e41 query":
http://pastebin.com/fyC8Qepv

Any ideas?

Regards,
Christian


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is ceph production ready? [was: Ceph PG Incomplete = Cluster unusable]

2015-01-09 Thread Christian Eichelmann
Hi Lionel,

we have a ceph cluster with about 1 PB in total: 12 OSD servers with 60
disks each, divided into 4 racks in 2 rooms, all connected via a dedicated
10G cluster network. Of course with a replication level of 3.

We did about 9 months of intensive testing. Just like you, we had never
experienced that kind of problem before. An incomplete PG was
recovering as soon as at least one OSD holding a copy of it came back up.

We still don't know what caused this specific error, but at no point
were more than two hosts down at the same time. Our pool has a
min_size of 1. And after everything was up again, we had completely LOST
2 of 3 pg copies (the directories on the OSDs were empty) and the third
copy was obviously broken, because even manually injecting this pg into
the other OSDs didn't change anything.

My main problem here is that with even one incomplete PG, your pool is
rendered unusable. And there is currently no way to make ceph forget
about the data of this pg and recreate it as an empty one. So the only way
to make this pool usable again is to lose all your data in there, which
for me is just not acceptable.

Regards,
Christian

Am 07.01.2015 21:10, schrieb Lionel Bouton:
 On 12/30/14 16:36, Nico Schottelius wrote:
 Good evening,

 we also tried to rescue data *from* our old / broken pool by map'ing the
 rbd devices, mounting them on a host and rsync'ing away as much as
 possible.

 However, after some time rsync got completly stuck and eventually the
 host which mounted the rbd mapped devices decided to kernel panic at
 which time we decided to drop the pool and go with a backup.

 This story and the one of Christian makes me wonder:

 Is anyone using ceph as a backend for qemu VM images in production?
 
 Yes with Ceph 0.80.5 since September after extensive testing over
 several months (including an earlier version IIRC) and some hardware
 failure simulations. We plan to upgrade one storage host and one monitor
 to 0.80.7 to validate this version over several months too before
 migrating the others.
 

 And:

 Has anyone on the list been able to recover from a pg incomplete /
 stuck situation like ours?
 
 Only by adding back an OSD with the data needed to reach min_size for
 said pg, which is expected behavior. Even with some experiments
 with isolated unstable OSDs I've not yet witnessed a case where Ceph
 lost multiple replicas simultaneously (we lost one OSD to disk failure
 and another to a BTRFS bug, but without trying to recover the filesystem,
 so we might have been able to recover this OSD).
 
 If your setup is susceptible to situations where you can lose all
 replicas you will lose data, but there's not much that can be done
 about that. Ceph actually begins to generate new replicas to replace
 the missing ones after "mon osd down out interval", so the actual loss
 should not happen unless you lose (and can't recover) "size" OSDs on
 separate hosts (with the default crush map) simultaneously. Before going in
 production you should know how long Ceph will take to fully recover from
 a disk or host failure by testing it with load. Your setup might not be
 robust if it hasn't the available disk space or the speed needed to
 recover quickly from such a failure.
 
 Lionel
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 


-- 
Christian Eichelmann
Systemadministrator

11 Internet AG - IT Operations Mail  Media Advertising  Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Telefon: +49 721 91374-8026
christian.eichelm...@1und1.de

Amtsgericht Montabaur / HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
Aufsichtsratsvorsitzender: Michael Scheeren
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Documentation of ceph pg num query

2015-01-09 Thread Christian Eichelmann
Hi all,

as mentioned last year, our ceph cluster is still broken and unusable.
We are still investigating what has happened and I am taking a deeper
look into the output of ceph pg <pgnum> query.

The problem is that I can find some information about what some of the
sections mean, but mostly I can only guess. Is there any kind of
documentation where I can find an explanation of what is stated there?
Without that, the output is barely useful.
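
For anyone trying the same, a minimal sketch of how to at least list the
top-level sections of the query output (the pg id is a placeholder, and
python is assumed to be available on the monitor host):

$ ceph pg 2.5ff query > query.json
$ python -c 'import json; print json.load(open("query.json")).keys()'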

Regards,
Christian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-30 Thread Christian Eichelmann
Hi Nico and all others who answered,

After some more attempts to somehow get the pgs into a working state
(I've tried force_create_pg, which put them into the creating state, but
that was obviously not true, since after rebooting one of the OSDs
containing them they went back to incomplete), I decided to save what
can be saved.

I've created a new pool, created a new image there, mapped the old image
from the old pool and the new image from the new pool to a machine, to
copy data on posix level.
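
The steps look roughly like this (a sketch; pool names, image name and
sizes are placeholders):

$ ceph osd pool create rbd2 4096 4096
$ rbd create rbd2/rescue --size 1048576
$ rbd map rbd/broken-image
$ rbd map rbd2/rescue
$ rbd showmapped                  # note the device of the new image
$ mkfs.ext4 /dev/rbd1             # formatting the new image, see below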

Unfortunately, formatting the image from the new pool hangs after some
time. So it seems that the new pool is suffering from the same problem
as the old pool, which is totally incomprehensible to me.

Right now, it seems like Ceph is giving me no options to either save
some of the still intact rbd volumes, or to create a new pool along the
old one to at least enable our clients to send data to ceph again.

To tell the truth, I guess that will result in the end of our ceph
project (which has already been running for 9 months).

Regards,
Christian

Am 29.12.2014 15:59, schrieb Nico Schottelius:
 Hey Christian,
 
 Christian Eichelmann [Mon, Dec 29, 2014 at 10:56:59AM +0100]:
 [incomplete PG / RBD hanging, osd lost also not helping]
 
 that is very interesting to hear, because we had a similar situation
 with ceph 0.80.7 and had to re-create a pool, after I deleted 3 pg
 directories to allow OSDs to start after the disk filled up completly.
 
 So I am sorry not to being able to give you a good hint, but I am very
 interested in seeing your problem solved, as it is a show stopper for
 us, too. (*)
 
 Cheers,
 
 Nico
 
 (*) We migrated from sheepdog to gluster to ceph and so far sheepdog
 seems to run much smoother. The first one is however not supported
 by opennebula directly, the second one not flexible enough to host
 our heterogeneous infrastructure (mixed disk sizes/amounts) - so we 
 are using ceph at the moment.
 


-- 
Christian Eichelmann
Systemadministrator

11 Internet AG - IT Operations Mail  Media Advertising  Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Telefon: +49 721 91374-8026
christian.eichelm...@1und1.de

Amtsgericht Montabaur / HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
Aufsichtsratsvorsitzender: Michael Scheeren
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-30 Thread Christian Eichelmann
Hi Eneko,

I was trying an rbd cp before, but that was hanging as well, and I
couldn't find out whether the source image or the destination image was
causing the hang. That's why I decided to try a POSIX-level copy.
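
For completeness, the copy attempt was simply something along these
lines (pool and image names are placeholders):

$ rbd cp rbd/broken-image rbd2/rescue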

Our cluster is still nearly empty (12TB / 867TB). But as far as I
understand (if not, somebody please correct me), placement groups are
generally not shared between pools at all.

Regards,
Christian

Am 30.12.2014 12:23, schrieb Eneko Lacunza:
 Hi Christian,
 
 Have you tried to migrate the disk from the old storage (pool) to the
 new one?
 
 I think it should show the same problem, but I think it'd be a much
 easier path to recover than the posix copy.
 
 How full is your storage?
 
 Maybe you can customize the crushmap, so that some OSDs are left in the
 bad (default) pool, and other OSDs and set for the new pool. It think
 (I'm yet learning ceph) that this will make different pgs for each pool,
 also different OSDs, may be this way you can overcome the issue.
 
 Cheers
 Eneko
 
 On 30/12/14 12:17, Christian Eichelmann wrote:
 Hi Nico and all others who answered,

 After some more trying to somehow get the pgs in a working state (I've
 tried force_create_pg, which was putting then in creating state. But
 that was obviously not true, since after rebooting one of the containing
 osd's it went back to incomplete), I decided to save what can be saved.

 I've created a new pool, created a new image there, mapped the old image
 from the old pool and the new image from the new pool to a machine, to
 copy data on posix level.

 Unfortunately, formatting the image from the new pool hangs after some
 time. So it seems that the new pool is suffering from the same problem
 as the old pool. Which is totaly not understandable for me.

 Right now, it seems like Ceph is giving me no options to either save
 some of the still intact rbd volumes, or to create a new pool along the
 old one to at least enable our clients to send data to ceph again.

 To tell the truth, I guess that will result in the end of our ceph
 project (running for already 9 Monthes).

 Regards,
 Christian

 Am 29.12.2014 15:59, schrieb Nico Schottelius:
 Hey Christian,

 Christian Eichelmann [Mon, Dec 29, 2014 at 10:56:59AM +0100]:
 [incomplete PG / RBD hanging, osd lost also not helping]
 that is very interesting to hear, because we had a similar situation
 with ceph 0.80.7 and had to re-create a pool, after I deleted 3 pg
 directories to allow OSDs to start after the disk filled up completly.

 So I am sorry not to being able to give you a good hint, but I am very
 interested in seeing your problem solved, as it is a show stopper for
 us, too. (*)

 Cheers,

 Nico

 (*) We migrated from sheepdog to gluster to ceph and so far sheepdog
  seems to run much smoother. The first one is however not supported
  by opennebula directly, the second one not flexible enough to host
  our heterogeneous infrastructure (mixed disk sizes/amounts) - so we
  are using ceph at the moment.


 
 


-- 
Christian Eichelmann
Systemadministrator

11 Internet AG - IT Operations Mail  Media Advertising  Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Telefon: +49 721 91374-8026
christian.eichelm...@1und1.de

Amtsgericht Montabaur / HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
Aufsichtsratsvorsitzender: Michael Scheeren
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-30 Thread Christian Eichelmann
Hi Eneko,

nope, the new pool has all pgs active+clean, and there were no errors
during image creation. The format command just hangs, without any error.



Am 30.12.2014 12:33, schrieb Eneko Lacunza:
 Hi Christian,
 
 New pool's pgs also show as incomplete?
 
 Did you notice something remarkable in ceph logs in the new pools image
 format?
 
 On 30/12/14 12:31, Christian Eichelmann wrote:
 Hi Eneko,

 I was trying a rbd cp before, but that was haning as well. But I
 couldn't find out if the source image was causing the hang or the
 destination image. That's why I decided to try a posix copy.

 Our cluster is sill nearly empty (12TB / 867TB). But as far as I
 understood (If not, somebody please correct me) placement groups are in
 genereally not shared between pools at all.

 Regards,
 Christian

 Am 30.12.2014 12:23, schrieb Eneko Lacunza:
 Hi Christian,

 Have you tried to migrate the disk from the old storage (pool) to the
 new one?

 I think it should show the same problem, but I think it'd be a much
 easier path to recover than the posix copy.

 How full is your storage?

 Maybe you can customize the crushmap, so that some OSDs are left in the
 bad (default) pool, and other OSDs and set for the new pool. It think
 (I'm yet learning ceph) that this will make different pgs for each pool,
 also different OSDs, may be this way you can overcome the issue.

 Cheers
 Eneko

 On 30/12/14 12:17, Christian Eichelmann wrote:
 Hi Nico and all others who answered,

 After some more trying to somehow get the pgs in a working state (I've
 tried force_create_pg, which was putting then in creating state. But
 that was obviously not true, since after rebooting one of the
 containing
 osd's it went back to incomplete), I decided to save what can be saved.

 I've created a new pool, created a new image there, mapped the old
 image
 from the old pool and the new image from the new pool to a machine, to
 copy data on posix level.

 Unfortunately, formatting the image from the new pool hangs after some
 time. So it seems that the new pool is suffering from the same problem
 as the old pool. Which is totaly not understandable for me.

 Right now, it seems like Ceph is giving me no options to either save
 some of the still intact rbd volumes, or to create a new pool along the
 old one to at least enable our clients to send data to ceph again.

 To tell the truth, I guess that will result in the end of our ceph
 project (running for already 9 Monthes).

 Regards,
 Christian

 Am 29.12.2014 15:59, schrieb Nico Schottelius:
 Hey Christian,

 Christian Eichelmann [Mon, Dec 29, 2014 at 10:56:59AM +0100]:
 [incomplete PG / RBD hanging, osd lost also not helping]
 that is very interesting to hear, because we had a similar situation
 with ceph 0.80.7 and had to re-create a pool, after I deleted 3 pg
 directories to allow OSDs to start after the disk filled up completly.

 So I am sorry not to being able to give you a good hint, but I am very
 interested in seeing your problem solved, as it is a show stopper for
 us, too. (*)

 Cheers,

 Nico

 (*) We migrated from sheepdog to gluster to ceph and so far sheepdog
   seems to run much smoother. The first one is however not
 supported
   by opennebula directly, the second one not flexible enough to
 host
   our heterogeneous infrastructure (mixed disk sizes/amounts) -
 so we
   are using ceph at the moment.



 
 


-- 
Christian Eichelmann
Systemadministrator

11 Internet AG - IT Operations Mail  Media Advertising  Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Telefon: +49 721 91374-8026
christian.eichelm...@1und1.de

Amtsgericht Montabaur / HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
Aufsichtsratsvorsitzender: Michael Scheeren
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-29 Thread Christian Eichelmann
Hi all,

we have a ceph cluster with currently 360 OSDs in 11 systems. Last week
we were replacing one OSD system with a new one. During that, we had a
lot of problems with OSDs crashing on all of our systems, but that is
not our current problem.

After we got everything up and running again, we still have 3 PGs in the
incomplete state. I checked one of them directly on the systems
(replication factor is 3): on two machines the directory was there but
empty, on the third one I found some content. Using
ceph_objectstore_tool I exported this PG and imported it on the other
nodes. Nothing changed.
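
The export/import was done roughly like this (a sketch from memory;
paths, the pg id and the exact options should be checked against the
ceph_objectstore_tool version shipped with Giant):

# on the (stopped) OSD that still had content:
$ ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-123 \
    --journal-path /var/lib/ceph/osd/ceph-123/journal \
    --pgid 2.5ff --op export --file /tmp/2.5ff.export

# on the other (stopped) OSDs of the acting set:
$ ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-124 \
    --journal-path /var/lib/ceph/osd/ceph-124/journal \
    --pgid 2.5ff --op import --file /tmp/2.5ff.export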

We only use ceph for providing rbd images. Right now, two of them are
unusable, because ceph hangs when someone tries to access content in
these pgs. As if that were not bad enough, if I create a new rbd image,
ceph still uses the incomplete pgs, so it is a pure gamble whether a new
volume will be usable or not. That, for now, makes our 900TB ceph
cluster unusable because of 3 bad PGs.

And right here it seems like I can't do anything. Instructing the ceph
cluster to scrub, deep-scrub or repair the pg does nothing, even after
several days. Checking which rbd images are affected is also not
possible, because rados -p poolname ls hangs forever when it comes to
one of the incomplete pgs. ceph osd lost also does nothing.
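
For reference, what was tried looks roughly like this (pg and osd ids
are placeholders):

$ ceph pg scrub 2.5ff
$ ceph pg deep-scrub 2.5ff
$ ceph pg repair 2.5ff
$ rados -p rbd ls              # hangs once it reaches the incomplete pg
$ ceph osd lost 123 --yes-i-really-mean-it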

So right now, I am OK with losing the content of these three PGs. How
can I get the cluster back to life without deleting the whole pool,
which is not up for discussion?

Regards,
Christian

P.S.
We are using Giant
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSDs are crashing with Cannot fork or cannot create thread but plenty of memory is left

2014-09-23 Thread Christian Eichelmann
Hi Nathan,

that was indeed the problem! I increased the pid_max value to 65535
and the problem is gone! Thank you!

It was a bit misleading that there is also a
/proc/sys/kernel/threads-max, which has a much higher value. And since
I was only seeing around 400 processes and wasn't aware that threads
also consume pids, it was hard to find the root cause of this issue.
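
For anyone hitting the same thing, a minimal sketch of the check and the
fix applied here (the new value is simply what worked for us):

$ cat /proc/sys/kernel/pid_max      # default is 32768
$ ps axms | wc -l                   # roughly the current number of threads + processes
$ sysctl -w kernel.pid_max=65535
$ echo 'kernel.pid_max = 65535' >> /etc/sysctl.conf   # persist across reboots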

Now that this problem is solved, I'm wondering whether it is a good idea
to run about 40,000 threads (in an idle cluster) on one machine. The
system has a load of around 6-7 without any traffic, maybe just because
of the intense context switching.

Anyway, that's another topic. Thank you for your help!

Regards,
Christian

Am 23.09.2014 03:21, schrieb Nathan O'Sullivan:
 Hi Christian,
 
 Your problem is probably that your kernel.pid_max (the maximum
 threads+processes across the entire system) needs to be increased - the
 default is 32768, which is too low for even a medium density
 deployment.  You can test this easily enough with
 
 $ ps axms | wc -l
 
 If you get a number around the 30,000 mark then you are going to be
 affected.
 
 There's an issue here http://tracker.ceph.com/issues/6142 , although it
 doesn't seem to have gotten much traction in terms of informing users.
 
 Regards
 Nathan
 
 On 15/09/2014 7:13 PM, Christian Eichelmann wrote:
 Hi all,

 I have no idea why running out of filehandles should produce a out of
 memory error, but well. I've increased the ulimit as you told me, and
 nothing changed. I've noticed that the osd init script sets the max open
 file handles explicitly, so I was setting the corresponding option in my
 ceph conf. Now the limits of an OSD process look like this:

 Limit Soft Limit   Hard Limit
 Units
 Max cpu time  unlimitedunlimited
 seconds
 Max file size unlimitedunlimited
 bytes
 Max data size unlimitedunlimited
 bytes
 Max stack size8388608  unlimited
 bytes
 Max core file sizeunlimitedunlimited
 bytes
 Max resident set  unlimitedunlimited
 bytes
 Max processes 2067478  2067478
 processes
 Max open files6553665536
 files
 Max locked memory 6553665536
 bytes
 Max address space unlimitedunlimited
 bytes
 Max file locksunlimitedunlimited
 locks
 Max pending signals   2067478  2067478
 signals
 Max msgqueue size 819200   819200
 bytes
 Max nice priority 00
 Max realtime priority 00
 Max realtime timeout  unlimitedunlimitedus

 Anyways, the exact same behavior as before. I was also finding a mailing
 on this list from someone who had the exact same problem:
 http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-May/040059.html

 Unfortunately, there was also no real solution for this problem.

 So again: this is *NOT* a ulimit issue. We were running emperor and
 dumpling on the same hardware without any issues. They first started
 after our upgrade to firefly.

 Regards,
 Christian


 Am 12.09.2014 18:26, schrieb Christian Balzer:
 On Fri, 12 Sep 2014 12:05:06 -0400 Brian Rak wrote:

 That's not how ulimit works.  Check the `ulimit -a` output.

 Indeed.

 And to forestall the next questions, see man initscript, mine looks
 like
 this:
 ---
 ulimit -Hn 131072
 ulimit -Sn 65536

 # Execute the program.
 eval exec $4
 ---

 And also a /etc/security/limits.d/tuning.conf (debian) like this:
 ---
 rootsoftnofile  65536
 roothardnofile  131072
 *   softnofile  16384
 *   hardnofile  65536
 ---

 Adjusted to your actual needs. There might be other limits you're
 hitting,
 but that is the most likely one

 Also 45 OSDs with 12 (24 with HT, bleah) CPU cores is pretty ballsy.
 I personally would rather do 4 RAID6 (10 disks, with OSD SSD journals)
 with that kind of case and enjoy the fact that my OSDs never fail. ^o^

 Christian (another one)


 On 9/12/2014 10:15 AM, Christian Eichelmann wrote:
 Hi,

 I am running all commands as root, so there are no limits for the
 processes.

 Regards,
 Christian
 ___
 Von: Mariusz Gronczewski [mariusz.gronczew...@efigence.com]
 Gesendet: Freitag, 12. September 2014 15:33
 An: Christian Eichelmann
 Cc: ceph-users@lists.ceph.com
 Betreff: Re: [ceph-users] OSDs are crashing with Cannot fork or
 cannot create thread but plenty of memory is left

 do cat /proc/pid/limits

 probably you hit max processes limit or max FD limit

 Hi Ceph-Users,

 I have absolutely no idea what is going on on my systems...

 Hardware:
 45 x 4TB Harddisks
 2 x 6 Core CPUs
 256GB Memory

 When initializing all disks and join them to the cluster, after
 approximately 30 OSDs

Re: [ceph-users] OSDs are crashing with Cannot fork or cannot create thread but plenty of memory is left

2014-09-15 Thread Christian Eichelmann
Hi all,

I have no idea why running out of file handles should produce an out of
memory error, but well. I've increased the ulimit as you told me, and
nothing changed. I've noticed that the osd init script sets the max open
file handles explicitly, so I set the corresponding option in my
ceph.conf. Now the limits of an OSD process look like this:

Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             2067478              2067478              processes
Max open files            65536                65536                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       2067478              2067478              signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
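
The ceph.conf bit referred to above is roughly this (a sketch; the
section and the value are just examples, "max open files" being the
option the init script evaluates):

[global]
    max open files = 65536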

Anyway, the exact same behavior as before. I also found a mail on this
list from someone who had the exact same problem:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-May/040059.html

Unfortunately, there was also no real solution for this problem.

So again: this is *NOT* a ulimit issue. We were running emperor and
dumpling on the same hardware without any issues. The problems first
started after our upgrade to firefly.

Regards,
Christian


Am 12.09.2014 18:26, schrieb Christian Balzer:
 On Fri, 12 Sep 2014 12:05:06 -0400 Brian Rak wrote:
 
 That's not how ulimit works.  Check the `ulimit -a` output.

 Indeed.
 
 And to forestall the next questions, see man initscript, mine looks like
 this:
 ---
 ulimit -Hn 131072
 ulimit -Sn 65536
 
 # Execute the program.
 eval exec $4
 ---
 
 And also a /etc/security/limits.d/tuning.conf (debian) like this:
 ---
 rootsoftnofile  65536
 roothardnofile  131072
 *   softnofile  16384
 *   hardnofile  65536
 ---
 
 Adjusted to your actual needs. There might be other limits you're hitting,
 but that is the most likely one
 
 Also 45 OSDs with 12 (24 with HT, bleah) CPU cores is pretty ballsy. 
 I personally would rather do 4 RAID6 (10 disks, with OSD SSD journals)
 with that kind of case and enjoy the fact that my OSDs never fail. ^o^
 
 Christian (another one)
 
 
 On 9/12/2014 10:15 AM, Christian Eichelmann wrote:
 Hi,

 I am running all commands as root, so there are no limits for the
 processes.

 Regards,
 Christian
 ___
 Von: Mariusz Gronczewski [mariusz.gronczew...@efigence.com]
 Gesendet: Freitag, 12. September 2014 15:33
 An: Christian Eichelmann
 Cc: ceph-users@lists.ceph.com
 Betreff: Re: [ceph-users] OSDs are crashing with Cannot fork or
 cannot create thread but plenty of memory is left

 do cat /proc/pid/limits

 probably you hit max processes limit or max FD limit

 Hi Ceph-Users,

 I have absolutely no idea what is going on on my systems...

 Hardware:
 45 x 4TB Harddisks
 2 x 6 Core CPUs
 256GB Memory

 When initializing all disks and join them to the cluster, after
 approximately 30 OSDs, other osds are crashing. When I try to start
 them again I see different kinds of errors. For example:


 Starting Ceph osd.316 on ceph-osd-bs04...already running
 === osd.317 ===
 Traceback (most recent call last):
File /usr/bin/ceph, line 830, in module
  sys.exit(main())
File /usr/bin/ceph, line 773, in main
  sigdict, inbuf, verbose)
File /usr/bin/ceph, line 420, in new_style_command
  inbuf=inbuf)
File /usr/lib/python2.7/dist-packages/ceph_argparse.py, line
 1112, in json_command
  raise RuntimeError('{0}: exception {1}'.format(cmd, e))
 NameError: global name 'cmd' is not defined
 Exception thread.error: error(can't start new thread,) in bound
 method Rados.__del__ of rados.Rados object
 at 0x29ee410 ignored


 or:
 /etc/init.d/ceph: 190: /etc/init.d/ceph: Cannot fork
 /etc/init.d/ceph: 191: /etc/init.d/ceph: Cannot fork
 /etc/init.d/ceph: 192: /etc/init.d/ceph: Cannot fork

 or:
 /usr/bin/ceph-crush-location: 72: /usr/bin/ceph-crush-location:
 Cannot fork /usr/bin/ceph-crush-location:
 79: /usr/bin/ceph-crush-location: Cannot fork Thread::try_create():
 pthread_create failed with error 11common/Thread.cc: In function
 'void Thread::create(size_t)' thread 7fcf768c9760 time 2014-09-12
 15:00:28.284735 common/Thread.cc: 110: FAILED assert(ret == 0)
   ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6

[ceph-users] OSDs are crashing with Cannot fork or cannot create thread but plenty of memory is left

2014-09-12 Thread Christian Eichelmann
Hi Ceph-Users,

I have absolutely no idea what is going on on my systems...

Hardware:
45 x 4TB Harddisks
2 x 6 Core CPUs
256GB Memory

When initializing all disks and joining them to the cluster, after
approximately 30 OSDs the other OSDs start crashing. When I try to start
them again I see different kinds of errors. For example:


Starting Ceph osd.316 on ceph-osd-bs04...already running
=== osd.317 ===
Traceback (most recent call last):
  File "/usr/bin/ceph", line 830, in <module>
    sys.exit(main())
  File "/usr/bin/ceph", line 773, in main
    sigdict, inbuf, verbose)
  File "/usr/bin/ceph", line 420, in new_style_command
    inbuf=inbuf)
  File "/usr/lib/python2.7/dist-packages/ceph_argparse.py", line 1112, in json_command
    raise RuntimeError('{0}: exception {1}'.format(cmd, e))
NameError: global name 'cmd' is not defined
Exception thread.error: error("can't start new thread",) in <bound method Rados.__del__ of <rados.Rados object at 0x29ee410>> ignored


or:
/etc/init.d/ceph: 190: /etc/init.d/ceph: Cannot fork
/etc/init.d/ceph: 191: /etc/init.d/ceph: Cannot fork
/etc/init.d/ceph: 192: /etc/init.d/ceph: Cannot fork

or:
/usr/bin/ceph-crush-location: 72: /usr/bin/ceph-crush-location: Cannot fork
/usr/bin/ceph-crush-location: 79: /usr/bin/ceph-crush-location: Cannot fork
Thread::try_create(): pthread_create failed with error 11
common/Thread.cc: In function 'void Thread::create(size_t)' thread 7fcf768c9760 time 2014-09-12 15:00:28.284735
common/Thread.cc: 110: FAILED assert(ret == 0)
 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
 1: /usr/bin/ceph-conf() [0x51de8f]
 2: (CephContext::CephContext(unsigned int)+0xb1) [0x520fe1]
 3: (common_preinit(CephInitParameters const&, code_environment_t, int)+0x48) [0x52eb78]
 4: (global_pre_init(std::vector<char const*, std::allocator<char const*> >*, std::vector<char const*, std::allocator<char const*> >&, unsigned int, code_environment_t, int)+0x8d) [0x518d0d]
 5: (main()+0x17a) [0x514f6a]
 6: (__libc_start_main()+0xfd) [0x7fcf7522ceed]
 7: /usr/bin/ceph-conf() [0x5168d1]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
Aborted (core dumped)
/etc/init.d/ceph: 340: /etc/init.d/ceph: Cannot fork
/etc/init.d/ceph: 1: /etc/init.d/ceph: Cannot fork
Traceback (most recent call last):
  File "/usr/bin/ceph", line 830, in <module>
    sys.exit(main())
  File "/usr/bin/ceph", line 590, in main
    conffile=conffile)
  File "/usr/lib/python2.7/dist-packages/rados.py", line 198, in __init__
    librados_path = find_library('rados')
  File "/usr/lib/python2.7/ctypes/util.py", line 224, in find_library
    return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
  File "/usr/lib/python2.7/ctypes/util.py", line 213, in _findSoname_ldconfig
    f = os.popen('/sbin/ldconfig -p 2>/dev/null')
OSError: [Errno 12] Cannot allocate memory

But anyways, when I look at the memory consumption of the system:
# free -m
             total       used       free     shared    buffers     cached
Mem:        258450      25841     232609          0         18      15506
-/+ buffers/cache:       10315     248135
Swap:         3811          0       3811


There are more than 230GB of memory available! What is going on there?
System:
Linux ceph-osd-bs04 3.14-0.bpo.1-amd64 #1 SMP Debian 3.14.12-1~bpo70+1
(2014-07-13) x86_64 GNU/Linux

Since this is happening on other hardware as well, I don't think it's
hardware related. I have no idea if this is an OS issue (which would be
seriously strange) or a ceph issue.

Since this is happening only AFTER we upgraded to firefly, I guess it
has something to do with ceph.

ANY idea about what is going on here would be very much appreciated!

Regards,
Christian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSDs are crashing with Cannot fork or cannot create thread but plenty of memory is left

2014-09-12 Thread Christian Eichelmann
Hi,

I am running all commands as root, so there are no limits for the processes.

Regards,
Christian
___
Von: Mariusz Gronczewski [mariusz.gronczew...@efigence.com]
Gesendet: Freitag, 12. September 2014 15:33
An: Christian Eichelmann
Cc: ceph-users@lists.ceph.com
Betreff: Re: [ceph-users] OSDs are crashing with Cannot fork or cannot 
create thread but plenty of memory is left

do cat /proc/<pid>/limits

probably you hit max processes limit or max FD limit

 Hi Ceph-Users,

 I have absolutely no idea what is going on on my systems...

 Hardware:
 45 x 4TB Harddisks
 2 x 6 Core CPUs
 256GB Memory

 When initializing all disks and join them to the cluster, after
 approximately 30 OSDs, other osds are crashing. When I try to start them
 again I see different kinds of errors. For example:


 Starting Ceph osd.316 on ceph-osd-bs04...already running
 === osd.317 ===
 Traceback (most recent call last):
   File /usr/bin/ceph, line 830, in module
 sys.exit(main())
   File /usr/bin/ceph, line 773, in main
 sigdict, inbuf, verbose)
   File /usr/bin/ceph, line 420, in new_style_command
 inbuf=inbuf)
   File /usr/lib/python2.7/dist-packages/ceph_argparse.py, line 1112,
 in json_command
 raise RuntimeError('{0}: exception {1}'.format(cmd, e))
 NameError: global name 'cmd' is not defined
 Exception thread.error: error(can't start new thread,) in bound
 method Rados.__del__ of rados.Rados object
 at 0x29ee410 ignored


 or:
 /etc/init.d/ceph: 190: /etc/init.d/ceph: Cannot fork
 /etc/init.d/ceph: 191: /etc/init.d/ceph: Cannot fork
 /etc/init.d/ceph: 192: /etc/init.d/ceph: Cannot fork

 or:
 /usr/bin/ceph-crush-location: 72: /usr/bin/ceph-crush-location: Cannot fork
 /usr/bin/ceph-crush-location: 79: /usr/bin/ceph-crush-location: Cannot fork
 Thread::try_create(): pthread_create failed with error
 11common/Thread.cc: In function 'void Thread::create(size_t)' thread
 7fcf768c9760 time 2014-09-12 15:00:28.284735
 common/Thread.cc: 110: FAILED assert(ret == 0)
  ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
  1: /usr/bin/ceph-conf() [0x51de8f]
  2: (CephContext::CephContext(unsigned int)+0xb1) [0x520fe1]
  3: (common_preinit(CephInitParameters const, code_environment_t,
 int)+0x48) [0x52eb78]
  4: (global_pre_init(std::vectorchar const*, std::allocatorchar
 const* *, std::vectorchar const*, std::allocatorchar const* ,
 unsigned int, code_environment_t, int)+0x8d) [0x518d0d]
  5: (main()+0x17a) [0x514f6a]
  6: (__libc_start_main()+0xfd) [0x7fcf7522ceed]
  7: /usr/bin/ceph-conf() [0x5168d1]
  NOTE: a copy of the executable, or `objdump -rdS executable` is
 needed to interpret this.
 terminate called after throwing an instance of 'ceph::FailedAssertion'
 Aborted (core dumped)
 /etc/init.d/ceph: 340: /etc/init.d/ceph: Cannot fork
 /etc/init.d/ceph: 1: /etc/init.d/ceph: Cannot fork
 Traceback (most recent call last):
   File /usr/bin/ceph, line 830, in module
 sys.exit(main())
   File /usr/bin/ceph, line 590, in main
 conffile=conffile)
   File /usr/lib/python2.7/dist-packages/rados.py, line 198, in __init__
 librados_path = find_library('rados')
   File /usr/lib/python2.7/ctypes/util.py, line 224, in find_library
 return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
   File /usr/lib/python2.7/ctypes/util.py, line 213, in
 _findSoname_ldconfig
 f = os.popen('/sbin/ldconfig -p 2/dev/null')
 OSError: [Errno 12] Cannot allocate memory

 But anyways, when I look at the memory consumption of the system:
 # free -m
  total   used   free sharedbuffers cached
 Mem:258450  25841 232609  0 18  15506
 -/+ buffers/cache:  10315 248135
 Swap: 3811  0   3811


 There are more then 230GB of memory available! What is going on there?
 System:
 Linux ceph-osd-bs04 3.14-0.bpo.1-amd64 #1 SMP Debian 3.14.12-1~bpo70+1
 (2014-07-13) x86_64 GNU/Linux

 Since this is happening on other Hardware as well, I don't think it's
 Hardware related. I have no Idea if this is an OS issue (which would be
 seriously strange) or a ceph issue.

 Since this is happening only AFTER we upgraded to firefly, I guess it
 has something to do with ceph.

 ANY idea on what is going on here would be very appreciated!

 Regards,
 Christian
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Mariusz Gronczewski, Administrator

Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.gronczew...@efigence.com
mailto:mariusz.gronczew...@efigence.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] scrub error on firefly

2014-07-10 Thread Christian Eichelmann
I can also confirm that after upgrading to firefly both of our clusters (test
and live) went from 0 scrub errors each for about 6 months to about 9-12 per
week...
This also makes me kind of nervous, since as far as I know all ceph pg repair
does is copy the primary object to all replicas, no matter which copy is the
correct one.
Of course the described method of manual checking works (for pools with more
than 2 replicas), but doing this in a large cluster nearly every week is
horribly time-consuming and error-prone.
It would be great to get an explanation for the increased number of scrub
errors since firefly. Were they just not detected correctly in previous
versions? Or is there maybe something wrong with the new code?
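
For reference, the manual check boils down to something like this per
inconsistent pg (a sketch, assuming the default FileStore paths; pg id,
object name and osd number are placeholders):

$ ceph health detail               # lists the inconsistent pgs
$ ceph pg map 3.c6                 # shows the acting OSDs, primary first
# then, on each OSD of the acting set:
$ find /var/lib/ceph/osd/ceph-2/current/3.c6_head -name '*suspect-object*' | xargs md5sum
# put the good copy on the primary (preserving xattrs), then:
$ ceph pg repair 3.c6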

Actually, our company is currently preventing our projects from moving to ceph
because of this problem.

Regards,
Christian

Von: ceph-users [ceph-users-boun...@lists.ceph.com] im Auftrag von Travis 
Rhoden [trho...@gmail.com]
Gesendet: Donnerstag, 10. Juli 2014 16:24
An: Gregory Farnum
Cc: ceph-users@lists.ceph.com
Betreff: Re: [ceph-users] scrub error on firefly

And actually just to follow-up, it does seem like there are some additional 
smarts beyond just using the primary to overwrite the secondaries...  Since I 
captured md5 sums before and after the repair, I can say that in this 
particular instance, the secondary copy was used to overwrite the primary.  So, 
I'm just trusting Ceph to the right thing, and so far it seems to, but the 
comments here about needing to determine the correct object and place it on the 
primary PG make me wonder if I've been missing something.

 - Travis


On Thu, Jul 10, 2014 at 10:19 AM, Travis Rhoden 
trho...@gmail.commailto:trho...@gmail.com wrote:
I can also say that after a recent upgrade to Firefly, I have experienced 
massive uptick in scrub errors.  The cluster was on cuttlefish for about a 
year, and had maybe one or two scrub errors.  After upgrading to Firefly, we've 
probably seen 3 to 4 dozen in the last month or so (was getting 2-3 a day for a 
few weeks until the whole cluster was rescrubbed, it seemed).

What I cannot determine, however, is how to know which object is busted?  For 
example, just today I ran into a scrub error.  The object has two copies and is 
an 8MB piece of an RBD, and has identical timestamps, identical xattrs names 
and values.  But it definitely has a different MD5 sum. How to know which one 
is correct?

I've been just kicking off pg repair each time, which seems to just use the 
primary copy to overwrite the others.  Haven't run into any issues with that so 
far, but it does make me nervous.

 - Travis


On Tue, Jul 8, 2014 at 1:06 AM, Gregory Farnum 
g...@inktank.commailto:g...@inktank.com wrote:
It's not very intuitive or easy to look at right now (there are plans
from the recent developer summit to improve things), but the central
log should have output about exactly what objects are busted. You'll
then want to compare the copies manually to determine which ones are
good or bad, get the good copy on the primary (make sure you preserve
xattrs), and run repair.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Mon, Jul 7, 2014 at 6:48 PM, Randy Smith 
rbsm...@adams.edumailto:rbsm...@adams.edu wrote:
 Greetings,

 I upgraded to firefly last week and I suddenly received this error:

 health HEALTH_ERR 1 pgs inconsistent; 1 scrub errors

 ceph health detail shows the following:

 HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
 pg 3.c6 is active+clean+inconsistent, acting [2,5]
 1 scrub errors

 The docs say that I can run `ceph pg repair 3.c6` to fix this. What I want
 to know is what are the risks of data loss if I run that command in this
 state and how can I mitigate them?

 --
 Randall Smith
 Computing Services
 Adams State University
 http://www.adams.edu/
 719-587-7741tel:719-587-7741

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] external monitoring tools for ceph

2014-07-01 Thread Christian Eichelmann
Hi all,

if it should be nagios/icinga and not Zabbix, there is a remote check
from me that can be found here:

https://github.com/Crapworks/check_ceph_dash

This one uses ceph-dash to monitor the overall cluster status via http:

https://github.com/Crapworks/ceph-dash

But it can be easily adapted to work together with ceph-rest-api since
the output is nearly the same.

Regards,
Christian


Am 01.07.2014 10:24, schrieb Pierre BLONDEAU:
 Hi,
 
 May be you can use that : https://github.com/thelan/ceph-zabbix, but i
 am interested to view Craig's script and template.
 
 Regards
 
 Le 01/07/2014 10:16, Georgios Dimitrakakis a écrit :
 Hi Craig,

 I am also interested at the Zabbix templates and scripts if you can
 publish them.

 Regards,

 G.

 On Mon, 30 Jun 2014 18:15:12 -0700, Craig Lewis wrote:
 You should check out Calamari (https://github.com/ceph/calamari [3]),
 Inktank's monitoring and administration tool.

  I started before Calamari was announced, so I rolled my own
 using Zabbix.  It handles all the monitoring, graphing, and alerting
 in one tool.  It's kind of a pain to set up, but works ok now that it's
 going.
 I don't know how to handle the cluster view though.  I'm monitoring
 individual machines.  Whenever something happens, like an OSD stops
 responding, I get an alert from every monitor.  Otherwise it's not a
 big deal.

 I'm in the middle of re-factoring the data gathering from poll to push.
  If you're interested, I can publish my templates and scripts when I'm
 done.

 On Sun, Jun 29, 2014 at 1:17 AM, pragya jain  wrote:

 Hello all,

 I am working on ceph storage cluster with rados gateway for object
 storage.
 I am looking for external monitoring tools that can be used to
 monitor ceph storage cluster and rados gateway interface.
 I find various monitoring tools, such as nagios, collectd, ganglia,
 diamond, sensu, logstash.
 but i dont get details of anyone about what features do these
 monitoring tools monitor in ceph.

 Has somebody implemented anyone of these tools?

 Can somebody help me in identifying the features provided by these
 tools?

 Is there any other tool which can also be used to monitor ceph
 specially for object storage?

 Regards
 Pragya Jain
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com [1]
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [2]



 Links:
 --
 [1] mailto:ceph-users@lists.ceph.com
 [2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 [3] https://github.com/ceph/calamari
 [4] mailto:prag_2...@yahoo.co.in

 
 
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 


-- 
Christian Eichelmann
Systemadministrator

11 Internet AG - IT Operations Mail  Media Advertising  Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Telefon: +49 721 91374-8026
christian.eichelm...@1und1.de

Amtsgericht Montabaur / HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
Aufsichtsratsvorsitzender: Michael Scheeren
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Behaviour of ceph pg repair on different replication levels

2014-06-23 Thread Christian Eichelmann
Hi ceph users,

since our cluster has had a few inconsistent pgs recently, I was
wondering what ceph pg repair does, depending on the replication level.
So I just wanted to check whether my assumptions are correct:

Replication 2x
Since the cluster cannot decide which version is the correct one, it
would just copy the primary copy (the active one) over the secondary
copy, which is a 50/50 chance of getting the correct version.

Replication 3x or more
Now the cluster has a quorum, and ceph pg repair will replace the
corrupt replica with one of the correct ones. No manual intervention needed.
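
A quick way to see which copy is the primary (and hence, if the above is
right, the copy a repair would propagate) is something like this, with
the pg id as a placeholder:

$ ceph pg map 3.c6      # the first osd in the acting set is the primary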

Am I on the right way?

Regards,
Christian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] PG Scrub Error / active+clean+inconsistent

2014-06-10 Thread Christian Eichelmann
Hi all,

after coming back from a long weekend, I found my production cluster in
an error state, mentioning 6 scrub errors and 6 PGs in the
active+clean+inconsistent state.

Strangely, my pre-live cluster, which runs on different hardware, is
also showing 1 scrub error and 1 inconsistent pg...

pg dump shows that 6 different OSDs are affected. I will check again
for hardware errors, but since the hardware is quite new and none of
our monitoring checks found disk errors, I'm not sure about that.
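
The affected pgs and their OSDs can be pulled out roughly like this (a
sketch):

$ ceph health detail | grep inconsistent
$ ceph pg dump | grep inconsistent    # the up/acting columns list the OSDs involved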

What can be the cause of such a problem? And, also interesting, how do
I recover from it? :)

Regards,
Christian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG Scrub Error / active+clean+inconsistent

2014-06-10 Thread Christian Eichelmann
Hi again,

just found the ceph pg repair command :) Now both clusters are OK again.
Anyway, I'm really interested in the cause of the problem.

Regards,
Christian

Am 10.06.2014 10:28, schrieb Christian Eichelmann:
 Hi all,
 
 after coming back from a long weekend, I found my production cluster in
 an error state, mentioning 6 scrub errors and 6 pg's in
 active+clean+inconsistent state.
 
 Strange is, that my Prelive-Cluster, running on different Hardware, are
 also showing 1 scrub error and 1 inconsisten pg...
 
 pg dump shows that 6 different OSD's are affected. I will check again
 for some Hardware Errors, but since the hardware is quite new, and none
 of our monitoring checks found disk errors, I'm not sure about it.
 
 What can be the cause of such a problem? And, what is also interesting,
 how to recover from it? :)
 
 Regards,
 Christian
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 


-- 
Christian Eichelmann
Systemadministrator

11 Internet AG - IT Operations Mail  Media Advertising  Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Telefon: +49 721 91374-8026
christian.eichelm...@1und1.de

Amtsgericht Montabaur / HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
Aufsichtsratsvorsitzender: Michael Scheeren
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Nagios Check for Ceph-Dash

2014-06-02 Thread Christian Eichelmann
Hi Folks!

For those of you, who are using ceph-dash
(https://github.com/Crapworks/ceph-dash), I've created a Nagios-Plugin,
that uses the json endpoint to monitor your cluster remotely:

* https://github.com/Crapworks/check_ceph_dash

I think this can be easily adapted to use the ceph-rest-api as well.
Since ceph-dash is completely read-only, there are fewer security
concerns about exposing this API to your monitoring system.
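
To get an idea of what the check consumes, the json endpoint can be
queried directly, roughly like this (hostname, port and the content
negotiation are assumptions about a typical ceph-dash deployment):

$ curl -s -H 'Accept: application/json' http://ceph-dash.local:5000/ | python -m json.tool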

Any feedback is welcome!

Regards,
Christian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] visualizing a ceph cluster automatically

2014-05-16 Thread Christian Eichelmann
I have written a small and lightweight gui, which can also act as a json rest
api (for non-interactive monitoring):

https://github.com/Crapworks/ceph-dash

Maybe that's what you are searching for.

Regards,
Christian

Von: ceph-users [ceph-users-boun...@lists.ceph.com] im Auftrag von Drew 
Weaver [drew.wea...@thenap.com]
Gesendet: Freitag, 16. Mai 2014 14:01
An: 'ceph-users@lists.ceph.com'
Betreff: [ceph-users] visualizing a ceph cluster automatically

Does anyone know of any tools that help you visually monitor a ceph cluster 
automatically?

Something that is host, osd, mon aware and shows various status of components, 
etc?

Thanks,
-Drew
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com