Re: Adding a delay when restarting all OSDs on a host
Default stack size shouldn't matter. At least it's not an issue on a kernel
with over-commit turned on (the default). Most threads/apps never use that
many stack frames (in fact they use a fraction of them), so the kernel
doesn't bother allocating the pages. My bet is on some other resource.

On 7/23/14, 3:22 PM, Vit Yenukas wrote:
> Just a fun fact pertaining to resource consumption during the startup
> sequence: we ran out of memory on a 72-disk server with 256GB RAM during
> startup. ceph-osd dies with 'can not fork' and dumps core. There were in
> excess of 40 thousand threads when this began to happen. With the default
> thread stack size being 8MB, no wonder :)
>
> Note that this was in an experimental setup with just one node, so all
> OSD peering happens on the same host.
>
> Just for the heck of it, I reduced the number of OSDs by a factor of two
> (to 36 OSDs) by setting up a soft RAID-0 for each disk pair. This worked
> after some tweaking of the udev rules (to ignore 'md' block devs). I'm
> not sure if we're going to see the same problem with the real cluster
> (18 such 72-disk nodes), with EC 9-3. Also, I'm not sure if reducing the
> user process stack to 4MB would be a good idea.
>
> On 07/22/2014 08:08 PM, Gregory Farnum wrote:
>
>> On Tue, Jul 22, 2014 at 6:19 AM, Wido den Hollander wrote:
>>> Hi,
>>>
>>> Currently on Ubuntu with Upstart when you invoke a restart like this:
>>>
>>> $ sudo restart ceph-osd-all
>>>
>>> It will restart all OSDs at once, which can increase the load on the
>>> system quite a bit.
>>>
>>> It's better to restart the OSDs one by one:
>>>
>>> $ sudo restart ceph-osd id=X
>>>
>>> But you then have to figure out all the IDs by doing a find in
>>> /var/lib/ceph/osd, and that's more manual work.
>>>
>>> I'm thinking of patching the init scripts to allow something like this:
>>>
>>> $ sudo restart ceph-osd-all delay=180
>>>
>>> It then waits 180 seconds between each OSD restart, making the process
>>> even smoother.
>>>
>>> I know there are currently sysvinit, upstart and systemd scripts, so it
>>> has to be implemented in various places, but how does the general idea
>>> sound?
>> That sounds like a good idea to me. I presume you mean to actually delay
>> the restarts, not just turn them on, so that the daemons all remain
>> alive (that's what it sounds like to me here, just wanted to clarify).
>> -Greg

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
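For context, the back-of-envelope arithmetic behind the "no wonder" above can be sketched as follows. This is a rough estimate only; the thread counts and the 8MB default are the numbers reported in the thread, not measurements of any particular system.

```shell
#!/bin/sh
# Back-of-envelope check on the thread-stack numbers from this thread:
# ~40,000 threads, each with the common 8 MB default stack size
# (`ulimit -s` usually reports 8192 KB). With over-commit enabled this
# is only *reserved* virtual address space; pages are faulted in on use.
THREADS=40000
STACK_MB=8
RESERVED_GB=$((THREADS * STACK_MB / 1024))
echo "reserved: ${RESERVED_GB} GB of virtual stack space"
# -> reserved: 312 GB of virtual stack space
```

So the reservation alone (~312 GB) only exceeds the 256 GB of RAM if over-commit is restricted (e.g. vm.overcommit_memory=2) or the stacks are actually touched, which is consistent with the "some other resource" hypothesis above.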
Re: Adding a delay when restarting all OSDs on a host
Just a fun fact pertaining to resource consumption during the startup
sequence: we ran out of memory on a 72-disk server with 256GB RAM during
startup. ceph-osd dies with 'can not fork' and dumps core. There were in
excess of 40 thousand threads when this began to happen. With the default
thread stack size being 8MB, no wonder :)

Note that this was in an experimental setup with just one node, so all OSD
peering happens on the same host.

Just for the heck of it, I reduced the number of OSDs by a factor of two
(to 36 OSDs) by setting up a soft RAID-0 for each disk pair. This worked
after some tweaking of the udev rules (to ignore 'md' block devs). I'm not
sure if we're going to see the same problem with the real cluster (18 such
72-disk nodes), with EC 9-3. Also, I'm not sure if reducing the user
process stack to 4MB would be a good idea.

On 07/22/2014 08:08 PM, Gregory Farnum wrote:
> On Tue, Jul 22, 2014 at 6:19 AM, Wido den Hollander wrote:
>> Hi,
>>
>> Currently on Ubuntu with Upstart when you invoke a restart like this:
>>
>> $ sudo restart ceph-osd-all
>>
>> It will restart all OSDs at once, which can increase the load on the
>> system quite a bit.
>>
>> It's better to restart the OSDs one by one:
>>
>> $ sudo restart ceph-osd id=X
>>
>> But you then have to figure out all the IDs by doing a find in
>> /var/lib/ceph/osd, and that's more manual work.
>>
>> I'm thinking of patching the init scripts to allow something like this:
>>
>> $ sudo restart ceph-osd-all delay=180
>>
>> It then waits 180 seconds between each OSD restart, making the process
>> even smoother.
>>
>> I know there are currently sysvinit, upstart and systemd scripts, so it
>> has to be implemented in various places, but how does the general idea
>> sound?
>
> That sounds like a good idea to me. I presume you mean to actually delay
> the restarts, not just turn them on, so that the daemons all remain alive
> (that's what it sounds like to me here, just wanted to clarify).
> -Greg
Re: Adding a delay when restarting all OSDs on a host
On Tue, Jul 22, 2014 at 6:19 AM, Wido den Hollander wrote:
> Hi,
>
> Currently on Ubuntu with Upstart when you invoke a restart like this:
>
> $ sudo restart ceph-osd-all
>
> It will restart all OSDs at once, which can increase the load on the
> system quite a bit.
>
> It's better to restart the OSDs one by one:
>
> $ sudo restart ceph-osd id=X
>
> But you then have to figure out all the IDs by doing a find in
> /var/lib/ceph/osd, and that's more manual work.
>
> I'm thinking of patching the init scripts to allow something like this:
>
> $ sudo restart ceph-osd-all delay=180
>
> It then waits 180 seconds between each OSD restart, making the process
> even smoother.
>
> I know there are currently sysvinit, upstart and systemd scripts, so it
> has to be implemented in various places, but how does the general idea
> sound?

That sounds like a good idea to me. I presume you mean to actually delay
the restarts, not just turn them on, so that the daemons all remain alive
(that's what it sounds like to me here, just wanted to clarify).
-Greg
Re: Adding a delay when restarting all OSDs on a host
On Tue, Jul 22, 2014 at 6:28 PM, Wido den Hollander wrote:
> On 07/22/2014 03:48 PM, Andrey Korolyov wrote:
>> On Tue, Jul 22, 2014 at 5:19 PM, Wido den Hollander wrote:
>>> Hi,
>>>
>>> Currently on Ubuntu with Upstart when you invoke a restart like this:
>>>
>>> $ sudo restart ceph-osd-all
>>>
>>> It will restart all OSDs at once, which can increase the load on the
>>> system quite a bit.
>>>
>>> It's better to restart the OSDs one by one:
>>>
>>> $ sudo restart ceph-osd id=X
>>>
>>> But you then have to figure out all the IDs by doing a find in
>>> /var/lib/ceph/osd, and that's more manual work.
>>>
>>> I'm thinking of patching the init scripts to allow something like this:
>>>
>>> $ sudo restart ceph-osd-all delay=180
>>>
>>> It then waits 180 seconds between each OSD restart, making the process
>>> even smoother.
>>>
>>> I know there are currently sysvinit, upstart and systemd scripts, so it
>>> has to be implemented in various places, but how does the general idea
>>> sound?
>>>
>>> --
>>> Wido den Hollander
>>> Ceph consultant and trainer
>>> 42on B.V.
>>>
>>> Phone: +31 (0)20 700 9902
>>> Skype: contact42on
>>
>> Hi,
>>
>> this behaviour obviously has the downside of increased overall peering
>> time and a larger integral amount of out-of-SLA delay. I'd vote for
>> warming up the necessary files, most likely the collections, just
>> before restart. If there isn't enough room to hold all of them at once,
>> we can probably combine both methods to achieve a lower impact on
>> restart, although adding a simple delay sounds much more
>> straightforward than pulling the file cache into RAM.
>
> In the case I'm talking about there are 23 OSDs running on a single
> machine, and restarting all the OSDs causes a lot of peering and reading
> of PG logs.
>
> A warm-up mechanism might work, but that would be a lot of work.
>
> When upgrading your cluster you simply want to do this:
>
> $ dsh -g ceph-osd "sudo restart ceph-osd-all delay=180"
>
> That might take hours to complete, but if it's just an upgrade that
> doesn't matter. You want as little impact on service as possible.

I'd suggest measuring the impact with vmtouch [0]; it greatly decreased OSD
startup time in my tests, but I hit the same resource exhaustion as before
once the OSD marked itself up (primarily the IOPS ceiling).

0. http://hoytech.com/vmtouch/
Re: Adding a delay when restarting all OSDs on a host
On 07/22/2014 03:48 PM, Andrey Korolyov wrote:
> On Tue, Jul 22, 2014 at 5:19 PM, Wido den Hollander wrote:
>> Hi,
>>
>> Currently on Ubuntu with Upstart when you invoke a restart like this:
>>
>> $ sudo restart ceph-osd-all
>>
>> It will restart all OSDs at once, which can increase the load on the
>> system quite a bit.
>>
>> It's better to restart the OSDs one by one:
>>
>> $ sudo restart ceph-osd id=X
>>
>> But you then have to figure out all the IDs by doing a find in
>> /var/lib/ceph/osd, and that's more manual work.
>>
>> I'm thinking of patching the init scripts to allow something like this:
>>
>> $ sudo restart ceph-osd-all delay=180
>>
>> It then waits 180 seconds between each OSD restart, making the process
>> even smoother.
>>
>> I know there are currently sysvinit, upstart and systemd scripts, so it
>> has to be implemented in various places, but how does the general idea
>> sound?
>>
>> --
>> Wido den Hollander
>> Ceph consultant and trainer
>> 42on B.V.
>>
>> Phone: +31 (0)20 700 9902
>> Skype: contact42on
>
> Hi,
>
> this behaviour obviously has the downside of increased overall peering
> time and a larger integral amount of out-of-SLA delay. I'd vote for
> warming up the necessary files, most likely the collections, just before
> restart. If there isn't enough room to hold all of them at once, we can
> probably combine both methods to achieve a lower impact on restart,
> although adding a simple delay sounds much more straightforward than
> pulling the file cache into RAM.

In the case I'm talking about there are 23 OSDs running on a single
machine, and restarting all the OSDs causes a lot of peering and reading of
PG logs.

A warm-up mechanism might work, but that would be a lot of work.

When upgrading your cluster you simply want to do this:

$ dsh -g ceph-osd "sudo restart ceph-osd-all delay=180"

That might take hours to complete, but if it's just an upgrade that doesn't
matter. You want as little impact on service as possible.

--
Wido den Hollander
Ceph consultant and trainer
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
Re: Adding a delay when restarting all OSDs on a host
On Tue, Jul 22, 2014 at 5:19 PM, Wido den Hollander wrote:
> Hi,
>
> Currently on Ubuntu with Upstart when you invoke a restart like this:
>
> $ sudo restart ceph-osd-all
>
> It will restart all OSDs at once, which can increase the load on the
> system quite a bit.
>
> It's better to restart the OSDs one by one:
>
> $ sudo restart ceph-osd id=X
>
> But you then have to figure out all the IDs by doing a find in
> /var/lib/ceph/osd, and that's more manual work.
>
> I'm thinking of patching the init scripts to allow something like this:
>
> $ sudo restart ceph-osd-all delay=180
>
> It then waits 180 seconds between each OSD restart, making the process
> even smoother.
>
> I know there are currently sysvinit, upstart and systemd scripts, so it
> has to be implemented in various places, but how does the general idea
> sound?
>
> --
> Wido den Hollander
> Ceph consultant and trainer
> 42on B.V.
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on

Hi,

this behaviour obviously has the downside of increased overall peering time
and a larger integral amount of out-of-SLA delay. I'd vote for warming up
the necessary files, most likely the collections, just before restart. If
there isn't enough room to hold all of them at once, we can probably
combine both methods to achieve a lower impact on restart, although adding
a simple delay sounds much more straightforward than pulling the file cache
into RAM.
Adding a delay when restarting all OSDs on a host
Hi,

Currently on Ubuntu with Upstart when you invoke a restart like this:

$ sudo restart ceph-osd-all

It will restart all OSDs at once, which can increase the load on the system
quite a bit.

It's better to restart the OSDs one by one:

$ sudo restart ceph-osd id=X

But you then have to figure out all the IDs by doing a find in
/var/lib/ceph/osd, and that's more manual work.

I'm thinking of patching the init scripts to allow something like this:

$ sudo restart ceph-osd-all delay=180

It then waits 180 seconds between each OSD restart, making the process even
smoother.

I know there are currently sysvinit, upstart and systemd scripts, so it has
to be implemented in various places, but how does the general idea sound?

--
Wido den Hollander
Ceph consultant and trainer
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
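Pending patches to the actual init scripts, the proposed behaviour can be approximated with a small wrapper that discovers the OSD IDs under /var/lib/ceph/osd and restarts them one at a time. This is only a sketch of the idea, not the eventual sysvinit/upstart/systemd implementation:

```shell
#!/bin/sh
# Restart every OSD on this host one by one, sleeping DELAY seconds
# between restarts (the behaviour proposed for 'restart ceph-osd-all
# delay=N'). Assumes the usual /var/lib/ceph/osd/ceph-<id> layout and
# the per-instance Upstart job 'ceph-osd id=<id>'.
DELAY=${1:-180}
for dir in /var/lib/ceph/osd/ceph-*; do
    [ -d "$dir" ] || continue
    id=${dir##*-}                 # e.g. .../ceph-12 -> 12
    sudo restart ceph-osd id="$id"
    sleep "$DELAY"
done
```

Doing the same patch in the init scripts themselves keeps the one-command interface, but the loop body is essentially what all three implementations would share.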