Re: XFS stack overflow
Does anyone know how this may be accomplished? Thank you.

On Thu, Dec 15, 2011 at 11:32 AM, Ryan C. England ryan.engl...@corvidtec.com wrote:

Denice,

I have spoken with a couple of the guys on the xfs mailing list. The quick fix would seem to be recompiling the kernel to support a 16K kernel stack. I've spent a few hours researching and have been unable to locate anything relevant to the 2.6.32 kernel. It's not easy to find anything about a patch, or about recompiling the kernel to support this feature, let alone anything covering these operations on 2.6.32. Any suggestions? Thank you.

-- Forwarded message --
From: Dave Chinner da...@fromorbit.com
Date: Mon, Dec 12, 2011 at 5:47 PM
Subject: Re: XFS causing stack overflow
To: Ryan C. England ryan.engl...@corvidtec.com
Cc: Andi Kleen a...@firstfloor.org, Christoph Hellwig h...@infradead.org, linux...@kvack.org, x...@oss.sgi.com

On Mon, Dec 12, 2011 at 08:43:57AM -0500, Ryan C. England wrote:

On Mon, Dec 12, 2011 at 4:00 AM, Dave Chinner da...@fromorbit.com wrote:

On Mon, Dec 12, 2011 at 06:13:11AM +0100, Andi Kleen wrote:

BTW I suppose it wouldn't be all that hard to add more stacks and switch to them too, similar to what the 32bit do_IRQ does. Perhaps XFS could just allocate its own stack per thread (or maybe only if it detects some specific configuration that is known to need much stack).

That's possible, but rather complex, I think. It would need to be per thread if you could sleep inside them.

Yes, we'd need to sleep, do IO, possibly operate within a transaction context, etc, and a workqueue handles all these cases without having to do anything special. Splitting the stack at a logical point is probably better, such as this patch: http://oss.sgi.com/archives/xfs/2011-07/msg00443.html

Is it possible to apply this patch to my current installation? We use this box in production and the reboots that we're experiencing are an inconvenience.

Not easily. The problem with a backport is that the workqueue infrastructure changed around 2.6.36, allowing workqueues to act like an (almost) infinite pool of worker threads, and so by using a workqueue we can have effectively unlimited numbers of concurrent allocations in progress at once. The workqueue implementation in 2.6.32 only allows a single work instance per workqueue thread, and so even with per-CPU worker threads it would only allow one allocation at a time per CPU. This adds additional serialisation within a filesystem and between filesystems, and potentially adds new deadlock conditions as well. So it's not exactly obvious whether it can be backported in a sane manner or not.

Is there a walkthrough on how to apply this patch? If not, could you provide the steps necessary to apply it successfully? I would greatly appreciate it.

It would probably need redesigning and re-implementing from scratch because of the above reasons. It'd then need a lot of testing and review. As a workaround, you might be better off doing what Andi first suggested - recompiling your kernel to use 16k stacks.

Cheers, Dave.

--
Dave Chinner
da...@fromorbit.com

--
Ryan C. England
Corvid Technologies
http://www.corvidtec.com/
office: 704-799-6944 x158
cell: 980-521-2297
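For anyone else hitting this on 2.6.32: below is a minimal sketch of the workaround Dave suggests (16k stacks), not a tested recipe. It assumes a vanilla 2.6.32 x86_64 tree where the stack size is set by THREAD_ORDER in arch/x86/include/asm/page_64_types.h; distribution kernels may differ, so verify the header contents before trusting the sed line. The source path, job count and install steps are examples only.

  # Sketch only: bump the x86_64 kernel stack from 8K to 16K and rebuild.
  cd /usr/src/linux-2.6.32                      # example location of the kernel source

  # THREAD_SIZE is PAGE_SIZE << THREAD_ORDER, so THREAD_ORDER 1 -> 8K, 2 -> 16K.
  sed -i 's/\(#define THREAD_ORDER[[:space:]]*\)1/\12/' \
      arch/x86/include/asm/page_64_types.h

  # (If you were applying a patch instead, this is where "patch -p1 < file.patch"
  # would go, run from the top of the source tree.)

  # Reuse the running kernel's configuration, then build and install.
  cp /boot/config-$(uname -r) .config
  make oldconfig
  make -j"$(nproc)" bzImage modules
  make modules_install install                  # most distros update the bootloader here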
Re: Move a SL6 server from md software raid 5 to hardware raid 5
First take a complete backup of the md raid. Then, by the laws of Innate Perversity of Inanimate Objects, you'll be able to move the disks and have them just work. Your data is protected. (If you had no backup, IPIO would, of course, see to it that the transition failed expensively.) Even if IPIO does not come through, you restore from the complete backup onto the same disks they were on, once the hardware RAID has assembled itself. (Despite the numerous times IPIO seems to work, I still figure it's a silly superstition. It does lead to a correct degree of paranoia, though.)

{^_^}

On 2011/12/19 09:18, Felip Moll wrote:

Well, I will restate my question so as not to scare off potential answerers: how do I move a SL6.0 system with md raid (software RAID) to another server without keeping the software RAID?

Thanks!

2011/12/16 Felip Moll lip...@gmail.com

Hello all!

Recently I installed and configured Scientific Linux to run as a high performance computing cluster with 15 slave nodes and one master. I did this while an older system with RedHat 5.0 was still running, so that users would not have to stop their computations. All went well: I migrated node by node and now I have a flawless cluster with SL6!

The thing is that during the migration I installed SL6 on node1 while node0 was hosting the old master operating system. Node1 has less RAM and no RAID capability, so during installation I configured a software RAID 5 using Linux md (which is what a normal installation gives you by default when you select RAID). Node0 has a hardware RAID 5 controller.

Now I want to move the new master from node1 onto node0. My plan is to shut down node1 and node0, then from a LiveCD partition node0's hard disk, copy the contents of node1's disk onto it, and install grub.

All right, but what do you think I should take into consideration regarding RAID and md? I will have to modify /etc/fstab and also delete /etc/mdadm.conf so that md does not run. Anything more?

Thank you very much!
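The "complete backup" step might look something like the sketch below. It is only an illustration: the external disk name (/dev/sdx1), mount point and archive name are made up, and the excludes assume the usual /proc and /sys pseudo-filesystems; adjust for whatever is actually mounted on node1.

  # Sketch only: full backup of the running md system to an external disk.
  mkdir -p /mnt/backup
  mount /dev/sdx1 /mnt/backup                   # example device name for the backup disk

  tar -cpzf /mnt/backup/node1-full.tar.gz \
      --exclude=/proc --exclude=/sys --exclude=/mnt/backup /

  # Record the current layout so it can be recreated or compared later.
  mdadm --detail --scan  > /mnt/backup/md-layout.txt
  fdisk -l               > /mnt/backup/partitions.txt 2>&1
  cp /etc/fstab /etc/mdadm.conf /mnt/backup/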
Re: Move a SL6 server from md software raid 5 to hardware raid 5
Doing it this way seems like a high-risk operation. Furthermore, I do not want to do it because I would then end up with two RAIDs: a software RAID (md) sitting on top of a hardware one. My idea is to copy the operating system directories manually and then adjust the configuration; I think that is a safer process.

Thanks for the answer, jdow ;)
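The manual copy described here could be done roughly as sketched below from a rescue/live environment. The device names and mount points (/dev/md0 for the old software RAID, /dev/sda1 for the volume exported by the hardware controller, /mnt/old and /mnt/new) are placeholders, and it assumes the old md array can be assembled and mounted read-only alongside the new, already-formatted volume.

  # Sketch only: copy the OS tree from the md root onto the hardware-RAID volume.
  mkdir -p /mnt/old /mnt/new
  mdadm --assemble --scan                       # assemble the old md array
  mount -o ro /dev/md0 /mnt/old                 # example md device
  mount /dev/sda1 /mnt/new                      # example hardware-RAID volume

  # -a preserves permissions/ownership/timestamps, -H hard links, -A ACLs,
  # -X xattrs, -S sparse files, -x stays on the one filesystem being copied.
  rsync -aHAXSx /mnt/old/ /mnt/new/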
Re: Move a SL6 server from md software raid 5 to hardware raid 5
Not as I see it. You take a backup to a large disk, or disks, as the case may be. That is your safety net. Then you try the md disks in the hardware RAID controller. If they work, Bob's your uncle. If they do not work, then create the proper RAID configuration on the hardware controller with the md disks and copy the backup back in. Perform the copying from a live CD to the extent you can. At no time do you end up with twin RAID arrays.

Of course, if you have enough disks, you can simply copy the md raid, as one disk, onto the hardware raid, as another disk; tar or dd imaging can work. If you use different disks in the two RAIDs, then use tar or even cpio to copy the files rather than a pure image. That lets the partitioning be redone so the partition boundaries match the drives' actual internal block size.

{^_^}
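Whichever way the data gets onto the hardware RAID, the post-copy fix-ups already identified (fstab, mdadm.conf, grub) would look roughly like the sketch below from the live CD. Device names, mount points and the placeholder UUID are examples; also note that on SL6 the initramfs built on node1 may not contain the driver for node0's RAID controller, so the dracut rebuild is the step most worth double-checking.

  # Sketch only: run from the live CD with the new root mounted at /mnt/new.
  blkid /dev/sda1                               # note the UUID of the new root volume

  # Point fstab at the new volume and keep md from assembling at boot.
  sed -i 's|^/dev/md0|UUID=<new-root-uuid>|' /mnt/new/etc/fstab   # placeholder UUID
  rm -f /mnt/new/etc/mdadm.conf

  # Reinstall the bootloader and rebuild the initramfs from inside the new root.
  mount --bind /dev  /mnt/new/dev
  mount --bind /proc /mnt/new/proc
  mount --bind /sys  /mnt/new/sys
  chroot /mnt/new /bin/bash <<'EOF'
  grub-install /dev/sda                         # example boot disk on node0
  KVER=$(ls /lib/modules)                       # installed kernel version (assumes one entry)
  dracut --force /boot/initramfs-$KVER.img $KVER
  EOF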