Re: XFS stack overflow

2011-12-19 Thread Ryan C. England
Does anyone know how this may be accomplished?

Thank you

On Thu, Dec 15, 2011 at 11:32 AM, Ryan C. England 
ryan.engl...@corvidtec.com wrote:

 Denice,

 I have spoken with a couple of the guys on the xfs mailing list.  The
 quick fix would seem to be recompiling the kernel to support a 16K kernel
 stack.

 I've spent a few hours researching and have been unable to find anything
 specific to the 2.6.32 kernel.  It's not easy to find anything about a patch,
 or about recompiling the kernel to support this feature, let alone
 instructions for doing either on 2.6.32.  Any suggestions?

 Thank you

 -- Forwarded message --
 From: Dave Chinner da...@fromorbit.com
 Date: Mon, Dec 12, 2011 at 5:47 PM
 Subject: Re: XFS causing stack overflow
 To: Ryan C. England ryan.engl...@corvidtec.com
 Cc: Andi Kleen a...@firstfloor.org, Christoph Hellwig h...@infradead.org,
 linux...@kvack.org, x...@oss.sgi.com


 On Mon, Dec 12, 2011 at 08:43:57AM -0500, Ryan C. England wrote:
  On Mon, Dec 12, 2011 at 4:00 AM, Dave Chinner da...@fromorbit.com
 wrote:
   On Mon, Dec 12, 2011 at 06:13:11AM +0100, Andi Kleen wrote:
BTW I suppose it wouldn't be all that hard to add more stacks and
switch to them too, similar to what the 32bit do_IRQ does.
Perhaps XFS could just allocate its own stack per thread
(or maybe only if it detects some specific configuration that
is known to need much stack)
  
   That's possible, but rather complex, I think.
It would need to be per thread if you could sleep inside them.
  
   Yes, we'd need to sleep, do IO, possibly operate within a
   transaction context, etc, and a workqueue handles all these cases
   without having to do anything special. Splitting the stack at a
   logical point is probably better, such as this patch:
  
   http://oss.sgi.com/archives/xfs/2011-07/msg00443.html
 
  Is it possible to apply this patch to my current installation?  We use this
  box in production and the reboots that we're experiencing are an
  inconvenience.

 Not easily. The problem with a backport is that the workqueue
 infrastructure changed around 2.6.36, allowing workqueues to act
 like an (almost) infinite pool of worker threads and so by using a
 workqueue we can have effectively unlimited numbers of concurrent
 allocations in progress at once.

 The workqueue implementation in 2.6.32 only allows a single work
 instance per workqueue thread, and so even with per-CPU worker
 threads it would only allow one allocation at a time per CPU. This
 adds additional serialisation within a filesystem and between
 filesystems, and potentially adds new deadlock conditions as well.

 So it's not exactly obvious whether it can be backported in a sane
 manner or not.
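
 For illustration only -- this is not the XFS patch itself, just the general
 shape of handing a deep call chain off to a workqueue so that it runs on a
 worker thread's fresh stack (the names below are made up, API as of 2.6.36+):

     #include <linux/kernel.h>
     #include <linux/workqueue.h>
     #include <linux/completion.h>

     /* state handed to the worker thread */
     struct deep_call_args {
             struct work_struct      work;
             struct completion       done;
             int                     result;
     };

     /* created once at init, e.g. alloc_workqueue("deep-call", WQ_MEM_RECLAIM, 0) */
     static struct workqueue_struct *deep_call_wq;

     static void deep_call_worker(struct work_struct *work)
     {
             struct deep_call_args *args =
                     container_of(work, struct deep_call_args, work);

             /* the stack-hungry part runs here, on the worker's own stack */
             args->result = 0;
             complete(&args->done);
     }

     static int deep_call(void)
     {
             struct deep_call_args args;

             INIT_WORK(&args.work, deep_call_worker);
             init_completion(&args.done);
             queue_work(deep_call_wq, &args.work);
             wait_for_completion(&args.done);  /* caller sleeps until done */
             return args.result;
     }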

  Is there a walkthrough on how to apply this patch?  If not, could you
  provide the steps necessary to apply it successfully?  I would greatly
  appreciate it.

 It would probably need redesigning and re-implementing from scratch
 because of the above reasons. It'd then need a lot of testing and
 review. As a workaround, you might be better off doing what Andi
 first suggested - recompiling your kernel to use 16k stacks.
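
 For what it's worth, that is normally a one-line change plus a rebuild on
 x86_64 kernels of that era: the stack size is PAGE_SIZE << THREAD_ORDER,
 defined in arch/x86/include/asm/page_64_types.h. A sketch only, to be
 verified against the exact source tree in use:

     /* arch/x86/include/asm/page_64_types.h (2.6.32-era x86_64) */
     #define THREAD_ORDER    2       /* stock value is 1 (8k); 2 gives 16k stacks */
     #define THREAD_SIZE  (PAGE_SIZE << THREAD_ORDER)

 Then rebuild and install the kernel as usual.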

 Cheers,

 Dave.
 --
 Dave Chinner
 da...@fromorbit.com



 --
 Ryan C. England
 Corvid Technologies http://www.corvidtec.com/
 office: 704-799-6944 x158
 cell:980-521-2297




-- 
Ryan C. England
Corvid Technologies http://www.corvidtec.com/
office: 704-799-6944 x158
cell:980-521-2297


Re: Move a SL6 server from md software raid 5 to hardware raid 5

2011-12-19 Thread jdow

First, take a complete backup of the md RAID.

Then, by the laws of Innate Perversity of Inanimate Objects, you'll be able to
move the disks and have them just work, because your data is protected. (If you
had no backup, IPIO would, of course, lead to the transition failing expensively.)

Even if IPIO does not cooperate, you can restore from the complete backup onto
the same disks once the hardware RAID has assembled itself. (Despite the
numerous times IPIO seems to work, I still figure it's a silly superstition.
It does lead to a correct degree of paranoia, though.)

{^_^}

On 2011/12/19 09:18, Felip Moll wrote:

Well, I will rephrase my question so as not to scare off potential answerers:

How do I move an SL6.0 system with md RAID (software RAID) to another server
without keeping the software RAID?

Thanks!



2011/12/16 Felip Moll lip...@gmail.com

Hello all!

Recently I installed and configured Scientific Linux to run as a high
performance computing cluster with 15 slave nodes and one master. I did this
while an older RedHat 5.0 system was still running, so that users would not
have to stop their computations. It all went well: I migrated node by node and
now I have a flawless SL6 cluster!

The thing is that during the migration I used node1 to install SL6 while node0
was hosting the old master operating system. Node1 has less RAM and no RAID
capability, so during installation I configured a software RAID 5 using Linux
md (which is what a normal installation sets up by default when you select
RAID). Node0 has a hardware RAID 5 controller.

Now I want to move the new master from node1 onto node0. My plan is to shut
down node1 and node0, then boot node0 from a live CD, partition its hard disk,
copy the contents of node1's disk onto it, and finally install GRUB.

All right, but what do you think I should take into consideration regarding
RAID and md? I will have to modify /etc/fstab and also delete /etc/mdadm.conf
so that md does not start. Anything more?
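
Just as an illustration (the devices and UUIDs below are made up, not from this
thread): if /etc/fstab mounts filesystems by UUID rather than by /dev/mdX
device name, the same entries keep working after the move; the real UUIDs come
from blkid:

    # /etc/fstab -- illustrative entries only
    UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /      ext4  defaults  1 1
    UUID=yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy  /boot  ext4  defaults  1 2

Besides removing /etc/mdadm.conf, it is probably also worth rebuilding the
initramfs (dracut) on the new box and checking that the root= line in the GRUB
config no longer points at an md device.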

Thank you very much!




Re: Move a SL6 server from md software raid 5 to hardware raid 5

2011-12-19 Thread Felip Moll
Doing it this way seems like a high-risk operation.

Furthermore, I don't want to do it that way because I would then have two
RAIDs: a software RAID (md) sitting on top of a hardware one. My idea is to
copy the operating system directories manually and then adjust the
configuration; I think that is a safer process.

Thanks for the answer, jdow ;)





Re: Move a SL6 server from md software raid 5 to hardware raid 5

2011-12-19 Thread jdow

Not as I see it. You take a backup to a large disk, or disks as the case
may be. That is your safety net. Then you try the md disks in the hardware
RAID controller. If they work, Bob's your uncle. If they do not work, then
create the proper RAID configuration on the hardware controller with those
disks and copy the backup back in. Perform the copying from a live CD to the
extent you can. At no time do you end up with nested RAID arrays. Of course,
if you have enough disks you can simply copy the md RAID, treated as one disk,
to the hardware RAID treated as one disk; tar or dd imaging can work for that.
If you use different disks in the RAIDs, then use tar or even cpio to copy the
files rather than copying a raw image; that lets you create fresh partitions
whose boundaries are aligned to the drive's actual internal block size.

{^_^}
