Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-08-09 Thread Neil Brown
On Friday August 10, [EMAIL PROTECTED] wrote:
> On 8/1/07, Neil Brown <[EMAIL PROTECTED]> wrote:
> 
> > No, this does not use indefinite stack.
> >
> > loop will schedule each request to be handled by a kernel thread, so
> > requests to 'loop' are serialised, never stacked.
> >
> > In 2.6.22, generic_make_request detects and serialises recursive calls,
> > so unlimited recursion is not possible there either.
> 
> Is that saying "before 2.6.22, a read/write on a deeply layered device
> would use a lot of stack?"

Before 2.6.22, a stack of dm and/or md devices (not loop, and not
md/raid0 or md/linear) would use more stack the more devices were
involved.  If you made a very deep stack, you could push the stack
over any limit you chose.

I won't say "a lot of stack" as I haven't measured the exact amount,
just "more stack as you add more devices".

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-08-09 Thread Dan Merillat
On 8/1/07, Neil Brown <[EMAIL PROTECTED]> wrote:

> No, this does not use indefinite stack.
>
> loop will schedule each request to be handled by a kernel thread, so
> requests to 'loop' are serialised, never stacked.
>
> In 2.6.22, generic_make_request detects and serialises recursive calls,
> so unlimited recursion is not possible there either.

Is that saying "before 2.6.22, a read/write on a deeply layered device
would use a lot of stack?"


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-08-09 Thread Dan Merillat
On 8/1/07, Alan Cox <[EMAIL PROTECTED]> wrote:
> On Wed, 1 Aug 2007 15:33:58 +0200
> Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> > Tweaking kernel ptes is prohibitive during clone() because that's
> > kernel memory and it would require a flush tlb all with IPIs that
> > won't scale (IPIs are really the blocker)
>
> Agreed - except when doing debug work then its an acceptable cost. You
> still have to sort the debug side out because you are going to fault the
> kernel stack which will probably then cause a triple fault and reboot on
> the spot.

I was assuming debugging work, yes.  I was also thinking it wouldn't
be done at clone() time, but mapped (on a single CPU) at the time of a
context switch.  It would eliminate the IPI, but would probably make the
rest of the TLB handling much too ugly to contemplate.  As an
alternative, could the TLB flush and associated IPI be deferred until
the process migrates?  First migration would trigger flush/IPI,
further migration would be as now, no?  I'd happily run it with
various dm/md layers underneath.

On 8/1/07, Denis Vlasenko <[EMAIL PROTECTED]> wrote:
> Hmm, neat. Why do you need to _allocate second page_ at all?
> Just mark it "not present"...

Because the kernel mapping covers all physical memory contiguously, so
if the page isn't allocated, it could be used by a kernel data
structure you need to access.  Same reason the kernel stack has to be
contiguous pages.  Well, for non-highmem at least.  Either way, you
don't want to mark an in-use page as inaccessible; you never know
what's under there.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-08-01 Thread Alan Cox
On Wed, 1 Aug 2007 15:33:58 +0200
Andrea Arcangeli <[EMAIL PROTECTED]> wrote:

> On Wed, Aug 01, 2007 at 04:11:23AM -0400, Dan Merillat wrote:
> > How expensive would it be to allocate two , then use the MMU mark the
> > second page unwritable? Hardware wise it should be possible,  (for
> 
> Tweaking kernel ptes is prohibitive during clone() because that's
> kernel memory and it would require a flush tlb all with IPIs that
> won't scale (IPIs are really the blocker)

Agreed - except when doing debug work, where it's an acceptable cost. You
still have to sort the debug side out because you are going to fault the
kernel stack, which will probably then cause a triple fault and reboot on
the spot.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-08-01 Thread Andrea Arcangeli
On Wed, Aug 01, 2007 at 04:11:23AM -0400, Dan Merillat wrote:
> How expensive would it be to allocate two , then use the MMU mark the
> second page unwritable? Hardware wise it should be possible,  (for

Tweaking kernel ptes is prohibitive during clone() because that's
kernel memory and it would require a flush tlb all with IPIs that
won't scale (IPIs are really the blocker). Basically vmalloc already
does what you suggest with the gap page and yet we can't use it for
performance reasons. The kernel stack should be readable by any context to
allow sysrq+t kind of things, so I doubt it's feasible to do tricks to
avoid IPIs.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-08-01 Thread Neil Brown
On Wednesday August 1, [EMAIL PROTECTED] wrote:
> 
> The other issue is with the layered IO design - no matter what we
> configure the stack size to, it is still possible to create a set of
> translation layers that will cause it to crash regularly:  XFS on
> dm_crypt on loop on XFS on dm_crypt on loop on ad infinitum.

No, this does not use indefinite stack.

loop will schedule each request to be handled by a kernel thread, so
requests to 'loop' are serialised, never stacked.

In 2.6.22, generic_make_request detects and serialises recursive calls,
so unlimited recursion is not possible there either.
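
The serialisation described above can be sketched in userspace C.  This
is only an illustration under stated assumptions, not the kernel's code:
struct layer, struct request, submit(), handle() and peak_nesting() are
hypothetical names.  The idea shown is simply that a submission made
while another one is already being processed gets queued and is picked
up by the outer loop instead of nesting one call deeper.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LAYERS 64

/* Hypothetical stand-in for a stacked device (dm, md, ...). */
struct layer {
    struct layer *lower;        /* device below, NULL at the bottom */
};

/* Hypothetical stand-in for an I/O request. */
struct request {
    struct layer *target;
    struct request *next;       /* link in the pending queue */
};

static struct request *pending_head;
static struct request **pending_tail;   /* non-NULL while a submit loop runs */
static int depth, max_depth, handled;   /* instrumentation only */
static int serialise;

static void submit(struct request *rq);

/* One layer's work: remap the request to the lower device and resubmit. */
static void handle(struct request *rq)
{
    if (++depth > max_depth)
        max_depth = depth;
    handled++;
    if (rq->target->lower) {
        rq->target = rq->target->lower;
        submit(rq);
    }
    depth--;
}

static void submit(struct request *rq)
{
    if (!serialise) {           /* plain recursion: one nesting level per layer */
        handle(rq);
        return;
    }
    rq->next = NULL;
    if (pending_tail) {         /* already inside the loop below: just queue */
        *pending_tail = rq;
        pending_tail = &rq->next;
        return;
    }
    pending_head = NULL;
    pending_tail = &pending_head;
    do {
        handle(rq);
        rq = pending_head;      /* pick up whatever handle() queued */
        if (rq) {
            pending_head = rq->next;
            if (!pending_head)
                pending_tail = &pending_head;
        }
    } while (rq);
    pending_tail = NULL;
}

/* Push one request through n stacked layers; return the peak nesting. */
int peak_nesting(int n, int serialise_calls)
{
    static struct layer layers[MAX_LAYERS];
    struct request rq;

    assert(n >= 1 && n <= MAX_LAYERS);
    for (int i = 0; i < n; i++)
        layers[i].lower = (i + 1 < n) ? &layers[i + 1] : NULL;
    rq.target = &layers[0];
    rq.next = NULL;
    serialise = serialise_calls;
    depth = max_depth = handled = 0;
    submit(&rq);
    assert(handled == n);       /* every layer saw the request once */
    return max_depth;
}
```

With the second argument 0, peak_nesting(8, 0) nests eight calls deep,
one per layer; with 1 it stays at one however many layers are stacked.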

It is still possible to do
  dm on dm on dm on dm on md on md on md on md

and calls to ->issue_flush_fn or ->unplug_fn could use an arbitrarily
large amount of stack.  But the stack usage of each stage is very
small so it is unlikely to be a problem (though it should still be
fixed).

NeilBrown


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-08-01 Thread Dan Merillat
On 7/31/07, Eric Sandeen <[EMAIL PROTECTED]> wrote:

> No, what I had did only that, so it was still a matter of probabilities...

How expensive would it be to allocate two pages, then use the MMU to mark
the second page unwritable?  Hardware-wise it should be possible (for
constant 4k pagesizes; I have not worked with variable pagesize MMUs),
and since it's a per-context-switch constant operation, it would be a
special case in the fault handler rather than adding another entry to
the VM for every process.
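
As a userspace analogy only, the two-page idea looks like this with
mmap() and mprotect().  The thread is about kernel page tables, where
the costs are different, and this sketch says nothing about those.
alloc_guarded_stack() is a hypothetical helper.  The lower page is made
read-only, so a stack growing downward out of the upper page faults on
its first write there instead of silently corrupting what lies below.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map two pages and make the lower one read-only.  Returns the lowest
 * usable address (start of the upper page) and stores the usable size
 * in *usable, or returns NULL on failure.
 */
void *alloc_guarded_stack(size_t *usable)
{
    long page = sysconf(_SC_PAGESIZE);
    char *base;

    assert(usable != NULL);
    if (page <= 0)
        return NULL;

    base = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Guard page: still readable, but any write to it now faults. */
    if (mprotect(base, (size_t)page, PROT_READ) != 0) {
        munmap(base, 2 * (size_t)page);
        return NULL;
    }

    *usable = (size_t)page;
    return base + page;
}
```

A caller would start its stack pointer at the returned address plus
*usable and grow downward; the page below the returned address is the
guard.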

Using large hardware pages to cover the kernel mapping could be worked
around by leaving the area where the current process stack resides
mapped via 4k pages.  Of course, I haven't touched a modern PC MMU in
ages, so I could be missing something fundamentally difficult.

The other issue is with the layered IO design - no matter what we
configure the stack size to, it is still possible to create a set of
translation layers that will cause it to crash regularly:  XFS on
dm_crypt on loop on XFS on dm_crypt on loop on ad infinitum.

That said, I'm missing something here - why is the stack growing?
Filesystems should be issuing bios with callbacks, so they should be
back off the stack, same with dm, loop, etc.  Am I missing a step where
they use a wrapper function that pretends to be synchronous?


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-31 Thread Eric Sandeen
Satyam Sharma wrote:
> On 7/27/07, Alan Cox <[EMAIL PROTECTED]> wrote:
>>> Maybe I should resurrect it & send it out...
> 
> Hmm, something that hooks in not only at do_IRQ time (as the present
> in-mainline stackoverflow check thing)?

No, what I had did only that, so it was still a matter of probabilities...

-Eric


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-27 Thread Satyam Sharma
On 7/27/07, Alan Cox <[EMAIL PROTECTED]> wrote:
> > Maybe I should resurrect it & send it out...

Hmm, something that hooks in not only at do_IRQ time (as the present
in-mainline stackoverflow check thing)?

> > (FWIW I think I recall that the warning itself sometimes tipped the
> > scales enough on 4k stacks to bring the box down)
>
> You can always switch stack for the printk and it probably should panic
> at that point and give a trace then die as that is what we are trying to
> prove does not occur

Yes, only yesterday I saw exactly this happening with DEBUG_STACKOVERFLOW
when doing a udf -> pktcdvd -> cdrom -> ide_cd thing. It's one of those
reproducible will-crash-4k-stacks tests, especially if you have debug stuff
enabled in your build that would make on-stack structures (where such
exist on the codepath) a bit heavier.

Admittedly, what seems to have happened is a bit pathological:

[  481.836378] cdrom: entering cdrom_count_tracks
[  481.844266] BUG: sleeping function called from invalid context at include/asm/semaphore.h:98
[  481.844434] do_IRQ: stack overflow: 164
[  481.844540]  [<c0405cfe>] show_trace_log_lvl+0x19/0x2e
[  481.844707]  [<c0405dfe>] show_trace+0x12/0x14
[  481.844867]  [<c0405e14>] dump_stack+0x14/0x16
[  481.845027]  [<c0406ff6>] do_IRQ+0x7b/0xe1
[  481.845186]  [<c040583e>] common_interrupt+0x2e/0x34
[  481.845348]  [<c042b8e7>] printk+0x1b/0x1d
[  481.845507]  [<c0422c05>] __might_sleep+0x81/0xdc
[  481.845668]  [<c066d869>] __reacquire_kernel_lock+0x2d/0x4f
[  481.845833]  [<c066b09b>] schedule+0x78a/0x7a4
[  481.845996]  [<c066b538>] wait_for_completion+0x72/0x97
[  481.846160]  [<c05937a6>] ide_do_drive_cmd+0xeb/0x109
[  481.846324]  [<f89172a2>] cdrom_queue_packet_command+0x40/0xc5 [ide_cd]
[  481.846503]  [<f89175b7>] ide_cdrom_packet+0x86/0xa4 [ide_cd]
[  481.846669]  [<f8854dc1>] cdrom_get_disc_info+0x48/0x87 [cdrom]
[  481.846839]  [<f8854ec6>] cdrom_get_last_written+0x2a/0xfe [cdrom]
[  481.847009]  [<f891831b>] cdrom_read_toc+0x39d/0x3f3 [ide_cd]
[  481.847231]  [<f8918e7e>] ide_cdrom_audio_ioctl+0x130/0x1ce [ide_cd]
[  481.847414]  [<f8854123>] cdrom_count_tracks+0x5c/0x126 [cdrom]
[  481.847583]  [<f8855688>] cdrom_open+0x147/0x79c [cdrom]
[  481.847748]  [<f891799a>] idecd_open+0x75/0x8a [ide_cd]
[  481.847912]  [<c04aac0e>] do_open+0x1d1/0x284
[  481.848079]  [<c04aad89>] __blkdev_get+0x73/0x7e
[  481.848242]  [<c04aada9>] blkdev_get+0x15/0x17
[  481.848411]  [<f8b34b6b>] pkt_open+0x99/0xc6e [pktcdvd]
[  481.848583]  [<c04aaad3>] do_open+0x96/0x284
[  481.848745]  [<c04aad89>] __blkdev_get+0x73/0x7e
[  481.848910]  [<c04aada9>] blkdev_get+0x15/0x17

(... the trace cut off there, and then the box froze hard, no sysrq ...)

The mount(2) hit the wait_for_completion() in ide_do_drive_cmd(), with
little stack left at this point. But then I have no idea why the
__reacquire_kernel_lock() from schedule() gave a might_sleep() there;
the code in sched.c and kernel_lock.c looks obviously correct -- the
down(&kernel_sem) only happens with both irqs and preemption on.

Anyway, the second line of printk() in __might_sleep (the one that
tells us in_atomic() and irqs_disabled()) was about to be printed when
an interrupt decided to join the fun. do_IRQ() comes in, with debug
stackoverflows on, it notices that only 164 bytes worth of stack is left
and decides to dump_stack ... and while we were doing just that,
we died. (this was 2.6.23-rc1-mm1)
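
A minimal sketch of the arithmetic behind a check like this, assuming
the stack is a power-of-two-sized, size-aligned block with a small
per-thread structure at its lowest addresses.  stack_bytes_left() and
stack_nearly_full() are hypothetical helpers, not the kernel's
functions, and the 52-byte reserved size and 512-byte threshold used in
the example below are made-up values; only the 164 comes from the log
above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes still free below stack pointer sp.  The stack occupies a
 * thread_size-aligned block (thread_size must be a power of two) and
 * the lowest `reserved` bytes of that block hold per-thread data.
 */
long stack_bytes_left(uintptr_t sp, uintptr_t thread_size, uintptr_t reserved)
{
    assert(thread_size != 0 && (thread_size & (thread_size - 1)) == 0);
    /* Offset of sp inside its block, minus the reserved area. */
    return (long)(sp & (thread_size - 1)) - (long)reserved;
}

/* Nonzero once free stack drops below a caller-chosen threshold. */
int stack_nearly_full(uintptr_t sp, uintptr_t thread_size,
                      uintptr_t reserved, long warn_below)
{
    return stack_bytes_left(sp, thread_size, reserved) < warn_below;
}
```

For a 4096-byte stack starting at 0xc1234000 with 52 bytes reserved, a
stack pointer of 0xc1234000 + 52 + 164 gives stack_bytes_left() == 164,
which stack_nearly_full() reports as below a 512-byte threshold.  Making
warn_below a parameter is what lets the same check serve both a
production limit and a debugging limit.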


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-27 Thread Alan Cox
> Maybe I should resurrect it & send it out...
> 
> (FWIW I think I recall that the warning itself sometimes tipped the
> scales enough on 4k stacks to bring the box down)

You can always switch stack for the printk, and it probably should panic
at that point and give a trace then die, as that is what we are trying to
prove does not occur.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-27 Thread Krzysztof Halasa
Eric Sandeen <[EMAIL PROTECTED]> writes:

>> 8K stacks without IRQ stacks are not "safer" so I don't understand your
>> comment ?
>
> Hmm was it SuSE or RH kernels (or mainline?) I saw which had a test to
> defer soft IRQs if they occurred too deep in the stack for the current
> thread.

Perhaps the "8 KB softpage" should be an option instead
of 8 KB stack size?

Not sure about ABI compatibility.
-- 
Krzysztof Halasa


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-27 Thread Eric Sandeen
Alan Cox wrote:
>> I don't think they're necessarily bugs. IMHO the WARN_ON is better off
>> at 7k level like it is today with the current STACK_WARN. 4k for a
>> stack for common code really is small. I doubt you're going to find
> 
> You want the limit settable. On a production system you want to set the
> limit to somewhere appropriate for the stack size used. When debugging
> (eg to remove any last few bogus users of 8K stack space) you want to be
> able to set it to just under 4K

Hm, when cramming cxfs into 4k at sgi, I had a patch that did just that
for debugging (warn about encroaching on 4k without actually tipping
over, with a settable threshold...)

Maybe I should resurrect it & send it out...

(FWIW I think I recall that the warning itself sometimes tipped the
scales enough on 4k stacks to bring the box down)

-eric


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-27 Thread Eric Sandeen
Alan Cox wrote:
>> About 4k stacks I was generally against them, much better to fail in
>> fork than to risk corruption. The per-irq stack part is great feature
>> instead (too bad it wasn't enabled for the safer 8k stacks).
> 
> 8K stacks without IRQ stacks are not "safer" so I don't understand your
> comment ?

Hmm was it SuSE or RH kernels (or mainline?) I saw which had a test to
defer soft IRQs if they occurred too deep in the stack for the current
thread.

-Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-27 Thread Eric Sandeen
Alan Cox wrote:
 I don't think they're necessarily bugs. IMHO the WARN_ON is better off
 at 7k level like it is today with the current STACK_WARN. 4k for a
 stack for common code really is small. I doubt you're going to find
 
 You want the limit settable. On a production system you want to set the
 limit to somewhere appropriate for the stack size used. When debugging
 (eg to remove any last few bogus users of 8K stack space) you want to be
 able to set it to just under 4K

Hm, when cramming cxfs into 4k at sgi, I had a patch that did just that
for debugging (warn about encroaching on 4k without actually tipping
over, with a settable threshold...)

Maybe I should resurrect it  send it out...

(FWIW I think I recall that the warning itself sometimes tipped the
scales enough on 4k stacks to bring the box down)

-eric
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-27 Thread Eric Sandeen
Alan Cox wrote:
 About 4k stacks I was generally against them, much better to fail in
 fork than to risk corruption. The per-irq stack part is great feature
 instead (too bad it wasn't enabled for the safer 8k stacks).
 
 8K stacks without IRQ stacks are not safer so I don't understand your
 comment ?

Hmm was it SuSE or RH kernels (or mainline?) I saw which had a test to
defer soft IRQs if they occurred too deep in the stack for the current
thread.

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-27 Thread Krzysztof Halasa
Eric Sandeen [EMAIL PROTECTED] writes:

 8K stacks without IRQ stacks are not safer so I don't understand your
 comment ?

 Hmm was it SuSE or RH kernels (or mainline?) I saw which had a test to
 defer soft IRQs if they occurred too deep in the stack for the current
 thread.

Perhaps the 8 KB softpage should be an option instead
of 8 KB stack size?

Not sure about ABI compatibility.
-- 
Krzysztof Halasa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-27 Thread Alan Cox
 Maybe I should resurrect it  send it out...
 
 (FWIW I think I recall that the warning itself sometimes tipped the
 scales enough on 4k stacks to bring the box down)

You can always switch stack for the printk and it probably should panic
at that point and give a trace then die as that is what we are trying to
prove does not occur
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-27 Thread Satyam Sharma
On 7/27/07, Alan Cox [EMAIL PROTECTED] wrote:
  Maybe I should resurrect it  send it out...

Hmm, something that hooks in not only at do_IRQ time (as the present
in-mainline stackoverflow check thing)?

  (FWIW I think I recall that the warning itself sometimes tipped the
  scales enough on 4k stacks to bring the box down)

 You can always switch stack for the printk and it probably should panic
 at that point and give a trace then die as that is what we are trying to
 prove does not occur

Yes, only yesterday I saw exactly this happening DEBUG_STACKOVERFLOW
when doing a udf - pktcdvd - cdrom - ide_cd thing. It's one of those
reproducible will-crash-4k-stacks tests, especially if you have debug stuff
enabled in your build that would make on-stack structures (where such
exist on the codepath) a bit heavier.

Admittedly, what seems to have happened is a bit pathological:

[  481.836378] cdrom: entering cdrom_count_tracks
[  481.844266] BUG: sleeping function called from invalid context at
include/asm/semaphore.h:98
[  481.844434] do_IRQ: stack overflow: 164
[  481.844540]  [c0405cfe] show_trace_log_lvl+0x19/0x2e
[  481.844707]  [c0405dfe] show_trace+0x12/0x14
[  481.844867]  [c0405e14] dump_stack+0x14/0x16
[  481.845027]  [c0406ff6] do_IRQ+0x7b/0xe1
[  481.845186]  [c040583e] common_interrupt+0x2e/0x34
[  481.845348]  [c042b8e7] printk+0x1b/0x1d
[  481.845507]  [c0422c05] __might_sleep+0x81/0xdc
[  481.845668]  [c066d869] __reacquire_kernel_lock+0x2d/0x4f
[  481.845833]  [c066b09b] schedule+0x78a/0x7a4
[  481.845996]  [c066b538] wait_for_completion+0x72/0x97
[  481.846160]  [c05937a6] ide_do_drive_cmd+0xeb/0x109
[  481.846324]  [f89172a2] cdrom_queue_packet_command+0x40/0xc5 [ide_cd]
[  481.846503]  [f89175b7] ide_cdrom_packet+0x86/0xa4 [ide_cd]
[  481.846669]  [f8854dc1] cdrom_get_disc_info+0x48/0x87 [cdrom]
[  481.846839]  [f8854ec6] cdrom_get_last_written+0x2a/0xfe [cdrom]
[  481.847009]  [f891831b] cdrom_read_toc+0x39d/0x3f3 [ide_cd]
[  481.847231]  [f8918e7e] ide_cdrom_audio_ioctl+0x130/0x1ce [ide_cd]
[  481.847414]  [f8854123] cdrom_count_tracks+0x5c/0x126 [cdrom]
[  481.847583]  [f8855688] cdrom_open+0x147/0x79c [cdrom]
[  481.847748]  [f891799a] idecd_open+0x75/0x8a [ide_cd]
[  481.847912]  [c04aac0e] do_open+0x1d1/0x284
[  481.848079]  [c04aad89] __blkdev_get+0x73/0x7e
[  481.848242]  [c04aada9] blkdev_get+0x15/0x17
[  481.848411]  [f8b34b6b] pkt_open+0x99/0xc6e [pktcdvd]
[  481.848583]  [c04aaad3] do_open+0x96/0x284
[  481.848745]  [c04aad89] __blkdev_get+0x73/0x7e
[  481.848910]  [c04aada9] blkdev_get+0x15/0x17

(... the trace cut off there, and then the box froze hard, no sysrq ...)

The mount(2) hit the wait_for_completion() in ide_do_drive_cmd(), and
little stack was left at that point. But I have no idea why the
__reacquire_kernel_lock() from schedule() triggered a might_sleep()
warning there; the code in sched.c and kernel_lock.c looks obviously
correct -- the down(kernel_sem) only happens with both irqs and
preemption on.

Anyway, the second line of printk() in __might_sleep (the one that
tells us in_atomic() and irqs_disabled()) was about to be printed when
an interrupt decided to join the fun. do_IRQ() came in and, with
stack-overflow debugging (CONFIG_DEBUG_STACKOVERFLOW) on, noticed that
only 164 bytes of stack were left and decided to dump_stack ... and
while we were doing just that, we died. (This was 2.6.23-rc1-mm1.)
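
To make the "164 bytes left" figure concrete, here is a minimal
user-space sketch of the kind of check do_IRQ() performs. It is not the
kernel's code: the helper names and the two size constants are made up
for illustration. It only assumes that the stack area is aligned to its
own size, grows downwards, and has a small bookkeeping structure at its
base.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants, not taken from any particular kernel config. */
#define THREAD_SIZE_4K    4096u /* one 4K kernel stack */
#define THREAD_INFO_SIZE    52u /* hypothetical size of the struct at the stack base */
#define STACK_WARN_BYTES   512u /* hypothetical warning threshold */

/*
 * The stack area is thread_size-aligned and the stack grows down, so the
 * low bits of the stack pointer give its distance from the base of the
 * area. Whatever lies above the reserved bookkeeping bytes is still free.
 */
static size_t stack_bytes_left(uintptr_t sp, size_t thread_size, size_t reserved)
{
    size_t offset = (size_t)(sp & (thread_size - 1));

    return offset > reserved ? offset - reserved : 0;
}

/* True when fewer than STACK_WARN_BYTES remain on a 4K stack. */
static bool stack_nearly_exhausted(uintptr_t sp)
{
    return stack_bytes_left(sp, THREAD_SIZE_4K, THREAD_INFO_SIZE)
           < STACK_WARN_BYTES;
}
```

With these made-up constants, a stack pointer 216 bytes above a
4K-aligned base leaves 164 bytes, which is below the threshold, so the
warning path runs. That warning path itself needs stack, which is the
trap described above.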
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-19 Thread Bodo Eggert
On Thu, 19 Jul 2007, Denis Vlasenko wrote:
> On Tuesday 17 July 2007 00:42, Bodo Eggert wrote:

> > > b) make 4K stacks the default option in vanilla kernel.org kernels as
> > > a gentle nudge towards getting people to start fixing the code paths
> > > that are not 4K stack safe.
> > 
> > That's the big NACK. It's OK for MM, where things are supposed to be in a 
> > not well-tested state, but for running possibly mission-critical systems,
> > you should take no risk.
> 
> Mission-critical machines are not supposed to have kernel configured
> with incompetent/careless sysadmin who didn't think about
> config choices he made at kernel build time.

Is it careless to assume good code quality for default options?
Does the 4K stack come with a big red warning about crashing the kernel?
(I just checked, it does not, only benefits are listed.)
Are 4K stacks so obviously flawed nobody would use them for reliable systems?
Or is each sysadmin supposed to read LKML in order to find out about the
pitfalls you designed for them?
-- 
Top 100 things you don't want the sysadmin to say:
55. NO!  Not _that_ button!


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-19 Thread Denis Vlasenko
On Tuesday 17 July 2007 00:42, Bodo Eggert wrote:
> > Please note that I was not trying to remove the 8K stack option right
> > now - heck, I didn't even add anything to feature-removal-schedule.txt
> > - all I wanted to accomplish with the patch that started this thread
> > was;  a) indicate that the 4K option is no longer a debug thing  and
> 
> Very ACK.
> 
> > b) make 4K stacks the default option in vanilla kernel.org kernels as
> > a gentle nudge towards getting people to start fixing the code paths
> > that are not 4K stack safe.
> 
> That's the big NACK. It's OK for MM, where things are supposed to be in a 
> not well-tested state, but for running possibly mission-critical systems,
> you should take no risk.

Mission-critical machines are not supposed to have kernel configured
with incompetent/careless sysadmin who didn't think about
config choices he made at kernel build time.
--
vda


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-19 Thread Bodo Eggert
On Tue, 17 Jul 2007, Arjan van de Ven wrote:

> > 1) It all can be reduced to 4K + 4K by assuming all IRQs happen on one CPU.
> 
> no it's separate stacks for soft and hard irqs, so it's really 4+4+4

Thanks, I missed that information. Unfortunately this change still does 
not help if one of these stacks needs to grow beyond 4K.

> another angle is that while correctness rules, userspace correctness
> rules as well. If you can't fork enough threads for what you need the
> machine for, why have the machine in the first place?

Userspace can't work correctly after the kernel crashed, but it can fail 
gracefully if it can't create enough threads.

I'd really like to be able to select 4K stacks, but as long as that stack
would overflow, I can't, and it can't be the default either.
-- 
Top 100 things you don't want the sysadmin to say:
8. ...and after I patched the microcode...


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-19 Thread Bodo Eggert
On Wed, 18 Jul 2007, Rene Herman wrote:
> On 07/18/2007 01:19 AM, Bodo Eggert wrote:

> > Please post a list of things you have designed, so I can avoid them.
> 
> - The ability to read
> - The ability to understand
> 
> You're doing a hell of a job already.

If you designed them like you design secure systems, that explains a lot.

-- 
Top 100 things you don't want the sysadmin to say:
83. Damn, and I just bought that pop...


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-19 Thread Bodo Eggert
Alan Cox <[EMAIL PROTECTED]> wrote:
> On Thu, 19 Jul 2007 03:33:58 +0200
> Andrea Arcangeli <[EMAIL PROTECTED]> wrote:

>> > 8K stacks without IRQ stacks are not "safer" so I don't understand your
>> > comment ?
>> 
>> Ouch, see the reports about 4k stack crashes. I agree they're not
>> safe w/o irq stacks (like on x86-64), but they're generally safer.
> 
> Still don't follow. How is "exceeds stack space but less likely to be
> noticed" safer.

If there is a tree in the forest, is it as likely to fall as the tree
that's being chopped in front of our eyes? It is, because each tree will
fall eventually, but you'd still not allow your kids to play on the
tree being chopped, but you'd probably allow them to climb that other tree
like all the other kids do.

The same applies to the stack: We don't know if or when we'll see all
possible interrupts fire and kill the 8K stack, but we know for sure the
8K stack has been climbed for years and there is an axe on that 4K stack.
So where do you send the users to play?
-- 
What's worse than a Male Chauvinist Pig?
A woman that won't do what she's told.

Friß, Spammer: [EMAIL PROTECTED] [EMAIL PROTECTED]


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-19 Thread Alan Cox
> I don't think they're necessarily bugs. IMHO the WARN_ON is better off
> at 7k level like it is today with the current STACK_WARN. 4k for a
> stack for common code really is small. I doubt you're going to find

You want the limit settable. On a production system you want to set the
limit to somewhere appropriate for the stack size used. When debugging
(eg to remove any last few bogus users of 8K stack space) you want to be
able to set it to just under 4K

Alan


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-19 Thread Andrea Arcangeli
On Wed, Jul 18, 2007 at 08:37:25PM -0500, Matt Mackall wrote:
> Turn on irqstacks when using 8k stacks

Indeed.

> Detect when usage with 8k stacks would overrun a 4k stack when doing
>  our stack switch and do a WARN_ONCE
> Fix up the damn bugs

I don't think they're necessarily bugs. IMHO the WARN_ON is better off
at the 7k level like it is today with the current STACK_WARN. 4k for a
stack for common code really is small. I doubt you're going to find
obvious culprits that way; more likely you'll have to mangle the code
to call kmalloc for fairly small structures, which isn't necessarily a
good thing in the long term. The folio ptes array that Hugh allocated
on the stack in his large PAGE_SIZE patch of Jul 2001 comes to mind:
that thing, like any other local array, would need to be
kmalloced with a 4k stack. With 4k I'm afraid you'd better not use the
stack for anything but pointers, especially if you run in common code
that may invoke I/O like that.
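
A small user-space sketch of the trade-off described above, with
malloc()/free() standing in for kmalloc()/kfree(). The function names
and the array length are hypothetical. The point is only that the heap
version keeps a single pointer on the stack but gains an allocation
failure path that every caller must handle.

```c
#include <stdlib.h>

#define NR_ENTRIES 256 /* hypothetical array length */

/* Fill the table with seed, seed+1, ... and add the entries up. */
static unsigned long fill_and_sum(unsigned long *entries, unsigned long seed)
{
    unsigned long sum = 0;

    for (size_t i = 0; i < NR_ENTRIES; i++) {
        entries[i] = seed + i;
        sum += entries[i];
    }
    return sum;
}

/* Local array: NR_ENTRIES * sizeof(unsigned long) bytes of stack per call. */
static unsigned long sum_on_stack(unsigned long seed)
{
    unsigned long entries[NR_ENTRIES];

    return fill_and_sum(entries, seed);
}

/*
 * Heap version: only a pointer lives on the stack, but the allocation
 * can fail. Returns 0 on success and -1 if no memory was available.
 */
static int sum_on_heap(unsigned long seed, unsigned long *out)
{
    unsigned long *entries = malloc(NR_ENTRIES * sizeof(*entries));

    if (!entries)
        return -1;
    *out = fill_and_sum(entries, seed);
    free(entries);
    return 0;
}
```

On a 32-bit build the local array costs 1K of stack, a quarter of a 4K
stack, before anything further down the call chain runs.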


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-19 Thread Andrea Arcangeli
On Thu, Jul 19, 2007 at 10:23:59AM +0100, Alan Cox wrote:
> Still don't follow. How is "exceeds stack space but less likely to be
> noticed" safer.

Statistically speaking it clearly is. The reason is probably that the
theoretical irq issue happens only on large boxes with lots of
reentrant irqs. Not all irqs are reentrant, not all systems run lots
of irqs at the same time, etc.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-19 Thread Alan Cox
On Thu, 19 Jul 2007 03:33:58 +0200
Andrea Arcangeli <[EMAIL PROTECTED]> wrote:

> > 8K stacks without IRQ stacks are not "safer" so I don't understand your
> > comment ?
> 
> Ouch, see the reports about 4k stack crashes. I agree they're not
> safe w/o irq stacks (like on x86-64), but they're generally safer.

Still don't follow. How is "exceeds stack space but less likely to be
noticed" safer.

Alan


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Rene Herman

On 07/19/2007 03:37 AM, Matt Mackall wrote:


> Here's a way to make forward progress on this whole thing:
> 
> Turn on irqstacks when using 8k stacks

WLI: are you submitting? Makes great sense regardless of anything and 
they've been tested silly with 4KSTACKS already...

> Detect when usage with 8k stacks would overrun a 4k stack when doing
> our stack switch and do a WARN_ONCE

Our stack switch?

> Fix up the damn bugs

DM of course is fairly "layered-by-design" so I _hope_ they can be classified 
as simple bugs...


Rene.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Matt Mackall
On Thu, Jul 19, 2007 at 03:33:58AM +0200, Andrea Arcangeli wrote:
> On Thu, Jul 19, 2007 at 01:39:55AM +0100, Alan Cox wrote:
> > > About 4k stacks I was generally against them, much better to fail in
> > > fork than to risk corruption. The per-irq stack part is great feature
> > > instead (too bad it wasn't enabled for the safer 8k stacks).
> > 
> > 8K stacks without IRQ stacks are not "safer" so I don't understand your
> > comment ?
> 
> Ouch, see the reports about 4k stack crashes. I agree they're not
> safe w/o irq stacks (like on x86-64), but they're generally safer.

Here's a way to make forward progress on this whole thing:

Turn on irqstacks when using 8k stacks
Detect when usage with 8k stacks would overrun a 4k stack when doing
 our stack switch and do a WARN_ONCE
Fix up the damn bugs

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Andrea Arcangeli
On Thu, Jul 19, 2007 at 01:39:55AM +0100, Alan Cox wrote:
> > About 4k stacks I was generally against them, much better to fail in
> > fork than to risk corruption. The per-irq stack part is great feature
> > instead (too bad it wasn't enabled for the safer 8k stacks).
> 
> 8K stacks without IRQ stacks are not "safer" so I don't understand your
> comment ?

Ouch, see the reports about 4k stack crashes. I agree they're not
safe w/o irq stacks (like on x86-64), but they're generally safer.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Matt Mackall
On Thu, Jul 19, 2007 at 02:48:37AM +0200, Rene Herman wrote:
> On 07/19/2007 02:41 AM, Matt Mackall wrote:
> 
> >On Thu, Jul 19, 2007 at 02:15:39AM +0200, Andrea Arcangeli wrote:
> 
> >>Using kmalloc(8k) instead of alloc_page() doesn't sound a too big deal
> >>and that will solve the problem.
> >
> >How do you figure?
> >
> >If you're saying that soft pages helps our 8k stack allocations, it
> >doesn't. The memory overhead of soft pages will be higher (5-15%,
> >mostly due to file tails in pagecache) than the level at which 8k
> >stacks currently run into trouble (1-2% free?).
> >
> >Not helpful.
> 
> With tail-packing it is.

Tail packing is a whole new can of worms. Especially as it's very
likely to make performance suffer on small files (the common case).

On the other hand, if someone can demonstrate that tail-packed page
cache doesn't suck, we should put it in mainline pronto. The poor
architectures that are stuck with real 64k pages are sure to
appreciate it.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Rene Herman

On 07/19/2007 02:41 AM, Matt Mackall wrote:


> On Thu, Jul 19, 2007 at 02:15:39AM +0200, Andrea Arcangeli wrote:
> 
>> Using kmalloc(8k) instead of alloc_page() doesn't sound a too big deal
>> and that will solve the problem.
> 
> How do you figure?
> 
> If you're saying that soft pages helps our 8k stack allocations, it
> doesn't. The memory overhead of soft pages will be higher (5-15%,
> mostly due to file tails in pagecache) than the level at which 8k
> stacks currently run into trouble (1-2% free?).
> 
> Not helpful.


With tail-packing it is.

Rene.



Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Matt Mackall
On Thu, Jul 19, 2007 at 02:15:39AM +0200, Andrea Arcangeli wrote:
> On Mon, Jul 16, 2007 at 06:27:55PM -0500, Matt Mackall wrote:
> > So it's absolutely no help in fixing our order-1 allocation problem
> > because we don't want to force large pages on people.
> 
> Using kmalloc(8k) instead of alloc_page() doesn't sound a too big deal
> and that will solve the problem.

How do you figure?

If you're saying that soft pages helps our 8k stack allocations, it
doesn't. The memory overhead of soft pages will be higher (5-15%,
mostly due to file tails in pagecache) than the level at which 8k
stacks currently run into trouble (1-2% free?).

Not helpful.
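
The "file tails" overhead can be sketched with a few lines of C. This
is an illustration only: the helper names are hypothetical and it makes
no claim about the real file-size distribution. Each cached file wastes
whatever is left of its last page, and that tail grows with the page
size.

```c
#include <stddef.h>

/* Bytes left unused in the last page of a file cached with this page size. */
static size_t tail_waste(size_t file_size, size_t page_size)
{
    size_t rem = file_size % page_size;

    return rem ? page_size - rem : 0;
}

/* Total tail waste over a set of file sizes. */
static size_t total_tail_waste(const size_t *sizes, size_t n, size_t page_size)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += tail_waste(sizes[i], page_size);
    return total;
}
```

For four made-up files of 1000, 3000, 5000 and 9000 bytes, the tails
add up to 10672 bytes with 4K pages and 22960 bytes with 8K pages.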

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Alan Cox
> About 4k stacks I was generally against them, much better to fail in
> fork than to risk corruption. The per-irq stack part is great feature
> instead (too bad it wasn't enabled for the safer 8k stacks).

8K stacks without IRQ stacks are not "safer" so I don't understand your
comment ?


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Andrea Arcangeli
On Mon, Jul 16, 2007 at 06:27:55PM -0500, Matt Mackall wrote:
> So it's absolutely no help in fixing our order-1 allocation problem
> because we don't want to force large pages on people.

Using kmalloc(8k) instead of alloc_page() doesn't sound like too big a deal
and that will solve the problem. The whole idea is to avoid the memcpy
+ pte mangling of defrag while hopefully lowering cpu utilization in
allocations at the same time.

About 4k stacks: I was generally against them; much better to fail in
fork than to risk corruption. The per-irq stack part is a great feature,
though (too bad it wasn't enabled for the safer 8k stacks).

Failing in a do_no_page with variable order page size allocation is a
fatal event (the task will be killed); failing in fork is graceful,
userland can retry etc... Fork can fail for different reasons; ulimit
itself is the most likely source of fork failures. I don't think the
8k stacks have ever been a problem: yes, you will run out of stack
sooner (sooner also because the 4k stacks take less memory) but
nothing is terribly wrong if the 8k allocation fails.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Rene Herman

On 07/18/2007 07:19 PM, Phillip Susi wrote:

> Why do the two pages have to be physically contiguous?  The stack just 
> needs to be two contiguous pages in virtual memory, but they can map to 
> any two pages anywhere in physical memory.


As far as I'm aware that's just a consequence of the way Linux does memory 
management. If we ignore highmem, virtual memory is simply +/- PAGE_OFFSET 
away from physical, so allocating virtually contiguous pages that are _not_ 
physically contiguous requires mapping them somewhere (the vmalloc area), 
which is limited. Given that large numbers of threads _are_ the problem, you 
wouldn't solve things -- you'd again be out of space, although now for a 
different reason.
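
The "+/- PAGE_OFFSET" relationship can be written down directly. This
is a sketch with hypothetical helper names, assuming the common 3G/1G
split on 32-bit x86 (PAGE_OFFSET of 0xC0000000) and ignoring highmem as
above.

```c
#include <stdint.h>

#define PAGE_OFFSET 0xC0000000u /* assumed: usual 3G/1G split on 32-bit x86 */

/* Lowmem is mapped linearly: a fixed offset separates virtual from physical. */
static uint32_t virt_to_phys_lowmem(uint32_t vaddr)
{
    return vaddr - PAGE_OFFSET;
}

static uint32_t phys_to_virt_lowmem(uint32_t paddr)
{
    return paddr + PAGE_OFFSET;
}
```

Because the offset is constant, two pages that are adjacent in this
mapping are adjacent in physical memory too. Pairing up scattered
physical pages needs a separate mapping, and that is what uses up
vmalloc space.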


Rene.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Phillip Susi

Alan Cox wrote:
>> Why do the two pages have to be physically contiguous?  The stack just 
>> needs to be two contiguous pages in virtual memory, but they can map to 
>> any two pages anywhere in physical memory.
> 
> Historically we allowed DMA off the stack on old x86 systems. Removing
> that while a good idea would take a lot of auditing. We also have a very
> limited vmalloc window for mapped pages and filling that with stacks
> would be bad.


Wow, DMA off the stack?  That's just crazy.



Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Alan Cox
> Why do the two pages have to be physically contiguous?  The stack just 
> needs to be two contiguous pages in virtual memory, but they can map to 
> any two pages anywhere in physical memory.

Historically we allowed DMA off the stack on old x86 systems. Removing
that while a good idea would take a lot of auditing. We also have a very
limited vmalloc window for mapped pages and filling that with stacks
would be bad.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Phillip Susi

Matt Mackall wrote:
>> As far as I'm aware, the actual reason for 4K stacks is that after the 
>> system has been up and running for some time getting "1 physically 
>> contiguous pages" becomes significantly easier than 2 which wouldn't be 
>> arbitrary.
> 
> If there are exactly two free pages in the system, the odds of them
> being buddies (ie adjacent AND properly aligned) is quite small. The
> available page pool has to grow quite a bit before the availability of
> order-1 page pairs approaches 100%. 
> 
> So if we fail to allocate an 8k stack when we could have allocated a
> 4k stack, we're almost certainly failing significantly prematurely.
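
The "adjacent AND properly aligned" condition quoted above can be
checked by brute force. This is a sketch with hypothetical helper
names. It assumes the two free pages are equally likely to be any pair
of frames, which real fragmentation does not guarantee.

```c
/* Order-0 buddies differ only in the lowest bit of the page frame number. */
static int are_buddies(unsigned long pfn_a, unsigned long pfn_b)
{
    return (pfn_a ^ pfn_b) == 1;
}

/* Count the pairs of distinct frames that could form an order-1 block. */
static unsigned long count_buddy_pairs(unsigned long nr_pages)
{
    unsigned long count = 0;

    for (unsigned long a = 0; a < nr_pages; a++)
        for (unsigned long b = a + 1; b < nr_pages; b++)
            if (are_buddies(a, b))
                count++;
    return count;
}

/* Number of ways to pick two distinct frames out of nr_pages. */
static unsigned long count_all_pairs(unsigned long nr_pages)
{
    return nr_pages * (nr_pages - 1) / 2;
}
```

With 8 frames, 4 of the 28 possible pairs are buddies, i.e. 1 in 7. In
general, for an even number N of frames, there are N/2 buddy pairs out
of N(N-1)/2, so the odds are 1 in N-1. Frames 7 and 8 are adjacent but
not buddies, because the pair is not aligned.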


Why do the two pages have to be physically contiguous?  The stack just 
needs to be two contiguous pages in virtual memory, but they can map to 
any two pages anywhere in physical memory.




Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Rene Herman

On 07/18/2007 06:54 PM, Matt Mackall wrote:

> You can expect the distribution of file sizes to follow a gamma
> distribution, with a large hump towards the small end of the spectrum
> around 1-10K, dropping off very rapidly as file sizes grow.

Okay.

> > Not too sure then that 8K wouldn't be something I'd want, given fewer
> > pagefaults and all that...
>
> Fewer minor pagefaults, perhaps. Readahead already deals with most of
> the major pagefaults that larger pages would.

Mmm, yes.

> Anyway, raising the systemwide memory overhead by up to 15% seems an
> awfully silly way to address the problem of not being able to allocate a
> stack when you're down to your last 1 or 2% of memory!

Well, I've seen larger pagesizes submerge in more situations, specifically
in allocation overhead -- ie, making the struct page's fit in lowmem for
hugemem x86 boxes was the first I heard of it. But yes, otherwise (also)
mostly database loads which obviously have moved to 64-bit since.

Pagecache tail-packing seems like a promising idea to deal with the downside
of larger pages but I'll admit I'm not particularly sure how many _up_ sides
to them are left on x86 (not -64) now that's becoming a legacy architecture
(and since you just shot down the pagefaults thing).


Rene.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Matt Mackall
On Wed, Jul 18, 2007 at 04:38:19AM +0200, Rene Herman wrote:
> On 07/17/2007 01:27 AM, Matt Mackall wrote:
> 
> >Larger soft pages waste tremendous amounts of memory (mostly in page
> >cache) for minimal benefit on, say, the typical desktop. While there
> >are workloads where it's a win, it's probably on a small percentage of
> >machines.
> >
> >So it's absolutely no help in fixing our order-1 allocation problem
> >because we don't want to force large pages on people.
> 
> I was just now looking at how much space is in fact wasted in pagecache for 
> various pagesizes by running the attached dumb little program from a few 
> selected directories (heavy stack recursion, never mind).
> 
> Well, hmmm. This is on a (compiled) git tree:
> 
> [EMAIL PROTECTED]:~/src/linux/local$ pageslack
> total : 447350347
>  4k   : 67738037 (15%)
>  8k   : 147814837 (33%)
> 16k   : 324614581 (72%)
> 32k   : 724629941 (161%)
> 64k   : 1592785333 (356%)
> 
> Nicely constant factor 2.2 instead of the 2 one would expect but oh well. 
> On a collection of larger files the percentages obviously drop. This is on 
> a directory of ogg vorbis files:
> 
> [EMAIL PROTECTED]:/mnt/ogg/.../... # pageslack
> total : 70817974
>  4k   : 26442 (0%)
>  8k   : 67402 (0%)
> 16k   : 124746 (0%)
> 32k   : 288586 (0%)
> 64k   : 419658 (0%)
> 
> The "typical desktop" is presented by neither I guess but does involve 
> audio and (much larger still) video and bloody huge browser apps.

I'd be surprised if a user had substantially more than one OGG, video,
or browser in memory at one time. In fact, you're likely to find only
a fraction of each of those in memory at any given time.

Meanwhile, they're likely to have thousands of small browser cache,
thumbnail, config, icon, maildir, etc. files in cache. And hundreds of
medium-sized libraries, utilities, applications, and so on.

You can expect the distribution of file sizes to follow a gamma
distribution, with a large hump towards the small end of the spectrum
around 1-10K, dropping off very rapidly as file sizes grow.

> Not too sure then that 8K wouldn't be something I'd want, given fewer 
> pagefaults and all that...

Fewer minor pagefaults, perhaps. Readahead already deals with most of
the major pagefaults that larger pages would.

Anyway, raising the systemwide memory overhead by up to 15% seems an
awfully silly way to address the problem of not being able to allocate
a stack when you're down to your last 1 or 2% of memory! In all
likelihood, we'll fail sooner because we're completely OOM.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Nick Craig-Wood
Nick Craig-Wood <[EMAIL PROTECTED]> wrote:
>  Zan Lynx <[EMAIL PROTECTED]> wrote:
> >  There *are* crashes from LVM and ext3.  I had to change kernels to avoid
> >  them.
> > 
> >  I had crashes with ext3 on LVM snapshot on DM mirror on SATA.
> 
>  We've noticed these too... ext3/LVM/raid0/sata seems fine.  If you add
>  snapshot in that mix then it becomes rather unreliable.

I meant raid1 up there not raid0!

-- 
Nick Craig-Wood <[EMAIL PROTECTED]> -- http://www.craig-wood.com/nick


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Nick Craig-Wood
Zan Lynx <[EMAIL PROTECTED]> wrote:
>  There *are* crashes from LVM and ext3.  I had to change kernels to avoid
>  them.
> 
>  I had crashes with ext3 on LVM snapshot on DM mirror on SATA.

We've noticed these too... ext3/LVM/raid0/sata seems fine.  If you add
snapshot in that mix then it becomes rather unreliable.

-- 
Nick Craig-Wood <[EMAIL PROTECTED]> -- http://www.craig-wood.com/nick


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Ray Lee

On 7/17/07, Alan Cox <[EMAIL PROTECTED]> wrote:
> On Mon, 16 Jul 2007 16:15:28 -0700
> "Ray Lee" <[EMAIL PROTECTED]> wrote:
> > Heh :-). No, it's not a question of trust. First and foremost, it's
> > that there are still users who say that they can crash a current
> > 4k+interrupt stacks kernel, while the 8k without interrupt stacks is
> > fine.
>
> You forgot "most of the time".

Yeah, fair enough.

> It's statistically less likely, which
> merely means it's evilly hard to debug

Not being able to debug the cases that occur (and the fact that
they're rare, as you're pointing out) is as much of a problem as the
crashes themselves. 8k + IRQ stacks with a warning when 4k of process
stack is exceeded would seem like a reasonable first step to making 4k
a palatable default.

Ray


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Phillip Susi

Alan Cox wrote:
> > Why do the two pages have to be physically contiguous?  The stack just
> > needs to be two contiguous pages in virtual memory, but they can map to
> > any two pages anywhere in physical memory.
>
> Historically we allowed DMA off the stack on old x86 systems. Removing
> that while a good idea would take a lot of auditing. We also have a very
> limited vmalloc window for mapped pages and filling that with stacks
> would be bad.

Wow, DMA off the stack?  That's just crazy.



Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Rene Herman

On 07/18/2007 07:19 PM, Phillip Susi wrote:

> Why do the two pages have to be physically contiguous?  The stack just
> needs to be two contiguous pages in virtual memory, but they can map to
> any two pages anywhere in physical memory.

As far as I'm aware that's just a consequence of the way Linux does memory
management. If we ignore highmem, virtual memory is simply +/- PAGE_OFFSET
away from physical so allocating virtually contiguous pages that are _not_
physically contiguous requires mapping them somewhere (the vmalloc area)
which is limited. Given that large numbers of threads _are_ the problem you
wouldn't solve things -- you'd again be out of space, although now for a
different reason.


Rene.
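
The "+/- PAGE_OFFSET" relationship and the vmalloc limit can be made concrete with a short C sketch. All constants here are assumptions for illustration: a PAGE_OFFSET of 0xC0000000 (the usual 3G/1G split on 32-bit x86), a 128 MiB vmalloc window, and one 4k guard page per mapping. The helper names are hypothetical and do not correspond to kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_4K   4096u
#define PAGE_OFFSET_32 0xC0000000u		/* assumed 3G/1G split */
#define VMALLOC_WINDOW (128u * 1024u * 1024u)	/* assumed window size */

/* Lowmem direct map: virtual = physical + PAGE_OFFSET. */
uint32_t lowmem_phys_to_virt(uint32_t phys)
{
	assert(phys <= UINT32_MAX - PAGE_OFFSET_32);
	return phys + PAGE_OFFSET_32;
}

/* Inverse of the direct map: physical = virtual - PAGE_OFFSET. */
uint32_t lowmem_virt_to_phys(uint32_t virt)
{
	assert(virt >= PAGE_OFFSET_32);
	return virt - PAGE_OFFSET_32;
}

/* How many stacks fit in the window if each stack is mapped
 * separately and followed by one guard page. */
uint32_t max_mapped_stacks(uint32_t window, uint32_t stack_size)
{
	assert(stack_size > 0);
	return window / (stack_size + PAGE_SIZE_4K);
}
```

Because the direct map is a fixed offset, two consecutive virtual pages in lowmem are always two consecutive physical pages. Under the assumed sizes, `max_mapped_stacks(VMALLOC_WINDOW, 8192u)` gives 10922, so mapping every 8k stack individually would run out of window well before a heavily threaded box ran out of memory.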


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Andrea Arcangeli
On Mon, Jul 16, 2007 at 06:27:55PM -0500, Matt Mackall wrote:
> So it's absolutely no help in fixing our order-1 allocation problem
> because we don't want to force large pages on people.

Using kmalloc(8k) instead of alloc_page() doesn't sound a too big deal
and that will solve the problem. The whole idea is to avoid the memcpy
+ pte mangling of defrag while hopefully lowering cpu utilization in
allocations at the same time.

About 4k stacks I was generally against them, much better to fail in
fork than to risk corruption. The per-irq stack part is great feature
instead (too bad it wasn't enabled for the safer 8k stacks).

Failing in a do_no_page with variable order page size allocation is a
fatal event (the task will be killed), failing in fork is graceful,
userland can retry etc... Fork can fail for different reasons, ulimit
itself is the most likely source of fork failures. I don't think the
8k stacks have ever been a problem, yes you will run out of stack
sooner (sooner also because the 4k stacks takes less memory) but
nothing is terribly wrong if the 8k allocation fails.
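
As an illustration of "failing in fork is graceful", userland can treat a failed fork() as a transient condition and try again. The sketch below is an assumption about how an application might do that, not something posted in the thread: `fork_with_retry` is a hypothetical helper, and the retry count and delay are arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Call fork(), retrying up to `attempts` times when it fails with
 * EAGAIN or ENOMEM. Returns what fork() returns; on final failure
 * returns -1 with errno left as set by the last fork() call. */
pid_t fork_with_retry(int attempts, long delay_ms)
{
	struct timespec delay;
	pid_t pid;

	assert(attempts > 0 && delay_ms >= 0);
	delay.tv_sec = delay_ms / 1000;
	delay.tv_nsec = (delay_ms % 1000) * 1000000L;

	while (attempts-- > 0) {
		pid = fork();
		if (pid >= 0)
			return pid;	/* parent (pid > 0) or child (pid == 0) */
		if (errno != EAGAIN && errno != ENOMEM)
			break;		/* not a resource shortage, give up */
		if (attempts > 0) {
			int saved = errno;

			nanosleep(&delay, NULL);	/* back off before retrying */
			errno = saved;
		}
	}
	return -1;
}
```

A caller would use it exactly like fork(): a return of 0 means "in the child", a positive value is the child's pid, and -1 means every attempt failed and the program can report the error and carry on.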


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Alan Cox
> About 4k stacks I was generally against them, much better to fail in
> fork than to risk corruption. The per-irq stack part is great feature
> instead (too bad it wasn't enabled for the safer 8k stacks).

8K stacks without IRQ stacks are not safer so I don't understand your
comment?


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Matt Mackall
On Thu, Jul 19, 2007 at 02:15:39AM +0200, Andrea Arcangeli wrote:
> On Mon, Jul 16, 2007 at 06:27:55PM -0500, Matt Mackall wrote:
> > So it's absolutely no help in fixing our order-1 allocation problem
> > because we don't want to force large pages on people.
>
> Using kmalloc(8k) instead of alloc_page() doesn't sound a too big deal
> and that will solve the problem.

How do you figure?

If you're saying that soft pages helps our 8k stack allocations, it
doesn't. The memory overhead of soft pages will be higher (5-15%,
mostly due to file tails in pagecache) than the level at which 8k
stacks currently run into trouble (1-2% free?).

Not helpful.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Rene Herman

On 07/19/2007 02:41 AM, Matt Mackall wrote:

> On Thu, Jul 19, 2007 at 02:15:39AM +0200, Andrea Arcangeli wrote:
>
> > Using kmalloc(8k) instead of alloc_page() doesn't sound a too big deal
> > and that will solve the problem.
>
> How do you figure?
>
> If you're saying that soft pages helps our 8k stack allocations, it
> doesn't. The memory overhead of soft pages will be higher (5-15%,
> mostly due to file tails in pagecache) than the level at which 8k
> stacks currently run into trouble (1-2% free?).
>
> Not helpful.

With tail-packing it is.

Rene.



Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Matt Mackall
On Thu, Jul 19, 2007 at 02:48:37AM +0200, Rene Herman wrote:
> On 07/19/2007 02:41 AM, Matt Mackall wrote:
>
> > On Thu, Jul 19, 2007 at 02:15:39AM +0200, Andrea Arcangeli wrote:
> >
> > > Using kmalloc(8k) instead of alloc_page() doesn't sound a too big deal
> > > and that will solve the problem.
> >
> > How do you figure?
> >
> > If you're saying that soft pages helps our 8k stack allocations, it
> > doesn't. The memory overhead of soft pages will be higher (5-15%,
> > mostly due to file tails in pagecache) than the level at which 8k
> > stacks currently run into trouble (1-2% free?).
> >
> > Not helpful.
>
> With tail-packing it is.

Tail packing is a whole new can of worms. Especially as it's very
likely to make performance suffer on small files (the common case).

On the other hand, if someone can demonstrate that tail-packed page
cache doesn't suck, we should put it in mainline pronto. The poor
architectures that are stuck with real 64k pages are sure to
appreciate it.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Andrea Arcangeli
On Thu, Jul 19, 2007 at 01:39:55AM +0100, Alan Cox wrote:
> > About 4k stacks I was generally against them, much better to fail in
> > fork than to risk corruption. The per-irq stack part is great feature
> > instead (too bad it wasn't enabled for the safer 8k stacks).
>
> 8K stacks without IRQ stacks are not safer so I don't understand your
> comment ?

Ouch, see the reports about 4k stack crashes. I agree they're not
safe w/o irq stacks (like on x86-64), but they're generally safer.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Matt Mackall
On Thu, Jul 19, 2007 at 03:33:58AM +0200, Andrea Arcangeli wrote:
> On Thu, Jul 19, 2007 at 01:39:55AM +0100, Alan Cox wrote:
> > > About 4k stacks I was generally against them, much better to fail in
> > > fork than to risk corruption. The per-irq stack part is great feature
> > > instead (too bad it wasn't enabled for the safer 8k stacks).
> >
> > 8K stacks without IRQ stacks are not safer so I don't understand your
> > comment ?
>
> Ouch, see the reports about 4k stack crashes. I agree they're not
> safe w/o irq stacks (like on x86-64), but they're generally safer.

Here's a way to make forward progress on this whole thing:

- Turn on irqstacks when using 8k stacks
- Detect when usage with 8k stacks would overrun a 4k stack when doing
  our stack switch and do a WARN_ONCE
- Fix up the damn bugs
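
The "detect when usage with 8k stacks would overrun a 4k stack" step above could look roughly like this in plain C. This is a userspace model, not the kernel change being discussed: the function names, the explicit `base` argument, and the flat 4096-byte limit are assumptions (a real 4k stack would presumably also have to hold per-thread bookkeeping, so the usable limit would be lower), and a static flag stands in for warn-once behaviour.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define STACK_SIZE_8K 8192u
#define LIMIT_4K      4096u

/* Bytes in use on a downward-growing stack occupying [base, base + size). */
uint32_t stack_bytes_used(uintptr_t base, uint32_t size, uintptr_t sp)
{
	assert(sp >= base && sp <= base + size);
	return (uint32_t)(base + size - sp);
}

/* Returns 1 the first time usage exceeds what a 4k stack could hold,
 * and 0 otherwise (including on later overruns, so the log is not
 * flooded by a repeatedly deep call chain). */
int warn_once_if_over_4k(uintptr_t base, uintptr_t sp)
{
	static int warned;
	uint32_t used = stack_bytes_used(base, STACK_SIZE_8K, sp);

	if (used <= LIMIT_4K || warned)
		return 0;
	warned = 1;
	fprintf(stderr, "stack usage %u bytes would overrun a 4k stack\n",
		(unsigned)used);
	return 1;
}
```

Called at the point where the stack pointer is already being inspected (for example when switching to an interrupt stack), a check like this costs one subtraction and one compare, and the resulting report identifies the deep call chain without crashing the machine.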

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-18 Thread Rene Herman

On 07/19/2007 03:37 AM, Matt Mackall wrote:

> Here's a way to make forward progress on this whole thing:
>
> Turn on irqstacks when using 8k stacks

WLI: are you submitting? Makes great sense regardless of anything and
they've been tested silly with 4KSTACKS already...

> Detect when usage with 8k stacks would overrun a 4k stack when doing
> our stack switch and do a WARN_ONCE

Our stack switch?

> Fix up the damn bugs

DM of course is fairly layered-by-design so I _hope_ they can be classified
as simple bugs...


Rene.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-17 Thread Rene Herman

On 07/17/2007 01:27 AM, Matt Mackall wrote:

> Larger soft pages waste tremendous amounts of memory (mostly in page
> cache) for minimal benefit on, say, the typical desktop. While there
> are workloads where it's a win, it's probably on a small percentage of
> machines.
>
> So it's absolutely no help in fixing our order-1 allocation problem
> because we don't want to force large pages on people.


I was just now looking at how much space is in fact wasted in pagecache for 
various pagesizes by running the attached dumb little program from a few 
selected directories (heavy stack recursion, never mind).


Well, hmmm. This is on a (compiled) git tree:

[EMAIL PROTECTED]:~/src/linux/local$ pageslack
total   : 447350347
 4k : 67738037 (15%)
 8k : 147814837 (33%)
16k : 324614581 (72%)
32k : 724629941 (161%)
64k : 1592785333 (356%)

Nicely constant factor 2.2 instead of the 2 one would expect but oh well. On 
a collection of larger files the percentages obviously drop. This is on a 
directory of ogg vorbis files:


[EMAIL PROTECTED]:/mnt/ogg/.../... # pageslack
total   : 70817974
 4k : 26442 (0%)
 8k : 67402 (0%)
16k : 124746 (0%)
32k : 288586 (0%)
64k : 419658 (0%)

The "typical desktop" is presented by neither I guess but does involve audio 
and (much larger still) video and bloody huge browser apps.


Not too sure then that 8K wouldn't be something I'd want, given fewer 
pagefaults and all that...


Rene.

/* gcc -W -Wall -o pageslack pageslack.c */

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <limits.h>

#define PAGE_SIZE (1UL << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE - 1))

unsigned long long total;	/* bytes in all regular files seen */
unsigned long long slack[5];	/* unused tail bytes for 4k..64k pages */

/* Recursively walk 'name', summing file sizes and per-pagesize slack. */
void do_dir(const char *name)
{
	DIR *dir;
	struct dirent *ent;

	dir = opendir(name);
	if (!dir) {
		perror("opendir");
		exit(EXIT_FAILURE);
	}
	while ((ent = readdir(dir))) {
		struct stat buf;
		char path[PATH_MAX];

		if (!strcmp(ent->d_name, "."))
			continue;
		if (!strcmp(ent->d_name, ".."))
			continue;

		/* refuse to continue with a truncated path */
		if (snprintf(path, sizeof(path), "%s/%s", name,
			     ent->d_name) >= (int)sizeof(path)) {
			fprintf(stderr, "path too long\n");
			exit(EXIT_FAILURE);
		}
		if (stat(path, &buf)) {
			perror("stat");
			exit(EXIT_FAILURE);
		}
		if (S_ISDIR(buf.st_mode)) {
			do_dir(path);
			continue;
		}
		if (S_ISREG(buf.st_mode)) {
			int i;

			for (i = 0; i < 5; i++) {
				/* PAGE_SIZE expands using this local shift */
				unsigned long PAGE_SHIFT = 12 + i;
				slack[i] += (PAGE_SIZE - (buf.st_size %
						PAGE_SIZE)) % PAGE_SIZE;
			}
			total += buf.st_size;
		}
	}
	if (closedir(dir)) {
		perror("closedir");
		exit(EXIT_FAILURE);
	}
}

int main(void)
{
	do_dir(".");
	printf("total\t: %llu\n", total);
	if (!total)	/* avoid dividing by zero on an empty tree */
		return EXIT_SUCCESS;
	printf(" 4k\t: %llu (%llu%%)\n", slack[0], (100 * slack[0]) / total);
	printf(" 8k\t: %llu (%llu%%)\n", slack[1], (100 * slack[1]) / total);
	printf("16k\t: %llu (%llu%%)\n", slack[2], (100 * slack[2]) / total);
	printf("32k\t: %llu (%llu%%)\n", slack[3], (100 * slack[3]) / total);
	printf("64k\t: %llu (%llu%%)\n", slack[4], (100 * slack[4]) / total);
	return EXIT_SUCCESS;
}
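
The slack formula used in the program above can be checked by hand for a single file. The helper below is a small sketch of that one expression; `page_slack` is a hypothetical name introduced here for illustration and is not part of the posted program.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes left unused in the last page when a file of `size` bytes is
 * cached in pages of (1 << page_shift) bytes. A file that is an exact
 * multiple of the page size wastes nothing. */
uint64_t page_slack(uint64_t size, unsigned page_shift)
{
	uint64_t page_size;

	assert(page_shift < 64);
	page_size = UINT64_C(1) << page_shift;
	return (page_size - size % page_size) % page_size;
}
```

For a 5000-byte file this gives 3192 bytes of slack with 4k pages, the same 3192 bytes with 8k pages, and 11384 bytes with 16k pages. Once a file is smaller than the page, each doubling of the page size adds the whole extra half to the waste, which may be part of why the totals measured on the source tree grow by slightly more than a factor of 2 per step.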


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-17 Thread Rene Herman

On 07/18/2007 01:39 AM, Jesper Juhl wrote:

> On 17/07/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:
>
> > At hch's suggestion I rewrote the separate IRQ stack configurability
> > patch into one making IRQ stacks mandatory and unconfigurable, and
> > hence enabled with 8K stacks.
>
> For what it's worth, that sounds good to me - like something that we
> would want merged.

Yes, separate IRQ stacks make eminent sense in their own right.

Andrea Arcangeli's current thread on soft pages:

http://lkml.org/lkml/2007/7/6/346

is also interesting though in the context of 1-page stacks.

Rene.



Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-17 Thread Alan Cox
> I can't speak for Fedora, but RHEL disables XFS in their kernel likely
> because it is known to cause problems with 4K stacks.

-was- - the SGI folks submitted patches to deal with some gcc problems
with stack usage.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-17 Thread Alan Cox
On Mon, 16 Jul 2007 16:15:28 -0700
"Ray Lee" <[EMAIL PROTECTED]> wrote:

> On 7/16/07, Rene Herman <[EMAIL PROTECTED]> wrote:
> > Yes but it's also an argument that the 4K stacks don't make the _current_
> > situation without CONFIG_4KSTACKS selected worse and given that you trust
> > that current situation, that leaves you without your argument :-)
> 
> Heh :-). No, it's not a question of trust. First and foremost, it's
> that there are still users who say that they can crash a current
> 4k+interrupt stacks kernel, while the 8k without interrupt stacks is
> fine.

You forgot "most of the time". It's statistically less likely, which
merely means it's evilly hard to debug.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-17 Thread Rene Herman

On 07/18/2007 01:19 AM, Bodo Eggert wrote:

> Please post a list of things you have designed, so I can avoid them.


- The ability to read
- The ability to understand

You're doing a hell of a job already.

Rene.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-17 Thread utz lehmann
On Tue, 2007-07-17 at 10:45 -0400, John Stoffel wrote:
> utz> I have to recompile the fedora kernel rpms (fc6, f7) with 8k
> utz> stacks on my i686 server. It's using NFS -> XFS -> DM -> MD
> utz> (raid1) -> IDE disks.  With 4k stacks it crash (hang) within
> utz> minutes after using NFS.  With 8k stacks it's rock solid. No
> utz> crashes within months.
> 
> Does it give any useful information when it does crash?
>  
No, sorry. Nearly always it locked up so hard that even sysrq didn't work
anymore. Most times the console was blanked. If not, there was a line
with "do_irq" or something like that (if I remember correctly).
A few times it was continuously oopsing (scrolling like mad).

I think it's just a stack overflow. Knowing that XFS + long IO stack
have problems with 4k stacks. And i have zero crashes with the
recompiled 8k stack kernels. (All kernel are the fedora ones).

Btw: In the past the server runs on slightly different hardware and
without raid1 (NFS -> XFS -> DM -> IDE disk). It runs with 4k stacks. I
had a few crashes, but i blame the hardware for it.

I don't want to make tests with the server. It's my main data storage
and i don't want to risk it.

>  Can you make
> a simple test case using ram disks instead of IDE disks and then
> building upon that?

Sorry, i don't think i can do this. My other computer, which i can use
for tests, is x86_64 based.
And AFAIK the problem on the XFS side has something to do with looking
for freespace on many AGs. So maybe a bigger and filled filesystem is
needed. And 50GB ram disks are out of question.

utz




Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-17 Thread Jesper Juhl

On 17/07/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:
> At some point in the past, I wrote:
> >> If at some point one of the pro-4k stacks crowd can prove that all
> >> code paths are safe, or introduce another viable alternative (such as
> >> Matt's idea for extending the stack dynamically), then removing the 8k
> >> stacks option makes sense.
>
> On Mon, Jul 16, 2007 at 11:54:38PM +0100, Alan Cox wrote:
> > Any x86-32 path unsafe with 4K stacks is almost certainly unsafe with 8K
> > stacks because the 8K stacks do not have separate IRQ stack paths, so you
> > have the same space but split. It might be less predictable on 8K stacks
> > but it isn't absent.
>
> At hch's suggestion I rewrote the separate IRQ stack configurability
> patch into one making IRQ stacks mandatory and unconfigurable, and
> hence enabled with 8K stacks.


For what it's worth, that sounds good to me - like something that we
would want merged.

--
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-17 Thread Arjan van de Ven

> 1) It all can be reduced to 4K + 4K by assuming all IRQs happen on one CPU.

no it's separate stacks for soft and hard irqs, so it's really 4+4+4


another angle is that while correctness rules, userspace correctness
rules as well. If you can't fork enough threads for what you need the
machine for, why have the machine in the first place?
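
A rough sketch of the memory arithmetic behind this point, not taken from
any patch in the thread. It assumes one 8K stack per task in the unsplit
case, and one 4K stack per task plus one 4K hard-IRQ and one 4K soft-IRQ
stack per CPU in the split case (the "4+4+4" layout). The function names
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define KB ((size_t)1024)

/* Total kernel stack memory with one shared 8K stack per task. */
static size_t stack_bytes_8k(size_t nr_tasks)
{
    return nr_tasks * 8 * KB;
}

/* Total with 4K task stacks plus per-CPU hard-IRQ and soft-IRQ
 * stacks of 4K each (assumed layout, see the note above). */
static size_t stack_bytes_4k(size_t nr_tasks, size_t nr_cpus)
{
    return nr_tasks * 4 * KB + nr_cpus * 2 * 4 * KB;
}
```

With made-up figures of 10000 tasks on 4 CPUs this comes to roughly 78 MiB
against roughly 39 MiB. The per-CPU IRQ stacks are a fixed cost that does
not grow with the number of threads.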

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org



Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-17 Thread Bodo Eggert
On Tue, 17 Jul 2007, Rene Herman wrote:
> On 07/17/2007 12:06 PM, Bodo Eggert wrote:
> > On Tue, 17 Jul 2007, Rene Herman wrote:
> >> On 07/17/2007 01:45 AM, Bodo Eggert wrote:

> >>> You claim 4k+4k is safe, therefore 8k must be safe, too.
> >> 
> >> No, I most certainly do not. I claim proving that 4K and separate (per cpu)
> >> interrupt stacks are safe are exactly the same as proving unshared 8K stacks
> >> are safe. That is, you don't, no such proof exists other than in the eating
> >> of the pudding.
> > 
> > And yet you have a more strict claim than I do. If you are right, I'll be
> > right, too, because two times less-than-4K is less than 8K.
> 
> Firstly, it's not two times 4K but 4K + (4K + 4K) * NR_CPUS. Secondly, _you_ 
> are the one making claims -- specifically that !CONFIG_4KSTACKS is "safer", 
> happily ignoring the fact that generally speaking available process stack 
> can be _better_ with CONFIG_4KSTACKS

It can be better, but the worst case stays 4K + 4K - unless one CPU will 
walk over to the next and nicely ask for a cup of stack.

Therefore you can discuss 4K + 4K or 4K + 4K + 4K, or 4K + 4K * \inf. It 
won't change a thing:

1) It all can be reduced to 4K + 4K by assuming all IRQs happen on one CPU.
2) Even if the interrupts decide not to happen on one CPU, you still can't 
   fit that possible 5K into 4K.
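
A minimal sketch of the two points above, using only the illustrative
figures that appear in this thread (5K of process-context stack, 3K of
interrupt stack); these are not measurements. The helper names are
hypothetical. On a shared stack only the sum of the two depths matters; on
split stacks each context has to fit on its own stack, and spare room on
the IRQ stack cannot be lent to the process stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define KB ((size_t)1024)

/* Shared stack: interrupts nest on top of process context,
 * so the two depths add up on a single stack. */
static bool fits_shared(size_t process_depth, size_t irq_depth,
                        size_t stack_size)
{
    assert(stack_size > 0);
    return process_depth + irq_depth <= stack_size;
}

/* Split stacks: each context must fit its own stack independently. */
static bool fits_split(size_t process_depth, size_t irq_depth,
                       size_t process_stack, size_t irq_stack)
{
    assert(process_stack > 0 && irq_stack > 0);
    return process_depth <= process_stack && irq_depth <= irq_stack;
}
```

With these helpers, fits_shared(5 * KB, 3 * KB, 8 * KB) holds while
fits_split(5 * KB, 3 * KB, 4 * KB, 4 * KB) does not. The reverse case
exists too: 4K of process stack with 4K of interrupt nesting fits the
split layout exactly, but 5K plus 4K overflows a shared 8K stack.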

Having a local stack per CPU helps locality, and it's good, but that's 
about it.

> and there seems to exist but _one_ 
> (één, ein, une) known situation where it's problematic.

One case is reason enough not to enable 4K-stacks per default, and this 
is a common server setup. "server" as in "I need a reliable system".

> Must there be none rather than one? In some senses maybe, if the problem is 
> more than bad, fixable code but I doubt you know this. CONFIG_4KSTACKS is 
> much better on the VM (and hence faster) and as such,

"Look how fast I crashed!" doesn't buy you anything. In order to finish 
first, you first got to finish.

> any user not using the 
> one nicely isolated and identified problem case benefits from it.

And they can turn it on.

> This means 
> it's either very close or already _at_ the point of being the best default 
> for the kernel. Changing options is for users with special needs, as you 
> believe you are.

If you designed a car, you would also go for brakes with a well-known 
problem just because they weigh less and all the people not crossing 
mountains would be happy about the weight benefit - that is if they'd 
notice, wouldn't you?

Please post a list of things you have designed, so I can avoid them.

> I truly apologise for taking it into this direction but you're wearing me 
> down rapidly.

I put the facts onto the ground. If you're getting down, you may stumble 
on them. Beware

> Every single time you insert some uninformed crap comment that 
> shows that you both don't understand the issue and didn't understand what 
> the other person was saying and then after being made aware of such, ignore 
> that and follow up with the next uninformed crap comment.

So what did you say about the worst case stack size being bigger than 4K?
That's correct, you choose to put it aside as a minor use case. Yea, it's
just the combination you'd choose for a reliable server setup, the users
won't have a problem when their systems crash ...

Was your claim about each CPU having a separate stack helping your cause?
No, everybody can see it's not. That is, except for you, your CPU will
just borrow some, since their neighbours have some free stack.


But let's not stop here: You claimed: "Unshared interrupt 
stacks make for more determistisc behaviour, so you'd have a harder time 
proven anything to some set limit of uncertainty with the shared 8K stacks 
than with the unshared 4K stacks."

So you want to tell me I can't prove 8K stacks are safe - you are right.
But can you prove 4K stacks are safe? You can't either. But you want to be
able to prove it. I told you to stick to your words - go and prove 4K+4K
to be safe. What did you do? You chose to ignore that.

I bet you don't even consider proving 4K stacks to be correct, nor do you
know anyone who would try that in the near future. And besides that, you
know at least one case where your proof would fail. So why would you talk
about proofs? Do you think you can trick me into believing a crashing 
system would work more correctly than a non-crashing one, just by 
mentioning the possibility of having it easier to make a proof?


Besides that, I told you you can separate unshared interrupt stacks from 
4K stacks. And yet, you still argue as if 4K stacks are required for 
having a separate interrupt stack.


So who's ignoring facts, who is talking crap?

> That is, you seem 
> to care less about the issue than about the discussion and since for me it's 
> quite the other way around I'm leaving it at that.

I know very well you're going to turn linux into a bleeding edge system 
where you're 

Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-17 Thread Zan Lynx
On Tue, 2007-07-17 at 18:52 +0200, Rene Herman wrote:
> On 07/17/2007 06:14 PM, Shawn Bohrer wrote:
> 
> > I can't speak for Fedora, but RHEL disables XFS in their kernel likely
> > because it is known to cause problems with 4K stacks.
> 
> Okay. So is it fair to say it's largely XFS that's the problem? No problems 
> with LVM/MD and say plain ext?

There *are* crashes from LVM and ext3.  I had to change kernels to avoid
them.

I had crashes with ext3 on LVM snapshot on DM mirror on SATA.
-- 
Zan Lynx <[EMAIL PROTECTED]>




Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-17 Thread William Lee Irwin III
At some point in the past, I wrote:
>> If at some point one of the pro-4k stacks crowd can prove that all
>> code paths are safe, or introduce another viable alternative (such as
>> Matt's idea for extending the stack dynamically), then removing the 8k
>> stacks option makes sense.

On Mon, Jul 16, 2007 at 11:54:38PM +0100, Alan Cox wrote:
> Any x86-32 path unsafe with 4K stacks is almost certainly unsafe with 8K
> stacks because the 8K stacks do not have separate IRQ stack paths, so you
> have the same space but split. It might be less predictable on 8K stacks
> but it isn't absent.

At hch's suggestion I rewrote the separate IRQ stack configurability
patch into one making IRQ stacks mandatory and unconfigurable, and
hence enabled with 8K stacks.


-- wli


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-17 Thread Rene Herman

On 07/17/2007 06:14 PM, Shawn Bohrer wrote:

> I can't speak for Fedora, but RHEL disables XFS in their kernel likely
> because it is known to cause problems with 4K stacks.

Okay. So is it fair to say it's largely XFS that's the problem? No problems
with LVM/MD and say plain ext? If that's the case, I believe it could be
concluded that it's not something in any sense fundamentally unfixable and
the question becomes why XFS isn't fixed...

>> Well, no. "oldconfig" works fine, and other than that, all failure modes
>> I've heard about also in this thread are MD/LVM/XFS. This is extremely
>> widely tested stuff in at least Fedora and RHEL.
>
> Again don't assume that because Fedora and RHEL have 4K stacks means
> that MD/LVM/XFS is widely tested.

No, quite, that specific combination was reported in this thread alone 3
times again, so that one's clear, but _other_ than that, I've heard of no
other failure modes.

> Additionally I think I should point out that the problems pointed out so
> far are not the only problem areas with 4K stacks.  There are out of
> tree drivers to consider as well, and use cases like ndiswrapper.


Except these. Good to have pointed out, thanks, but as far as I'm concerned 
both these cases do not get a say in what's default configuration for the 
kernel.org kernel. They might get a say in what's removed or not removed 
from that kernel but that's not under discussion at the moment (nor would I 
expect it to be anytime soon if ever).


Rene.



Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-17 Thread Shawn Bohrer
On Tue, Jul 17, 2007 at 02:57:45AM +0200, Rene Herman wrote:
>  True enough. I'm rather wondering though why RHEL is shipping with it if 
>  it's a _real_ problem. Scribbling junk all over kernel memory would be the 
>  kind of thing I'd imagine you'd mightely piss-off enterprise customers with. 
>  But well, sure, that rather quickly becomes a self-referential argument I 
>  guess.

I can't speak for Fedora, but RHEL disables XFS in their kernel likely
because it is known to cause problems with 4K stacks.

>  Well, no. "oldconfig" works fine, and other than that, all failure modes 
>  I've heard about also in this thread are MD/LVM/XFS. This is extremely 
>  widely tested stuff in at least Fedora and RHEL.

Again don't assume that because Fedora and RHEL have 4K stacks means
that MD/LVM/XFS is widely tested.

Additionally I think I should point out that the problems pointed out so
far are not the only problem areas with 4K stacks.  There are out of
tree drivers to consider as well, and use cases like ndiswrapper.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-17 Thread John Stoffel

utz> On Tue, 2007-07-17 at 00:28 +0200, Rene Herman wrote:
>> Given that as Arjan stated Fedora and even RHEL have been using 4K stacks 
>> for some time now, and certainly the latter being a distribution which I 
>> would expect to both host a relatively large number of lvm/md/xfs and what 
>> stackeaters have you users and to be fairly conservative with respect to the 
>> chances of scribbling over kernel memory (I'm a trusting person...) it seems 
>> there might at this stage only be very few offenders left.

utz> I have to recompile the fedora kernel rpms (fc6, f7) with 8k
utz> stacks on my i686 server. It's using NFS -> XFS -> DM -> MD
utz> (raid1) -> IDE disks.  With 4k stacks it crash (hang) within
utz> minutes after using NFS.  With 8k stacks it's rock solid. No
utz> crashes within months.

Does it give any useful information when it does crash?  Can you make
a simple test case using ram disks instead of IDE disks and then
building upon that?  

I think I should try to do this myself at some point...

John


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-17 Thread Rene Herman

On 07/17/2007 12:06 PM, Bodo Eggert wrote:

> On Tue, 17 Jul 2007, Rene Herman wrote:
>> On 07/17/2007 01:45 AM, Bodo Eggert wrote:
>>> You claim 4k+4k is safe, therefore 8k must be safe, too.
>>
>> No, I most certainly do not. I claim proving that 4K and separate (per cpu)
>> interrupt stacks are safe are exactly the same as proving unshared 8K stacks
>> are safe. That is, you don't, no such proof exists other than in the eating
>> of the pudding.
>
> And yet you have a more strict claim than I do. If you are right, I'll be
> right, too, because two times less-than-4K is less than 8K.


Firstly, it's not two times 4K but 4K + (4K + 4K) * NR_CPUS. Secondly, _you_ 
are the one making claims -- specifically that !CONFIG_4KSTACKS is "safer", 
happily ignoring the fact that generally speaking available process stack 
can be _better_ with CONFIG_4KSTACKS and there seems to exist but _one_ 
(één, ein, une) known situation where it's problematic.


Must there be none rather than one? In some senses maybe, if the problem is 
more than bad, fixable code but I doubt you know this. CONFIG_4KSTACKS is 
much better on the VM (and hence faster) and as such, any user not using the 
one nicely isolated and identified problem case benefits from it. This means 
it's either very close or already _at_ the point of being the best default 
for the kernel. Changing options is for users with special needs, as you 
believe you are.


I truly apologise for taking it into this direction but you're wearing me 
down rapidly. Every single time you insert some uninformed crap comment that 
shows that you both don't understand the issue and didn't understand what 
the other person was saying and then after being made aware of such, ignore 
that and follow up with the next uninformed crap comment. That is, you seem 
to care less about the issue than about the discussion and since for me it's 
quite the other way around I'm leaving it at that.


RedHat is the one with the actual data available, and they've been enabling 
4KSTACKS for quite some time now (with some of their users apparently 
unhappy about it but not many it would seem).


Jesper also already posted how he's going to proceed: lift 4K from debug 
status and submit it as default for -mm. As to the latter bit, unless I 
remember wrong, it already _was_ default in -mm for some time a while ago so 
Andrew no doubt has an informed opinion on how to proceed with that.


Rene.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-17 Thread Rene Herman

On 07/17/2007 01:38 AM, Matt Mackall wrote:

> On Sun, Jul 15, 2007 at 12:19:15AM +0200, Rene Herman wrote:
>
>> Quite. Ofcourse, saying "our stacks are 1 page" would be the by far
>> easiest solution to that. Personally, I've been running with 4K stacks
>> exclusively on a variety of machines for quite some time now, but I
>> can't say I'm all too adventurous with respect to filesystems
>> (especially) so I'm not sure how many problems remain with 4K stacks. I
>> did recently see Andrew Morton say that problems _do_ still exist. If
>> it's just XFS -- well, heck...
>
> One long-standing problem is DM/LVM. That -may- be fixed now, but I
> suspect issues remain.

Three cases were reported again in this thread alone yes. Problems do seem
to be nicely isolated to that specific issue...

>>> int growstack(int headroom, int func, void *data)
>>> {
>>
>> [ ... ]
>>
>>> }
>>
>> This would also need something to tell func() where its current_thread_info
>> is now at.
>
> That'd be handled in the usual way by switch_to_new_stack. That is,
> we'd store the location of the old stack at the top of the new stack
> and then literally change everything to point to the new stack.


I might not understand what you're saying but I don't believe that would do.
The current thread_info _itself_ (ie, the struct itself, not a pointer) is 
located at esp & ~(THREAD_SIZE - 1) meaning you'd either have to copy over 
the struct to the new stack, or forego that historic optimization (don't get 
me wrong, either may be okay).
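
A minimal user-space sketch of the masking described above, for
illustration only. It assumes THREAD_SIZE is a power of two and that every
stack block is THREAD_SIZE aligned; the function name and the addresses in
the note below are made up. The block base is derived purely from the
current stack pointer, so a stack pointer inside a newly allocated stack
yields a different base, and nothing valid lives there unless the struct
is copied across.

```c
#include <assert.h>
#include <stdint.h>

/* Round a stack pointer down to the start of its thread_size-aligned
 * block, i.e. compute sp & ~(thread_size - 1).
 * thread_size must be a power of two for the mask to be valid. */
static uintptr_t thread_info_base(uintptr_t sp, uintptr_t thread_size)
{
    assert(thread_size != 0 && (thread_size & (thread_size - 1)) == 0);
    return sp & ~(thread_size - 1);
}
```

For example, a stack pointer of 0xc0123f80 maps to 0xc0123000 with 4K
stacks and to 0xc0122000 with 8K stacks, while a stack pointer of
0xc0456ff0 on some other 4K block maps to 0xc0456000.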


>> Which might not be much of a problem. Can't think of much else
>> either but it's the kind of thing you'd _like_ to be a problem just to have
>> an excuse to shoot down an icky notion like that...
>
> It's not any ickier than explicitly calling schedule().

Somewhat comparable in notion perhaps, but I disagree on the relative level
of ickiness. Calling schedule() you do when you know you no longer have to
hog the CPU and when you know it's safe to do so. Calling via growstack()
looks to be a "ah, heck, let's err on the safe side since we don't have a
bleedin' clue otherwise" sort of thing.

>> Would you intend this just as a "make this path work until we fix it
>> properly" kind of thing?
>
> Maybe.


If you know, _can_ MD/LVM (and/or XFS) in fact be sanely/timely fixed, or is 
this looking at something fundamental?


Rene.



Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-17 Thread Bodo Eggert
On Tue, 17 Jul 2007, Rene Herman wrote:
> On 07/17/2007 01:45 AM, Bodo Eggert wrote:
> > On Tue, 17 Jul 2007, Rene Herman wrote:
> >> On 07/17/2007 12:37 AM, Ray Lee wrote:

> >>> If at some point one of the pro-4k stacks crowd can prove that all 
> >>> code paths are safe
> >> I'll do that the minute you prove the current shared 8K stacks are
> >> safe. Do we have a deal?
> > 
> > You claim 4k+4k is safe, therefore 8k must be safe, too.
> 
> No, I most certainly do not. I claim proving that 4K and seperate (per cpu) 
> interrupt stacks are safe are exactly the same as proving unshared 8K stacks 
> are safe. That is, you don't, no such proof exists other than in the eating 
> of the pudding.

And yet you have a more strict claim than I do. If you are right, I'll be
right, too, because two times less-than-4K is less than 8K. If I'm wrong
and 8K is not enough, you must be wrong, too, because you cannot possibly
fit more than 8K into 4K+4K. That's the law of mathematics.

> Ray (and you) in considering !CONFIG_4KSTACKS to be "safer" 
> than CONFIG_4KSTACKS suggest that _inevitably_ CONFIG_4KSTACKS would leave 
> you with less available stack and I pointed out this isn't be the case.

Why do you insist on 4K stacks being good as long as there is _one_ use case 
not crashing the kernel? _All_ use cases have to be safe!

> And in fact, I shouldn't have said "exactly" the same. Unshared interrupt 
> stacks

, which are a completely different thing which was bundled to 4K-stacks
  because you need more than 4K,

> make for more determistisc behaviour, so you'd have a harder time 
> proven anything to some set limit of uncertainty with the shared 8K stacks 
> than with the unshared 4K stacks.

I don't want my stack to overflow in order to be theoretically able to
prove it does not overflow. I'd rather go for 8K+4K-stacks, and if _you_
have done the proof _you_ wanted to make, we can talk again about
4K-stacks. Then I'll just add up the maximum stack usages and have the
proof that 8K stacks are safe.

> > But if 8k is safe, this does not yet prove that you can store 5k+3k in
> > 4k+4k.
> 
> I really have not made any claim of the kind. The argument is that with 
> CONFIG_4KSTACKS, availeble stack space isn't inevitably less at any point in 
> time.

I claim, you can store 5k + 3k on the 8k stack, where 5k is something like
the current worst case for non-interrupt stack and 3k is plenty for
interrupts. Thousands of stable systems with 8K stacks support my claim.

You claimed with 4k + 4k, there is not less available stack space.
(At least for use cases you are interested in, but I'll assume you don't 
 want other use cases to crash.)

If you were right, I'd have enough space on 4k + 4k to store that 5k.
Obviously, thousands of systems disagree by crashing with 4K-stacks.
That's most simple logic.

Of course I may be wrong and the kernels don't crash because of 4K stacks, 
but because of bad karma ... But even then, you'd first have to get rid of
that bad karma before defaulting to 4K stacks.

-- 
Top 100 things you don't want the sysadmin to say:
41. OH, SH*T! (as they scrabble at the keyboard for ^c).


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-17 Thread Rene Herman

On 07/17/2007 12:06 PM, Bodo Eggert wrote:


On Tue, 17 Jul 2007, Rene Herman wrote:

On 07/17/2007 01:45 AM, Bodo Eggert wrote:



You claim 4k+4k is safe, therefore 8k must be safe, too.


No, I most certainly do not. I claim proving that 4K and seperate (per cpu) 
interrupt stacks are safe are exactly the same as proving unshared 8K stacks 
are safe. That is, you don't, no such proof exists other than in the eating 
of the pudding.


And yet you have a more strict claim than I do. If you are right, I'll be
right, too, because two times less-than-4K is less tham 8K.


Firstly, it's not two times 4K but 4K + (4K + 4K) * NR_CPUS. Secondly, _you_ 
are the one making claims -- specifically that !CONFIG_4KSTACKS is safer, 
happily ignoring the fact that generally speaking available process stack 
can be _better_ with CONFIG_4KSTACKS and there seems to exist but _one_ 
(één, ein, une) known situation where it's problematic.


Must there be none rather than one? In some senses maybe, if the problem is 
more than bad, fixable code but I doubt you know this. CONFIG_4KSTACKS is 
much better on the VM (and hence faster) and as such, any user not using the 
one nicely isolated and identified problem case benefits from it. This means 
it's either very close or already _at_ the point of being the best default 
for the kernel. Changing options is for users with special needs, as you 
believe you are.


I truly apologise for taking it into this direction but you're wearing me 
down rapidly. Every single time you insert some uninformed crap comment that 
shows that you both don't understand the issue and didn't understand what 
the other person was saying and then after being made aware of such, ignore 
that and follow up with the next uninformed crap comment. That is, you seem 
to care less about the issue than about the discussion and since for me it's 
quite the other way around I'm leaving it at that.


RedHat is the one with the actual data available, and they've been enabling 
4KSTACKS for quite some time now (with some of their users apparently 
unhappy about it but not many it would seem).


Jesper also already posted how he's going to proceed: lift 4K from debug 
status and submit it as default for -mm. As to the latter bit, unless I 
remember wrong, it already _was_ default in -mm for some time a while ago so 
Andrew no doubt has an informed opinion on how to proceed with that.


Rene.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-17 Thread John Stoffel

utz> On Tue, 2007-07-17 at 00:28 +0200, Rene Herman wrote:
utz> > Given that as Arjan stated Fedora and even RHEL have been using 4K stacks 
utz> > for some time now, and certainly the latter being a distribution which I 
utz> > would expect to both host a relatively large number of lvm/md/xfs and what 
utz> > stackeaters have you users and to be fairly conservative with respect to the 
utz> > chances of scribbling over kernel memory (I'm a trusting person...) it seems 
utz> > there might at this stage only be very few offenders left.

utz> I have to recompile the fedora kernel rpms (fc6, f7) with 8k
utz> stacks on my i686 server. It's using NFS - XFS - DM - MD
utz> (raid1) - IDE disks.  With 4k stacks it crash (hang) within
utz> minutes after using NFS.  With 8k stacks it's rock solid. No
utz> crashes within months.

Does it give any useful information when it does crash?  Can you make
a simple test case using ram disks instead of IDE disks and then
building upon that?  

I think I should try to do this myself at some point...

John


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-17 Thread Shawn Bohrer
On Tue, Jul 17, 2007 at 02:57:45AM +0200, Rene Herman wrote:
> True enough. I'm rather wondering though why RHEL is shipping with it if 
> it's a _real_ problem. Scribbling junk all over kernel memory would be the 
> kind of thing I'd imagine you'd mightily piss-off enterprise customers with. 
> But well, sure, that rather quickly becomes a self-referential argument I 
> guess.

I can't speak for Fedora, but RHEL disables XFS in their kernel, likely
because it is known to cause problems with 4K stacks.

> Well, no. oldconfig works fine, and other than that, all failure modes 
> I've heard about also in this thread are MD/LVM/XFS. This is extremely 
> widely tested stuff in at least Fedora and RHEL.

Again, don't assume that because Fedora and RHEL have 4K stacks,
MD/LVM/XFS is widely tested.

Additionally I think I should point out that the problems pointed out so
far are not the only problem areas with 4K stacks.  There are out of
tree drivers to consider as well, and use cases like ndiswrapper.

