from:"Philipp Rumpf"

Re: [RFC] initstr & exitstr

2001-07-06 Thread Philipp Rumpf

On Fri, Jul 06, 2001 at 11:17:44PM -0300, Arnaldo Carvalho de Melo wrote:
> Hi,
> 
>   Please comment on this approach to move strings in __init functions
> from .rodata to .data.init so that it get discarded after initialization,
> like the variables marked as __initdata and the functions marked as __init,
> as well as move strings in __exit marked functions to .data.exit, that will
> be discarded and not even get into the generated kernel image.
> 
>   Please note that if possible the best approach was for gcc to move
> those strings automatically if the function was marked with modified 
> __init/__exit macros, but we have to keep in mind that some of the strings
> in those functions can not be discarded because they keep being referenced
> by say register_chrdev and others, unlike, for example, proc functions and
> others that copy the string passed to some malloc'ed data structure, so we
> have to be selective marking exactly the ones that can indeed be discarded.

.. or fix all registration functions to use a private copy of the string,
which would avoid some common oopses.

>   I've also implemented helper functions for printk thats the most
> common case, and leaved the other common case, panic, using
> __initstr/__exitstr explicitely, so that people can comment on what is
> better.

> #define init_printk(fmt,arg...) printk(__initstr(fmt) , ##arg)

I dislike init_printk;  it combines variadic functions/macros with
assumptions about how the first argument is specified (i.e. as a string
constant), so it's potentially very confusing.

Also, printk is used for debugging, and accidentally using init_printk
instead of printk would result in no messages being printed at all if
the driver is compiled into the kernel (while everything would work
fine if the driver is compiled as a module, where init_printk and printk
are identical).  I think this would be very annoying to track down for
less experienced kernel hackers.

> #define __initstr(s)({ static char __tmp_init_str[] __initdata=s;
> __tmp_init_str;})

I think this would fail if used in structure initialisations ?

Philipp Rumpf
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC] initstr exitstr

2001-07-06 Thread Philipp Rumpf


On Fri, Jul 06, 2001 at 11:17:44PM -0300, Arnaldo Carvalho de Melo wrote:
 Hi,
 
   Please comment on this approach to move strings in __init functions
 from .rodata to .data.init so that it get discarded after initialization,
 like the variables marked as __initdata and the functions marked as __init,
 as well as move strings in __exit marked functions to .data.exit, that will
 be discarded and not even get into the generated kernel image.
 
   Please note that if possible the best approach was for gcc to move
 those strings automatically if the function was marked with modified 
 __init/__exit macros, but we have to keep in mind that some of the strings
 in those functions can not be discarded because they keep being referenced
 by say register_chrdev and others, unlike, for example, proc functions and
 others that copy the string passed to some malloc'ed data structure, so we
 have to be selective marking exactly the ones that can indeed be discarded.

.. or fix all registration functions to use a private copy of the string,
which would avoid some common oopses.

   I've also implemented helper functions for printk thats the most
 common case, and leaved the other common case, panic, using
 __initstr/__exitstr explicitely, so that people can comment on what is
 better.

 #define init_printk(fmt,arg...) printk(__initstr(fmt) , ##arg)

I dislike init_printk;  it combines variadic functions/macros with
assumptions about how the first argument is specified (i.e. as a string
constant), so it's potentially very confusing.

Also, printk is used for debugging, and accidentally using init_printk
instead of printk would result in no messages being printed at all if
the driver is compiled into the kernel (while everything would work
fine if the driver is compiled as a module, where init_printk and printk
are identical).  I think this would be very annoying to track down for
less experienced kernel hackers.

 #define __initstr(s)({ static char __tmp_init_str[] __initdata=s;
 __tmp_init_str;})

I think this would fail if used in structure initialisations ?

Philipp Rumpf
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Prevent OOM from killing init

2001-03-22 Thread Philipp Rumpf

On Thu, Mar 22, 2001 at 01:14:41AM -0700, Eric W. Biederman wrote:
> Rik van Riel <[EMAIL PROTECTED]> writes:
> Is there ever a case where killing init is the right thing to do?

There are cases where panic() is the right thing to do.  Broken init
is such a case.

> My impression is that if init is selected the whole machine dies.
> If you can kill init and still have a machine that mostly works,

you can't.

> Guaranteeing not to select init can buy you piece of mind because
> init if properly setup can put the machine back together again, while
> not special casing init means something weird might happen and init
> would be selected.

If we're in a situation where long-running processes with relatively
small VM are killed the box is very unlikely to be usable anyway.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Prevent OOM from killing init

2001-03-22 Thread Philipp Rumpf

On Wed, Mar 21, 2001 at 08:48:54PM -0300, Rik van Riel wrote:
> On Wed, 21 Mar 2001, Patrick O'Rourke wrote:
> 
> > Since the system will panic if the init process is chosen by
> > the OOM killer, the following patch prevents select_bad_process()
> > from picking init.
> 
> One question ... has the OOM killer ever selected init on
> anybody's system ?

Yes, I managed to reproduce this a while ago.  (init was the only
process around though).

We don't ever kill init, fwiw;  we panic(), which is the right thing
to do if init can't keep running.

> I think that the scoring algorithm should make sure that
> we never pick init, unless the system is screwed so badly
> that init is broken or the only process left ;)

I can't think of a situation where the OOM killer does the wrong thing.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Prevent OOM from killing init

2001-03-22 Thread Philipp Rumpf


On Wed, Mar 21, 2001 at 08:48:54PM -0300, Rik van Riel wrote:
 On Wed, 21 Mar 2001, Patrick O'Rourke wrote:
 
  Since the system will panic if the init process is chosen by
  the OOM killer, the following patch prevents select_bad_process()
  from picking init.
 
 One question ... has the OOM killer ever selected init on
 anybody's system ?

Yes, I managed to reproduce this a while ago.  (init was the only
process around though).

We don't ever kill init, fwiw;  we panic(), which is the right thing
to do if init can't keep running.

 I think that the scoring algorithm should make sure that
 we never pick init, unless the system is screwed so badly
 that init is broken or the only process left ;)

I can't think of a situation where the OOM killer does the wrong thing.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: static scheduling - SCHED_IDLE?

2001-03-14 Thread Philipp Rumpf

On Fri, Mar 09, 2001 at 09:09:13PM +0100, Jamie Lokier wrote:
> Rik van Riel wrote:
> > > Just raise the priority whenever the task's in kernel mode.  Problem
> > > solved.
> > 
> > Remember that a task schedules itself out at the timer interrupt,
> > in kernel/sched.c::schedule() ... which is kernel mode ;)
> 
> Even nicer.  On x86 change this:
> 
> reschedule:
>   call SYMBOL_NAME(schedule)# test
>   jmp ret_from_sys_call
> 
> to this:
> 
> reschedule:
>   orl $PF_HONOUR_LOW_PRIORITY,flags(%ebx) 
>   call SYMBOL_NAME(schedule)# test
>   andl $~PF_HONOUR_LOW_PRIORITY,flags(%ebx)
>   jmp ret_from_sys_call
> 
> (You get the idea; this isn't the best implementation).

A few months ago, I implemented preemptible kernel threads (locally;  I
tend to think the other patches are superior).  Part of the changes was
to separate schedule into __schedule() (common part), schedule_user()
(automatic schedule from entry.S) and schedule() (manual schedule in
kernel space);  besides making what Jamie proposed easier, we can also
save a few cycles in the (common) schedule_user case:

 - we never release the kernel lock
 - we can pass current to schedule_user
 - we just handled softirqs

this is 2.5 material though ...
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Rename all derived CONFIG variables

2001-03-12 Thread Philipp Rumpf

On Mon, Mar 12, 2001 at 06:03:22PM +1100, Keith Owens wrote:
> In 2.4.2-ac18 there are 130 CONFIG options that are always derived from
> other options, the user has no control over them.  It is useful for the
> kernel build process to know which variables are derived and which
> variables the user can control.  There are also 6 CONFIG options that

Yes, it is.  However, I don't see a reason to use a different namespace
for them.  At least some of the options below might become non-derived
options again, and having that knowledge spread out over most of the
kernel tree doesn't make any sense to me.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Rename all derived CONFIG variables

2001-03-12 Thread Philipp Rumpf


On Mon, Mar 12, 2001 at 06:03:22PM +1100, Keith Owens wrote:
 In 2.4.2-ac18 there are 130 CONFIG options that are always derived from
 other options, the user has no control over them.  It is useful for the
 kernel build process to know which variables are derived and which
 variables the user can control.  There are also 6 CONFIG options that

Yes, it is.  However, I don't see a reason to use a different namespace
for them.  At least some of the options below might become non-derived
options again, and having that knowledge spread out over most of the
kernel tree doesn't make any sense to me.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] documentation mm.h + swap.h

2001-03-09 Thread Philipp Rumpf

On Thu, Mar 08, 2001 at 06:10:16PM -0300, Rik van Riel wrote:
> --- linux-2.4.2-doc/include/linux/mm.h.orig   Wed Mar  7 15:36:32 2001
> +++ linux-2.4.2-doc/include/linux/mm.hThu Mar  8 09:54:22 2001
> @@ -39,32 +39,37 @@
>   * library, the executable area etc).
>   */
>  struct vm_area_struct {
> - struct mm_struct * vm_mm;   /* VM area parameters */
> - unsigned long vm_start;
> - unsigned long vm_end;
> + struct mm_struct * vm_mm;   /* The address space we belong to. */
> + unsigned long vm_start; /* Our start address within vm_mm. */
> + unsigned long vm_end;   /* Our end address within vm_mm. */

it might be a good idea to point out that this is the address of the byte
after the last one covered by the vma, not the address of the last byte.

(are there any architectures where we allow a vma at the end of memory ?  Is
the mm/ code handling ->vm_end = 0 correctly ?)

>  /*
> + * Each physical page in the system has a struct page associated with
> + * it to keep track of whatever it is we are using the page for at the
> + * moment. Note that we have no way to track which tasks are using
> + * a page.
> + *

Each page of "real" RAM.  In particular I think MMIO pages still don't have
a struct page.

>   * Try to keep the most commonly accessed fields in single cache lines
>   * here (16 bytes or greater).  This ordering should be particularly
>   * beneficial on 32-bit processors.
>   *
>   * The first line is data used in page cache lookup, the second line
>   * is used for linear searches (eg. clock algorithm scans).
> + *
> + * TODO: make this structure smaller, it could be as small as 32 bytes.

Or make it cover large pages, which might be even more of a win ..
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] documentation mm.h + swap.h

2001-03-09 Thread Philipp Rumpf


On Thu, Mar 08, 2001 at 06:10:16PM -0300, Rik van Riel wrote:
 --- linux-2.4.2-doc/include/linux/mm.h.orig   Wed Mar  7 15:36:32 2001
 +++ linux-2.4.2-doc/include/linux/mm.hThu Mar  8 09:54:22 2001
 @@ -39,32 +39,37 @@
   * library, the executable area etc).
   */
  struct vm_area_struct {
 - struct mm_struct * vm_mm;   /* VM area parameters */
 - unsigned long vm_start;
 - unsigned long vm_end;
 + struct mm_struct * vm_mm;   /* The address space we belong to. */
 + unsigned long vm_start; /* Our start address within vm_mm. */
 + unsigned long vm_end;   /* Our end address within vm_mm. */

it might be a good idea to point out that this is the address of the byte
after the last one covered by the vma, not the address of the last byte.

(are there any architectures where we allow a vma at the end of memory ?  Is
the mm/ code handling -vm_end = 0 correctly ?)

  /*
 + * Each physical page in the system has a struct page associated with
 + * it to keep track of whatever it is we are using the page for at the
 + * moment. Note that we have no way to track which tasks are using
 + * a page.
 + *

Each page of "real" RAM.  In particular I think MMIO pages still don't have
a struct page.

   * Try to keep the most commonly accessed fields in single cache lines
   * here (16 bytes or greater).  This ordering should be particularly
   * beneficial on 32-bit processors.
   *
   * The first line is data used in page cache lookup, the second line
   * is used for linear searches (eg. clock algorithm scans).
 + *
 + * TODO: make this structure smaller, it could be as small as 32 bytes.

Or make it cover large pages, which might be even more of a win ..
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] ramdisk/VM fix

2001-03-08 Thread Philipp Rumpf


With the current rd.c code, we can get into a situation where there is
a buffer-only page for data which is also in a page cache page with
page->buffers != NULL.  The current vmscan.c code never frees the page
cache page in this scenario, effectively doubling ramdisk memory
requirements.

Linus, I think this is a bugfix (tested against -ac kernels):

diff -ur linux/include/linux/swap.h linux-prumpf/include/linux/swap.h
--- linux/include/linux/swap.h  Thu Mar  8 10:01:30 2001
+++ linux-prumpf/include/linux/swap.h   Thu Mar  8 10:14:12 2001
@@ -284,7 +284,7 @@
 #endif
 
 #define page_ramdisk(page) \
-   (page->buffers && (MAJOR(page->buffers->b_dev) == RAMDISK_MAJOR))
+   (page->buffers && (MAJOR(page->buffers->b_dev) == RAMDISK_MAJOR) && 
+buffer_protected(page->buffers))
 
 extern spinlock_t swaplock;
 

I don't think I'm missing anything important ...
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] ramdisk/VM fix

2001-03-08 Thread Philipp Rumpf


With the current rd.c code, we can get into a situation where there is
a buffer-only page for data which is also in a page cache page with
page-buffers != NULL.  The current vmscan.c code never frees the page
cache page in this scenario, effectively doubling ramdisk memory
requirements.

Linus, I think this is a bugfix (tested against -ac kernels):

diff -ur linux/include/linux/swap.h linux-prumpf/include/linux/swap.h
--- linux/include/linux/swap.h  Thu Mar  8 10:01:30 2001
+++ linux-prumpf/include/linux/swap.h   Thu Mar  8 10:14:12 2001
@@ -284,7 +284,7 @@
 #endif
 
 #define page_ramdisk(page) \
-   (page-buffers  (MAJOR(page-buffers-b_dev) == RAMDISK_MAJOR))
+   (page-buffers  (MAJOR(page-buffers-b_dev) == RAMDISK_MAJOR)  
+buffer_protected(page-buffers))
 
 extern spinlock_t swaplock;
 

I don't think I'm missing anything important ...
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: kmalloc() alignment

2001-03-06 Thread Philipp Rumpf


On Sun, Mar 04, 2001 at 10:34:31PM +, Alan Cox wrote:
> > Does kmalloc() make any guarantees of the alignment of allocated
> > blocks?  Will the returned block always be 4-, 8- or 16-byte
> > aligned, for example?
> 
> There are people who assume 16byte alignment guarantees. I dont think anyone
> has formally specified the guarantee beyond 4 bytes tho

Userspace malloc is "suitably aligned for any kind of variable", so I think
expecting 8 bytes alignment (long long on 32-bit platforms) should be okay.

>From reading the code it seems as though we actually use L1_CACHE_BYTES,
and I think it might be a good idea to document the current behaviour (as
long as there's no good reason to change it ?)

diff -ur linux/mm/slab.c linux-prumpf/mm/slab.c
--- linux/mm/slab.c Tue Mar  6 00:54:38 2001
+++ linux-prumpf/mm/slab.c  Tue Mar  6 01:00:47 2001
@@ -1525,9 +1525,10 @@
  * @flags: the type of memory to allocate.
  *
  * kmalloc is the normal method of allocating memory
- * in the kernel.  Note that the @size parameter must be less than or
- * equals to %KMALLOC_MAXSIZE and the caller must ensure this. The @flags
- * argument may be one of:
+ * in the kernel.  It returns a pointer (aligned to a hardware cache line
+ * boundary) to the allocated memory, or %NULL in case of failure. Note that
+ * the @size parameter must be less than or equal to %KMALLOC_MAXSIZE and
+ * the caller must ensure this. The @flags argument may be one of:
  *
  * %GFP_USER - Allocate memory on behalf of user.  May sleep.
  *
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: kmalloc() alignment

2001-03-06 Thread Philipp Rumpf


On Sun, Mar 04, 2001 at 10:34:31PM +, Alan Cox wrote:
  Does kmalloc() make any guarantees of the alignment of allocated
  blocks?  Will the returned block always be 4-, 8- or 16-byte
  aligned, for example?
 
 There are people who assume 16byte alignment guarantees. I dont think anyone
 has formally specified the guarantee beyond 4 bytes tho

Userspace malloc is "suitably aligned for any kind of variable", so I think
expecting 8 bytes alignment (long long on 32-bit platforms) should be okay.

From reading the code it seems as though we actually use L1_CACHE_BYTES,
and I think it might be a good idea to document the current behaviour (as
long as there's no good reason to change it ?)

diff -ur linux/mm/slab.c linux-prumpf/mm/slab.c
--- linux/mm/slab.c Tue Mar  6 00:54:38 2001
+++ linux-prumpf/mm/slab.c  Tue Mar  6 01:00:47 2001
@@ -1525,9 +1525,10 @@
  * @flags: the type of memory to allocate.
  *
  * kmalloc is the normal method of allocating memory
- * in the kernel.  Note that the @size parameter must be less than or
- * equals to %KMALLOC_MAXSIZE and the caller must ensure this. The @flags
- * argument may be one of:
+ * in the kernel.  It returns a pointer (aligned to a hardware cache line
+ * boundary) to the allocated memory, or %NULL in case of failure. Note that
+ * the @size parameter must be less than or equal to %KMALLOC_MAXSIZE and
+ * the caller must ensure this. The @flags argument may be one of:
  *
  * %GFP_USER - Allocate memory on behalf of user.  May sleep.
  *
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] /drivers/char/serial.c cleanup

2001-03-05 Thread Philipp Rumpf


On Mon, Mar 05, 2001 at 03:37:04PM +0300, Andrey Panin wrote:
> Attached patch (2.4.2-ac11) makes some changes in serial driver:
> adds ioremap() return code checks, removes panic() calls
> and adds better error handling in start_pci_pnp_board() function.

Did you test it ?

> diff -u /linux.vanilla/drivers/char/serial.c /linux/drivers/char/serial.c
> --- /linux.vanilla/drivers/char/serial.c  Thu Mar  1 20:15:43 2001
> +++ /linux/drivers/char/serial.c  Fri Mar  2 00:10:29 2001
> @@ -3876,7 +3876,10 @@
>   return 0;
>   }
>   req->io_type = SERIAL_IO_MEM;
> - req->iomem_base = ioremap(port, board->uart_offset);
> + if ((req->iomem_base = ioremap(port, board->uart_offset))) {
> + printk(KERN_ERR "serial: Couldn's remap IO memory at %#lx\n", port);
> + return 1;
> + }

This seems wrong.  ioremap returns NULL in case of failure.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] /drivers/char/serial.c cleanup

2001-03-05 Thread Philipp Rumpf


On Mon, Mar 05, 2001 at 03:37:04PM +0300, Andrey Panin wrote:
 Attached patch (2.4.2-ac11) makes some changes in serial driver:
 adds ioremap() return code checks, removes panic() calls
 and adds better error handling in start_pci_pnp_board() function.

Did you test it ?

 diff -u /linux.vanilla/drivers/char/serial.c /linux/drivers/char/serial.c
 --- /linux.vanilla/drivers/char/serial.c  Thu Mar  1 20:15:43 2001
 +++ /linux/drivers/char/serial.c  Fri Mar  2 00:10:29 2001
 @@ -3876,7 +3876,10 @@
   return 0;
   }
   req-io_type = SERIAL_IO_MEM;
 - req-iomem_base = ioremap(port, board-uart_offset);
 + if ((req-iomem_base = ioremap(port, board-uart_offset))) {
 + printk(KERN_ERR "serial: Couldn's remap IO memory at %#lx\n", port);
 + return 1;
 + }

This seems wrong.  ioremap returns NULL in case of failure.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [kernel] Re: [PATCH] 2.4.2: cure the kapm-idled taking (100-epsilon)% CPU time

2001-03-03 Thread Philipp Rumpf

On Sun, Mar 04, 2001 at 12:19:07AM +0100, Francis Galiegue wrote:
> Well, from reading the source, I don't see how this can break APM... What am I
> missing?

apm_bios_call must not be called with two identical pointers for
two different registers.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [kernel] Re: [PATCH] 2.4.2: cure the kapm-idled taking (100-epsilon)% CPU time

2001-03-03 Thread Philipp Rumpf


On Sun, Mar 04, 2001 at 12:19:07AM +0100, Francis Galiegue wrote:
 Well, from reading the source, I don't see how this can break APM... What am I
 missing?

apm_bios_call must not be called with two identical pointers for
two different registers.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch-2.4.2-ac8] misc_register() fix (was Re: Linux 2.4.2ac8

2001-03-02 Thread Philipp Rumpf


On Fri, Mar 02, 2001 at 10:55:39AM +, Tigran Aivazian wrote:
> On Fri, 2 Mar 2001, Tigran Aivazian wrote:
> > On Thu, 1 Mar 2001, Alan Cox wrote:
> > > 2.4.2-ac8
> > > o Stop two people claiming the same misc dev id   (Philipp Rumpf)
> > 
> > is this what has broken misc devi registration on my machine? I have two
> > misc devices -- microcode and psaux -- now (ac8) I get none, /proc/misc is
> > empty. Also, on boot gpm generates an "oops" from gpm.c(968) saying
> > "/dev/mouse: No such device"
> 
> Hi Alan,
> 
> here is the fix, tested, it works fine. The only unsatisfactory thing is
> that we do an extra if() on each iteration making misc_register()
> typically a few instructions slower. I will think a few minutes on how to
> make the old version work (i.e. I suspect it was just an incorrect walking
> of the misc_list in ac8).

See earlier patch.

> --- linux/drivers/char/misc.c.0   Fri Mar  2 09:35:01 2001
> +++ linux/drivers/char/misc.c Fri Mar  2 10:01:17 2001
> @@ -175,14 +175,16 @@
>   
>   if (misc->next || misc->prev)
>   return -EBUSY;
> +
>   down(_sem);
> - c = misc_list.next;
>  
> - while ((c != _list) && (c->minor != misc->minor))
> + c = misc_list.next;
> + while (c != _list) {
> + if (c->minor == misc->minor) {
> + up(_sem);
> + return -EBUSY;
> + }
>   c = c->next;
> - if (c == _list) {
> - up(_sem);
> - return -EBUSY;
>   }
>  
>   if (misc->minor == MISC_DYNAMIC_MINOR) {

This is fine as well, and possibly more readable.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.4.2ac8 lost char devices

2001-03-02 Thread Philipp Rumpf


On Fri, Mar 02, 2001 at 03:05:32PM +0900, tachino Nobuhiro wrote:
> 
> Hello, 
> 
> At Fri, 02 Mar 2001 00:42:28 -0500,
> <[EMAIL PROTECTED]> wrote:
> > 
> > actually, its not just ps/2 mice -- it seems to be something generic to char
> > devices. agpgartis failing to register itself, too.
> > 
> > what changed with char device handling from ac7 to ac8?
> > 
> > robert love
> > [EMAIL PROTECTED]
> > -
> 
>   misc_register() in drivers/char/misc.c seems to have a problem.

I guess it's too late now to pretend it's not my fault.  I foolishly assumed
if it would boot for me I couldn't have broken it too badly.

> +   c = misc_list.next;
> +   
> +   while ((c != _list) && (c->minor != misc->minor))
> +   c = c->next;
> +   if (c == _list) {
> 
>   This should be  (c != _list)

Correct.  Alan,


diff -ur linux/drivers/char/misc.c linux-prumpf/drivers/char/misc.c
--- linux/drivers/char/misc.c   Fri Mar  2 02:48:04 2001
+++ linux-prumpf/drivers/char/misc.cFri Mar  2 02:49:43 2001
@@ -180,7 +180,7 @@
 
while ((c != _list) && (c->minor != misc->minor))
c = c->next;
-   if (c == _list) {
+   if (c != _list) {
up(_sem);
return -EBUSY;
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.4.2ac8 lost char devices

2001-03-02 Thread Philipp Rumpf


On Fri, Mar 02, 2001 at 03:05:32PM +0900, tachino Nobuhiro wrote:
 
 Hello, 
 
 At Fri, 02 Mar 2001 00:42:28 -0500,
 [EMAIL PROTECTED] wrote:
  
  actually, its not just ps/2 mice -- it seems to be something generic to char
  devices. agpgartis failing to register itself, too.
  
  what changed with char device handling from ac7 to ac8?
  
  robert love
  [EMAIL PROTECTED]
  -
 
   misc_register() in drivers/char/misc.c seems to have a problem.

I guess it's too late now to pretend it's not my fault.  I foolishly assumed
if it would boot for me I couldn't have broken it too badly.

 +   c = misc_list.next;
 +   
 +   while ((c != misc_list)  (c-minor != misc-minor))
 +   c = c-next;
 +   if (c == misc_list) {
 
   This should be  (c != misc_list)

Correct.  Alan,


diff -ur linux/drivers/char/misc.c linux-prumpf/drivers/char/misc.c
--- linux/drivers/char/misc.c   Fri Mar  2 02:48:04 2001
+++ linux-prumpf/drivers/char/misc.cFri Mar  2 02:49:43 2001
@@ -180,7 +180,7 @@
 
while ((c != misc_list)  (c-minor != misc-minor))
c = c-next;
-   if (c == misc_list) {
+   if (c != misc_list) {
up(misc_sem);
return -EBUSY;
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: PCI oddities on Dell Inspiron 5000e w/ 2.4.x

2001-02-24 Thread Philipp Rumpf


On Sat, Feb 24, 2001 at 10:25:42AM -0700, Jeff Lessem wrote:
> In your message of: Sat, 24 Feb 2001 09:54:47 CST, you write:
> >Jeff, are you using the e820 memory map at all ?  In particular, are you
> >using grub or some other buggy bootloader that insists on specifying a
> >mem= option on the kernel command line ?  There should be a kernel command
> >line message very early on, what does that say ?
> 
> Yes, I am using grub, the buggy bootloader.  The relevant chunk of
> kernal messages are:
> 
>  BIOS-provided physical RAM map:
>   BIOS-e820: 0009f800 @  (usable)
>   BIOS-e820: 0800 @ 0009f800 (reserved)
>   BIOS-e820: 00019800 @ 000e6800 (reserved)
>   BIOS-e820: 13ef @ 0010 (usable)
>   BIOS-e820: fc00 @ 13ff (ACPI data)
>   BIOS-e820: 0400 @ 13fffc00 (ACPI NVS)
>   BIOS-e820: 0008 @ fff8 (reserved)
>  On node 0 totalpages: 81904
>  zone(0): 4096 pages.
>  zone(1): 77808 pages.
>  zone(2): 0 pages.
>  Kernel command line: root=/dev/hda1 mem=327616K
> 
> You are dead on, mem= seems a bit small.  Forcing mem=320M on the
> command line fixes the problem completely.

Careful, you're overwriting ACPI data now (and using it as normal RAM).

Can you try one of a) LILO b) a fixed version of grub c) this patch ?

diff -ur linux/arch/i386/kernel/setup.c linux-prumpf/arch/i386/kernel/setup.c
--- linux/arch/i386/kernel/setup.c  Fri Feb 23 13:37:38 2001
+++ linux-prumpf/arch/i386/kernel/setup.c   Sat Feb 24 09:49:50 2001
@@ -555,30 +555,9 @@
e820.nr_map = 0;
usermem = 1;
} else {
-   /* If the user specifies memory size, we
-* blow away any automatically generated
-* size
-*/
-   unsigned long start_at, mem_size;
- 
-   if (usermem == 0) {
-   /* first time in: zap the whitelist
-* and reinitialize it with the
-* standard low-memory region.
-*/
-   e820.nr_map = 0;
-   usermem = 1;
-   add_memory_region(0, LOWMEMSIZE(), E820_RAM);
-   }
-   mem_size = memparse(from+4, );
+   memparse(from+4, );
if (*from == '@')
-   start_at = memparse(from+1, );
-   else {
-   start_at = HIGH_MEMORY;
-   mem_size -= HIGH_MEMORY;
-   usermem=0;
-   }
-   add_memory_region(start_at, mem_size, E820_RAM);
+   memparse(from+1, );
}
}
c = *(from++);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: PCI oddities on Dell Inspiron 5000e w/ 2.4.x

2001-02-24 Thread Philipp Rumpf

On Sat, Feb 24, 2001 at 05:36:47AM -0800, Linus Torvalds wrote:
> On Sat, 24 Feb 2001, Jeff Lessem wrote:
> > 
> > >Also, how much memory does this machine have? That "13ff" does worry
> > >me a bit..
> > 
> > The comptuer has 320MB.  At this point I am ready to conclude that the
> > computer is broken in some way, because nobody else with an Inspiron
> > 5000e that I have heard from has anything like this problem.

> I didn't believe that you'd have 320MB just because it's such an odd
> number, but the problem is that Linux apparently starts allocating the PCI
> address space just _under_ the 320MB mark (you probably have some memory
> reserved for ACPI that doesn't show up in the e820 memory map).

Jeff, are you using the e820 memory map at all ?  In particular, are you
using grub or some other buggy bootloader that insists on specifying a
mem= option on the kernel command line ?  There should be a kernel command
line message very early on, what does that say ?

Also, can you give us the E820 memory map (kernel messages starting with
BIOS-e820) ?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: PCI oddities on Dell Inspiron 5000e w/ 2.4.x

2001-02-24 Thread Philipp Rumpf


On Sat, Feb 24, 2001 at 05:36:47AM -0800, Linus Torvalds wrote:
 On Sat, 24 Feb 2001, Jeff Lessem wrote:
  
  Also, how much memory does this machine have? That "13ff" does worry
  me a bit..
  
  The comptuer has 320MB.  At this point I am ready to conclude that the
  computer is broken in some way, because nobody else with an Inspiron
  5000e that I have heard from has anything like this problem.

 I didn't believe that you'd have 320MB just because it's such an odd
 number, but the problem is that Linux apparently starts allocating the PCI
 address space just _under_ the 320MB mark (you probably have some memory
 reserved for ACPI that doesn't show up in the e820 memory map).

Jeff, are you using the e820 memory map at all ?  In particular, are you
using grub or some other buggy bootloader that insists on specifying a
mem= option on the kernel command line ?  There should be a kernel command
line message very early on, what does that say ?

Also, can you give us the E820 memory map (kernel messages starting with
BIOS-e820) ?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: PCI oddities on Dell Inspiron 5000e w/ 2.4.x

2001-02-24 Thread Philipp Rumpf


On Sat, Feb 24, 2001 at 10:25:42AM -0700, Jeff Lessem wrote:
 In your message of: Sat, 24 Feb 2001 09:54:47 CST, you write:
 Jeff, are you using the e820 memory map at all ?  In particular, are you
 using grub or some other buggy bootloader that insists on specifying a
 mem= option on the kernel command line ?  There should be a kernel command
 line message very early on, what does that say ?
 
 Yes, I am using grub, the buggy bootloader.  The relevant chunk of
 kernal messages are:
 
  BIOS-provided physical RAM map:
   BIOS-e820: 0009f800 @  (usable)
   BIOS-e820: 0800 @ 0009f800 (reserved)
   BIOS-e820: 00019800 @ 000e6800 (reserved)
   BIOS-e820: 13ef @ 0010 (usable)
   BIOS-e820: fc00 @ 13ff (ACPI data)
   BIOS-e820: 0400 @ 13fffc00 (ACPI NVS)
   BIOS-e820: 0008 @ fff8 (reserved)
  On node 0 totalpages: 81904
  zone(0): 4096 pages.
  zone(1): 77808 pages.
  zone(2): 0 pages.
  Kernel command line: root=/dev/hda1 mem=327616K
 
 You are dead on, mem= seems a bit small.  Forcing mem=320M on the
 command line fixes the problem completely.

Careful, you're overwriting ACPI data now (and using it as normal RAM).

Can you try one of a) LILO b) a fixed version of grub c) this patch ?

diff -ur linux/arch/i386/kernel/setup.c linux-prumpf/arch/i386/kernel/setup.c
--- linux/arch/i386/kernel/setup.c  Fri Feb 23 13:37:38 2001
+++ linux-prumpf/arch/i386/kernel/setup.c   Sat Feb 24 09:49:50 2001
@@ -555,30 +555,9 @@
e820.nr_map = 0;
usermem = 1;
} else {
-   /* If the user specifies memory size, we
-* blow away any automatically generated
-* size
-*/
-   unsigned long start_at, mem_size;
- 
-   if (usermem == 0) {
-   /* first time in: zap the whitelist
-* and reinitialize it with the
-* standard low-memory region.
-*/
-   e820.nr_map = 0;
-   usermem = 1;
-   add_memory_region(0, LOWMEMSIZE(), E820_RAM);
-   }
-   mem_size = memparse(from+4, from);
+   memparse(from+4, from);
if (*from == '@')
-   start_at = memparse(from+1, from);
-   else {
-   start_at = HIGH_MEMORY;
-   mem_size -= HIGH_MEMORY;
-   usermem=0;
-   }
-   add_memory_region(start_at, mem_size, E820_RAM);
+   memparse(from+1, from);
}
}
c = *(from++);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] (2.0) make softdog/hardware watchdog in same box work

2001-02-20 Thread Philipp Rumpf


While misc_register() semantics are different in 2.0 from 2.[24], and the
2.[24] code would actually work in 2.0, the 2.0 code doesn't.

This fixes (I think) the case where you have softdog and a hardware
watchdog driver on the same box (and obviously want to use the hardware
watchdog).

diff -ur linux/drivers/char/misc.c linux-prumpf/drivers/char/misc.c
--- linux/drivers/char/misc.c   Thu Jun  4 00:17:47 1998
+++ linux-prumpf/drivers/char/misc.cTue Feb 20 18:05:46 2001
@@ -220,14 +220,17 @@
 #ifdef CONFIG_SUN_MOUSE
sun_mouse_init();
 #endif
-#ifdef CONFIG_SOFT_WATCHDOG
-   watchdog_init();
-#endif
+   /* In 2.0, only the first misc_register() is significant for each
+* minor.  So we load the hardware watchdog drivers first, then the
+* softdog driver. */
 #ifdef CONFIG_WDT
wdt_init();
 #endif
 #ifdef CONFIG_PCWATCHDOG
pcwatchdog_init();
+#endif
+#ifdef CONFIG_SOFT_WATCHDOG
+   watchdog_init();
 #endif
 #ifdef CONFIG_APM
apm_bios_init();

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] make softdog/hardware watchdog in same box work

2001-02-20 Thread Philipp Rumpf


misc_register() overrides misc devices with the same minor that have been
registered earlier, so if you enable both softdog and a hardware watchdog
the current initialization order will leave you with softdog only.

Should be fixed by (untested, 2.2):

diff -ur linux/drivers/char/misc.c linux-prumpf/drivers/char/misc.c
--- linux/drivers/char/misc.c   Mon Dec 11 01:49:42 2000
+++ linux-prumpf/drivers/char/misc.cTue Feb 20 17:49:30 2001
@@ -216,11 +216,15 @@
pc110pad_init();
 #endif
 /*
- * Only one watchdog can succeed. We probe the pcwatchdog first,
- * then the wdt cards and finally the software watchdog which always
- * works. This means if your hardware watchdog dies or is 'borrowed'
- * for some reason the software watchdog still gives you some cover.
+ * Only one watchdog can succeed. We probe the software watchdog first,
+ * then the hardware watchdogs which can override softdog if you have
+ * both configured.  This means if your hardware watchdog dies or is
+ * 'borrowed' for some reason the software watchdog still gives you
+ * some cover.
  */
+#ifdef CONFIG_SOFT_WATCHDOG
+   watchdog_init();
+#endif
 #ifdef CONFIG_PCWATCHDOG
pcwatchdog_init();
 #endif
@@ -232,9 +236,6 @@
 #endif
 #ifdef CONFIG_60XX_WDT
sbc60xxwdt_init();
-#endif
-#ifdef CONFIG_SOFT_WATCHDOG
-   watchdog_init();
 #endif
 #ifdef CONFIG_DTLK
dtlk_init();

and (untested, 2.4):

diff -ur linux/drivers/char/Makefile linux-prumpf/drivers/char/Makefile
--- linux/drivers/char/Makefile Thu Jan  4 22:00:55 2001
+++ linux-prumpf/drivers/char/Makefile  Tue Feb 20 17:51:39 2001
@@ -170,11 +170,13 @@
 obj-$(CONFIG_NWBUTTON) += nwbutton.o
 obj-$(CONFIG_NWFLASH) += nwflash.o
 
-# Only one watchdog can succeed. We probe the hardware watchdog
-# drivers first, then the softdog driver.  This means if your hardware
-# watchdog dies or is 'borrowed' for some reason the software watchdog
-# still gives you some cover.
+# Only one watchdog can succeed. We probe the software watchdog first,
+# then the hardware watchdogs which can override softdog if you have
+# both configured.  This means if your hardware watchdog dies or is
+# 'borrowed' for some reason the software watchdog still gives you
+# some cover.
 
+obj-$(CONFIG_SOFT_WATCHDOG) += softdog.o
 obj-$(CONFIG_PCWATCHDOG) += pcwd.o
 obj-$(CONFIG_ACQUIRE_WDT) += acquirewdt.o
 obj-$(CONFIG_MIXCOMWD) += mixcomwd.o
@@ -184,7 +186,6 @@
 obj-$(CONFIG_21285_WATCHDOG) += wdt285.o
 obj-$(CONFIG_977_WATCHDOG) += wdt977.o
 obj-$(CONFIG_I810_TCO) += i810-tco.o
-obj-$(CONFIG_SOFT_WATCHDOG) += softdog.o
 
 
 include $(TOPDIR)/Rules.make


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 2.4.1-ac15

2001-02-20 Thread Philipp Rumpf


On Tue, 20 Feb 2001, Keith Owens wrote:
> On Mon, 19 Feb 2001 16:04:07 + (GMT), 
> Alan Cox <[EMAIL PROTECTED]> wrote:
> Using wait_for_at_least_one_schedule_on_every_cpu() has no penalty
> except on the process running rmmod.  It does not require yet more
> spinlocks and is architecture independent.  Since schedule() updates
> sched_data->last_schedule, all the rmmod process has to do is wait
> until the last_schedule value changes on all cpus, forcing a reschedule
> if necessary.
> 
> Zero overhead in schedule, zero overhead in exception handling, zero
> overhead in IA64 unwind code, architecture independent.  Sounds good to
> me.

Not architecture independent unfortunately.  get_cycles() always returns 0
on some SMP-capable machines.

->last_schedule doesn't change if one CPU is always idle (kernel threads
do), or always running the same process (kernel threads do, unless it's an
RT process in an endless loop).  I'm not sure how you'd go about "forcing
a reschedule if necessary".

Using temporary kernel threads has zero overhead in {schedule, exception
handling, IA64 unwind code} and actually works on all architectures.  It
adds overhead to the wait_for_at_least_one_schedule_on_every_cpu() code,
but I think that's acceptable.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] new setprocuid syscall

2001-02-20 Thread Philipp Rumpf


> diff -urN linux-2.4.1/arch/i386/kernel/entry.S
> linux-2.4.1-setprocuid/arch/i386/kernel/entry.S
> --- linux-2.4.1/arch/i386/kernel/entry.SThu Nov  9 02:09:50 2000
> +++ linux-2.4.1-setprocuid/arch/i386/kernel/entry.S Mon Feb 19
> 22:12:00 2001
> @@ -645,6 +645,7 @@
> .long SYMBOL_NAME(sys_madvise)
> .long SYMBOL_NAME(sys_getdents64)   /* 220 */
> .long SYMBOL_NAME(sys_fcntl64)
> +   .long SYMBOL_NAME(sys_setprocuid)
> .long SYMBOL_NAME(sys_ni_syscall)   /* reserved for TUX */
> 
> /*

You stole TUX's syscall slot.

> +   p = find_task_by_pid(pid);
> +   p->fsuid = p->euid = p->suid = p->uid = uid;

p might be NULL.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] new setprocuid syscall

2001-02-20 Thread Philipp Rumpf


 diff -urN linux-2.4.1/arch/i386/kernel/entry.S
 linux-2.4.1-setprocuid/arch/i386/kernel/entry.S
 --- linux-2.4.1/arch/i386/kernel/entry.SThu Nov  9 02:09:50 2000
 +++ linux-2.4.1-setprocuid/arch/i386/kernel/entry.S Mon Feb 19
 22:12:00 2001
 @@ -645,6 +645,7 @@
 .long SYMBOL_NAME(sys_madvise)
 .long SYMBOL_NAME(sys_getdents64)   /* 220 */
 .long SYMBOL_NAME(sys_fcntl64)
 +   .long SYMBOL_NAME(sys_setprocuid)
 .long SYMBOL_NAME(sys_ni_syscall)   /* reserved for TUX */
 
 /*

You stole TUX's syscall slot.

 +   p = find_task_by_pid(pid);
 +   p-fsuid = p-euid = p-suid = p-uid = uid;

p might be NULL.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 2.4.1-ac15

2001-02-20 Thread Philipp Rumpf


On Tue, 20 Feb 2001, Keith Owens wrote:
 On Mon, 19 Feb 2001 16:04:07 + (GMT), 
 Alan Cox [EMAIL PROTECTED] wrote:
 Using wait_for_at_least_one_schedule_on_every_cpu() has no penalty
 except on the process running rmmod.  It does not require yet more
 spinlocks and is architecture independent.  Since schedule() updates
 sched_data-last_schedule, all the rmmod process has to do is wait
 until the last_schedule value changes on all cpus, forcing a reschedule
 if necessary.
 
 Zero overhead in schedule, zero overhead in exception handling, zero
 overhead in IA64 unwind code, architecture independent.  Sounds good to
 me.

Not architecture independent unfortunately.  get_cycles() always returns 0
on some SMP-capable machines.

-last_schedule doesn't change if one CPU is always idle (kernel threads
do), or always running the same process (kernel threads do, unless it's an
RT process in an endless loop).  I'm not sure how you'd go about "forcing
a reschedule if necessary".

Using temporary kernel threads has zero overhead in {schedule, exception
handling, IA64 unwind code} and actually works on all architectures.  It
adds overhead to the wait_for_at_least_one_schedule_on_every_cpu() code,
but I think that's acceptable.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] make softdog/hardware watchdog in same box work

2001-02-20 Thread Philipp Rumpf


misc_register() overrides misc devices with the same minor that have been
registered earlier, so if you enable both softdog and a hardware watchdog
the current initialization order will leave you with softdog only.

Should be fixed by (untested, 2.2):

diff -ur linux/drivers/char/misc.c linux-prumpf/drivers/char/misc.c
--- linux/drivers/char/misc.c   Mon Dec 11 01:49:42 2000
+++ linux-prumpf/drivers/char/misc.cTue Feb 20 17:49:30 2001
@@ -216,11 +216,15 @@
pc110pad_init();
 #endif
 /*
- * Only one watchdog can succeed. We probe the pcwatchdog first,
- * then the wdt cards and finally the software watchdog which always
- * works. This means if your hardware watchdog dies or is 'borrowed'
- * for some reason the software watchdog still gives you some cover.
+ * Only one watchdog can succeed. We probe the software watchdog first,
+ * then the hardware watchdogs which can override softdog if you have
+ * both configured.  This means if your hardware watchdog dies or is
+ * 'borrowed' for some reason the software watchdog still gives you
+ * some cover.
  */
+#ifdef CONFIG_SOFT_WATCHDOG
+   watchdog_init();
+#endif
 #ifdef CONFIG_PCWATCHDOG
pcwatchdog_init();
 #endif
@@ -232,9 +236,6 @@
 #endif
 #ifdef CONFIG_60XX_WDT
sbc60xxwdt_init();
-#endif
-#ifdef CONFIG_SOFT_WATCHDOG
-   watchdog_init();
 #endif
 #ifdef CONFIG_DTLK
dtlk_init();

and (untested, 2.4):

diff -ur linux/drivers/char/Makefile linux-prumpf/drivers/char/Makefile
--- linux/drivers/char/Makefile Thu Jan  4 22:00:55 2001
+++ linux-prumpf/drivers/char/Makefile  Tue Feb 20 17:51:39 2001
@@ -170,11 +170,13 @@
 obj-$(CONFIG_NWBUTTON) += nwbutton.o
 obj-$(CONFIG_NWFLASH) += nwflash.o
 
-# Only one watchdog can succeed. We probe the hardware watchdog
-# drivers first, then the softdog driver.  This means if your hardware
-# watchdog dies or is 'borrowed' for some reason the software watchdog
-# still gives you some cover.
+# Only one watchdog can succeed. We probe the software watchdog first,
+# then the hardware watchdogs which can override softdog if you have
+# both configured.  This means if your hardware watchdog dies or is
+# 'borrowed' for some reason the software watchdog still gives you
+# some cover.
 
+obj-$(CONFIG_SOFT_WATCHDOG) += softdog.o
 obj-$(CONFIG_PCWATCHDOG) += pcwd.o
 obj-$(CONFIG_ACQUIRE_WDT) += acquirewdt.o
 obj-$(CONFIG_MIXCOMWD) += mixcomwd.o
@@ -184,7 +186,6 @@
 obj-$(CONFIG_21285_WATCHDOG) += wdt285.o
 obj-$(CONFIG_977_WATCHDOG) += wdt977.o
 obj-$(CONFIG_I810_TCO) += i810-tco.o
-obj-$(CONFIG_SOFT_WATCHDOG) += softdog.o
 
 
 include $(TOPDIR)/Rules.make


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] (2.0) make softdog/hardware watchdog in same box work

2001-02-20 Thread Philipp Rumpf


While misc_register() semantics are different in 2.0 from 2.[24], and the
2.[24] code would actually work in 2.0, the 2.0 code doesn't.

This fixes (I think) the case where you have softdog and a hardware
watchdog driver on the same box (and obviously want to use the hardware
watchdog).

diff -ur linux/drivers/char/misc.c linux-prumpf/drivers/char/misc.c
--- linux/drivers/char/misc.c   Thu Jun  4 00:17:47 1998
+++ linux-prumpf/drivers/char/misc.cTue Feb 20 18:05:46 2001
@@ -220,14 +220,17 @@
 #ifdef CONFIG_SUN_MOUSE
sun_mouse_init();
 #endif
-#ifdef CONFIG_SOFT_WATCHDOG
-   watchdog_init();
-#endif
+   /* In 2.0, only the first misc_register() is significant for each
+* minor.  So we load the hardware watchdog drivers first, then the
+* softdog driver. */
 #ifdef CONFIG_WDT
wdt_init();
 #endif
 #ifdef CONFIG_PCWATCHDOG
pcwatchdog_init();
+#endif
+#ifdef CONFIG_SOFT_WATCHDOG
+   watchdog_init();
 #endif
 #ifdef CONFIG_APM
apm_bios_init();

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: kernel_thread() & thread starting

2001-02-19 Thread Philipp Rumpf


On Mon, 19 Feb 2001, Russell King wrote:
> Philipp Rumpf writes:
> > That still won't catch keventd oopsing though - which I think might happen
> > quite easily in real life.
> 
> Maybe we should panic in that case?  For example, what happens if kswapd
> oopses?  kreclaimd?  bdflush?  kupdate?  All these have the same problem,

No.  If kswapd oopses it's a bug in kswapd (or related code).  If keventd
oopses most likely the broken code is actually the task queue you
scheduled, which belongs to your driver.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 2.4.1-ac15

2001-02-19 Thread Philipp Rumpf


On Mon, 19 Feb 2001, Alan Cox wrote:
> > So you fixed the nonexistent race only.  The real race is that the module
> 
> Umm I fixed the small race. You are right that there is a second race.

There's just one race.  The small race is nonexistent since
put_mod_name always acts as a memory barrier.

> > uninitialized vmalloc'd (module_map'd) memory), then the module data
> > (including the exception table) gets copied.
> > The race window is from the first copy_from_user in sys_init_module until
> > the second one.
> 
> Yep. Obvious answer. Ignore exception tables for modules that are not
> MOD_RUNNING.

You can have exceptions while initializing.  not
MOD_RUNNING|MOD_INITIALIZING should work though.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 2.4.1-ac15

2001-02-19 Thread Philipp Rumpf

On Mon, 19 Feb 2001, Alan Cox wrote:
> > so you hold a spinlock during copy_from_user ?  Or did you change
> > sys_init_module/sys_create_modules semantics ?
> 
> The only points I need to hold a spinlock are editing the chain and
> walking it in a case where I dont hold the kernel lock.
> 
> So I spin_lock_irqsave the two ops updating the list links in each case and
> the exception lookup walk

So you fixed the nonexistent race only.  The real race is that the module
structure gets initialized first (so the exception table pointers point to
uninitialized vmalloc'd (module_map'd) memory), then the module data
(including the exception table) gets copied.

The race window is from the first copy_from_user in sys_init_module until
the second one.

Or am I missing something ?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 2.4.1-ac15

2001-02-19 Thread Philipp Rumpf


On Mon, 19 Feb 2001, Alan Cox wrote:
> > Rusty had a patch that locked the module list properly IIRC.
> 
> So does -ac now. I just take a spinlock for the modify cases that race
> against faults (including vmalloc faults from irq context)

so you hold a spinlock during copy_from_user ?  Or did you change
sys_init_module/sys_create_modules semantics ?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: kernel_thread() & thread starting

2001-02-19 Thread Philipp Rumpf


On Mon, 19 Feb 2001, Andrew Morton wrote:
> David Woodhouse wrote:
> > [EMAIL PROTECTED] said:
> > >  Why bother ?  It looks like a leftover debugging message which
> > > doesn't make a lot of sense once the code is stable (what might make
> > > sense is checking keventd is still around, but that's not what the
> > > code is doing).
> 
> keventd *must* still be around.

So put in the reliable check for it:

in start_context_thread:
while (!keventd_task)
schedule();

in need_keventd:
for_each_task(tsk) {
if (tsk == keventd_task) {
if (tsk->mm == NULL && strcmp(tsk->comm, "keventd") == 0)
return 1;
}
}

printk("eek.");

return 0;

> And the code obviously isn't completely stable, and this debug
> message has found something rather unpleasant.

So removing the debug code completely (which is what your patch does)
doesn't sound like the right thing.

> I don't think we should run the init tasks when keventd may, or
> may not be running.  Sure, the current code does, by happenstance,
> all work correctly when keventd hasn't yet started running, and
> when it's starting up.  But it's safer, saner and surer just
> to crank the damn thing up before proceeding.

Agreed.  The code snippet above should do that, no ?

> I believe the right thing to do here is the RMK approach.

That still won't catch keventd oopsing though - which I think might happen
quite easily in real life.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 2.4.1-ac15

2001-02-19 Thread Philipp Rumpf


On Tue, 20 Feb 2001, Keith Owens wrote:
> On Mon, 19 Feb 2001 06:15:22 -0600 (CST), 
> Philipp Rumpf <[EMAIL PROTECTED]> wrote:
> No need for a callin routine, you can get this for free as part of
> normal scheduling.  The sequence goes :-
> 
> if (use_count == 0) {
>   module_unregister();
>   wait_for_at_least_one_schedule_on_every_cpu();
>   if (use_count != 0) {
> module_register();/* lost the unregister race */
>   }
>   else {
> /* nobody can enter the module now */
> module_release_resources();
> unlink_module_from_list();
> wait_for_at_least_one_schedule_on_every_cpu();
> free_module_storage();
>   }
> }
> 
> wait_for_at_least_one_schedule_on_every_cpu() prevents the next

wait_for_at_least_one_schedule_on_every_cpu() *is* callin_other_cpus().
I agree the name isn't optimal.  (and yes, you could implement it by
hacking sched.c directly, but I don't think that's necessary).

> The beauty of this approach is that the rest of the cpus can do normal
> work.  No need to bring everything to a dead stop.

Which nicely avoids potential deadlocks in modules that need to initialize
on all CPUs.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 2.4.1-ac15

2001-02-19 Thread Philipp Rumpf


On Mon, 19 Feb 2001, Keith Owens wrote:
> On Mon, 19 Feb 2001 11:35:08 + (GMT), 
> Alan Cox <[EMAIL PROTECTED]> wrote:
> >The problem isnt running module code. What happens in this case
> >
> >mod->next = module_list;
> >module_list = mod;  /* link it in */
> >
> >Note no write barrier.
> 
> It works on ix86 so the code must be right

Too bad it doesn't.

> >Delete is even worse
> >
> >We unlink the module
> >We free the memory
> >
> >At the same time another cpu may be walking the exception table that we free.
> 
> Another good reason why locking modules via use counts from external
> code is not the right fix.  We definitely need a quiesce model for
> module removal.

Unless I'm mistaken, we need both use counts and SMP magic (though not
necessarily as extreme as what the "freeze all other CPUs during module
unload" patch did).


I think something like this would work (in addition to use counts)

int callin_func(void *p)
{
int *cpu = p;

while (*cpu != smp_processor_id()) {
current->cpus_allowed = 1 << *cpu;
schedule();
}

return 0;
}

void callin_other_cpus(void)
{
int cpus[smp_num_cpus];
int i;

for (i=0; ihttp://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 2.4.1-ac15

2001-02-19 Thread Philipp Rumpf

On Mon, 19 Feb 2001, Alan Cox wrote:
> mod->next = module_list;

put_mod_name() here.

> module_list = mod;  /* link it in */
> 
> Note no write barrier.

put_mod_name calls free_page which always implies a memory barrier.  This
isn't beautiful but it won't blow up either.

Actually that isn't relevant since the actual table is copied _after_
ex_table_{start,end}.  We'll scan uninitialized memory and if some word
happens to match the fault EIP we jump to the bogus fixup.

> Delete is even worse
> 
> We unlink the module
> We free the memory
> 
> At the same time another cpu may be walking the exception table that we free.

True.

Rusty had a patch that locked the module list properly IIRC.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: kernel_thread() & thread starting

2001-02-19 Thread Philipp Rumpf


On Sun, 18 Feb 2001, Kenn Humborg wrote:
> in the .config, I can get schedule_task() to fail with:
> 
>schedule_task(): keventd has not started

This shouldn't be a failure case, just a (bogus) printk.

> When starting bdflush and kupdated, bdflush_init() uses a semaphore to
> make sure that the threads have run before continuing.  Shouldn't
> start_context_thread() do something similar?

Why bother ?  It looks like a leftover debugging message which doesn't
make a lot of sense once the code is stable (what might make sense is
checking keventd is still around, but that's not what the code is doing).

Proposed patch:

--- linux-2.4/kernel/context.c  Fri Jan 12 18:52:41 2001
+++ linux-prumpf/kernel/context.c   Mon Feb 19 11:11:37 2001
@@ -26,19 +26,9 @@
 static int keventd_running;
 static struct task_struct *keventd_task;
 
-static int need_keventd(const char *who)
-{
-   if (keventd_running == 0)
-   printk(KERN_ERR "%s(): keventd has not started\n", who);
-   return keventd_running;
-}
-   
 int current_is_keventd(void)
 {
-   int ret = 0;
-   if (need_keventd(__FUNCTION__))
-   ret = (current == keventd_task);
-   return ret;
+   return current == keventd_task;
 }
 
 /**
@@ -57,7 +47,6 @@
 int schedule_task(struct tq_struct *task)
 {
int ret;
-   need_keventd(__FUNCTION__);
ret = queue_task(task, _context);
wake_up(_task_wq);
return ret;

dwmw2 ?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 2.4.1-ac15

2001-02-19 Thread Philipp Rumpf

On Thu, 15 Feb 2001, Alan Cox wrote:
>   Question of the day for the VM folks:
>   If CPU1 is loading the exception tables for a module and
>   CPU2 faults.. what happens 8)

"loading" == in sys_create_module ?

The module list is modified atomically, so either we search the new table
or we don't, but we never see intermediate states.  Not searching the new
table shouldn't be a problem as we shouldn't run module code until
sys_init_module time.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 2.4.1-ac15

2001-02-19 Thread Philipp Rumpf


On Thu, 15 Feb 2001, Alan Cox wrote:
   Question of the day for the VM folks:
   If CPU1 is loading the exception tables for a module and
   CPU2 faults.. what happens 8)

"loading" == in sys_create_module ?

The module list is modified atomically, so either we search the new table
or we don't, but we never see intermediate states.  Not searching the new
table shouldn't be a problem as we shouldn't run module code until
sys_init_module time.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: kernel_thread() thread starting

2001-02-19 Thread Philipp Rumpf


On Sun, 18 Feb 2001, Kenn Humborg wrote:
 in the .config, I can get schedule_task() to fail with:
 
schedule_task(): keventd has not started

This shouldn't be a failure case, just a (bogus) printk.

 When starting bdflush and kupdated, bdflush_init() uses a semaphore to
 make sure that the threads have run before continuing.  Shouldn't
 start_context_thread() do something similar?

Why bother ?  It looks like a leftover debugging message which doesn't
make a lot of sense once the code is stable (what might make sense is
checking keventd is still around, but that's not what the code is doing).

Proposed patch:

--- linux-2.4/kernel/context.c  Fri Jan 12 18:52:41 2001
+++ linux-prumpf/kernel/context.c   Mon Feb 19 11:11:37 2001
@@ -26,19 +26,9 @@
 static int keventd_running;
 static struct task_struct *keventd_task;
 
-static int need_keventd(const char *who)
-{
-   if (keventd_running == 0)
-   printk(KERN_ERR "%s(): keventd has not started\n", who);
-   return keventd_running;
-}
-   
 int current_is_keventd(void)
 {
-   int ret = 0;
-   if (need_keventd(__FUNCTION__))
-   ret = (current == keventd_task);
-   return ret;
+   return current == keventd_task;
 }
 
 /**
@@ -57,7 +47,6 @@
 int schedule_task(struct tq_struct *task)
 {
int ret;
-   need_keventd(__FUNCTION__);
ret = queue_task(task, tq_context);
wake_up(context_task_wq);
return ret;

dwmw2 ?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 2.4.1-ac15

2001-02-19 Thread Philipp Rumpf


On Mon, 19 Feb 2001, Alan Cox wrote:
 mod-next = module_list;

put_mod_name() here.

 module_list = mod;  /* link it in */
 
 Note no write barrier.

put_mod_name calls free_page which always implies a memory barrier.  This
isn't beautiful but it won't blow up either.

Actually that isn't relevant since the actual table is copied _after_
ex_table_{start,end}.  We'll scan uninitialized memory and if some word
happens to match the fault EIP we jump to the bogus fixup.

 Delete is even worse
 
 We unlink the module
 We free the memory
 
 At the same time another cpu may be walking the exception table that we free.

True.

Rusty had a patch that locked the module list properly IIRC.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 2.4.1-ac15

2001-02-19 Thread Philipp Rumpf


On Mon, 19 Feb 2001, Keith Owens wrote:
 On Mon, 19 Feb 2001 11:35:08 + (GMT), 
 Alan Cox [EMAIL PROTECTED] wrote:
 The problem isnt running module code. What happens in this case
 
 mod-next = module_list;
 module_list = mod;  /* link it in */
 
 Note no write barrier.
 
 humourIt works on ix86 so the code must be right/humour

Too bad it doesn't.

 Delete is even worse
 
 We unlink the module
 We free the memory
 
 At the same time another cpu may be walking the exception table that we free.
 
 Another good reason why locking modules via use counts from external
 code is not the right fix.  We definitely need a quiesce model for
 module removal.

Unless I'm mistaken, we need both use counts and SMP magic (though not
necessarily as extreme as what the "freeze all other CPUs during module
unload" patch did).


I think something like this would work (in addition to use counts)

int callin_func(void *p)
{
int *cpu = p;

while (*cpu != smp_processor_id()) {
current-cpus_allowed = 1  *cpu;
schedule();
}

return 0;
}

void callin_other_cpus(void)
{
int cpus[smp_num_cpus];
int i;

for (i=0; ismp_num_cpus; i++) {
cpus[i] = i;

kernel_thread(callin_func, cpus[i], ...);
}
}

and call callin_other_cpus() before unloading a module.


I'm not sure how you could make exception handling safe without locking
all accesses to the module list - but that sounds like the sane thing to
do anyway.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 2.4.1-ac15

2001-02-19 Thread Philipp Rumpf


On Tue, 20 Feb 2001, Keith Owens wrote:
 On Mon, 19 Feb 2001 06:15:22 -0600 (CST), 
 Philipp Rumpf [EMAIL PROTECTED] wrote:
 No need for a callin routine, you can get this for free as part of
 normal scheduling.  The sequence goes :-
 
 if (use_count == 0) {
   module_unregister();
   wait_for_at_least_one_schedule_on_every_cpu();
   if (use_count != 0) {
 module_register();/* lost the unregister race */
   }
   else {
 /* nobody can enter the module now */
 module_release_resources();
 unlink_module_from_list();
 wait_for_at_least_one_schedule_on_every_cpu();
 free_module_storage();
   }
 }
 
 wait_for_at_least_one_schedule_on_every_cpu() prevents the next

wait_for_at_least_one_schedule_on_every_cpu() *is* callin_other_cpus().
I agree the name isn't optimal.  (and yes, you could implement it by
hacking sched.c directly, but I don't think that's necessary).

 The beauty of this approach is that the rest of the cpus can do normal
 work.  No need to bring everything to a dead stop.

Which nicely avoids potential deadlocks in modules that need to initialize
on all CPUs.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: kernel_thread() thread starting

2001-02-19 Thread Philipp Rumpf


On Mon, 19 Feb 2001, Andrew Morton wrote:
 David Woodhouse wrote:
  [EMAIL PROTECTED] said:
Why bother ?  It looks like a leftover debugging message which
   doesn't make a lot of sense once the code is stable (what might make
   sense is checking keventd is still around, but that's not what the
   code is doing).
 
 keventd *must* still be around.

So put in the reliable check for it:

in start_context_thread:
while (!keventd_task)
schedule();

in need_keventd:
for_each_task(tsk) {
if (tsk == keventd_task) {
if (tsk-mm == NULL  strcmp(tsk-comm, "keventd") == 0)
return 1;
}
}

printk("eek.");

return 0;

 And the code obviously isn't completely stable, and this debug
 message has found something rather unpleasant.

So removing the debug code completely (which is what your patch does)
doesn't sound like the right thing.

 I don't think we should run the init tasks when keventd may, or
 may not be running.  Sure, the current code does, by happenstance,
 all work correctly when keventd hasn't yet started running, and
 when it's starting up.  But it's safer, saner and surer just
 to crank the damn thing up before proceeding.

Agreed.  The code snippet above should do that, no ?

 I believe the right thing to do here is the RMK approach.

That still won't catch keventd oopsing though - which I think might happen
quite easily in real life.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 2.4.1-ac15

2001-02-19 Thread Philipp Rumpf


On Mon, 19 Feb 2001, Alan Cox wrote:
  Rusty had a patch that locked the module list properly IIRC.
 
 So does -ac now. I just take a spinlock for the modify cases that race
 against faults (including vmalloc faults from irq context)

so you hold a spinlock during copy_from_user ?  Or did you change
sys_init_module/sys_create_modules semantics ?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 2.4.1-ac15

2001-02-19 Thread Philipp Rumpf


On Mon, 19 Feb 2001, Alan Cox wrote:
  so you hold a spinlock during copy_from_user ?  Or did you change
  sys_init_module/sys_create_modules semantics ?
 
 The only points I need to hold a spinlock are editing the chain and
 walking it in a case where I dont hold the kernel lock.
 
 So I spin_lock_irqsave the two ops updating the list links in each case and
 the exception lookup walk

So you fixed the nonexistent race only.  The real race is that the module
structure gets initialized first (so the exception table pointers point to
uninitialized vmalloc'd (module_map'd) memory), then the module data
(including the exception table) gets copied.

The race window is from the first copy_from_user in sys_init_module until
the second one.

Or am I missing something ?




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 2.4.1-ac15

2001-02-19 Thread Philipp Rumpf


On Mon, 19 Feb 2001, Alan Cox wrote:
  So you fixed the nonexistent race only.  The real race is that the module
 
 Umm I fixed the small race. You are right that there is a second race.

There's just one race.  The small race is nonexistent since
put_mod_name always acts as a memory barrier.

  uninitialized vmalloc'd (module_map'd) memory), then the module data
  (including the exception table) gets copied.
  The race window is from the first copy_from_user in sys_init_module until
  the second one.
 
 Yep. Obvious answer. Ignore exception tables for modules that are not
 MOD_RUNNING.

You can have exceptions while initializing.  not
MOD_RUNNING|MOD_INITIALIZING should work though.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: kernel_thread() thread starting

2001-02-19 Thread Philipp Rumpf


On Mon, 19 Feb 2001, Russell King wrote:
 Philipp Rumpf writes:
  That still won't catch keventd oopsing though - which I think might happen
  quite easily in real life.
 
 Maybe we should panic in that case?  For example, what happens if kswapd
 oopses?  kreclaimd?  bdflush?  kupdate?  All these have the same problem,

No.  If kswapd oopses it's a bug in kswapd (or related code).  If keventd
oopses most likely the broken code is actually the task queue you
scheduled, which belongs to your driver.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] 2.4.2-pre3: parport_pc init_module bug

2001-02-14 Thread Philipp Rumpf

On Wed, 14 Feb 2001, Grant Grundler wrote:
> Philipp Rumpf wrote:
> > Jeff Garzik wrote:
> > > Looks ok, but I wonder if we should include this list in the docs.
> > > These is stuff defined by the PCI spec, and this list could potentially
> > > get longer...  (opinions either way wanted...)
> 
> Having people look things up in the spec isn't very user friendly.

Having the constants in some well-known header file should be sufficient,
shouldn't it ?

> > I'm not sure whether the
> > plan is to have drivers handle MSIs or do it in the generic PCI code.
> > Grant ?
> 
> Generic PCI code can d very little by itself with MSI since not all
> platforms provide support for it - even within the same arch.

It depends on the platform and maybe the exact PCI slot used, but I don't
think it depends on the driver (unless MSI support is broken in which case
you would want to fix it up in the driver).  At least I can't find
anything in the PCI 2.2 spec that would suggest we need to consult the
driver before enabling MSIs with one message only.

> It's also possible for the driver to just ignore MSI and not use it.
> ie use regular PCI IRQ lines for interrupts.

.. at least until someone comes out with a PCI board that supports MSI
interrupts only.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] 2.4.2-pre3: parport_pc init_module bug

2001-02-14 Thread Philipp Rumpf


On Wed, 14 Feb 2001, Jeff Garzik wrote:
> On Wed, 14 Feb 2001, Tim Waugh wrote:
> > +/**
> > + * pci_find_capability - query for devices' capabilities 
> > + * @dev: PCI device to query
> > + * @cap: capability code
> > + *
> > + * Tell if a device supports a given PCI capability.
> > + * Returns the address of the requested capability structure within the device's 
>PCI 
> > + * configuration space or 0 in case the device does not support it.
> > + * Possible values for @flags:
^^
@cap would make more sense, no ?

> > + *  %PCI_CAP_ID_AGP  Accelerated Graphics Port 
> > + *
> > + *  %PCI_CAP_ID_VPD  Vital Product Data 
> > + *
> > + *  %PCI_CAP_ID_SLOTID   Slot Identification 
> > + *
> > + *  %PCI_CAP_ID_MSI  Message Signalled Interrupts
> > + *
> > + *  %PCI_CAP_ID_CHSWPCompactPCI HotSwap 
> > + */
> 
> Looks ok, but I wonder if we should include this list in the docs.
> These is stuff defined by the PCI spec, and this list could potentially
> get longer...  (opinions either way wanted...)

I would vote for including those capabilities which are most likely to be
used by drivers (PCI_CAP_ID_PM, and maybe _VPD).  I'm not sure whether the
plan is to have drivers handle MSIs or do it in the generic PCI code.
Grant ?


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] 2.4.2-pre3: parport_pc init_module bug

2001-02-14 Thread Philipp Rumpf


On Wed, 14 Feb 2001, Tim Waugh wrote:
> + * pci_find_subsys - begin or continue searching for a PCI device by 
>vendor/subvendor/device/subdevice id
> + * @vendor: PCI vendor id to match, or %PCI_ANY_ID to match all vendor ids
> + * @device: PCI device id to match, or %PCI_ANY_ID to match all vendor ids

"match all device ids", surely ?

> + * @ss_device: PCI subsystem device id to match, or %PCI_ANY_ID to match all vendor 
>ids

ditto.

(the pci_find_device documentation has "all vendor ids" as well)


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] 2.4.2-pre3: parport_pc init_module bug

2001-02-14 Thread Philipp Rumpf


On Wed, 14 Feb 2001, Tim Waugh wrote:
 + * pci_find_subsys - begin or continue searching for a PCI device by 
vendor/subvendor/device/subdevice id
 + * @vendor: PCI vendor id to match, or %PCI_ANY_ID to match all vendor ids
 + * @device: PCI device id to match, or %PCI_ANY_ID to match all vendor ids

"match all device ids", surely ?

 + * @ss_device: PCI subsystem device id to match, or %PCI_ANY_ID to match all vendor 
ids

ditto.

(the pci_find_device documentation has "all vendor ids" as well)


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] 2.4.2-pre3: parport_pc init_module bug

2001-02-14 Thread Philipp Rumpf


On Wed, 14 Feb 2001, Jeff Garzik wrote:
 On Wed, 14 Feb 2001, Tim Waugh wrote:
  +/**
  + * pci_find_capability - query for devices' capabilities 
  + * @dev: PCI device to query
  + * @cap: capability code
  + *
  + * Tell if a device supports a given PCI capability.
  + * Returns the address of the requested capability structure within the device's 
PCI 
  + * configuration space or 0 in case the device does not support it.
  + * Possible values for @flags:
^^
@cap would make more sense, no ?

  + *  %PCI_CAP_ID_AGP  Accelerated Graphics Port 
  + *
  + *  %PCI_CAP_ID_VPD  Vital Product Data 
  + *
  + *  %PCI_CAP_ID_SLOTID   Slot Identification 
  + *
  + *  %PCI_CAP_ID_MSI  Message Signalled Interrupts
  + *
  + *  %PCI_CAP_ID_CHSWPCompactPCI HotSwap 
  + */
 
 Looks ok, but I wonder if we should include this list in the docs.
 These is stuff defined by the PCI spec, and this list could potentially
 get longer...  (opinions either way wanted...)

I would vote for including those capabilities which are most likely to be
used by drivers (PCI_CAP_ID_PM, and maybe _VPD).  I'm not sure whether the
plan is to have drivers handle MSIs or do it in the generic PCI code.
Grant ?


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.4.0 kernel paging error

2001-01-10 Thread Philipp Rumpf

On Wed, Jan 10, 2001 at 05:55:05PM +0100, Daniel Phillips wrote:
> Mark Hindley wrote:
> > I am running 2.4.0 final. I got the following failed paging request which
> > produced a complete freeze.
> > 
> > As you can see it was precipitated by cron starting to run some
> > housekeeping stuff overnight.
> > 
> > Has anyone else had prblems?
> 
> It looks real.  It was executing this line of clear_inode in fs/inode.c:
> 
> 380 if (inode->i_sb && inode->i_sb->s_op && inode->i_sb->s_op->clear_inode)
>  ^
>and it blew up here ->

> Unable to handle kernel paging request at virtual address c4870840

I'm pretty sure this is a vmalloc/module address, which would mean ->s_op
points to a module that has been unloaded.  This sounds consistent with the
"cron starting to run some housekeeping stuff" above.

Mark, which file systems are you using ?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0 kernel paging error

2001-01-10 Thread Philipp Rumpf


On Wed, Jan 10, 2001 at 05:55:05PM +0100, Daniel Phillips wrote:
 Mark Hindley wrote:
  I am running 2.4.0 final. I got the following failed paging request which
  produced a complete freeze.
  
  As you can see it was precipitated by cron starting to run some
  housekeeping stuff overnight.
  
  Has anyone else had prblems?
 
 It looks real.  It was executing this line of clear_inode in fs/inode.c:
 
 380 if (inode-i_sb  inode-i_sb-s_op  inode-i_sb-s_op-clear_inode)
  ^
and it blew up here -

 Unable to handle kernel paging request at virtual address c4870840

I'm pretty sure this is a vmalloc/module address, which would mean -s_op
points to a module that has been unloaded.  This sounds consistent with the
"cron starting to run some housekeeping stuff" above.

Mark, which file systems are you using ?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: innd mmap bug in 2.4.0-test12

2000-12-27 Thread Philipp Rumpf

On Wed, Dec 27, 2000 at 03:41:04PM -0800, Linus Torvalds wrote:
> It must be wrong.
> 
> If we have a dirty page on the LRU lists, that page _must_ have a mapping.

What about pages with a mapping but without a writepage function ? or pages
whose writepage function fails ?  The current code seems to simply put the
page onto the active list in that case, which seems just as wrong to me.

> The bug is somewhere else, and your patch is just papering it over. We
> should not have a page without a mapping on the LRU lists in the first
> place, except if the page has anonymous buffers (and such a page cannot

So is there any legal reason we could ever get to page_active ?  Removing
that code (or replacing it with BUG()) certainly would make page_launder
more readable.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: innd mmap bug in 2.4.0-test12

2000-12-27 Thread Philipp Rumpf


On Wed, Dec 27, 2000 at 03:41:04PM -0800, Linus Torvalds wrote:
 It must be wrong.
 
 If we have a dirty page on the LRU lists, that page _must_ have a mapping.

What about pages with a mapping but without a writepage function ? or pages
whose writepage function fails ?  The current code seems to simply put the
page onto the active list in that case, which seems just as wrong to me.

 The bug is somewhere else, and your patch is just papering it over. We
 should not have a page without a mapping on the LRU lists in the first
 place, except if the page has anonymous buffers (and such a page cannot

So is there any legal reason we could ever get to page_active ?  Removing
that code (or replacing it with BUG()) certainly would make page_launder
more readable.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: /dev/random: really secure?

2000-12-18 Thread Philipp Rumpf


On Sun, Dec 17, 2000 at 10:50:57PM +0100, Karel Kulhavy wrote:
> I noticed peculiarities in the behaviour of the delta-delta-3 system for
> entropy estimation in the random.c code./ When I hold right alt or control, I
> get about 8 bits of entropy per repeat fro the /dev/random which is
> overestimated. I think the real entropy is 0 bits because it is absolutely
> deterministic when the interrupt comes. Am I right or is there any hidden

not absolutely, but we should ignore repeated keys that generate more than 
one scancode.

tytso, here's the patch to do it again:

--- linux/drivers/char/random.c Sun Jul 30 18:01:23 2000
+++ linux-prumpf/drivers/char/random.c  Thu Sep 28 17:07:03 2000
@@ -763,10 +763,15 @@
 
 void add_keyboard_randomness(unsigned char scancode)
 {
-   static unsigned char last_scancode = 0;
-   /* ignore autorepeat (multiple key down w/o key up) */
-   if (scancode != last_scancode) {
-   last_scancode = scancode;
+   static unsigned char last_scancode[2] = { 0, 0 };
+
+   /* ignore autorepeat (multiple key down w/o key up).
+* add_keyboard_randomness is called twice for certain AT keyboard
+* keys, so we keep a longer history. */
+   if (scancode != last_scancode[0] &&
+   scancode != last_scancode[1]) {
+   last_scancode[0] = last_scancode[1];
+   last_scancode[1] = scancode;
add_timer_randomness(_timer_state, scancode);
}
 }

If we want to rely solely on the add_timer_randomness checks, we should
remove the autorepeat check completely.

Philipp Rumpf
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: /dev/random: really secure?

2000-12-18 Thread Philipp Rumpf


On Sun, Dec 17, 2000 at 10:50:57PM +0100, Karel Kulhavy wrote:
 I noticed peculiarities in the behaviour of the delta-delta-3 system for
 entropy estimation in the random.c code./ When I hold right alt or control, I
 get about 8 bits of entropy per repeat fro the /dev/random which is
 overestimated. I think the real entropy is 0 bits because it is absolutely
 deterministic when the interrupt comes. Am I right or is there any hidden

not absolutely, but we should ignore repeated keys that generate more than 
one scancode.

tytso, here's the patch to do it again:

--- linux/drivers/char/random.c Sun Jul 30 18:01:23 2000
+++ linux-prumpf/drivers/char/random.c  Thu Sep 28 17:07:03 2000
@@ -763,10 +763,15 @@
 
 void add_keyboard_randomness(unsigned char scancode)
 {
-   static unsigned char last_scancode = 0;
-   /* ignore autorepeat (multiple key down w/o key up) */
-   if (scancode != last_scancode) {
-   last_scancode = scancode;
+   static unsigned char last_scancode[2] = { 0, 0 };
+
+   /* ignore autorepeat (multiple key down w/o key up).
+* add_keyboard_randomness is called twice for certain AT keyboard
+* keys, so we keep a longer history. */
+   if (scancode != last_scancode[0] 
+   scancode != last_scancode[1]) {
+   last_scancode[0] = last_scancode[1];
+   last_scancode[1] = scancode;
add_timer_randomness(keyboard_timer_state, scancode);
}
 }

If we want to rely solely on the add_timer_randomness checks, we should
remove the autorepeat check completely.

Philipp Rumpf
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] i850 support

2000-12-17 Thread Philipp Rumpf


On Sun, Dec 17, 2000 at 12:52:59AM -0800, Drew Hess wrote:
> It also contains an experimental patch to drivers/pci/quirks.c that forces
> standby mode for pool C RDRAM devices.  I've seen some benchmarks
> comparing the Intel D850GB motherboard to the ASUS P4T and attributing the
> better performance of the P4T to the fact that Intel's BIOS puts pool C
> RDRAM in nap mode and has no option to change it, whereas the P4T does.  
> On my D850GB, the patch doesn't seem to change the performance of STREAM
> one way or the other; but when I change the value written into the RDST
> register to set nap mode and shrink the A and B pools to 1 device each, I
> see about a 20-30MB/s drop-off on STREAM, so I'm pretty sure the code does
> what it's supposed to.
> 
> This code isn't strictly a 'quirk' in the sense that some of the bug
> workarounds in drivers/pci/quirks.c are, but I didn't see a more
> appropriate place to put it.  Suggestions are welcome.

sounds like powertweak is what you want.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] i850 support

2000-12-17 Thread Philipp Rumpf


On Sun, Dec 17, 2000 at 12:52:59AM -0800, Drew Hess wrote:
 It also contains an experimental patch to drivers/pci/quirks.c that forces
 standby mode for pool C RDRAM devices.  I've seen some benchmarks
 comparing the Intel D850GB motherboard to the ASUS P4T and attributing the
 better performance of the P4T to the fact that Intel's BIOS puts pool C
 RDRAM in nap mode and has no option to change it, whereas the P4T does.  
 On my D850GB, the patch doesn't seem to change the performance of STREAM
 one way or the other; but when I change the value written into the RDST
 register to set nap mode and shrink the A and B pools to 1 device each, I
 see about a 20-30MB/s drop-off on STREAM, so I'm pretty sure the code does
 what it's supposed to.
 
 This code isn't strictly a 'quirk' in the sense that some of the bug
 workarounds in drivers/pci/quirks.c are, but I didn't see a more
 appropriate place to put it.  Suggestions are welcome.

sounds like powertweak is what you want.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Linux 2.2.18pre25

2000-12-08 Thread Philipp Rumpf

On Fri, Dec 08, 2000 at 10:47:46AM +0100, Willy Tarreau wrote:
> |Bus  0, device   2, function  1:
> |  Unknown class: Intel OEM MegaRAID Controller (rev 5).
> |Medium devsel.  Fast back-to-back capable.  BIST capable.  IRQ 10.  Master
> Capable.  Latency=64.  
> |Prefetchable 32 bit memory at 0xf000 [0xf008].
> 
> as you see, the board is found at 0xf008, but used aligned to 0xf000.

No.  It's found at 0xf000, and has 8 bytes of MMIO space.

> my server currently works with that patch, but I'm sure it won't boot anymore
> if I apply this 2.2.18pre25 alone. 

"I'm sure" meaning "I didn't test it" ?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Linux 2.2.18pre25

2000-12-08 Thread Philipp Rumpf


On Fri, Dec 08, 2000 at 10:47:46AM +0100, Willy Tarreau wrote:
 |Bus  0, device   2, function  1:
 |  Unknown class: Intel OEM MegaRAID Controller (rev 5).
 |Medium devsel.  Fast back-to-back capable.  BIST capable.  IRQ 10.  Master
 Capable.  Latency=64.  
 |Prefetchable 32 bit memory at 0xf000 [0xf008].
 
 as you see, the board is found at 0xf008, but used aligned to 0xf000.

No.  It's found at 0xf000, and has 8 bytes of MMIO space.

 my server currently works with that patch, but I'm sure it won't boot anymore
 if I apply this 2.2.18pre25 alone. 

"I'm sure" meaning "I didn't test it" ?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] mutliple root devs (take II)

2000-12-01 Thread Philipp Rumpf

On Fri, Dec 01, 2000 at 08:04:52PM -0500, Jeff Dike wrote:
> boot memory allocator (see mm/bootmem.c).  In the arch that I'm most familiar 
> with (arch/um), that is usable from the beginning of start_kernel.  I don't 
> know about the other arches.

setup_arch does the necessary initialization on most architectures - you
might want to consider fixing UML to be consistent with them.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] mutliple root devs (take II)

2000-12-01 Thread Philipp Rumpf


On Fri, Dec 01, 2000 at 08:04:52PM -0500, Jeff Dike wrote:
 boot memory allocator (see mm/bootmem.c).  In the arch that I'm most familiar 
 with (arch/um), that is usable from the beginning of start_kernel.  I don't 
 know about the other arches.

setup_arch does the necessary initialization on most architectures - you
might want to consider fixing UML to be consistent with them.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: How to transfer memory from PCI memory directly to user space safely and portable?

2000-11-27 Thread Philipp Rumpf

On Mon, Nov 27, 2000 at 10:36:34AM -0800, H. Peter Anvin wrote:
> Followup to:  <[EMAIL PROTECTED]>
> By author:Philipp Rumpf <[EMAIL PROTECTED]>
> In newsgroup: linux.dev.kernel
> > 
> > I hope count isn't provided by userspace here ?
> > 
> > > 1. What happens if the user space memory is swapped to disk? Will 
> > > verify_area() make sure that the memory is in physical RAM when it returns, 
> > > or will it return -EFAULT, or will something even worse happen?
> > 
> > On i386, you'll sleep implicitly waiting for the page fault to be handled;  in
> > the generic case, anything could happen.
> > 
> 
> That doesn't sound right.  I would expect it to wait for the page to
> be brought in on any and all architectures, otherwise it seems rather
> impossible to write portable Linux kernel code.

The code in question was
memcpy_fromio(user_space_dst, iobase, count);

Assuming user_space_dst is a userspace pointer, what I said is true;  on some
architectures we will be dereferencing random pointers in kernel space, and
we won't get -EFAULT right on any architecture.

Or did I miss something ?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: How to transfer memory from PCI memory directly to user space safely and portable?

2000-11-27 Thread Philipp Rumpf


On Mon, Nov 27, 2000 at 10:36:34AM -0800, H. Peter Anvin wrote:
 Followup to:  [EMAIL PROTECTED]
 By author:Philipp Rumpf [EMAIL PROTECTED]
 In newsgroup: linux.dev.kernel
  
  I hope count isn't provided by userspace here ?
  
   1. What happens if the user space memory is swapped to disk? Will 
   verify_area() make sure that the memory is in physical RAM when it returns, 
   or will it return -EFAULT, or will something even worse happen?
  
  On i386, you'll sleep implicitly waiting for the page fault to be handled;  in
  the generic case, anything could happen.
  
 
 That doesn't sound right.  I would expect it to wait for the page to
 be brought in on any and all architectures, otherwise it seems rather
 impossible to write portable Linux kernel code.

The code in question was
memcpy_fromio(user_space_dst, iobase, count);

Assuming user_space_dst is a userspace pointer, what I said is true;  on some
architectures we will be dereferencing random pointers in kernel space, and
we won't get -EFAULT right on any architecture.

Or did I miss something ?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: How to transfer memory from PCI memory directly to user space safely and portable?

2000-11-26 Thread Philipp Rumpf

On Sun, Nov 26, 2000 at 02:21:31PM +0100, Anders Torger wrote:
>   memcpy_toio(iobase, user_space_src, count);

I hope count isn't provided by userspace here ?

> 1. What happens if the user space memory is swapped to disk? Will 
> verify_area() make sure that the memory is in physical RAM when it returns, 
> or will it return -EFAULT, or will something even worse happen?

On i386, you'll sleep implicitly waiting for the page fault to be handled;  in
the generic case, anything could happen.

> 2. Is this code really portable? I currently have an I386 architecture, and I 
> could use copy_to/from_user on that instead, but that is not portable. Now, 
> by using memcpy_to/fromio instead, is this code fully portable?

No.  It would be portable if you were using memcpy_fromuser_toio and it
existed.

> 3. Will the current process always be the correct one? The copy functions is 
> directly initiated by the user, and not through an interrupt, so I think the 
> user space mapping will always be to the correct process. Is that correct?

current should be fine if you're not in a bh/interrupt/kernel thread.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] removal of "static foo = 0"

2000-11-26 Thread Philipp Rumpf

On Sun, Nov 26, 2000 at 10:37:07AM +, Tigran Aivazian wrote:
> On Sat, 25 Nov 2000, Tim Waugh wrote:
> > Why doesn't the compiler just leave out explicit zeros from the
> > 'initial data' segment then?  Seems like it ought to be tought to..
> 
> yes, taught to, _BUT_ never let this to be a default option, please.
> Because there are valid cases where a programmer things "this is in .data"

That's what __attribute__ ((section (".data"))) is for.

> and that means this should be in .data. Think of binary patching an object
> as one valid example (there may be others, I forgot).

can you think of any valid examples that apply to the kernel ?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] removal of "static foo = 0"

2000-11-26 Thread Philipp Rumpf

On Sun, Nov 26, 2000 at 04:25:05AM +, Alan Cox wrote:
>   static int a=0;
> 
> says 'I thought about this. I want it to start at zero. I've written it this
> way to remind of the fact'
> 
> Sure it generates the same code

I agree it would be best if gcc would generate the same code;  unfortunately
this doesn't seem to be the case, which sounds like something to take up with
the gcc developers.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] removal of static foo = 0

2000-11-26 Thread Philipp Rumpf


On Sun, Nov 26, 2000 at 04:25:05AM +, Alan Cox wrote:
   static int a=0;
 
 says 'I thought about this. I want it to start at zero. I've written it this
 way to remind of the fact'
 
 Sure it generates the same code

I agree it would be best if gcc would generate the same code;  unfortunately
this doesn't seem to be the case, which sounds like something to take up with
the gcc developers.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: How to transfer memory from PCI memory directly to user space safely and portable?

2000-11-26 Thread Philipp Rumpf


On Sun, Nov 26, 2000 at 02:21:31PM +0100, Anders Torger wrote:
   memcpy_toio(iobase, user_space_src, count);

I hope count isn't provided by userspace here ?

 1. What happens if the user space memory is swapped to disk? Will 
 verify_area() make sure that the memory is in physical RAM when it returns, 
 or will it return -EFAULT, or will something even worse happen?

On i386, you'll sleep implicitly waiting for the page fault to be handled;  in
the generic case, anything could happen.

 2. Is this code really portable? I currently have an I386 architecture, and I 
 could use copy_to/from_user on that instead, but that is not portable. Now, 
 by using memcpy_to/fromio instead, is this code fully portable?

No.  It would be portable if you were using memcpy_fromuser_toio and it
existed.

 3. Will the current process always be the correct one? The copy functions is 
 directly initiated by the user, and not through an interrupt, so I think the 
 user space mapping will always be to the correct process. Is that correct?

current should be fine if you're not in a bh/interrupt/kernel thread.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: *_trylock return on success?

2000-11-25 Thread Philipp Rumpf

On Sat, Nov 25, 2000 at 08:03:49PM +0100, Roger Larsson wrote:
> > _trylock functions return 0 for success.
> 
> Not   spin_trylock

Argh, I missed the (recent ?) change to make x86 spinlocks use 1 to mean
unlocked.  You're correct, and obviously this should be fixed.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: *_trylock return on success?

2000-11-25 Thread Philipp Rumpf

On Sat, Nov 25, 2000 at 03:49:25PM -0200, Rik van Riel wrote:
> On Sat, 25 Nov 2000, Roger Larsson wrote:
> 
> > Questions:
> >   What are _trylocks supposed to return?
> 
> It depends on the type of _trylock  ;(
> 
> >   Does spin_trylock and down_trylock behave differently?
> >   Why isn't the expected return value documented?
> 
> The whole trylock stuff is, IMHO, a big mess. When you
> change from one type of trylock to another, you may be
> forced to invert the logic of your code since the return
> code from the different locks is different.
> 
> For bitflags, for example, the trylock returns the state
> the bit had before the lock (ie. 1 if the thing was already
> locked).

I assume you're talking about test_and_{set,clear}_bit here.  Their return
value isn't consistent with the other _trylock functions since they're not
_trylock functions.

I think the real problem is that people use test_and_set_bit for locks,
which is almost never[1] a good idea.  The overhead for a semaphore shouldn't
be too much in most cases, and that way it is obvious what you want to do -
and, hopefully, even more obvious if you end up with a semaphore that can
be turned into a spinlock without further changes.

> For spinlocks, it'll probably return something else ;/

_trylock functions return 0 for success.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: *_trylock return on success?

2000-11-25 Thread Philipp Rumpf


On Sat, Nov 25, 2000 at 03:49:25PM -0200, Rik van Riel wrote:
 On Sat, 25 Nov 2000, Roger Larsson wrote:
 
  Questions:
What are _trylocks supposed to return?
 
 It depends on the type of _trylock  ;(
 
Does spin_trylock and down_trylock behave differently?
Why isn't the expected return value documented?
 
 The whole trylock stuff is, IMHO, a big mess. When you
 change from one type of trylock to another, you may be
 forced to invert the logic of your code since the return
 code from the different locks is different.
 
 For bitflags, for example, the trylock returns the state
 the bit had before the lock (ie. 1 if the thing was already
 locked).

I assume you're talking about test_and_{set,clear}_bit here.  Their return
value isn't consistent with the other _trylock functions since they're not
_trylock functions.

I think the real problem is that people use test_and_set_bit for locks,
which is almost never[1] a good idea.  The overhead for a semaphore shouldn't
be too much in most cases, and that way it is obvious what you want to do -
and, hopefully, even more obvious if you end up with a semaphore that can
be turned into a spinlock without further changes.

 For spinlocks, it'll probably return something else ;/

_trylock functions return 0 for success.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: *_trylock return on success?

2000-11-25 Thread Philipp Rumpf


On Sat, Nov 25, 2000 at 08:03:49PM +0100, Roger Larsson wrote:
  _trylock functions return 0 for success.
 
 Not   spin_trylock

Argh, I missed the (recent ?) change to make x86 spinlocks use 1 to mean
unlocked.  You're correct, and obviously this should be fixed.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: jiffies wrap question...

2000-11-09 Thread Philipp Rumpf


On Fri, Nov 10, 2000 at 02:21:28AM -0500, Jeff Garzik wrote:
> The following is a piece of code from the latter half of
> schedule_timeout, in kernel/sched.c.  Is it possible that
> schedule_timeout could return an incorrect value, if the jiffy value
> wraps between the first and last lines shown below.

let's say the first line happens n ticks before the wrap and the second,
m ticks afterwards.

> expire = timeout + jiffies;

expire = timeout + 2^32/2^64 - n = timeout - n

> timeout = expire - jiffies;

timeout = expire - m = timeout - n - m = timeout - (n+m)

(n+m) is the time that actually passed while we were asleep, and timeout
has the correct value (timeout - ticks that happened)>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: jiffies wrap question...

2000-11-09 Thread Philipp Rumpf


On Fri, Nov 10, 2000 at 02:21:28AM -0500, Jeff Garzik wrote:
 The following is a piece of code from the latter half of
 schedule_timeout, in kernel/sched.c.  Is it possible that
 schedule_timeout could return an incorrect value, if the jiffy value
 wraps between the first and last lines shown below.

let's say the first line happens n ticks before the wrap and the second,
m ticks afterwards.

 expire = timeout + jiffies;

expire = timeout + 2^32/2^64 - n = timeout - n

 timeout = expire - jiffies;

timeout = expire - m = timeout - n - m = timeout - (n+m)

(n+m) is the time that actually passed while we were asleep, and timeout
has the correct value (timeout - ticks that happened)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: i82808 hardware hub RNG

2000-11-05 Thread Philipp Rumpf

On Sun, Nov 05, 2000 at 06:19:21PM +0100, Heusden, Folkert van wrote:
> I wrote a daemon that fetches (as root-user) random numbers from the RNG in
> the i82808 (found on 815-chipsets).
> You can download it from http://www.vanheusden.com/Linux/random.php3 .
> Currently, I'm trying to rewrite things into a kernel-module so that one has
> a standard character device which can deliver random values then.
> Please give it a try as I do not own a PC with such a motherboard ;-/

So how is this different from drivers/char/i810_rng.c ?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: i82808 hardware hub RNG

2000-11-05 Thread Philipp Rumpf


On Sun, Nov 05, 2000 at 06:19:21PM +0100, Heusden, Folkert van wrote:
 I wrote a daemon that fetches (as root-user) random numbers from the RNG in
 the i82808 (found on 815-chipsets).
 You can download it from http://www.vanheusden.com/Linux/random.php3 .
 Currently, I'm trying to rewrite things into a kernel-module so that one has
 a standard character device which can deliver random values then.
 Please give it a try as I do not own a PC with such a motherboard ;-/

So how is this different from drivers/char/i810_rng.c ?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Linux 2.4 Status / TODO page (Updated as of 2.4.0-test10)

2000-11-03 Thread Philipp Rumpf


> 
> 3. Security
> 
>  * Fix module remove race bug (still to be done: TTY, ldisc, I2C,
>video_device - Al Viro) (Rogier Wolff will handle ATM)

TBD: sysctl, kernel_thread

* drivers/input/mousedev.c dereferences userspace pointers directly.  We
should make this fail for a bit to catch those bugs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Linux 2.4 Status / TODO page (Updated as of 2.4.0-test10)

2000-11-03 Thread Philipp Rumpf


 
 3. Security
 
  * Fix module remove race bug (still to be done: TTY, ldisc, I2C,
video_device - Al Viro) (Rogier Wolff will handle ATM)

TBD: sysctl, kernel_thread

* drivers/input/mousedev.c dereferences userspace pointers directly.  We
should make this fail for a bit to catch those bugs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [PROPOSED PATCH] ATM refcount + firestream

2000-10-30 Thread Philipp Rumpf


On Sat, Oct 28, 2000 at 11:50:22AM -0400, Brian Gerst wrote:
> Philipp Rumpf wrote:
> > 
> > On Sat, Oct 28, 2000 at 09:55:21AM -0400, Brian Gerst wrote:
> > > Yes, but they can be called (and sleep) with module refcount == 0.  This
> > > is because the file descripter used to perform the ioctl isn't directly
> > > associated with the network device, thereby not incrementing the
> > > refcount on open.
> > 
> > According to my proposal, it is perfectly safe to call a function in a module
> > while the module's use count is 0.  This function would typically look like this:
> > 
> > foo()
> > {
> > MOD_INC_USE_COUNT;
> > 
> > copy_*_user() (or anything else that sleeps);
> > 
> > MOD_DEC_USE_COUNT;
> > 
> > return bar;
> > }
> > 
> > The only difference to the "old" module scheme is that the above currently isn't
> > safe on SMP systems.
> 
> This will only work while the kernel is not preemptable.  Once the
> kernel thread can be rescheduled, all bets are off.

The implementation will need adjusting for preemptable kernel threads.  The
concept still works fine.   

> With or without your patch, the network ioctls are unsafe, since they
> don't currently do refcounting at all.

I was under the impression most network ioctls didn't sleep.

> Adding it in the layer above thTe
> driver is the easier and cleaner solution.

I disagree.  I dislike special-casing inter-module calls (and that's not even
taking into account that the current implementation of an inter-module call is
quite ugly).
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [PROPOSED PATCH] ATM refcount + firestream

2000-10-30 Thread Philipp Rumpf


On Sat, Oct 28, 2000 at 11:50:22AM -0400, Brian Gerst wrote:
 Philipp Rumpf wrote:
  
  On Sat, Oct 28, 2000 at 09:55:21AM -0400, Brian Gerst wrote:
   Yes, but they can be called (and sleep) with module refcount == 0.  This
   is because the file descripter used to perform the ioctl isn't directly
   associated with the network device, thereby not incrementing the
   refcount on open.
  
  According to my proposal, it is perfectly safe to call a function in a module
  while the module's use count is 0.  This function would typically look like this:
  
  foo()
  {
  MOD_INC_USE_COUNT;
  
  copy_*_user() (or anything else that sleeps);
  
  MOD_DEC_USE_COUNT;
  
  return bar;
  }
  
  The only difference to the "old" module scheme is that the above currently isn't
  safe on SMP systems.
 
 This will only work while the kernel is not preemptable.  Once the
 kernel thread can be rescheduled, all bets are off.

The implementation will need adjusting for preemptable kernel threads.  The
concept still works fine.   

 With or without your patch, the network ioctls are unsafe, since they
 don't currently do refcounting at all.

I was under the impression most network ioctls didn't sleep.

 Adding it in the layer above thTe
 driver is the easier and cleaner solution.

I disagree.  I dislike special-casing inter-module calls (and that's not even
taking into account that the current implementation of an inter-module call is
quite ugly).
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [PROPOSED PATCH] ATM refcount + firestream

2000-10-28 Thread Philipp Rumpf

On Sat, Oct 28, 2000 at 09:55:21AM -0400, Brian Gerst wrote:
> Yes, but they can be called (and sleep) with module refcount == 0.  This
> is because the file descripter used to perform the ioctl isn't directly
> associated with the network device, thereby not incrementing the
> refcount on open.

According to my proposal, it is perfectly safe to call a function in a module
while the module's use count is 0.  This function would typically look like this:

foo()
{
MOD_INC_USE_COUNT;

copy_*_user() (or anything else that sleeps);

MOD_DEC_USE_COUNT;

return bar;
}

The only difference to the "old" module scheme is that the above currently isn't
safe on SMP systems.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [PROPOSED PATCH] ATM refcount + firestream

2000-10-28 Thread Philipp Rumpf


On Sat, Oct 28, 2000 at 09:37:28AM -0400, Brian Gerst wrote:
> Philipp Rumpf wrote:
> >  - you can't copy_(from|to)_user in the module exit function (which would
> > be copies from/to rmmod anyway)
> 
> Unfortunately, you need to be able to use copy_*_user() from the network
> ioctls, and this is the center of the current issue.

You misunderstood.  The network ioctls aren't module_exit functions, so
copy_*_user in them is fine.

Philipp Rumpf
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [PROPOSED PATCH] ATM refcount + firestream

2000-10-28 Thread Philipp Rumpf


On Fri, Oct 27, 2000 at 10:49:53PM +1100, Andrew Morton wrote:
> Look, this modules stuff is really bad.  Phillip Rumpf proposed
> a radical alternative a while back which I felt was not given

While it might be a "radical alternative", it doesn't require any changes
to the subsystems that have been fixed so far.  At this time, applying the
patch would basically fix the rest of the subsystems as well (if the
drivers use MOD_{INC,DEC}_USE_COUNT, that is).

> sufficient consideration.  The idea was to make sys_delete_module()
> grab all the other CPUs and leave them spinning on a flag while
> the unload was proceeding.  This was very similar to
> arch/i386/kernel/apm.c:apm_power_off().

The idea here is other CPUs don't have to deal with the kernel going
through a number of inconsistent states while a module is unloaded.  At
any point in time, for any module, exactly one of the following is true:

1. you're in the module_exit function
2. the module is (being) loaded
3. the module isn't loaded.

> As far as I can recall, the only restriction was that you are
> not allowed to call module functions when the module refcount
> is zero if those functions can call schedule().

There are other restrictions which shouldn't really matter:

 - you can't schedule() and hope you end up on a particular CPU (you can
use smp_call_function though)

 - you can't copy_(from|to)_user in the module exit function (which would
be copies from/to rmmod anyway)

> prumpf, please dig out that patch.

attached (rediff against test10-pre6, it seems to work).


diff -urN linux/arch/i386/mm/extable.c linux-prumpf/arch/i386/mm/extable.c
--- linux/arch/i386/mm/extable.cWed Nov 10 08:31:37 1999
+++ linux-prumpf/arch/i386/mm/extable.c Sat Oct 28 06:14:43 2000
@@ -30,26 +30,30 @@
 return 0;
 }
 
+extern struct rw_semaphore module_mutex;
+
 unsigned long
 search_exception_table(unsigned long addr)
 {
-   unsigned long ret;
+   unsigned long ret = 0;
 
 #ifndef CONFIG_MODULES
/* There is only the kernel to search.  */
ret = search_one_table(__start___ex_table, __stop___ex_table-1, addr);
-   if (ret) return ret;
 #else
/* The kernel is the last "module" -- no need to treat it special.  */
struct module *mp;
+
+   down_read(_mutex);
for (mp = module_list; mp != NULL; mp = mp->next) {
if (mp->ex_table_start == NULL)
continue;
ret = search_one_table(mp->ex_table_start,
   mp->ex_table_end - 1, addr);
-   if (ret) return ret;
+   if (ret) break;
}
+   up_read(_mutex);
 #endif
 
-   return 0;
+   return ret;
 }
diff -urN linux/include/linux/smp.h linux-prumpf/include/linux/smp.h
--- linux/include/linux/smp.h   Tue Sep 26 14:31:51 2000
+++ linux-prumpf/include/linux/smp.hSat Oct 28 06:14:43 2000
@@ -71,6 +71,9 @@
 #define MSG_RESCHEDULE 0x0003  /* Reschedule request from master CPU*/
 #define MSG_CALL_FUNCTION   0x0004  /* Call function on all other CPUs */
 
+extern int freeze_other_cpus(void);
+extern void melt_other_cpus(void);
+
 #else
 
 /*
@@ -86,6 +89,8 @@
 #define cpu_number_map(cpu)0
 #define smp_call_function(func,info,retry,wait)({ 0; })
 #define cpu_online_map 1
+#define freeze_other_cpus()0
+#define melt_other_cpus()  do { } while(0)
 
 #endif
 #endif
diff -urN linux/kernel/module.c linux-prumpf/kernel/module.c
--- linux/kernel/module.c   Sat Oct 28 05:03:26 2000
+++ linux-prumpf/kernel/module.cSat Oct 28 06:14:43 2000
@@ -37,6 +37,11 @@
ex_table_end:   __stop___ex_table
 };
 
+/* module_mutex protects module_list - if you can down_read() it, modules
+ * won't be added/removed, but their usecount, flags aso still might change.
+ */
+
+DECLARE_RWSEM(module_mutex);
 struct module *module_list = _module;
 
 static long get_mod_name(const char *user_name, char **buf);
@@ -105,7 +110,6 @@
 
if (!capable(CAP_SYS_MODULE))
return -EPERM;
-   lock_kernel();
if ((namelen = get_mod_name(name_user, )) < 0) {
error = namelen;
goto err0;
@@ -114,32 +118,40 @@
error = -EINVAL;
goto err1;
}
+   lock_kernel();
+   down_write(_mutex);
if (find_module(name) != NULL) {
error = -EEXIST;
-   goto err1;
+   goto err2;
}
if ((mod = (struct module *)module_map(size)) == NULL) {
error = -ENOMEM;
-   goto err1;
+   goto err2;
}
 
memset(mod, 0, sizeof(*mod));
mod->size_of_struct = sizeof(*mod);
-   mod->next = module_list;
mod->name = (char *)(mod + 1);
mod->size = size;
memcpy((char*)(mod+1), name, namelen+1);
 
put_mod_name(name);
 
+

Re: [PROPOSED PATCH] ATM refcount + firestream

2000-10-28 Thread Philipp Rumpf


On Fri, Oct 27, 2000 at 10:49:53PM +1100, Andrew Morton wrote:
 Look, this modules stuff is really bad.  Phillip Rumpf proposed
 a radical alternative a while back which I felt was not given

While it might be a "radical alternative", it doesn't require any changes
to the subsystems that have been fixed so far.  At this time, applying the
patch would basically fix the rest of the subsystems as well (if the
drivers use MOD_{INC,DEC}_USE_COUNT, that is).

 sufficient consideration.  The idea was to make sys_delete_module()
 grab all the other CPUs and leave them spinning on a flag while
 the unload was proceeding.  This was very similar to
 arch/i386/kernel/apm.c:apm_power_off().

The idea here is other CPUs don't have to deal with the kernel going
through a number of inconsistent states while a module is unloaded.  At
any point in time, for any module, exactly one of the following is true:

1. you're in the module_exit function
2. the module is (being) loaded
3. the module isn't loaded.

 As far as I can recall, the only restriction was that you are
 not allowed to call module functions when the module refcount
 is zero if those functions can call schedule().

There are other restrictions which shouldn't really matter:

 - you can't schedule() and hope you end up on a particular CPU (you can
use smp_call_function though)

 - you can't copy_(from|to)_user in the module exit function (which would
be copies from/to rmmod anyway)

 prumpf, please dig out that patch.

attached (rediff against test10-pre6, it seems to work).


diff -urN linux/arch/i386/mm/extable.c linux-prumpf/arch/i386/mm/extable.c
--- linux/arch/i386/mm/extable.cWed Nov 10 08:31:37 1999
+++ linux-prumpf/arch/i386/mm/extable.c Sat Oct 28 06:14:43 2000
@@ -30,26 +30,30 @@
 return 0;
 }
 
+extern struct rw_semaphore module_mutex;
+
 unsigned long
 search_exception_table(unsigned long addr)
 {
-   unsigned long ret;
+   unsigned long ret = 0;
 
 #ifndef CONFIG_MODULES
/* There is only the kernel to search.  */
ret = search_one_table(__start___ex_table, __stop___ex_table-1, addr);
-   if (ret) return ret;
 #else
/* The kernel is the last "module" -- no need to treat it special.  */
struct module *mp;
+
+   down_read(module_mutex);
for (mp = module_list; mp != NULL; mp = mp-next) {
if (mp-ex_table_start == NULL)
continue;
ret = search_one_table(mp-ex_table_start,
   mp-ex_table_end - 1, addr);
-   if (ret) return ret;
+   if (ret) break;
}
+   up_read(module_mutex);
 #endif
 
-   return 0;
+   return ret;
 }
diff -urN linux/include/linux/smp.h linux-prumpf/include/linux/smp.h
--- linux/include/linux/smp.h   Tue Sep 26 14:31:51 2000
+++ linux-prumpf/include/linux/smp.hSat Oct 28 06:14:43 2000
@@ -71,6 +71,9 @@
 #define MSG_RESCHEDULE 0x0003  /* Reschedule request from master CPU*/
 #define MSG_CALL_FUNCTION   0x0004  /* Call function on all other CPUs */
 
+extern int freeze_other_cpus(void);
+extern void melt_other_cpus(void);
+
 #else
 
 /*
@@ -86,6 +89,8 @@
 #define cpu_number_map(cpu)0
 #define smp_call_function(func,info,retry,wait)({ 0; })
 #define cpu_online_map 1
+#define freeze_other_cpus()0
+#define melt_other_cpus()  do { } while(0)
 
 #endif
 #endif
diff -urN linux/kernel/module.c linux-prumpf/kernel/module.c
--- linux/kernel/module.c   Sat Oct 28 05:03:26 2000
+++ linux-prumpf/kernel/module.cSat Oct 28 06:14:43 2000
@@ -37,6 +37,11 @@
ex_table_end:   __stop___ex_table
 };
 
+/* module_mutex protects module_list - if you can down_read() it, modules
+ * won't be added/removed, but their usecount, flags aso still might change.
+ */
+
+DECLARE_RWSEM(module_mutex);
 struct module *module_list = kernel_module;
 
 static long get_mod_name(const char *user_name, char **buf);
@@ -105,7 +110,6 @@
 
if (!capable(CAP_SYS_MODULE))
return -EPERM;
-   lock_kernel();
if ((namelen = get_mod_name(name_user, name))  0) {
error = namelen;
goto err0;
@@ -114,32 +118,40 @@
error = -EINVAL;
goto err1;
}
+   lock_kernel();
+   down_write(module_mutex);
if (find_module(name) != NULL) {
error = -EEXIST;
-   goto err1;
+   goto err2;
}
if ((mod = (struct module *)module_map(size)) == NULL) {
error = -ENOMEM;
-   goto err1;
+   goto err2;
}
 
memset(mod, 0, sizeof(*mod));
mod-size_of_struct = sizeof(*mod);
-   mod-next = module_list;
mod-name = (char *)(mod + 1);
mod-size = size;
memcpy((char*)(mod+1), name, namelen+1);
 
put_mod_name(name);
 
+

Re: [PROPOSED PATCH] ATM refcount + firestream

2000-10-28 Thread Philipp Rumpf


On Sat, Oct 28, 2000 at 09:37:28AM -0400, Brian Gerst wrote:
 Philipp Rumpf wrote:
   - you can't copy_(from|to)_user in the module exit function (which would
  be copies from/to rmmod anyway)
 
 Unfortunately, you need to be able to use copy_*_user() from the network
 ioctls, and this is the center of the current issue.

You misunderstood.  The network ioctls aren't module_exit functions, so
copy_*_user in them is fine.

Philipp Rumpf
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [PROPOSED PATCH] ATM refcount + firestream

2000-10-28 Thread Philipp Rumpf


On Sat, Oct 28, 2000 at 09:55:21AM -0400, Brian Gerst wrote:
 Yes, but they can be called (and sleep) with module refcount == 0.  This
 is because the file descripter used to perform the ioctl isn't directly
 associated with the network device, thereby not incrementing the
 refcount on open.

According to my proposal, it is perfectly safe to call a function in a module
while the module's use count is 0.  This function would typically look like this:

foo()
{
MOD_INC_USE_COUNT;

copy_*_user() (or anything else that sleeps);

MOD_DEC_USE_COUNT;

return bar;
}

The only difference to the "old" module scheme is that the above currently isn't
safe on SMP systems.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] 0-byte read()/write() behaviour

2000-10-20 Thread Philipp Rumpf

On Fri, Oct 20, 2000 at 10:47:45AM -0700, Linus Torvalds wrote:
> On Fri, 20 Oct 2000, Philipp Rumpf wrote:
> >
> > Single Unix specifies that 0-byte reads, as well as 0-byte writes, should
> > "return 0 and have no other results".  Our current implementation violates
> > the first requirement and makes it very easy to violate the second one.
> 
> Note that there _are_ cases where 0-byte reads and writes have specific
> meaning, notably there are some networking things where a 0-byte sendto()
> does something special if I remember correctly. And I seem to remember
> that this also _did_ translate into write().

sock_write and sock_read contain:
if(size==0) /* Match SYS5 behaviour */
return 0;

I'm not sure which other read()/write() functions are used by
"networking things", but I don't see any others sendto would map to.

So I suspect if there are any applications which would need the behaviour
you described they've already been broken, and should be recompiled to use
sys_send(|to|msg) (via sys_socketcall on x86, of course).

> I remember that Linux used to do exactly this, and we had to pass the
> 0-byte writes into the low-level cases exactly because some low-level
> cases do care.

I would suspect most of those have been eliminated by now.

> I suspect SUS only talks about regular files.

As I'm reading it, they're talking about every read() call, even those with
an invalid fd.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

[PATCH] 0-byte read()/write() behaviour

2000-10-20 Thread Philipp Rumpf


Single Unix specifies that 0-byte reads, as well as 0-byte writes, should
"return 0 and have no other results".  Our current implementation violates
the first requirement and makes it very easy to violate the second one.

 read(page_cache_fd, invalid_ptr, 0) returns -EFAULT;  IMHO, this is a
clear bug - suppose you have managed to acquire the page covering virtual
addresses up to and including PAGE_OFFSET-1 (on x86).
read(fd, PAGE_OFFSET-n, n) now is successful for PAGE_SIZE >= n > 0, but
fails with -EFAULT for n = 0.

 looking through drivers/char/*.c, many drivers seem to behave unexpectedly
for a 0-byte read;  this includes code that overwrites the byte read()'s
second argument points to, endless loops, and several different error
returns.

 generic_file_read and generice_file_write, which I suppose are the most
critical functions for read()/write() performance, both have special cases
to handle the count == 0 case.


So I think modifying sys_read and sys_write to take care of the count == 0
case and only calling the file-specific function when it's actually
necessary makes most sense;  as generic_file_(read|write) get simplified
accordingly, there should be no performance impact.

Linus ?

patch against test10-pre4

diff -ur linux/fs/read_write.c linux-prumpf/fs/read_write.c
--- linux/fs/read_write.c   Fri Oct 20 04:52:58 2000
+++ linux-prumpf/fs/read_write.cFri Oct 20 06:03:23 2000
@@ -120,6 +120,10 @@
ssize_t ret;
struct file * file;
 
+   /* Required for SU compatibility.  It also simplifies generic_file_read */
+   if (!count)
+   return 0;
+
ret = -EBADF;
file = fget(fd);
if (file) {
@@ -145,6 +149,10 @@
 {
ssize_t ret;
struct file * file;
+
+   /* Required for SU compatibility. */
+   if (!count)
+   return 0;
 
ret = -EBADF;
file = fget(fd);
diff -ur linux/mm/filemap.c linux-prumpf/mm/filemap.c
--- linux/mm/filemap.c  Fri Oct 20 05:05:04 2000
+++ linux-prumpf/mm/filemap.c   Fri Oct 20 06:08:41 2000
@@ -1124,27 +1124,18 @@
  */
 ssize_t generic_file_read(struct file * filp, char * buf, size_t count, loff_t *ppos)
 {
-   ssize_t retval;
+   read_descriptor_t desc;
 
-   retval = -EFAULT;
-   if (access_ok(VERIFY_WRITE, buf, count)) {
-   retval = 0;
-
-   if (count) {
-   read_descriptor_t desc;
-
-   desc.written = 0;
-   desc.count = count;
-   desc.buf = buf;
-   desc.error = 0;
-   do_generic_file_read(filp, ppos, , file_read_actor);
-
-   retval = desc.written;
-   if (!retval)
-   retval = desc.error;
-   }
-   }
-   return retval;
+   if (!access_ok(VERIFY_WRITE, buf, count))
+   return -EFAULT;
+   
+   desc.written = 0;
+   desc.count = count;
+   desc.buf = buf;
+   desc.error = 0;
+   do_generic_file_read(filp, ppos, , file_read_actor);
+
+   return desc.written ? : desc.error;
 }
 
 static int file_send_actor(read_descriptor_t * desc, struct page *page, unsigned long 
offset , unsigned long size)
@@ -2424,13 +2415,12 @@
}
 
status  = 0;
-   if (count) {
-   remove_suid(inode);
-   inode->i_ctime = inode->i_mtime = CURRENT_TIME;
-   mark_inode_dirty(inode);
-   }
 
-   while (count) {
+   remove_suid(inode);
+   inode->i_ctime = inode->i_mtime = CURRENT_TIME;
+   mark_inode_dirty(inode);
+
+   do {
unsigned long bytes, index, offset;
char *kaddr;
 
@@ -2480,7 +2470,7 @@
 
if (status < 0)
break;
-   }
+   } while (count);
*ppos = pos;
 
if (cached_page)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

[PATCH] 0-byte read()/write() behaviour

2000-10-20 Thread Philipp Rumpf


Single Unix specifies that 0-byte reads, as well as 0-byte writes, should
"return 0 and have no other results".  Our current implementation violates
the first requirement and makes it very easy to violate the second one.

 read(page_cache_fd, invalid_ptr, 0) returns -EFAULT;  IMHO, this is a
clear bug - suppose you have managed to acquire the page covering virtual
addresses up to and including PAGE_OFFSET-1 (on x86).
read(fd, PAGE_OFFSET-n, n) now is successful for PAGE_SIZE = n  0, but
fails with -EFAULT for n = 0.

 looking through drivers/char/*.c, many drivers seem to behave unexpectedly
for a 0-byte read;  this includes code that overwrites the byte read()'s
second argument points to, endless loops, and several different error
returns.

 generic_file_read and generice_file_write, which I suppose are the most
critical functions for read()/write() performance, both have special cases
to handle the count == 0 case.


So I think modifying sys_read and sys_write to take care of the count == 0
case and only calling the file-specific function when it's actually
necessary makes most sense;  as generic_file_(read|write) get simplified
accordingly, there should be no performance impact.

Linus ?

patch against test10-pre4

diff -ur linux/fs/read_write.c linux-prumpf/fs/read_write.c
--- linux/fs/read_write.c   Fri Oct 20 04:52:58 2000
+++ linux-prumpf/fs/read_write.cFri Oct 20 06:03:23 2000
@@ -120,6 +120,10 @@
ssize_t ret;
struct file * file;
 
+   /* Required for SU compatibility.  It also simplifies generic_file_read */
+   if (!count)
+   return 0;
+
ret = -EBADF;
file = fget(fd);
if (file) {
@@ -145,6 +149,10 @@
 {
ssize_t ret;
struct file * file;
+
+   /* Required for SU compatibility. */
+   if (!count)
+   return 0;
 
ret = -EBADF;
file = fget(fd);
diff -ur linux/mm/filemap.c linux-prumpf/mm/filemap.c
--- linux/mm/filemap.c  Fri Oct 20 05:05:04 2000
+++ linux-prumpf/mm/filemap.c   Fri Oct 20 06:08:41 2000
@@ -1124,27 +1124,18 @@
  */
 ssize_t generic_file_read(struct file * filp, char * buf, size_t count, loff_t *ppos)
 {
-   ssize_t retval;
+   read_descriptor_t desc;
 
-   retval = -EFAULT;
-   if (access_ok(VERIFY_WRITE, buf, count)) {
-   retval = 0;
-
-   if (count) {
-   read_descriptor_t desc;
-
-   desc.written = 0;
-   desc.count = count;
-   desc.buf = buf;
-   desc.error = 0;
-   do_generic_file_read(filp, ppos, desc, file_read_actor);
-
-   retval = desc.written;
-   if (!retval)
-   retval = desc.error;
-   }
-   }
-   return retval;
+   if (!access_ok(VERIFY_WRITE, buf, count))
+   return -EFAULT;
+   
+   desc.written = 0;
+   desc.count = count;
+   desc.buf = buf;
+   desc.error = 0;
+   do_generic_file_read(filp, ppos, desc, file_read_actor);
+
+   return desc.written ? : desc.error;
 }
 
 static int file_send_actor(read_descriptor_t * desc, struct page *page, unsigned long 
offset , unsigned long size)
@@ -2424,13 +2415,12 @@
}
 
status  = 0;
-   if (count) {
-   remove_suid(inode);
-   inode-i_ctime = inode-i_mtime = CURRENT_TIME;
-   mark_inode_dirty(inode);
-   }
 
-   while (count) {
+   remove_suid(inode);
+   inode-i_ctime = inode-i_mtime = CURRENT_TIME;
+   mark_inode_dirty(inode);
+
+   do {
unsigned long bytes, index, offset;
char *kaddr;
 
@@ -2480,7 +2470,7 @@
 
if (status  0)
break;
-   }
+   } while (count);
*ppos = pos;
 
if (cached_page)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] 0-byte read()/write() behaviour

2000-10-20 Thread Philipp Rumpf


On Fri, Oct 20, 2000 at 10:47:45AM -0700, Linus Torvalds wrote:
 On Fri, 20 Oct 2000, Philipp Rumpf wrote:
 
  Single Unix specifies that 0-byte reads, as well as 0-byte writes, should
  "return 0 and have no other results".  Our current implementation violates
  the first requirement and makes it very easy to violate the second one.
 
 Note that there _are_ cases where 0-byte reads and writes have specific
 meaning, notably there are some networking things where a 0-byte sendto()
 does something special if I remember correctly. And I seem to remember
 that this also _did_ translate into write().

sock_write and sock_read contain:
if(size==0) /* Match SYS5 behaviour */
return 0;

I'm not sure which other read()/write() functions are used by
"networking things", but I don't see any others sendto would map to.

So I suspect if there are any applications which would need the behaviour
you described they've already been broken, and should be recompiled to use
sys_send(|to|msg) (via sys_socketcall on x86, of course).

 I remember that Linux used to do exactly this, and we had to pass the
 0-byte writes into the low-level cases exactly because some low-level
 cases do care.

I would suspect most of those have been eliminated by now.

 I suspect SUS only talks about regular files.

As I'm reading it, they're talking about every read() call, even those with
an invalid fd.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4 MM overview?

2000-10-16 Thread Philipp Rumpf


On Sun, Oct 15, 2000 at 11:35:23PM +0100, Kenn Humborg wrote:
> On Sun, Oct 15, 2000 at 09:45:11PM +0100, Alan Cox wrote:
> > > Well, we ain't got these luxuries/complications in VAXland...  Hell, 
> > > we don't even have two-level page tables :-(
> > 
> > Really. Ugh. I always assumed Vax had at least two levels because mmap on
> > 4.2 BSD used to panic on 128K+ blocks. I guess there was a different reason
> > for that then
> 
> We've kind of got 1.5-level page tables.  There are actually 3 page tables.
> The system page table maps memory starting at 0x8000.  The P0 process
> page table maps from 0x0 up and the P1 process page table maps from
> 0x7fff down.

And they have to be physically contiguous I guess ?

> This means that sparse address spaces are going to be _really_ expensive
> on PTEs.  I don't know how much of a problem this is going to be yet,
> but I'm sure it's going to be fun :-)

512 byte pages, 4 bytes per pte ?  Ouch.  Can you fill the TLB manually ?

OTOH, I think mapping all physical memory makes sense with the three page
table setup.

Philipp Rumpf
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

1 2 >

1 - 100 of 136 matches

Mail list logo