Re: [RFC] __initstr & __exitstr
On Fri, Jul 06, 2001 at 11:17:44PM -0300, Arnaldo Carvalho de Melo wrote: > Hi, > > Please comment on this approach to move strings in __init functions > from .rodata to .data.init so that it get discarded after initialization, > like the variables marked as __initdata and the functions marked as __init, > as well as move strings in __exit marked functions to .data.exit, that will > be discarded and not even get into the generated kernel image. > > Please note that if possible the best approach was for gcc to move > those strings automatically if the function was marked with modified > __init/__exit macros, but we have to keep in mind that some of the strings > in those functions can not be discarded because they keep being referenced > by say register_chrdev and others, unlike, for example, proc functions and > others that copy the string passed to some malloc'ed data structure, so we > have to be selective marking exactly the ones that can indeed be discarded. .. or fix all registration functions to use a private copy of the string, which would avoid some common oopses. > I've also implemented helper functions for printk thats the most > common case, and leaved the other common case, panic, using > __initstr/__exitstr explicitely, so that people can comment on what is > better. > #define init_printk(fmt,arg...) printk(__initstr(fmt) , ##arg) I dislike init_printk; it combines variadic functions/macros with assumptions about how the first argument is specified (i.e. as a string constant), so it's potentially very confusing. Also, printk is used for debugging, and accidentally using init_printk instead of printk would result in no messages being printed at all if the driver is compiled into the kernel (while everything would work fine if the driver is compiled as a module, where init_printk and printk are identical). I think this would be very annoying to track down for less experienced kernel hackers. > #define __initstr(s)({ static char __tmp_init_str[] __initdata=s; > __tmp_init_str;}) I think this would fail if used in structure initialisations ? Philipp Rumpf - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] __initstr __exitstr
On Fri, Jul 06, 2001 at 11:17:44PM -0300, Arnaldo Carvalho de Melo wrote: Hi, Please comment on this approach to move strings in __init functions from .rodata to .data.init so that it get discarded after initialization, like the variables marked as __initdata and the functions marked as __init, as well as move strings in __exit marked functions to .data.exit, that will be discarded and not even get into the generated kernel image. Please note that if possible the best approach was for gcc to move those strings automatically if the function was marked with modified __init/__exit macros, but we have to keep in mind that some of the strings in those functions can not be discarded because they keep being referenced by say register_chrdev and others, unlike, for example, proc functions and others that copy the string passed to some malloc'ed data structure, so we have to be selective marking exactly the ones that can indeed be discarded. .. or fix all registration functions to use a private copy of the string, which would avoid some common oopses. I've also implemented helper functions for printk thats the most common case, and leaved the other common case, panic, using __initstr/__exitstr explicitely, so that people can comment on what is better. #define init_printk(fmt,arg...) printk(__initstr(fmt) , ##arg) I dislike init_printk; it combines variadic functions/macros with assumptions about how the first argument is specified (i.e. as a string constant), so it's potentially very confusing. Also, printk is used for debugging, and accidentally using init_printk instead of printk would result in no messages being printed at all if the driver is compiled into the kernel (while everything would work fine if the driver is compiled as a module, where init_printk and printk are identical). I think this would be very annoying to track down for less experienced kernel hackers. #define __initstr(s)({ static char __tmp_init_str[] __initdata=s; __tmp_init_str;}) I think this would fail if used in structure initialisations ? Philipp Rumpf - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Prevent OOM from killing init
On Thu, Mar 22, 2001 at 01:14:41AM -0700, Eric W. Biederman wrote: > Rik van Riel <[EMAIL PROTECTED]> writes: > Is there ever a case where killing init is the right thing to do? There are cases where panic() is the right thing to do. Broken init is such a case. > My impression is that if init is selected the whole machine dies. > If you can kill init and still have a machine that mostly works, you can't. > Guaranteeing not to select init can buy you piece of mind because > init if properly setup can put the machine back together again, while > not special casing init means something weird might happen and init > would be selected. If we're in a situation where long-running processes with relatively small VM are killed the box is very unlikely to be usable anyway. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Prevent OOM from killing init
On Wed, Mar 21, 2001 at 08:48:54PM -0300, Rik van Riel wrote: > On Wed, 21 Mar 2001, Patrick O'Rourke wrote: > > > Since the system will panic if the init process is chosen by > > the OOM killer, the following patch prevents select_bad_process() > > from picking init. > > One question ... has the OOM killer ever selected init on > anybody's system ? Yes, I managed to reproduce this a while ago. (init was the only process around though). We don't ever kill init, fwiw; we panic(), which is the right thing to do if init can't keep running. > I think that the scoring algorithm should make sure that > we never pick init, unless the system is screwed so badly > that init is broken or the only process left ;) I can't think of a situation where the OOM killer does the wrong thing. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Prevent OOM from killing init
On Wed, Mar 21, 2001 at 08:48:54PM -0300, Rik van Riel wrote: On Wed, 21 Mar 2001, Patrick O'Rourke wrote: Since the system will panic if the init process is chosen by the OOM killer, the following patch prevents select_bad_process() from picking init. One question ... has the OOM killer ever selected init on anybody's system ? Yes, I managed to reproduce this a while ago. (init was the only process around though). We don't ever kill init, fwiw; we panic(), which is the right thing to do if init can't keep running. I think that the scoring algorithm should make sure that we never pick init, unless the system is screwed so badly that init is broken or the only process left ;) I can't think of a situation where the OOM killer does the wrong thing. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: static scheduling - SCHED_IDLE?
On Fri, Mar 09, 2001 at 09:09:13PM +0100, Jamie Lokier wrote: > Rik van Riel wrote: > > > Just raise the priority whenever the task's in kernel mode. Problem > > > solved. > > > > Remember that a task schedules itself out at the timer interrupt, > > in kernel/sched.c::schedule() ... which is kernel mode ;) > > Even nicer. On x86 change this: > > reschedule: > call SYMBOL_NAME(schedule)# test > jmp ret_from_sys_call > > to this: > > reschedule: > orl $PF_HONOUR_LOW_PRIORITY,flags(%ebx) > call SYMBOL_NAME(schedule)# test > andl $~PF_HONOUR_LOW_PRIORITY,flags(%ebx) > jmp ret_from_sys_call > > (You get the idea; this isn't the best implementation). A few months ago, I implemented preemptible kernel threads (locally; I tend to think the other patches are superior). Part of the changes was to separate schedule into __schedule() (common part), schedule_user() (automatic schedule from entry.S) and schedule() (manual schedule in kernel space); besides making what Jamie proposed easier, we can also save a few cycles in the (common) schedule_user case: - we never release the kernel lock - we can pass current to schedule_user - we just handled softirqs this is 2.5 material though ... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Rename all derived CONFIG variables
On Mon, Mar 12, 2001 at 06:03:22PM +1100, Keith Owens wrote: > In 2.4.2-ac18 there are 130 CONFIG options that are always derived from > other options, the user has no control over them. It is useful for the > kernel build process to know which variables are derived and which > variables the user can control. There are also 6 CONFIG options that Yes, it is. However, I don't see a reason to use a different namespace for them. At least some of the options below might become non-derived options again, and having that knowledge spread out over most of the kernel tree doesn't make any sense to me. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Rename all derived CONFIG variables
On Mon, Mar 12, 2001 at 06:03:22PM +1100, Keith Owens wrote: In 2.4.2-ac18 there are 130 CONFIG options that are always derived from other options, the user has no control over them. It is useful for the kernel build process to know which variables are derived and which variables the user can control. There are also 6 CONFIG options that Yes, it is. However, I don't see a reason to use a different namespace for them. At least some of the options below might become non-derived options again, and having that knowledge spread out over most of the kernel tree doesn't make any sense to me. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] documentation mm.h + swap.h
On Thu, Mar 08, 2001 at 06:10:16PM -0300, Rik van Riel wrote: > --- linux-2.4.2-doc/include/linux/mm.h.orig Wed Mar 7 15:36:32 2001 > +++ linux-2.4.2-doc/include/linux/mm.hThu Mar 8 09:54:22 2001 > @@ -39,32 +39,37 @@ > * library, the executable area etc). > */ > struct vm_area_struct { > - struct mm_struct * vm_mm; /* VM area parameters */ > - unsigned long vm_start; > - unsigned long vm_end; > + struct mm_struct * vm_mm; /* The address space we belong to. */ > + unsigned long vm_start; /* Our start address within vm_mm. */ > + unsigned long vm_end; /* Our end address within vm_mm. */ it might be a good idea to point out that this is the address of the byte after the last one covered by the vma, not the address of the last byte. (are there any architectures where we allow a vma at the end of memory ? Is the mm/ code handling ->vm_end = 0 correctly ?) > /* > + * Each physical page in the system has a struct page associated with > + * it to keep track of whatever it is we are using the page for at the > + * moment. Note that we have no way to track which tasks are using > + * a page. > + * Each page of "real" RAM. In particular I think MMIO pages still don't have a struct page. > * Try to keep the most commonly accessed fields in single cache lines > * here (16 bytes or greater). This ordering should be particularly > * beneficial on 32-bit processors. > * > * The first line is data used in page cache lookup, the second line > * is used for linear searches (eg. clock algorithm scans). > + * > + * TODO: make this structure smaller, it could be as small as 32 bytes. Or make it cover large pages, which might be even more of a win .. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] documentation mm.h + swap.h
On Thu, Mar 08, 2001 at 06:10:16PM -0300, Rik van Riel wrote: --- linux-2.4.2-doc/include/linux/mm.h.orig Wed Mar 7 15:36:32 2001 +++ linux-2.4.2-doc/include/linux/mm.hThu Mar 8 09:54:22 2001 @@ -39,32 +39,37 @@ * library, the executable area etc). */ struct vm_area_struct { - struct mm_struct * vm_mm; /* VM area parameters */ - unsigned long vm_start; - unsigned long vm_end; + struct mm_struct * vm_mm; /* The address space we belong to. */ + unsigned long vm_start; /* Our start address within vm_mm. */ + unsigned long vm_end; /* Our end address within vm_mm. */ it might be a good idea to point out that this is the address of the byte after the last one covered by the vma, not the address of the last byte. (are there any architectures where we allow a vma at the end of memory ? Is the mm/ code handling -vm_end = 0 correctly ?) /* + * Each physical page in the system has a struct page associated with + * it to keep track of whatever it is we are using the page for at the + * moment. Note that we have no way to track which tasks are using + * a page. + * Each page of "real" RAM. In particular I think MMIO pages still don't have a struct page. * Try to keep the most commonly accessed fields in single cache lines * here (16 bytes or greater). This ordering should be particularly * beneficial on 32-bit processors. * * The first line is data used in page cache lookup, the second line * is used for linear searches (eg. clock algorithm scans). + * + * TODO: make this structure smaller, it could be as small as 32 bytes. Or make it cover large pages, which might be even more of a win .. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ramdisk/VM fix
With the current rd.c code, we can get into a situation where there is a buffer-only page for data which is also in a page cache page with page->buffers != NULL. The current vmscan.c code never frees the page cache page in this scenario, effectively doubling ramdisk memory requirements. Linus, I think this is a bugfix (tested against -ac kernels): diff -ur linux/include/linux/swap.h linux-prumpf/include/linux/swap.h --- linux/include/linux/swap.h Thu Mar 8 10:01:30 2001 +++ linux-prumpf/include/linux/swap.h Thu Mar 8 10:14:12 2001 @@ -284,7 +284,7 @@ #endif #define page_ramdisk(page) \ - (page->buffers && (MAJOR(page->buffers->b_dev) == RAMDISK_MAJOR)) + (page->buffers && (MAJOR(page->buffers->b_dev) == RAMDISK_MAJOR) && +buffer_protected(page->buffers)) extern spinlock_t swaplock; I don't think I'm missing anything important ... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ramdisk/VM fix
With the current rd.c code, we can get into a situation where there is a buffer-only page for data which is also in a page cache page with page-buffers != NULL. The current vmscan.c code never frees the page cache page in this scenario, effectively doubling ramdisk memory requirements. Linus, I think this is a bugfix (tested against -ac kernels): diff -ur linux/include/linux/swap.h linux-prumpf/include/linux/swap.h --- linux/include/linux/swap.h Thu Mar 8 10:01:30 2001 +++ linux-prumpf/include/linux/swap.h Thu Mar 8 10:14:12 2001 @@ -284,7 +284,7 @@ #endif #define page_ramdisk(page) \ - (page-buffers (MAJOR(page-buffers-b_dev) == RAMDISK_MAJOR)) + (page-buffers (MAJOR(page-buffers-b_dev) == RAMDISK_MAJOR) +buffer_protected(page-buffers)) extern spinlock_t swaplock; I don't think I'm missing anything important ... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kmalloc() alignment
On Sun, Mar 04, 2001 at 10:34:31PM +, Alan Cox wrote: > > Does kmalloc() make any guarantees of the alignment of allocated > > blocks? Will the returned block always be 4-, 8- or 16-byte > > aligned, for example? > > There are people who assume 16byte alignment guarantees. I dont think anyone > has formally specified the guarantee beyond 4 bytes tho Userspace malloc is "suitably aligned for any kind of variable", so I think expecting 8 bytes alignment (long long on 32-bit platforms) should be okay. >From reading the code it seems as though we actually use L1_CACHE_BYTES, and I think it might be a good idea to document the current behaviour (as long as there's no good reason to change it ?) diff -ur linux/mm/slab.c linux-prumpf/mm/slab.c --- linux/mm/slab.c Tue Mar 6 00:54:38 2001 +++ linux-prumpf/mm/slab.c Tue Mar 6 01:00:47 2001 @@ -1525,9 +1525,10 @@ * @flags: the type of memory to allocate. * * kmalloc is the normal method of allocating memory - * in the kernel. Note that the @size parameter must be less than or - * equals to %KMALLOC_MAXSIZE and the caller must ensure this. The @flags - * argument may be one of: + * in the kernel. It returns a pointer (aligned to a hardware cache line + * boundary) to the allocated memory, or %NULL in case of failure. Note that + * the @size parameter must be less than or equal to %KMALLOC_MAXSIZE and + * the caller must ensure this. The @flags argument may be one of: * * %GFP_USER - Allocate memory on behalf of user. May sleep. * - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kmalloc() alignment
On Sun, Mar 04, 2001 at 10:34:31PM +, Alan Cox wrote: Does kmalloc() make any guarantees of the alignment of allocated blocks? Will the returned block always be 4-, 8- or 16-byte aligned, for example? There are people who assume 16byte alignment guarantees. I dont think anyone has formally specified the guarantee beyond 4 bytes tho Userspace malloc is "suitably aligned for any kind of variable", so I think expecting 8 bytes alignment (long long on 32-bit platforms) should be okay. From reading the code it seems as though we actually use L1_CACHE_BYTES, and I think it might be a good idea to document the current behaviour (as long as there's no good reason to change it ?) diff -ur linux/mm/slab.c linux-prumpf/mm/slab.c --- linux/mm/slab.c Tue Mar 6 00:54:38 2001 +++ linux-prumpf/mm/slab.c Tue Mar 6 01:00:47 2001 @@ -1525,9 +1525,10 @@ * @flags: the type of memory to allocate. * * kmalloc is the normal method of allocating memory - * in the kernel. Note that the @size parameter must be less than or - * equals to %KMALLOC_MAXSIZE and the caller must ensure this. The @flags - * argument may be one of: + * in the kernel. It returns a pointer (aligned to a hardware cache line + * boundary) to the allocated memory, or %NULL in case of failure. Note that + * the @size parameter must be less than or equal to %KMALLOC_MAXSIZE and + * the caller must ensure this. The @flags argument may be one of: * * %GFP_USER - Allocate memory on behalf of user. May sleep. * - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] /drivers/char/serial.c cleanup
On Mon, Mar 05, 2001 at 03:37:04PM +0300, Andrey Panin wrote: > Attached patch (2.4.2-ac11) makes some changes in serial driver: > adds ioremap() return code checks, removes panic() calls > and adds better error handling in start_pci_pnp_board() function. Did you test it ? > diff -u /linux.vanilla/drivers/char/serial.c /linux/drivers/char/serial.c > --- /linux.vanilla/drivers/char/serial.c Thu Mar 1 20:15:43 2001 > +++ /linux/drivers/char/serial.c Fri Mar 2 00:10:29 2001 > @@ -3876,7 +3876,10 @@ > return 0; > } > req->io_type = SERIAL_IO_MEM; > - req->iomem_base = ioremap(port, board->uart_offset); > + if ((req->iomem_base = ioremap(port, board->uart_offset))) { > + printk(KERN_ERR "serial: Couldn's remap IO memory at %#lx\n", port); > + return 1; > + } This seems wrong. ioremap returns NULL in case of failure. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] /drivers/char/serial.c cleanup
On Mon, Mar 05, 2001 at 03:37:04PM +0300, Andrey Panin wrote: Attached patch (2.4.2-ac11) makes some changes in serial driver: adds ioremap() return code checks, removes panic() calls and adds better error handling in start_pci_pnp_board() function. Did you test it ? diff -u /linux.vanilla/drivers/char/serial.c /linux/drivers/char/serial.c --- /linux.vanilla/drivers/char/serial.c Thu Mar 1 20:15:43 2001 +++ /linux/drivers/char/serial.c Fri Mar 2 00:10:29 2001 @@ -3876,7 +3876,10 @@ return 0; } req-io_type = SERIAL_IO_MEM; - req-iomem_base = ioremap(port, board-uart_offset); + if ((req-iomem_base = ioremap(port, board-uart_offset))) { + printk(KERN_ERR "serial: Couldn's remap IO memory at %#lx\n", port); + return 1; + } This seems wrong. ioremap returns NULL in case of failure. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [kernel] Re: [PATCH] 2.4.2: cure the kapm-idled taking (100-epsilon)% CPU time
On Sun, Mar 04, 2001 at 12:19:07AM +0100, Francis Galiegue wrote: > Well, from reading the source, I don't see how this can break APM... What am I > missing? apm_bios_call must not be called with two identical pointers for two different registers. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [kernel] Re: [PATCH] 2.4.2: cure the kapm-idled taking (100-epsilon)% CPU time
On Sun, Mar 04, 2001 at 12:19:07AM +0100, Francis Galiegue wrote: Well, from reading the source, I don't see how this can break APM... What am I missing? apm_bios_call must not be called with two identical pointers for two different registers. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch-2.4.2-ac8] misc_register() fix (was Re: Linux 2.4.2ac8
On Fri, Mar 02, 2001 at 10:55:39AM +, Tigran Aivazian wrote: > On Fri, 2 Mar 2001, Tigran Aivazian wrote: > > On Thu, 1 Mar 2001, Alan Cox wrote: > > > 2.4.2-ac8 > > > o Stop two people claiming the same misc dev id (Philipp Rumpf) > > > > is this what has broken misc devi registration on my machine? I have two > > misc devices -- microcode and psaux -- now (ac8) I get none, /proc/misc is > > empty. Also, on boot gpm generates an "oops" from gpm.c(968) saying > > "/dev/mouse: No such device" > > Hi Alan, > > here is the fix, tested, it works fine. The only unsatisfactory thing is > that we do an extra if() on each iteration making misc_register() > typically a few instructions slower. I will think a few minutes on how to > make the old version work (i.e. I suspect it was just an incorrect walking > of the misc_list in ac8). See earlier patch. > --- linux/drivers/char/misc.c.0 Fri Mar 2 09:35:01 2001 > +++ linux/drivers/char/misc.c Fri Mar 2 10:01:17 2001 > @@ -175,14 +175,16 @@ > > if (misc->next || misc->prev) > return -EBUSY; > + > down(_sem); > - c = misc_list.next; > > - while ((c != _list) && (c->minor != misc->minor)) > + c = misc_list.next; > + while (c != _list) { > + if (c->minor == misc->minor) { > + up(_sem); > + return -EBUSY; > + } > c = c->next; > - if (c == _list) { > - up(_sem); > - return -EBUSY; > } > > if (misc->minor == MISC_DYNAMIC_MINOR) { This is fine as well, and possibly more readable. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.4.2ac8 lost char devices
On Fri, Mar 02, 2001 at 03:05:32PM +0900, tachino Nobuhiro wrote: > > Hello, > > At Fri, 02 Mar 2001 00:42:28 -0500, > <[EMAIL PROTECTED]> wrote: > > > > actually, its not just ps/2 mice -- it seems to be something generic to char > > devices. agpgartis failing to register itself, too. > > > > what changed with char device handling from ac7 to ac8? > > > > robert love > > [EMAIL PROTECTED] > > - > > misc_register() in drivers/char/misc.c seems to have a problem. I guess it's too late now to pretend it's not my fault. I foolishly assumed if it would boot for me I couldn't have broken it too badly. > + c = misc_list.next; > + > + while ((c != _list) && (c->minor != misc->minor)) > + c = c->next; > + if (c == _list) { > > This should be (c != _list) Correct. Alan, diff -ur linux/drivers/char/misc.c linux-prumpf/drivers/char/misc.c --- linux/drivers/char/misc.c Fri Mar 2 02:48:04 2001 +++ linux-prumpf/drivers/char/misc.cFri Mar 2 02:49:43 2001 @@ -180,7 +180,7 @@ while ((c != _list) && (c->minor != misc->minor)) c = c->next; - if (c == _list) { + if (c != _list) { up(_sem); return -EBUSY; } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.4.2ac8 lost char devices
On Fri, Mar 02, 2001 at 03:05:32PM +0900, tachino Nobuhiro wrote: Hello, At Fri, 02 Mar 2001 00:42:28 -0500, [EMAIL PROTECTED] wrote: actually, its not just ps/2 mice -- it seems to be something generic to char devices. agpgartis failing to register itself, too. what changed with char device handling from ac7 to ac8? robert love [EMAIL PROTECTED] - misc_register() in drivers/char/misc.c seems to have a problem. I guess it's too late now to pretend it's not my fault. I foolishly assumed if it would boot for me I couldn't have broken it too badly. + c = misc_list.next; + + while ((c != misc_list) (c-minor != misc-minor)) + c = c-next; + if (c == misc_list) { This should be (c != misc_list) Correct. Alan, diff -ur linux/drivers/char/misc.c linux-prumpf/drivers/char/misc.c --- linux/drivers/char/misc.c Fri Mar 2 02:48:04 2001 +++ linux-prumpf/drivers/char/misc.cFri Mar 2 02:49:43 2001 @@ -180,7 +180,7 @@ while ((c != misc_list) (c-minor != misc-minor)) c = c-next; - if (c == misc_list) { + if (c != misc_list) { up(misc_sem); return -EBUSY; } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PCI oddities on Dell Inspiron 5000e w/ 2.4.x
On Sat, Feb 24, 2001 at 10:25:42AM -0700, Jeff Lessem wrote: > In your message of: Sat, 24 Feb 2001 09:54:47 CST, you write: > >Jeff, are you using the e820 memory map at all ? In particular, are you > >using grub or some other buggy bootloader that insists on specifying a > >mem= option on the kernel command line ? There should be a kernel command > >line message very early on, what does that say ? > > Yes, I am using grub, the buggy bootloader. The relevant chunk of > kernal messages are: > > BIOS-provided physical RAM map: > BIOS-e820: 0009f800 @ (usable) > BIOS-e820: 0800 @ 0009f800 (reserved) > BIOS-e820: 00019800 @ 000e6800 (reserved) > BIOS-e820: 13ef @ 0010 (usable) > BIOS-e820: fc00 @ 13ff (ACPI data) > BIOS-e820: 0400 @ 13fffc00 (ACPI NVS) > BIOS-e820: 0008 @ fff8 (reserved) > On node 0 totalpages: 81904 > zone(0): 4096 pages. > zone(1): 77808 pages. > zone(2): 0 pages. > Kernel command line: root=/dev/hda1 mem=327616K > > You are dead on, mem= seems a bit small. Forcing mem=320M on the > command line fixes the problem completely. Careful, you're overwriting ACPI data now (and using it as normal RAM). Can you try one of a) LILO b) a fixed version of grub c) this patch ? diff -ur linux/arch/i386/kernel/setup.c linux-prumpf/arch/i386/kernel/setup.c --- linux/arch/i386/kernel/setup.c Fri Feb 23 13:37:38 2001 +++ linux-prumpf/arch/i386/kernel/setup.c Sat Feb 24 09:49:50 2001 @@ -555,30 +555,9 @@ e820.nr_map = 0; usermem = 1; } else { - /* If the user specifies memory size, we -* blow away any automatically generated -* size -*/ - unsigned long start_at, mem_size; - - if (usermem == 0) { - /* first time in: zap the whitelist -* and reinitialize it with the -* standard low-memory region. -*/ - e820.nr_map = 0; - usermem = 1; - add_memory_region(0, LOWMEMSIZE(), E820_RAM); - } - mem_size = memparse(from+4, ); + memparse(from+4, ); if (*from == '@') - start_at = memparse(from+1, ); - else { - start_at = HIGH_MEMORY; - mem_size -= HIGH_MEMORY; - usermem=0; - } - add_memory_region(start_at, mem_size, E820_RAM); + memparse(from+1, ); } } c = *(from++); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PCI oddities on Dell Inspiron 5000e w/ 2.4.x
On Sat, Feb 24, 2001 at 05:36:47AM -0800, Linus Torvalds wrote: > On Sat, 24 Feb 2001, Jeff Lessem wrote: > > > > >Also, how much memory does this machine have? That "13ff" does worry > > >me a bit.. > > > > The comptuer has 320MB. At this point I am ready to conclude that the > > computer is broken in some way, because nobody else with an Inspiron > > 5000e that I have heard from has anything like this problem. > I didn't believe that you'd have 320MB just because it's such an odd > number, but the problem is that Linux apparently starts allocating the PCI > address space just _under_ the 320MB mark (you probably have some memory > reserved for ACPI that doesn't show up in the e820 memory map). Jeff, are you using the e820 memory map at all ? In particular, are you using grub or some other buggy bootloader that insists on specifying a mem= option on the kernel command line ? There should be a kernel command line message very early on, what does that say ? Also, can you give us the E820 memory map (kernel messages starting with BIOS-e820) ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PCI oddities on Dell Inspiron 5000e w/ 2.4.x
On Sat, Feb 24, 2001 at 05:36:47AM -0800, Linus Torvalds wrote: On Sat, 24 Feb 2001, Jeff Lessem wrote: Also, how much memory does this machine have? That "13ff" does worry me a bit.. The comptuer has 320MB. At this point I am ready to conclude that the computer is broken in some way, because nobody else with an Inspiron 5000e that I have heard from has anything like this problem. I didn't believe that you'd have 320MB just because it's such an odd number, but the problem is that Linux apparently starts allocating the PCI address space just _under_ the 320MB mark (you probably have some memory reserved for ACPI that doesn't show up in the e820 memory map). Jeff, are you using the e820 memory map at all ? In particular, are you using grub or some other buggy bootloader that insists on specifying a mem= option on the kernel command line ? There should be a kernel command line message very early on, what does that say ? Also, can you give us the E820 memory map (kernel messages starting with BIOS-e820) ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PCI oddities on Dell Inspiron 5000e w/ 2.4.x
On Sat, Feb 24, 2001 at 10:25:42AM -0700, Jeff Lessem wrote: In your message of: Sat, 24 Feb 2001 09:54:47 CST, you write: Jeff, are you using the e820 memory map at all ? In particular, are you using grub or some other buggy bootloader that insists on specifying a mem= option on the kernel command line ? There should be a kernel command line message very early on, what does that say ? Yes, I am using grub, the buggy bootloader. The relevant chunk of kernal messages are: BIOS-provided physical RAM map: BIOS-e820: 0009f800 @ (usable) BIOS-e820: 0800 @ 0009f800 (reserved) BIOS-e820: 00019800 @ 000e6800 (reserved) BIOS-e820: 13ef @ 0010 (usable) BIOS-e820: fc00 @ 13ff (ACPI data) BIOS-e820: 0400 @ 13fffc00 (ACPI NVS) BIOS-e820: 0008 @ fff8 (reserved) On node 0 totalpages: 81904 zone(0): 4096 pages. zone(1): 77808 pages. zone(2): 0 pages. Kernel command line: root=/dev/hda1 mem=327616K You are dead on, mem= seems a bit small. Forcing mem=320M on the command line fixes the problem completely. Careful, you're overwriting ACPI data now (and using it as normal RAM). Can you try one of a) LILO b) a fixed version of grub c) this patch ? diff -ur linux/arch/i386/kernel/setup.c linux-prumpf/arch/i386/kernel/setup.c --- linux/arch/i386/kernel/setup.c Fri Feb 23 13:37:38 2001 +++ linux-prumpf/arch/i386/kernel/setup.c Sat Feb 24 09:49:50 2001 @@ -555,30 +555,9 @@ e820.nr_map = 0; usermem = 1; } else { - /* If the user specifies memory size, we -* blow away any automatically generated -* size -*/ - unsigned long start_at, mem_size; - - if (usermem == 0) { - /* first time in: zap the whitelist -* and reinitialize it with the -* standard low-memory region. -*/ - e820.nr_map = 0; - usermem = 1; - add_memory_region(0, LOWMEMSIZE(), E820_RAM); - } - mem_size = memparse(from+4, from); + memparse(from+4, from); if (*from == '@') - start_at = memparse(from+1, from); - else { - start_at = HIGH_MEMORY; - mem_size -= HIGH_MEMORY; - usermem=0; - } - add_memory_region(start_at, mem_size, E820_RAM); + memparse(from+1, from); } } c = *(from++); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] (2.0) make softdog/hardware watchdog in same box work
While misc_register() semantics are different in 2.0 from 2.[24], and the 2.[24] code would actually work in 2.0, the 2.0 code doesn't. This fixes (I think) the case where you have softdog and a hardware watchdog driver on the same box (and obviously want to use the hardware watchdog). diff -ur linux/drivers/char/misc.c linux-prumpf/drivers/char/misc.c --- linux/drivers/char/misc.c Thu Jun 4 00:17:47 1998 +++ linux-prumpf/drivers/char/misc.cTue Feb 20 18:05:46 2001 @@ -220,14 +220,17 @@ #ifdef CONFIG_SUN_MOUSE sun_mouse_init(); #endif -#ifdef CONFIG_SOFT_WATCHDOG - watchdog_init(); -#endif + /* In 2.0, only the first misc_register() is significant for each +* minor. So we load the hardware watchdog drivers first, then the +* softdog driver. */ #ifdef CONFIG_WDT wdt_init(); #endif #ifdef CONFIG_PCWATCHDOG pcwatchdog_init(); +#endif +#ifdef CONFIG_SOFT_WATCHDOG + watchdog_init(); #endif #ifdef CONFIG_APM apm_bios_init(); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] make softdog/hardware watchdog in same box work
misc_register() overrides misc devices with the same minor that have been registered earlier, so if you enable both softdog and a hardware watchdog the current initialization order will leave you with softdog only. Should be fixed by (untested, 2.2): diff -ur linux/drivers/char/misc.c linux-prumpf/drivers/char/misc.c --- linux/drivers/char/misc.c Mon Dec 11 01:49:42 2000 +++ linux-prumpf/drivers/char/misc.cTue Feb 20 17:49:30 2001 @@ -216,11 +216,15 @@ pc110pad_init(); #endif /* - * Only one watchdog can succeed. We probe the pcwatchdog first, - * then the wdt cards and finally the software watchdog which always - * works. This means if your hardware watchdog dies or is 'borrowed' - * for some reason the software watchdog still gives you some cover. + * Only one watchdog can succeed. We probe the software watchdog first, + * then the hardware watchdogs which can override softdog if you have + * both configured. This means if your hardware watchdog dies or is + * 'borrowed' for some reason the software watchdog still gives you + * some cover. */ +#ifdef CONFIG_SOFT_WATCHDOG + watchdog_init(); +#endif #ifdef CONFIG_PCWATCHDOG pcwatchdog_init(); #endif @@ -232,9 +236,6 @@ #endif #ifdef CONFIG_60XX_WDT sbc60xxwdt_init(); -#endif -#ifdef CONFIG_SOFT_WATCHDOG - watchdog_init(); #endif #ifdef CONFIG_DTLK dtlk_init(); and (untested, 2.4): diff -ur linux/drivers/char/Makefile linux-prumpf/drivers/char/Makefile --- linux/drivers/char/Makefile Thu Jan 4 22:00:55 2001 +++ linux-prumpf/drivers/char/Makefile Tue Feb 20 17:51:39 2001 @@ -170,11 +170,13 @@ obj-$(CONFIG_NWBUTTON) += nwbutton.o obj-$(CONFIG_NWFLASH) += nwflash.o -# Only one watchdog can succeed. We probe the hardware watchdog -# drivers first, then the softdog driver. This means if your hardware -# watchdog dies or is 'borrowed' for some reason the software watchdog -# still gives you some cover. +# Only one watchdog can succeed. We probe the software watchdog first, +# then the hardware watchdogs which can override softdog if you have +# both configured. This means if your hardware watchdog dies or is +# 'borrowed' for some reason the software watchdog still gives you +# some cover. +obj-$(CONFIG_SOFT_WATCHDOG) += softdog.o obj-$(CONFIG_PCWATCHDOG) += pcwd.o obj-$(CONFIG_ACQUIRE_WDT) += acquirewdt.o obj-$(CONFIG_MIXCOMWD) += mixcomwd.o @@ -184,7 +186,6 @@ obj-$(CONFIG_21285_WATCHDOG) += wdt285.o obj-$(CONFIG_977_WATCHDOG) += wdt977.o obj-$(CONFIG_I810_TCO) += i810-tco.o -obj-$(CONFIG_SOFT_WATCHDOG) += softdog.o include $(TOPDIR)/Rules.make - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4.1-ac15
On Tue, 20 Feb 2001, Keith Owens wrote: > On Mon, 19 Feb 2001 16:04:07 + (GMT), > Alan Cox <[EMAIL PROTECTED]> wrote: > Using wait_for_at_least_one_schedule_on_every_cpu() has no penalty > except on the process running rmmod. It does not require yet more > spinlocks and is architecture independent. Since schedule() updates > sched_data->last_schedule, all the rmmod process has to do is wait > until the last_schedule value changes on all cpus, forcing a reschedule > if necessary. > > Zero overhead in schedule, zero overhead in exception handling, zero > overhead in IA64 unwind code, architecture independent. Sounds good to > me. Not architecture independent unfortunately. get_cycles() always returns 0 on some SMP-capable machines. ->last_schedule doesn't change if one CPU is always idle (kernel threads do), or always running the same process (kernel threads do, unless it's an RT process in an endless loop). I'm not sure how you'd go about "forcing a reschedule if necessary". Using temporary kernel threads has zero overhead in {schedule, exception handling, IA64 unwind code} and actually works on all architectures. It adds overhead to the wait_for_at_least_one_schedule_on_every_cpu() code, but I think that's acceptable. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] new setprocuid syscall
> diff -urN linux-2.4.1/arch/i386/kernel/entry.S > linux-2.4.1-setprocuid/arch/i386/kernel/entry.S > --- linux-2.4.1/arch/i386/kernel/entry.SThu Nov 9 02:09:50 2000 > +++ linux-2.4.1-setprocuid/arch/i386/kernel/entry.S Mon Feb 19 > 22:12:00 2001 > @@ -645,6 +645,7 @@ > .long SYMBOL_NAME(sys_madvise) > .long SYMBOL_NAME(sys_getdents64) /* 220 */ > .long SYMBOL_NAME(sys_fcntl64) > + .long SYMBOL_NAME(sys_setprocuid) > .long SYMBOL_NAME(sys_ni_syscall) /* reserved for TUX */ > > /* You stole TUX's syscall slot. > + p = find_task_by_pid(pid); > + p->fsuid = p->euid = p->suid = p->uid = uid; p might be NULL. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] new setprocuid syscall
diff -urN linux-2.4.1/arch/i386/kernel/entry.S linux-2.4.1-setprocuid/arch/i386/kernel/entry.S --- linux-2.4.1/arch/i386/kernel/entry.SThu Nov 9 02:09:50 2000 +++ linux-2.4.1-setprocuid/arch/i386/kernel/entry.S Mon Feb 19 22:12:00 2001 @@ -645,6 +645,7 @@ .long SYMBOL_NAME(sys_madvise) .long SYMBOL_NAME(sys_getdents64) /* 220 */ .long SYMBOL_NAME(sys_fcntl64) + .long SYMBOL_NAME(sys_setprocuid) .long SYMBOL_NAME(sys_ni_syscall) /* reserved for TUX */ /* You stole TUX's syscall slot. + p = find_task_by_pid(pid); + p-fsuid = p-euid = p-suid = p-uid = uid; p might be NULL. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4.1-ac15
On Tue, 20 Feb 2001, Keith Owens wrote: On Mon, 19 Feb 2001 16:04:07 + (GMT), Alan Cox [EMAIL PROTECTED] wrote: Using wait_for_at_least_one_schedule_on_every_cpu() has no penalty except on the process running rmmod. It does not require yet more spinlocks and is architecture independent. Since schedule() updates sched_data-last_schedule, all the rmmod process has to do is wait until the last_schedule value changes on all cpus, forcing a reschedule if necessary. Zero overhead in schedule, zero overhead in exception handling, zero overhead in IA64 unwind code, architecture independent. Sounds good to me. Not architecture independent unfortunately. get_cycles() always returns 0 on some SMP-capable machines. -last_schedule doesn't change if one CPU is always idle (kernel threads do), or always running the same process (kernel threads do, unless it's an RT process in an endless loop). I'm not sure how you'd go about "forcing a reschedule if necessary". Using temporary kernel threads has zero overhead in {schedule, exception handling, IA64 unwind code} and actually works on all architectures. It adds overhead to the wait_for_at_least_one_schedule_on_every_cpu() code, but I think that's acceptable. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] make softdog/hardware watchdog in same box work
misc_register() overrides misc devices with the same minor that have been registered earlier, so if you enable both softdog and a hardware watchdog the current initialization order will leave you with softdog only. Should be fixed by (untested, 2.2): diff -ur linux/drivers/char/misc.c linux-prumpf/drivers/char/misc.c --- linux/drivers/char/misc.c Mon Dec 11 01:49:42 2000 +++ linux-prumpf/drivers/char/misc.cTue Feb 20 17:49:30 2001 @@ -216,11 +216,15 @@ pc110pad_init(); #endif /* - * Only one watchdog can succeed. We probe the pcwatchdog first, - * then the wdt cards and finally the software watchdog which always - * works. This means if your hardware watchdog dies or is 'borrowed' - * for some reason the software watchdog still gives you some cover. + * Only one watchdog can succeed. We probe the software watchdog first, + * then the hardware watchdogs which can override softdog if you have + * both configured. This means if your hardware watchdog dies or is + * 'borrowed' for some reason the software watchdog still gives you + * some cover. */ +#ifdef CONFIG_SOFT_WATCHDOG + watchdog_init(); +#endif #ifdef CONFIG_PCWATCHDOG pcwatchdog_init(); #endif @@ -232,9 +236,6 @@ #endif #ifdef CONFIG_60XX_WDT sbc60xxwdt_init(); -#endif -#ifdef CONFIG_SOFT_WATCHDOG - watchdog_init(); #endif #ifdef CONFIG_DTLK dtlk_init(); and (untested, 2.4): diff -ur linux/drivers/char/Makefile linux-prumpf/drivers/char/Makefile --- linux/drivers/char/Makefile Thu Jan 4 22:00:55 2001 +++ linux-prumpf/drivers/char/Makefile Tue Feb 20 17:51:39 2001 @@ -170,11 +170,13 @@ obj-$(CONFIG_NWBUTTON) += nwbutton.o obj-$(CONFIG_NWFLASH) += nwflash.o -# Only one watchdog can succeed. We probe the hardware watchdog -# drivers first, then the softdog driver. This means if your hardware -# watchdog dies or is 'borrowed' for some reason the software watchdog -# still gives you some cover. +# Only one watchdog can succeed. We probe the software watchdog first, +# then the hardware watchdogs which can override softdog if you have +# both configured. This means if your hardware watchdog dies or is +# 'borrowed' for some reason the software watchdog still gives you +# some cover. +obj-$(CONFIG_SOFT_WATCHDOG) += softdog.o obj-$(CONFIG_PCWATCHDOG) += pcwd.o obj-$(CONFIG_ACQUIRE_WDT) += acquirewdt.o obj-$(CONFIG_MIXCOMWD) += mixcomwd.o @@ -184,7 +186,6 @@ obj-$(CONFIG_21285_WATCHDOG) += wdt285.o obj-$(CONFIG_977_WATCHDOG) += wdt977.o obj-$(CONFIG_I810_TCO) += i810-tco.o -obj-$(CONFIG_SOFT_WATCHDOG) += softdog.o include $(TOPDIR)/Rules.make - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] (2.0) make softdog/hardware watchdog in same box work
While misc_register() semantics are different in 2.0 from 2.[24], and the 2.[24] code would actually work in 2.0, the 2.0 code doesn't. This fixes (I think) the case where you have softdog and a hardware watchdog driver on the same box (and obviously want to use the hardware watchdog). diff -ur linux/drivers/char/misc.c linux-prumpf/drivers/char/misc.c --- linux/drivers/char/misc.c Thu Jun 4 00:17:47 1998 +++ linux-prumpf/drivers/char/misc.cTue Feb 20 18:05:46 2001 @@ -220,14 +220,17 @@ #ifdef CONFIG_SUN_MOUSE sun_mouse_init(); #endif -#ifdef CONFIG_SOFT_WATCHDOG - watchdog_init(); -#endif + /* In 2.0, only the first misc_register() is significant for each +* minor. So we load the hardware watchdog drivers first, then the +* softdog driver. */ #ifdef CONFIG_WDT wdt_init(); #endif #ifdef CONFIG_PCWATCHDOG pcwatchdog_init(); +#endif +#ifdef CONFIG_SOFT_WATCHDOG + watchdog_init(); #endif #ifdef CONFIG_APM apm_bios_init(); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kernel_thread() & thread starting
On Mon, 19 Feb 2001, Russell King wrote: > Philipp Rumpf writes: > > That still won't catch keventd oopsing though - which I think might happen > > quite easily in real life. > > Maybe we should panic in that case? For example, what happens if kswapd > oopses? kreclaimd? bdflush? kupdate? All these have the same problem, No. If kswapd oopses it's a bug in kswapd (or related code). If keventd oopses most likely the broken code is actually the task queue you scheduled, which belongs to your driver. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4.1-ac15
On Mon, 19 Feb 2001, Alan Cox wrote: > > So you fixed the nonexistent race only. The real race is that the module > > Umm I fixed the small race. You are right that there is a second race. There's just one race. The small race is nonexistent since put_mod_name always acts as a memory barrier. > > uninitialized vmalloc'd (module_map'd) memory), then the module data > > (including the exception table) gets copied. > > The race window is from the first copy_from_user in sys_init_module until > > the second one. > > Yep. Obvious answer. Ignore exception tables for modules that are not > MOD_RUNNING. You can have exceptions while initializing. not MOD_RUNNING|MOD_INITIALIZING should work though. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4.1-ac15
On Mon, 19 Feb 2001, Alan Cox wrote: > > so you hold a spinlock during copy_from_user ? Or did you change > > sys_init_module/sys_create_modules semantics ? > > The only points I need to hold a spinlock are editing the chain and > walking it in a case where I dont hold the kernel lock. > > So I spin_lock_irqsave the two ops updating the list links in each case and > the exception lookup walk So you fixed the nonexistent race only. The real race is that the module structure gets initialized first (so the exception table pointers point to uninitialized vmalloc'd (module_map'd) memory), then the module data (including the exception table) gets copied. The race window is from the first copy_from_user in sys_init_module until the second one. Or am I missing something ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4.1-ac15
On Mon, 19 Feb 2001, Alan Cox wrote: > > Rusty had a patch that locked the module list properly IIRC. > > So does -ac now. I just take a spinlock for the modify cases that race > against faults (including vmalloc faults from irq context) so you hold a spinlock during copy_from_user ? Or did you change sys_init_module/sys_create_modules semantics ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kernel_thread() & thread starting
On Mon, 19 Feb 2001, Andrew Morton wrote: > David Woodhouse wrote: > > [EMAIL PROTECTED] said: > > > Why bother ? It looks like a leftover debugging message which > > > doesn't make a lot of sense once the code is stable (what might make > > > sense is checking keventd is still around, but that's not what the > > > code is doing). > > keventd *must* still be around. So put in the reliable check for it: in start_context_thread: while (!keventd_task) schedule(); in need_keventd: for_each_task(tsk) { if (tsk == keventd_task) { if (tsk->mm == NULL && strcmp(tsk->comm, "keventd") == 0) return 1; } } printk("eek."); return 0; > And the code obviously isn't completely stable, and this debug > message has found something rather unpleasant. So removing the debug code completely (which is what your patch does) doesn't sound like the right thing. > I don't think we should run the init tasks when keventd may, or > may not be running. Sure, the current code does, by happenstance, > all work correctly when keventd hasn't yet started running, and > when it's starting up. But it's safer, saner and surer just > to crank the damn thing up before proceeding. Agreed. The code snippet above should do that, no ? > I believe the right thing to do here is the RMK approach. That still won't catch keventd oopsing though - which I think might happen quite easily in real life. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4.1-ac15
On Tue, 20 Feb 2001, Keith Owens wrote: > On Mon, 19 Feb 2001 06:15:22 -0600 (CST), > Philipp Rumpf <[EMAIL PROTECTED]> wrote: > No need for a callin routine, you can get this for free as part of > normal scheduling. The sequence goes :- > > if (use_count == 0) { > module_unregister(); > wait_for_at_least_one_schedule_on_every_cpu(); > if (use_count != 0) { > module_register();/* lost the unregister race */ > } > else { > /* nobody can enter the module now */ > module_release_resources(); > unlink_module_from_list(); > wait_for_at_least_one_schedule_on_every_cpu(); > free_module_storage(); > } > } > > wait_for_at_least_one_schedule_on_every_cpu() prevents the next wait_for_at_least_one_schedule_on_every_cpu() *is* callin_other_cpus(). I agree the name isn't optimal. (and yes, you could implement it by hacking sched.c directly, but I don't think that's necessary). > The beauty of this approach is that the rest of the cpus can do normal > work. No need to bring everything to a dead stop. Which nicely avoids potential deadlocks in modules that need to initialize on all CPUs. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4.1-ac15
On Mon, 19 Feb 2001, Keith Owens wrote: > On Mon, 19 Feb 2001 11:35:08 + (GMT), > Alan Cox <[EMAIL PROTECTED]> wrote: > >The problem isnt running module code. What happens in this case > > > >mod->next = module_list; > >module_list = mod; /* link it in */ > > > >Note no write barrier. > > It works on ix86 so the code must be right Too bad it doesn't. > >Delete is even worse > > > >We unlink the module > >We free the memory > > > >At the same time another cpu may be walking the exception table that we free. > > Another good reason why locking modules via use counts from external > code is not the right fix. We definitely need a quiesce model for > module removal. Unless I'm mistaken, we need both use counts and SMP magic (though not necessarily as extreme as what the "freeze all other CPUs during module unload" patch did). I think something like this would work (in addition to use counts) int callin_func(void *p) { int *cpu = p; while (*cpu != smp_processor_id()) { current->cpus_allowed = 1 << *cpu; schedule(); } return 0; } void callin_other_cpus(void) { int cpus[smp_num_cpus]; int i; for (i=0; ihttp://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4.1-ac15
On Mon, 19 Feb 2001, Alan Cox wrote: > mod->next = module_list; put_mod_name() here. > module_list = mod; /* link it in */ > > Note no write barrier. put_mod_name calls free_page which always implies a memory barrier. This isn't beautiful but it won't blow up either. Actually that isn't relevant since the actual table is copied _after_ ex_table_{start,end}. We'll scan uninitialized memory and if some word happens to match the fault EIP we jump to the bogus fixup. > Delete is even worse > > We unlink the module > We free the memory > > At the same time another cpu may be walking the exception table that we free. True. Rusty had a patch that locked the module list properly IIRC. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kernel_thread() & thread starting
On Sun, 18 Feb 2001, Kenn Humborg wrote: > in the .config, I can get schedule_task() to fail with: > >schedule_task(): keventd has not started This shouldn't be a failure case, just a (bogus) printk. > When starting bdflush and kupdated, bdflush_init() uses a semaphore to > make sure that the threads have run before continuing. Shouldn't > start_context_thread() do something similar? Why bother ? It looks like a leftover debugging message which doesn't make a lot of sense once the code is stable (what might make sense is checking keventd is still around, but that's not what the code is doing). Proposed patch: --- linux-2.4/kernel/context.c Fri Jan 12 18:52:41 2001 +++ linux-prumpf/kernel/context.c Mon Feb 19 11:11:37 2001 @@ -26,19 +26,9 @@ static int keventd_running; static struct task_struct *keventd_task; -static int need_keventd(const char *who) -{ - if (keventd_running == 0) - printk(KERN_ERR "%s(): keventd has not started\n", who); - return keventd_running; -} - int current_is_keventd(void) { - int ret = 0; - if (need_keventd(__FUNCTION__)) - ret = (current == keventd_task); - return ret; + return current == keventd_task; } /** @@ -57,7 +47,6 @@ int schedule_task(struct tq_struct *task) { int ret; - need_keventd(__FUNCTION__); ret = queue_task(task, _context); wake_up(_task_wq); return ret; dwmw2 ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4.1-ac15
On Thu, 15 Feb 2001, Alan Cox wrote: > Question of the day for the VM folks: > If CPU1 is loading the exception tables for a module and > CPU2 faults.. what happens 8) "loading" == in sys_create_module ? The module list is modified atomically, so either we search the new table or we don't, but we never see intermediate states. Not searching the new table shouldn't be a problem as we shouldn't run module code until sys_init_module time. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4.1-ac15
On Thu, 15 Feb 2001, Alan Cox wrote: Question of the day for the VM folks: If CPU1 is loading the exception tables for a module and CPU2 faults.. what happens 8) "loading" == in sys_create_module ? The module list is modified atomically, so either we search the new table or we don't, but we never see intermediate states. Not searching the new table shouldn't be a problem as we shouldn't run module code until sys_init_module time. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kernel_thread() thread starting
On Sun, 18 Feb 2001, Kenn Humborg wrote: in the .config, I can get schedule_task() to fail with: schedule_task(): keventd has not started This shouldn't be a failure case, just a (bogus) printk. When starting bdflush and kupdated, bdflush_init() uses a semaphore to make sure that the threads have run before continuing. Shouldn't start_context_thread() do something similar? Why bother ? It looks like a leftover debugging message which doesn't make a lot of sense once the code is stable (what might make sense is checking keventd is still around, but that's not what the code is doing). Proposed patch: --- linux-2.4/kernel/context.c Fri Jan 12 18:52:41 2001 +++ linux-prumpf/kernel/context.c Mon Feb 19 11:11:37 2001 @@ -26,19 +26,9 @@ static int keventd_running; static struct task_struct *keventd_task; -static int need_keventd(const char *who) -{ - if (keventd_running == 0) - printk(KERN_ERR "%s(): keventd has not started\n", who); - return keventd_running; -} - int current_is_keventd(void) { - int ret = 0; - if (need_keventd(__FUNCTION__)) - ret = (current == keventd_task); - return ret; + return current == keventd_task; } /** @@ -57,7 +47,6 @@ int schedule_task(struct tq_struct *task) { int ret; - need_keventd(__FUNCTION__); ret = queue_task(task, tq_context); wake_up(context_task_wq); return ret; dwmw2 ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4.1-ac15
On Mon, 19 Feb 2001, Alan Cox wrote: mod-next = module_list; put_mod_name() here. module_list = mod; /* link it in */ Note no write barrier. put_mod_name calls free_page which always implies a memory barrier. This isn't beautiful but it won't blow up either. Actually that isn't relevant since the actual table is copied _after_ ex_table_{start,end}. We'll scan uninitialized memory and if some word happens to match the fault EIP we jump to the bogus fixup. Delete is even worse We unlink the module We free the memory At the same time another cpu may be walking the exception table that we free. True. Rusty had a patch that locked the module list properly IIRC. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4.1-ac15
On Mon, 19 Feb 2001, Keith Owens wrote: On Mon, 19 Feb 2001 11:35:08 + (GMT), Alan Cox [EMAIL PROTECTED] wrote: The problem isnt running module code. What happens in this case mod-next = module_list; module_list = mod; /* link it in */ Note no write barrier. humourIt works on ix86 so the code must be right/humour Too bad it doesn't. Delete is even worse We unlink the module We free the memory At the same time another cpu may be walking the exception table that we free. Another good reason why locking modules via use counts from external code is not the right fix. We definitely need a quiesce model for module removal. Unless I'm mistaken, we need both use counts and SMP magic (though not necessarily as extreme as what the "freeze all other CPUs during module unload" patch did). I think something like this would work (in addition to use counts) int callin_func(void *p) { int *cpu = p; while (*cpu != smp_processor_id()) { current-cpus_allowed = 1 *cpu; schedule(); } return 0; } void callin_other_cpus(void) { int cpus[smp_num_cpus]; int i; for (i=0; ismp_num_cpus; i++) { cpus[i] = i; kernel_thread(callin_func, cpus[i], ...); } } and call callin_other_cpus() before unloading a module. I'm not sure how you could make exception handling safe without locking all accesses to the module list - but that sounds like the sane thing to do anyway. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4.1-ac15
On Tue, 20 Feb 2001, Keith Owens wrote: On Mon, 19 Feb 2001 06:15:22 -0600 (CST), Philipp Rumpf [EMAIL PROTECTED] wrote: No need for a callin routine, you can get this for free as part of normal scheduling. The sequence goes :- if (use_count == 0) { module_unregister(); wait_for_at_least_one_schedule_on_every_cpu(); if (use_count != 0) { module_register();/* lost the unregister race */ } else { /* nobody can enter the module now */ module_release_resources(); unlink_module_from_list(); wait_for_at_least_one_schedule_on_every_cpu(); free_module_storage(); } } wait_for_at_least_one_schedule_on_every_cpu() prevents the next wait_for_at_least_one_schedule_on_every_cpu() *is* callin_other_cpus(). I agree the name isn't optimal. (and yes, you could implement it by hacking sched.c directly, but I don't think that's necessary). The beauty of this approach is that the rest of the cpus can do normal work. No need to bring everything to a dead stop. Which nicely avoids potential deadlocks in modules that need to initialize on all CPUs. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kernel_thread() thread starting
On Mon, 19 Feb 2001, Andrew Morton wrote: David Woodhouse wrote: [EMAIL PROTECTED] said: Why bother ? It looks like a leftover debugging message which doesn't make a lot of sense once the code is stable (what might make sense is checking keventd is still around, but that's not what the code is doing). keventd *must* still be around. So put in the reliable check for it: in start_context_thread: while (!keventd_task) schedule(); in need_keventd: for_each_task(tsk) { if (tsk == keventd_task) { if (tsk-mm == NULL strcmp(tsk-comm, "keventd") == 0) return 1; } } printk("eek."); return 0; And the code obviously isn't completely stable, and this debug message has found something rather unpleasant. So removing the debug code completely (which is what your patch does) doesn't sound like the right thing. I don't think we should run the init tasks when keventd may, or may not be running. Sure, the current code does, by happenstance, all work correctly when keventd hasn't yet started running, and when it's starting up. But it's safer, saner and surer just to crank the damn thing up before proceeding. Agreed. The code snippet above should do that, no ? I believe the right thing to do here is the RMK approach. That still won't catch keventd oopsing though - which I think might happen quite easily in real life. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4.1-ac15
On Mon, 19 Feb 2001, Alan Cox wrote: Rusty had a patch that locked the module list properly IIRC. So does -ac now. I just take a spinlock for the modify cases that race against faults (including vmalloc faults from irq context) so you hold a spinlock during copy_from_user ? Or did you change sys_init_module/sys_create_modules semantics ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4.1-ac15
On Mon, 19 Feb 2001, Alan Cox wrote: so you hold a spinlock during copy_from_user ? Or did you change sys_init_module/sys_create_modules semantics ? The only points I need to hold a spinlock are editing the chain and walking it in a case where I dont hold the kernel lock. So I spin_lock_irqsave the two ops updating the list links in each case and the exception lookup walk So you fixed the nonexistent race only. The real race is that the module structure gets initialized first (so the exception table pointers point to uninitialized vmalloc'd (module_map'd) memory), then the module data (including the exception table) gets copied. The race window is from the first copy_from_user in sys_init_module until the second one. Or am I missing something ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4.1-ac15
On Mon, 19 Feb 2001, Alan Cox wrote: So you fixed the nonexistent race only. The real race is that the module Umm I fixed the small race. You are right that there is a second race. There's just one race. The small race is nonexistent since put_mod_name always acts as a memory barrier. uninitialized vmalloc'd (module_map'd) memory), then the module data (including the exception table) gets copied. The race window is from the first copy_from_user in sys_init_module until the second one. Yep. Obvious answer. Ignore exception tables for modules that are not MOD_RUNNING. You can have exceptions while initializing. not MOD_RUNNING|MOD_INITIALIZING should work though. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kernel_thread() thread starting
On Mon, 19 Feb 2001, Russell King wrote: Philipp Rumpf writes: That still won't catch keventd oopsing though - which I think might happen quite easily in real life. Maybe we should panic in that case? For example, what happens if kswapd oopses? kreclaimd? bdflush? kupdate? All these have the same problem, No. If kswapd oopses it's a bug in kswapd (or related code). If keventd oopses most likely the broken code is actually the task queue you scheduled, which belongs to your driver. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] 2.4.2-pre3: parport_pc init_module bug
On Wed, 14 Feb 2001, Grant Grundler wrote: > Philipp Rumpf wrote: > > Jeff Garzik wrote: > > > Looks ok, but I wonder if we should include this list in the docs. > > > These is stuff defined by the PCI spec, and this list could potentially > > > get longer... (opinions either way wanted...) > > Having people look things up in the spec isn't very user friendly. Having the constants in some well-known header file should be sufficient, shouldn't it ? > > I'm not sure whether the > > plan is to have drivers handle MSIs or do it in the generic PCI code. > > Grant ? > > Generic PCI code can d very little by itself with MSI since not all > platforms provide support for it - even within the same arch. It depends on the platform and maybe the exact PCI slot used, but I don't think it depends on the driver (unless MSI support is broken in which case you would want to fix it up in the driver). At least I can't find anything in the PCI 2.2 spec that would suggest we need to consult the driver before enabling MSIs with one message only. > It's also possible for the driver to just ignore MSI and not use it. > ie use regular PCI IRQ lines for interrupts. .. at least until someone comes out with a PCI board that supports MSI interrupts only. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] 2.4.2-pre3: parport_pc init_module bug
On Wed, 14 Feb 2001, Jeff Garzik wrote: > On Wed, 14 Feb 2001, Tim Waugh wrote: > > +/** > > + * pci_find_capability - query for devices' capabilities > > + * @dev: PCI device to query > > + * @cap: capability code > > + * > > + * Tell if a device supports a given PCI capability. > > + * Returns the address of the requested capability structure within the device's >PCI > > + * configuration space or 0 in case the device does not support it. > > + * Possible values for @flags: ^^ @cap would make more sense, no ? > > + * %PCI_CAP_ID_AGP Accelerated Graphics Port > > + * > > + * %PCI_CAP_ID_VPD Vital Product Data > > + * > > + * %PCI_CAP_ID_SLOTID Slot Identification > > + * > > + * %PCI_CAP_ID_MSI Message Signalled Interrupts > > + * > > + * %PCI_CAP_ID_CHSWPCompactPCI HotSwap > > + */ > > Looks ok, but I wonder if we should include this list in the docs. > These is stuff defined by the PCI spec, and this list could potentially > get longer... (opinions either way wanted...) I would vote for including those capabilities which are most likely to be used by drivers (PCI_CAP_ID_PM, and maybe _VPD). I'm not sure whether the plan is to have drivers handle MSIs or do it in the generic PCI code. Grant ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] 2.4.2-pre3: parport_pc init_module bug
On Wed, 14 Feb 2001, Tim Waugh wrote: > + * pci_find_subsys - begin or continue searching for a PCI device by >vendor/subvendor/device/subdevice id > + * @vendor: PCI vendor id to match, or %PCI_ANY_ID to match all vendor ids > + * @device: PCI device id to match, or %PCI_ANY_ID to match all vendor ids "match all device ids", surely ? > + * @ss_device: PCI subsystem device id to match, or %PCI_ANY_ID to match all vendor >ids ditto. (the pci_find_device documentation has "all vendor ids" as well) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] 2.4.2-pre3: parport_pc init_module bug
On Wed, 14 Feb 2001, Tim Waugh wrote: + * pci_find_subsys - begin or continue searching for a PCI device by vendor/subvendor/device/subdevice id + * @vendor: PCI vendor id to match, or %PCI_ANY_ID to match all vendor ids + * @device: PCI device id to match, or %PCI_ANY_ID to match all vendor ids "match all device ids", surely ? + * @ss_device: PCI subsystem device id to match, or %PCI_ANY_ID to match all vendor ids ditto. (the pci_find_device documentation has "all vendor ids" as well) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] 2.4.2-pre3: parport_pc init_module bug
On Wed, 14 Feb 2001, Jeff Garzik wrote: On Wed, 14 Feb 2001, Tim Waugh wrote: +/** + * pci_find_capability - query for devices' capabilities + * @dev: PCI device to query + * @cap: capability code + * + * Tell if a device supports a given PCI capability. + * Returns the address of the requested capability structure within the device's PCI + * configuration space or 0 in case the device does not support it. + * Possible values for @flags: ^^ @cap would make more sense, no ? + * %PCI_CAP_ID_AGP Accelerated Graphics Port + * + * %PCI_CAP_ID_VPD Vital Product Data + * + * %PCI_CAP_ID_SLOTID Slot Identification + * + * %PCI_CAP_ID_MSI Message Signalled Interrupts + * + * %PCI_CAP_ID_CHSWPCompactPCI HotSwap + */ Looks ok, but I wonder if we should include this list in the docs. These is stuff defined by the PCI spec, and this list could potentially get longer... (opinions either way wanted...) I would vote for including those capabilities which are most likely to be used by drivers (PCI_CAP_ID_PM, and maybe _VPD). I'm not sure whether the plan is to have drivers handle MSIs or do it in the generic PCI code. Grant ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.4.0 kernel paging error
On Wed, Jan 10, 2001 at 05:55:05PM +0100, Daniel Phillips wrote: > Mark Hindley wrote: > > I am running 2.4.0 final. I got the following failed paging request which > > produced a complete freeze. > > > > As you can see it was precipitated by cron starting to run some > > housekeeping stuff overnight. > > > > Has anyone else had prblems? > > It looks real. It was executing this line of clear_inode in fs/inode.c: > > 380 if (inode->i_sb && inode->i_sb->s_op && inode->i_sb->s_op->clear_inode) > ^ >and it blew up here -> > Unable to handle kernel paging request at virtual address c4870840 I'm pretty sure this is a vmalloc/module address, which would mean ->s_op points to a module that has been unloaded. This sounds consistent with the "cron starting to run some housekeeping stuff" above. Mark, which file systems are you using ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: 2.4.0 kernel paging error
On Wed, Jan 10, 2001 at 05:55:05PM +0100, Daniel Phillips wrote: Mark Hindley wrote: I am running 2.4.0 final. I got the following failed paging request which produced a complete freeze. As you can see it was precipitated by cron starting to run some housekeeping stuff overnight. Has anyone else had prblems? It looks real. It was executing this line of clear_inode in fs/inode.c: 380 if (inode-i_sb inode-i_sb-s_op inode-i_sb-s_op-clear_inode) ^ and it blew up here - Unable to handle kernel paging request at virtual address c4870840 I'm pretty sure this is a vmalloc/module address, which would mean -s_op points to a module that has been unloaded. This sounds consistent with the "cron starting to run some housekeeping stuff" above. Mark, which file systems are you using ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: innd mmap bug in 2.4.0-test12
On Wed, Dec 27, 2000 at 03:41:04PM -0800, Linus Torvalds wrote: > It must be wrong. > > If we have a dirty page on the LRU lists, that page _must_ have a mapping. What about pages with a mapping but without a writepage function ? or pages whose writepage function fails ? The current code seems to simply put the page onto the active list in that case, which seems just as wrong to me. > The bug is somewhere else, and your patch is just papering it over. We > should not have a page without a mapping on the LRU lists in the first > place, except if the page has anonymous buffers (and such a page cannot So is there any legal reason we could ever get to page_active ? Removing that code (or replacing it with BUG()) certainly would make page_launder more readable. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: innd mmap bug in 2.4.0-test12
On Wed, Dec 27, 2000 at 03:41:04PM -0800, Linus Torvalds wrote: It must be wrong. If we have a dirty page on the LRU lists, that page _must_ have a mapping. What about pages with a mapping but without a writepage function ? or pages whose writepage function fails ? The current code seems to simply put the page onto the active list in that case, which seems just as wrong to me. The bug is somewhere else, and your patch is just papering it over. We should not have a page without a mapping on the LRU lists in the first place, except if the page has anonymous buffers (and such a page cannot So is there any legal reason we could ever get to page_active ? Removing that code (or replacing it with BUG()) certainly would make page_launder more readable. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: /dev/random: really secure?
On Sun, Dec 17, 2000 at 10:50:57PM +0100, Karel Kulhavy wrote: > I noticed peculiarities in the behaviour of the delta-delta-3 system for > entropy estimation in the random.c code./ When I hold right alt or control, I > get about 8 bits of entropy per repeat fro the /dev/random which is > overestimated. I think the real entropy is 0 bits because it is absolutely > deterministic when the interrupt comes. Am I right or is there any hidden not absolutely, but we should ignore repeated keys that generate more than one scancode. tytso, here's the patch to do it again: --- linux/drivers/char/random.c Sun Jul 30 18:01:23 2000 +++ linux-prumpf/drivers/char/random.c Thu Sep 28 17:07:03 2000 @@ -763,10 +763,15 @@ void add_keyboard_randomness(unsigned char scancode) { - static unsigned char last_scancode = 0; - /* ignore autorepeat (multiple key down w/o key up) */ - if (scancode != last_scancode) { - last_scancode = scancode; + static unsigned char last_scancode[2] = { 0, 0 }; + + /* ignore autorepeat (multiple key down w/o key up). +* add_keyboard_randomness is called twice for certain AT keyboard +* keys, so we keep a longer history. */ + if (scancode != last_scancode[0] && + scancode != last_scancode[1]) { + last_scancode[0] = last_scancode[1]; + last_scancode[1] = scancode; add_timer_randomness(_timer_state, scancode); } } If we want to rely solely on the add_timer_randomness checks, we should remove the autorepeat check completely. Philipp Rumpf - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: /dev/random: really secure?
On Sun, Dec 17, 2000 at 10:50:57PM +0100, Karel Kulhavy wrote: I noticed peculiarities in the behaviour of the delta-delta-3 system for entropy estimation in the random.c code./ When I hold right alt or control, I get about 8 bits of entropy per repeat fro the /dev/random which is overestimated. I think the real entropy is 0 bits because it is absolutely deterministic when the interrupt comes. Am I right or is there any hidden not absolutely, but we should ignore repeated keys that generate more than one scancode. tytso, here's the patch to do it again: --- linux/drivers/char/random.c Sun Jul 30 18:01:23 2000 +++ linux-prumpf/drivers/char/random.c Thu Sep 28 17:07:03 2000 @@ -763,10 +763,15 @@ void add_keyboard_randomness(unsigned char scancode) { - static unsigned char last_scancode = 0; - /* ignore autorepeat (multiple key down w/o key up) */ - if (scancode != last_scancode) { - last_scancode = scancode; + static unsigned char last_scancode[2] = { 0, 0 }; + + /* ignore autorepeat (multiple key down w/o key up). +* add_keyboard_randomness is called twice for certain AT keyboard +* keys, so we keep a longer history. */ + if (scancode != last_scancode[0] + scancode != last_scancode[1]) { + last_scancode[0] = last_scancode[1]; + last_scancode[1] = scancode; add_timer_randomness(keyboard_timer_state, scancode); } } If we want to rely solely on the add_timer_randomness checks, we should remove the autorepeat check completely. Philipp Rumpf - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] i850 support
On Sun, Dec 17, 2000 at 12:52:59AM -0800, Drew Hess wrote: > It also contains an experimental patch to drivers/pci/quirks.c that forces > standby mode for pool C RDRAM devices. I've seen some benchmarks > comparing the Intel D850GB motherboard to the ASUS P4T and attributing the > better performance of the P4T to the fact that Intel's BIOS puts pool C > RDRAM in nap mode and has no option to change it, whereas the P4T does. > On my D850GB, the patch doesn't seem to change the performance of STREAM > one way or the other; but when I change the value written into the RDST > register to set nap mode and shrink the A and B pools to 1 device each, I > see about a 20-30MB/s drop-off on STREAM, so I'm pretty sure the code does > what it's supposed to. > > This code isn't strictly a 'quirk' in the sense that some of the bug > workarounds in drivers/pci/quirks.c are, but I didn't see a more > appropriate place to put it. Suggestions are welcome. sounds like powertweak is what you want. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] i850 support
On Sun, Dec 17, 2000 at 12:52:59AM -0800, Drew Hess wrote: It also contains an experimental patch to drivers/pci/quirks.c that forces standby mode for pool C RDRAM devices. I've seen some benchmarks comparing the Intel D850GB motherboard to the ASUS P4T and attributing the better performance of the P4T to the fact that Intel's BIOS puts pool C RDRAM in nap mode and has no option to change it, whereas the P4T does. On my D850GB, the patch doesn't seem to change the performance of STREAM one way or the other; but when I change the value written into the RDST register to set nap mode and shrink the A and B pools to 1 device each, I see about a 20-30MB/s drop-off on STREAM, so I'm pretty sure the code does what it's supposed to. This code isn't strictly a 'quirk' in the sense that some of the bug workarounds in drivers/pci/quirks.c are, but I didn't see a more appropriate place to put it. Suggestions are welcome. sounds like powertweak is what you want. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.18pre25
On Fri, Dec 08, 2000 at 10:47:46AM +0100, Willy Tarreau wrote: > |Bus 0, device 2, function 1: > | Unknown class: Intel OEM MegaRAID Controller (rev 5). > |Medium devsel. Fast back-to-back capable. BIST capable. IRQ 10. Master > Capable. Latency=64. > |Prefetchable 32 bit memory at 0xf000 [0xf008]. > > as you see, the board is found at 0xf008, but used aligned to 0xf000. No. It's found at 0xf000, and has 8 bytes of MMIO space. > my server currently works with that patch, but I'm sure it won't boot anymore > if I apply this 2.2.18pre25 alone. "I'm sure" meaning "I didn't test it" ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.18pre25
On Fri, Dec 08, 2000 at 10:47:46AM +0100, Willy Tarreau wrote: |Bus 0, device 2, function 1: | Unknown class: Intel OEM MegaRAID Controller (rev 5). |Medium devsel. Fast back-to-back capable. BIST capable. IRQ 10. Master Capable. Latency=64. |Prefetchable 32 bit memory at 0xf000 [0xf008]. as you see, the board is found at 0xf008, but used aligned to 0xf000. No. It's found at 0xf000, and has 8 bytes of MMIO space. my server currently works with that patch, but I'm sure it won't boot anymore if I apply this 2.2.18pre25 alone. "I'm sure" meaning "I didn't test it" ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mutliple root devs (take II)
On Fri, Dec 01, 2000 at 08:04:52PM -0500, Jeff Dike wrote: > boot memory allocator (see mm/bootmem.c). In the arch that I'm most familiar > with (arch/um), that is usable from the beginning of start_kernel. I don't > know about the other arches. setup_arch does the necessary initialization on most architectures - you might want to consider fixing UML to be consistent with them. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mutliple root devs (take II)
On Fri, Dec 01, 2000 at 08:04:52PM -0500, Jeff Dike wrote: boot memory allocator (see mm/bootmem.c). In the arch that I'm most familiar with (arch/um), that is usable from the beginning of start_kernel. I don't know about the other arches. setup_arch does the necessary initialization on most architectures - you might want to consider fixing UML to be consistent with them. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: How to transfer memory from PCI memory directly to user space safely and portable?
On Mon, Nov 27, 2000 at 10:36:34AM -0800, H. Peter Anvin wrote: > Followup to: <[EMAIL PROTECTED]> > By author:Philipp Rumpf <[EMAIL PROTECTED]> > In newsgroup: linux.dev.kernel > > > > I hope count isn't provided by userspace here ? > > > > > 1. What happens if the user space memory is swapped to disk? Will > > > verify_area() make sure that the memory is in physical RAM when it returns, > > > or will it return -EFAULT, or will something even worse happen? > > > > On i386, you'll sleep implicitly waiting for the page fault to be handled; in > > the generic case, anything could happen. > > > > That doesn't sound right. I would expect it to wait for the page to > be brought in on any and all architectures, otherwise it seems rather > impossible to write portable Linux kernel code. The code in question was memcpy_fromio(user_space_dst, iobase, count); Assuming user_space_dst is a userspace pointer, what I said is true; on some architectures we will be dereferencing random pointers in kernel space, and we won't get -EFAULT right on any architecture. Or did I miss something ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: How to transfer memory from PCI memory directly to user space safely and portable?
On Mon, Nov 27, 2000 at 10:36:34AM -0800, H. Peter Anvin wrote: Followup to: [EMAIL PROTECTED] By author:Philipp Rumpf [EMAIL PROTECTED] In newsgroup: linux.dev.kernel I hope count isn't provided by userspace here ? 1. What happens if the user space memory is swapped to disk? Will verify_area() make sure that the memory is in physical RAM when it returns, or will it return -EFAULT, or will something even worse happen? On i386, you'll sleep implicitly waiting for the page fault to be handled; in the generic case, anything could happen. That doesn't sound right. I would expect it to wait for the page to be brought in on any and all architectures, otherwise it seems rather impossible to write portable Linux kernel code. The code in question was memcpy_fromio(user_space_dst, iobase, count); Assuming user_space_dst is a userspace pointer, what I said is true; on some architectures we will be dereferencing random pointers in kernel space, and we won't get -EFAULT right on any architecture. Or did I miss something ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: How to transfer memory from PCI memory directly to user space safely and portable?
On Sun, Nov 26, 2000 at 02:21:31PM +0100, Anders Torger wrote: > memcpy_toio(iobase, user_space_src, count); I hope count isn't provided by userspace here ? > 1. What happens if the user space memory is swapped to disk? Will > verify_area() make sure that the memory is in physical RAM when it returns, > or will it return -EFAULT, or will something even worse happen? On i386, you'll sleep implicitly waiting for the page fault to be handled; in the generic case, anything could happen. > 2. Is this code really portable? I currently have an I386 architecture, and I > could use copy_to/from_user on that instead, but that is not portable. Now, > by using memcpy_to/fromio instead, is this code fully portable? No. It would be portable if you were using memcpy_fromuser_toio and it existed. > 3. Will the current process always be the correct one? The copy functions is > directly initiated by the user, and not through an interrupt, so I think the > user space mapping will always be to the correct process. Is that correct? current should be fine if you're not in a bh/interrupt/kernel thread. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] removal of "static foo = 0"
On Sun, Nov 26, 2000 at 10:37:07AM +, Tigran Aivazian wrote: > On Sat, 25 Nov 2000, Tim Waugh wrote: > > Why doesn't the compiler just leave out explicit zeros from the > > 'initial data' segment then? Seems like it ought to be tought to.. > > yes, taught to, _BUT_ never let this to be a default option, please. > Because there are valid cases where a programmer things "this is in .data" That's what __attribute__ ((section (".data"))) is for. > and that means this should be in .data. Think of binary patching an object > as one valid example (there may be others, I forgot). can you think of any valid examples that apply to the kernel ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] removal of "static foo = 0"
On Sun, Nov 26, 2000 at 04:25:05AM +, Alan Cox wrote: > static int a=0; > > says 'I thought about this. I want it to start at zero. I've written it this > way to remind of the fact' > > Sure it generates the same code I agree it would be best if gcc would generate the same code; unfortunately this doesn't seem to be the case, which sounds like something to take up with the gcc developers. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] removal of static foo = 0
On Sun, Nov 26, 2000 at 04:25:05AM +, Alan Cox wrote: static int a=0; says 'I thought about this. I want it to start at zero. I've written it this way to remind of the fact' Sure it generates the same code I agree it would be best if gcc would generate the same code; unfortunately this doesn't seem to be the case, which sounds like something to take up with the gcc developers. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: How to transfer memory from PCI memory directly to user space safely and portable?
On Sun, Nov 26, 2000 at 02:21:31PM +0100, Anders Torger wrote: memcpy_toio(iobase, user_space_src, count); I hope count isn't provided by userspace here ? 1. What happens if the user space memory is swapped to disk? Will verify_area() make sure that the memory is in physical RAM when it returns, or will it return -EFAULT, or will something even worse happen? On i386, you'll sleep implicitly waiting for the page fault to be handled; in the generic case, anything could happen. 2. Is this code really portable? I currently have an I386 architecture, and I could use copy_to/from_user on that instead, but that is not portable. Now, by using memcpy_to/fromio instead, is this code fully portable? No. It would be portable if you were using memcpy_fromuser_toio and it existed. 3. Will the current process always be the correct one? The copy functions is directly initiated by the user, and not through an interrupt, so I think the user space mapping will always be to the correct process. Is that correct? current should be fine if you're not in a bh/interrupt/kernel thread. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: *_trylock return on success?
On Sat, Nov 25, 2000 at 08:03:49PM +0100, Roger Larsson wrote: > > _trylock functions return 0 for success. > > Not spin_trylock Argh, I missed the (recent ?) change to make x86 spinlocks use 1 to mean unlocked. You're correct, and obviously this should be fixed. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: *_trylock return on success?
On Sat, Nov 25, 2000 at 03:49:25PM -0200, Rik van Riel wrote: > On Sat, 25 Nov 2000, Roger Larsson wrote: > > > Questions: > > What are _trylocks supposed to return? > > It depends on the type of _trylock ;( > > > Does spin_trylock and down_trylock behave differently? > > Why isn't the expected return value documented? > > The whole trylock stuff is, IMHO, a big mess. When you > change from one type of trylock to another, you may be > forced to invert the logic of your code since the return > code from the different locks is different. > > For bitflags, for example, the trylock returns the state > the bit had before the lock (ie. 1 if the thing was already > locked). I assume you're talking about test_and_{set,clear}_bit here. Their return value isn't consistent with the other _trylock functions since they're not _trylock functions. I think the real problem is that people use test_and_set_bit for locks, which is almost never[1] a good idea. The overhead for a semaphore shouldn't be too much in most cases, and that way it is obvious what you want to do - and, hopefully, even more obvious if you end up with a semaphore that can be turned into a spinlock without further changes. > For spinlocks, it'll probably return something else ;/ _trylock functions return 0 for success. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: *_trylock return on success?
On Sat, Nov 25, 2000 at 03:49:25PM -0200, Rik van Riel wrote: On Sat, 25 Nov 2000, Roger Larsson wrote: Questions: What are _trylocks supposed to return? It depends on the type of _trylock ;( Does spin_trylock and down_trylock behave differently? Why isn't the expected return value documented? The whole trylock stuff is, IMHO, a big mess. When you change from one type of trylock to another, you may be forced to invert the logic of your code since the return code from the different locks is different. For bitflags, for example, the trylock returns the state the bit had before the lock (ie. 1 if the thing was already locked). I assume you're talking about test_and_{set,clear}_bit here. Their return value isn't consistent with the other _trylock functions since they're not _trylock functions. I think the real problem is that people use test_and_set_bit for locks, which is almost never[1] a good idea. The overhead for a semaphore shouldn't be too much in most cases, and that way it is obvious what you want to do - and, hopefully, even more obvious if you end up with a semaphore that can be turned into a spinlock without further changes. For spinlocks, it'll probably return something else ;/ _trylock functions return 0 for success. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: *_trylock return on success?
On Sat, Nov 25, 2000 at 08:03:49PM +0100, Roger Larsson wrote: _trylock functions return 0 for success. Not spin_trylock Argh, I missed the (recent ?) change to make x86 spinlocks use 1 to mean unlocked. You're correct, and obviously this should be fixed. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: jiffies wrap question...
On Fri, Nov 10, 2000 at 02:21:28AM -0500, Jeff Garzik wrote: > The following is a piece of code from the latter half of > schedule_timeout, in kernel/sched.c. Is it possible that > schedule_timeout could return an incorrect value, if the jiffy value > wraps between the first and last lines shown below. let's say the first line happens n ticks before the wrap and the second, m ticks afterwards. > expire = timeout + jiffies; expire = timeout + 2^32/2^64 - n = timeout - n > timeout = expire - jiffies; timeout = expire - m = timeout - n - m = timeout - (n+m) (n+m) is the time that actually passed while we were asleep, and timeout has the correct value (timeout - ticks that happened)> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: jiffies wrap question...
On Fri, Nov 10, 2000 at 02:21:28AM -0500, Jeff Garzik wrote: The following is a piece of code from the latter half of schedule_timeout, in kernel/sched.c. Is it possible that schedule_timeout could return an incorrect value, if the jiffy value wraps between the first and last lines shown below. let's say the first line happens n ticks before the wrap and the second, m ticks afterwards. expire = timeout + jiffies; expire = timeout + 2^32/2^64 - n = timeout - n timeout = expire - jiffies; timeout = expire - m = timeout - n - m = timeout - (n+m) (n+m) is the time that actually passed while we were asleep, and timeout has the correct value (timeout - ticks that happened) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: i82808 hardware hub RNG
On Sun, Nov 05, 2000 at 06:19:21PM +0100, Heusden, Folkert van wrote: > I wrote a daemon that fetches (as root-user) random numbers from the RNG in > the i82808 (found on 815-chipsets). > You can download it from http://www.vanheusden.com/Linux/random.php3 . > Currently, I'm trying to rewrite things into a kernel-module so that one has > a standard character device which can deliver random values then. > Please give it a try as I do not own a PC with such a motherboard ;-/ So how is this different from drivers/char/i810_rng.c ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: i82808 hardware hub RNG
On Sun, Nov 05, 2000 at 06:19:21PM +0100, Heusden, Folkert van wrote: I wrote a daemon that fetches (as root-user) random numbers from the RNG in the i82808 (found on 815-chipsets). You can download it from http://www.vanheusden.com/Linux/random.php3 . Currently, I'm trying to rewrite things into a kernel-module so that one has a standard character device which can deliver random values then. Please give it a try as I do not own a PC with such a motherboard ;-/ So how is this different from drivers/char/i810_rng.c ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4 Status / TODO page (Updated as of 2.4.0-test10)
> > 3. Security > > * Fix module remove race bug (still to be done: TTY, ldisc, I2C, >video_device - Al Viro) (Rogier Wolff will handle ATM) TBD: sysctl, kernel_thread * drivers/input/mousedev.c dereferences userspace pointers directly. We should make this fail for a bit to catch those bugs - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4 Status / TODO page (Updated as of 2.4.0-test10)
3. Security * Fix module remove race bug (still to be done: TTY, ldisc, I2C, video_device - Al Viro) (Rogier Wolff will handle ATM) TBD: sysctl, kernel_thread * drivers/input/mousedev.c dereferences userspace pointers directly. We should make this fail for a bit to catch those bugs - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PROPOSED PATCH] ATM refcount + firestream
On Sat, Oct 28, 2000 at 11:50:22AM -0400, Brian Gerst wrote: > Philipp Rumpf wrote: > > > > On Sat, Oct 28, 2000 at 09:55:21AM -0400, Brian Gerst wrote: > > > Yes, but they can be called (and sleep) with module refcount == 0. This > > > is because the file descripter used to perform the ioctl isn't directly > > > associated with the network device, thereby not incrementing the > > > refcount on open. > > > > According to my proposal, it is perfectly safe to call a function in a module > > while the module's use count is 0. This function would typically look like this: > > > > foo() > > { > > MOD_INC_USE_COUNT; > > > > copy_*_user() (or anything else that sleeps); > > > > MOD_DEC_USE_COUNT; > > > > return bar; > > } > > > > The only difference to the "old" module scheme is that the above currently isn't > > safe on SMP systems. > > This will only work while the kernel is not preemptable. Once the > kernel thread can be rescheduled, all bets are off. The implementation will need adjusting for preemptable kernel threads. The concept still works fine. > With or without your patch, the network ioctls are unsafe, since they > don't currently do refcounting at all. I was under the impression most network ioctls didn't sleep. > Adding it in the layer above thTe > driver is the easier and cleaner solution. I disagree. I dislike special-casing inter-module calls (and that's not even taking into account that the current implementation of an inter-module call is quite ugly). - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PROPOSED PATCH] ATM refcount + firestream
On Sat, Oct 28, 2000 at 11:50:22AM -0400, Brian Gerst wrote: Philipp Rumpf wrote: On Sat, Oct 28, 2000 at 09:55:21AM -0400, Brian Gerst wrote: Yes, but they can be called (and sleep) with module refcount == 0. This is because the file descripter used to perform the ioctl isn't directly associated with the network device, thereby not incrementing the refcount on open. According to my proposal, it is perfectly safe to call a function in a module while the module's use count is 0. This function would typically look like this: foo() { MOD_INC_USE_COUNT; copy_*_user() (or anything else that sleeps); MOD_DEC_USE_COUNT; return bar; } The only difference to the "old" module scheme is that the above currently isn't safe on SMP systems. This will only work while the kernel is not preemptable. Once the kernel thread can be rescheduled, all bets are off. The implementation will need adjusting for preemptable kernel threads. The concept still works fine. With or without your patch, the network ioctls are unsafe, since they don't currently do refcounting at all. I was under the impression most network ioctls didn't sleep. Adding it in the layer above thTe driver is the easier and cleaner solution. I disagree. I dislike special-casing inter-module calls (and that's not even taking into account that the current implementation of an inter-module call is quite ugly). - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PROPOSED PATCH] ATM refcount + firestream
On Sat, Oct 28, 2000 at 09:55:21AM -0400, Brian Gerst wrote: > Yes, but they can be called (and sleep) with module refcount == 0. This > is because the file descripter used to perform the ioctl isn't directly > associated with the network device, thereby not incrementing the > refcount on open. According to my proposal, it is perfectly safe to call a function in a module while the module's use count is 0. This function would typically look like this: foo() { MOD_INC_USE_COUNT; copy_*_user() (or anything else that sleeps); MOD_DEC_USE_COUNT; return bar; } The only difference to the "old" module scheme is that the above currently isn't safe on SMP systems. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PROPOSED PATCH] ATM refcount + firestream
On Sat, Oct 28, 2000 at 09:37:28AM -0400, Brian Gerst wrote: > Philipp Rumpf wrote: > > - you can't copy_(from|to)_user in the module exit function (which would > > be copies from/to rmmod anyway) > > Unfortunately, you need to be able to use copy_*_user() from the network > ioctls, and this is the center of the current issue. You misunderstood. The network ioctls aren't module_exit functions, so copy_*_user in them is fine. Philipp Rumpf - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PROPOSED PATCH] ATM refcount + firestream
On Fri, Oct 27, 2000 at 10:49:53PM +1100, Andrew Morton wrote: > Look, this modules stuff is really bad. Phillip Rumpf proposed > a radical alternative a while back which I felt was not given While it might be a "radical alternative", it doesn't require any changes to the subsystems that have been fixed so far. At this time, applying the patch would basically fix the rest of the subsystems as well (if the drivers use MOD_{INC,DEC}_USE_COUNT, that is). > sufficient consideration. The idea was to make sys_delete_module() > grab all the other CPUs and leave them spinning on a flag while > the unload was proceeding. This was very similar to > arch/i386/kernel/apm.c:apm_power_off(). The idea here is other CPUs don't have to deal with the kernel going through a number of inconsistent states while a module is unloaded. At any point in time, for any module, exactly one of the following is true: 1. you're in the module_exit function 2. the module is (being) loaded 3. the module isn't loaded. > As far as I can recall, the only restriction was that you are > not allowed to call module functions when the module refcount > is zero if those functions can call schedule(). There are other restrictions which shouldn't really matter: - you can't schedule() and hope you end up on a particular CPU (you can use smp_call_function though) - you can't copy_(from|to)_user in the module exit function (which would be copies from/to rmmod anyway) > prumpf, please dig out that patch. attached (rediff against test10-pre6, it seems to work). diff -urN linux/arch/i386/mm/extable.c linux-prumpf/arch/i386/mm/extable.c --- linux/arch/i386/mm/extable.cWed Nov 10 08:31:37 1999 +++ linux-prumpf/arch/i386/mm/extable.c Sat Oct 28 06:14:43 2000 @@ -30,26 +30,30 @@ return 0; } +extern struct rw_semaphore module_mutex; + unsigned long search_exception_table(unsigned long addr) { - unsigned long ret; + unsigned long ret = 0; #ifndef CONFIG_MODULES /* There is only the kernel to search. */ ret = search_one_table(__start___ex_table, __stop___ex_table-1, addr); - if (ret) return ret; #else /* The kernel is the last "module" -- no need to treat it special. */ struct module *mp; + + down_read(_mutex); for (mp = module_list; mp != NULL; mp = mp->next) { if (mp->ex_table_start == NULL) continue; ret = search_one_table(mp->ex_table_start, mp->ex_table_end - 1, addr); - if (ret) return ret; + if (ret) break; } + up_read(_mutex); #endif - return 0; + return ret; } diff -urN linux/include/linux/smp.h linux-prumpf/include/linux/smp.h --- linux/include/linux/smp.h Tue Sep 26 14:31:51 2000 +++ linux-prumpf/include/linux/smp.hSat Oct 28 06:14:43 2000 @@ -71,6 +71,9 @@ #define MSG_RESCHEDULE 0x0003 /* Reschedule request from master CPU*/ #define MSG_CALL_FUNCTION 0x0004 /* Call function on all other CPUs */ +extern int freeze_other_cpus(void); +extern void melt_other_cpus(void); + #else /* @@ -86,6 +89,8 @@ #define cpu_number_map(cpu)0 #define smp_call_function(func,info,retry,wait)({ 0; }) #define cpu_online_map 1 +#define freeze_other_cpus()0 +#define melt_other_cpus() do { } while(0) #endif #endif diff -urN linux/kernel/module.c linux-prumpf/kernel/module.c --- linux/kernel/module.c Sat Oct 28 05:03:26 2000 +++ linux-prumpf/kernel/module.cSat Oct 28 06:14:43 2000 @@ -37,6 +37,11 @@ ex_table_end: __stop___ex_table }; +/* module_mutex protects module_list - if you can down_read() it, modules + * won't be added/removed, but their usecount, flags aso still might change. + */ + +DECLARE_RWSEM(module_mutex); struct module *module_list = _module; static long get_mod_name(const char *user_name, char **buf); @@ -105,7 +110,6 @@ if (!capable(CAP_SYS_MODULE)) return -EPERM; - lock_kernel(); if ((namelen = get_mod_name(name_user, )) < 0) { error = namelen; goto err0; @@ -114,32 +118,40 @@ error = -EINVAL; goto err1; } + lock_kernel(); + down_write(_mutex); if (find_module(name) != NULL) { error = -EEXIST; - goto err1; + goto err2; } if ((mod = (struct module *)module_map(size)) == NULL) { error = -ENOMEM; - goto err1; + goto err2; } memset(mod, 0, sizeof(*mod)); mod->size_of_struct = sizeof(*mod); - mod->next = module_list; mod->name = (char *)(mod + 1); mod->size = size; memcpy((char*)(mod+1), name, namelen+1); put_mod_name(name); +
Re: [PROPOSED PATCH] ATM refcount + firestream
On Fri, Oct 27, 2000 at 10:49:53PM +1100, Andrew Morton wrote: Look, this modules stuff is really bad. Phillip Rumpf proposed a radical alternative a while back which I felt was not given While it might be a "radical alternative", it doesn't require any changes to the subsystems that have been fixed so far. At this time, applying the patch would basically fix the rest of the subsystems as well (if the drivers use MOD_{INC,DEC}_USE_COUNT, that is). sufficient consideration. The idea was to make sys_delete_module() grab all the other CPUs and leave them spinning on a flag while the unload was proceeding. This was very similar to arch/i386/kernel/apm.c:apm_power_off(). The idea here is other CPUs don't have to deal with the kernel going through a number of inconsistent states while a module is unloaded. At any point in time, for any module, exactly one of the following is true: 1. you're in the module_exit function 2. the module is (being) loaded 3. the module isn't loaded. As far as I can recall, the only restriction was that you are not allowed to call module functions when the module refcount is zero if those functions can call schedule(). There are other restrictions which shouldn't really matter: - you can't schedule() and hope you end up on a particular CPU (you can use smp_call_function though) - you can't copy_(from|to)_user in the module exit function (which would be copies from/to rmmod anyway) prumpf, please dig out that patch. attached (rediff against test10-pre6, it seems to work). diff -urN linux/arch/i386/mm/extable.c linux-prumpf/arch/i386/mm/extable.c --- linux/arch/i386/mm/extable.cWed Nov 10 08:31:37 1999 +++ linux-prumpf/arch/i386/mm/extable.c Sat Oct 28 06:14:43 2000 @@ -30,26 +30,30 @@ return 0; } +extern struct rw_semaphore module_mutex; + unsigned long search_exception_table(unsigned long addr) { - unsigned long ret; + unsigned long ret = 0; #ifndef CONFIG_MODULES /* There is only the kernel to search. */ ret = search_one_table(__start___ex_table, __stop___ex_table-1, addr); - if (ret) return ret; #else /* The kernel is the last "module" -- no need to treat it special. */ struct module *mp; + + down_read(module_mutex); for (mp = module_list; mp != NULL; mp = mp-next) { if (mp-ex_table_start == NULL) continue; ret = search_one_table(mp-ex_table_start, mp-ex_table_end - 1, addr); - if (ret) return ret; + if (ret) break; } + up_read(module_mutex); #endif - return 0; + return ret; } diff -urN linux/include/linux/smp.h linux-prumpf/include/linux/smp.h --- linux/include/linux/smp.h Tue Sep 26 14:31:51 2000 +++ linux-prumpf/include/linux/smp.hSat Oct 28 06:14:43 2000 @@ -71,6 +71,9 @@ #define MSG_RESCHEDULE 0x0003 /* Reschedule request from master CPU*/ #define MSG_CALL_FUNCTION 0x0004 /* Call function on all other CPUs */ +extern int freeze_other_cpus(void); +extern void melt_other_cpus(void); + #else /* @@ -86,6 +89,8 @@ #define cpu_number_map(cpu)0 #define smp_call_function(func,info,retry,wait)({ 0; }) #define cpu_online_map 1 +#define freeze_other_cpus()0 +#define melt_other_cpus() do { } while(0) #endif #endif diff -urN linux/kernel/module.c linux-prumpf/kernel/module.c --- linux/kernel/module.c Sat Oct 28 05:03:26 2000 +++ linux-prumpf/kernel/module.cSat Oct 28 06:14:43 2000 @@ -37,6 +37,11 @@ ex_table_end: __stop___ex_table }; +/* module_mutex protects module_list - if you can down_read() it, modules + * won't be added/removed, but their usecount, flags aso still might change. + */ + +DECLARE_RWSEM(module_mutex); struct module *module_list = kernel_module; static long get_mod_name(const char *user_name, char **buf); @@ -105,7 +110,6 @@ if (!capable(CAP_SYS_MODULE)) return -EPERM; - lock_kernel(); if ((namelen = get_mod_name(name_user, name)) 0) { error = namelen; goto err0; @@ -114,32 +118,40 @@ error = -EINVAL; goto err1; } + lock_kernel(); + down_write(module_mutex); if (find_module(name) != NULL) { error = -EEXIST; - goto err1; + goto err2; } if ((mod = (struct module *)module_map(size)) == NULL) { error = -ENOMEM; - goto err1; + goto err2; } memset(mod, 0, sizeof(*mod)); mod-size_of_struct = sizeof(*mod); - mod-next = module_list; mod-name = (char *)(mod + 1); mod-size = size; memcpy((char*)(mod+1), name, namelen+1); put_mod_name(name); +
Re: [PROPOSED PATCH] ATM refcount + firestream
On Sat, Oct 28, 2000 at 09:37:28AM -0400, Brian Gerst wrote: Philipp Rumpf wrote: - you can't copy_(from|to)_user in the module exit function (which would be copies from/to rmmod anyway) Unfortunately, you need to be able to use copy_*_user() from the network ioctls, and this is the center of the current issue. You misunderstood. The network ioctls aren't module_exit functions, so copy_*_user in them is fine. Philipp Rumpf - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PROPOSED PATCH] ATM refcount + firestream
On Sat, Oct 28, 2000 at 09:55:21AM -0400, Brian Gerst wrote: Yes, but they can be called (and sleep) with module refcount == 0. This is because the file descripter used to perform the ioctl isn't directly associated with the network device, thereby not incrementing the refcount on open. According to my proposal, it is perfectly safe to call a function in a module while the module's use count is 0. This function would typically look like this: foo() { MOD_INC_USE_COUNT; copy_*_user() (or anything else that sleeps); MOD_DEC_USE_COUNT; return bar; } The only difference to the "old" module scheme is that the above currently isn't safe on SMP systems. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] 0-byte read()/write() behaviour
On Fri, Oct 20, 2000 at 10:47:45AM -0700, Linus Torvalds wrote: > On Fri, 20 Oct 2000, Philipp Rumpf wrote: > > > > Single Unix specifies that 0-byte reads, as well as 0-byte writes, should > > "return 0 and have no other results". Our current implementation violates > > the first requirement and makes it very easy to violate the second one. > > Note that there _are_ cases where 0-byte reads and writes have specific > meaning, notably there are some networking things where a 0-byte sendto() > does something special if I remember correctly. And I seem to remember > that this also _did_ translate into write(). sock_write and sock_read contain: if(size==0) /* Match SYS5 behaviour */ return 0; I'm not sure which other read()/write() functions are used by "networking things", but I don't see any others sendto would map to. So I suspect if there are any applications which would need the behaviour you described they've already been broken, and should be recompiled to use sys_send(|to|msg) (via sys_socketcall on x86, of course). > I remember that Linux used to do exactly this, and we had to pass the > 0-byte writes into the low-level cases exactly because some low-level > cases do care. I would suspect most of those have been eliminated by now. > I suspect SUS only talks about regular files. As I'm reading it, they're talking about every read() call, even those with an invalid fd. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
[PATCH] 0-byte read()/write() behaviour
Single Unix specifies that 0-byte reads, as well as 0-byte writes, should "return 0 and have no other results". Our current implementation violates the first requirement and makes it very easy to violate the second one. read(page_cache_fd, invalid_ptr, 0) returns -EFAULT; IMHO, this is a clear bug - suppose you have managed to acquire the page covering virtual addresses up to and including PAGE_OFFSET-1 (on x86). read(fd, PAGE_OFFSET-n, n) now is successful for PAGE_SIZE >= n > 0, but fails with -EFAULT for n = 0. looking through drivers/char/*.c, many drivers seem to behave unexpectedly for a 0-byte read; this includes code that overwrites the byte read()'s second argument points to, endless loops, and several different error returns. generic_file_read and generice_file_write, which I suppose are the most critical functions for read()/write() performance, both have special cases to handle the count == 0 case. So I think modifying sys_read and sys_write to take care of the count == 0 case and only calling the file-specific function when it's actually necessary makes most sense; as generic_file_(read|write) get simplified accordingly, there should be no performance impact. Linus ? patch against test10-pre4 diff -ur linux/fs/read_write.c linux-prumpf/fs/read_write.c --- linux/fs/read_write.c Fri Oct 20 04:52:58 2000 +++ linux-prumpf/fs/read_write.cFri Oct 20 06:03:23 2000 @@ -120,6 +120,10 @@ ssize_t ret; struct file * file; + /* Required for SU compatibility. It also simplifies generic_file_read */ + if (!count) + return 0; + ret = -EBADF; file = fget(fd); if (file) { @@ -145,6 +149,10 @@ { ssize_t ret; struct file * file; + + /* Required for SU compatibility. */ + if (!count) + return 0; ret = -EBADF; file = fget(fd); diff -ur linux/mm/filemap.c linux-prumpf/mm/filemap.c --- linux/mm/filemap.c Fri Oct 20 05:05:04 2000 +++ linux-prumpf/mm/filemap.c Fri Oct 20 06:08:41 2000 @@ -1124,27 +1124,18 @@ */ ssize_t generic_file_read(struct file * filp, char * buf, size_t count, loff_t *ppos) { - ssize_t retval; + read_descriptor_t desc; - retval = -EFAULT; - if (access_ok(VERIFY_WRITE, buf, count)) { - retval = 0; - - if (count) { - read_descriptor_t desc; - - desc.written = 0; - desc.count = count; - desc.buf = buf; - desc.error = 0; - do_generic_file_read(filp, ppos, , file_read_actor); - - retval = desc.written; - if (!retval) - retval = desc.error; - } - } - return retval; + if (!access_ok(VERIFY_WRITE, buf, count)) + return -EFAULT; + + desc.written = 0; + desc.count = count; + desc.buf = buf; + desc.error = 0; + do_generic_file_read(filp, ppos, , file_read_actor); + + return desc.written ? : desc.error; } static int file_send_actor(read_descriptor_t * desc, struct page *page, unsigned long offset , unsigned long size) @@ -2424,13 +2415,12 @@ } status = 0; - if (count) { - remove_suid(inode); - inode->i_ctime = inode->i_mtime = CURRENT_TIME; - mark_inode_dirty(inode); - } - while (count) { + remove_suid(inode); + inode->i_ctime = inode->i_mtime = CURRENT_TIME; + mark_inode_dirty(inode); + + do { unsigned long bytes, index, offset; char *kaddr; @@ -2480,7 +2470,7 @@ if (status < 0) break; - } + } while (count); *ppos = pos; if (cached_page) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
[PATCH] 0-byte read()/write() behaviour
Single Unix specifies that 0-byte reads, as well as 0-byte writes, should "return 0 and have no other results". Our current implementation violates the first requirement and makes it very easy to violate the second one. read(page_cache_fd, invalid_ptr, 0) returns -EFAULT; IMHO, this is a clear bug - suppose you have managed to acquire the page covering virtual addresses up to and including PAGE_OFFSET-1 (on x86). read(fd, PAGE_OFFSET-n, n) now is successful for PAGE_SIZE = n 0, but fails with -EFAULT for n = 0. looking through drivers/char/*.c, many drivers seem to behave unexpectedly for a 0-byte read; this includes code that overwrites the byte read()'s second argument points to, endless loops, and several different error returns. generic_file_read and generice_file_write, which I suppose are the most critical functions for read()/write() performance, both have special cases to handle the count == 0 case. So I think modifying sys_read and sys_write to take care of the count == 0 case and only calling the file-specific function when it's actually necessary makes most sense; as generic_file_(read|write) get simplified accordingly, there should be no performance impact. Linus ? patch against test10-pre4 diff -ur linux/fs/read_write.c linux-prumpf/fs/read_write.c --- linux/fs/read_write.c Fri Oct 20 04:52:58 2000 +++ linux-prumpf/fs/read_write.cFri Oct 20 06:03:23 2000 @@ -120,6 +120,10 @@ ssize_t ret; struct file * file; + /* Required for SU compatibility. It also simplifies generic_file_read */ + if (!count) + return 0; + ret = -EBADF; file = fget(fd); if (file) { @@ -145,6 +149,10 @@ { ssize_t ret; struct file * file; + + /* Required for SU compatibility. */ + if (!count) + return 0; ret = -EBADF; file = fget(fd); diff -ur linux/mm/filemap.c linux-prumpf/mm/filemap.c --- linux/mm/filemap.c Fri Oct 20 05:05:04 2000 +++ linux-prumpf/mm/filemap.c Fri Oct 20 06:08:41 2000 @@ -1124,27 +1124,18 @@ */ ssize_t generic_file_read(struct file * filp, char * buf, size_t count, loff_t *ppos) { - ssize_t retval; + read_descriptor_t desc; - retval = -EFAULT; - if (access_ok(VERIFY_WRITE, buf, count)) { - retval = 0; - - if (count) { - read_descriptor_t desc; - - desc.written = 0; - desc.count = count; - desc.buf = buf; - desc.error = 0; - do_generic_file_read(filp, ppos, desc, file_read_actor); - - retval = desc.written; - if (!retval) - retval = desc.error; - } - } - return retval; + if (!access_ok(VERIFY_WRITE, buf, count)) + return -EFAULT; + + desc.written = 0; + desc.count = count; + desc.buf = buf; + desc.error = 0; + do_generic_file_read(filp, ppos, desc, file_read_actor); + + return desc.written ? : desc.error; } static int file_send_actor(read_descriptor_t * desc, struct page *page, unsigned long offset , unsigned long size) @@ -2424,13 +2415,12 @@ } status = 0; - if (count) { - remove_suid(inode); - inode-i_ctime = inode-i_mtime = CURRENT_TIME; - mark_inode_dirty(inode); - } - while (count) { + remove_suid(inode); + inode-i_ctime = inode-i_mtime = CURRENT_TIME; + mark_inode_dirty(inode); + + do { unsigned long bytes, index, offset; char *kaddr; @@ -2480,7 +2470,7 @@ if (status 0) break; - } + } while (count); *ppos = pos; if (cached_page) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] 0-byte read()/write() behaviour
On Fri, Oct 20, 2000 at 10:47:45AM -0700, Linus Torvalds wrote: On Fri, 20 Oct 2000, Philipp Rumpf wrote: Single Unix specifies that 0-byte reads, as well as 0-byte writes, should "return 0 and have no other results". Our current implementation violates the first requirement and makes it very easy to violate the second one. Note that there _are_ cases where 0-byte reads and writes have specific meaning, notably there are some networking things where a 0-byte sendto() does something special if I remember correctly. And I seem to remember that this also _did_ translate into write(). sock_write and sock_read contain: if(size==0) /* Match SYS5 behaviour */ return 0; I'm not sure which other read()/write() functions are used by "networking things", but I don't see any others sendto would map to. So I suspect if there are any applications which would need the behaviour you described they've already been broken, and should be recompiled to use sys_send(|to|msg) (via sys_socketcall on x86, of course). I remember that Linux used to do exactly this, and we had to pass the 0-byte writes into the low-level cases exactly because some low-level cases do care. I would suspect most of those have been eliminated by now. I suspect SUS only talks about regular files. As I'm reading it, they're talking about every read() call, even those with an invalid fd. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: 2.4 MM overview?
On Sun, Oct 15, 2000 at 11:35:23PM +0100, Kenn Humborg wrote: > On Sun, Oct 15, 2000 at 09:45:11PM +0100, Alan Cox wrote: > > > Well, we ain't got these luxuries/complications in VAXland... Hell, > > > we don't even have two-level page tables :-( > > > > Really. Ugh. I always assumed Vax had at least two levels because mmap on > > 4.2 BSD used to panic on 128K+ blocks. I guess there was a different reason > > for that then > > We've kind of got 1.5-level page tables. There are actually 3 page tables. > The system page table maps memory starting at 0x8000. The P0 process > page table maps from 0x0 up and the P1 process page table maps from > 0x7fff down. And they have to be physically contiguous I guess ? > This means that sparse address spaces are going to be _really_ expensive > on PTEs. I don't know how much of a problem this is going to be yet, > but I'm sure it's going to be fun :-) 512 byte pages, 4 bytes per pte ? Ouch. Can you fill the TLB manually ? OTOH, I think mapping all physical memory makes sense with the three page table setup. Philipp Rumpf - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/