Re: [PATCH] powerpc: use is_init()
On Wed, Dec 20, 2006 at 03:06:51PM +1100, Paul Mackerras wrote:
> Akinobu Mita writes:
>
> > Use is_init() rather than hard coded pid comparison.
>
> What's the context of this patch? Why is this a good thing to do?
>

This is just a minor cleanup patch.

is_init() is available in 2.6.20-rc1 (include/linux/sched.h):

/**
 * is_init - check if a task structure is init
 * @tsk: Task structure to be checked.
 *
 * Check if a task structure is the first user space task the kernel created.
 */
static inline int is_init(struct task_struct *tsk)
{
	return tsk->pid == 1;
}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
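The effect of the cleanup can be tried out in user space. This is only a sketch: the struct below is a stand-in with the single field the helper needs (the real struct task_struct is far larger), and must_not_be_killed() is a hypothetical caller, not a function from the powerpc tree.

```c
/* Stand-in for the kernel's struct task_struct: only the pid field. */
struct task_struct {
	int pid;
};

/* Same body as the 2.6.20-rc1 helper quoted in the message. */
static inline int is_init(struct task_struct *tsk)
{
	return tsk->pid == 1;
}

/*
 * Hypothetical caller. Before the cleanup it would have been written
 * as "return tsk->pid == 1;"; with the helper the intent is named.
 */
static int must_not_be_killed(struct task_struct *tsk)
{
	return is_init(tsk);
}
```

Both spellings compile to the same comparison; the gain is that a later change to how init is identified only has to touch the helper.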
Re: Changes to PM layer break userspace
> Seriously. How many pieces of userspace-visible functionality have
> recently been removed without there being any sort of alternative?

There IS an alternative, you're using it for networking: You *down the
interface*.

If there's a NIC that doesn't support that let us (or preferably netdev)
know and it'll get fixed quickly I'm sure.

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
Re: problem with signal delivery SIGCHLD
On Mon, 2006-12-18 at 20:05 +0100, Nicholas Mc Guire wrote:
>
> Hi !
>
> I have a phenomenon that I don't quite understand. gdbserver forks and
> after setting ptrace (PTRACE_TRACEME, 0, 0, 0); it then execv
> (program, allargs); when this child process hits ptrace_stop (breakpoint)
> it does the following in kernel space:
>
> pid 1242 = child process
> pid 1241 = gdbserver
> pid 0= kernel
> pid -1 = interrupt
> pid
> 1559 51242 ptrace_stop
> 3 6 21242 | do_notify_parent_cldstop
> 4 3 21242 | | __group_send_sig_info
> 5 1 11242 | | | handle_stop_signal
> 7 0 01242 | | | sig_ignored
> 8 1 01242 | | __wake_up_sync
> 8 1 11242 | | | __wake_up_common
> 105475411242 | schedule
> 10 2 21242 | | profile_hit
> 13 1 11242 | | sched_clock
> 15 1 01242 | | deactivate_task
> 15 1 11242 | | | dequeue_task
> 19 2 2 0 | | __switch_to
> --- start --
> 24574574 0 default_idle
> --- end
> --- start --
> 780 41 12 0 do_IRQ
> 780 29 2 -1 / __do_IRQ
> ...
> 807 2 2 -1 / / / enable_8259A_irq
> --- end
> --- start --
> 810 11 0 0 do_softirq
> ...
> 820 0 0 -1 { { { preempt_schedule
> --- end
> --- start --
> 822358 1 0 preempt_schedule_irq
> ...
> 827 1 11241 % % __switch_to
> --- end
> 829 1 11241 ( ( ( del_timer
> --- end
> --- start --
> 837 8 21241 sys_waitpid
>
> So basically child signals -> delayed to next tick -> parent wakes up.

Hm. What does the trace of gdbserver look like prior to the child doing
do_notify_parent_cldstop()? Sleeping someplace other than wait4?

-Mike
[PATCH] kernel-doc: allow unnamed structs/unions
From: Randy Dunlap <[EMAIL PROTECTED]>

Make kernel-doc support unnamed (anonymous) structs and unions.
There is one (union) in include/linux/skbuff.h (inside struct sk_buff)
that is currently generating a kernel-doc warning, so this fixes that
warning.

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 scripts/kernel-doc | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

--- linux-2.6.20-rc1-git7.orig/scripts/kernel-doc
+++ linux-2.6.20-rc1-git7/scripts/kernel-doc
@@ -1469,6 +1469,7 @@ sub push_parameter($$$) {
 	my $param = shift;
 	my $type = shift;
 	my $file = shift;
+	my $anon = 0;
 
 	my $param_name = $param;
 	$param_name =~ s/\[.*//;
@@ -1484,9 +1485,20 @@ sub push_parameter($$$) {
 	    $param="void";
 	    $parameterdescs{void} = "no arguments";
 	}
+	elsif ($type eq "" && ($param eq "struct" or $param eq "union"))
+	# handle unnamed (anonymous) union or struct:
+	{
+		$type = $param;
+		$param = "{unnamed_" . $param. "}";
+		$parameterdescs{$param} = "anonymous\n";
+		$anon = 1;
+	}
+
 	# warn if parameter has no description
-	# (but ignore ones starting with # as these are no parameters
-	# but inline preprocessor statements
+	# (but ignore ones starting with # as these are not parameters
+	# but inline preprocessor statements);
+	# also ignore unnamed structs/unions;
+	if (!$anon) {
 	if (!defined $parameterdescs{$param_name} && $param_name !~ /^#/) {
 
 	    $parameterdescs{$param_name} = $undescribed;
@@ -1500,6 +1512,7 @@ sub push_parameter($$$) {
 	                  " No description found for parameter '$param'\n";
 	    ++$warnings;
 	}
+}
 
 	push @parameterlist, $param;
 	$parametertypes{$param} = $type;
---
Re: BUG: wedged processes, test program supplied
On Wed, 2006-12-20 at 01:05 -0500, Albert Cahalan wrote:
> On 12/20/06, Mike Galbraith <[EMAIL PROTECTED]> wrote:
> > On Tue, 2006-12-19 at 21:46 -0500, Albert Cahalan wrote:
> > > Somebody PLEASE try this...
> >
> > I was having enough fun with cloninator (which was whitespace munged
> > btw).
>
> Anything stuck? Besides refusing to die, that beast slays debuggers
> left and right. I just need to add execve of /proc/self/exe and a massive
> storm of signals on the alternate stack.

Usually, I can kill the misbehaving strace or abandoned cloninators if
it decides to take a hike, but sometimes it leaves corpses lying around.

> Oh. I wanted to be sure you'd see the problem. Did you have
> some... difficulty? A plain old ^C should make things stop.
> The second test program is like the first, but missing SIGCHLD
> from the clone flags, and hopefully not whitespace-mangled.
>
> Note that the test program is not normally a fork bomb.
> It self-limits itself to 42 tasks via a lock in shared memory.
> If things are working OK, you should see no more than
> about 60 tasks.

I didn't take any countermeasures.. had ~27000 zombies.

-Mike
Re: IO-APIC + timer doesn't work
On 12/19/06, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
> So the pin2 case should be tested right after the pin1 case as we do
> currently. On most new boards that will be a complete noop. But
> it is better than our current blind guess at using ExtINT mode.
>
> I figure after we try what the BIOS has told us about and that
> has failed we should first try the common irq 0 apic mappings,
> and then try the common ExtINT mappings.

Please check if this one is ok.

[PATCH] x86_64: check_timer with io apic setup before try_apic_pin

add io apic setup before try_apic_pin

cc: Andi Kleen <[EMAIL PROTECTED]>
cc: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Yinghai Lu <[EMAIL PROTECTED]>

diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index 2a1dcd5..6d09fc0 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -273,10 +273,17 @@ static void add_pin_to_irq(unsigned int irq, int apic, int pin)
 	struct irq_pin_list *entry = irq_2_pin + irq;
 
 	BUG_ON(irq >= NR_IRQS);
-	while (entry->next)
+	while (entry->next) {
+		if (entry->apic == apic && entry->pin == pin)
+			return;
+		if (entry->pin == -1)
+			break;
 		entry = irq_2_pin + entry->next;
+	}
 
 	if (entry->pin != -1) {
+		if (entry->apic == apic && entry->pin == pin)
+			return;
 		entry->next = first_free_entry;
 		entry = irq_2_pin + entry->next;
 		if (++first_free_entry >= PIN_MAP_SIZE)
@@ -286,6 +293,24 @@ static void add_pin_to_irq(unsigned int irq, int apic, int pin)
 	entry->pin = pin;
 }
 
+static void remove_pin_to_irq(unsigned int irq, int apic, int pin)
+{
+	struct irq_pin_list *entry = irq_2_pin + irq;
+
+	BUG_ON(irq >= NR_IRQS);
+
+	while (entry) {
+		if (entry->apic == apic && entry->pin == pin) {
+			entry->apic = -1;
+			entry->pin = -1;
+			break;
+		}
+		if (entry->next)
+			entry = irq_2_pin + entry->next;
+	}
+
+}
+
 #define DO_ACTION(name,R,ACTION, FINAL)					\
 									\
@@ -367,6 +392,34 @@ static int find_irq_entry(int apic, int pin, int type)
 	return -1;
 }
 
+static int add_irq_entry(int type, int irqflag, int bus, int irq, int apic, int pin)
+{
+	struct mpc_config_intsrc intsrc;
+	int idx;
+
+	intsrc.mpc_type = MP_INTSRC;
+	intsrc.mpc_irqflag = irqflag; /* conforming */
+	intsrc.mpc_srcbus = bus;
+	intsrc.mpc_dstapic = (apic != -1) ? mp_ioapics[apic].mpc_apicid: MP_APIC_ALL;
+
+	intsrc.mpc_irqtype = type;
+
+	intsrc.mpc_srcbusirq = irq;
+	intsrc.mpc_dstirq = pin;
+
+	mp_irqs [mp_irq_entries] = intsrc;
+	Dprintk("Int: type %d, pol %d, trig %d, bus %d,"
+		" IRQ %02x, APIC ID %x, APIC INT %02x\n",
+		intsrc.mpc_irqtype, intsrc.mpc_irqflag & 3,
+		(intsrc.mpc_irqflag >> 2) & 3, intsrc.mpc_srcbus,
+		intsrc.mpc_srcbusirq, intsrc.mpc_dstapic, intsrc.mpc_dstirq);
+	idx = mp_irq_entries;
+	if (++mp_irq_entries >= MAX_IRQ_SOURCES)
+		panic("Max # of irq sources exceeded!!\n");
+	return idx;
+
+}
+
 /*
  * Find the pin to which IRQ[irq] (ISA) is connected
  */
@@ -1570,6 +1658,22 @@ static inline void unlock_ExtINT_logic(void)
  * fanatically on his truly buggy board.
  */
 
+static void set_try_apic_pin(int apic, int pin, int type)
+{
+	int idx;
+	int irq = 0;
+	int bus = 0; /* MP_ISA_BUS */
+	int irqflag = 5; /* MP_IRQ_TRIGGER_EDGE|MP_IRQ_POLARITY_HIGH */
+
+	idx = find_irq_entry(apic,pin,type);
+
+	if (idx == -1)
+		idx = add_irq_entry(type, irqflag, bus, irq, apic, pin);
+
+	add_pin_to_irq(irq, apic, pin);
+	setup_IO_APIC_irq(apic, pin, idx, irq);
+}
+
 static int try_apic_pin(int apic, int pin, char *msg)
 {
 	apic_printk(APIC_VERBOSE, KERN_INFO
@@ -1588,7 +1692,7 @@ static int try_apic_pin(int apic, int pin, char *msg)
 		}
 		return 1;
 	}
-	clear_IO_APIC_pin(apic, pin);
+
 	apic_printk(APIC_QUIET, KERN_ERR " .. failed\n");
 	return 0;
 }
@@ -1599,12 +1703,13 @@ static void check_timer(void)
 	int apic1, pin1, apic2, pin2;
 	int vector;
 	cpumask_t mask;
+	int i;
 
 	/*
 	 * get/set the timer IRQ vector:
 	 */
-	disable_8259A_irq(0);
 	vector = assign_irq_vector(0, TARGET_CPUS, &mask);
+	disable_8259A_irq(0);
 
 	/*
 	 * Subtle, code in do_timer_interrupt() expects an AEOI
@@ -1621,33 +1726,51 @@ static void check_timer(void)
 	pin2 = ioapic_i8259.pin;
 	apic2 = ioapic_i8259.apic;
 
-	/* Do this first, otherwise we get double interrupts on ATI boards */
-	if ((pin1 != -1) && try_apic_pin(apic1, pin1,"with 8259 IRQ0 disabled"))
-		return;
+	apic_printk(APIC_VERBOSE,KERN_INFO "..TIMER: vector=0x%02X apic1=%d pin1=%d apic2=%d pin2=%d\n",
+		vector, apic1, pin1, apic2, pin2);
 
-	/* Now try again with IRQ0 8259A enabled.
-	   Assumes timer is on IO-APIC 0 ?!? */
-	enable_8259A_irq(0);
-	unmask_IO_APIC_irq(0);
-	if (try_apic_pin(apic1, pin1, "with 8259 IRQ0 enabled"))
-		return;
-	disable_8259A_irq(0);
+	if (pin1 != -1) {
+		/* Do this first, otherwise we get double interrupts on ATI boards */
+		/*
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Tue, 19 Dec 2006 17:29:00 -0800, Andrew Morton wrote:

> Quoting the bug report:

> general protection fault: 013b [1] PREEMPT

That '013b' is critical information.

Bit 0: 1: exception source is external to the processor
Bit 1: 1: there is a problem with an interrupt descriptor in the IDT
Bit 2: n/a
Bits 15-3: index of the problem descriptor

So an external interrupt occurred, the system tried to use interrupt
descriptor #39 decimal (irq 7), but the descriptor was invalid.

--
MBTI: IXTP
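The bit layout described in this message can be checked mechanically. The following is a small user-space sketch; the struct and function names are made up for illustration and are not kernel identifiers. Reading vector 39 as irq 7 additionally assumes that the legacy IRQs start at vector 0x20 (32), which is what the message's interpretation implies.

```c
#include <stdint.h>

/* Fields of an x86 exception error code that refers to a descriptor. */
struct selector_error_code {
	unsigned ext;   /* bit 0: source is external to the processor */
	unsigned idt;   /* bit 1: index refers to a descriptor in the IDT */
	unsigned ti;    /* bit 2: table indicator, not applicable if idt is set */
	unsigned index; /* bits 15-3: index of the problem descriptor */
};

/* Split a 16-bit error code into its fields. */
static struct selector_error_code decode_error_code(uint16_t code)
{
	struct selector_error_code ec;

	ec.ext = code & 1;
	ec.idt = (code >> 1) & 1;
	ec.ti = (code >> 2) & 1;
	ec.index = code >> 3;
	return ec;
}
```

For 0x013b this gives ext = 1, idt = 1 and index = 39, matching the reading above.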
Re: util-linux: orphan
Karel Zak writes:

> I've originally thought about util-linux upstream fork,
> but as usually an fork is bad step. So.. I'd like to start
> some discussion before this step.
> ...
> after few weeks I'm pleased to announce a new "util-linux-ng"
> project. This project is a fork of the original util-linux
> (2.13-pre7).

Aw damn, I missed it again. LKML gets about 300 posts/day.
The last time util-linux was offered, I missed out. Bummer.

Well, how about giving me a chunk of it? I'd like /bin/kill
please. I already ship a nicer one in procps anyway, so you
can just delete the files and call that done. (just today
I was working on a Fedora system and /bin/kill annoyed me)

VERY STRONG SUGGESTION: build a full test suite before you
mess with the source. This isn't some cute toy like xeyes
or a silly game. This is util-linux, which MUST work.
[PATCH] fdtable: Provide free_fdtable() wrapper.
Hi,

Christoph Hellwig has expressed concerns that the recent fdtable changes
expose the details of the RCU methodology used to release no-longer-used
fdtable structures to the rest of the kernel. The trivial patch below
addresses these concerns by introducing the appropriate free_fdtable()
calls, which simply wrap the release RCU usage. Since free_fdtable() is a
one-liner, it makes sense to promote it to an inline helper.

Please apply.

Signed-off-by: Vadim Lobanov <[EMAIL PROTECTED]>

diff -pru old/fs/file.c new/fs/file.c
--- old/fs/file.c	2006-12-19 19:54:23.0 -0800
+++ new/fs/file.c	2006-12-19 20:04:02.0 -0800
@@ -206,7 +206,7 @@ static int expand_fdtable(struct files_s
 		copy_fdtable(new_fdt, cur_fdt);
 		rcu_assign_pointer(files->fdt, new_fdt);
 		if (cur_fdt->max_fds > NR_OPEN_DEFAULT)
-			call_rcu(&cur_fdt->rcu, free_fdtable_rcu);
+			free_fdtable(cur_fdt);
 	} else {
 		/* Somebody else expanded, so undo our attempt */
 		free_fdarr(new_fdt);
diff -pru old/include/linux/file.h new/include/linux/file.h
--- old/include/linux/file.h	2006-12-19 19:54:25.0 -0800
+++ new/include/linux/file.h	2006-12-19 20:03:19.0 -0800
@@ -80,6 +80,11 @@ extern int expand_files(struct files_str
 extern void free_fdtable_rcu(struct rcu_head *rcu);
 extern void __init files_defer_init(void);
 
+static inline void free_fdtable(struct fdtable *fdt)
+{
+	call_rcu(&fdt->rcu, free_fdtable_rcu);
+}
+
 static inline struct file * fcheck_files(struct files_struct *files, unsigned int fd)
 {
 	struct file * file = NULL;
diff -pru old/kernel/exit.c new/kernel/exit.c
--- old/kernel/exit.c	2006-12-19 19:54:52.0 -0800
+++ new/kernel/exit.c	2006-12-19 20:04:20.0 -0800
@@ -466,7 +466,7 @@ void fastcall put_files_struct(struct fi
 		fdt = files_fdtable(files);
 		if (fdt != &files->fdtab)
 			kmem_cache_free(files_cachep, files);
-		call_rcu(&fdt->rcu, free_fdtable_rcu);
+		free_fdtable(fdt);
 	}
 }
 
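The shape of the wrapper can be exercised outside the kernel with mocked RCU pieces. Everything below other than the body of free_fdtable() is an assumption made for the sketch: the structs are cut down to what is needed, freed_count is a hypothetical counter, and this call_rcu() runs the callback immediately, whereas the real one defers it until a grace period has passed.

```c
/* Cut-down stand-ins for the kernel types. */
struct rcu_head {
	void (*func)(struct rcu_head *head);
};

struct fdtable {
	unsigned int max_fds;
	struct rcu_head rcu;
};

static int freed_count; /* hypothetical: lets a caller see the callback ran */

/* Mock: records the callback and invokes it right away (no grace period). */
static void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *head))
{
	head->func = func;
	func(head);
}

/* Mock release callback; the real one frees the fd arrays and the table. */
static void free_fdtable_rcu(struct rcu_head *rcu)
{
	(void)rcu;
	freed_count++;
}

/* The wrapper from the patch: callers no longer mention RCU at all. */
static inline void free_fdtable(struct fdtable *fdt)
{
	call_rcu(&fdt->rcu, free_fdtable_rcu);
}
```

With the wrapper in place, a later change to the release mechanism would only need to touch this one helper rather than every call site.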
downloading kernels w/ metalink (mirrors, checksums, signatures)
Hi,

This may not be as nice for kernels as for other downloads because of
how nicely organized the kernel mirrors are, but maybe some people will
be interested.

Metalink is a system which attempts to improve the download process by
increasing availability and guaranteeing integrity. It can give your
users a more reliable download by providing multiple links to the same
file, which can be switched to if one server is down or fails during
transmission. It can also make downloads faster by using multiple
resources at once.

Metalink lists mirrors with machine readable information on priority and
location so their efficient use can be automated by download programs.
It can list mirrors around the world, but will automatically default to
mirrors closer to you and by priority.

The checksum verification process, usually manual and arcane to most
people, is automated with Metalink, so files are guaranteed to be an
exact copy of the file you downloaded, free of errors.

Metalinks can also contain publisher information, Operating System and
architecture, language, file descriptions, multiple files (to be added
to a download queue), partial file checksums, and so on. All this extra
information allows download programs to do interesting things.

Linux Kernel Metalink downloads (All):
http://download.packages.ro/metalink/kernel/

More details..."Downloading bliss with Metalink":
http://www.linux.com/article.pl?sid=06/11/01/1641247

Partial example .metalink:

http://www.metalinker.org/; origin="http://prog.infosnel.nl/metalinks/kernel.php/kernel/v2.6/linux-2.6.19.tar.bz2.metalink; generator="http://prog.infosnel.nl/metalinks/kernel.php;>
Kernel.org
http://kernel.org/
2.6.19
443c265b57e87eadc0c677c3acc37e20
http://www.al.kernel.org/pub/linux/kernel/v2.6/linux-2.6.19.tar.bz2
ftp://ftp.al.kernel.org/pub/linux/kernel/v2.6/linux-2.6.19.tar.bz2
http://www.aq.kernel.org/pub/linux/kernel/v2.6/linux-2.6.19.tar.bz2
http://www.ag.kernel.org/pub/linux/kernel/v2.6/linux-2.6.19.tar.bz2
http://www.al.kernel.org/pub/linux/kernel/v2.6/linux-2.6.19.tar.bz2.sign
ftp://ftp.al.kernel.org/pub/linux/kernel/v2.6/linux-2.6.19.tar.bz2.sign
http://www.aq.kernel.org/pub/linux/kernel/v2.6/linux-2.6.19.tar.bz2.sign
http://www.ag.kernel.org/pub/linux/kernel/v2.6/linux-2.6.19.tar.bz2.sign

A real .metalink would list all mirrors.

Metalink is supported by download managers on Mac, Unix, and Windows.
aria2 ( http://aria2.sourceforge.net/ ) is a really nice command line
client. You can use command line options to default to mirrors in a
certain country (--metalink-location=XX) and other things.

The main users of metalink are OpenOffice.org, openSUSE, Arch Linux, and
other Linux distributions for ISO downloads.

(( Anthony Bryan ))
Metalink [ http://www.metalinker.org ]
Re: [-mm patch] make uio_irq_handler() static
On Sat, Dec 16, 2006 at 02:56:54PM +0100, Adrian Bunk wrote:
> On Thu, Dec 14, 2006 at 10:59:13PM -0800, Andrew Morton wrote:
> >...
> > Changes since 2.6.19-mm1:
> >...
> > +gregkh-driver-uio-irq.patch
> >
> > driver tree updates
> >...
>
> This patch makes the needlessly global uio_irq_handler() static.
>
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
>
> --- linux-2.6.20-rc1-mm1/drivers/uio/uio_irq.c.old	2006-12-15 22:23:23.0 +0100
> +++ linux-2.6.20-rc1-mm1/drivers/uio/uio_irq.c	2006-12-15 22:33:40.0 +0100
> @@ -22,7 +22,7 @@
>
>  static struct uio_device *uio_irq_idev;
>
> -irqreturn_t uio_irq_handler(int irq, void *dev_id)
> +static irqreturn_t uio_irq_handler(int irq, void *dev_id)
>  {
>  	return IRQ_HANDLED;
>  }

Thanks, I've applied this to my tree.

greg k-h
Re: BUG: wedged processes, test program supplied
On 12/20/06, Mike Galbraith <[EMAIL PROTECTED]> wrote:
> On Tue, 2006-12-19 at 21:46 -0500, Albert Cahalan wrote:
> > Somebody PLEASE try this...
>
> I was having enough fun with cloninator (which was whitespace munged
> btw).

Anything stuck? Besides refusing to die, that beast slays debuggers
left and right. I just need to add execve of /proc/self/exe and a massive
storm of signals on the alternate stack.

In the original post, I also mangled the recommended ps command:

ps -Ccloninator -mwostat,ppid,pid,tid,nlwp,pending,sigmask,sigignore,caught,wchan

Leave out pid,tid,nlwp if you need to save screen space, like so:

ps -Ccloninator -mwostat,ppid,pending,sigmask,sigignore,caught,wchan

(note: procps versions prior to 3.2.7 are mostly fine, but will
mess up the PENDING column for any single-threaded processes you get)

This is fun to look at:

watch ps -Ccloninator fostat,ppid,wchan:9,comm

> > Normally, when a process dies it becomes a zombie.
> > If the parent dies (before or after the child), the child
> > is adopted by init. Init will reap the child.
> >
> > The program included below DOES NOT get reaped.
>
> While true wasn't a great test recommendation :)

Oh. I wanted to be sure you'd see the problem. Did you have
some... difficulty? A plain old ^C should make things stop.
The second test program is like the first, but missing SIGCHLD
from the clone flags, and hopefully not whitespace-mangled.

Note that the test program is not normally a fork bomb.
It self-limits itself to 42 tasks via a lock in shared memory.
If things are working OK, you should see no more than
about 60 tasks.
Re: 2.6.19 file content corruption on ext3
On 12/20/06, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> On Tue, 19 Dec 2006, Linus Torvalds wrote:
> >
> > here's a totally new tangent on this: it's possible that user code is
> > simply BUGGY.
>
> Btw, here's a simpler test-program that actually shows the difference
> between 2.6.18 and 2.6.19 in action, and why it could explain why a
> program like rtorrent might show corruption behaviour that it didn't
> show before.

Kinda late to the discussion, but I guess I could summarize what
rtorrent actually does, or should be doing.

When downloading a new torrent, it will create the files and truncate
them to the final size. It will never call truncate after this and the
files will remain sparse until data is downloaded. A 'piece' is mapped
to memory using MAP_SHARED, which will be page aligned on single file
torrents but unlikely to be so on multi-file torrents.

So on multi-file torrents it'll often end up with two mappings
overlapping with one page, each of which only writes to its own part
of the page. These will then be sync'ed with MS_ASYNC, or MS_SYNC if
low on disk space. After that it might be unmapped, then mapped as
read-only.

I haven't thought of asking if single file torrents are ok.

Rakshasa
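The mapping scheme described in this message can be sketched as follows. This is not rtorrent's code: write_piece() is a hypothetical helper that only illustrates writing an unaligned byte range through a MAP_SHARED mapping and flushing it with MS_ASYNC. It assumes the file has already been extended to its final size (for example with ftruncate()), as the message says rtorrent does.

```c
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes at an arbitrary file offset through a MAP_SHARED mapping.
 * mmap() offsets must be page aligned, so the mapping starts at the page
 * containing 'offset' and the copy begins 'slack' bytes into it.
 * Returns 0 on success, -1 on failure.
 */
static int write_piece(int fd, off_t offset, const void *data, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	off_t map_start = offset - (offset % page);
	size_t slack = (size_t)(offset - map_start);
	size_t map_len = slack + len;
	char *map;

	map = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, map_start);
	if (map == MAP_FAILED)
		return -1;

	/* Touch only this piece's bytes; the rest of the page is left alone. */
	memcpy(map + slack, data, len);

	/* MS_ASYNC schedules writeback without waiting for it to finish. */
	if (msync(map, map_len, MS_ASYNC) != 0) {
		munmap(map, map_len);
		return -1;
	}
	return munmap(map, map_len);
}
```

Two calls whose ranges fall in the same page reproduce the multi-file case from the message: separate mappings share a page, and each writes only its own part of it.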
Re: Changes to PM layer break userspace
On Tue, Dec 19, 2006 at 09:34:17PM -0800, Greg KH wrote:

> I would be very interested to see any newer SuSE programs using that
> interface. Just point them out to me and I'll quickly fix them.

As far as I can tell, powersaved still uses these.. I'm not quite sure
how you can fix it without just removing the functionality from it...

> And yes, as a SuSE developer (and one of the people in charge of the
> SuSE kernels), I have no problem with these files just going away.
> Because, as David keeps repeating, they are broken and wrong.

In the common case, it works perfectly well for the management of
individual PCI devices. Yes it's "wrong", in much the same way as (say)
the IDE bus registration/unregistration code. But we keep that around
because despite it being even more broken than devices/.../power/state,
people are still actually using it and we haven't provided any sort of
alternative.

Seriously. How many pieces of userspace-visible functionality have
recently been removed without there being any sort of alternative?

--
Matthew Garrett | [EMAIL PROTECTED]
Re: [PATCH] procfs: export context switch counts in /proc/*/stat
David Wragg writes:
> Benjamin LaHaise <[EMAIL PROTECTED]> writes:
> > On Mon, Dec 18, 2006 at 11:50:08PM +, David Wragg wrote:
> > > This patch (against 2.6.19/2.6.19.1) adds the four context
> > > switch values (voluntary context switches, involuntary
> > > context switches, and the same values accumulated from
> > > terminated child processes) to the end of /proc/*/stat,
> > > similarly to min_flt, maj_flt and the time used values.

Hmmm, OK, do people have a use for these values?

> > Please put these into new files, as the stat files in /proc are
> > horribly overloaded and have always been somewhat problematic
> > when it comes to changing how things are reported due to internal
> > changes to the kernel. Cheers,

No thanks. Yours truly, the maintainer of "ps", "top", "vmstat", etc.

> The delay accounting value was added to the end of /proc/pid/stat
> back in July without discussion, so I assumed this approach was
> still considered satisfactory.

/proc/*/stat is the very best place in /proc for any per-process
data that will be commonly needed. Unlike /proc/*/status, few
people are tempted to screw with the formatting and/or spelling.
Unlike the /sys crap, it doesn't take 3 syscalls PER VALUE to
get at the data.

The things to ask are of course: will this really be used, and
does it really belong in /proc at all?

> Putting just these four values into a new file would seem a little
> odd, since they have a lot in common with the other getrusage values
> that are already in /proc/pid/stat. One possibility is to add
> /proc/pid/rusage, mirroring the full struct rusage in text form,
> since struct rusage is already part of the kernel ABI (though Linux
> doesn't fill in half of the values).

Since we already have a struct defined and all...

sys_get_rusage(int pid)

> Or perhaps it makes sense to reorganize all the values from
> /proc/pid/stat and its siblings into a sysfs-like one-value-per-file
> structure, though that might introduce atomicity and efficiency
> issues (calculating some of the values involves iterating over the
> threads in the process; with everything in one file, these loops are
> folded together).

Yeah, big time. Things are quite bad in /proc, but /sys is a joke.
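For comparison, the same four counters are already reachable for the calling process itself through the existing getrusage() call, on kernels that fill in ru_nvcsw and ru_nivcsw. The sketch below shows that route; struct ctxt_switches and read_ctxt_switches() are hypothetical names introduced here. It does not cover what the patch is about, namely reading the counters of an arbitrary pid, and the sys_get_rusage(int pid) floated above is a suggestion in this thread, not an existing call.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical container for one pair of counters. */
struct ctxt_switches {
	long voluntary;
	long involuntary;
};

/*
 * Fill *self with the calling process's counts and *children with the
 * counts accumulated from terminated, waited-for children.
 * Returns 0 on success, -1 on failure.
 */
static int read_ctxt_switches(struct ctxt_switches *self,
			      struct ctxt_switches *children)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return -1;
	self->voluntary = ru.ru_nvcsw;
	self->involuntary = ru.ru_nivcsw;

	if (getrusage(RUSAGE_CHILDREN, &ru) != 0)
		return -1;
	children->voluntary = ru.ru_nvcsw;
	children->involuntary = ru.ru_nivcsw;
	return 0;
}
```

A tool such as ps or top cannot use this route, because it needs the values for other processes; that is the gap the /proc/*/stat fields are meant to fill.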
Re: BUG: wedged processes, test program supplied
On Tue, 2006-12-19 at 21:46 -0500, Albert Cahalan wrote:
> Somebody PLEASE try this...

I was having enough fun with cloninator (which was whitespace munged
btw).

> Normally, when a process dies it becomes a zombie.
> If the parent dies (before or after the child), the child
> is adopted by init. Init will reap the child.
>
> The program included below DOES NOT get reaped.

While true wasn't a great test recommendation :)

-Mike
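The normal life cycle from the quoted text (child exits, becomes a zombie, parent reaps it) looks like this in its simplest form. This is a generic sketch with a hypothetical helper name, spawn_and_reap(); it is not the cloninator test program and uses plain fork() rather than clone().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits immediately with 'code', then reap it.
 * Between the child's exit and waitpid() returning, the child is a zombie.
 * Returns the child's exit status, or -1 on error.
 */
static int spawn_and_reap(int code)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(code); /* child: terminate without running atexit handlers */

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the parent never called waitpid() and instead exited, the child would be reparented to init, which is expected to do the reaping; the bug report is about tasks for which that never happens.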
Re: [2.6 patch] powerpc: remove the broken Gemini support
Roman Zippel writes:

> Well, there are still patches unmerged for over a year, they probably still
> apply mostly.

Please rebase and repost them, if you want them to go in.

Paul.
Re: Changes to PM layer break userspace
On Tue, Dec 19, 2006 at 09:14:49PM -0800, David Brownell wrote:
> On Tuesday 19 December 2006 8:26 pm, Matthew Garrett wrote:
> > On Tue, Dec 19, 2006 at 07:59:42PM -0800, David Brownell wrote:
> > It's perfectly reasonable to
> > refer to it as a flawed interface, or perhaps even a buggy one. But in
> > itself, it's clearly not a bug.
>
> This class of bug is also called a "design bug" or sometimes "mistake".

Exactly, those "power" files actually pre-date the actual tree of
devices itself. They were just holders for what the original developer
thought was going to be needed, but was never properly implemented due
to some job changes (note, this was not myself...)

> > > In contrast, the /sys/devices/.../power/state API has never had many
> > > users beyond developers trying to test their drivers (without taking
> > > the whole system into a low power state, which probably didn't work
> > > in any case), and has *always* been problematic. And the change you
> > > object to doesn't "break" anything fundamental, either. Everything
> > > still works.
> >
> > It's used on every Ubuntu and Suse system,
>
> Odd how the relevant Suse developers didn't mention any issues with
> those files going away, any of the times problems with them were
> discussed on the PM list. Also, I have a Suse system that doesn't
> use those files for anything ... maybe only newer release use it.

I would be very interested to see any newer SuSE programs using that
interface. Just point them out to me and I'll quickly fix them.

And yes, as a SuSE developer (and one of the people in charge of the
SuSE kernels), I have no problem with these files just going away.
Because, as David keeps repeating, they are broken and wrong.

thanks,

greg k-h
Re: Bug 7596 - Potential performance bottleneck for Linux TCP
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 19 Dec 2006 21:11:24 -0800

> It was the realtime/normal comments that piqued my interest.
> Perhaps we should either tweak process priority or remove
> the comments.

I mentioned that to Linus once and he said the entire
idea was bogus.

With the recent tcp_recvmsg() preemption issue thread,
I agree with his sentiments even more than I did previously.

What needs to happen is to liberate the locking so that
input packet processing can occur in parallel with
tcp_recvmsg(), instead of doing this bogus backlog thing
which can wedge TCP ACK processing for an entire quantum
if we take a kernel preemption while the process has the
socket lock held.
Re: Changes to PM layer break userspace
On Tuesday 19 December 2006 8:26 pm, Matthew Garrett wrote:
> On Tue, Dec 19, 2006 at 07:59:42PM -0800, David Brownell wrote:
> The existence of the power/state interface wasn't a bug - it was a
> deliberate decision to add it. It's the only reason the
> dpm_runtime_suspend() interface exists.

All that buggy infrastructure talks together, yes. Those dpm_*() calls
are in the same "will remove" task item.

> It's perfectly reasonable to
> refer to it as a flawed interface, or perhaps even a buggy one. But in
> itself, it's clearly not a bug.

This class of bug is also called a "design bug" or sometimes "mistake".

> > In contrast, the /sys/devices/.../power/state API has never had many
> > users beyond developers trying to test their drivers (without taking
> > the whole system into a low power state, which probably didn't work
> > in any case), and has *always* been problematic. And the change you
> > object to doesn't "break" anything fundamental, either. Everything
> > still works.
>
> It's used on every Ubuntu and Suse system,

Odd how the relevant Suse developers didn't mention any issues with
those files going away, any of the times problems with them were
discussed on the PM list. Also, I have a Suse system that doesn't
use those files for anything ... maybe only newer release use it.

I've got some Ubuntu going too, which hasn't (visibly) suffered from
any of these changes.

- dave
Re: Bug 7596 - Potential performance bottleneck for Linux TCP
On Tue, 19 Dec 2006 18:55:25 -0800 (PST) David Miller <[EMAIL PROTECTED]> wrote: > From: Herbert Xu <[EMAIL PROTECTED]> > Date: Wed, 20 Dec 2006 10:52:19 +1100 > > > Stephen Hemminger <[EMAIL PROTECTED]> wrote: > > > I noticed this bit of discussion in tcp_recvmsg. It implies that a better > > > queuing policy would be good. But it is confusing English (Alexey?) so > > > not sure where to start. > > > > Actually I think the comment says that the current code isn't the > > most elegant but is more efficient. > > It's just explaining the hierarchy of queues that need to > be purged, and in what order, for correctness. > > Alexey added that code when I mentioned to him, right after > we added the prequeue, that it was possible process the > normal backlog before the prequeue, which is illegal. > In fixing that bug, he added the comment we are discussing. It was the realtime/normal comments that piqued my interest. Perhaps we should either tweak process priority or remove the comments.
[PATCH 2/2] Update feature-removal-schedule.txt
Add pm_has_noirq_stage to feature-removal-schedule as part of the
/sys/devices/.../power/state removal. Also note that this functionality
won't be removed until alternative functionality is implemented, in order
to avoid having this argument again in July.

Signed-off-by: Matthew Garrett <[EMAIL PROTECTED]>

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 30f3c8c..8a91689 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -9,7 +9,8 @@ be removed from this file.
 What:	/sys/devices/.../power/state
 	dev->power.power_state
 	dpm_runtime_{suspend,resume)()
-When:	July 2007
+	bus->pm_has_noirq_stage()
+When:	Once alternative functionality has been implemented
 Why:	Broken design for runtime control over driver power states, confusing
 	driver-internal runtime power management with: mechanisms to support
 	system-wide sleep state transitions; event codes that distinguish

--
Matthew Garrett | [EMAIL PROTECTED]
[PATCH 1/2] Fix /sys/devices/.../power/state
Recent changes in the PM system made it impossible to perform runtime
suspend of any PCI or platform devices. This patch restores the
functionality for any devices that don't require any of their suspend or
resume code to be run with interrupts disabled.

Signed-off-by: Matthew Garrett <[EMAIL PROTECTED]>

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index f9c903b..6bf1218 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -597,6 +597,16 @@ static int platform_resume(struct device * dev)
 	return ret;
 }
 
+static int platform_pm_has_noirq_stage(struct device * dev)
+{
+	int ret = 0;
+	struct platform_driver *drv = to_platform_driver(dev->driver);
+
+	if (dev->driver && (drv->resume_early || drv->suspend_late))
+		ret = 1;
+	return ret;
+}
+
 struct bus_type platform_bus_type = {
 	.name = "platform",
 	.dev_attrs = platform_dev_attrs,
@@ -606,6 +616,7 @@ struct bus_type platform_bus_type = {
 	.suspend_late = platform_suspend_late,
 	.resume_early = platform_resume_early,
 	.resume = platform_resume,
+	.pm_has_noirq_stage = platform_pm_has_noirq_stage,
 };
 
 EXPORT_SYMBOL_GPL(platform_bus_type);
diff --git a/drivers/base/power/sysfs.c b/drivers/base/power/sysfs.c
index 2d47517..03d3f81 100644
--- a/drivers/base/power/sysfs.c
+++ b/drivers/base/power/sysfs.c
@@ -46,7 +46,8 @@ static ssize_t state_store(struct device * dev, struct device_attribute *attr, c
 	int error = -EINVAL;
 
 	/* disallow incomplete suspend sequences */
-	if (dev->bus && (dev->bus->suspend_late || dev->bus->resume_early))
+	if (dev->bus && dev->bus->pm_has_noirq_stage
+	    && dev->bus->pm_has_noirq_stage(dev))
 		return error;
 
 	state.event = PM_EVENT_SUSPEND;
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index e5ae3a0..c0e4e7a 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -351,6 +351,17 @@ static int pci_device_resume(struct device * dev)
 	return error;
 }
 
+static int pci_device_pm_has_noirq_stage(struct device * dev)
+{
+	int error = 0;
+	struct pci_dev * pci_dev = to_pci_dev(dev);
+	struct pci_driver * drv = pci_dev->driver;
+
+	if (drv && (drv->resume_early || drv->suspend_late))
+		error = 1;
+	return error;
+}
+
 static int pci_device_resume_early(struct device * dev)
 {
 	int error = 0;
@@ -569,6 +580,7 @@ struct bus_type pci_bus_type = {
 	.suspend_late = pci_device_suspend_late,
 	.resume_early = pci_device_resume_early,
 	.resume = pci_device_resume,
+	.pm_has_noirq_stage = pci_device_pm_has_noirq_stage,
 	.shutdown = pci_device_shutdown,
 	.dev_attrs = pci_dev_attrs,
 };
diff --git a/include/linux/device.h b/include/linux/device.h
index 49ab53c..1c663c4 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -59,6 +59,7 @@ struct bus_type {
 	int (*suspend)(struct device * dev, pm_message_t state);
 	int (*suspend_late)(struct device * dev, pm_message_t state);
 	int (*resume_early)(struct device * dev);
+	int (*pm_has_noirq_stage)(struct device * dev);
 	int (*resume)(struct device * dev);
 };

--
Matthew Garrett | [EMAIL PROTECTED]
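The effect of the patch above can be illustrated outside the kernel. The following is only a userspace sketch: the `model_*` types and functions are hypothetical stand-ins, not the kernel's `struct device`, `struct bus_type` or driver structures. It shows the change in policy: the sysfs write is refused only when the driver bound to this particular device has a late-suspend or early-resume hook, rather than whenever the bus type provides one.

```c
#include <assert.h>
#include <stddef.h>

struct model_device;

/* Simplified stand-in for a driver's PM callbacks (not the kernel struct). */
struct model_driver {
	int (*suspend_late)(struct model_device *dev);
	int (*resume_early)(struct model_device *dev);
};

/* Simplified stand-in for the bus-level hook added by the patch. */
struct model_bus {
	int (*pm_has_noirq_stage)(struct model_device *dev);
};

struct model_device {
	const struct model_bus *bus;
	const struct model_driver *driver;
};

/* Per-device answer: does the bound driver need code run with IRQs off? */
static int model_pm_has_noirq_stage(struct model_device *dev)
{
	const struct model_driver *drv = dev->driver;

	return drv && (drv->suspend_late || drv->resume_early);
}

/* Same shape as the check in state_store(): 1 = allowed, 0 = refused. */
static int model_runtime_suspend_allowed(struct model_device *dev)
{
	assert(dev != NULL);
	if (dev->bus && dev->bus->pm_has_noirq_stage &&
	    dev->bus->pm_has_noirq_stage(dev))
		return 0;
	return 1;
}

/* A do-nothing late-suspend hook, used only to mark a driver as "needs noirq". */
static int model_noop_late(struct model_device *dev)
{
	(void)dev;
	return 0;
}
```

In this model, a device whose driver has neither hook (or which has no driver bound) can still be suspended through the sysfs file, while a device whose driver sets `suspend_late` is refused, which is the behaviour the patch description asks for.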
Re: [PATCH] powerpc: use is_init()
Akinobu Mita writes: > Use is_init() rather than hard coded pid comparison. What's the context of this patch? Why is this a good thing to do? Doing a git grep -w is_init on Linus' current git tree reveals an is_init() in arch/parisc/kernel/module.c, which looks to be something different, but no generic definition of an is_init() function or macro. Paul.
Re: [patch 0/2] more patches for removable drive bay
Thanks for removing the new procfs code Kristen. applied. -Len On Saturday 16 December 2006 17:40, Kristen Carlson Accardi wrote: > Hi Len, > Here's a set of patches for changing the removable drive bay driver > (drivers/acpi/bay) from using the old proc interface to using a sysfs > interface instead. I made the bay driver a platform driver, and > so it's entries will now be located in /sys/devices/platform/bay.X. > There are still 2 entries - one for checking whether the bay is > present (present) that is read only, and one that is write only for > ejecting the bay (eject). Let me know if you would prefer me to fold > these into the original bay driver patch. > > Thanks, > Kristen
RE: GPL only modules [was Re: [GIT PATCH] more Driver core patches for 2.6.19]
On Sun, 2006-12-17 at 11:11 +0100, Geert Uytterhoeven wrote: > On Thu, 14 Dec 2006, David Schwartz wrote: > > That makes it clear that it's not about giving us the fruits of years of > > your own work but that it's about enabling us to do our own work. (I would > > have no objection to also requiring them to provide a minimal open-source > > driver. I'm not trying to work out the exact terms here, just get the idea > > out.) > > Since `works with' may sound a bit too vague, something like > `LinuxFriendly(tm)', with a happy penguin logo? > I've bought a couple of products lately that had the happy penguin logo on it. Just to find out that they only applied a bare minimum functionality of the device for Linux. If you want more, you need to plug it into a Windows box. Funny, if you own a Mac, it had the same problem. It had a little more functionality than the Linux port, but still far from what they give for Windows. I like the Open Hardware thing that Paolo mentioned. -- Steve - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Changes to PM layer break userspace
On Tue, Dec 19, 2006 at 07:59:42PM -0800, David Brownell wrote: > On Tuesday 19 December 2006 4:25 pm, Matthew Garrett wrote: > > 1) feature-removal-schedule.txt says that it'll be removed in July 2007. > > This isn't July 2007. > > Which is why the functionality is still there. Merely broken in the majority of cases... > > 2) The functionality was disabled in 2.6.19. The addition to > > feature-removal-schedule.txt was in, uh, 2.6.19. > > Please respond to the technical explanation I provided, and stop > referring to the functionality ** which is still there and works ** > as being disabled. The breakage is that devices that are happy to suspend with enabled interrupts can no longer be suspended from userspace. Refusing to suspend a single device on the basis that some other driver on the bus may, potentially, at some point require some suspend code to be run with disabled interrupts is not a sensible choice. Especially since I can't actually find a single driver in the kernel tree that currently uses this functionality. > I can't help it if that schedule.txt patch took until 2.6.19 to get > upstream; ISTR it was available before 2.6.18 shipped. Maybe patches > to that file should be accelerated, even into the stable series. That would still not have provided anywhere near enough warning. > One of the missing steps in Linus' formulation there is that not all > interfaces are equivalent in terms of support guarantee. Bugs are > interfaces, for example, and sometimes folk wrongly depend on them > when they persist for a long time (like, cough, this one). The existence of the power/state interface wasn't a bug - it was a deliberate decision to add it. It's the only reason the dpm_runtime_suspend() interface exists. It's perfectly reasonable to refer to it as a flawed interface, or perhaps even a buggy one. But in itself, it's clearly not a bug. And it's perfectly reasonable for userland to depend on interfaces that are deliberately exposed by the kernel. 
> In contrast, the /sys/devices/.../power/state API has never had many > users beyond developers trying to test their drivers (without taking > the whole system into a low power state, which probably didn't work > in any case), and has *always* been problematic. And the change you > object to doesn't "break" anything fundamental, either. Everything > still works. It's used on every Ubuntu and Suse system, and the change means that certain functionality no longer works - it's now impossible to prevent my wireless hardware from drawing power when I'm not using it, for example. If the WE power operations were deliberately disabled, then that would also be a bug. -- Matthew Garrett | [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/4] Add driver for OHCI firewire host controllers.
Kristian Høgsberg wrote: Signed-off-by: Kristian Hoegsberg <[EMAIL PROTECTED]> --- drivers/firewire/Kconfig | 11 drivers/firewire/Makefile |1 drivers/firewire/fw-ohci.c | 1394 drivers/firewire/fw-ohci.h | 152 + 4 files changed, 1558 insertions(+), 0 deletions(-) .. +static struct pci_driver fw_ohci_pci_driver = { + .name = ohci_driver_name, + .id_table = pci_table, + .probe = pci_probe, + .remove = pci_remove, +}; How about suspend/resume support? Lots of laptops have OHCI 1394 and full suspend/resume support is something that the current ohci1394 driver lacks. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Changes to PM layer break userspace
On Tuesday 19 December 2006 7:43 pm, Matthew Garrett wrote: > > Do you have an alternate solution? > > How about something like this? Entirely untested, but I think it shows > the basic idea. Other than indentation/whitespace bugs, it seems to encapsulate the layering violation needed to get those deprecated files working again for PCI (and platform_bus). I'd rename the new bus method though; maybe "pm_has_noirq_stage()" or somesuch. Your name is so generic that it'd be a surprise if the answer were ever "no"! You should also list this new call in the feature-removal.txt entry for stuff that gets removed with /sys/devices/.../power/state files, since it's another mechanism that only exists to prop up that broken API, and should vanish at the same time that API does. - Dave
Re: Changes to PM layer break userspace
On Tuesday 19 December 2006 4:25 pm, Matthew Garrett wrote: > On Tue, Dec 19, 2006 at 01:34:49PM -0800, David Brownell wrote: > > > Documentation/feature-removal-schedule.txt has warned about this since > > August, and the PM list has discussed how broken that model is numerous > > times over the past several years. (I'm pretty sure that discussion has > > leaked out to LKML on occasion.) It shouldn't be news today. > > 1) feature-removal-schedule.txt says that it'll be removed in July 2007. > This isn't July 2007. Which is why the functionality is still there. > 2) The functionality was disabled in 2.6.19. The addition to > feature-removal-schedule.txt was in, uh, 2.6.19. Please respond to the technical explanation I provided, and stop referring to the functionality ** which is still there and works ** as being disabled. The fact that PCI exposes a mechanism that conflicts with that is a separate issue. Whining does not help. I can't help it if that schedule.txt patch took until 2.6.19 to get upstream; ISTR it was available before 2.6.18 shipped. Maybe patches to that file should be accelerated, even into the stable series. > 3) "The whole _point_ of a kernel is to act as a abstraction layer and > resource management between user programs and hardware/outside world. > That's why kernels _exist_. Breaking user-land API's is thus by > definition something totally idiotic. > > If you need to break something, you create a new interface, and try to > translate between the two, and maybe you deprecate the old one so that > it can be removed once it's not in use any more. If you can't see that > this is how a kernel should work, you're missing the point of having a > kernel in the first place." > > Linus, http://lkml.org/lkml/2006/10/4/327 So I'm amused that the problem you refer to is the direct consequence of Linus' patch to add the suspend_late()/resume_early() mechanism into the PCI driver framework. 
(Again, see the technical explanation; and please try to have a technical discussion, not a flamefest.) One of the missing steps in Linus' formulation there is that not all interfaces are equivalent in terms of support guarantee. Bugs are interfaces, for example, and sometimes folk wrongly depend on them when they persist for a long time (like, cough, this one). His comment was specifically about breaking a widely used API that many people have been relying on since, oh, about 1996, and had been well proven in that time. And the change was a "system doesn't work" level change. In contrast, the /sys/devices/.../power/state API has never had many users beyond developers trying to test their drivers (without taking the whole system into a low power state, which probably didn't work in any case), and has *always* been problematic. And the change you object to doesn't "break" anything fundamental, either. Everything still works. In terms of any reasonable expectations about support, those two changes aren't comparable. - Dave - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Add pci class code for SATA
On Wed, 20 Dec 2006 11:52:44 +0800 Conke Hu wrote: > On 12/20/06, Conke Hu <[EMAIL PROTECTED]> wrote: > > On 12/20/06, Conke Hu <[EMAIL PROTECTED]> wrote: > > > On 12/20/06, Jeff Garzik <[EMAIL PROTECTED]> wrote: > > > > Conke Hu wrote: > > > > > Add pci class code 0x0106 for SATA to pci_ids.h > > > > > > > > > > signed-off-by: [EMAIL PROTECTED] > > > > > > > > > > --- linux-2.6.20-rc1/include/linux/pci_ids.h.orig 2006-12-20 > > > > > 01:58:30.0 +0800 > > > > > +++ linux-2.6.20-rc1/include/linux/pci_ids.h 2006-12-20 > > > > > 01:59:07.0 +0800 > > > > > @@ -15,6 +15,7 @@ > > > > > #define PCI_CLASS_STORAGE_FLOPPY 0x0102 > > > > > #define PCI_CLASS_STORAGE_IPI0x0103 > > > > > #define PCI_CLASS_STORAGE_RAID 0x0104 > > > > > +#define PCI_CLASS_STORAGE_SATA 0x0106 > > > > > #define PCI_CLASS_STORAGE_SAS0x0107 > > > > > #define PCI_CLASS_STORAGE_OTHER 0x0180 > > > > > > > > Two comments: > > > > > > > > 1) I think "_SATA" is an inaccurate description. It should be _AHCI > > > > AFAICS. > > > > > > > > 2) Typically we don't add constants unless they are used somewhere... > > > > > > > > Jeff > > > > > > > > > > Hi Jeff, > > > According to PCI spec 3.0, 0x0106 means SATA controller, 0x010601 > > > means AHCI and 0x010600 means vendor specific SATA controller. 
Pls see
> > > the following table (PCI spec 3.0 P296):
> > >
> > > Base Class  Sub-Class  Interface  Meaning
> > > 01h         00h        00h        SCSI bus controller
> > > 01h         01h        xxh        IDE controller
> > > 01h         02h        00h        Floppy disk controller
> > > 01h         03h        00h        IPI bus controller
> > > 01h         04h        00h        RAID controller
> > > 01h         05h        20h        ATA controller with ADMA interface
> > > 01h         05h        30h        ATA controller with ADMA interface
> > > 01h         06h        00h        Serial ATA controller–vendor specific interface
> > > 01h         06h        01h        Serial ATA controller–AHCI 1.0 interface
> > > 01h         07h        00h        Serial Attached SCSI (SAS) controller
> > > 01h         80h        00h        Other mass storage controller
> > >
> > > So, I think, the following macro is correct:
> > > #define PCI_CLASS_STORAGE_SATA 0x0106
> > > If you would define AHCI class code, it should be 0x010601, not 0x0106:
> > > #define PCI_CLASS_STORAGE_SATA_AHCI 0x010601
> > >
> > > And, I think that PCI_CLASS_STORAGE_SATA had better be added to
> > > pci_ids.h since the class code 0x0106 is used more than once. e.g.
> > > ahci.c uses the magic number 0x0106 twice, and it might be used more
> > > in future.
> > > > > > Best regards, > > > Conke > > > > > > > > > Here is a patch to show more details: > > --- > > diff -Nur linux-2.6.20-rc1.orig/drivers/ata/ahci.c > > linux-2.6.20-rc1/drivers/ata/ahci.c > > --- linux-2.6.20-rc1.orig/drivers/ata/ahci.c 2006-12-20 > > 10:25:00.0 +0800 > > +++ linux-2.6.20-rc1/drivers/ata/ahci.c 2006-12-20 10:13:24.0 +0800 > > @@ -418,7 +418,7 @@ > > > > /* Generic, PCI class code for AHCI */ > > { PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, > > - 0x010601, 0xff, board_ahci }, > > + PCI_CLASS_STORAGE_SATA<<8|1, 0xff, board_ahci }, > > > > { } /* terminate list */ > > }; > > @@ -1586,11 +1586,11 @@ > > speed_s = "?"; > > > > pci_read_config_word(pdev, 0x0a, &cc); > > - if (cc == 0x0101) > > + if (cc == PCI_CLASS_STORAGE_IDE) > > scc_s = "IDE"; > > - else if (cc == 0x0106) > > + else
Re: [PATCH] Add pci class code for SATA
On 12/20/06, Conke Hu <[EMAIL PROTECTED]> wrote: On 12/20/06, Conke Hu <[EMAIL PROTECTED]> wrote: > On 12/20/06, Jeff Garzik <[EMAIL PROTECTED]> wrote: > > Conke Hu wrote: > > > Add pci class code 0x0106 for SATA to pci_ids.h > > > > > > signed-off-by: [EMAIL PROTECTED] > > > > > > --- linux-2.6.20-rc1/include/linux/pci_ids.h.orig 2006-12-20 > > > 01:58:30.0 +0800 > > > +++ linux-2.6.20-rc1/include/linux/pci_ids.h 2006-12-20 > > > 01:59:07.0 +0800 > > > @@ -15,6 +15,7 @@ > > > #define PCI_CLASS_STORAGE_FLOPPY 0x0102 > > > #define PCI_CLASS_STORAGE_IPI0x0103 > > > #define PCI_CLASS_STORAGE_RAID 0x0104 > > > +#define PCI_CLASS_STORAGE_SATA 0x0106 > > > #define PCI_CLASS_STORAGE_SAS0x0107 > > > #define PCI_CLASS_STORAGE_OTHER 0x0180 > > > > Two comments: > > > > 1) I think "_SATA" is an inaccurate description. It should be _AHCI AFAICS. > > > > 2) Typically we don't add constants unless they are used somewhere... > > > > Jeff > > > > Hi Jeff, > According to PCI spec 3.0, 0x0106 means SATA controller, 0x010601 > means AHCI and 0x010600 means vendor specific SATA controller. 
Pls see
> the following table (PCI spec 3.0 P296):
>
> Base Class  Sub-Class  Interface  Meaning
> 01h         00h        00h        SCSI bus controller
> 01h         01h        xxh        IDE controller
> 01h         02h        00h        Floppy disk controller
> 01h         03h        00h        IPI bus controller
> 01h         04h        00h        RAID controller
> 01h         05h        20h        ATA controller with ADMA interface
> 01h         05h        30h        ATA controller with ADMA interface
> 01h         06h        00h        Serial ATA controller–vendor specific interface
> 01h         06h        01h        Serial ATA controller–AHCI 1.0 interface
> 01h         07h        00h        Serial Attached SCSI (SAS) controller
> 01h         80h        00h        Other mass storage controller
>
> So, I think, the following macro is correct:
> #define PCI_CLASS_STORAGE_SATA 0x0106
> If you would define AHCI class code, it should be 0x010601, not 0x0106:
> #define PCI_CLASS_STORAGE_SATA_AHCI 0x010601
>
> And, I think that PCI_CLASS_STORAGE_SATA had better be added to
> pci_ids.h since the class code 0x0106 is used more than once. e.g.
> ahci.c uses the magic number 0x0106 twice, and it might be used more
> in future.
> > Best regards, > Conke > Here is a patch to show more details: --- diff -Nur linux-2.6.20-rc1.orig/drivers/ata/ahci.c linux-2.6.20-rc1/drivers/ata/ahci.c --- linux-2.6.20-rc1.orig/drivers/ata/ahci.c 2006-12-20 10:25:00.0 +0800 +++ linux-2.6.20-rc1/drivers/ata/ahci.c 2006-12-20 10:13:24.0 +0800 @@ -418,7 +418,7 @@ /* Generic, PCI class code for AHCI */ { PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, - 0x010601, 0xff, board_ahci }, + PCI_CLASS_STORAGE_SATA<<8|1, 0xff, board_ahci }, { } /* terminate list */ }; @@ -1586,11 +1586,11 @@ speed_s = "?"; pci_read_config_word(pdev, 0x0a, &cc); - if (cc == 0x0101) + if (cc == PCI_CLASS_STORAGE_IDE) scc_s = "IDE"; - else if (cc == 0x0106) + else if (cc == PCI_CLASS_STORAGE_SATA) scc_s = "SATA"; - else if (cc == 0x0104) + else if (cc == PCI_CLASS_STORAGE_RAID) scc_s = "RAID"; else scc_s = "unknown"; diff -Nur linux-2.6.20-rc1.orig/include/linux/pci_ids.h linux-2.6.20-rc1/include/linux/pci_ids.h --- linux-2.6.20-rc1.orig/include/linux/pci_ids.h 2006-12-20 10:24:51.0 +0800 +++ linux-2.6.20-rc1/include/linux/pci_ids.h 2006-12-20 10:08:15.0 +0800 @@ -15,6 +15,7 @@
Re: Changes to sysfs PM layer break userspace
On Tue, 19 Dec 2006 18:35:39 -0800 Randy Dunlap <[EMAIL PROTECTED]> wrote:

> On Tue, 19 Dec 2006 18:15:24 -0800 Andrew Morton wrote:
>
> > On Tue, 19 Dec 2006 13:34:49 -0800
> > David Brownell <[EMAIL PROTECTED]> wrote:
> >
> > > Documentation/feature-removal-schedule.txt has warned about this since
> > > August
> >
> > Nobody reads that.
>
> Ugh, I read it.
>
> > Please, wherever possible, put a nice printk("this is going away") in the
> > code when planning these things.
>
> Can notices go in both places, or is in the source code (printk)
> now the preferred way?

I think printks grab a lot more attention. It's not surprising that
people get surprised when the feature they're using goes away. Plus they
may not even know that they're using the feature. A printk fixes that.

> I think that we can point people to Doc/feature-removal-schedule.txt
> easier (and more effectively) than we can source code (or noisy kernel
> logs).

Hopefully developers who see the printk will think to look in
feature-removal-schedule.txt for more details.
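The "warn in the code, explain in the document" idea from this message can be sketched in plain C. This is a userspace illustration only: `struct deprecation_notice` and `deprecation_warn_once()` are hypothetical names, and `fprintf()` stands in for the kernel's `printk()`. The notice is emitted the first time the deprecated path is used and points the reader at the removal schedule for details, so the log does not become noisy.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bookkeeping for one deprecated interface. */
struct deprecation_notice {
	const char *feature;	/* what is going away */
	const char *when;	/* planned removal, as listed in the schedule */
	int warned;		/* nonzero once the message has been logged */
};

/*
 * Log the notice at most once per interface.
 * Returns 1 if a message was written, 0 if it had already been shown.
 */
static int deprecation_warn_once(struct deprecation_notice *n, FILE *log)
{
	assert(n != NULL && log != NULL);
	if (n->warned)
		return 0;
	n->warned = 1;
	fprintf(log,
		"%s is deprecated (removal: %s); see "
		"Documentation/feature-removal-schedule.txt\n",
		n->feature, n->when);
	return 1;
}
```

A caller would invoke `deprecation_warn_once()` at the top of the deprecated code path, so users who never read the schedule still see the notice the first time they exercise the feature.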
netif_poll_enable() & barrier
Hi !

I stumbled across what might be a bug on out-of-order architectures:
netif_poll_enable() only does a clear_bit(). However,
netif_poll_disable/enable pairs are often used as pseudo-spinlocks
(netif_poll_disable() has pretty much spin_lock semantics, except that it
schedules instead of looping). Thus, shouldn't netif_poll_enable() do an
smp_wmb() before clearing the bit, to make sure that any stores done within
the poll-disabled section are properly visible to the rest of the system
before the bit is cleared?

Cheers,
Ben.
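The ordering concern raised here can be modelled in userspace with C11 atomics. This is a sketch under assumptions, not kernel code: `struct fake_netdev` and the `poll_*` helpers are hypothetical, and the kernel would use its own bit operations and barrier primitives instead. The point it illustrates is that the operation which clears the flag plays the role of an unlock, so it needs release ordering, while the operation which takes the flag needs acquire ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a net device with an "rx scheduled" flag. */
struct fake_netdev {
	atomic_bool rx_sched;	/* models the bit cleared by netif_poll_enable() */
	int ring_head;		/* state modified while polling is disabled */
};

static void fake_netdev_init(struct fake_netdev *dev)
{
	assert(dev != NULL);
	atomic_init(&dev->rx_sched, false);
	dev->ring_head = 0;
}

/*
 * One attempt to take the flag (the real disable path would retry and
 * sleep between attempts). Acquire ordering: accesses inside the
 * poll-disabled section cannot be reordered before the flag is taken.
 */
static bool poll_try_disable(struct fake_netdev *dev)
{
	return !atomic_exchange_explicit(&dev->rx_sched, true,
					 memory_order_acquire);
}

/*
 * Clear the flag with release ordering: every store made inside the
 * poll-disabled section is visible to whoever next observes the flag
 * as clear. A plain, unordered clear would not guarantee this.
 */
static void poll_enable(struct fake_netdev *dev)
{
	atomic_store_explicit(&dev->rx_sched, false, memory_order_release);
}
```

With a relaxed store in `poll_enable()`, another CPU could take the flag and still read a stale `ring_head`; the release/acquire pair is what rules that out in this model.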
Re: Changes to PM layer break userspace
On Tue, Dec 19, 2006 at 07:19:36PM -0800, David Brownell wrote: > On Tuesday 19 December 2006 4:09 pm, Matthew Garrett wrote: > > I'm sorry, which bit of "Don't break userspace API without adequate > > prior warning and with a workable replacement" is difficult to > > understand? > > What part of "it was already broken" do YOU not understand? The > whole notion is unsustainable. It doesn't work cross-platform, or > for multiple bus types. It confuses system-wide suspend mechanisms > with runtime mechanisms. It breaks guaranteed parent/child ordering > of suspend/resume calls. (And more...) Linux is utterly riddled with broken APIs. It's possible to see that as a downside of the "Release early, release often" model, but the advantage is that we get the opportunity to determine how these interfaces are broken. Based on that, we can either improve the existing interface or decide that it's broken beyond repair and design a new one. What we don't do is decide that an interface is broken, deprecate it and in the same release break it even for the cases where it previously worked. That's just insane. > Let us know when you get tired of whining and want to move on to > getting a real solution to the set of problems here. I've pointed > out that reverting Linus' patch would be one option to get your > short term issue rsolved ... that would remove a capability from > PCI drivers, but you could then use that deprecated mechanism. > I've also pointed out that you could start working towards a real > long term solution. I could, and in the long run I intend to. On the other hand, I don't expect to have enough time to fix every single in-tree network driver before 2.6.20, so... > Do you have an alternate solution? How about something like this? Entirely untested, but I think it shows the basic idea. 
diff --git a/drivers/base/platform.c b/drivers/base/platform.c index f9c903b..4865918 100644 --- a/drivers/base/platform.c +++ b/drivers/base/platform.c @@ -597,6 +597,17 @@ static int platform_resume(struct device * dev) return ret; } +static int platform_requires_disabled_interrupts(struct device * dev) +{ + int ret = 0; + + if (dev->driver && (dev->driver->resume_early + || dev->driver->suspend_late)) + ret = 1; + + return ret; +} + struct bus_type platform_bus_type = { .name = "platform", .dev_attrs = platform_dev_attrs, @@ -604,8 +615,9 @@ struct bus_type platform_bus_type = { .uevent = platform_uevent, .suspend= platform_suspend, .suspend_late = platform_suspend_late, - .resume_early = platform_resume_early, + .resume_early = platform_resume_early, .resume = platform_resume, + .requires_disabled_interrupts = platform_requires_disabled_interrupts, }; EXPORT_SYMBOL_GPL(platform_bus_type); diff --git a/drivers/base/power/sysfs.c b/drivers/base/power/sysfs.c index 2d47517..97c6d65 100644 --- a/drivers/base/power/sysfs.c +++ b/drivers/base/power/sysfs.c @@ -46,7 +46,8 @@ static ssize_t state_store(struct device * dev, struct device_attribute *attr, c int error = -EINVAL; /* disallow incomplete suspend sequences */ - if (dev->bus && (dev->bus->suspend_late || dev->bus->resume_early)) + if (dev->bus && dev->bus->requires_disabled_interrupts + && dev->bus->requries_disabled_interrupts()) return error; state.event = PM_EVENT_SUSPEND; diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index e5ae3a0..9808d42 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -351,6 +351,18 @@ static int pci_device_resume(struct device * dev) return error; } +static int pci_device_requires_disabled_interrupts(struct device * dev) +{ + int error = 0; + struct pci_dev * pci_dev = to_pci_dev(dev); + struct pci_driver * drv = pci_dev->driver; + + if (drv && (drv->resume_early || drv_suspend_late)) + error = 1; + + return error; +} + static int 
pci_device_resume_early(struct device * dev) { int error = 0; @@ -569,6 +581,7 @@ struct bus_type pci_bus_type = { .suspend_late = pci_device_suspend_late, .resume_early = pci_device_resume_early, .resume = pci_device_resume, + .requires_disabled_interrupts = pci_requires_disabled_interrupts, .shutdown = pci_device_shutdown, .dev_attrs = pci_dev_attrs, }; diff --git a/include/linux/device.h b/include/linux/device.h index 49ab53c..0686234 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -59,6 +59,7 @@ struct bus_type { int (*suspend)(struct device * dev, pm_message_t state); int (*suspend_late)(struct device * dev, pm_message_t state); int (*resume_early)(struct device * dev); + int
Re: [PATCH] Add pci class code for SATA
On 12/20/06, Conke Hu <[EMAIL PROTECTED]> wrote: On 12/20/06, Jeff Garzik <[EMAIL PROTECTED]> wrote: > Conke Hu wrote: > > Add pci class code 0x0106 for SATA to pci_ids.h > > > > signed-off-by: [EMAIL PROTECTED] > > > > --- linux-2.6.20-rc1/include/linux/pci_ids.h.orig 2006-12-20 > > 01:58:30.0 +0800 > > +++ linux-2.6.20-rc1/include/linux/pci_ids.h 2006-12-20 > > 01:59:07.0 +0800 > > @@ -15,6 +15,7 @@ > > #define PCI_CLASS_STORAGE_FLOPPY 0x0102 > > #define PCI_CLASS_STORAGE_IPI0x0103 > > #define PCI_CLASS_STORAGE_RAID 0x0104 > > +#define PCI_CLASS_STORAGE_SATA 0x0106 > > #define PCI_CLASS_STORAGE_SAS0x0107 > > #define PCI_CLASS_STORAGE_OTHER 0x0180 > > Two comments: > > 1) I think "_SATA" is an inaccurate description. It should be _AHCI AFAICS. > > 2) Typically we don't add constants unless they are used somewhere... > > Jeff > Hi Jeff, According to PCI spec 3.0, 0x0106 means SATA controller, 0x010601 means AHCI and 0x010600 means vendor specific SATA controller. Pls see the following table (PCI spec 3.0 P296): Base Class Sub-Class Interface Meaning 00h 00h SCSI bus controller 01h xxh IDE controller --- 02h 00h Floppy disk controller - 03h 00h IPI bus controller -- 04h 00h RAID controller 01h 20h ATA controller with ADMA interface 05h --- 30h ATA controller with ADMA interface --- 00h Serial ATA controller–vendor specific interface 06h - 01h Serial ATA controller–AHCI 1.0 interface - 07h 00h Serial Attached SCSI (SAS) controller - 80h 00h Other mass storage controller -- So, I think, the following macro is correct: #define PCI_CLASS_STORAGE_SATA 0x0106 If you would define AHCI class code, it should be 0x010601, not 0x0106: #define PCI_CLASS_STORAGE_SATA_AHCI 0x010601 And, I think that PCI_CLASS_STORAGE_SATA had better be added to pci_ids.h since the class code 0x0106 is used more than once. e.g. ahci.c uses the magic number 0x0106 twice, and it might be used more in future. 
Best regards,
Conke

Here is a patch to show more details:
---
diff -Nur linux-2.6.20-rc1.orig/drivers/ata/ahci.c linux-2.6.20-rc1/drivers/ata/ahci.c
--- linux-2.6.20-rc1.orig/drivers/ata/ahci.c 2006-12-20 10:25:00.0 +0800
+++ linux-2.6.20-rc1/drivers/ata/ahci.c 2006-12-20 10:13:24.0 +0800
@@ -418,7 +418,7 @@
 	/* Generic, PCI class code for AHCI */
 	{ PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
-	  0x010601, 0xffffff, board_ahci },
+	  PCI_CLASS_STORAGE_SATA<<8|1, 0xffffff, board_ahci },

 	{ }	/* terminate list */
 };
@@ -1586,11 +1586,11 @@
 		speed_s = "?";

 	pci_read_config_word(pdev, 0x0a, &cc);
-	if (cc == 0x0101)
+	if (cc == PCI_CLASS_STORAGE_IDE)
 		scc_s = "IDE";
-	else if (cc == 0x0106)
+	else if (cc == PCI_CLASS_STORAGE_SATA)
 		scc_s = "SATA";
-	else if (cc == 0x0104)
+	else if (cc == PCI_CLASS_STORAGE_RAID)
 		scc_s = "RAID";
 	else
 		scc_s = "unknown";
diff -Nur linux-2.6.20-rc1.orig/include/linux/pci_ids.h linux-2.6.20-rc1/include/linux/pci_ids.h
--- linux-2.6.20-rc1.orig/include/linux/pci_ids.h 2006-12-20 10:24:51.0 +0800
+++ linux-2.6.20-rc1/include/linux/pci_ids.h 2006-12-20 10:08:15.0 +0800
@@ -15,6 +15,7 @@
 #define PCI_CLASS_STORAGE_FLOPPY	0x0102
 #define PCI_CLASS_STORAGE_IPI		0x0103
 #define PCI_CLASS_STORAGE_RAID		0x0104
+#define PCI_CLASS_STORAGE_SATA
[PATCH] add .mailmap for proper git-shortlog output
This list has been ripped out of the latest git-shortlog tool. It can be maintained separately so this is what this patch does. A couple more entries were added to the original list as well. Signed-off-by: Nicolas Pitre <[EMAIL PROTECTED]> --- diff --git a/.mailmap b/.mailmap new file mode 100644 index 000..016b861 --- /dev/null +++ b/.mailmap @@ -0,0 +1,96 @@ +# +# This list is used by git-shortlog to fix a few botched name translations +# in the git archive, either because the author's full name was messed up +# and/or not always written the same way, making contributions from the +# same person appearing not to be so or badly displayed. +# +# repo-abbrev: /pub/scm/linux/kernel/git/ +# + +Aaron Durbin <[EMAIL PROTECTED]> +Adam Oldham <[EMAIL PROTECTED]> +Adam Radford <[EMAIL PROTECTED]> +Adrian Bunk <[EMAIL PROTECTED]> +Alan Cox <[EMAIL PROTECTED]> +Alan Cox <[EMAIL PROTECTED]> +Aleksey Gorelov <[EMAIL PROTECTED]> +Al Viro <[EMAIL PROTECTED]> +Al Viro <[EMAIL PROTECTED]> +Andreas Herrmann <[EMAIL PROTECTED]> +Andrew Morton <[EMAIL PROTECTED]> +Andrew Vasquez <[EMAIL PROTECTED]> +Andy Adamson <[EMAIL PROTECTED]> +Arnaud Patard <[EMAIL PROTECTED]> +Arnd Bergmann <[EMAIL PROTECTED]> +Axel Dyks <[EMAIL PROTECTED]> +Ben Gardner <[EMAIL PROTECTED]> +Ben M Cahill <[EMAIL PROTECTED]> +Björn Steinbrink <[EMAIL PROTECTED]> +Brian Avery <[EMAIL PROTECTED]> +Brian King <[EMAIL PROTECTED]> +Christoph Hellwig <[EMAIL PROTECTED]> +Corey Minyard <[EMAIL PROTECTED]> +David Brownell <[EMAIL PROTECTED]> +David Woodhouse <[EMAIL PROTECTED]> +Domen Puncer <[EMAIL PROTECTED]> +Douglas Gilbert <[EMAIL PROTECTED]> +Ed L. 
Cashin <[EMAIL PROTECTED]> +Evgeniy Polyakov <[EMAIL PROTECTED]> +Felipe W Damasio <[EMAIL PROTECTED]> +Felix Kuhling <[EMAIL PROTECTED]> +Felix Moeller <[EMAIL PROTECTED]> +Filipe Lautert <[EMAIL PROTECTED]> +Franck Bui-Huu <[EMAIL PROTECTED]> +Frank Zago <[EMAIL PROTECTED]> +Greg Kroah-Hartman <[EMAIL PROTECTED](none)> +Greg Kroah-Hartman <[EMAIL PROTECTED]> +Greg Kroah-Hartman <[EMAIL PROTECTED]> +Henk Vergonet <[EMAIL PROTECTED]> +Henrik Kretzschmar <[EMAIL PROTECTED]> +Herbert Xu <[EMAIL PROTECTED]> +Jacob Shin <[EMAIL PROTECTED]> +James Bottomley <[EMAIL PROTECTED](none)> +James Bottomley <[EMAIL PROTECTED]> +James E Wilson <[EMAIL PROTECTED]> +James Ketrenos <[EMAIL PROTECTED](none)> +Jean Tourrilhes <[EMAIL PROTECTED]> +Jeff Garzik <[EMAIL PROTECTED]> +Jens Axboe <[EMAIL PROTECTED]> +Jens Osterkamp <[EMAIL PROTECTED]> +John Stultz <[EMAIL PROTECTED]> +Juha Yrjola +Juha Yrjola <[EMAIL PROTECTED]> +Juha Yrjola <[EMAIL PROTECTED]> +Kay Sievers <[EMAIL PROTECTED]> +Kenneth W Chen <[EMAIL PROTECTED]> +Koushik <[EMAIL PROTECTED]> +Leonid I Ananiev <[EMAIL PROTECTED]> +Linas Vepstas <[EMAIL PROTECTED]> +Matthieu CASTET <[EMAIL PROTECTED]> +Michel Dänzer <[EMAIL PROTECTED]> +Mitesh shah <[EMAIL PROTECTED]> +Morten Welinder <[EMAIL PROTECTED]> +Morten Welinder <[EMAIL PROTECTED]> +Morten Welinder <[EMAIL PROTECTED]> +Morten Welinder <[EMAIL PROTECTED]> +Nguyen Anh Quynh <[EMAIL PROTECTED]> +Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> +Patrick Mochel <[EMAIL PROTECTED]> +Peter A Jonsson <[EMAIL PROTECTED]> +Praveen BP <[EMAIL PROTECTED]> +Rajesh Shah <[EMAIL PROTECTED]> +Ralf Baechle <[EMAIL PROTECTED]> +Ralf Wildenhues <[EMAIL PROTECTED]> +Rémi Denis-Courmont <[EMAIL PROTECTED]> +Rudolf Marek <[EMAIL PROTECTED]> +Rui Saraiva <[EMAIL PROTECTED]> +Sachin P Sant <[EMAIL PROTECTED]> +Sam Ravnborg <[EMAIL PROTECTED]> +Simon Kelley <[EMAIL PROTECTED]> +Stéphane Witzmann <[EMAIL PROTECTED]> +Stephen Hemminger <[EMAIL PROTECTED]> +Tejun Heo <[EMAIL PROTECTED]> 
+Thomas Graf <[EMAIL PROTECTED]> +Tony Luck <[EMAIL PROTECTED]> +Tsuneo Yoshioka <[EMAIL PROTECTED]> +Valdis Kletnieks <[EMAIL PROTECTED]>
Re: [2.6 patch] drivers/atm/fore200e.c: cleanups
From: Adrian Bunk <[EMAIL PROTECTED]>
Date: Tue, 19 Dec 2006 05:12:58 +0100

> This patch contains the following transformations from custom functions
> to standard kernel version:
> - fore200e_kmalloc() -> kzalloc()
> - fore200e_kfree() -> kfree()
> - fore200e_swap() -> cpu_to_be32()
>
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Looks good, applied, thanks Adrian.
Re: [2.6 patch] drivers/atm/Kconfig: remove dead ATM_TNETA1570 option
From: Adrian Bunk <[EMAIL PROTECTED]>
Date: Tue, 19 Dec 2006 05:13:00 +0100

> This patch removes the unconverted ATM_TNETA1570 option that also lacks
> any code in the kernel.
>
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Applied, thanks Adrian.
Re: schedule_timeout: wrong timeout value
----- Original Message -----
From: "Robert Hancock" <[EMAIL PROTECTED]>
To: "kyle" <[EMAIL PROTECTED]>
Cc:
Sent: Tuesday, December 19, 2006 10:34 AM
Subject: Re: schedule_timeout: wrong timeout value

> kyle wrote:
>> Hi,
>>
>> Recently my mysql server shows something like:
>> Dec 18 18:24:05 sql kernel: schedule_timeout: wrong timeout value from c0284efd
>> Dec 18 18:24:36 sql last message repeated 19939 times
>> Dec 18 18:25:37 sql last message repeated 33392 times
>
> The message means some code in the kernel or in some module passed a
> negative value to schedule_timeout, which it shouldn't have. The c0284efd
> value is the address of the function that made the call - you may be able
> to look that up in your /proc/ksyms or the System.map file and figure out
> what function that is.

There was no module loaded and, unfortunately, I cannot find the System.map or /proc/ksyms file for the affected kernel!

Anyway, thank you for your explanation. I have upgraded the kernel to 2.6.17.14 and hope that it fixes the problem.

Thank you
Kyle
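For anyone who does still have the System.map for the kernel that logged the message, the lookup Robert describes can be sketched roughly as below. This is only an illustration in userspace C: the function name lookup_symbol and the symbol names used when trying it out are made up, and it assumes the usual "address type name" layout of System.map lines.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan System.map-style text ("<hex address> <type> <name>" per line) and
 * pick the symbol with the highest start address that is not above `target`.
 * Returns 1 and copies the name into `name` on success, 0 if nothing matches.
 * lookup_symbol is a hypothetical helper, not a kernel or libc function.
 */
static int lookup_symbol(const char *map_text, unsigned long target,
                         char *name, size_t name_len)
{
    unsigned long best = 0;
    int found = 0;
    const char *p = map_text;

    while (*p) {
        unsigned long addr;
        char type;
        char sym[128];

        if (sscanf(p, "%lx %c %127s", &addr, &type, sym) == 3 &&
            addr <= target && (!found || addr >= best)) {
            best = addr;
            found = 1;
            snprintf(name, name_len, "%s", sym);
        }

        /* move on to the next line, if there is one */
        p = strchr(p, '\n');
        if (!p)
            break;
        p++;
    }
    return found;
}
```

Feeding it the contents of System.map and the address from the log (c0284efd here) should name the function that contains the offending schedule_timeout() call, provided the map really belongs to the running kernel.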
Re: Changes to sysfs PM layer break userspace
On Tuesday 19 December 2006 6:15 pm, Andrew Morton wrote:
> On Tue, 19 Dec 2006 13:34:49 -0800
> David Brownell <[EMAIL PROTECTED]> wrote:
>
> > Documentation/feature-removal-schedule.txt has warned about this since
> > August
>
> Nobody reads that.
>
> Please, wherever possible, put a nice printk("this is going away") in the code
> when planning these things.

Signed-off-by: David Brownell <[EMAIL PROTECTED]>

Index: g26/drivers/base/power/sysfs.c
===================================================================
--- g26.orig/drivers/base/power/sysfs.c 2006-09-27 16:19:00.0 -0700
+++ g26/drivers/base/power/sysfs.c 2006-12-19 19:27:25.0 -0800
@@ -42,9 +42,17 @@ static ssize_t state_show(struct device
 static ssize_t state_store(struct device * dev, struct device_attribute *attr, const char * buf, size_t n)
 {
+	static int warned;
 	pm_message_t state;
 	int error = -EINVAL;

+	if (!warned) {
+		printk(KERN_WARNING
+			"*** WARNING *** sysfs devices/.../power/state files "
+			"are only for testing, and will be removed\n");
+		warned = error;
+	}
+
 	/* disallow incomplete suspend sequences */
 	if (dev->bus && (dev->bus->suspend_late || dev->bus->resume_early))
 		return error;
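The warn-once idiom in that patch is easy to try outside the kernel. The following is a minimal userspace sketch, not the kernel code: state_store_sketch and warnings_printed are invented names, fprintf stands in for printk, and the flag is set to 1 where the patch stores the (non-zero) error value.

```c
#include <stdio.h>

/* Counter kept only so the behaviour of the sketch can be checked. */
static int warnings_printed;

/* Hypothetical stand-in for a store handler: warn on the first call only. */
static int state_store_sketch(const char *buf)
{
    static int warned;  /* zero-initialised, keeps its value across calls */

    if (!warned) {
        fprintf(stderr, "*** WARNING *** this interface is only for testing, "
                        "and will be removed\n");
        warnings_printed++;
        warned = 1;
    }

    (void)buf;          /* the real handler would parse the written value */
    return 0;
}
```

Note that a plain static flag like this is not synchronised, so two callers racing on the first write could each print the message once; for a deprecation notice that is probably acceptable.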
Re: Changes to PM layer break userspace
On Tuesday 19 December 2006 4:09 pm, Matthew Garrett wrote:
> On Tue, Dec 19, 2006 at 03:36:28PM -0800, David Brownell wrote:
> > On Tuesday 19 December 2006 2:57 pm, Matthew Garrett wrote:
> > > The fact that something is scheduled to be removed in July 2007 does
> > > *not* mean it's acceptable to break it in 2006. We need to find a way to
> > > fix this functionality in the meantime.
> >
> > The disconnect here is analogous to: I tell you the alleged perpetual
> > motion machine never worked, and can't ever work; and you push back and
> > say that you need a perpetual motion machine that works, NOW please,
> > because you need something that pushes those widgets around. (There are
> > better ways to push widgets than side effects of a broken machine...)
>
> But it *did* work.

Having been on the other side ... I can testify that if you think it actually worked, it's because you're ignoring all the nasty failure modes.

> > I'd not be keen on reverting Linus' patch [1] myself, even though few
> > drivers have started to use that mechanism yet; that would be a step
> > backwards, and would perpetuate users of that broken sysfs file.
>
> I'm sorry, which bit of "Don't break userspace API without adequate
> prior warning and with a workable replacement" is difficult to
> understand?

What part of "it was already broken" do YOU not understand?

The whole notion is unsustainable. It doesn't work cross-platform, or for multiple bus types. It confuses system-wide suspend mechanisms with runtime mechanisms. It breaks guaranteed parent/child ordering of suspend/resume calls. (And more...)

Let us know when you get tired of whining and want to move on to getting a real solution to the set of problems here. I've pointed out that reverting Linus' patch would be one option to get your short-term issue resolved ... that would remove a capability from PCI drivers, but you could then use that deprecated mechanism.

I've also pointed out that you could start working towards a real long-term solution. Do you have an alternate solution?

- Dave
Re: [PATCH] Add pci class code for SATA
On 12/20/06, Jeff Garzik <[EMAIL PROTECTED]> wrote:
> Conke Hu wrote:
> > Add pci class code 0x0106 for SATA to pci_ids.h
> >
> > signed-off-by: [EMAIL PROTECTED]
> >
> > --- linux-2.6.20-rc1/include/linux/pci_ids.h.orig 2006-12-20 01:58:30.0 +0800
> > +++ linux-2.6.20-rc1/include/linux/pci_ids.h 2006-12-20 01:59:07.0 +0800
> > @@ -15,6 +15,7 @@
> >  #define PCI_CLASS_STORAGE_FLOPPY 0x0102
> >  #define PCI_CLASS_STORAGE_IPI    0x0103
> >  #define PCI_CLASS_STORAGE_RAID   0x0104
> > +#define PCI_CLASS_STORAGE_SATA   0x0106
> >  #define PCI_CLASS_STORAGE_SAS    0x0107
> >  #define PCI_CLASS_STORAGE_OTHER  0x0180
>
> Two comments:
>
> 1) I think "_SATA" is an inaccurate description. It should be _AHCI AFAICS.
>
> 2) Typically we don't add constants unless they are used somewhere...
>
> Jeff

Hi Jeff,
According to PCI spec 3.0, 0x0106 means SATA controller, 0x010601 means AHCI and 0x010600 means vendor specific SATA controller. Please see the following table (PCI spec 3.0 P296):

Base Class  Sub-Class  Interface  Meaning
01h         00h        00h        SCSI bus controller
            01h        xxh        IDE controller
            02h        00h        Floppy disk controller
            03h        00h        IPI bus controller
            04h        00h        RAID controller
            05h        20h        ATA controller with ADMA interface
                       30h        ATA controller with ADMA interface
            06h        00h        Serial ATA controller–vendor specific interface
                       01h        Serial ATA controller–AHCI 1.0 interface
            07h        00h        Serial Attached SCSI (SAS) controller
            80h        00h        Other mass storage controller

So, I think, the following macro is correct:
#define PCI_CLASS_STORAGE_SATA 0x0106

If you want to define an AHCI class code, it should be 0x010601, not 0x0106:
#define PCI_CLASS_STORAGE_SATA_AHCI 0x010601

And I think that PCI_CLASS_STORAGE_SATA had better be added to pci_ids.h, since the class code 0x0106 is used more than once. E.g. ahci.c uses the magic number 0x0106 twice, and it might be used more in future.
Best regards,
Conke
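To make the relationship between the 16-bit class code (0x0106) and the 24-bit value that includes the programming interface (0x010601) concrete, here is a small sketch. Only PCI_CLASS_STORAGE_SATA comes from the patch under discussion; PCI_PROG_IF_AHCI and the helper functions are hypothetical names used for illustration.

```c
#include <stdint.h>

#define PCI_CLASS_STORAGE_SATA 0x0106u  /* base class 01h, sub-class 06h */
#define PCI_PROG_IF_AHCI       0x01u    /* hypothetical name for interface 01h */

/* Combine base class/sub-class with the programming interface byte. */
static inline uint32_t pci_class_full(uint16_t class_subclass, uint8_t prog_if)
{
    return ((uint32_t)class_subclass << 8) | prog_if;
}

/* Split a 24-bit class value back into its three fields. */
static inline uint8_t pci_base_class(uint32_t full) { return (full >> 16) & 0xff; }
static inline uint8_t pci_sub_class(uint32_t full)  { return (full >> 8) & 0xff; }
static inline uint8_t pci_prog_if(uint32_t full)    { return full & 0xff; }
```

With these, pci_class_full(PCI_CLASS_STORAGE_SATA, PCI_PROG_IF_AHCI) gives 0x010601, while the vendor specific SATA interface (00h) gives 0x010600, which is the distinction between the two proposed macro names.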
Re: Bug 7596 - Potential performance bottleneck for Linxu TCP
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 20 Dec 2006 10:52:19 +1100

> Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> > I noticed this bit of discussion in tcp_recvmsg. It implies that a better
> > queuing policy would be good. But it is confusing English (Alexey?) so
> > not sure where to start.
>
> Actually I think the comment says that the current code isn't the
> most elegant but is more efficient.

It's just explaining the hierarchy of queues that need to be purged, and in what order, for correctness.

Alexey added that code when I mentioned to him, right after we added the prequeue, that it was possible to process the normal backlog before the prequeue, which is illegal. In fixing that bug, he added the comment we are discussing.
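The ordering constraint can be pictured with a toy model. This is a sketch only: the struct, enum and function names are invented, and placing the receive queue ahead of the prequeue is an assumption added here; the message above states only that the backlog must not be processed before the prequeue.

```c
#include <stddef.h>

/* Hypothetical labels for the queues a receiver may need to drain. */
enum rx_queue { RX_RECEIVE_QUEUE, RX_PREQUEUE, RX_BACKLOG, RX_NONE };

/* Toy socket state: only the queue lengths matter for the ordering rule. */
struct rx_state {
    size_t receive_queue_len;
    size_t prequeue_len;
    size_t backlog_len;
};

/*
 * Pick the next queue to drain. The backlog is only eligible once the
 * prequeue is empty, which is the correctness rule described above.
 */
static enum rx_queue next_queue_to_drain(const struct rx_state *s)
{
    if (s->receive_queue_len)
        return RX_RECEIVE_QUEUE;
    if (s->prequeue_len)
        return RX_PREQUEUE;
    if (s->backlog_len)
        return RX_BACKLOG;
    return RX_NONE;
}
```

Draining in any other order would let newer segments from the backlog overtake older ones still sitting in the prequeue.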
BUG: wedged processes, test program supplied
Somebody PLEASE try this...

Normally, when a process dies it becomes a zombie. If the parent dies (before or after the child), the child is adopted by init. Init will reap the child.

The program included below DOES NOT get reaped. Do like so:

gcc -m32 -O2 -std=gnu99 -o foo foo.c
while true; do killall -9 foo; ./foo; sleep 1; done

BTW, it gets even better if you start playing with ptrace. Use the "strace" program (following children) and/or start sending rapid-fire SIGKILL to all the various _threads_ in the processes. You can get processes wedged in a wide variety of interesting states. I've seen "X" state, processes sitting around with pending SIGKILL, a process stuck in "D" state supposedly core dumping despite ulimit 0 on the core size, etc.

/* The original header names were lost in the archive; these cover the code shown. */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/syscall.h>

static void early_write(int fd, const void *buf, size_t count)
{
#if 0
    unsigned long eax = __NR_write;
    /* push and pop because -fPIC probably needs ebx for the GOT base pointer */
    __asm__ __volatile__(
        "push %%ebx ; "
        "push %1 ; pop %%ebx ; int $0x80"
        "; pop %%ebx"
        :"=a"(eax)
        :"r"(fd),"c"(buf),"d"(count),"0"(eax)
        :"memory"
    );
#endif
}

static void p_str(char *s)
{
    size_t count = strlen(s);
    early_write(STDERR_FILENO,s,count);
}

static void p_hex(unsigned long u)
{
    char buf[9];
    char x[] = "0123456789abcdef";
    char *s = buf;
    s[8] = '\0';
    int i = 8;
    while(i--)
        buf[7-i] = x[(u>>(i*4))&15];
    early_write(STDERR_FILENO,buf,8);
}

static void p_dec(unsigned long u)
{
    char buf[11];
    char *s = buf+10;
    *s-- = '\0';
    int count = 0;
    while(u || !count) {
        *s-- = u%10 + '0';
        u /= 10;
        count++;
    }
    early_write(STDERR_FILENO,s+1,count);
}

#define FUTEX_WAIT 0
#define FUTEX_WAKE 1

typedef int lock_t;
#define LOCK_INITIALIZER 0

static inline void init_lock(lock_t* l) { *l = 0; }

// lock_add performs an atomic add
// and returns the resulting value
static inline int lock_add(lock_t* l, int val)
{
    int result = val;
    __asm__ __volatile__ (
        "lock; xaddl %1, %0;"
        : "=m" (*l), "=r" (result)
        : "1" (result), "m" (*l)
        : "memory");
    return result + val; // Returns the value written to memory
}

// lock_bts_high_bit atomically tests and
// sets the high bit and returns
// true if the bit was clear initially
static inline bool lock_bts_high_bit(lock_t* l)
{
    bool result;
    __asm__ __volatile__ (
        "lock; btsl $31, %0;\n\t"
        "setnc %1;"
        : "=m" (*l), "=q" (result)
        : "m" (*l)
        : "memory");
    return result;
}

static int futex(int* uaddr, int op, int val, const struct timespec*timeout, int*uaddr2, int val3)
{
    (void)timeout;
    (void)uaddr2;
    (void)val3;
    int eax = __NR_futex;
    __asm__ __volatile__(
        "push %%ebx ; push %1 ; pop %%ebx"
        " ; int $0x80; pop %%ebx"
        :"=a"(eax)
        :"r"(uaddr),"c"(op),"d"(val),"0"(eax)
        :"memory"
    );
    return eax;
}

// lock will wait for and lock a mutex
static void lock(lock_t* l)
{
    // Check the mutex and set held bit
    if (lock_bts_high_bit(l)) {
        // Got the mutex
        return;
    }
    // Increment wait count
    lock_add(l, 1);
    while (true) {
        // Check the mutex and set held bit
        if (lock_bts_high_bit(l)) {
            // Got mutex, decrement wait count
            lock_add(l, -1);
            return;
        }
        int val = *l;
        // Ensure mutex not given up since check
        if (!(val & 0x80000000))
            continue;
        // Wait for the mutex
        futex(l, FUTEX_WAIT, val, NULL, NULL, 0);
    }
}

// unlock will release a mutex
static void unlock(lock_t* l)
{
    // Turn off lock held bit and check for waiters
    if (lock_add(l, 0x80000000) == 0) {
        // No waiters
        return;
    }
    // Waiters found, wake up one of them
    futex(l, FUTEX_WAKE, 1, NULL, NULL, 0);
}

unsigned toomany = 42;

struct data {
    unsigned nprocs;
    lock_t lock;
    unsigned count;
};

struct data *data;

static struct data *get_shm(void)
{
    void *addr;
    int shmid;
    // create
    shmid = shmget(IPC_PRIVATE,42,IPC_CREAT|0666);
    // attach
    addr = shmat(shmid, NULL, 0);
    // don't want it to
Re: [Alsa-devel] HDA Intel sound driver fails on Acer notebook
On Tuesday 19 December 2006 20:48, tony mancill wrote:
> FWIW, using pci=noacpi seems to break the USB controller on this laptop.
> I get "device not accepting address xx, error -110.

Strange. I'm using an Acer Aspire 1640Z and the sound works perfectly. Of course Kubuntu was the only distro I could find that worked out of the box (OOB), but that's beside the point. In a quick look through /etc on my laptop I wasn't able to see how they do this. But after doing a quick check on Google, the reports vary from this being a patched bug in ALSA to being easily solved by ensuring that the needed sound modules are loaded in the proper order.

An alternate solution to this is to load the snd-hda-intel module with the parameter "model=laptop".

> In addition, neither the onboard nor the wireless NIC work anymore with
> this option. For the onboard, you see that the link is up, but then
> get "NETDEV WATCHDOG: eth0: transmit timed out."
>
> acpi=off is worse - the boot hangs trying to load acpi/thermal.ko.

From personal experience I can say that ACPI is needed for Acer notebooks with the centrino chipset to function properly.

> I've tested with both 1.0.13 and and 1.0.14rc1. I don't get exactly
> the same kernel logging (I'm using a Debian 2.6.18 kernel), but kern.log
> contains:

I had the same problem when I tried Debian on this laptop. I don't recommend it for laptops, since there are several common pieces of hardware found on laptops that need firmware not shipped by Debian. This includes the ipw2200 firmware - which most Acer laptops need, because they ship with that wireless card.
> Dec 19 17:39:43 maus kernel: : hda_codec: invalid dep_range_val 0:7fff > Dec 19 17:39:43 maus kernel: ALSA > /home/tony/alsa-driver-1.0.14rc1/pci/hda/hda_codec.c:216: hda_codec: > invalid dep_range_val 0:7fff Dec 19 17:39:43 maus last message repeated 279 > times > Dec 19 17:39:43 maus kernel: hda_codec: num_steps = 0 for NID=0xd > Dec 19 17:39:43 maus kernel: hda_codec: num_steps = 0 for NID=0x9 > Dec 19 17:39:43 maus kernel: hda_codec: num_steps = 0 for NID=0xd > Dec 19 17:39:43 maus last message repeated 20 times > Dec 19 17:39:43 maus kernel: hda_codec: num_steps = 0 for NID=0x9 > > Thanks in advance for any assistance. I hope you enjoyed your > vacation. > > Thanks, > tony > > Takashi Iwai wrote: > > Hi, > > > > sorry for the late reply since I've been on vacation. > > > > At Sun, 3 Dec 2006 02:30:34 -0500, > > > > Chuck Ebbert wrote: > >> The HDA Intel sound driver still fails to load on my Acer Aspire 5102 > >> notebook (Turion64 X2, ATI chipset): > >> > >> Here is the PCI info while running x86_64. I tried i386 and x86_64 and > >> it fails on both: > >> > >> 00:14.2 Audio device: ATI Technologies Inc Unknown device 437b (rev 01) > >> Subsystem: Acer Incorporated [ALI] Unknown device 009f > >> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- > >> ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- > >> ParErr- DEVSEL=slow >TAbort- SERR- >> 64, Cache Line Size 08 > >> Interrupt: pin ? 
routed to IRQ 16 > >> Region 0: Memory at c000 (64-bit, non-prefetchable) > >> [size=16K] Capabilities: [50] Power Management version 2 > >> Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA > >> PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 > >> PME- > >> Capabilities: [60] Message Signalled Interrupts: 64bit+ > >> Queue=0/0 Enable- Address: Data: > >> 00: 02 10 7b 43 06 00 10 04 01 00 03 04 08 40 00 00 > >> 10: 04 00 00 c0 00 00 00 00 00 00 00 00 00 00 00 00 > >> 20: 00 00 00 00 00 00 00 00 00 00 00 00 25 10 9f 00 > >> 30: 00 00 00 00 50 00 00 00 00 00 00 00 0a 00 00 00 > >> 40: 00 00 02 40 00 00 00 00 00 00 00 00 00 00 00 00 > >> 50: 01 60 42 c8 00 00 00 00 00 00 00 00 00 00 00 00 > >> 60: 05 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> On i386 I get this after doing > >> insmod snd-hda-codec.ko ; insmod snd-hda-intel.ko > >> > >> Dec 1 17:38:29 ac kernel: ACPI: PCI Interrupt :00:14.2[A] -> GSI 16 > >> (level, low) -> IRQ 18 Dec 1 17:38:29 ac kernel: codec_mask = 0xb > >> Dec 1 17:38:30 ac kernel: hda_codec: PCI 1025:9f, codec config 5 is > >> selected Dec 1 17:38:31 ac kernel: hda_intel: azx_get_response timeout, > >> switching to polling mode... Dec 1 17:38:32 ac kernel: hda_intel: > >> azx_get_response timeout, switching to single_cmd mode... > > > > These messages are scary. It
Re: SATA DMA problem (sata_uli)
Jeff Garzik wrote: > Tejun Heo wrote: >> Jeff Garzik wrote: >>> Alan wrote: > I tracked it down to one of the drives being forced into PIO4 mode > rather than UDMA mode; dmesg bits: > ata4.00: ATA-7, max UDMA/133, 586072368 sectors: LBA48 NCQ (depth > 0/32) > ata4.00: ata4: dev 0 multi count 16 > ata4.00: simplex DMA is claimed by other device, disabling DMA Your ULi controller is reporting that it supports UDMA upon only one channel at a time. The kernel is honouring this information. The older ULi (was ALi) PATA devices report simplex but let you turn it off so see if the following does the trick. Test carefully as always with disk driver changes. (Jeff probably best to check the docs before merging this but I believe it is sane) Signed-off-by: Alan Cox <[EMAIL PROTECTED]> >>> My Uli SATA docs do not appear to cover the bmdma registers :( Only the >>> PCI config registers. >>> >>> But regardless, I think the better fix is to never set ATA_HOST_SIMPLEX >>> if ATA_FLAG_NO_LEGACY is set. >>> >>> None of the SATA controllers I've ever encountered has been simplex. >> >> Just another data point. The same problem is reported by bug #7590. >> >> http://bugzilla.kernel.org/show_bug.cgi?id=7590 >> >> Is somebody brewing a patch? > > Not to my knowledge. Did you just volunteer? ;-) > > /me runs... I'm just gonna ack Alan's patch. * ATA_FLAG_NO_LEGACY is not really used widely (and thus LLDs don't set it rigorously). I think it should be removed once we get initialization model right. * I'm really reluctant to add more LLD-specific knowledge into libata core. We're already carrying too much due to the current init model (libata should initialize host according to probe_ent, so many weirdities should be represented in probe_ent in a form libata core understands). * The idea of clearing simplex for unknown controllers scares the hell out of me. where's mummy... So, I'll ask bug reporter of #7590 to test it. 
-- 
tejun
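For reference, Jeff's alternative (never treat a controller as simplex when it has no legacy interface) could look roughly like the sketch below. It is not Alan's patch and not libata code: treat_as_simplex and SKETCH_FLAG_NO_LEGACY are invented for illustration, and the only hardware detail assumed is that bit 7 of the bus-master IDE status register is the "simplex only" bit.

```c
#include <stdint.h>

#define BMDMA_STATUS_SIMPLEX  0x80u /* bit 7 of the bus-master IDE status register */
#define SKETCH_FLAG_NO_LEGACY 0x01u /* hypothetical stand-in for ATA_FLAG_NO_LEGACY */

/*
 * Decide whether the host should be limited to DMA on one channel at a time.
 * Controllers flagged as having no legacy interface are never marked simplex,
 * whatever the status register claims.
 */
static int treat_as_simplex(uint8_t bmdma_status, unsigned int host_flags)
{
    if (host_flags & SKETCH_FLAG_NO_LEGACY)
        return 0;
    return (bmdma_status & BMDMA_STATUS_SIMPLEX) != 0;
}
```

As noted above, this only helps if low-level drivers set the no-legacy flag rigorously, which is one of the reasons for preferring the per-driver fix.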
Re: Changes to sysfs PM layer break userspace
On Tue, 19 Dec 2006 18:15:24 -0800 Andrew Morton wrote:

> On Tue, 19 Dec 2006 13:34:49 -0800
> David Brownell <[EMAIL PROTECTED]> wrote:
>
> > Documentation/feature-removal-schedule.txt has warned about this since
> > August
>
> Nobody reads that.

Ugh, I read it.

> Please, wherever possible, put a nice printk("this is going away") in the code
> when planning these things.

Can notices go in both places, or is the source code (printk) now the preferred place?

I think that we can point people to Doc/feature-removal-schedule.txt easier (and more effectively) than we can source code (or noisy kernel logs).

---
~Randy
Re: [RFC] [PATCH] Fix kmalloc flags used in ext3 with an active journal handle
Andrew Morton wrote: On Tue, 19 Dec 2006 17:58:12 -0800 Suzuki <[EMAIL PROTECTED]> wrote: * Fix the kmalloc flags used from within ext3, when we have an active journal handle If we do a kmalloc with GFP_KERNEL on system running low on memory, with an active journal handle, we might end up in cleaning up the fs cache flushing dirty inodes for some other filesystem. This would cause hitting a J_ASSERT() in : The change might be needed (haven't looked at it yet). But I'd like to see the full BUG trace, please. To see the callchain. Here is the call trace which was hit by one of our test teams. This was from fs/ext3/xattr.c. While looking for similar calls I found the others described in the patch. Assertion failure in journal_start() at fs/jbd/transaction.c:274: "handle- >h_transaction->t_journal == journal" kernel BUG at fs/jbd/transaction.c:274! illegal operation: 0001 [#1] CPU:0Not tainted (2.6.5-7.282-s390x SLES9_SP3_BRANCH-20061031152356) Process dbench (pid: 14070, task: 025617f0, ksp: 01057630) Krnl PSW : 07018000 08837b38 (journal_start+0x90/0x15c [jbd]) Krnl GPRS: 00507fc0 002b 01056d80 08837b36 2885 08841da6 001bfaa0 03483d08 0002 07a8bda0 08833000 088a7d08 08837b36 01056e80 Krnl Code: 00 00 58 10 b0 0c a7 1a 00 01 b9 04 00 2b 50 10 b0 0c e3 40 Call Trace: [<088a30fc>] ext3_journal_start+0x8c/0xa4 [ext3] [<08896822>] ext3_dirty_inode+0x3a/0xe0 [ext3] [<001ca362>] __mark_inode_dirty+0x1ae/0x1c8 [<001bfaa0>] iput+0xbc/0xf0 [<001bdcca>] prune_dcache+0x29e/0x584 [<001bdfe4>] shrink_dcache_memory+0x34/0x54 [<0017b100>] shrink_slab+0x15c/0x250 [<0017b6e4>] try_to_free_pages+0x1c0/0x2a4 [<00170276>] __alloc_pages+0x2ba/0x4e0 [<0017059a>] __get_free_pages+0x4e/0x8c [<00174ea2>] cache_alloc_refill+0x2a6/0x868 [<00175540>] __kmalloc+0xdc/0xe0 [<088a4e62>] ext3_xattr_set_handle+0x114a/0x174c [ext3] [<088a54e4>] ext3_xattr_set+0x80/0xd0 [ext3] [<088a6312>] ext3_xattr_user_set+0xce/0xe4 [ext3] [<088a5f1e>] ext3_setxattr+0x17e/0x18c [ext3] [<001c88e6>] 
setxattr+0x14a/0x234
[<001c8a80>] sys_fsetxattr+0xb0/0x110
[<0011fc10>] sysc_noemu+0x10/0x16

Always include the trace...

Will take care of it from now onwards. Thanks.

* Fix the kmalloc flags used from within ext3, when we have an active journal handle

If we do a kmalloc with GFP_KERNEL on a system running low on memory, with an active journal handle, we might end up cleaning up the fs cache and flushing dirty inodes for some other filesystem. This would hit a J_ASSERT() in:

handle_t *journal_start(journal_t *journal, int nblocks)
{
	handle_t *handle = journal_current_handle();
	int err;
	[...]
	if (handle) {
		J_ASSERT(handle->h_transaction->t_journal == journal);

Here are the places where we do kmalloc, or may end up doing kmalloc, with __GFP_FS (through GFP_KERNEL) from ext3 while holding a journal handle.

1) fs/ext3/xattr.c :: ext3_xattr_block_set() : 2 occurrences

2) fs/ext3/resize.c :: reserve_backup_gdb()

3) fs/ext3/resize.c :: add_new_gdb()

4) fs/ext3/acl.c :: ext3_init_acl() : There are quite a few points where we may end up calling kmalloc() from ext3_init_acl(), which is called with a handle from ext3_new_inode():

   a) Called directly within ext3_init_acl() as:
      clone = posix_acl_clone(acl, GFP_KERNEL);

   b) With the following code path:
      ext3_init_acl() -> ext3_get_acl() -> ext3_acl_from_disk() -> posix_acl_alloc(GFP_KERNEL)

   c) ext3_init_acl() -> ext3_get_acl() might also call kmalloc() directly.

5) fs/ext3/acl.c :: ext3_acl_to_disk(), which is called from ext3_set_acl().

Among these, 4.b and 4.c may be called either with or without a handle.

There was a similar issue reported early this year:
http://lkml.org/lkml/2006/1/31/54

The attached patch fixes all the above invocations to make use of GFP_NOFS instead of GFP_KERNEL.
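The failure mode described above can be modelled in a few lines of userspace C. Everything in this sketch is hypothetical (the sk_ names, the single global standing in for the per-task handle, the flag value); it only mimics the shape of the problem: an allocation that is allowed to re-enter filesystem code starts a handle on a different journal while one is already active.

```c
#include <stddef.h>
#include <stdlib.h>

#define SK_GFP_FS 0x1u  /* hypothetical: allocation may re-enter filesystem code */

struct sk_journal { int id; };
struct sk_handle  { struct sk_journal *journal; };

/* Per-task state in the kernel; a single global is enough for the sketch. */
static struct sk_handle *current_handle;
static int assertion_failures;

/* Same shape as the check quoted above: a nested start must use the same journal. */
static void sk_journal_start(struct sk_journal *j)
{
    if (current_handle && current_handle->journal != j)
        assertion_failures++;   /* the point where the J_ASSERT would fire */
}

static struct sk_journal other_fs_journal = { 2 };

/*
 * Allocation under memory pressure. With SK_GFP_FS set, reclaim is allowed to
 * dirty an inode on another filesystem, which starts a handle on its journal.
 * Without the flag, reclaim stays out of filesystem code.
 */
static void *sk_alloc(size_t size, unsigned int flags)
{
    if (flags & SK_GFP_FS)
        sk_journal_start(&other_fs_journal);
    return malloc(size);
}
```

Calling sk_alloc() with SK_GFP_FS while current_handle points at a handle for a different journal bumps assertion_failures, while the same call without the flag does not, which is the effect the switch from GFP_KERNEL to GFP_NOFS is meant to have.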
Signed-off-by: Suzuki K P <[EMAIL PROTECTED]> Index: linux-2.6.20-rc1/fs/ext3/xattr.c === --- linux-2.6.20-rc1.orig/fs/ext3/xattr.c 2006-12-13 17:14:23.0 -0800 +++ linux-2.6.20-rc1/fs/ext3/xattr.c 2006-12-19 11:41:35.0 -0800 @@ -718,7 +718,7 @@ ce = NULL; } ea_bdebug(bs->bh, "cloning"); - s->base = kmalloc(bs->bh->b_size, GFP_KERNEL); + s->base = kmalloc(bs->bh->b_size, GFP_NOFS); error = -ENOMEM; if (s->base == NULL) goto cleanup; @@ -730,7 +730,7 @@ } } else { /* Allocate a buffer where we construct the new block. */ - s->base = kmalloc(sb->s_blocksize, GFP_KERNEL); + s->base =
Re: [Alsa-devel] HDA Intel sound driver fails on Acer notebook
FWIW, using pci=noacpi seems to break the USB controller on this laptop. I get "device not accepting address xx, error -110. In addition, neither the onboard nor the wireless NIC work anymore with this option. For the onboard, you see that the link is up, but then get "NETDEV WATCHDOG: eth0: transmit timed out." acpi=off is worse - the boot hangs trying to load acpi/thermal.ko. I've tested with both 1.0.13 and and 1.0.14rc1. I don't get exactly the same kernel logging (I'm using a Debian 2.6.18 kernel), but kern.log contains: Dec 19 17:39:43 maus kernel: : hda_codec: invalid dep_range_val 0:7fff Dec 19 17:39:43 maus kernel: ALSA /home/tony/alsa-driver-1.0.14rc1/pci/hda/hda_codec.c:216: hda_codec: invalid dep_range_val 0:7fff Dec 19 17:39:43 maus last message repeated 279 times Dec 19 17:39:43 maus kernel: hda_codec: num_steps = 0 for NID=0xd Dec 19 17:39:43 maus kernel: hda_codec: num_steps = 0 for NID=0x9 Dec 19 17:39:43 maus kernel: hda_codec: num_steps = 0 for NID=0xd Dec 19 17:39:43 maus last message repeated 20 times Dec 19 17:39:43 maus kernel: hda_codec: num_steps = 0 for NID=0x9 Thanks in advance for any assistance. I hope you enjoyed your vacation. Thanks, tony Takashi Iwai wrote: > Hi, > > sorry for the late reply since I've been on vacation. > > At Sun, 3 Dec 2006 02:30:34 -0500, > Chuck Ebbert wrote: >> The HDA Intel sound driver still fails to load on my Acer Aspire 5102 >> notebook (Turion64 X2, ATI chipset): >> >> Here is the PCI info while running x86_64. I tried i386 and x86_64 and it >> fails >> on both: >> >> 00:14.2 Audio device: ATI Technologies Inc Unknown device 437b (rev 01) >> Subsystem: Acer Incorporated [ALI] Unknown device 009f >> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- >> Stepping- SERR- FastB2B- >> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- >> SERR- > Latency: 64, Cache Line Size 08 >> Interrupt: pin ? 
routed to IRQ 16 >> Region 0: Memory at c000 (64-bit, non-prefetchable) [size=16K] >> Capabilities: [50] Power Management version 2 >> Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA >> PME(D0+,D1-,D2-,D3hot+,D3cold+) >> Status: D0 PME-Enable- DSel=0 DScale=0 PME- >> Capabilities: [60] Message Signalled Interrupts: 64bit+ Queue=0/0 >> Enable- >> Address: Data: >> 00: 02 10 7b 43 06 00 10 04 01 00 03 04 08 40 00 00 >> 10: 04 00 00 c0 00 00 00 00 00 00 00 00 00 00 00 00 >> 20: 00 00 00 00 00 00 00 00 00 00 00 00 25 10 9f 00 >> 30: 00 00 00 00 50 00 00 00 00 00 00 00 0a 00 00 00 >> 40: 00 00 02 40 00 00 00 00 00 00 00 00 00 00 00 00 >> 50: 01 60 42 c8 00 00 00 00 00 00 00 00 00 00 00 00 >> 60: 05 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> On i386 I get this after doing >> insmod snd-hda-codec.ko ; insmod snd-hda-intel.ko >> >> Dec 1 17:38:29 ac kernel: ACPI: PCI Interrupt :00:14.2[A] -> GSI 16 >> (level, low) -> IRQ 18 >> Dec 1 17:38:29 ac kernel: codec_mask = 0xb >> Dec 1 17:38:30 ac kernel: hda_codec: PCI 1025:9f, codec config 5 is selected >> Dec 1 17:38:31 ac kernel: hda_intel: azx_get_response timeout, switching to >> polling mode... >> Dec 1 17:38:32 ac kernel: hda_intel: azx_get_response timeout, switching to >> single_cmd mode... > > These messages are scary. It means that the communication between the > controller chip and the codec chip doesn't work, usually incorrect IRQ > handling, and often due to broken BIOS or ACPI support. 
Any change if > you pass pci=noacpi or acpi=off boot option? > > Anyway, you can try alsa-git patch in mm tree. It's a better support > code for Acer laptops, and this might work slightly differently. > > > Takashi
Re: Changes to sysfs PM layer break userspace
On Tue, 19 Dec 2006 13:34:49 -0800 David Brownell <[EMAIL PROTECTED]> wrote:

> Documentation/feature-removal-schedule.txt has warned about this since
> August

Nobody reads that.

Please, wherever possible, put a nice printk("this is going away") in the code when planning these things.
Re: [RFC] [PATCH] Fix kmalloc flags used in ext3 with an active journal handle
On Tue, 19 Dec 2006 17:58:12 -0800 Suzuki <[EMAIL PROTECTED]> wrote:

> * Fix the kmalloc flags used from within ext3, when we have an active journal
> handle
>
> If we do a kmalloc with GFP_KERNEL on system running low on memory,
> with an active journal handle, we might end up in cleaning up the fs cache
> flushing dirty inodes for some other filesystem. This would cause hitting a
> J_ASSERT() in :

The change might be needed (haven't looked at it yet).

But I'd like to see the full BUG trace, please. To see the callchain. Always include the trace...

Thanks.
Re: schedule_timeout: wrong timeout value
On Mon, 18 Dec 2006 20:34:43 -0600 Robert Hancock <[EMAIL PROTECTED]> wrote:

> kyle wrote:
> > Hi,
> >
> > Recently my mysql servershows something like:
> > Dec 18 18:24:05 sql kernel: schedule_timeout: wrong timeout value
> > from c0284efd
> > Dec 18 18:24:36 sql last message repeated 19939 times
> > Dec 18 18:25:37 sql last message repeated 33392 times
> >
> > from syslog every 1 or 2 days. Whenever the messages show, mysql server
> > stop accept new connections from the same network, and I need to restart
> > the mysql service and then it will keep running well for 1-2 days until
> > the messages show up again.
> >
> > The server has been running over 1 year without any problem, the problem
> > started show up around 2 weeks ago. It's running kernel 2.6.12, and
> > mysql server, nothing else. Hardware is Pentium 4 2.8GHz with
> > hyperthreading enabled.
> >
> > What does the kernel message mean and why it make mysql stop accept new
> > connections? Is it hardware problem or try upgrade the kernel may help?
> > Please CC me if possible. Thank you
>
> The message means some code in the kernel or in some module passed a
> negative value to schedule_timeout which it shouldn't have. The c0284efd
> value is the address of the function that made the call - you may be
> able to look that up in your /proc/ksyms or the System.map file and
> figure out what function that is..
>

I queued this:

From: Andrew Morton <[EMAIL PROTECTED]>

Kyle is hitting this warning, and we don't have a clue what it's caused by. Add the obligatory dump_stack().

Cc: kyle <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 kernel/timer.c |7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff -puN kernel/timer.c~schedule_timeout-improve-warning-message kernel/timer.c
--- a/kernel/timer.c~schedule_timeout-improve-warning-message
+++ a/kernel/timer.c
@@ -1344,11 +1344,10 @@ fastcall signed long __sched schedule_ti
 		 * should never happens anyway). You just have the printk()
 		 * that will tell you if something is gone wrong and where.
 		 */
-		if (timeout < 0)
-		{
+		if (timeout < 0) {
 			printk(KERN_ERR "schedule_timeout: wrong timeout "
-				"value %lx from %p\n", timeout,
-				__builtin_return_address(0));
+				"value %lx\n", timeout);
+			dump_stack();
 			current->state = TASK_RUNNING;
 			goto out;
 		}
_
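For readers following along, here is a minimal userspace sketch of the guard discussed in this thread. It is not the kernel code: the model_* names are hypothetical, and the "deadline minus now" helper only illustrates one generic way a caller can end up with a negative timeout. Nothing in the thread says that this is what happened on Kyle's machine.

```c
#include <limits.h>
#include <stdio.h>

/* Stand-in for the kernel's MAX_SCHEDULE_TIMEOUT, which means "sleep forever". */
#define MODEL_MAX_SCHEDULE_TIMEOUT LONG_MAX

/*
 * Userspace model of the check at the top of schedule_timeout().
 * Returns 1 if the timeout would be accepted, 0 if it would hit the
 * "wrong timeout value" warning path.
 */
int model_timeout_is_valid(signed long timeout)
{
	if (timeout == MODEL_MAX_SCHEDULE_TIMEOUT)
		return 1;	/* indefinite sleep is a legal request */
	if (timeout < 0) {
		fprintf(stderr, "schedule_timeout: wrong timeout value %lx\n",
			(unsigned long)timeout);
		return 0;
	}
	return 1;
}

/*
 * Hypothetical caller-side arithmetic: time left until a deadline.
 * If "now" is already past "deadline", the result is negative.
 * (The unsigned-to-signed conversion is implementation-defined in C;
 * on the usual two's complement targets it gives the expected value.)
 */
signed long model_remaining(unsigned long deadline, unsigned long now)
{
	return (signed long)(deadline - now);
}
```

A caller that passes model_remaining(100, 150) straight through would trip the warning, which is why a stack dump identifying the caller is more useful than the bare return address.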
[RFC] [PATCH] Fix kmalloc flags used in ext3 with an active journal handle
Hi,

The attached patch converts the GFP mask for kmallocs within ext3 to GFP_NOFS whenever they are called with an active journal handle. More description in the patch.

Comments ?

Thanks,

Suzuki
Linux Technology Center
IBM Systems & Technology Labs.

* Fix the kmalloc flags used from within ext3, when we have an active journal handle

If we do a kmalloc with GFP_KERNEL on a system running low on memory, with an active journal handle, we might end up cleaning up the fs cache and flushing dirty inodes for some other filesystem. This would hit a J_ASSERT() in :

handle_t *journal_start(journal_t *journal, int nblocks)
{
	handle_t *handle = journal_current_handle();
	int err;
	[...]
	if (handle) {
		J_ASSERT(handle->h_transaction->t_journal == journal);

Here are the places where we do kmalloc or may end up doing kmalloc, with __GFP_FS (through GFP_KERNEL) from ext3, while holding a journal handle.

1) fs/ext3/xattr.c :: ext3_xattr_block_set() : 2 occurrences

2) fs/ext3/resize.c :: reserve_backup_gdb()

3) fs/ext3/resize.c :: add_new_gdb()

4) fs/ext3/acl.c :: ext3_init_acl() : There are quite a few points where we may end up calling kmalloc() from ext3_init_acl(), which is called with a handle from ext3_new_inode():

   a) Called directly within ext3_init_acl() as:
      clone = posix_acl_clone(acl, GFP_KERNEL);

   b) With the following code path:
      ext3_init_acl() -> ext3_get_acl() -> ext3_acl_from_disk() -> posix_acl_alloc(GFP_KERNEL)

   c) ext3_init_acl() -> ext3_get_acl() might also call kmalloc() directly.

5) fs/ext3/acl.c :: ext3_acl_to_disk(), which is called from ext3_set_acl().

Among these, 4.b & 4.c may be called either with or without a handle.

There was a similar issue reported sometime back, early this year.
http://lkml.org/lkml/2006/1/31/54

Attached patch fixes all the above invocations to make use of GFP_NOFS instead of GFP_KERNEL.
Signed-off-by: Suzuki K P <[EMAIL PROTECTED]>

Index: linux-2.6.20-rc1/fs/ext3/xattr.c
===
--- linux-2.6.20-rc1.orig/fs/ext3/xattr.c 2006-12-13 17:14:23.0 -0800
+++ linux-2.6.20-rc1/fs/ext3/xattr.c 2006-12-19 11:41:35.0 -0800
@@ -718,7 +718,7 @@
 				ce = NULL;
 			}
 			ea_bdebug(bs->bh, "cloning");
-			s->base = kmalloc(bs->bh->b_size, GFP_KERNEL);
+			s->base = kmalloc(bs->bh->b_size, GFP_NOFS);
 			error = -ENOMEM;
 			if (s->base == NULL)
 				goto cleanup;
@@ -730,7 +730,7 @@
 		}
 	} else {
 		/* Allocate a buffer where we construct the new block. */
-		s->base = kmalloc(sb->s_blocksize, GFP_KERNEL);
+		s->base = kmalloc(sb->s_blocksize, GFP_NOFS);
 		/* assert(header == s->base) */
 		error = -ENOMEM;
 		if (s->base == NULL)
Index: linux-2.6.20-rc1/fs/ext3/resize.c
===
--- linux-2.6.20-rc1.orig/fs/ext3/resize.c 2006-12-13 17:14:23.0 -0800
+++ linux-2.6.20-rc1/fs/ext3/resize.c 2006-12-19 11:42:39.0 -0800
@@ -440,7 +440,7 @@
 		goto exit_dindj;
 	n_group_desc = kmalloc((gdb_num + 1) * sizeof(struct buffer_head *),
-			GFP_KERNEL);
+			GFP_NOFS);
 	if (!n_group_desc) {
 		err = -ENOMEM;
 		ext3_warning (sb, __FUNCTION__,
@@ -524,7 +524,7 @@
 	int res, i;
 	int err;
-	primary = kmalloc(reserved_gdb * sizeof(*primary), GFP_KERNEL);
+	primary = kmalloc(reserved_gdb * sizeof(*primary), GFP_NOFS);
 	if (!primary)
 		return -ENOMEM;
Index: linux-2.6.20-rc1/fs/ext3/acl.c
===
--- linux-2.6.20-rc1.orig/fs/ext3/acl.c 2006-12-13 17:14:23.0 -0800
+++ linux-2.6.20-rc1/fs/ext3/acl.c 2006-12-19 11:45:35.0 -0800
@@ -37,7 +37,7 @@
 		return ERR_PTR(-EINVAL);
 	if (count == 0)
 		return NULL;
-	acl = posix_acl_alloc(count, GFP_KERNEL);
+	acl = posix_acl_alloc(count, GFP_NOFS);
 	if (!acl)
 		return ERR_PTR(-ENOMEM);
 	for (n=0; n < count; n++) {
@@ -91,7 +91,7 @@
 	*size = ext3_acl_size(acl->a_count);
 	ext_acl = kmalloc(sizeof(ext3_acl_header) + acl->a_count *
-			sizeof(ext3_acl_entry), GFP_KERNEL);
+			sizeof(ext3_acl_entry), GFP_NOFS);
 	if (!ext_acl)
 		return ERR_PTR(-ENOMEM);
 	ext_acl->a_version = cpu_to_le32(EXT3_ACL_VERSION);
@@ -187,7 +187,7 @@
 	}
 	retval = ext3_xattr_get(inode, name_index, "", NULL, 0);
 	if (retval > 0) {
-		value = kmalloc(retval, GFP_KERNEL);
+		value = kmalloc(retval, GFP_NOFS);
 		if (!value)
 			return ERR_PTR(-ENOMEM);
 		retval = ext3_xattr_get(inode, name_index, "", value, retval);
@@ -335,7 +335,7 @@
 		if (error)
 			goto cleanup;
 	}
-	clone = posix_acl_clone(acl, GFP_KERNEL);
+	clone = posix_acl_clone(acl, GFP_NOFS);
 	error = -ENOMEM;
 	if (!clone)
 		goto cleanup;
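The difference between the two masks in the patch above can be sketched in plain userspace C. This is only a model: the MODEL_* bit values are illustrative rather than copied from any particular kernel's include/linux/gfp.h, and model_pick_gfp() is a hypothetical helper, not something the patch adds. The one property taken from the posting itself is that GFP_KERNEL carries __GFP_FS and GFP_NOFS does not.

```c
/*
 * Illustrative bit values only; the real definitions live in
 * include/linux/gfp.h and may differ between kernel versions.
 */
#define MODEL_GFP_WAIT	0x10u	/* allocator may sleep */
#define MODEL_GFP_IO	0x40u	/* allocator may start physical I/O */
#define MODEL_GFP_FS	0x80u	/* allocator may call back into filesystem code */

#define MODEL_GFP_NOFS		(MODEL_GFP_WAIT | MODEL_GFP_IO)
#define MODEL_GFP_KERNEL	(MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)

/* Nonzero if an allocation with this mask may re-enter a filesystem during reclaim. */
int model_may_enter_fs(unsigned int gfp_mask)
{
	return (gfp_mask & MODEL_GFP_FS) != 0;
}

/*
 * Hypothetical helper: choose the mask depending on whether the caller
 * currently holds a journal handle. With a handle held, re-entering
 * another filesystem's journal_start() is what trips the J_ASSERT().
 */
unsigned int model_pick_gfp(int have_journal_handle)
{
	return have_journal_handle ? MODEL_GFP_NOFS : MODEL_GFP_KERNEL;
}
```

The patch itself takes the simpler route of using GFP_NOFS unconditionally at each listed call site, which also covers the paths that can be reached both with and without a handle.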
Re: [RFC] HZ free ntp
On Tue, 2006-12-19 at 17:32 -0800, john stultz wrote:
> On Wed, 2006-12-13 at 21:40 +0100, Roman Zippel wrote:
> > On Wed, 13 Dec 2006, john stultz wrote:
> > > > You don't have to introduce anything new, it's tick_length that changes
> > > > and HZ that becomes a variable in this function.
> > >
> > > So, forgive me for rehashing this, but it seems we're cross talking
> > > again. The context here is the dynticks code. Where HZ doesn't change,
> > > but we get interrupts at much reduced rates.
> >
> > I know and all you have to change in the ntp and some related code is to
> > replace HZ there with a variable, thus make it changable, so you can
> > increase the update interval (i.e. it becomes 1s/hz instead of 1s/HZ).
>
> Untested patch below. Does this vibe better with you are suggesting?

And here would be the follow on patch (again *untested*) for CONFIG_NO_HZ slowing the time accumulation down to once per second.

thanks
-john

diff --git a/include/linux/timex.h b/include/linux/timex.h
index 8241e6e..3beb539 100644
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -286,7 +286,11 @@ #endif /* !CONFIG_TIME_INTERPOLATION */
 #define TICK_LENGTH_SHIFT 32
+#ifdef CONFIG_NO_HZ
+#define NTP_INTERVAL_FREQ (1)
+#else
 #define NTP_INTERVAL_FREQ (HZ)
+#endif
 #define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ)
 /* Returns how long ticks are at present, in ns / 2^(SHIFT_SCALE-10). */
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index d0ba190..53979a9 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -127,12 +127,14 @@ EXPORT_SYMBOL_GPL(ktime_get_ts);
  */
 static void hrtimer_get_softirq_time(struct hrtimer_base *base)
 {
+	struct timespec ts;
 	ktime_t xtim, tomono;
 	unsigned long seq;
 	do {
 		seq = read_seqbegin(&xtime_lock);
-		xtim = timespec_to_ktime(xtime);
+		getnstimeofday(&ts);
+		xtim = timespec_to_ktime(ts);
 		tomono = timespec_to_ktime(wall_to_monotonic);
 	} while (read_seqretry(&xtime_lock, seq));
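To make the arithmetic behind NTP_INTERVAL_FREQ concrete, here is a small userspace sketch. The model_* names are hypothetical and the functions are not kernel code; they just mirror the NSEC_PER_SEC/NTP_INTERVAL_FREQ definition from the patch and the "shift left by TICK_LENGTH_SHIFT, then divide by the interval frequency" step used for tick_length_base in the earlier patch of this thread.

```c
#include <stdint.h>

#define MODEL_NSEC_PER_SEC	1000000000L
#define MODEL_TICK_LENGTH_SHIFT	32

/*
 * Nanoseconds covered by one NTP accumulation interval, mirroring
 * NTP_INTERVAL_LENGTH = NSEC_PER_SEC / NTP_INTERVAL_FREQ.
 */
long model_ntp_interval_length(long interval_freq)
{
	return MODEL_NSEC_PER_SEC / interval_freq;
}

/*
 * Fixed-point length of one interval: the length of a second in ns,
 * scaled up by 2^32 and divided by the interval frequency.
 */
uint64_t model_tick_length_base(uint64_t second_length_ns, unsigned int interval_freq)
{
	return (second_length_ns << MODEL_TICK_LENGTH_SHIFT) / interval_freq;
}
```

With an interval frequency of 250 (a common HZ value) each interval is 4,000,000 ns; with the CONFIG_NO_HZ value of 1 the interval is a full second, which is the "once per second" accumulation described in the message above.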
Re: [PATCH] powerpc iseries link error in allmodconfig
On Tue, 19 Dec 2006 15:57:19 + David Woodhouse <[EMAIL PROTECTED]> wrote:
>
> On Wed, 2006-11-08 at 09:34 -0800, Judith Lebzelter wrote:
> > Choose rpa_vscsi.c over iseries_vscsi.c when building both
> > pseries and iseries.
>
> Would it not be better to make them both work instead?

The maintainer's take on this is that no one installs onto vscsi disks on legacy iSeries.

> Untested-but-otherwise-Signed-off-by: David Woodhouse <[EMAIL PROTECTED]>

And that will, unfortunately, never get into 2.6.20. I suggest that we put the simpler patch into 2.6.20 and maybe revisit this afterwards if we think it is worth the effort.

-- 
Cheers,
Stephen Rothwell [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/
Re: [RFC] HZ free ntp
On Wed, 2006-12-13 at 21:40 +0100, Roman Zippel wrote: > On Wed, 13 Dec 2006, john stultz wrote: > > > You cannot choose arbitrary intervals otherwise you get other problems, > > > e.g. with your patch time_offset handling is broken. > > > > I'm not seeing this yet. Any more details? > > time_offset is scaled to HZ in do_adjtimex, which needs to be changed as > well. Ah, thanks! Fixed. > > > You don't have to introduce anything new, it's tick_length that changes > > > and HZ that becomes a variable in this function. > > > > So, forgive me for rehashing this, but it seems we're cross talking > > again. The context here is the dynticks code. Where HZ doesn't change, > > but we get interrupts at much reduced rates. > > I know and all you have to change in the ntp and some related code is to > replace HZ there with a variable, thus make it changable, so you can > increase the update interval (i.e. it becomes 1s/hz instead of 1s/HZ). Untested patch below. Does this vibe better with you are suggesting? Any other suggestions or feedback? thanks -john diff --git a/include/linux/timex.h b/include/linux/timex.h index db501dc..8241e6e 100644 --- a/include/linux/timex.h +++ b/include/linux/timex.h @@ -286,6 +286,9 @@ #endif /* !CONFIG_TIME_INTERPOLATION */ #define TICK_LENGTH_SHIFT 32 +#define NTP_INTERVAL_FREQ (HZ) +#define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ) + /* Returns how long ticks are at present, in ns / 2^(SHIFT_SCALE-10). 
*/ extern u64 current_tick_length(void); diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index 3afeaa3..eb12509 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -24,7 +24,7 @@ static u64 tick_length, tick_length_base #define MAX_TICKADJ500 /* microsecs */ #define MAX_TICKADJ_SCALED (((u64)(MAX_TICKADJ * NSEC_PER_USEC) << \ - TICK_LENGTH_SHIFT) / HZ) + TICK_LENGTH_SHIFT) / NTP_INTERVAL_FREQ) /* * phase-lock loop variables @@ -46,13 +46,17 @@ #define CLOCK_TICK_ADJUST (((s64)CLOCK_T static void ntp_update_frequency(void) { - tick_length_base = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) << TICK_LENGTH_SHIFT; - tick_length_base += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT; - tick_length_base += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC); + u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) + << TICK_LENGTH_SHIFT; + second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT; + second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC); - do_div(tick_length_base, HZ); + tick_length_base = second_length; - tick_nsec = tick_length_base >> TICK_LENGTH_SHIFT; + do_div(second_length, HZ); + tick_nsec = second_length >> TICK_LENGTH_SHIFT; + + do_div(tick_length_base, NTP_INTERVAL_FREQ); } /** @@ -162,7 +166,7 @@ void second_overflow(void) tick_length -= MAX_TICKADJ_SCALED; } else { tick_length += (s64)(time_adjust * NSEC_PER_USEC / -HZ) << TICK_LENGTH_SHIFT; + NTP_INTERVAL_FREQ) << TICK_LENGTH_SHIFT; time_adjust = 0; } } @@ -239,7 +243,8 @@ #endif result = -EINVAL; goto leave; } - time_freq = ((s64)txc->freq * NSEC_PER_USEC) >> (SHIFT_USEC - SHIFT_NSEC); + time_freq = ((s64)txc->freq * NSEC_PER_USEC) + >> (SHIFT_USEC - SHIFT_NSEC); } if (txc->modes & ADJ_MAXERROR) { @@ -309,7 +314,8 @@ #endif freq_adj += time_freq; freq_adj = min(freq_adj, (s64)MAXFREQ_NSEC); time_freq = max(freq_adj, (s64)-MAXFREQ_NSEC); - time_offset = (time_offset / HZ) << SHIFT_UPDATE; + time_offset = (time_offset / NTP_INTERVAL_FREQ) + << SHIFT_UPDATE; } /* 
STA_PLL */ } /* txc->modes & ADJ_OFFSET */ if (txc->modes & ADJ_TICK) @@ -324,8 +330,10 @@ leave: if ((time_status & (STA_UNSYNC|ST if ((txc->modes & ADJ_OFFSET_SINGLESHOT) == ADJ_OFFSET_SINGLESHOT) txc->offset= save_adjust; else - txc->offset= shift_right(time_offset, SHIFT_UPDATE) * HZ / 1000; - txc->freq = (time_freq / NSEC_PER_USEC) << (SHIFT_USEC - SHIFT_NSEC); + txc->offset= shift_right(time_offset, SHIFT_UPDATE) + * NTP_INTERVAL_FREQ / 1000; + txc->freq = (time_freq / NSEC_PER_USEC) + << (SHIFT_USEC - SHIFT_NSEC); txc->maxerror = time_maxerror; txc->esterror = time_esterror; txc->status
[patch 1/4] Add
Hello, what about something along the lines of the following, on top of your patch ? Or should the kernel-doc be put on another function instead of that one ? -- Vincent Legoll Add do_syslog() kernel-doc --- commit 95b0721d8b4b46ddf83113fe49492810d7d92060 tree e2715a8cf7eb0d71b3bee2185a5cf98639d79d90 parent de794d2dfd6dd0c38dd552020ac00c46e1df5293 author Vincent Legoll <[EMAIL PROTECTED]> Wed, 20 Dec 2006 01:29:34 +0100 committer Vincent Legoll <[EMAIL PROTECTED]> Wed, 20 Dec 2006 01:29:34 +0100 kernel/printk.c | 11 ++- 1 files changed, 10 insertions(+), 1 deletions(-) diff --git a/kernel/printk.c b/kernel/printk.c index 232467e..5416d07 100644 --- a/kernel/printk.c +++ b/kernel/printk.c @@ -164,7 +164,16 @@ out: __setup("log_buf_len=", log_buf_len_setup); -/* See linux/klog.h for the command numbers passed as the first argument. */ +/** + * do_syslog - operate on kernel messages log + * @type: operation to perform + * @buf: user-space buffer to copy data into + * @len: length of data to copy from log into @buf + * + * See include/linux/klog.h for the command numbers passed as @type. + * Parameters @buf & @len are only used for operations of type %KLOG_READ, + * %KLOG_READ_HIST and %KLOG_READ_CLEAR_HIST. + */ int do_syslog(int type, char __user *buf, int len) { unsigned long i, j, limit, count;
Re: [patch] hrtimers: add state tracking, fix
On 19.12.2006 at 20:56, Ingo Molnar wrote:
> thanks for the report - this made me review the hrtimer state engine
> logic, and bingo, it indeed has a nasty typo! Could you try the fix
> below, does it fix your problem? It might explain the crash you are
> seeing, because the typo means we'd ignore HRTIMER_STATE_PENDING state
> (which is rare but possible).

Ok, the machine has been running for a couple of hours with that patch and so far hasn't frozen again. I'll watch it some more but it looks like your patch did indeed fix my problem.

Thanks
Tilman

> -->
> Subject: [patch] hrtimers: add state tracking, fix
> From: Ingo Molnar <[EMAIL PROTECTED]>
>
> fix bug in hrtimer_is_queued(), introduced by a cleanup during
> the recent refactoring.
>
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
> ---
>  kernel/hrtimer.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux/kernel/hrtimer.c
> ===
> --- linux.orig/kernel/hrtimer.c
> +++ linux/kernel/hrtimer.c
> @@ -157,7 +157,7 @@ static void hrtimer_get_softirq_time(str
>  static inline int hrtimer_is_queued(struct hrtimer *timer)
>  {
>  	return timer->state &
> -		(HRTIMER_STATE_ENQUEUED || HRTIMER_STATE_PENDING);
> +		(HRTIMER_STATE_ENQUEUED | HRTIMER_STATE_PENDING);
>  }
>
>  /*

-- 
Tilman Schmidt E-Mail: [EMAIL PROTECTED]
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)
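The one-character fix quoted above is easy to misread, so here is a standalone sketch of why it matters. The MODEL_STATE_* values are assumptions chosen for illustration (single-bit flags with ENQUEUED in bit 0); check include/linux/hrtimer.h for the real definitions. The model_* function names are likewise hypothetical.

```c
/* Assumed single-bit state flags, for illustration only. */
#define MODEL_STATE_INACTIVE	0x00ul
#define MODEL_STATE_ENQUEUED	0x01ul
#define MODEL_STATE_CALLBACK	0x02ul
#define MODEL_STATE_PENDING	0x04ul

/*
 * Buggy form: (A || B) is a logical OR of two nonzero constants, so it
 * evaluates to the integer 1 and only bit 0 of state is ever tested.
 */
int model_is_queued_buggy(unsigned long state)
{
	return (state & (MODEL_STATE_ENQUEUED || MODEL_STATE_PENDING)) != 0;
}

/*
 * Fixed form: (A | B) is a bitwise OR, giving the mask 0x05, so a timer
 * in either the ENQUEUED or the PENDING state counts as queued.
 */
int model_is_queued_fixed(unsigned long state)
{
	return (state & (MODEL_STATE_ENQUEUED | MODEL_STATE_PENDING)) != 0;
}
```

Under these assumed values the buggy version still reports enqueued timers correctly and only misses the pending case, which would fit Ingo's remark that the failure is rare but possible. Some compilers warn about a logical operator applied to constant operands, which is one way such a typo can be caught early.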
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Mon, 18 Dec 2006 09:48:01 -0700 [EMAIL PROTECTED] (Eric W. Biederman) wrote: > [EMAIL PROTECTED] writes: > > > http://bugzilla.kernel.org/show_bug.cgi?id=7505 > > > > --- Additional Comments From [EMAIL PROTECTED] 2006-12-18 07:39 --- > > OK, fixed. > > > Greg. > > It appears commit d71374dafbba7ec3f67371d3b7e9f6310a588808 which > replaced the pci bus spinlock with a semaphore causes some systems not > to boot. I haven't a clue why. > > So I figure I would toss the ball over to your court to see if you can > look and see what needs to happen to resolve this problem. > > There appears to be at least one positive confirmation that reverting > this patch allows this patch fixes the problems. > That's weird. Quoting the bug report: There are output from kernel with enabled 'earlyprintk' option. Linux version 2.6.19-rc5 ([EMAIL PROTECTED]) (gcc version 4.1.2 20060901 (prerelease) (Debian 4.1.1-13)) #2 PREEMPT Sat Nov 11 16:04:00 MSK 2006 Command line: BOOT_IMAGE=Linux-bug ro root=303 video=radeonfb:mode:[EMAIL PROTECTED] idebus=66 earlyprintk=serial,ttyS0,9600,keep BIOS-provided physical RAM map: BIOS-e820: - 0009f800 (usable) BIOS-e820: 0009f800 - 000a (reserved) BIOS-e820: 000f - 0010 (reserved) BIOS-e820: 0010 - 1fff (usable) BIOS-e820: 1fff - 1fff3000 (ACPI NVS) BIOS-e820: 1fff3000 - 2000 (ACPI data) BIOS-e820: e000 - f000 (reserved) BIOS-e820: fec0 - 0001 (reserved) end_pfn_map = 1048576 kernel direct mapping tables up to 1 @ 8000-d000 DMI 2.2 present. Zone PFN ranges: DMA 0 -> 4096 DMA324096 -> 1048576 Normal1048576 -> 1048576 early_node_map[2] active PFN ranges 0:0 -> 159 0: 256 -> 131056 Nvidia board detected. Ignoring ACPI timer override. 
ACPI: PM-Timer IO Port: 0x4008 ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) Processor #0 (Bootup-CPU) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] disabled) ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0]) IOAPIC[0]: apic_id 2, address 0xfec0, GSI 0-23 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: BIOS IRQ0 pin2 override ignored. ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge) ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge) Setting APIC routing to flat Using ACPI (MADT) for SMP configuration information Nosave address range: 0009f000 - 000a Nosave address range: 000a - 000f Nosave address range: 000f - 0010 Allocating PCI resources starting at 3000 (gap: 2000:c000) Built 1 zonelists. Total pages: 128336 Kernel command line: BOOT_IMAGE=Linux-bug ro root=303 video=radeonfb:mode:[EMAIL PROTECTED] idebus=66 earlyprintk=serial,ttyS0,9600,keep ide_setup: idebus=66 Initializing CPU#0 general protection fault: 013b [1] PREEMPT CPU 0 Modules linked in: Pid: 0, comm: swapper Not tainted 2.6.19-rc5 #2 RIP: 0010:[] [] init_8259A+0xb6/0xf0 RSP: 0018:803cdf68 EFLAGS: 00010246 RAX: 00ff RBX: 0246 RCX: b4fcb55f RDX: 0011 RSI: 8013cf40 RDI: 0199 RBP: R08: R09: R10: 0001 R11: 0070 R12: R13: R14: R15: FS: () GS:803c() knlGS: CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b CR2: 00f0aed9 CR3: 00101000 CR4: 06a0 Process swapper (pid: 0, threadinfo 803cc000, task 80360360) Stack: 803d3a46 800089360a40206f 0009 0008e000 803d3ab9 803ddd99 0009 803cf65a 0009 Call Trace: [] init_ISA_irqs+0x16/0x80 [] init_IRQ+0x9/0x1e0 [] rcu_cpu_notify+0x49/0x60 [] start_kernel+0xda/0x1f0 [] _sinittext+0x146/0x150 I assume we went splat in start_kernel->trap_init->cpu_init. We shouldn't have touched pci_bus_lock that early? Perhaps acpi does PCI things very early.. 
Conceivably an accidental early local_irq_enable could cause bad things, but that rwsem should be 100% uncontended. Could the reporters please determine whether disabling the various CONFIG_DEBUG_* options prevents this? Such as CONFIG_DEBUG_LOCKDEP, CONFIG_DEBUG_LOCK_ALLOC, CONFIG_PROVE_LOCKING, etc? Also, some additional oops traces would be nice, if we can get them. (Please do reply-to-all via email from now on, rather than using the bugzilla UI).
[PATCH] NFS: Kill the obsolete NFS_PARANOIA
Linus, This patch has been both compile and run-time tested. It has been in -mm for quite a while without problems. Trond & Andrew have both signed off on it. Please apply. Remove obsolete NFS_PARANOIA. Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> Acked-by: Trond Myklebust <[EMAIL PROTECTED]> --- fs/nfs/dir.c | 17 ++--- fs/nfs/getroot.c |1 - fs/nfs/inode.c|3 --- fs/nfs/nfs2xdr.c |1 - fs/nfs/pagelist.c |7 --- 5 files changed, 2 insertions(+), 27 deletions(-) diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index dee3d6c..8b71075 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -38,7 +38,6 @@ #include "nfs4_fs.h" #include "delegation.h" #include "iostat.h" -#define NFS_PARANOIA 1 /* #define NFS_DEBUG_VERBOSE 1 */ static int nfs_opendir(struct inode *, struct file *); @@ -1322,11 +1321,6 @@ static int nfs_sillyrename(struct inode atomic_read(>d_count)); nfs_inc_stats(dir, NFSIOS_SILLYRENAME); -#ifdef NFS_PARANOIA -if (!dentry->d_inode) -printk("NFS: silly-renaming %s/%s, negative dentry??\n", -dentry->d_parent->d_name.name, dentry->d_name.name); -#endif /* * We don't allow a dentry to be silly-renamed twice. */ @@ -1641,16 +1635,9 @@ static int nfs_rename(struct inode *old_ new_inode = NULL; /* instantiate the replacement target */ d_instantiate(new_dentry, NULL); - } else if (atomic_read(_dentry->d_count) > 1) { - /* dentry still busy? */ -#ifdef NFS_PARANOIA - printk("nfs_rename: target %s/%s busy, d_count=%d\n", - new_dentry->d_parent->d_name.name, - new_dentry->d_name.name, - atomic_read(_dentry->d_count)); -#endif + } else if (atomic_read(_dentry->d_count) > 1) + /* dentry still busy? 
*/ goto out; - } } else drop_nlink(new_inode); diff --git a/fs/nfs/getroot.c b/fs/nfs/getroot.c index 8391bd7..4dc193f 100644 --- a/fs/nfs/getroot.c +++ b/fs/nfs/getroot.c @@ -42,7 +42,6 @@ #include "delegation.h" #include "internal.h" #define NFSDBG_FACILITYNFSDBG_CLIENT -#define NFS_PARANOIA 1 /* * get an NFS2/NFS3 root dentry from the root filehandle diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index 63e4702..d29dfe0 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -48,7 +48,6 @@ #include "iostat.h" #include "internal.h" #define NFSDBG_FACILITYNFSDBG_VFS -#define NFS_PARANOIA 1 static void nfs_invalidate_inode(struct inode *); static int nfs_update_inode(struct inode *, struct nfs_fattr *); @@ -1022,10 +1021,8 @@ static int nfs_update_inode(struct inode /* * Big trouble! The inode has become a different object. */ -#ifdef NFS_PARANOIA printk(KERN_DEBUG "%s: inode %ld mode changed, %07o to %07o\n", __FUNCTION__, inode->i_ino, inode->i_mode, fattr->mode); -#endif out_err: /* * No need to worry about unhashing the dentry, as the diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c index 3be4e72..1fc757b 100644 --- a/fs/nfs/nfs2xdr.c +++ b/fs/nfs/nfs2xdr.c @@ -26,7 +26,6 @@ #include #include "internal.h" #define NFSDBG_FACILITYNFSDBG_XDR -/* #define NFS_PARANOIA 1 */ /* Mapping from NFS error code to "errno" error code. 
*/ #define errno_NFSERR_IOEIO diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c index ca4b1d4..7e32bf3 100644 --- a/fs/nfs/pagelist.c +++ b/fs/nfs/pagelist.c @@ -19,8 +19,6 @@ #include #include #include -#define NFS_PARANOIA 1 - static struct kmem_cache *nfs_page_cachep; static inline struct nfs_page * @@ -172,11 +170,6 @@ nfs_release_request(struct nfs_page *req if (!atomic_dec_and_test(>wb_count)) return; -#ifdef NFS_PARANOIA - BUG_ON (!list_empty(>wb_list)); - BUG_ON (NFS_WBACK_BUSY(req)); -#endif - /* Release struct file or cached credential */ nfs_clear_request(req); put_nfs_open_context(req->wb_context); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 5/5][time][x86_64] Re-enable vsyscall support for x86_64
Cleanup and re-enable vsyscall gettimeofday using the generic clocksource infrastructure. Signed-off-by: John Stultz <[EMAIL PROTECTED]> arch/x86_64/Kconfig |4 + arch/x86_64/kernel/hpet.c|6 + arch/x86_64/kernel/time.c|6 - arch/x86_64/kernel/tsc.c |7 ++ arch/x86_64/kernel/vmlinux.lds.S | 28 +++-- arch/x86_64/kernel/vsyscall.c| 121 +++ include/asm-x86_64/proto.h |2 include/asm-x86_64/timex.h |1 include/asm-x86_64/vsyscall.h| 33 +- 9 files changed, 105 insertions(+), 103 deletions(-) linux-2.6.20-rc1_timeofday-arch-x86-64-vsyscall-reenablement_C7.patch diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig index e1d044c..98b11c6 100644 --- a/arch/x86_64/Kconfig +++ b/arch/x86_64/Kconfig @@ -28,6 +28,10 @@ config GENERIC_TIME bool default y +config GENERIC_TIME_VSYSCALL + bool + default y + config ZONE_DMA32 bool default y diff --git a/arch/x86_64/kernel/hpet.c b/arch/x86_64/kernel/hpet.c index 74d95d0..cd834cc 100644 --- a/arch/x86_64/kernel/hpet.c +++ b/arch/x86_64/kernel/hpet.c @@ -442,6 +442,11 @@ static cycle_t read_hpet(void) return (cycle_t)readl(hpet_ptr); } +static cycle_t __vsyscall_fn vread_hpet(void) +{ + return (cycle_t)readl((void *)fix_to_virt(VSYSCALL_HPET) + 0xf0); +} + struct clocksource clocksource_hpet = { .name = "hpet", .rating = 250, @@ -450,6 +455,7 @@ struct clocksource clocksource_hpet = { .mult = 0, /* set below */ .shift = HPET_SHIFT, .is_continuous = 1, + .vread = vread_hpet, }; static int __init init_hpet_clocksource(void) diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c index 4bc737c..17bb7de 100644 --- a/arch/x86_64/kernel/time.c +++ b/arch/x86_64/kernel/time.c @@ -53,13 +53,7 @@ DEFINE_SPINLOCK(rtc_lock); EXPORT_SYMBOL(rtc_lock); DEFINE_SPINLOCK(i8253_lock); -unsigned long vxtime_hz = PIT_TICK_RATE; - -struct vxtime_data __vxtime __section_vxtime; /* for vsyscalls */ - volatile unsigned long __jiffies __section_jiffies = INITIAL_JIFFIES; -struct timespec __xtime __section_xtime; -struct timezone __sys_tz 
__section_sys_tz; unsigned long profile_pc(struct pt_regs *regs) { diff --git a/arch/x86_64/kernel/tsc.c b/arch/x86_64/kernel/tsc.c index 958ec0a..f16733e 100644 --- a/arch/x86_64/kernel/tsc.c +++ b/arch/x86_64/kernel/tsc.c @@ -185,6 +185,12 @@ static cycle_t read_tsc(void) return ret; } +static cycle_t __vsyscall_fn vread_tsc(void) +{ + cycle_t ret = (cycle_t)get_cycles_sync(); + return ret; +} + static struct clocksource clocksource_tsc = { .name = "tsc", .rating = 300, @@ -194,6 +200,7 @@ static struct clocksource clocksource_ts .shift = 22, .update_callback= tsc_update_callback, .is_continuous = 1, + .vread = vread_tsc, }; static int tsc_update_callback(void) diff --git a/arch/x86_64/kernel/vmlinux.lds.S b/arch/x86_64/kernel/vmlinux.lds.S index 1e54ddf..adb4263 100644 --- a/arch/x86_64/kernel/vmlinux.lds.S +++ b/arch/x86_64/kernel/vmlinux.lds.S @@ -88,31 +88,25 @@ #define VVIRT(x) (ADDR(x) - VVIRT_OFFSET __vsyscall_0 = VSYSCALL_VIRT_ADDR; . = ALIGN(CONFIG_X86_L1_CACHE_BYTES); - .xtime_lock : AT(VLOAD(.xtime_lock)) { *(.xtime_lock) } - xtime_lock = VVIRT(.xtime_lock); - - .vxtime : AT(VLOAD(.vxtime)) { *(.vxtime) } - vxtime = VVIRT(.vxtime); + .vsyscall_fn : AT(VLOAD(.vsyscall_fn)) { *(.vsyscall_fn) } + . = ALIGN(CONFIG_X86_L1_CACHE_BYTES); + .vsyscall_gtod_data : AT(VLOAD(.vsyscall_gtod_data)) + { *(.vsyscall_gtod_data) } + vsyscall_gtod_data = VVIRT(.vsyscall_gtod_data); .vgetcpu_mode : AT(VLOAD(.vgetcpu_mode)) { *(.vgetcpu_mode) } vgetcpu_mode = VVIRT(.vgetcpu_mode); - .sys_tz : AT(VLOAD(.sys_tz)) { *(.sys_tz) } - sys_tz = VVIRT(.sys_tz); - - .sysctl_vsyscall : AT(VLOAD(.sysctl_vsyscall)) { *(.sysctl_vsyscall) } - sysctl_vsyscall = VVIRT(.sysctl_vsyscall); - - .xtime : AT(VLOAD(.xtime)) { *(.xtime) } - xtime = VVIRT(.xtime); - . 
= ALIGN(CONFIG_X86_L1_CACHE_BYTES); .jiffies : AT(VLOAD(.jiffies)) { *(.jiffies) } jiffies = VVIRT(.jiffies); - .vsyscall_1 ADDR(.vsyscall_0) + 1024: AT(VLOAD(.vsyscall_1)) { *(.vsyscall_1) } - .vsyscall_2 ADDR(.vsyscall_0) + 2048: AT(VLOAD(.vsyscall_2)) { *(.vsyscall_2) } - .vsyscall_3 ADDR(.vsyscall_0) + 3072: AT(VLOAD(.vsyscall_3)) { *(.vsyscall_3) } + .vsyscall_1 ADDR(.vsyscall_0) + 1024: AT(VLOAD(.vsyscall_1)) + { *(.vsyscall_1) } + .vsyscall_2 ADDR(.vsyscall_0) + 2048: AT(VLOAD(.vsyscall_2)) + { *(.vsyscall_2) } + .vsyscall_3 ADDR(.vsyscall_0) + 3072: AT(VLOAD(.vsyscall_3)) + { *(.vsyscall_3) } .
Re: 2.6.18 mmap hangs unrelated apps
On Tue, 2006-12-19 at 19:17 -0500, Trond Myklebust wrote: > Ack, I'll add one in. If PagePrivate() is set during the call to > try_to_release_page(), then the page should never be freeable. OK. This one actually compiles, and eliminates a few logic bugs. Note that I renamed the callback to ->launder_page() for clarity (and for histerical reasons). Cheers Trond commit 85a5b844c56706a5e3f47cde8b82109d325ad609 Author: Trond Myklebust <[EMAIL PROTECTED]> Date: Tue Dec 19 20:18:55 2006 -0500 NFS: Fix race in nfs_release_page() invalidate_inode_pages2() may find the dirty bit has been set on a page owing to the fact that the page may still be mapped after it was locked. Only after the call to unmap_mapping_range() are we sure that the page can no longer be dirtied. In order to fix this, NFS has hooked the releasepage() method and tries to write the page out between the call to unmap_mapping_range() and the call to remove_mapping(). This, however leads to deadlocks in the page reclaim code, where the page may be locked without holding a reference to the inode or dentry. Fix is to add a new address_space_operation, launder_page(), which will attempt to write out a dirty page without releasing the page lock. 
Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]> --- Documentation/filesystems/Locking |8 fs/nfs/file.c | 16 include/linux/fs.h|1 + mm/truncate.c | 23 ++- 4 files changed, 35 insertions(+), 13 deletions(-) diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 790ef6f..28bfea7 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -171,6 +171,7 @@ prototypes: int (*releasepage) (struct page *, int); int (*direct_IO)(int, struct kiocb *, const struct iovec *iov, loff_t offset, unsigned long nr_segs); + int (*launder_page) (struct page *); locking rules: All except set_page_dirty may block @@ -188,6 +189,7 @@ bmap: yes invalidatepage:no yes releasepage: no yes direct_IO: no +launder_page: no yes ->prepare_write(), ->commit_write(), ->sync_page() and ->readpage() may be called from the request handler (/dev/loop). @@ -281,6 +283,12 @@ buffers from the page in preparation for indicate that the buffers are (or may be) freeable. If ->releasepage is zero, the kernel assumes that the fs has no private interest in the buffers. + ->launder_page() may be called prior to releasing a page if +it is still found to be dirty. It returns zero if the page was successfully +cleaned, or an error value if not. Note that in order to prevent the page +getting mapped back in and redirtied, it needs to be kept locked +across the entire operation. + Note: currently almost all instances of address_space methods are using BKL for internal serialization and that's one of the worst sources of contention. Normally they are calling library functions (in fs/buffer.c) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 0dd6be3..fab20d0 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -315,14 +315,13 @@ static void nfs_invalidate_page(struct p static int nfs_release_page(struct page *page, gfp_t gfp) { - /* -* Avoid deadlock on nfs_wait_on_request(). -*/ - if (!(gfp & __GFP_FS)) - return 0; - /* Hack... 
Force nfs_wb_page() to write out the page */ - SetPageDirty(page); - return !nfs_wb_page(page->mapping->host, page); + /* If PagePrivate() is set, then the page is not freeable */ + return 0; +} + +static int nfs_launder_page(struct page *page) +{ + return nfs_wb_page(page->mapping->host, page); } const struct address_space_operations nfs_file_aops = { @@ -338,6 +337,7 @@ const struct address_space_operations nf #ifdef CONFIG_NFS_DIRECTIO .direct_IO = nfs_direct_IO, #endif + .launder_page = nfs_launder_page, }; static ssize_t nfs_file_write(struct kiocb *iocb, const struct iovec *iov, diff --git a/include/linux/fs.h b/include/linux/fs.h index 186da81..14a337c 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -426,6 +426,7 @@ struct address_space_operations { /* migrate the contents of a page to the specified target */ int (*migratepage) (struct address_space *, struct page *, struct page *); + int (*launder_page) (struct page *); }; struct backing_dev_info; diff --git a/mm/truncate.c b/mm/truncate.c index 9bfb8e8..d4811dc 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -321,6 +321,16 @@ failed: return 0; } +static int +do_launder_page(struct address_space *mapping, struct
[PATCH 2/5][time][x86_64] hpet_address cleanup
In preparation for supporting generic timekeeping, this patch cleans up x86-64's use of vxtime.hpet_address, changing it to just hpet_address as is also used in i386. This is necessary since the vxtime structure will be going away. Signed-off-by: John Stultz <[EMAIL PROTECTED]> arch/i386/kernel/acpi/boot.c | 23 ++- arch/x86_64/kernel/apic.c|3 ++- arch/x86_64/kernel/time.c| 36 +++- include/asm-x86_64/hpet.h|1 + 4 files changed, 28 insertions(+), 35 deletions(-) linux-2.6.20-rc1_timeofday-arch-x86-64-hpet-address-cleanup_C7.patch diff --git a/arch/i386/kernel/acpi/boot.c b/arch/i386/kernel/acpi/boot.c index c8f96cf..464f95b 100644 --- a/arch/i386/kernel/acpi/boot.c +++ b/arch/i386/kernel/acpi/boot.c @@ -638,6 +638,7 @@ static int __init acpi_parse_sbf(unsigne } #ifdef CONFIG_HPET_TIMER +#include static int __init acpi_parse_hpet(unsigned long phys, unsigned long size) { @@ -671,32 +672,20 @@ #define HPET_RESOURCE_NAME_SIZE 9 hpet_res->end = (1 * 1024) - 1; } + hpet_address = hpet_tbl->addr.addrl; #ifdef CONFIG_X86_64 - vxtime.hpet_address = hpet_tbl->addr.addrl | - ((long)hpet_tbl->addr.addrh << 32); - + hpet_address |= ((long)hpet_tbl->addr.addrh << 32); +#endif printk(KERN_INFO PREFIX "HPET id: %#x base: %#lx\n", - hpet_tbl->id, vxtime.hpet_address); - - res_start = vxtime.hpet_address; -#else /* X86 */ - { - extern unsigned long hpet_address; + hpet_tbl->id, hpet_address); - hpet_address = hpet_tbl->addr.addrl; - printk(KERN_INFO PREFIX "HPET id: %#x base: %#lx\n", - hpet_tbl->id, hpet_address); - - res_start = hpet_address; - } -#endif /* X86 */ + res_start = hpet_address; if (hpet_res) { hpet_res->start = res_start; hpet_res->end += res_start; insert_resource(_resource, hpet_res); } - return 0; } #else diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c index 124b2d2..7ce7797 100644 --- a/arch/x86_64/kernel/apic.c +++ b/arch/x86_64/kernel/apic.c @@ -37,6 +37,7 @@ #include #include #include #include +#include #include int apic_mapped; @@ -763,7 
+764,7 @@ static void setup_APIC_timer(unsigned in local_irq_save(flags); /* wait for irq slice */ - if (vxtime.hpet_address && hpet_use_timer) { + if (hpet_address && hpet_use_timer) { int trigger = hpet_readl(HPET_T0_CMP); while (hpet_readl(HPET_COUNTER) >= trigger) /* do nothing */ ; diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c index 9f05bc9..af9b072 100644 --- a/arch/x86_64/kernel/time.c +++ b/arch/x86_64/kernel/time.c @@ -67,6 +67,7 @@ #define US_SCALE 32 /* 2^32, arbitralril unsigned int cpu_khz; /* TSC clocks / usec, not used here */ EXPORT_SYMBOL(cpu_khz); +unsigned long hpet_address; static unsigned long hpet_period; /* fsecs / HPET clock */ unsigned long hpet_tick; /* HPET clocks / interrupt */ int hpet_use_timer;/* Use counter of hpet for time keeping, otherwise PIT */ @@ -316,7 +317,7 @@ static noinline void handle_lost_ticks(i KERN_WARNING "Your time source seems to be instable or " "some driver is hogging interupts\n"); print_symbol("rip %s\n", get_irq_regs()->rip); - if (vxtime.mode == VXTIME_TSC && vxtime.hpet_address) { + if (vxtime.mode == VXTIME_TSC && hpet_address) { printk(KERN_WARNING "Falling back to HPET\n"); if (hpet_use_timer) vxtime.last = hpet_readl(HPET_T0_CMP) - @@ -324,6 +325,7 @@ static noinline void handle_lost_ticks(i else vxtime.last = hpet_readl(HPET_COUNTER); vxtime.mode = VXTIME_HPET; + vxtime.hpet_address = hpet_address; do_gettimeoffset = do_gettimeoffset_hpet; } /* else should fall back to PIT, but code missing. */ @@ -354,7 +356,7 @@ void main_timer_handler(void) write_seqlock(_lock); - if (vxtime.hpet_address) + if (hpet_address) offset = hpet_readl(HPET_COUNTER); if (hpet_use_timer) { @@ -717,7 +719,7 @@ static __init int late_hpet_init(void) struct hpet_datahd; unsigned intntimer; - if (!vxtime.hpet_address) + if (!hpet_address) return 0; memset(, 0,
[PATCH 4/5][time][x86_64] Convert x86_64 to use GENERIC_TIME
This patch converts x86_64 to use the GENERIC_TIME infrastructure and adds clocksource structures for both TSC and HPET (ACPI PM is shared w/ i386). Signed-off-by: John Stultz <[EMAIL PROTECTED]> arch/x86_64/Kconfig|4 arch/x86_64/kernel/apic.c |2 arch/x86_64/kernel/hpet.c | 65 arch/x86_64/kernel/pmtimer.c | 58 --- arch/x86_64/kernel/smpboot.c |1 arch/x86_64/kernel/time.c | 301 - arch/x86_64/kernel/tsc.c | 108 -- drivers/char/hangcheck-timer.c |2 include/asm-x86_64/proto.h |2 include/asm-x86_64/timex.h |5 10 files changed, 137 insertions(+), 411 deletions(-) linux-2.6.20-rc1_timeofday-arch-x86-64-generic-time-conversion_C7.patch diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig index d427553..e1d044c 100644 --- a/arch/x86_64/Kconfig +++ b/arch/x86_64/Kconfig @@ -24,6 +24,10 @@ config X86 bool default y +config GENERIC_TIME + bool + default y + config ZONE_DMA32 bool default y diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c index 7ce7797..723417d 100644 --- a/arch/x86_64/kernel/apic.c +++ b/arch/x86_64/kernel/apic.c @@ -786,7 +786,7 @@ static void setup_APIC_timer(unsigned in /* Turn off PIT interrupt if we use APIC timer as main timer. Only works with the PM timer right now TBD fix it for HPET too. 
*/ - if (vxtime.mode == VXTIME_PMTMR && + if ((pmtmr_ioport != 0) && smp_processor_id() == boot_cpu_id && apic_runs_main_timer == 1 && !cpu_isset(boot_cpu_id, timer_interrupt_broadcast_ipi_mask)) { diff --git a/arch/x86_64/kernel/hpet.c b/arch/x86_64/kernel/hpet.c index ad67c6b..74d95d0 100644 --- a/arch/x86_64/kernel/hpet.c +++ b/arch/x86_64/kernel/hpet.c @@ -21,12 +21,6 @@ unsigned long hpet_tick; /* HPET clocks int hpet_use_timer;/* Use counter of hpet for time keeping, * otherwise PIT */ -unsigned int do_gettimeoffset_hpet(void) -{ - /* cap counter read to one tick to avoid inconsistencies */ - unsigned long counter = hpet_readl(HPET_COUNTER) - vxtime.last; - return (min(counter,hpet_tick) * vxtime.quot) >> US_SCALE; -} #ifdef CONFIG_HPET static __init int late_hpet_init(void) @@ -435,3 +429,62 @@ static int __init nohpet_setup(char *s) __setup("nohpet", nohpet_setup); +#define HPET_MASK 0x +#define HPET_SHIFT 22 + +/* FSEC = 10^-15 NSEC = 10^-9 */ +#define FSEC_PER_NSEC 100 + +static void *hpet_ptr; + +static cycle_t read_hpet(void) +{ + return (cycle_t)readl(hpet_ptr); +} + +struct clocksource clocksource_hpet = { + .name = "hpet", + .rating = 250, + .read = read_hpet, + .mask = (cycle_t)HPET_MASK, + .mult = 0, /* set below */ + .shift = HPET_SHIFT, + .is_continuous = 1, +}; + +static int __init init_hpet_clocksource(void) +{ + unsigned long hpet_period; + void __iomem *hpet_base; + u64 tmp; + + if (!hpet_address) + return -ENODEV; + + /* calculate the hpet address: */ + hpet_base = + (void __iomem*)ioremap_nocache(hpet_address, HPET_MMAP_SIZE); + hpet_ptr = hpet_base + HPET_COUNTER; + + /* calculate the frequency: */ + hpet_period = readl(hpet_base + HPET_PERIOD); + + /* +* hpet period is in femto seconds per cycle +* so we need to convert this to ns/cyc units +* aproximated by mult/2^shift +* +* fsec/cyc * 1nsec/100fsec = nsec/cyc = mult/2^shift +* fsec/cyc * 1ns/100fsec * 2^shift = mult +* fsec/cyc * 2^shift * 1nsec/100fsec = mult +* (fsec/cyc << 
shift)/100 = mult +* (hpet_period << shift)/FSEC_PER_NSEC = mult +*/ + tmp = (u64)hpet_period << HPET_SHIFT; + do_div(tmp, FSEC_PER_NSEC); + clocksource_hpet.mult = (u32)tmp; + + return clocksource_register(_hpet); +} + +module_init(init_hpet_clocksource); diff --git a/arch/x86_64/kernel/pmtimer.c b/arch/x86_64/kernel/pmtimer.c index 7554458..ae8f912 100644 --- a/arch/x86_64/kernel/pmtimer.c +++ b/arch/x86_64/kernel/pmtimer.c @@ -24,15 +24,6 @@ #include #include #include -/* The I/O port the PMTMR resides at. - * The location is detected during setup_arch(), - * in arch/i386/kernel/acpi/boot.c */ -u32 pmtmr_ioport __read_mostly; - -/* value of the Power timer at last timer interrupt */ -static u32 offset_delay; -static u32 last_pmtmr_tick; - #define ACPI_PM_MASK 0xFF /* limit it to 24 bits */ static inline u32 cyc2us(u32 cycles) @@ -48,38 +39,6 @@ static inline u32 cyc2us(u32 cycles) return (cycles >> 10); } -int
[PATCH 3/5][time][x86_64] Split x86_64/kernel/time.c up
In preparation for the x86_64 generic time conversion, this patch splits out TSC and HPET related code from arch/x86_64/kernel/time.c into respective hpet.c and tsc.c files. Signed-off-by: John Stultz <[EMAIL PROTECTED]> arch/x86_64/kernel/Makefile |2 arch/x86_64/kernel/hpet.c | 437 ++ arch/x86_64/kernel/time.c | 628 arch/x86_64/kernel/tsc.c| 201 ++ include/asm-x86_64/hpet.h |6 include/asm-x86_64/timex.h | 11 6 files changed, 660 insertions(+), 625 deletions(-) linux-2.6.20-rc1_timeofday-arch-x86-64-split-hpet-tsc-time_C7.patch diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile index 3c7cbff..e68a87e 100644 --- a/arch/x86_64/kernel/Makefile +++ b/arch/x86_64/kernel/Makefile @@ -8,7 +8,7 @@ obj-y := process.o signal.o entry.o trap ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_x86_64.o \ x8664_ksyms.o i387.o syscall.o vsyscall.o \ setup64.o bootflag.o e820.o reboot.o quirks.o i8237.o \ - pci-dma.o pci-nommu.o alternative.o + pci-dma.o pci-nommu.o alternative.o hpet.o tsc.o obj-$(CONFIG_STACKTRACE) += stacktrace.o obj-$(CONFIG_X86_MCE) += mce.o therm_throt.o diff --git a/arch/x86_64/kernel/hpet.c b/arch/x86_64/kernel/hpet.c new file mode 100644 index 000..ad67c6b --- /dev/null +++ b/arch/x86_64/kernel/hpet.c @@ -0,0 +1,437 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +int nohpet __initdata = 0; + +unsigned long hpet_address; +unsigned long hpet_period; /* fsecs / HPET clock */ +unsigned long hpet_tick; /* HPET clocks / interrupt */ + +int hpet_use_timer;/* Use counter of hpet for time keeping, +* otherwise PIT +*/ +unsigned int do_gettimeoffset_hpet(void) +{ + /* cap counter read to one tick to avoid inconsistencies */ + unsigned long counter = hpet_readl(HPET_COUNTER) - vxtime.last; + return (min(counter,hpet_tick) * vxtime.quot) >> US_SCALE; +} + +#ifdef CONFIG_HPET +static __init int late_hpet_init(void) +{ + struct hpet_datahd; + unsigned intntimer; + 
+ if (!hpet_address) + return 0; + + memset(, 0, sizeof (hd)); + + ntimer = hpet_readl(HPET_ID); + ntimer = (ntimer & HPET_ID_NUMBER) >> HPET_ID_NUMBER_SHIFT; + ntimer++; + + /* +* Register with driver. +* Timer0 and Timer1 is used by platform. +*/ + hd.hd_phys_address = hpet_address; + hd.hd_address = (void __iomem *)fix_to_virt(FIX_HPET_BASE); + hd.hd_nirqs = ntimer; + hd.hd_flags = HPET_DATA_PLATFORM; + hpet_reserve_timer(, 0); +#ifdef CONFIG_HPET_EMULATE_RTC + hpet_reserve_timer(, 1); +#endif + hd.hd_irq[0] = HPET_LEGACY_8254; + hd.hd_irq[1] = HPET_LEGACY_RTC; + if (ntimer > 2) { + struct hpet *hpet; + struct hpet_timer *timer; + int i; + + hpet = (struct hpet *) fix_to_virt(FIX_HPET_BASE); + timer = >hpet_timers[2]; + for (i = 2; i < ntimer; timer++, i++) + hd.hd_irq[i] = (timer->hpet_config & + Tn_INT_ROUTE_CNF_MASK) >> + Tn_INT_ROUTE_CNF_SHIFT; + + } + + hpet_alloc(); + return 0; +} +fs_initcall(late_hpet_init); +#endif + +int hpet_timer_stop_set_go(unsigned long tick) +{ + unsigned int cfg; + +/* + * Stop the timers and reset the main counter. + */ + + cfg = hpet_readl(HPET_CFG); + cfg &= ~(HPET_CFG_ENABLE | HPET_CFG_LEGACY); + hpet_writel(cfg, HPET_CFG); + hpet_writel(0, HPET_COUNTER); + hpet_writel(0, HPET_COUNTER + 4); + +/* + * Set up timer 0, as periodic with first interrupt to happen at hpet_tick, + * and period also hpet_tick. + */ + if (hpet_use_timer) { + hpet_writel(HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL | + HPET_TN_32BIT, HPET_T0_CFG); + hpet_writel(hpet_tick, HPET_T0_CMP); /* next interrupt */ + hpet_writel(hpet_tick, HPET_T0_CMP); /* period */ + cfg |= HPET_CFG_LEGACY; + } +/* + * Go! + */ + + cfg |= HPET_CFG_ENABLE; + hpet_writel(cfg, HPET_CFG); + + return 0; +} + +int hpet_arch_init(void) +{ + unsigned int id; + + if (!hpet_address) + return -1; + set_fixmap_nocache(FIX_HPET_BASE, hpet_address); + __set_fixmap(VSYSCALL_HPET, hpet_address, PAGE_KERNEL_VSYSCALL_NOCACHE); + +/* + * Read the period, compute tick and quotient. 
+ */ + + id = hpet_readl(HPET_ID); + +
[PATCH 0/5][time][x86_64] GENERIC_TIME patchset for x86_64
Andrew, Andi,

I didn't hear any objections (or really, any comments) on my last release, so as I mentioned then, I want to go ahead and push this to Andrew for a bit of testing in -mm. Hopefully targeting for inclusion in 2.6.21 or 2.6.22.

Here's the performance data from the last release:

Vanilla TSC:
149 nsecs per gtod call
367 nsecs per CLOCK_MONOTONIC call
288 nsecs per CLOCK_REALTIME call

Vanilla ACPI PM:
1272 nsecs per gtod call
1335 nsecs per CLOCK_MONOTONIC call
1273 nsecs per CLOCK_REALTIME call

GENERIC_TIME TSC:
149 nsecs per gtod call
304 nsecs per CLOCK_MONOTONIC call
275 nsecs per CLOCK_REALTIME call

GENERIC_TIME ACPI PM:
1273 nsecs per gtod call
1275 nsecs per CLOCK_MONOTONIC call
1273 nsecs per CLOCK_REALTIME call

So almost no performance change.

New in the current C8 release:
o Synced up w/ 2.6.20-rc1
o Added a few small cleanups from Ingo

Let me know if you have any thoughts or comments!

thanks again!
-john
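For anyone wanting to collect comparable "nsecs per call" figures, a user-space timing loop along the following lines could be used. This is a sketch under stated assumptions, not the benchmark that produced the numbers above: the `demo_` helpers are hypothetical, and the loop overhead is not subtracted.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static uint64_t demo_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Average cost, in ns, of one gettimeofday() call over `iters` calls. */
static double demo_ns_per_gtod(unsigned long iters)
{
	struct timeval tv;
	uint64_t start = demo_now_ns();

	for (unsigned long i = 0; i < iters; i++)
		gettimeofday(&tv, NULL);

	return (double)(demo_now_ns() - start) / (double)iters;
}

/* Average cost, in ns, of one clock_gettime(id) call over `iters` calls. */
static double demo_ns_per_clock(clockid_t id, unsigned long iters)
{
	struct timespec ts;
	uint64_t start = demo_now_ns();

	for (unsigned long i = 0; i < iters; i++)
		clock_gettime(id, &ts);

	return (double)(demo_now_ns() - start) / (double)iters;
}
```

Calling demo_ns_per_gtod(1000000) and demo_ns_per_clock() with CLOCK_MONOTONIC and CLOCK_REALTIME, once per clocksource, would give one row each of a table like the one above. Absolute values depend on the machine and on whether the call goes through a vsyscall or a real system call.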
[PATCH 1/5][time][generic] vsyscall-gtod support for GENERIC_TIME
Provides generic infrastructure for vsyscall-gtod. Signed-off-by: John Stultz <[EMAIL PROTECTED]> include/linux/clocksource.h |8 kernel/timer.c |1 + 2 files changed, 9 insertions(+) linux-2.6.20-rc1_timeofday-vsyscall-support_C7.patch diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 1622d23..6899ef3 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -46,6 +46,7 @@ typedef u64 cycle_t; * @shift: cycle to nanosecond divisor (power of two) * @update_callback: called when safe to alter clocksource values * @is_continuous: defines if clocksource is free-running. + * @vread: vsyscall based read * @cycle_interval:Used internally by timekeeping core, please ignore. * @xtime_interval:Used internally by timekeeping core, please ignore. */ @@ -59,6 +60,7 @@ struct clocksource { u32 shift; int (*update_callback)(void); int is_continuous; + cycle_t (*vread)(void); /* timekeeping specific data, ignore */ cycle_t cycle_last, cycle_interval; @@ -182,4 +184,10 @@ int clocksource_register(struct clocksou void clocksource_reselect(void); struct clocksource* clocksource_get_next(void); +#ifdef CONFIG_GENERIC_TIME_VSYSCALL +extern void update_vsyscall(struct timespec *ts, struct clocksource *c); +#else +#define update_vsyscall(now, c) do { } while(0) +#endif + #endif /* _LINUX_CLOCKSOURCE_H */ diff --git a/kernel/timer.c b/kernel/timer.c index feddf81..d7a41e7 100644 --- a/kernel/timer.c +++ b/kernel/timer.c @@ -1094,6 +1094,7 @@ #endif clock->xtime_nsec = 0; clocksource_calculate_interval(clock, tick_nsec); } + update_vsyscall(, clock); } /* - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
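The #ifdef/#else hunk in clocksource.h uses a common pattern: when the config option is off, the hook becomes an empty do { } while (0) macro, so callers need no #ifdef of their own. Below is a minimal user-space sketch of that pattern; every `demo_`/`DEMO_` name is hypothetical and stands in for the kernel symbols, and the "tick" logic is invented purely to give the hook a caller.

```c
/* Stand-in for struct timespec, to keep the sketch self-contained. */
struct demo_timespec {
	long tv_sec;
	long tv_nsec;
};

/* Counts how often the hook really ran (only with the option enabled). */
static int demo_vsyscall_updates;

#ifdef DEMO_GENERIC_TIME_VSYSCALL
/* Option on: a real function that would publish the new time. */
static void demo_update_vsyscall(const struct demo_timespec *ts)
{
	(void)ts;
	demo_vsyscall_updates++;
}
#else
/* Option off: the call site compiles away to an empty statement. */
#define demo_update_vsyscall(ts) do { } while (0)
#endif

/* Caller advances the time by 1 ms and invokes the hook unconditionally. */
static void demo_update_wall_time(struct demo_timespec *now)
{
	now->tv_nsec += 1000000L;
	if (now->tv_nsec >= 1000000000L) {
		now->tv_nsec -= 1000000000L;
		now->tv_sec++;
	}
	demo_update_vsyscall(now);
}
```

The do { } while (0) form keeps the macro safe to use as a single statement, for instance in an if/else without braces.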
Re: GPL only modules
On Dec 19, 2006, "Horst H. von Brand" <[EMAIL PROTECTED]> wrote:

> Sanjoy Mahajan <[EMAIL PROTECTED]> wrote:
>> This License acknowledges your rights of "fair use" or other
>> equivalent, as provided by copyright law.

>> By choosing 'acknowledges' as the verb, the licensee says explicitly
>> that fair-use rights are already yours, not that they are being given
>> to you.

> Pure noise, a license can't take them away in any case.

Yeah, that's merely informative, indeed. Point is to ensure people know their rights, while at the same time avoiding giving impressions such as the one Linus somehow got.

> [That is my pet pevee with GPL: It has a bit of legally binding text, and
> lots of "explanation" and "philosophy" that don't add anything but
> confusion. A clear-cut license plus an explanation/comment would have been
> better. IMHO, IANAL. HAND.]

This bit would probably fit better in the spirit (preamble) than in the letter. That's why I filed the comment about it in the preamble.

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist [EMAIL PROTECTED], gnu.org}
Re: GPL only modules
On Dec 19, 2006, "D. Hazelton" <[EMAIL PROTECTED]> wrote:

> However I have a feeling that the lawyers in the employ of the
> companies that ship BLOB drivers say that all they need to do to
> comply with the GPL is to ship the glue-code in source form.

> And I have to admit that this does seem to comply with the GPL - to the
> letter, if not the spirit.

I don't see that it does comply even with the letter. Consider this:

  These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it.

The work, in this case, is the GPLed glue code, in source form, and the binary blob, without sources.

See that, even though the binary blob is an independent and separate work in itself, and so it can indeed be distributed separately under a different license, when it's distributed as part of a whole, then the whole must be on the terms of the GPL.

So the question becomes whether the copyright holder of the glue code is bound by these GPL terms.

(a) If the glue code can be shown to be a derived work from Linux, even in source form, then the copyright holder *is* bound by these terms, and thus the whole could only be distributed under the GPL, so including the binary blob would be in violation of the license.

(b) Now, if the glue code is *not* a derived work from Linux, then the copyright holder is entitled to use whatever terms she likes.
It could be any license whatsoever, that permits the distribution of the whole or of the parts with whatever constraints copyright law permitted. Why would they choose the GPL in this case, then? Let's assume they're not intentionally violating the GPL, but rather that they believe they're entitled to do what they're doing, i.e., that they believe (a) their glue code is not a derived work from Linux. In this case, they *can* distribute the glue source code under the GPL along with their binary blob. But can anyone else? Methinks anyone else would be entitled to pass the same whole along under the GPL, per section 1, but wouldn't be entitled to distribute modified versions, because this would require the derived work to be licensed under the GPL, and nobody else is able to provide the source code to the binary blob. And then, who'd be entitled to complain? Only the copyright holder of the glue code and the binary blob. Would you like to be on the wrong end of a copyright infringement lawsuit by one of these binary blob distributors for distributing a patched version of their glue code + binary blob? More to the point, do you think they would actually bring suit, just to make it clear that the whole point is for them to keep a monopoly on the rights to modify and then distribute the combined work, in spite of using the GPL for (part of) the work? It gets trickier for binaries, since they are quite possibly derived works from the kernel, licensed under the GPL. If they are, they can't be distributed at all, not even by the copyright holder of the glue code + binary blob. If they aren't, then the copyright holder can distribute them, but nobody else can because that would be a violation of the GPL, as in the discussion above. So, the copyright holder would be keeping a monopoly on the rights to distribute binaries, and anyone else could be sued by them. Sure enough, one might think of praising them for distributing the glue code under the GPL. 
Then others could take this glue code and use it for something else that is useful, right? Well... Not quite.

For one, even if enabling others to distribute glue code + binary blobs were a good thing, using somebody else's glue code means you're bound by the GPL requirements, so you can't ship the combination of the glue code with your binary blob.

And then, if you intend to use the glue code to plug in some other code that is GPL-compatible in the kernel, perhaps you'd be better off not using the glue code at all, but rather modifying the GPL-compatible code to fit.

So, even if condoning binary blobs were morally acceptable, we still wouldn't be gaining anything from this relationship, we'd only be enabling vendors to sell us their undocumented hardware while denying us our freedoms. Why should we do this?

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist [EMAIL PROTECTED], gnu.org}
[PATCH 2/4] Add device probing and sysfs integration.
Signed-off-by: Kristian Hoegsberg <[EMAIL PROTECTED]> --- drivers/firewire/Makefile |3 drivers/firewire/fw-card.c| 56 +++ drivers/firewire/fw-device-cdev.c | 617 + drivers/firewire/fw-device-cdev.h | 146 + drivers/firewire/fw-device.c | 613 + drivers/firewire/fw-device.h | 127 drivers/firewire/fw-iso.c |1 drivers/firewire/fw-topology.c| 10 - drivers/firewire/fw-transaction.c |5 drivers/firewire/fw-transaction.h |4 10 files changed, 1573 insertions(+), 9 deletions(-) diff --git a/drivers/firewire/Makefile b/drivers/firewire/Makefile index db7020d..da77bc0 100644 --- a/drivers/firewire/Makefile +++ b/drivers/firewire/Makefile @@ -2,6 +2,7 @@ # # Makefile for the Linux IEEE 1394 implementation # -fw-core-objs := fw-card.o fw-topology.o fw-transaction.o fw-iso.o +fw-core-objs := fw-card.o fw-topology.o fw-transaction.o fw-iso.o \ + fw-device.o fw-device-cdev.o obj-$(CONFIG_FW) += fw-core.o diff --git a/drivers/firewire/fw-card.c b/drivers/firewire/fw-card.c index d8abd70..7977390 100644 --- a/drivers/firewire/fw-card.c +++ b/drivers/firewire/fw-card.c @@ -24,6 +24,7 @@ #include #include #include "fw-transaction.h" #include "fw-topology.h" +#include "fw-device.h" /* The lib/crc16.c implementation uses the standard (0x8005) * polynomial, but we need the ITU-T (or CCITT) polynomial (0x1021). @@ -186,6 +187,59 @@ fw_core_remove_descriptor (struct fw_des EXPORT_SYMBOL(fw_core_remove_descriptor); static void +fw_card_irm_work(struct work_struct *work) +{ + struct fw_card *card = + container_of(work, struct fw_card, work.work); + struct fw_device *root; + unsigned long flags; + int new_irm_id, generation; + + /* FIXME: This simple bus management unconditionally picks a +* cycle master if the current root can't do it. We need to +* not do this if there is a bus manager already. Also, some +* hubs set the contender bit, which is bogus, so we should +* probably do a little sanity check on the IRM (like, read +* the bandwidth register) if it's not us. 
*/ + + spin_lock_irqsave(>lock, flags); + + generation = card->generation; + root = card->root_node->data; + + if (root == NULL) + /* Either link_on is false, or we failed to read the +* config rom. In either case, pick another root. */ + new_irm_id = card->local_node->node_id; + else if (root->state != FW_DEVICE_RUNNING) + /* If we haven't probed this device yet, bail out now +* and let's try again once that's done. */ + new_irm_id = -1; + else if (root->config_rom[2] & bib_cmc) + /* FIXME: I suppose we should set the cmstr bit in the +* STATE_CLEAR register of this node, as described in +* 1394-1995, 8.4.2.6. Also, send out a force root +* packet for this node. */ + new_irm_id = -1; + else + /* Current root has an active link layer and we +* successfully read the config rom, but it's not +* cycle master capable. */ + new_irm_id = card->local_node->node_id; + + if (card->irm_retries++ > 5) + new_irm_id = -1; + + spin_unlock_irqrestore(>lock, flags); + + if (new_irm_id > 0) { + fw_notify("Trying to become root (card %d)\n", card->index); + fw_send_force_root(card, new_irm_id, generation); + fw_core_initiate_bus_reset(card, 1); + } +} + +static void release_card(struct device *device) { struct fw_card *card = @@ -222,6 +276,8 @@ fw_card_initialize(struct fw_card *card, card->local_node = NULL; + INIT_DELAYED_WORK(>work, fw_card_irm_work); + card->card_device.bus = _bus_type; card->card_device.release = release_card; card->card_device.parent = card->device; diff --git a/drivers/firewire/fw-device-cdev.c b/drivers/firewire/fw-device-cdev.c new file mode 100644 index 000..c10e332 --- /dev/null +++ b/drivers/firewire/fw-device-cdev.c @@ -0,0 +1,617 @@ +/* -*- c-basic-offset: 8 -*- + * + * fw-device-cdev.c - Char device for device raw access + * + * Copyright (C) 2005-2006 Kristian Hoegsberg <[EMAIL PROTECTED]> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free 
Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[PATCH 3/4] Add driver for OHCI firewire host controllers.
Signed-off-by: Kristian Hoegsberg <[EMAIL PROTECTED]> --- drivers/firewire/Kconfig | 11 drivers/firewire/Makefile |1 drivers/firewire/fw-ohci.c | 1394 drivers/firewire/fw-ohci.h | 152 + 4 files changed, 1558 insertions(+), 0 deletions(-) diff --git a/drivers/firewire/Kconfig b/drivers/firewire/Kconfig index bdd6303..b386334 100644 --- a/drivers/firewire/Kconfig +++ b/drivers/firewire/Kconfig @@ -20,4 +20,15 @@ config FW To compile this driver as a module, say M here: the module will be called fw-core. +config FW_OHCI + tristate "Support for OHCI firewire host controllers" + depends on PCI && FW + help + Enable this driver if you have an firewire controller based + on the OHCI specification. For all practical purposes, this + is the only chipset in use, so say Y here. + + To compile this driver as a module, say M here: the + module will be called fw-ohci. + endmenu diff --git a/drivers/firewire/Makefile b/drivers/firewire/Makefile index da77bc0..add3b98 100644 --- a/drivers/firewire/Makefile +++ b/drivers/firewire/Makefile @@ -6,3 +6,4 @@ fw-core-objs := fw-card.o fw-topology.o fw-device.o fw-device-cdev.o obj-$(CONFIG_FW) += fw-core.o +obj-$(CONFIG_FW_OHCI) += fw-ohci.o diff --git a/drivers/firewire/fw-ohci.c b/drivers/firewire/fw-ohci.c new file mode 100644 index 000..5392a2b --- /dev/null +++ b/drivers/firewire/fw-ohci.c @@ -0,0 +1,1394 @@ +/* -*- c-basic-offset: 8 -*- + * + * fw-ohci.c - Driver for OHCI 1394 boards + * Copyright (C) 2003-2006 Kristian Hoegsberg <[EMAIL PROTECTED]> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software Foundation, + * Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "fw-transaction.h" +#include "fw-ohci.h" + +#define descriptor_output_more 0 +#define descriptor_output_last (1 << 12) +#define descriptor_input_more (2 << 12) +#define descriptor_input_last (3 << 12) +#define descriptor_status (1 << 11) +#define descriptor_key_immediate (2 << 8) +#define descriptor_ping(1 << 7) +#define descriptor_yy (1 << 6) +#define descriptor_no_irq (0 << 4) +#define descriptor_irq_error (1 << 4) +#define descriptor_irq_always (3 << 4) +#define descriptor_branch_always (3 << 2) + +struct descriptor { + __le16 req_count; + __le16 control; + __le32 data_address; + __le32 branch_address; + __le16 res_count; + __le16 transfer_status; +} __attribute__((aligned(16))); + +struct ar_context { + struct fw_ohci *ohci; + struct descriptor descriptor; + __le32 buffer[512]; + dma_addr_t descriptor_bus; + dma_addr_t buffer_bus; + + u32 command_ptr; + u32 control_set; + u32 control_clear; + + struct tasklet_struct tasklet; +}; + +struct at_context { + struct fw_ohci *ohci; + dma_addr_t descriptor_bus; + dma_addr_t buffer_bus; + + struct list_head list; + + struct { + struct descriptor more; + __le32 header[4]; + struct descriptor last; + } d; + + u32 command_ptr; + u32 control_set; + u32 control_clear; + + struct tasklet_struct tasklet; +}; + +#define it_header_sy(v) ((v) << 0) +#define it_header_tcode(v) ((v) << 4) +#define it_header_channel(v) ((v) << 8) +#define it_header_tag(v) ((v) << 14) +#define it_header_speed(v) ((v) << 16) +#define it_header_data_length(v) ((v) << 16) + +struct iso_context { + struct fw_iso_context base; + struct tasklet_struct tasklet; + u32 control_set; + 
u32 control_clear; + u32 command_ptr; + u32 context_match; + + struct descriptor *buffer; + dma_addr_t buffer_bus; + struct descriptor *head_descriptor; + struct descriptor *tail_descriptor; + struct descriptor *tail_descriptor_last; + struct descriptor *prev_descriptor; +}; + +#define CONFIG_ROM_SIZE 1024 + +struct fw_ohci { + struct fw_card card; + + __iomem char
[PATCH 4/4] Add SBP-2 protocol driver for storage devices.
Signed-off-by: Kristian Hoegsberg <[EMAIL PROTECTED]> --- drivers/firewire/Kconfig | 12 drivers/firewire/Makefile |1 drivers/firewire/fw-sbp2.c | 1073 3 files changed, 1086 insertions(+), 0 deletions(-) diff --git a/drivers/firewire/Kconfig b/drivers/firewire/Kconfig index b386334..bfab4b3 100644 --- a/drivers/firewire/Kconfig +++ b/drivers/firewire/Kconfig @@ -31,4 +31,16 @@ config FW_OHCI To compile this driver as a module, say M here: the module will be called fw-ohci. +config FW_SBP2 + tristate "Support for storage devices (SBP-2 protocol driver)" + depends on FW && SCSI + help + This option enables you to use SBP-2 devices connected to an + firewire bus. SBP-2 devices include storage devices like + harddisks and DVD drives, also some other FireWire devices + like scanners. + + You should also enable support for disks, CD-ROMs, etc. in the SCSI + configuration section. + endmenu diff --git a/drivers/firewire/Makefile b/drivers/firewire/Makefile index add3b98..b955c99 100644 --- a/drivers/firewire/Makefile +++ b/drivers/firewire/Makefile @@ -7,3 +7,4 @@ fw-core-objs := fw-card.o fw-topology.o obj-$(CONFIG_FW) += fw-core.o obj-$(CONFIG_FW_OHCI) += fw-ohci.o +obj-$(CONFIG_FW_SBP2) += fw-sbp2.o \ No newline at end of file diff --git a/drivers/firewire/fw-sbp2.c b/drivers/firewire/fw-sbp2.c new file mode 100644 index 000..2756e0c --- /dev/null +++ b/drivers/firewire/fw-sbp2.c @@ -0,0 +1,1073 @@ +/* -*- c-basic-offset: 8 -*- + * fw-sbp2.c -- SBP2 driver (SCSI over IEEE1394) + * + * Copyright (C) 2005-2006 Kristian Hoegsberg <[EMAIL PROTECTED]> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software Foundation, + * Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + */ + +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#include "fw-transaction.h" +#include "fw-topology.h" +#include "fw-device.h" + +/* I don't know why the SCSI stack doesn't define something like this... */ +typedef void (*scsi_done_fn_t) (struct scsi_cmnd *); + +static const char sbp2_driver_name[] = "sbp2"; + +struct sbp2_device { + struct fw_unit *unit; + struct fw_address_handler address_handler; + struct list_head orb_list; + u64 management_agent_address; + u64 command_block_agent_address; + u32 workarounds; + int login_id; + + /* We cache these addresses and only update them once we've +* logged in or reconnected to the sbp2 device. That way, any +* IO to the device will automatically fail and get retried if +* it happens in a window where the device is not ready to +* handle it (e.g. after a bus reset but before we reconnect). 
*/ + int node_id; + int address_high; + int generation; + + struct work_struct work; + struct Scsi_Host *scsi_host; +}; + +#define SBP2_MAX_SG_ELEMENT_LENGTH 0xf000 +#define SBP2_MAX_SECTORS 255 /* Max sectors supported */ +#define SBP2_MAX_CMDS 8 /* This should be safe */ + +#define SBP2_ORB_NULL 0x8000 + +#define SBP2_DIRECTION_TO_MEDIA0x0 +#define SBP2_DIRECTION_FROM_MEDIA 0x1 + +/* Unit directory keys */ +#define SBP2_COMMAND_SET_SPECIFIER 0x38 +#define SBP2_COMMAND_SET 0x39 +#define SBP2_COMMAND_SET_REVISION 0x3b +#define SBP2_FIRMWARE_REVISION 0x3c + +/* Flags for detected oddities and brokeness */ +#define SBP2_WORKAROUND_128K_MAX_TRANS 0x1 +#define SBP2_WORKAROUND_INQUIRY_36 0x2 +#define SBP2_WORKAROUND_MODE_SENSE_8 0x4 +#define SBP2_WORKAROUND_FIX_CAPACITY 0x8 +#define SBP2_WORKAROUND_OVERRIDE 0x100 + +/* Management orb opcodes */ +#define SBP2_LOGIN_REQUEST 0x0 +#define SBP2_QUERY_LOGINS_REQUEST 0x1 +#define SBP2_RECONNECT_REQUEST 0x3 +#define SBP2_SET_PASSWORD_REQUEST 0x4 +#define SBP2_LOGOUT_REQUEST0x7 +#define SBP2_ABORT_TASK_REQUEST0xb +#define SBP2_ABORT_TASK_SET0xc +#define SBP2_LOGICAL_UNIT_RESET0xe +#define SBP2_TARGET_RESET_REQUEST 0xf + +/* Offsets for command block agent registers */ +#define SBP2_AGENT_STATE
[PATCH 0/4] New firewire stack - updated patches
Hi,

Here's a new set of patches for the new firewire stack. The changes
since the last set of patches address the issues that were raised on
the list and can be reviewed in detail here:

  http://gitweb.freedesktop.org/?p=users/krh/juju.git

but to sum up the changes:

 - Got rid of bitfields.

 - Tested on ppc, ppc64, x86-64 and x86.

 - ioctl interface tested on 32-bit userspace / 64-bit kernels.

 - ASCIIfied sources.

 - Incorporated Jeff Garzik's comments.

 - Updated to work with the new workqueue API changes.

 - Moved subsystem to drivers/firewire from drivers/fw.

plus a number of bug fixes.

As mentioned last time, the stack still lacks isochronous receive
functionality to be on par with the old stack, feature-wise. This is
the one remaining piece of feature work kernel-side. When that is
done, I have a couple of TODO items in user space:

 - Make a libraw1394 compatibility library.

 - Port libdv1394 to new isochronous API.

which will allow us to move most user space applications to the new
stack. That is, even if the new stack provides a new interface for
asynchronous and isochronous IO, a lot of applications can still work
since the changes are isolated to a couple of libraries. This is
still in development and is being discussed on the linux1394-devel
list. It will likely require a few changes kernel side in the stack
as we figure out how to do this.

It is still work in progress, but at least now it should work across
all architectures and endiannesses.

Happy Holidays,
Kristian
Re: BUG on 2.6.20-rc1 when using gdb
On 12/20/06, Andrew Morton <[EMAIL PROTECTED]> wrote: > When I was using gdb to debug xchat-gnome, I got a kernel BUG and stack > trace as the program was running (e.g. I had typed 'run' in gdb): > > WARNING at kernel/softirq.c:137 local_bh_enable() > [] dump_trace+0x68/0x1d9 > [] show_trace_log_lvl+0x18/0x2c > [] show_trace+0xf/0x11 > [] dump_stack+0x12/0x14 > [] local_bh_enable+0x44/0x94 > [] unix_release_sock+0x6e/0x1fe > [] unix_stream_connect+0x3b4/0x3cf > [] sys_connect+0x82/0xad > [] sys_socketcall+0xac/0x261 > [] syscall_call+0x7/0xb > [] 0xb7f70822 > === > [ cut here ] > kernel BUG at fs/buffer.c:1235! > invalid opcode: [#1] > PREEMPT > Modules linked in: binfmt_misc rfcomm l2cap i915 drm bluetooth nfs nfsd > exportfs lockd nfs_acl sunrpc nvram uinput ipv6 ppdev lp button ac > battery dm_crypt dm_snapshot dm_mirror dm_mod fuse cpufreq_conservative > cpufreq_ondemand cpufreq_performance cpufreq_powersave > speedstep_centrino freq_table ibm_acpi loop snd_intel8x0m snd_pcm_oss > snd_mixer_oss snd_intel8x0 snd_ac97_codec pcmcia ac97_bus irtty_sir > sir_dev ipw2200 snd_pcm snd_timer irda ieee80211 ieee80211_crypt > crc_ccitt rtc parport_pc parport 8250_pnp snd soundcore 8250_pci 8250 > serial_core firmware_class i2c_i801 yenta_socket rsrc_nonstatic > pcmcia_core snd_page_alloc i2c_core intel_agp agpgart evdev tsdev joydev > ext3 jbd mbcache ide_cd cdrom ide_disk ide_generic e100 mii generic piix > ide_core ehci_hcd uhci_hcd usbcore > CPU:0 > EIP:0060:[]Not tainted VLI > EFLAGS: 00010046 (2.6.20-rc1 #1) > EIP is at __find_get_block+0x1c/0x16f > eax: 0086 ebx: ecx: edx: 0088a800 > esi: 0088a800 edi: ebp: dfffd040 esp: cad2dd30 > ds: 007b es: 007b ss: 0068 > Process xchat-gnome (pid: 4322, ti=cad2c000 task=d0cd3ab0 > task.ti=cad2c000) > Stack: cad2dd58 c02caa0b 0002 000e 000b 0001 e8836580 > 0088a800 > e8836610 c01793dc 1000 c03ab3e0 > f3cadd80 >0086 c90d41b0 0088a800 dfffd040 8000 > 0002 > Call Trace: > [] __getblk+0x23/0x268 > [] ext3_getblk+0x10b/0x244 [ext3] > 
[] ext3_bread+0x19/0x70 [ext3] > [] dx_probe+0x43/0x2c9 [ext3] > [] ext3_htree_fill_tree+0x99/0x1ba [ext3] > [] ext3_readdir+0x1d4/0x5ed [ext3] > [] vfs_readdir+0x63/0x8d > [] sys_getdents64+0x63/0xa5 > [] syscall_call+0x7/0xb > [] 0xb7f70822 > === > Code: 8b 40 08 a8 08 74 05 e8 02 2f 11 00 5b 5e c3 55 89 c5 57 89 cf 56 > 89 d6 53 83 ec 20 9c 58 90 8d b4 26 00 00 00 00 f6 c4 02 75 04 <0f> 0b > eb fe 89 e0 25 00 e0 ff ff ff 40 14 31 c9 8b 1c 8d a0 74 > EIP: [] __find_get_block+0x1c/0x16f SS:ESP 0068:cad2dd30 > > This happens on 2.6.20-rc1 but not 2.6.19. > And it's repeatable, yes? And you're sure that use of gdb triggers it? Something is forgetting to reenable local interrupts. I've managed to get nearly the same thing on a test system I built yesterday, my app when running under gdb would also blow up in __find_get_block. I was using close to Linus's git head... Dave. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG on 2.6.20-rc1 when using gdb
On 12/20/06, Dave Airlie <[EMAIL PROTECTED]> wrote: On 12/20/06, Andrew Morton <[EMAIL PROTECTED]> wrote: > > When I was using gdb to debug xchat-gnome, I got a kernel BUG and stack > > trace as the program was running (e.g. I had typed 'run' in gdb): > > > > WARNING at kernel/softirq.c:137 local_bh_enable() > > [] dump_trace+0x68/0x1d9 > > [] show_trace_log_lvl+0x18/0x2c > > [] show_trace+0xf/0x11 > > [] dump_stack+0x12/0x14 > > [] local_bh_enable+0x44/0x94 > > [] unix_release_sock+0x6e/0x1fe > > [] unix_stream_connect+0x3b4/0x3cf > > [] sys_connect+0x82/0xad > > [] sys_socketcall+0xac/0x261 > > [] syscall_call+0x7/0xb > > [] 0xb7f70822 > > === > > [ cut here ] > > kernel BUG at fs/buffer.c:1235! > > invalid opcode: [#1] > > PREEMPT > > Modules linked in: binfmt_misc rfcomm l2cap i915 drm bluetooth nfs nfsd > > exportfs lockd nfs_acl sunrpc nvram uinput ipv6 ppdev lp button ac > > battery dm_crypt dm_snapshot dm_mirror dm_mod fuse cpufreq_conservative > > cpufreq_ondemand cpufreq_performance cpufreq_powersave > > speedstep_centrino freq_table ibm_acpi loop snd_intel8x0m snd_pcm_oss > > snd_mixer_oss snd_intel8x0 snd_ac97_codec pcmcia ac97_bus irtty_sir > > sir_dev ipw2200 snd_pcm snd_timer irda ieee80211 ieee80211_crypt > > crc_ccitt rtc parport_pc parport 8250_pnp snd soundcore 8250_pci 8250 > > serial_core firmware_class i2c_i801 yenta_socket rsrc_nonstatic > > pcmcia_core snd_page_alloc i2c_core intel_agp agpgart evdev tsdev joydev > > ext3 jbd mbcache ide_cd cdrom ide_disk ide_generic e100 mii generic piix > > ide_core ehci_hcd uhci_hcd usbcore > > CPU:0 > > EIP:0060:[]Not tainted VLI > > EFLAGS: 00010046 (2.6.20-rc1 #1) > > EIP is at __find_get_block+0x1c/0x16f > > eax: 0086 ebx: ecx: edx: 0088a800 > > esi: 0088a800 edi: ebp: dfffd040 esp: cad2dd30 > > ds: 007b es: 007b ss: 0068 > > Process xchat-gnome (pid: 4322, ti=cad2c000 task=d0cd3ab0 > > task.ti=cad2c000) > > Stack: cad2dd58 c02caa0b 0002 000e 000b 0001 e8836580 > > 0088a800 > > e8836610 c01793dc 1000 
c03ab3e0 > > f3cadd80 > >0086 c90d41b0 0088a800 dfffd040 8000 > > 0002 > > Call Trace: > > [] __getblk+0x23/0x268 > > [] ext3_getblk+0x10b/0x244 [ext3] > > [] ext3_bread+0x19/0x70 [ext3] > > [] dx_probe+0x43/0x2c9 [ext3] > > [] ext3_htree_fill_tree+0x99/0x1ba [ext3] > > [] ext3_readdir+0x1d4/0x5ed [ext3] > > [] vfs_readdir+0x63/0x8d > > [] sys_getdents64+0x63/0xa5 > > [] syscall_call+0x7/0xb > > [] 0xb7f70822 > > === > > Code: 8b 40 08 a8 08 74 05 e8 02 2f 11 00 5b 5e c3 55 89 c5 57 89 cf 56 > > 89 d6 53 83 ec 20 9c 58 90 8d b4 26 00 00 00 00 f6 c4 02 75 04 <0f> 0b > > eb fe 89 e0 25 00 e0 ff ff ff 40 14 31 c9 8b 1c 8d a0 74 > > EIP: [] __find_get_block+0x1c/0x16f SS:ESP 0068:cad2dd30 > > > > This happens on 2.6.20-rc1 but not 2.6.19. > > > > And it's repeatable, yes? > > And you're sure that use of gdb triggers it? > > Something is forgetting to reenable local interrupts. I've managed to get nearly the same thing on a test system I built yesterday, my app when running under gdb would also blow up in __find_get_block. I was using close to Linus's git head... And of course it was on a fresh 32-bit x86 with FC6 on it. Dave. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Patch: dynticks: idle load balancing
On Mon, 2006-12-11 at 15:53 -0800, Siddha, Suresh B wrote:

>
> Comments and review feedback welcome. Minimal testing done on couple of
> i386 platforms. Perf testing yet to be done.

Nice work!

>
> thanks,
> suresh
> ---
> diff -pNru linux-2.6.19-mm1/include/linux/sched.h linux/include/linux/sched.h
> --- linux-2.6.19-mm1/include/linux/sched.h	2006-12-12 06:39:22.0 -0800
> +++ linux/include/linux/sched.h	2006-12-12 06:51:03.0 -0800
> @@ -195,6 +195,14 @@ extern void sched_init_smp(void);
>  extern void init_idle(struct task_struct *idle, int cpu);
>
>  extern cpumask_t nohz_cpu_mask;
> +#ifdef CONFIG_SMP
> +extern int select_notick_load_balancer(int cpu);
> +#else
> +static inline int select_notick_load_balancer(int cpu)

Later on in the actual code, the parameter is named stop_tick, which
makes sense. You should change the name here too so it's not confusing
when looking later on at the code.

> +{
> +	return 0;
> +}
> +#endif

[...]

> +
> +/*
> + * This routine will try to nominate the ilb (idle load balancing)
> + * owner among the cpus whose ticks are stopped. ilb owner will do the idle
> + * load balancing on behalf of all those cpus. If all the cpus in the system
> + * go into this tickless mode, then there will be no ilb owner (as there is
> + * no need for one) and all the cpus will sleep till the next wakeup event
> + * arrives...
> + *
> + * For the ilb owner, tick is not stopped. And this tick will be used
> + * for idle load balancing. ilb owner will still be part of
> + * notick.cpu_mask..
> + *
> + * While stopping the tick, this cpu will become the ilb owner if there
> + * is no other owner. And will be the owner till that cpu becomes busy
> + * or if all cpus in the system stop their ticks at which point
> + * there is no need for ilb owner.
> + *
> + * When the ilb owner becomes busy, it nominates another owner, during the
> + * schedule()
> + */
> +int select_notick_load_balancer(int stop_tick)
> +{
> +	int cpu = smp_processor_id();
> +

[...]

> +#ifdef CONFIG_NO_HZ
> +	if (idle_cpu(local_cpu) && notick.load_balancer == local_cpu &&
> +	    !cpus_empty(cpus))
> +		goto restart;
> +#endif
>  }
>  #else
>  /*
> @@ -3562,6 +3669,21 @@ switch_tasks:
>  	++*switch_count;
>
>  	prepare_task_switch(rq, next);
> +#if defined(CONFIG_HZ) && defined(CONFIG_SMP)

Ah! so this is where the CONFIG_NO_HZ mistake came in ;)

> +	if (prev == rq->idle && notick.load_balancer == -1) {
> +		/*
> +		 * simple selection for now: Nominate the first cpu in
> +		 * the notick list to be the next ilb owner.
> +		 *
> +		 * TBD: Traverse the sched domains and nominate
> +		 * the nearest cpu in the notick.cpu_mask.
> +		 */
> +		int ilb = first_cpu(notick.cpu_mask);
> +
> +		if (ilb != NR_CPUS)
> +			resched_cpu(ilb);
> +	}
> +#endif
>  	prev = context_switch(rq, prev, next);

-- Steve
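For anyone skimming the thread, the "simple selection for now" in the
quoted hunk can be modelled in userspace roughly as below. This is only
a sketch under assumptions: the toy_* names, the 32-bit mask and
TOY_NR_CPUS are made up for illustration and are not from the patch. It
also leaves out the prev == rq->idle check and returns the cpu instead
of calling resched_cpu().

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 32	/* hypothetical cpu count for this sketch */

/*
 * Mimics the first_cpu() convention used in the quoted hunk:
 * lowest set bit of the mask, or TOY_NR_CPUS if the mask is empty.
 */
static int toy_first_cpu(uint32_t mask)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (mask & (UINT32_C(1) << cpu))
			return cpu;
	return TOY_NR_CPUS;
}

/*
 * Pick the next ilb owner. current_owner == -1 means "no owner",
 * matching the notick.load_balancer == -1 test in the patch.
 * Returns the cpu that would be kicked, or -1 if there is nothing to do.
 */
static int toy_nominate_ilb(int current_owner, uint32_t notick_mask)
{
	if (current_owner != -1)
		return -1;	/* somebody already owns idle load balancing */

	int ilb = toy_first_cpu(notick_mask);

	/* An empty mask means no tickless cpu is left to nominate. */
	return ilb != TOY_NR_CPUS ? ilb : -1;
}
```

The TBD in the patch comment (walk the sched domains and pick the
nearest cpu) would replace toy_first_cpu() here; the rest of the flow
would presumably stay the same.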
Re: BUG on 2.6.20-rc1 when using gdb
On Sun, 17 Dec 2006 20:55:18 -0500 "Andrew J. Barr" <[EMAIL PROTECTED]> wrote: > When I was using gdb to debug xchat-gnome, I got a kernel BUG and stack > trace as the program was running (e.g. I had typed 'run' in gdb): > > WARNING at kernel/softirq.c:137 local_bh_enable() > [] dump_trace+0x68/0x1d9 > [] show_trace_log_lvl+0x18/0x2c > [] show_trace+0xf/0x11 > [] dump_stack+0x12/0x14 > [] local_bh_enable+0x44/0x94 > [] unix_release_sock+0x6e/0x1fe > [] unix_stream_connect+0x3b4/0x3cf > [] sys_connect+0x82/0xad > [] sys_socketcall+0xac/0x261 > [] syscall_call+0x7/0xb > [] 0xb7f70822 > === > [ cut here ] > kernel BUG at fs/buffer.c:1235! > invalid opcode: [#1] > PREEMPT > Modules linked in: binfmt_misc rfcomm l2cap i915 drm bluetooth nfs nfsd > exportfs lockd nfs_acl sunrpc nvram uinput ipv6 ppdev lp button ac > battery dm_crypt dm_snapshot dm_mirror dm_mod fuse cpufreq_conservative > cpufreq_ondemand cpufreq_performance cpufreq_powersave > speedstep_centrino freq_table ibm_acpi loop snd_intel8x0m snd_pcm_oss > snd_mixer_oss snd_intel8x0 snd_ac97_codec pcmcia ac97_bus irtty_sir > sir_dev ipw2200 snd_pcm snd_timer irda ieee80211 ieee80211_crypt > crc_ccitt rtc parport_pc parport 8250_pnp snd soundcore 8250_pci 8250 > serial_core firmware_class i2c_i801 yenta_socket rsrc_nonstatic > pcmcia_core snd_page_alloc i2c_core intel_agp agpgart evdev tsdev joydev > ext3 jbd mbcache ide_cd cdrom ide_disk ide_generic e100 mii generic piix > ide_core ehci_hcd uhci_hcd usbcore > CPU:0 > EIP:0060:[]Not tainted VLI > EFLAGS: 00010046 (2.6.20-rc1 #1) > EIP is at __find_get_block+0x1c/0x16f > eax: 0086 ebx: ecx: edx: 0088a800 > esi: 0088a800 edi: ebp: dfffd040 esp: cad2dd30 > ds: 007b es: 007b ss: 0068 > Process xchat-gnome (pid: 4322, ti=cad2c000 task=d0cd3ab0 > task.ti=cad2c000) > Stack: cad2dd58 c02caa0b 0002 000e 000b 0001 e8836580 > 0088a800 > e8836610 c01793dc 1000 c03ab3e0 > f3cadd80 >0086 c90d41b0 0088a800 dfffd040 8000 > 0002 > Call Trace: > [] __getblk+0x23/0x268 > [] 
ext3_getblk+0x10b/0x244 [ext3] > [] ext3_bread+0x19/0x70 [ext3] > [] dx_probe+0x43/0x2c9 [ext3] > [] ext3_htree_fill_tree+0x99/0x1ba [ext3] > [] ext3_readdir+0x1d4/0x5ed [ext3] > [] vfs_readdir+0x63/0x8d > [] sys_getdents64+0x63/0xa5 > [] syscall_call+0x7/0xb > [] 0xb7f70822 > === > Code: 8b 40 08 a8 08 74 05 e8 02 2f 11 00 5b 5e c3 55 89 c5 57 89 cf 56 > 89 d6 53 83 ec 20 9c 58 90 8d b4 26 00 00 00 00 f6 c4 02 75 04 <0f> 0b > eb fe 89 e0 25 00 e0 ff ff ff 40 14 31 c9 8b 1c 8d a0 74 > EIP: [] __find_get_block+0x1c/0x16f SS:ESP 0068:cad2dd30 > > This happens on 2.6.20-rc1 but not 2.6.19. > And it's repeatable, yes? And you're sure that use of gdb triggers it? Something is forgetting to reenable local interrupts. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: SATA DMA problem (sata_uli)
Tejun Heo wrote:
> Jeff Garzik wrote:
>> Alan wrote:
>>>> I tracked it down to one of the drives being forced into PIO4 mode
>>>> rather than UDMA mode; dmesg bits:
>>>> ata4.00: ATA-7, max UDMA/133, 586072368 sectors: LBA48 NCQ (depth 0/32)
>>>> ata4.00: ata4: dev 0 multi count 16
>>>> ata4.00: simplex DMA is claimed by other device, disabling DMA
>>>
>>> Your ULi controller is reporting that it supports UDMA upon only one
>>> channel at a time. The kernel is honouring this information. The older
>>> ULi (was ALi) PATA devices report simplex but let you turn it off so
>>> see if the following does the trick. Test carefully as always with
>>> disk driver changes.
>>>
>>> (Jeff probably best to check the docs before merging this but I believe
>>> it is sane)
>>>
>>> Signed-off-by: Alan Cox <[EMAIL PROTECTED]>
>>
>> My Uli SATA docs do not appear to cover the bmdma registers :( Only the
>> PCI config registers.
>>
>> But regardless, I think the better fix is to never set ATA_HOST_SIMPLEX
>> if ATA_FLAG_NO_LEGACY is set.
>>
>> None of the SATA controllers I've ever encountered has been simplex.
>
> Just another data point. The same problem is reported by bug #7590.
>
>   http://bugzilla.kernel.org/show_bug.cgi?id=7590
>
> Is somebody brewing a patch?

Not to my knowledge. Did you just volunteer? ;-)

/me runs...

	Jeff
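The rule Jeff proposes above ("never set ATA_HOST_SIMPLEX if
ATA_FLAG_NO_LEGACY is set") could be sketched as follows. This is not a
libata patch: the toy_* names and the bit values are assumptions made
up for illustration, and the real flag definitions and the place where
the simplex bit gets set should be checked in the libata source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, for illustration only. */
#define TOY_ATA_FLAG_NO_LEGACY	(1u << 2)
#define TOY_ATA_HOST_SIMPLEX	(1u << 0)

/*
 * Decide the host flags from what the hardware reports.
 * A controller flagged NO_LEGACY (SATA-style) is never treated as
 * simplex, whatever it claims; other controllers are believed.
 */
static unsigned int toy_host_flags(unsigned int port_flags,
				   bool hw_reports_simplex)
{
	unsigned int host_flags = 0;

	if (hw_reports_simplex && !(port_flags & TOY_ATA_FLAG_NO_LEGACY))
		host_flags |= TOY_ATA_HOST_SIMPLEX;

	return host_flags;
}
```

With a check like this, the "simplex DMA is claimed by other device"
path from the original report would presumably not be reached for the
second drive, since the host would never be marked simplex.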
Re: SATA DMA problem (sata_uli)
Jeff Garzik wrote:
> Alan wrote:
>>> I tracked it down to one of the drives being forced into PIO4 mode
>>> rather than UDMA mode; dmesg bits:
>>> ata4.00: ATA-7, max UDMA/133, 586072368 sectors: LBA48 NCQ (depth 0/32)
>>> ata4.00: ata4: dev 0 multi count 16
>>> ata4.00: simplex DMA is claimed by other device, disabling DMA
>>
>> Your ULi controller is reporting that it supports UDMA upon only one
>> channel at a time. The kernel is honouring this information. The older
>> ULi (was ALi) PATA devices report simplex but let you turn it off so
>> see if the following does the trick. Test carefully as always with
>> disk driver changes.
>>
>> (Jeff probably best to check the docs before merging this but I believe
>> it is sane)
>>
>> Signed-off-by: Alan Cox <[EMAIL PROTECTED]>
>
> My Uli SATA docs do not appear to cover the bmdma registers :( Only the
> PCI config registers.
>
> But regardless, I think the better fix is to never set ATA_HOST_SIMPLEX
> if ATA_FLAG_NO_LEGACY is set.
>
> None of the SATA controllers I've ever encountered has been simplex.

Just another data point. The same problem is reported by bug #7590.

  http://bugzilla.kernel.org/show_bug.cgi?id=7590

Is somebody brewing a patch?

-- 
tejun
Re: Changes to sysfs PM layer break userspace
On Tue, Dec 19, 2006 at 01:34:49PM -0800, David Brownell wrote:

> Documentation/feature-removal-schedule.txt has warned about this since
> August, and the PM list has discussed how broken that model is numerous
> times over the past several years. (I'm pretty sure that discussion has
> leaked out to LKML on occasion.) It shouldn't be news today.

1) feature-removal-schedule.txt says that it'll be removed in July 2007.
This isn't July 2007.

2) The functionality was disabled in 2.6.19. The addition to
feature-removal-schedule.txt was in, uh, 2.6.19.

3) "The whole _point_ of a kernel is to act as a abstraction layer and
resource management between user programs and hardware/outside world.
That's why kernels _exist_. Breaking user-land API's is thus by
definition something totally idiotic.

If you need to break something, you create a new interface, and try to
translate between the two, and maybe you deprecate the old one so that
it can be removed once it's not in use any more. If you can't see that
this is how a kernel should work, you're missing the point of having a
kernel in the first place."

Linus, http://lkml.org/lkml/2006/10/4/327

-- 
Matthew Garrett | [EMAIL PROTECTED]
Re: 2.6.19 file content corruption on ext3
On Wed, 20 Dec 2006, Peter Zijlstra wrote:
> On Mon, 2006-12-18 at 12:14 -0800, Linus Torvalds wrote:
> > OR:
> >
> >  - page_mkclean_one() is simply buggy.
>
> GOLD!

Ok. I was looking at that, and I wondered..

However, if that works, then I _think_ the correct sequence is the
following..

The rule should be:
 - we flush the tlb _after_ we have cleared it, but _before_ we insert the
   new entry.

But I dunno. These things are damn subtle. Does this patch fix it for you?

I actually suspect we should do this as an arch-specific macro, and
totally replace the current "ptep_clear_flush_dirty()" with one that does
"ptep_clear_flush_dirty_and_set_wp()".

Because what I'd _really_ prefer to do on x86 (and probably on most other
sane architectures) is to do

 - atomically replace the pte with the EXACT SAME ONE, but one that has
   the writable bit clear.

	bit_clear(_PAGE_BIT_RW, &(ptep)->pte_low);

 - flush the TLB, making sure that all CPU's will no longer write to it:

	flush_tlb_page(vma, address);

 - finally, just fetch-and-clear the dirty bit (and since it's no longer
   writable, nobody should be setting it any more)

	ret = bit_clear(__PAGE_BIT_DIRTY, &(ptep)->pte_low);

and now we should be all done.

But the "ptep_get_and_clear() + flush_tlb_page()" sequence should
hopefully also work. Pls test.

		Linus

diff --git a/mm/rmap.c b/mm/rmap.c
index d8a842a..eec8706 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -448,9 +448,10 @@ static int page_mkclean_one(struct page *page, struct vm_area_struct *vma)
 		goto unlock;

 	entry = ptep_get_and_clear(mm, address, pte);
+	flush_tlb_page(vma, address);
 	entry = pte_mkclean(entry);
 	entry = pte_wrprotect(entry);
-	ptep_establish(vma, address, pte, entry);
+	set_pte_at(mm, address, pte, entry);
 	lazy_mmu_prot_update(entry);
 	ret = 1;

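The three-step ordering Linus says he would prefer (write-protect,
flush, then harvest the dirty bit) can be modelled in userspace with
C11 atomics, roughly as below. This is a toy sketch and not kernel
code: the toy_* names, the bit positions and the flush counter are all
assumptions for illustration, and the "TLB flush" is only a stub that
counts calls.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen for illustration only. */
#define TOY_PTE_RW	(UINT32_C(1) << 1)
#define TOY_PTE_DIRTY	(UINT32_C(1) << 6)

typedef _Atomic uint32_t toy_pte_t;

/* Counts flushes so the sequence can be observed from a test. */
static unsigned int toy_tlb_flushes;

/* Stand-in for flush_tlb_page(); a real kernel would shoot down TLBs. */
static void toy_flush_tlb_page(void)
{
	toy_tlb_flushes++;
}

/*
 * Write-protect first, flush, then fetch-and-clear the dirty bit.
 * Returns true if the entry was dirty.
 */
static bool toy_clear_dirty_and_wrprotect(toy_pte_t *pte)
{
	/* 1. Atomically drop the writable bit; every other bit is kept. */
	atomic_fetch_and(pte, ~TOY_PTE_RW);

	/* 2. After the flush, no cpu should hold a stale writable entry. */
	toy_flush_tlb_page();

	/* 3. Nobody can set dirty any more, so harvest it in one go. */
	uint32_t old = atomic_fetch_and(pte, ~TOY_PTE_DIRTY);

	return (old & TOY_PTE_DIRTY) != 0;
}
```

The point of doing it in this order is that the entry is never absent,
unlike the ptep_get_and_clear() variant in the patch, where the pte is
empty between the clear and the set_pte_at().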
Re: Bug 7596 - Potential performance bottleneck for Linxu TCP
Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> I noticed this bit of discussion in tcp_recvmsg. It implies that a better
> queuing policy would be good. But it is confusing English (Alexey?) so
> not sure where to start.

Actually I think the comment says that the current code isn't the
most elegant but is more efficient.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: 2.6.18 mmap hangs unrelated apps
On Tue, 19 Dec 2006 19:17:43 -0500
Trond Myklebust <[EMAIL PROTECTED]> wrote:

> > (We were supposed to stop doing that about four years ago - change it so
> > that all a_ops must implement ->releasepage, but nobody got around to it).
>
> Would you still be interested in seeing this done?

Sure, when things calm down. It's just a cleanup.

There are various places where we got lazy and did this. ->set_page_dirty,
->page_mkwrite, many others. With varying degrees of consequential ugliness.
Re: GPL only modules
On Dec 18, 2006, "David Schwartz" <[EMAIL PROTECTED]> wrote:

> It makes no difference whether the "mere aggregation" paragraph kicks in
> because the "mere aggregation" paragraph is *explaining* the *law*. What
> matters is what the law actually *says*.

You mean "mere aggregation" is defined in copyright law?

I don't think so, otherwise the term 'aggregate' probably wouldn't
have been used in GPLv3.

AFAIK it's perfectly legitimate (even if immoral) for a copyright
license to prohibit the distribution of the software governed by the
license with anything else the author establishes. E.g., some Java
virtual machine's license used to establish that you couldn't ship it
along with other implementations of Java that didn't pass some
conformance test.

Now, the GPL doesn't do this. It doesn't say you can't distribute
GPLed software along with any other software. It only says that, when
you distribute together works that don't constitute mere aggregation
(providing its own definition of mere aggregation), then the whole
must be licensed under the GPL.

> The GPL could say that if you ever see the source code to a GPL'd work,
> every work you ever write must be placed under the GPL. But that wouldn't
> make it true, because that would be a requirement outside the GPL's scope.

It is indeed possible that this would fall outside the scope of
copyright law in the US, and it would not be morally acceptable for
the GPL to impose such a condition.

But then, since nobody can be forced to see the source code of a GPLed
work, or any work for that matter, acceptance is voluntary, and one
shouldn't enter an agreement one's not willing to abide by.
-- 
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist [EMAIL PROTECTED], gnu.org}
Re: 2.6.19 file content corruption on ext3
On Tue, 19 Dec 2006 16:03:49 -0800 (PST)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

>
>
> On Wed, 20 Dec 2006, Peter Zijlstra wrote:
> > On Tue, 2006-12-19 at 14:58 -0800, Andrew Morton wrote:
> >
> > > Well... we'd need to see (corruption && this-not-triggering) to be sure.
> > >
> > > Peter, have you been able to trigger the corruption?
> >
> > Yes; however the mail I send describing that seems to be lost in space.
>
> Btw, can somebody actually explain the mess that is ext3 "dirtying".
>
> Ext3 does NOT use __set_page_dirty_buffers. It does
>
>   static int ext3_journalled_set_page_dirty(struct page *page)
>   {
>           SetPageChecked(page);
>           return __set_page_dirty_nobuffers(page);
>   }
>
> and uses that "Checked" bit as a "whole page is dirty" bit (which it tests
> in "writepage()".

This is purely for data=journal, which is rarely used.

In journalled-data mode, write(), write-fault, etc are not allowed to dirty
the pages and buffers, because the data has to be written to the journal
first. After the data has been written to the journal we only then mark
buffers (and hence pages) dirty as far as the VFS is concerned. For
checkpointing the data back to its real place on the disk.

For MAP_SHARED pages ext3 cheats madly and doesn't journal the data at all.
In all journalling modes, MAP_SHARED data follows the regular ext2-style
handling. Which is a bit of a wart.
Re: 2.6.18 mmap hangs unrelated apps
On Tue, 2006-12-19 at 16:03 -0800, Andrew Morton wrote: > On Tue, 19 Dec 2006 18:19:38 -0500 > Trond Myklebust <[EMAIL PROTECTED]> wrote: > > > NFS: Fix race in nfs_release_page() > > > > invalidate_inode_pages2() may set the dirty bit on a page owing to the > > call > > to unmap_mapping_range() after the page was locked. In order to fix > > this, > > NFS has hooked the releasepage() method. This, however leads to > > deadlocks > > in other parts of the VM. > > hmm, subtle. > > > Fix is to add a new callback: flushpage(), which will write out a dirty > > page that is under the page lock. > > > > I guess this might permit us to clean up some of the nasties in > invalidate_inode_pages2() - if the page comes dirty again, write it again. > But the requirement that the page remain locked makes it hard. Need to > think about it some more. This was one of the reasons why I had to introduce nfs_writepage_locked() for 2.6.20 (the other reason being readpage()). The problem is that you can only protect against redirtying of the page by holding the page lock across the call to unmap_mapping_range(), the page writeout and the page removal. > Are you sure this is the cause of the NFS problem? > > > .prepare_write = nfs_prepare_write, > > .commit_write = nfs_commit_write, > > .invalidatepage = nfs_invalidate_page, > > - .releasepage = nfs_release_page, > > A NULL ->releasepage means that try_to_release_page() will call > try_to_free_buffers() if PagePrivate(). I suspect you'll need a stub to > prevent this. Ack, I'll add one in. If PagePrivate() is set during the call to try_to_release_page(), then the page should never be freeable. > (We were supposed to stop doing that about four years ago - change it so > that all a_ops must implement ->releasepage, but nobody got around to it). Would you still be interested in seeing this done? 
Re: Changes to PM layer break userspace
On Tue, Dec 19, 2006 at 03:36:28PM -0800, David Brownell wrote: > On Tuesday 19 December 2006 2:57 pm, Matthew Garrett wrote: > > The fact that something is scheduled to be removed in July 2007 does > > *not* mean it's acceptable to break it in 2006. We need to find a way to > > fix this functionality in the meantime. > > The disconnect here is analagous to: I tell you the alleged perpetual > motion machine never worked, and can't ever work; and you push back and > say that you need a perpetual motion machine that works, NOW please, > because you need something that pushes those widgets around. (There are > better ways to push widgets than side effects of a broken machine...) But it *did* work. Userspace could ask the device to suspend, and (in general) that would result in the device going into a lower power state. You've broken that without providing an alternative. > Given that your examples are network adapters, you should really look > more at why "ifdown eth0" (etc) having drivers put the device into a > low power state (like PCI D3hot, or maybe D2) wouldn't work in any > particular case. If you actually have such cases, then maybe those > specific drivers need to drive new power management interfaces. We seem to be arguing at cross purposes here. I've absolutely no objection to this approach in the long run, just as I've got no objection to flying cars or food pills or moon pods. When these things exist, the world will indeed be a glorious place. But that doesn't justify me slashing your tyres, poisoning your crops or setting fire to whatever the real-world analogue of a moon pod is. I had something that worked. Now I don't, but instead have the promise that at some point I'll have something better. Understand why I'm a touch irritated? > That's a workable approach to resolving the underlying problem in the > long term. In the short term, notice the system still works correctly > if you don't try writing those files. 
Well, except I'm now burning an extra couple of watts of power. I consider that pretty broken. > I'd not be keen on reverting Linus' patch [1] myself, even though few > drivers have started to use that mechanism yet; that would be a step > backwards, and would perpetuate users of that broken sysfs file. I'm sorry, which bit of "Don't break userspace API without adequate prior warning and with a workable replacement" is difficult to understand? -- Matthew Garrett | [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: GPL only modules
On Dec 18, 2006, "David Schwartz" <[EMAIL PROTECTED]> wrote: > I don't see why you can't distribute a single DVD that combines the contents > of the two you bought, so long as you destroy the originals. Because, for example, per Brazilian law since 1998, fair use only grants you the right to copy small portions of copyrighted works for personal use. http://www.petitiononline.com/netlivre Remember that the GPL is not only about US copyright law or US courts. -- Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/ FSF Latin America Board Member http://www.fsfla.org/ Red Hat Compiler Engineer [EMAIL PROTECTED], gcc.gnu.org} Free Software Evangelist [EMAIL PROTECTED], gnu.org} - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: sata badness in 2.6.20-rc1? [Was: Re: md patches in -mm]
On Tue, 19 Dec 2006 15:26:00 -0800 (PST) Luben Tuikov <[EMAIL PROTECTED]> wrote: > The reason was that my dev tree was tainted by this bug: > > if (good_bytes && > - scsi_end_request(cmd, 1, good_bytes, !!result) == NULL) > + scsi_end_request(cmd, 1, good_bytes, result == 0) == NULL) > return; > > in scsi_io_completion(). I had there !!result which is wrong, and when > I diffed against master, it produced a bad patch. Oh. I thought that got sorted out. It's a shame this wasn't made clear to me.. > As James mentioned one of the chunks is good and can go in. Please send a new patch, not referential to any previous patch or email, including full changelogging. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: GPL only modules
On Dec 18, 2006, "David Schwartz" <[EMAIL PROTECTED]> wrote: > No automated, mechanical process can create a derivative work of software. > (With a few exceptions not relevant here.) Can you explain what mechanisms are involved in copyright monopolies over object code, then? (there's a hint at http://www.fsfla.org/?q=en/node/128#1 ) -- Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/ FSF Latin America Board Member http://www.fsfla.org/ Red Hat Compiler Engineer [EMAIL PROTECTED], gcc.gnu.org} Free Software Evangelist [EMAIL PROTECTED], gnu.org} - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.19 file content corruption on ext3
On Wed, 20 Dec 2006, Peter Zijlstra wrote:
> On Tue, 2006-12-19 at 14:58 -0800, Andrew Morton wrote:
>
> > Well... we'd need to see (corruption && this-not-triggering) to be sure.
> >
> > Peter, have you been able to trigger the corruption?
>
> Yes; however the mail I send describing that seems to be lost in space.

Btw, can somebody actually explain the mess that is ext3 "dirtying".

Ext3 does NOT use __set_page_dirty_buffers. It does

	static int ext3_journalled_set_page_dirty(struct page *page)
	{
		SetPageChecked(page);
		return __set_page_dirty_nobuffers(page);
	}

and uses that "Checked" bit as a "whole page is dirty" bit (which it tests
in "writepage()").

You realize what this all means? It means that ANYTHING that actually
clears the _real_ dirty bit won't actually be doing anything at all for
ext3, because the Checked bit will still stay set, and any IO down the line
on that page would totally ignore the dirty bits on the buffer heads and
just write out everything.

That is "The Mess(tm)".

It also basically means that anything that clears the dirty bit without
just calling "writepage()" had _better_ call "invalidatepage()" for the
whole page, because otherwise the PageChecked bit will never be cleared as
far as I can see.

Happily, at least ext3 seems to _test_ for that case in the release_page()
function, so it appears that we do do this. But this seems to just
strengthen my argument: you can NEVER clean a page, unless you

 (a) do IO on it immediately afterwards (writeback)

or

 (b) invalidate it entirely (truncate).

I'd really like to see just those two functions exist. Preferably in a form
where you can see easily that we actually follow those rules. Rather than
having a confusing set of "clear_page_dirty()" and
"test_and_clear_page_dirty()" functions that are called from random places.

IOW, I think the "clear_page_dirty_for_io()" is fine (it's case (a) above),
and then we should probably have a "cancel_dirty_page()" function that does
all the current clear_page_dirty() but also makes sure that we actually
call the invalidate_page() function itself.

Hmm?

		Linus
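The argument above can be sketched in user space. This is only a toy model: `struct toy_page` and every `toy_*` function are hypothetical names invented for illustration, not kernel APIs, and the assumption that writeout consumes the Checked bit is part of the model rather than a claim about the ext3 source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two page flags under discussion. */
struct toy_page {
	bool dirty;	/* models the real dirty bit */
	bool checked;	/* models ext3's "whole page is dirty" use of Checked */
};

/* Models ext3_journalled_set_page_dirty(): both bits get set. */
void toy_set_page_dirty(struct toy_page *p)
{
	assert(p);
	p->checked = true;
	p->dirty = true;
}

/* The problem case: only the dirty bit is cleared, Checked stays set. */
void toy_clear_dirty_only(struct toy_page *p)
{
	assert(p);
	p->dirty = false;
}

/* Case (a): clear for IO. Returns true if the whole page must be written. */
bool toy_clear_dirty_for_io(struct toy_page *p)
{
	bool write_whole_page;

	assert(p);
	write_whole_page = p->checked;
	p->dirty = false;
	p->checked = false;	/* assumed: consumed by the writeout that follows */
	return write_whole_page;
}

/* Case (b): cancel the dirty state and invalidate, dropping Checked too. */
void toy_cancel_dirty_page(struct toy_page *p)
{
	assert(p);
	p->dirty = false;
	p->checked = false;
}
```

After `toy_clear_dirty_only()` the page looks clean but still carries `checked`, which is the stale state the message warns about; the two rule-following helpers leave no such residue.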
Re: 2.6.18 mmap hangs unrelated apps
On Tue, 19 Dec 2006 18:19:38 -0500 Trond Myklebust <[EMAIL PROTECTED]> wrote: > NFS: Fix race in nfs_release_page() > > invalidate_inode_pages2() may set the dirty bit on a page owing to the > call > to unmap_mapping_range() after the page was locked. In order to fix this, > NFS has hooked the releasepage() method. This, however leads to deadlocks > in other parts of the VM. hmm, subtle. > Fix is to add a new callback: flushpage(), which will write out a dirty > page that is under the page lock. > I guess this might permit us to clean up some of the nasties in invalidate_inode_pages2() - if the page comes dirty again, write it again. But the requirement that the page remain locked makes it hard. Need to think about it some more. Are you sure this is the cause of the NFS problem? > .prepare_write = nfs_prepare_write, > .commit_write = nfs_commit_write, > .invalidatepage = nfs_invalidate_page, > - .releasepage = nfs_release_page, A NULL ->releasepage means that try_to_release_page() will call try_to_free_buffers() if PagePrivate(). I suspect you'll need a stub to prevent this. (We were supposed to stop doing that about four years ago - change it so that all a_ops must implement ->releasepage, but nobody got around to it). - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
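The point about a NULL ->releasepage can be illustrated with a small user-space model of the dispatch. Everything here is a hypothetical stand-in (`toy_page`, `toy_aops`, `toy_try_to_release` and friends are not kernel names), and the real callers also check PagePrivate() first, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

struct toy_page {
	int buffers_dropped;	/* set when the generic buffer path ran */
};

struct toy_aops {
	int (*releasepage)(struct toy_page *page);	/* may be NULL */
};

/* Stand-in for the generic buffer-head path, try_to_free_buffers(). */
int toy_free_buffers(struct toy_page *page)
{
	page->buffers_dropped = 1;
	return 1;
}

/* Stub of the kind suggested: report "not releasable" and touch nothing. */
int toy_release_stub(struct toy_page *page)
{
	(void)page;
	return 0;
}

/* Dispatch: use the filesystem hook if present, else the generic path. */
int toy_try_to_release(const struct toy_aops *ops, struct toy_page *page)
{
	assert(ops && page);
	if (ops->releasepage)
		return ops->releasepage(page);
	return toy_free_buffers(page);
}
```

With the hook left NULL the generic path runs against private data it does not own; with the stub installed the page is simply reported as not freeable.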
Re: 2.6.19.1, sata_sil: sata dvd writer doesn't work
* dmesg is truncated, please post the content of file
  /var/log/boot.msg.

* Please post the result of 'lspci -nnvvv'

* Please try the attached patch and see if it makes any difference and
  post the result of 'dmesg' after trying to play a problematic dvd.

--
tejun

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 02b2b27..bbbec75 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -1433,16 +1433,47 @@ static void ata_eh_report(struct ata_port *ap)
 	}

 	for (tag = 0; tag < ATA_MAX_QUEUE; tag++) {
+		static const char *dma_str[] = {
+			[DMA_BIDIRECTIONAL]	= "bidi",
+			[DMA_TO_DEVICE]		= "out",
+			[DMA_FROM_DEVICE]	= "in",
+			[DMA_NONE]		= "",
+		};
 		struct ata_queued_cmd *qc = __ata_qc_from_tag(ap, tag);
+		struct ata_taskfile *cmd = &qc->tf, *res = &qc->result_tf;
+		const u8 *c = qc->cdb;
+		unsigned int nbytes;

 		if (!(qc->flags & ATA_QCFLAG_FAILED) || !qc->err_mask)
 			continue;

-		ata_dev_printk(qc->dev, KERN_ERR, "tag %d cmd 0x%x "
-			       "Emask 0x%x stat 0x%x err 0x%x (%s)\n",
-			       qc->tag, qc->tf.command, qc->err_mask,
-			       qc->result_tf.command, qc->result_tf.feature,
-			       ata_err_string(qc->err_mask));
+		nbytes = qc->nbytes;
+		if (!nbytes)
+			nbytes = qc->nsect << 9;
+
+		ata_dev_printk(qc->dev, KERN_ERR,
+			"cmd %02x/%02x:%02x:%02x:%02x:%02x/%02x:%02x:%02x:%02x:%02x/%02x "
+			"tag %d cdb 0x%x data %u %s\n "
+			"res %02x/%02x:%02x:%02x:%02x:%02x/%02x:%02x:%02x:%02x:%02x/%02x "
+			"Emask 0x%x (%s)\n",
+			cmd->command, cmd->feature, cmd->nsect,
+			cmd->lbal, cmd->lbam, cmd->lbah,
+			cmd->hob_feature, cmd->hob_nsect,
+			cmd->hob_lbal, cmd->hob_lbam, cmd->hob_lbah,
+			cmd->device, qc->tag, qc->cdb[0], nbytes,
+			dma_str[qc->dma_dir],
+			res->command, res->feature, res->nsect,
+			res->lbal, res->lbam, res->lbah,
+			res->hob_feature, res->hob_nsect,
+			res->hob_lbal, res->hob_lbam, res->hob_lbah,
+			res->device, qc->err_mask, ata_err_string(qc->err_mask));
+
+		ata_dev_printk(qc->dev, KERN_ERR,
+			"CDB: %02x:%02x:%02x:%02x:%02x:%02x:%02x:%02x "
+			"%02x:%02x:%02x:%02x:%02x:%02x:%02x:%02x p=%d\n",
+			c[0], c[1], c[2], c[3], c[4], c[5], c[6], c[7],
+			c[8], c[9], c[10], c[11], c[12], c[13], c[14], c[15],
+			cmd->protocol);
 	}
 }

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 3ac4890..f018e49 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -191,6 +191,7 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 		goto out;

 	req->cmd_len = COMMAND_SIZE(cmd[0]);
+	memset(req->cmd, 0, BLK_MAX_CDB); /* ATAPI hates garbage after CDB */
 	memcpy(req->cmd, cmd, req->cmd_len);
 	req->sense = sense;
 	req->sense_len = 0;
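The scsi_lib.c hunk is the zero-then-copy idiom: clear the whole fixed-size command buffer, then copy in the shorter command so nothing stale trails it. A minimal user-space sketch follows; `toy_fill_cdb` and `TOY_MAX_CDB` are hypothetical names, and the size of 16 is an assumption standing in for BLK_MAX_CDB.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_CDB 16	/* assumed size, standing in for BLK_MAX_CDB */

/*
 * Copy a cmd_len-byte command into a fixed-size buffer.
 * The buffer is cleared first so the bytes after the command are zero
 * rather than whatever the previous request left behind.
 */
void toy_fill_cdb(unsigned char dst[TOY_MAX_CDB],
		  const unsigned char *cmd, size_t cmd_len)
{
	assert(cmd_len <= TOY_MAX_CDB);
	memset(dst, 0, TOY_MAX_CDB);
	memcpy(dst, cmd, cmd_len);
}
```

Without the `memset()`, a 6-byte command written into a reused 16-byte buffer would carry 10 bytes of leftovers, which is the "garbage after CDB" the patch comment refers to.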
Re: 2.6.19 file content corruption on ext3
On Mon, 2006-12-18 at 12:14 -0800, Linus Torvalds wrote:
> OR:
>
>  - page_mkclean_one() is simply buggy.

GOLD!

it seems to work with all this (full diff against current git).

/me rebuilds full kernel to make sure...
reboot... test... pff the tension...
yay, still good!

Andrei; would you please verify.

The magic seems to be in the extra tlb flush after clearing the dirty
bit. Just too bad ptep_clear_flush_dirty() needs ptep not entry.

diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index 5e7cd45..2b8893b 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -135,8 +135,7 @@ static int cn_call_callback(struct cn_msg *msg, void (*destruct_data)(void *), v
 	spin_lock_bh(&dev->cbdev->queue_lock);
 	list_for_each_entry(__cbq, &dev->cbdev->queue_list, callback_entry) {
 		if (cn_cb_equal(&__cbq->id.id, &msg->id)) {
-			if (likely(!test_bit(WORK_STRUCT_PENDING,
-					     &__cbq->work.work.management) &&
+			if (likely(!delayed_work_pending(&__cbq->work) &&
 					__cbq->data.ddata == NULL)) {
 				__cbq->data.callback_priv = msg;

diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *page)
 	int ret = 0;

 	BUG_ON(!PageLocked(page));
-	if (PageWriteback(page))
+	if (PageDirty(page) || PageWriteback(page))
 		return 0;

 	if (mapping == NULL) {		/* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *page)
 	spin_lock(&mapping->private_lock);
 	ret = drop_buffers(page, &buffers_to_free);
 	spin_unlock(&mapping->private_lock);
-	if (ret) {
-		/*
-		 * If the filesystem writes its buffers by hand (eg ext3)
-		 * then we can have clean buffers against a dirty page.  We
-		 * clean the page here; otherwise later reattachment of buffers
-		 * could encounter a non-uptodate page, which is unresolvable.
-		 * This only applies in the rare case where try_to_free_buffers
-		 * succeeds but the page is not freed.
-		 *
-		 * Also, during truncate, discard_buffer will have marked all
-		 * the page's buffers clean.  We discover that here and clean
-		 * the page also.
-		 */
-		if (test_clear_page_dirty(page))
-			task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-	}
 out:
 	if (buffers_to_free) {
 		struct buffer_head *bh = buffers_to_free;
diff --git a/mm/memory.c b/mm/memory.c
index c00bac6..60e0945 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1842,6 +1842,33 @@ void unmap_mapping_range(struct address_space *mapping,
 }
 EXPORT_SYMBOL(unmap_mapping_range);

+static void check_last_page(struct address_space *mapping, loff_t size)
+{
+	pgoff_t index;
+	unsigned int offset;
+	struct page *page;
+
+	if (!mapping)
+		return;
+	offset = size & ~PAGE_MASK;
+	if (!offset)
+		return;
+	index = size >> PAGE_SHIFT;
+	page = find_lock_page(mapping, index);
+	if (page) {
+		unsigned int check = 0;
+		unsigned char *kaddr = kmap_atomic(page, KM_USER0);
+		do {
+			check += kaddr[offset++];
+		} while (offset < PAGE_SIZE);
+		kunmap_atomic(kaddr, KM_USER0);
+		unlock_page(page);
+		page_cache_release(page);
+		if (check)
+			printk(KERN_ERR "%s: BADNESS: truncate check %u\n", current->comm, check);
+	}
+}
+
 /**
  * vmtruncate - unmap mappings "freed" by truncate() syscall
  * @inode: inode of the file used
@@ -1875,6 +1902,7 @@ do_expand:
 		goto out_sig;
 	if (offset > inode->i_sb->s_maxbytes)
 		goto out_big;
+	check_last_page(mapping, inode->i_size);
 	i_size_write(inode, offset);

 out_truncate:
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 237107c..f561e72 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -957,7 +957,7 @@ int test_set_page_writeback(struct page *page)
 EXPORT_SYMBOL(test_set_page_writeback);

 /*
- * Return true if any of the pages in the mapping are marged with the
+ * Return true if any of the pages in the mapping are marked with the
  * passed tag.
  */
 int mapping_tagged(struct address_space *mapping, int tag)
diff --git a/mm/rmap.c b/mm/rmap.c
index d8a842a..900229a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -432,7 +432,7 @@ static int page_mkclean_one(struct page *page, struct vm_area_struct *vma)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
-	pte_t *pte,
ok, maybe i misread that whole "kmem_cache_alloc()" thing
  all right, i may have misread what's going on with
kmem_cache_alloc() and kmem_cache_zalloc(), and my earlier submission
may be entirely nonsense, since it involved transformations like this:

 	 * it with privilege level 3 because the IVE uses non-privileged accesses to these
 	 * tables. IA-32 segmentation is used to protect against IA-32 accesses to them.
 	 */
-	vma = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
+	vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
 	if (vma) {
-		memset(vma, 0, sizeof(*vma));
 		vma->vm_mm = current->mm;
 		vma->vm_start = IA32_GDT_OFFSET;
 		vma->vm_end = vma->vm_start + PAGE_SIZE;

  can someone briefly tell me if what i did makes sense?

rday
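The shape of that transformation can be shown with a user-space analogue, using malloc()/memset() against calloc(). This is a sketch under a stated assumption: that kmem_cache_zalloc() amounts to "allocate, then zero the object", which is how the helper is usually described. `struct toy_vma` and the `toy_*` functions are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cut-down object, standing in for a vm_area_struct. */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	void *vm_mm;
};

/* Before: allocate, then clear by hand (kmem_cache_alloc + memset). */
struct toy_vma *toy_alloc_then_clear(void)
{
	struct toy_vma *vma = malloc(sizeof(*vma));

	if (vma)
		memset(vma, 0, sizeof(*vma));
	return vma;
}

/* After: one zeroing allocation (the kmem_cache_zalloc shape). */
struct toy_vma *toy_zalloc(void)
{
	return calloc(1, sizeof(struct toy_vma));
}

/* Returns 1 if every byte of the object is zero, 0 otherwise. */
int toy_is_zeroed(const struct toy_vma *vma)
{
	const unsigned char *bytes = (const unsigned char *)vma;
	size_t i;

	assert(vma);
	for (i = 0; i < sizeof(*vma); i++)
		if (bytes[i])
			return 0;
	return 1;
}
```

Both helpers hand back an all-zero object, so under that assumption dropping the explicit memset() after switching to the zeroing allocator does not change what the caller sees.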
Re: Changes to PM layer break userspace
On Tuesday 19 December 2006 2:57 pm, Matthew Garrett wrote:
> On Tue, Dec 19, 2006 at 01:22:12PM -0800, David Brownell wrote:
> > As a generic mechanism, that interface has *ALWAYS* been "broken
> > by design"; I'd call it unfixable. It's deprecated, and scheduled
> > to vanish; see Documentation/feature-removal-schedule.txt ...
>
> The fact that something is scheduled to be removed in July 2007 does
> *not* mean it's acceptable to break it in 2006. We need to find a way to
> fix this functionality in the meantime.

The disconnect here is analogous to: I tell you the alleged perpetual
motion machine never worked, and can't ever work; and you push back and
say that you need a perpetual motion machine that works, NOW please,
because you need something that pushes those widgets around. (There are
better ways to push widgets than side effects of a broken machine...)

Given that your examples are network adapters, you should really look
more at why "ifdown eth0" (etc) having drivers put the device into a
low power state (like PCI D3hot, or maybe D2) wouldn't work in any
particular case. If you actually have such cases, then maybe those
specific drivers need to drive new power management interfaces.

That's a workable approach to resolving the underlying problem in the
long term. In the short term, notice the system still works correctly
if you don't try writing those files.

I'd not be keen on reverting Linus' patch [1] myself, even though few
drivers have started to use that mechanism yet; that would be a step
backwards, and would perpetuate users of that broken sysfs file.

- Dave

[1] cbd69dbbf1adfce6e048f15afc8629901ca9dae5
Re: 2.6.20-rc1-mm1
--- Damien Wyart <[EMAIL PROTECTED]> wrote: > > > > The reiser4 failure is unexpected. Could you please see if you can > > > > capture a trace, let the people at [EMAIL PROTECTED] know? > > > > Ok, I've handwritten the messages, here they are : > > > > reiser4 panicked cowardly : reiser4[umount(2451)] : commit_current_atom > > > (fs/reiser4/txmngr.c:1087) (zam-597) > > > write log failed (-5) > > > > [ got 2 copies of them because I have 2 reiser4 fs) > > > > I got them mainly when I try to reboot or halt the machine, and the > > > process doesn't finish, the computer gets stuck after the reiser4 > > > messages. This is only with 2.6.20-mm1, not 2.6.19-rc6-mm2. > > * Laurent Riffard <[EMAIL PROTECTED]> [2006-12-18 09:03]: > > fix-sense-key-medium-error-processing-and-retry.patch seems to be the > > culprit. > > > Reverting it fix those reiser4 panics for me. Damien, could you confirm > > please ? > > Yes, this fixes it too on my side. Thanks for this tracking ! I had a bug in my dev tree which got picked up by the patch when I diffed against master: - scsi_end_request(cmd, 1, good_bytes, !!result) == NULL) + scsi_end_request(cmd, 1, good_bytes, result == 0) == NULL) return; As james explained, the other chunk of the patch is still good. Luben - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.20-git] sata_svw: Check for errors from ata_device_add()
On Tue, 2006-12-19 at 17:59 -0500, Ben Collins wrote: > Without this patch, G5 oopses on boot. I've had this in Ubuntu since > 2.6.17, but I forgot it was in there. Still required with 2.6.20. > > Signed-off-by: Ben Collins <[EMAIL PROTECTED]> Ignore this patch for now, BenH and I are discussing the issue further.
Re: sata badness in 2.6.20-rc1? [Was: Re: md patches in -mm]
--- [EMAIL PROTECTED] wrote: > From: Andrew Morton <[EMAIL PROTECTED]> > Date: Sun, Dec 17, 2006 at 03:05:39AM -0800 > > On Sun, 17 Dec 2006 12:00:12 +0100 > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote: > > > > > Okay, I have identified the patch that causes the problem to appear, > > > which is > > > > > > fix-sense-key-medium-error-processing-and-retry.patch > > > > > > With this patch reverted -rc1-mm1 is happily running on my test box. > > > > That was rather unexpected. Thanks. > > > I can confirm that 2.6.20-rc1-mm1 with this patch reverted mounts my > raid6 partition without problems. This is x86_64 with SMP. > The reason was that my dev tree was tainted by this bug: if (good_bytes && - scsi_end_request(cmd, 1, good_bytes, !!result) == NULL) + scsi_end_request(cmd, 1, good_bytes, result == 0) == NULL) return; in scsi_io_completion(). I had there !!result which is wrong, and when I diffed against master, it produced a bad patch. As James mentioned one of the chunks is good and can go in. Luben - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
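The one-line difference in that hunk flips the value handed to scsi_end_request() for every possible `result`. A small sketch makes this concrete; `flag_buggy` and `flag_fixed` are hypothetical helper names wrapping the two expressions from the diff, and nothing here models what the callee does with the value.

```c
#include <assert.h>

/* The expression from the tainted dev tree: 1 when result is non-zero. */
int flag_buggy(int result)
{
	return !!result;
}

/* The expression in the corrected line: 1 when result is zero. */
int flag_fixed(int result)
{
	return result == 0;
}

/* Returns 1 if the two expressions disagree for this result, else 0. */
int flags_disagree(int result)
{
	int buggy = flag_buggy(result);
	int fixed = flag_fixed(result);

	assert(buggy == 0 || buggy == 1);
	assert(fixed == 0 || fixed == 1);
	return buggy != fixed;
}
```

Because `!!result` is the logical negation of `result == 0`, `flags_disagree()` returns 1 for any input, so the tainted tree passed the opposite flag on both the success path and the error path.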