Re: 2.6.24-git: kmap_atomic() WARN_ON()
On 2008.02.25 15:08:35 -0500, Jeff Garzik wrote: > Björn Steinbrink wrote: >> Hm, do you have lockdep enabled? If not, does lockdep make this go away? >> Because lockdep will set IRQF_DISABLED for all interrupt handlers, and >> unless that flag is set, handle_IRQ_event will reenable interrupts while >> the handler is running. And ahci_interrupt only uses a plain spin_lock, >> so interrupts keep being enabled. The patch below should help with that. >> >> Hmhm, maybe that also solves the deadlock you saw? Dunno... >> >> I can't come up with an useful commit message right now, but I'll resend >> in suitable form for submission if that thing actually works. >> >> Björn >> >> >> diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c >> index 1db93b6..ae3dbc8 100644 >> --- a/drivers/ata/ahci.c >> +++ b/drivers/ata/ahci.c >> @@ -1739,6 +1739,7 @@ static irqreturn_t ahci_interrupt(int irq, void >> *dev_instance) >> unsigned int i, handled = 0; >> void __iomem *mmio; >> u32 irq_stat, irq_ack = 0; >> +unsigned long flags; >> VPRINTK("ENTER\n"); >> @@ -1751,7 +1752,7 @@ static irqreturn_t ahci_interrupt(int irq, void >> *dev_instance) >> if (!irq_stat) >> return IRQ_NONE; >> - spin_lock(>lock); >> +spin_lock_irqsave(>lock, flags); >> for (i = 0; i < host->n_ports; i++) { >> struct ata_port *ap; >> @@ -1778,7 +1779,7 @@ static irqreturn_t ahci_interrupt(int irq, void >> *dev_instance) >> handled = 1; >> } >> - spin_unlock(>lock); >> +spin_unlock_irqrestore(>lock, flags); > > If this truly fixes the problem, then lockdep is definitely the problem > source. Hm, lockdep keeps the interrupts _disabled_. The code that reenables the interrupts was already in the first revision of Linus' git tree. So lockdep would actually just hide the problem, not cause it. Björn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-git: kmap_atomic() WARN_ON()
On 2008.02.07 00:58:42 +0100, Thomas Gleixner wrote: > current mainline triggers: > > WARNING: at /home/tglx/work/kernel/x86/linux-2.6/arch/x86/mm/highmem_32.c:52 > kmap_atomic_prot+0xe5/0x19b() > Modules linked in: ahci(+) sata_sil libata sd_mod scsi_mod raid1 ext3 jbd > ehci_hcd ohci_hcd uhci_hcd > Pid: 0, comm: swapper Not tainted 2.6.24 #173 > [] warn_on_slowpath+0x41/0x51 > [] ? __enqueue_entity+0x9c/0xa4 > [] ? enqueue_entity+0x124/0x13b > [] ? enqueue_task_fair+0x41/0x4c > [] ? _spin_lock_irqsave+0x14/0x2e > [] ? lock_timer_base+0x1f/0x3e > [] kmap_atomic_prot+0xe5/0x19b > [] kmap_atomic+0x14/0x16 > [] ata_scsi_rbuf_get+0x1e/0x2c [libata] > [] atapi_qc_complete+0x23f/0x289 [libata] > [] __ata_qc_complete+0x8e/0x93 [libata] > [] ata_qc_complete+0x115/0x128 [libata] > [] ata_qc_complete_multiple+0x86/0xa0 [libata] > [] ahci_interrupt+0x370/0x40d [ahci] > [] handle_IRQ_event+0x21/0x48 > [] handle_edge_irq+0xc9/0x10a > [] ? handle_edge_irq+0x0/0x10a > [] do_IRQ+0x8b/0xb7 > [] common_interrupt+0x23/0x28 > [] ? init_chipset_cmd64x+0xb/0x93 > [] ? mwait_idle_with_hints+0x39/0x3d > [] ? mwait_idle+0x0/0xf > [] mwait_idle+0xd/0xf > [] cpu_idle+0xb0/0xe4 > [] rest_init+0x5d/0x5f > > This is not a new problem. It was pointed out some time ago already, > but now the WARN_ON() finally made it into mainline :) > > The fix is not obvious, as this code seems to be called from various > call sites. Hm, do you have lockdep enabled? If not, does lockdep make this go away? Because lockdep will set IRQF_DISABLED for all interrupt handlers, and unless that flag is set, handle_IRQ_event will reenable interrupts while the handler is running. And ahci_interrupt only uses a plain spin_lock, so interrupts keep being enabled. The patch below should help with that. Hmhm, maybe that also solves the deadlock you saw? Dunno... I can't come up with an useful commit message right now, but I'll resend in suitable form for submission if that thing actually works. Björn diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 1db93b6..ae3dbc8 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -1739,6 +1739,7 @@ static irqreturn_t ahci_interrupt(int irq, void *dev_instance) unsigned int i, handled = 0; void __iomem *mmio; u32 irq_stat, irq_ack = 0; + unsigned long flags; VPRINTK("ENTER\n"); @@ -1751,7 +1752,7 @@ static irqreturn_t ahci_interrupt(int irq, void *dev_instance) if (!irq_stat) return IRQ_NONE; - spin_lock(>lock); + spin_lock_irqsave(>lock, flags); for (i = 0; i < host->n_ports; i++) { struct ata_port *ap; @@ -1778,7 +1779,7 @@ static irqreturn_t ahci_interrupt(int irq, void *dev_instance) handled = 1; } - spin_unlock(>lock); + spin_unlock_irqrestore(>lock, flags); VPRINTK("EXIT\n"); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-git: kmap_atomic() WARN_ON()
On 2008.02.07 00:58:42 +0100, Thomas Gleixner wrote: current mainline triggers: WARNING: at /home/tglx/work/kernel/x86/linux-2.6/arch/x86/mm/highmem_32.c:52 kmap_atomic_prot+0xe5/0x19b() Modules linked in: ahci(+) sata_sil libata sd_mod scsi_mod raid1 ext3 jbd ehci_hcd ohci_hcd uhci_hcd Pid: 0, comm: swapper Not tainted 2.6.24 #173 [c0126b60] warn_on_slowpath+0x41/0x51 [c011c5eb] ? __enqueue_entity+0x9c/0xa4 [c011c717] ? enqueue_entity+0x124/0x13b [c011cedb] ? enqueue_task_fair+0x41/0x4c [c032ce88] ? _spin_lock_irqsave+0x14/0x2e [c012e885] ? lock_timer_base+0x1f/0x3e [c011ad6d] kmap_atomic_prot+0xe5/0x19b [c011ae37] kmap_atomic+0x14/0x16 [f88ea218] ata_scsi_rbuf_get+0x1e/0x2c [libata] [f88ea821] atapi_qc_complete+0x23f/0x289 [libata] [f88e3d00] __ata_qc_complete+0x8e/0x93 [libata] [f88e3fc7] ata_qc_complete+0x115/0x128 [libata] [f88e8d47] ata_qc_complete_multiple+0x86/0xa0 [libata] [f8841a5d] ahci_interrupt+0x370/0x40d [ahci] [c01512c6] handle_IRQ_event+0x21/0x48 [c01521ca] handle_edge_irq+0xc9/0x10a [c0152101] ? handle_edge_irq+0x0/0x10a [c0107518] do_IRQ+0x8b/0xb7 [c01055db] common_interrupt+0x23/0x28 [c032007b] ? init_chipset_cmd64x+0xb/0x93 [c010316e] ? mwait_idle_with_hints+0x39/0x3d [c0103172] ? mwait_idle+0x0/0xf [c010317f] mwait_idle+0xd/0xf [c0103761] cpu_idle+0xb0/0xe4 [c031b509] rest_init+0x5d/0x5f This is not a new problem. It was pointed out some time ago already, but now the WARN_ON() finally made it into mainline :) The fix is not obvious, as this code seems to be called from various call sites. Hm, do you have lockdep enabled? If not, does lockdep make this go away? Because lockdep will set IRQF_DISABLED for all interrupt handlers, and unless that flag is set, handle_IRQ_event will reenable interrupts while the handler is running. And ahci_interrupt only uses a plain spin_lock, so interrupts keep being enabled. The patch below should help with that. Hmhm, maybe that also solves the deadlock you saw? Dunno... I can't come up with an useful commit message right now, but I'll resend in suitable form for submission if that thing actually works. Björn diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 1db93b6..ae3dbc8 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -1739,6 +1739,7 @@ static irqreturn_t ahci_interrupt(int irq, void *dev_instance) unsigned int i, handled = 0; void __iomem *mmio; u32 irq_stat, irq_ack = 0; + unsigned long flags; VPRINTK(ENTER\n); @@ -1751,7 +1752,7 @@ static irqreturn_t ahci_interrupt(int irq, void *dev_instance) if (!irq_stat) return IRQ_NONE; - spin_lock(host-lock); + spin_lock_irqsave(host-lock, flags); for (i = 0; i host-n_ports; i++) { struct ata_port *ap; @@ -1778,7 +1779,7 @@ static irqreturn_t ahci_interrupt(int irq, void *dev_instance) handled = 1; } - spin_unlock(host-lock); + spin_unlock_irqrestore(host-lock, flags); VPRINTK(EXIT\n); -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-git: kmap_atomic() WARN_ON()
On 2008.02.25 15:08:35 -0500, Jeff Garzik wrote: Björn Steinbrink wrote: Hm, do you have lockdep enabled? If not, does lockdep make this go away? Because lockdep will set IRQF_DISABLED for all interrupt handlers, and unless that flag is set, handle_IRQ_event will reenable interrupts while the handler is running. And ahci_interrupt only uses a plain spin_lock, so interrupts keep being enabled. The patch below should help with that. Hmhm, maybe that also solves the deadlock you saw? Dunno... I can't come up with an useful commit message right now, but I'll resend in suitable form for submission if that thing actually works. Björn diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 1db93b6..ae3dbc8 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -1739,6 +1739,7 @@ static irqreturn_t ahci_interrupt(int irq, void *dev_instance) unsigned int i, handled = 0; void __iomem *mmio; u32 irq_stat, irq_ack = 0; +unsigned long flags; VPRINTK(ENTER\n); @@ -1751,7 +1752,7 @@ static irqreturn_t ahci_interrupt(int irq, void *dev_instance) if (!irq_stat) return IRQ_NONE; - spin_lock(host-lock); +spin_lock_irqsave(host-lock, flags); for (i = 0; i host-n_ports; i++) { struct ata_port *ap; @@ -1778,7 +1779,7 @@ static irqreturn_t ahci_interrupt(int irq, void *dev_instance) handled = 1; } - spin_unlock(host-lock); +spin_unlock_irqrestore(host-lock, flags); If this truly fixes the problem, then lockdep is definitely the problem source. Hm, lockdep keeps the interrupts _disabled_. The code that reenables the interrupts was already in the first revision of Linus' git tree. So lockdep would actually just hide the problem, not cause it. Björn -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Why is the kfree() argument const?
On 2008.01.18 10:48:26 +0100, Jakob Oestergaard wrote: > On Thu, Jan 17, 2008 at 01:25:39PM -0800, Linus Torvalds wrote: > ... > > Why do you make that mistake, when it is PROVABLY NOT TRUE! > > > > Try this trivial program: > > > > int main(int argc, char **argv) > > { > > int i; > > const int *c; > > > > i = 5; > > c = > > i = 10; > > return *c; > > } > > > > and realize that according to the C rules, if it returns anything but 10, > > the compiler is *buggy*. > > That's not how this works (as we obviously agree). > > Please consider a rewrite of your example, demonstrating the usefulness and > proper application of const pointers: > > extern foo(const int *); > > int main(int argc, char **argv) > { > int i; > > i = 5; > foo(); > return i; > } > > Now, if the program returns anything else than 5, it means someone cast away > const, which is generally considered a bad idea in most other software > projects, for this very reason. Not at all. #include #include char *lookup[5]; const char *get() { *(*lookup = malloc(1)) = '1'; return *lookup; } void set(const char *d, char val) { for (int i = 0; i < 5; ++i) if (lookup[i] == d) *(lookup[i]) = val; } int main() { const char *p = get(); printf("%c\n", *p); set(p, '2'); printf("%c\n", *p); return 0; } Do you see anything that casts the const away? No? Me neither. Still, the memory that p points to was changed, because there was another pointer and that was not const. > *That* is the purpose of const pointers. The only thing that const can tell you is that you should not modify the value _yourself_, using that pointer _directly_. It's somewhat like a soft "half" protected/private specifier. You may read this value, but if you want to write to it, please use the setter function I provide for you. Because that setter function might do some special stuff, like counting how often that value was written. And accepting a pointer to a const as an argument does _only_ say: It's ok to call this function if you only received a pointer to a const, the function does the Right Thing for such pointers. It does not guarantee at all, that the function won't change the memory the pointer is pointing to. Take a set of functions that manage memory for foo objects: const struct foo *get(someIdentifier); struct foo *makeWritable(const struct foo *); void free(const struct foo *); get() returns a pointer to a foo object, and it might return a pointer to a _shared_ instance. Obviously it should make the pointer const, the caller should not modify the shared instance. makeWritable() accepts a pointer to a const foo, because you generally want to pass it such a pointer to get a non-const one instead. The function might just use ref-counting and see if it needs to create a copy returning a pointer to points to a different location in memory or if it can just return its _internal_ _non-const_ pointer. No casts! free() also accepts a pointer to a const foo, because you obviously want to be able to free the shared stuff, too. It just happens to not always go away immediately, because there's some ref-counting going on. But you are _still_ invalidating your const pointer. You lost all rights to using it, because you _gave it up_ when you called free(). An important part of free() is to _invalidate_ the _pointer_. Whether or not the object it pointed to was modified or not doesn't matter at all, because you have no valid means of accessing it _anyway_, it's totally out of scope. Now I can hear you screaming "But that's refcounting! That's totally different!". It's _not_. Just "pretend" that get() cannot return the same thing twice, then your ref-counting only goes up to one and boom, you've just reinvented malloc/free, with a const pointer being passed to free(). Passing a pointer to free invalidates that pointer, and all pointers that share the same value. And then, when no valid ways to access the object are left, free() is free to use the memory in any way it wants. Without any stricht need to use the const pointer to perform any writes. That might be an optimization, but you can do without. What you're arguing about is actually that you don't want anyone to be able to use a const pointer to invalidate any other pointer, not that you don't want an object to be changed through a const pointer, because that is simply not what happens. If you want to restrict the set of pointers that can be invalidated by an other pointer, you'll have to use something else because const does not talk about invalidating aliasing pointers. Björn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Why is the kfree() argument const?
On 2008.01.18 10:48:26 +0100, Jakob Oestergaard wrote: On Thu, Jan 17, 2008 at 01:25:39PM -0800, Linus Torvalds wrote: ... Why do you make that mistake, when it is PROVABLY NOT TRUE! Try this trivial program: int main(int argc, char **argv) { int i; const int *c; i = 5; c = i; i = 10; return *c; } and realize that according to the C rules, if it returns anything but 10, the compiler is *buggy*. That's not how this works (as we obviously agree). Please consider a rewrite of your example, demonstrating the usefulness and proper application of const pointers: extern foo(const int *); int main(int argc, char **argv) { int i; i = 5; foo(i); return i; } Now, if the program returns anything else than 5, it means someone cast away const, which is generally considered a bad idea in most other software projects, for this very reason. Not at all. #include stdio.h #include stdlib.h char *lookup[5]; const char *get() { *(*lookup = malloc(1)) = '1'; return *lookup; } void set(const char *d, char val) { for (int i = 0; i 5; ++i) if (lookup[i] == d) *(lookup[i]) = val; } int main() { const char *p = get(); printf(%c\n, *p); set(p, '2'); printf(%c\n, *p); return 0; } Do you see anything that casts the const away? No? Me neither. Still, the memory that p points to was changed, because there was another pointer and that was not const. *That* is the purpose of const pointers. The only thing that const can tell you is that you should not modify the value _yourself_, using that pointer _directly_. It's somewhat like a soft half protected/private specifier. You may read this value, but if you want to write to it, please use the setter function I provide for you. Because that setter function might do some special stuff, like counting how often that value was written. And accepting a pointer to a const as an argument does _only_ say: It's ok to call this function if you only received a pointer to a const, the function does the Right Thing for such pointers. It does not guarantee at all, that the function won't change the memory the pointer is pointing to. Take a set of functions that manage memory for foo objects: const struct foo *get(someIdentifier); struct foo *makeWritable(const struct foo *); void free(const struct foo *); get() returns a pointer to a foo object, and it might return a pointer to a _shared_ instance. Obviously it should make the pointer const, the caller should not modify the shared instance. makeWritable() accepts a pointer to a const foo, because you generally want to pass it such a pointer to get a non-const one instead. The function might just use ref-counting and see if it needs to create a copy returning a pointer to points to a different location in memory or if it can just return its _internal_ _non-const_ pointer. No casts! free() also accepts a pointer to a const foo, because you obviously want to be able to free the shared stuff, too. It just happens to not always go away immediately, because there's some ref-counting going on. But you are _still_ invalidating your const pointer. You lost all rights to using it, because you _gave it up_ when you called free(). An important part of free() is to _invalidate_ the _pointer_. Whether or not the object it pointed to was modified or not doesn't matter at all, because you have no valid means of accessing it _anyway_, it's totally out of scope. Now I can hear you screaming But that's refcounting! That's totally different!. It's _not_. Just pretend that get() cannot return the same thing twice, then your ref-counting only goes up to one and boom, you've just reinvented malloc/free, with a const pointer being passed to free(). Passing a pointer to free invalidates that pointer, and all pointers that share the same value. And then, when no valid ways to access the object are left, free() is free to use the memory in any way it wants. Without any stricht need to use the const pointer to perform any writes. That might be an optimization, but you can do without. What you're arguing about is actually that you don't want anyone to be able to use a const pointer to invalidate any other pointer, not that you don't want an object to be changed through a const pointer, because that is simply not what happens. If you want to restrict the set of pointers that can be invalidated by an other pointer, you'll have to use something else because const does not talk about invalidating aliasing pointers. Björn -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: More breakage in native_rdtsc out of line in git-x86
On 2008.01.09 23:41:42 +0100, Andi Kleen wrote: > On Wed, Jan 09, 2008 at 02:35:55PM -0800, Harvey Harrison wrote: > > On Wed, 2008-01-09 at 23:28 +0100, Andi Kleen wrote: > > > > > Do you have a simple recipe to just update from the the remote branch, > > > assuming there are no local changes or local branches? > > > > > > -Andi > > > > For staying up to date I use the following: > > > > # Add Linus's tree as a remote > > git remote --add linus > > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git > > > > # Add Ingo's tree as a remote > > git remote --add x86 > > git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git > > > > # With that setup, just run the following to get any changes you > > # don't have. It will also notice any new branches Ingo/Linus > > # add to their repo. Look in .git/config afterwards, the format > > # to add new remotes is easy to figure out. > > git remote update > > I'm already cloning the branches; the problem is not getting conflicts etc. > when updating. I guess you're using "git pull" to update a local branch? That will try to merge the new state of x86/mm into your local branch, and that breaks. If you just want to have a local branch that gives you a "fixed" state of x86/mm regardless of whether or not you already fetched newer ones, you can do: git branch myThing x86/mm # create the branch work/test/whatever To fetch a new "state" from the remote: git fetch x86/mm # or git remote update, or whatever To update your branch to point to the new state: git branch - f myThing x86/mm That basically replaces it with a new branch of the same name, but pointing to the new x86/mm. Or if you want to get your working tree to that state at the same time, you can also do: git checkout myThing git reset --hard x86/mm Björn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: More breakage in native_rdtsc out of line in git-x86
On 2008.01.09 18:48:00 +0100, Andi Kleen wrote: > On Wed, Jan 09, 2008 at 05:30:18PM +0100, Ingo Molnar wrote: > > then you have a truly ancient x86.git repository ;-) > > I update only infrequently because frankly git's remote branch tracking > is a mess. At least it doesn't really work for x86#mm here. > > I usually have to blow away the repository and reclone > to get back to a sane state. Someone in #git had a similar problem today. Conclusion was that x86/mm is not "stable" in the sense that commits are only added, instead history gets rewritten. That breaks pull/merge/"basic rebase". Basically, you'll want to "rebase --onto", taking your local commits from the old branch history to the new branch history. One way to make that bearable is to have two branches (or a branch and a tag). One branch is used to keep your work, the other branch (or tag) is used to "mark" where the old upstream ended and your work started. Assuming that your remote is called "x86", this could look like this: git branch myStuff x86/mm git branch myStuff_start x86/mm work work work commit commit commit git fetch x86 mm git rebase --onto x86/mm myStuff_start myStuff git branch -f myStuff_start x86/mm The rebase will take all of the commits that you added to myStuff, on top of the old x86/mm (referred to by myStuff_start) and try to apply them on top of the new x86/mm, updating myStuff to point to the rebased commits afterwards. Then myStuff_start is updated to point to the now current start of your stuff, so that you can do the rebase again later. That's not actually a problem of git's tracking branches, because they're not made for that style of work at all. For such stuff, rebase is the only useful option AFAIK. HTH, Björn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: More breakage in native_rdtsc out of line in git-x86
On 2008.01.09 18:48:00 +0100, Andi Kleen wrote: On Wed, Jan 09, 2008 at 05:30:18PM +0100, Ingo Molnar wrote: then you have a truly ancient x86.git repository ;-) I update only infrequently because frankly git's remote branch tracking is a mess. At least it doesn't really work for x86#mm here. I usually have to blow away the repository and reclone to get back to a sane state. Someone in #git had a similar problem today. Conclusion was that x86/mm is not stable in the sense that commits are only added, instead history gets rewritten. That breaks pull/merge/basic rebase. Basically, you'll want to rebase --onto, taking your local commits from the old branch history to the new branch history. One way to make that bearable is to have two branches (or a branch and a tag). One branch is used to keep your work, the other branch (or tag) is used to mark where the old upstream ended and your work started. Assuming that your remote is called x86, this could look like this: git branch myStuff x86/mm git branch myStuff_start x86/mm work work work commit commit commit git fetch x86 mm git rebase --onto x86/mm myStuff_start myStuff git branch -f myStuff_start x86/mm The rebase will take all of the commits that you added to myStuff, on top of the old x86/mm (referred to by myStuff_start) and try to apply them on top of the new x86/mm, updating myStuff to point to the rebased commits afterwards. Then myStuff_start is updated to point to the now current start of your stuff, so that you can do the rebase again later. That's not actually a problem of git's tracking branches, because they're not made for that style of work at all. For such stuff, rebase is the only useful option AFAIK. HTH, Björn -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: More breakage in native_rdtsc out of line in git-x86
On 2008.01.09 23:41:42 +0100, Andi Kleen wrote: On Wed, Jan 09, 2008 at 02:35:55PM -0800, Harvey Harrison wrote: On Wed, 2008-01-09 at 23:28 +0100, Andi Kleen wrote: Do you have a simple recipe to just update from the the remote branch, assuming there are no local changes or local branches? -Andi For staying up to date I use the following: # Add Linus's tree as a remote git remote --add linus git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git # Add Ingo's tree as a remote git remote --add x86 git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git # With that setup, just run the following to get any changes you # don't have. It will also notice any new branches Ingo/Linus # add to their repo. Look in .git/config afterwards, the format # to add new remotes is easy to figure out. git remote update I'm already cloning the branches; the problem is not getting conflicts etc. when updating. I guess you're using git pull to update a local branch? That will try to merge the new state of x86/mm into your local branch, and that breaks. If you just want to have a local branch that gives you a fixed state of x86/mm regardless of whether or not you already fetched newer ones, you can do: git branch myThing x86/mm # create the branch work/test/whatever To fetch a new state from the remote: git fetch x86/mm # or git remote update, or whatever To update your branch to point to the new state: git branch - f myThing x86/mm That basically replaces it with a new branch of the same name, but pointing to the new x86/mm. Or if you want to get your working tree to that state at the same time, you can also do: git checkout myThing git reset --hard x86/mm Björn -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix forcedeth reversing the MAC address on suspend
On 2008.01.04 23:26:33 +0100, Björn Steinbrink wrote: > For cards that initially have the MAC address stored in reverse order, > the forcedeth driver uses a flag to signal whether the address was > already corrected, so that it is not reversed again on a subsequent > probe. > > Unfortunately this flag, which is stored in a register of the card, > seems to get lost during suspend, resulting in the MAC address being > reversed again. To fix that, the MAC address needs to be written back in > reversed order before we suspend and the flag needs to be reset. > > The flag is still required because at least kexec will never write back > the reversed address and thus needs to know what state the card is in. > > Signed-off-by: Björn Steinbrink <[EMAIL PROTECTED]> > --- > On 2008.01.04 15:08:42 +0100, Richard Jonsson wrote: > > Björn Steinbrink skrev: > >> Richard, could you give this a spin? And then we'd likely need someone > >> to test that with kexec... > > > > The patch you sent does the trick, works fine now, thanks! > > I cannot test this with kexec as I barely know what it is, I'll leave that > > to someone else. > > Thanks. > > Ayaz, you originally wrote the kexec fix (IIRC), was my analysis of the > problem correct? If so, I'm quite sure that the patch DTRT. Still it > should be tested for the rmmod+modprobe and the kexec case. I'll try to > get my box free for some testing, but that's unlikely in the next few > days. Plus, I've never used kexec myself either. So I'd be grateful if > someone else would step up. Found a few minutes to test, but kexec would just hang for me. So I'm unable to test that atm. :-( Björn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: forcedeth: MAC-address reversed on resume from suspend
On 2008.01.06 19:49:49 -0200, Adolfo R. Brandes wrote: > I have this forcedeth MAC address reversal problem when suspending > on 2 distinct boxes. I can confirm Steinbrink's patch fixes the > problem on only one of them: > > 00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev f3) > > On the other one the problem persists: > > 00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a1) Thanks. Leaves me pretty clueless though. Especially since it worked for Richard, who also has a MCP51. In a private mail, you said that you had hardcoded the mac address in the source. Any chance that you applied the patch on your modified sources and didn't get it right? thanks, Björn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: forcedeth: MAC-address reversed on resume from suspend
On 2008.01.06 19:49:49 -0200, Adolfo R. Brandes wrote: I have this forcedeth MAC address reversal problem when suspending on 2 distinct boxes. I can confirm Steinbrink's patch fixes the problem on only one of them: 00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev f3) On the other one the problem persists: 00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a1) Thanks. Leaves me pretty clueless though. Especially since it worked for Richard, who also has a MCP51. In a private mail, you said that you had hardcoded the mac address in the source. Any chance that you applied the patch on your modified sources and didn't get it right? thanks, Björn -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix forcedeth reversing the MAC address on suspend
On 2008.01.04 23:26:33 +0100, Björn Steinbrink wrote: For cards that initially have the MAC address stored in reverse order, the forcedeth driver uses a flag to signal whether the address was already corrected, so that it is not reversed again on a subsequent probe. Unfortunately this flag, which is stored in a register of the card, seems to get lost during suspend, resulting in the MAC address being reversed again. To fix that, the MAC address needs to be written back in reversed order before we suspend and the flag needs to be reset. The flag is still required because at least kexec will never write back the reversed address and thus needs to know what state the card is in. Signed-off-by: Björn Steinbrink [EMAIL PROTECTED] --- On 2008.01.04 15:08:42 +0100, Richard Jonsson wrote: Björn Steinbrink skrev: Richard, could you give this a spin? And then we'd likely need someone to test that with kexec... The patch you sent does the trick, works fine now, thanks! I cannot test this with kexec as I barely know what it is, I'll leave that to someone else. Thanks. Ayaz, you originally wrote the kexec fix (IIRC), was my analysis of the problem correct? If so, I'm quite sure that the patch DTRT. Still it should be tested for the rmmod+modprobe and the kexec case. I'll try to get my box free for some testing, but that's unlikely in the next few days. Plus, I've never used kexec myself either. So I'd be grateful if someone else would step up. Found a few minutes to test, but kexec would just hang for me. So I'm unable to test that atm. :-( Björn -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: forcedeth: MAC-address reversed on resume from suspend
On 2008.01.05 07:10:02 +0100, Björn Steinbrink wrote: > - nv_probe sees 00:11:22:00:00:01 > - doesn't match the vendor stuff > - becomes 01:00:00:22:11:00 > > Oops, that's not quite the expected thing. Haha, I just noticed that that even is a multicast address, so you'd get a random MAC address in that case *lol. Björn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: forcedeth: MAC-address reversed on resume from suspend
On 2008.01.04 23:43:52 +0100, Andreas Mohr wrote: > On Fri, Jan 04, 2008 at 11:17:40AM +0100, Björn Steinbrink wrote: > > On 2008.01.04 09:45:17 +0100, Andreas Mohr wrote: > > > And then it needs these card I/O functions wrapped into two > > > functions which interface with driver- and OS-related MAC > > > variables (struct variables ALWAYS stored in usual system order, > > > NOT H/W order!!) which optionally reverse the address (if > > > needed for a particular card). > > > > Hu? The MAC address is only ever reversed when the card is not in > > use, why the hell would you want to reverse anything when the rest > > of the OS is involved? This whole reversing stuff is purely before > > and after the card is actually used. It's not that you need to > > reverse the address to communicate with the card, it's just > > initially wrong on the card. > > Huhrmm? OK, let me ask this then: So what you're saying is that the > address is only initially wrong (e.g. due to card EEPROM content > somehow initializing the registers in wrong order on power-on), it's > not _always_ supposed to be stored in wrong order during operation. > > IOW, the default card state after power-on is _unusable_ (due to > invalidly ordered MAC address) and has to be _corrected_ by the > driver, _initially_ only? Yeah. _But_, all deduced from the code. > OTOH I know that at least acx100 has a _permanently_ wrong-ordered MAC > address setup, i.e. it's required to have it in "wrong" order _during > operation_. And I wouldn't be surprised to see several examples of > other hardware having a permanently wrongly-ordered address > requirement, given the amount of MAC reversal in Linux drivers. > > Couldn't it be by chance that it's _believed_ to be > reversed-on-power-on only, whereas those cards should _actually_ have > it reversed-during-lifetime instead? Such a misunderstanding might > explain a lot of trouble... Humm... Wouldn't that have made itself evident? The card's not running in promisc mode all the time, so there should be some problems if the card would expect the MAC in the reversed order, right? (At least I see some code that can switch it into promisc mode, so I assume that it is not enabled all the time). There _is_ a comment about some cards not generating any TX interrupts at all. But that seems to predate any card that stores the MAC address in correct order (the patch for that came after git, the comment predates git). So assuming that the card wants the address in reversed order at runtime, would likely imply that _no_ card would generate TX interrupts (unless promisc?), and the comment basically invalidates that assumption. > > > And then there will never be any confusion about any MAC address > > > format anywhere any more, right? > > > > At probing time, you'll have to know whether the address you read > > from the hardware is reversed or not. Unless you write it back in > > reversed order when you release the card, you need a flag that at > > least survives until nv_probe is called the next time. kexec does > > not write it back, so you do need such a flag. But the flag > > apparently doesn't survive a suspend/resume cycle, so you need to > > write back the reversed address then. But the flag will survive a > > rmmod + modprobe cycle, so you need to reset it when writing back > > the reversed address. > > If it's indeed reversed-on-power-on only, then one probably cannot do > much about it, thus I'd give up and simply check the MAC address for > validity (should be very easy to check for a simple reversal by > checking for the manufacturer-specific fingerprint in reverse order). > Keeping _any_ sort of state to record the fact that it used to be > reversed or now is reversed is futile (or rather: catastrophic) given > the complexity of suspend / virtualisation or whichever other nice > operations Linux may invent in the future ;) There was once a patch that checked the vendor specific address part, long ago. No idea what happened to it. But for kexec, I'd say that that is broken, too. - ifconfig eth0 hw ether 00:11:22:00:00:01 - $kexec_incantation - nv_probe sees 00:11:22:00:00:01 - doesn't match the vendor stuff - becomes 01:00:00:22:11:00 Oops, that's not quite the expected thing. No way to tell whether we have to reverse from the address alone AFAICT. Björn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Fix forcedeth reversing the MAC address on suspend
For cards that initially have the MAC address stored in reverse order, the forcedeth driver uses a flag to signal whether the address was already corrected, so that it is not reversed again on a subsequent probe. Unfortunately this flag, which is stored in a register of the card, seems to get lost during suspend, resulting in the MAC address being reversed again. To fix that, the MAC address needs to be written back in reversed order before we suspend and the flag needs to be reset. The flag is still required because at least kexec will never write back the reversed address and thus needs to know what state the card is in. Signed-off-by: Björn Steinbrink <[EMAIL PROTECTED]> --- On 2008.01.04 15:08:42 +0100, Richard Jonsson wrote: > Björn Steinbrink skrev: >> Richard, could you give this a spin? And then we'd likely need someone >> to test that with kexec... > > The patch you sent does the trick, works fine now, thanks! > I cannot test this with kexec as I barely know what it is, I'll leave that > to someone else. Thanks. Ayaz, you originally wrote the kexec fix (IIRC), was my analysis of the problem correct? If so, I'm quite sure that the patch DTRT. Still it should be tested for the rmmod+modprobe and the kexec case. I'll try to get my box free for some testing, but that's unlikely in the next few days. Plus, I've never used kexec myself either. So I'd be grateful if someone else would step up. diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c index a96583c..f84c752 100644 --- a/drivers/net/forcedeth.c +++ b/drivers/net/forcedeth.c @@ -5199,10 +5199,6 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i dev->dev_addr[3] = (np->orig_mac[0] >> 16) & 0xff; dev->dev_addr[4] = (np->orig_mac[0] >> 8) & 0xff; dev->dev_addr[5] = (np->orig_mac[0] >> 0) & 0xff; - /* set permanent address to be correct aswell */ - np->orig_mac[0] = (dev->dev_addr[0] << 0) + (dev->dev_addr[1] << 8) + - (dev->dev_addr[2] << 16) + (dev->dev_addr[3] << 24); - np->orig_mac[1] = (dev->dev_addr[4] << 0) + (dev->dev_addr[5] << 8); writel(txreg|NVREG_TRANSMITPOLL_MAC_ADDR_REV, base + NvRegTransmitPoll); } memcpy(dev->perm_addr, dev->dev_addr, dev->addr_len); @@ -5414,6 +5410,8 @@ static void __devexit nv_remove(struct pci_dev *pci_dev) */ writel(np->orig_mac[0], base + NvRegMacAddrA); writel(np->orig_mac[1], base + NvRegMacAddrB); + writel(readl(base + NvRegTransmitPoll) & ~NVREG_TRANSMITPOLL_MAC_ADDR_REV, + base + NvRegTransmitPoll); /* free all structures */ free_rings(dev); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: forcedeth: MAC-address reversed on resume from suspend
On 2008.01.04 09:45:17 +0100, Andreas Mohr wrote: > Hi, > > On Fri, Jan 04, 2008 at 04:43:57AM +0100, Björn Steinbrink wrote: > > On 2008.01.03 01:42:09 +0200, Adrian Bunk wrote: > > > [ original bug report: http://lkml.org/lkml/2008/1/2/253 ] > > > > > > On Wed, Jan 02, 2008 at 10:48:43PM +0100, Andreas Mohr wrote: > > > > The nv_probe() function then nicely obeys this flag by reversing the > > > > address if needed, _however_ there's another function, > > > > nv_copy_mac_to_hw(), which does _NOT_ obey it and simply copies the > > > > _raw_ netdev dev_addr to the card register (NvRegMacAddrA etc.). > > > > nv_copy_mac_to_hw() is called from nv_probe() _after_ the MAC address > > has been fixed, and from nv_set_mac_address() which is supposed to get a > > correct MAC address anyway. So I don't see any problem there. > > /* set permanent address to be correct aswell */ > ... orig_mac fumbling ... > > Why, then, does this driver do such a nice hack as: > > /* special op: write back the misordered MAC address - otherwise > * the next nv_probe would see a wrong address. > */ > writel(np->orig_mac[0], base + NvRegMacAddrA); > writel(np->orig_mac[1], base + NvRegMacAddrB); > > I really don't see why this driver needs to do these things in such a > messy way. Sure you can add some layers of wrappers and then have "nv_store_mac(np->orig_mac)" instead. Congratulations, you saved 2 lines and added 6 new ones (assuming that you don't go universally u8, that adds even more). This is the only place that writes the MAC address besides the existing nv_copy_mac_to_hw wrapper which deals with net_devices and thus is perfectly usable, internally and externally. > One thing is for certain: netdev->dev_addr is always in operating system > order, and order should always be handled properly when copying to/from > hardware. > > Such a driver needs exactly *one* central helper method > to reverse an input MAC address in 6 bytes u8 format > (which could arguably be added to include/linux/etherdevice.h even, > since a wee bit too many devices seem to be having trouble > with wrongly ordered MAC addresses). > Then it needs exactly *one* function for I/O register access > to read a MAC address from a device and exactly *one* function > to write it back (both doing raw, unsorted format transfers only, > and possibly as inline function). A function to read the MAC address from the hardware? There's only a single place (nv_probe()) that ever reads it, why bother? > And then it needs these card I/O functions wrapped into two functions which > interface with driver- and OS-related MAC variables > (struct variables ALWAYS stored in usual system order, NOT H/W order!!) > which optionally reverse the address (if needed for a particular card). Hu? The MAC address is only ever reversed when the card is not in use, why the hell would you want to reverse anything when the rest of the OS is involved? This whole reversing stuff is purely before and after the card is actually used. It's not that you need to reverse the address to communicate with the card, it's just initially wrong on the card. > And then there will never be any confusion about any MAC address format > anywhere any more, right? At probing time, you'll have to know whether the address you read from the hardware is reversed or not. Unless you write it back in reversed order when you release the card, you need a flag that at least survives until nv_probe is called the next time. kexec does not write it back, so you do need such a flag. But the flag apparently doesn't survive a suspend/resume cycle, so you need to write back the reversed address then. But the flag will survive a rmmod + modprobe cycle, so you need to reset it when writing back the reversed address. > I really don't mean to sound cranky, but this driver did ask for trouble, > and lots of it ;) > > Thank you for your reply, and especially thank you for this very fast > patch! Well, let's see if my analysis was correct and the patch works, first. I saw the forcedeth code before, but kexec and suspend is totally new to me and I just made some assumptions based on the reported behaviour and the commit messages... ;-) Björn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: forcedeth: MAC-address reversed on resume from suspend
On 2008.01.04 09:45:17 +0100, Andreas Mohr wrote: Hi, On Fri, Jan 04, 2008 at 04:43:57AM +0100, Björn Steinbrink wrote: On 2008.01.03 01:42:09 +0200, Adrian Bunk wrote: [ original bug report: http://lkml.org/lkml/2008/1/2/253 ] On Wed, Jan 02, 2008 at 10:48:43PM +0100, Andreas Mohr wrote: The nv_probe() function then nicely obeys this flag by reversing the address if needed, _however_ there's another function, nv_copy_mac_to_hw(), which does _NOT_ obey it and simply copies the _raw_ netdev dev_addr to the card register (NvRegMacAddrA etc.). nv_copy_mac_to_hw() is called from nv_probe() _after_ the MAC address has been fixed, and from nv_set_mac_address() which is supposed to get a correct MAC address anyway. So I don't see any problem there. /* set permanent address to be correct aswell */ ... orig_mac fumbling ... Why, then, does this driver do such a nice hack as: /* special op: write back the misordered MAC address - otherwise * the next nv_probe would see a wrong address. */ writel(np-orig_mac[0], base + NvRegMacAddrA); writel(np-orig_mac[1], base + NvRegMacAddrB); I really don't see why this driver needs to do these things in such a messy way. Sure you can add some layers of wrappers and then have nv_store_mac(np-orig_mac) instead. Congratulations, you saved 2 lines and added 6 new ones (assuming that you don't go universally u8, that adds even more). This is the only place that writes the MAC address besides the existing nv_copy_mac_to_hw wrapper which deals with net_devices and thus is perfectly usable, internally and externally. One thing is for certain: netdev-dev_addr is always in operating system order, and order should always be handled properly when copying to/from hardware. Such a driver needs exactly *one* central helper method to reverse an input MAC address in 6 bytes u8 format (which could arguably be added to include/linux/etherdevice.h even, since a wee bit too many devices seem to be having trouble with wrongly ordered MAC addresses). Then it needs exactly *one* function for I/O register access to read a MAC address from a device and exactly *one* function to write it back (both doing raw, unsorted format transfers only, and possibly as inline function). A function to read the MAC address from the hardware? There's only a single place (nv_probe()) that ever reads it, why bother? And then it needs these card I/O functions wrapped into two functions which interface with driver- and OS-related MAC variables (struct variables ALWAYS stored in usual system order, NOT H/W order!!) which optionally reverse the address (if needed for a particular card). Hu? The MAC address is only ever reversed when the card is not in use, why the hell would you want to reverse anything when the rest of the OS is involved? This whole reversing stuff is purely before and after the card is actually used. It's not that you need to reverse the address to communicate with the card, it's just initially wrong on the card. And then there will never be any confusion about any MAC address format anywhere any more, right? At probing time, you'll have to know whether the address you read from the hardware is reversed or not. Unless you write it back in reversed order when you release the card, you need a flag that at least survives until nv_probe is called the next time. kexec does not write it back, so you do need such a flag. But the flag apparently doesn't survive a suspend/resume cycle, so you need to write back the reversed address then. But the flag will survive a rmmod + modprobe cycle, so you need to reset it when writing back the reversed address. I really don't mean to sound cranky, but this driver did ask for trouble, and lots of it ;) Thank you for your reply, and especially thank you for this very fast patch! Well, let's see if my analysis was correct and the patch works, first. I saw the forcedeth code before, but kexec and suspend is totally new to me and I just made some assumptions based on the reported behaviour and the commit messages... ;-) Björn -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: forcedeth: MAC-address reversed on resume from suspend
On 2008.01.04 23:43:52 +0100, Andreas Mohr wrote: On Fri, Jan 04, 2008 at 11:17:40AM +0100, Björn Steinbrink wrote: On 2008.01.04 09:45:17 +0100, Andreas Mohr wrote: And then it needs these card I/O functions wrapped into two functions which interface with driver- and OS-related MAC variables (struct variables ALWAYS stored in usual system order, NOT H/W order!!) which optionally reverse the address (if needed for a particular card). Hu? The MAC address is only ever reversed when the card is not in use, why the hell would you want to reverse anything when the rest of the OS is involved? This whole reversing stuff is purely before and after the card is actually used. It's not that you need to reverse the address to communicate with the card, it's just initially wrong on the card. Huhrmm? OK, let me ask this then: So what you're saying is that the address is only initially wrong (e.g. due to card EEPROM content somehow initializing the registers in wrong order on power-on), it's not _always_ supposed to be stored in wrong order during operation. IOW, the default card state after power-on is _unusable_ (due to invalidly ordered MAC address) and has to be _corrected_ by the driver, _initially_ only? Yeah. _But_, all deduced from the code. OTOH I know that at least acx100 has a _permanently_ wrong-ordered MAC address setup, i.e. it's required to have it in wrong order _during operation_. And I wouldn't be surprised to see several examples of other hardware having a permanently wrongly-ordered address requirement, given the amount of MAC reversal in Linux drivers. Couldn't it be by chance that it's _believed_ to be reversed-on-power-on only, whereas those cards should _actually_ have it reversed-during-lifetime instead? Such a misunderstanding might explain a lot of trouble... Humm... Wouldn't that have made itself evident? The card's not running in promisc mode all the time, so there should be some problems if the card would expect the MAC in the reversed order, right? (At least I see some code that can switch it into promisc mode, so I assume that it is not enabled all the time). There _is_ a comment about some cards not generating any TX interrupts at all. But that seems to predate any card that stores the MAC address in correct order (the patch for that came after git, the comment predates git). So assuming that the card wants the address in reversed order at runtime, would likely imply that _no_ card would generate TX interrupts (unless promisc?), and the comment basically invalidates that assumption. And then there will never be any confusion about any MAC address format anywhere any more, right? At probing time, you'll have to know whether the address you read from the hardware is reversed or not. Unless you write it back in reversed order when you release the card, you need a flag that at least survives until nv_probe is called the next time. kexec does not write it back, so you do need such a flag. But the flag apparently doesn't survive a suspend/resume cycle, so you need to write back the reversed address then. But the flag will survive a rmmod + modprobe cycle, so you need to reset it when writing back the reversed address. If it's indeed reversed-on-power-on only, then one probably cannot do much about it, thus I'd give up and simply check the MAC address for validity (should be very easy to check for a simple reversal by checking for the manufacturer-specific fingerprint in reverse order). Keeping _any_ sort of state to record the fact that it used to be reversed or now is reversed is futile (or rather: catastrophic) given the complexity of suspend / virtualisation or whichever other nice operations Linux may invent in the future ;) There was once a patch that checked the vendor specific address part, long ago. No idea what happened to it. But for kexec, I'd say that that is broken, too. - ifconfig eth0 hw ether 00:11:22:00:00:01 - $kexec_incantation - nv_probe sees 00:11:22:00:00:01 - doesn't match the vendor stuff - becomes 01:00:00:22:11:00 Oops, that's not quite the expected thing. No way to tell whether we have to reverse from the address alone AFAICT. Björn -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: forcedeth: MAC-address reversed on resume from suspend
On 2008.01.05 07:10:02 +0100, Björn Steinbrink wrote: - nv_probe sees 00:11:22:00:00:01 - doesn't match the vendor stuff - becomes 01:00:00:22:11:00 Oops, that's not quite the expected thing. Haha, I just noticed that that even is a multicast address, so you'd get a random MAC address in that case *lol. Björn -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: forcedeth: MAC-address reversed on resume from suspend
On 2008.01.03 01:42:09 +0200, Adrian Bunk wrote: > [ original bug report: http://lkml.org/lkml/2008/1/2/253 ] > > On Wed, Jan 02, 2008 at 10:48:43PM +0100, Andreas Mohr wrote: > > Hi, > > > > On Wed, Jan 02, 2008 at 10:04:49PM +0100, Richard Jonsson wrote: > > > Bugreport regarding forcedeth driver. > > > > > > When returning from suspend-to-RAM the MAC-address byteorder is > > > reversed. After another suspend-resume cycle the MAC-address is > > > again correct. This brings a great deal of pain since the NIC is > > > assigned a random MAC-address when it is detected as invalid. > > > > > > I cannot say if this issue appeared recently or if it's been in > > > the kernel for a while, as I've been using wireless until > > > recently. > > > > > > I read somewhere on lkml that the mac is read from the device, > > > then reversed and finally written back to the device. Can it be > > > that it is read again on resume and reversed, and then again > > > written to the device? This would explain why it's ok every other > > > resume cycle. > > > > Uh, Use The Source, Luke? > > > > But no, I think it's simply driver dainbreadness, not a matter of > > complicated writing and reading back in reversed order. > > > > drivers/net/forcedeth.c has a nice DEV_HAS_CORRECT_MACADDR flag > > which is being enabled for certain cards (those which need this) and > > disabled for others. > > > > The nv_probe() function then nicely obeys this flag by reversing the > > address if needed, _however_ there's another function, > > nv_copy_mac_to_hw(), which does _NOT_ obey it and simply copies the > > _raw_ netdev dev_addr to the card register (NvRegMacAddrA etc.). nv_copy_mac_to_hw() is called from nv_probe() _after_ the MAC address has been fixed, and from nv_set_mac_address() which is supposed to get a correct MAC address anyway. So I don't see any problem there. After some digging, I guess it's related to 5070d3408405ae1941f259acac7a9882045c3be4 That introduced a flag in a register to signal if the MAC address has been corrected, so that for example kexec won't reverse the adddress again, when calling nv_probe() again. As I know neither the suspend nor the kexec code well enough, I'll have to make a few smart guesses (let's hope that that works out *g*): a) suspend manages to reverse the MAC address again, so it must call nv_probe. And we have lost the flag that stops us from reversing the address. We cannot have lost the fixed MAC address, because then we'd always get the reversed address, and not go back and forth. I'll assume that it will also call nv_remove during suspend, because it's nicely symmetrical then :-) b) kexec does not call nv_remove, because otherwise, it would see a reversed address on its nv_probe call anyway, and the flag wouldn't be necessary. Now the commit that introduced the flag did also introduce an indirect change to nv_remove. Although it still says that it writes back the broken MAC address, that's no longer true, because orig_mac now also gets the correct address in nv_probe. Smart guessing time again: That was required because otherwise a "rmood forcedeth && modprobe forcedeth" would have produced a reversed MAC address. But that also causes the problem that we get when we loose either the flag, we start reversing the fixed address. If we instead return the card to its initial state in nv_remove, ie. unset the flag and write back the reversed address, then everyone going through nv_remove _and_ nv_probe should be happy. And kexec, which only goes through nv_probe again and doesn't loose the flag should be happy, too. Richard, could you give this a spin? And then we'd likely need someone to test that with kexec... thanks, Björn --- diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c index a96583c..f84c752 100644 --- a/drivers/net/forcedeth.c +++ b/drivers/net/forcedeth.c @@ -5199,10 +5199,6 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i dev->dev_addr[3] = (np->orig_mac[0] >> 16) & 0xff; dev->dev_addr[4] = (np->orig_mac[0] >> 8) & 0xff; dev->dev_addr[5] = (np->orig_mac[0] >> 0) & 0xff; - /* set permanent address to be correct aswell */ - np->orig_mac[0] = (dev->dev_addr[0] << 0) + (dev->dev_addr[1] << 8) + - (dev->dev_addr[2] << 16) + (dev->dev_addr[3] << 24); - np->orig_mac[1] = (dev->dev_addr[4] << 0) + (dev->dev_addr[5] << 8); writel(txreg|NVREG_TRANSMITPOLL_MAC_ADDR_REV, base + NvRegTransmitPoll); } memcpy(dev->perm_addr, dev->dev_addr, dev->addr_len); @@ -5414,6 +5410,8 @@ static void __devexit nv_remove(struct pci_dev *pci_dev) */ writel(np->orig_mac[0], base + NvRegMacAddrA); writel(np->orig_mac[1], base + NvRegMacAddrB); + writel(readl(base + NvRegTransmitPoll) & ~NVREG_TRANSMITPOLL_MAC_ADDR_REV, + base + NvRegTransmitPoll); /*
Re: Top 9 kernel oopses/warnings for the week of December 29th, 2007
On 2007.12.29 11:18:18 -0800, Linus Torvalds wrote: > > > On Sat, 29 Dec 2007, Arjan van de Ven wrote: > > > > It has been a quiet week due to the holidays, only 55 oops traces > > have been collected. > > This would be more useful if it was more readable. As it is, you seem to > have some formatting errors in your automation, where the things are > incorrectly grouped: > > > Rank 1: sysctl_head_finish > > sysctl table check failed > > Reported 7 times > > Only reported for the proprietary madwifi driver > > More info: > > http://www.kerneloops.org/search.php?search=sysctl_head_finish > > Rank 2: __ieee80211_rx > > Warning at net/mac80211/rx.c:1672 > > ie notice how that "Rank 2" got reported as if it was part of the "Rank 1" > issue. Shows up correctly here. Random guess is that your MUA treats tabs like spaces when it sees format=flowed. The lines between the different ranks weren't empty but had a single tab. And for whatever reason (I don't see any trailing spaces), Arjan's mail had a format=flowed header. AFAIK, only SPACE CRLF is valid for line-continuation with format=flowed, not TAB CRLF. So the above formatting would be the result of a pointless format=flowed header meeting a MUA that was too eager searching for line continuations. Björn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Top 9 kernel oopses/warnings for the week of December 29th, 2007
On 2007.12.29 11:18:18 -0800, Linus Torvalds wrote: On Sat, 29 Dec 2007, Arjan van de Ven wrote: It has been a quiet week due to the holidays, only 55 oops traces have been collected. This would be more useful if it was more readable. As it is, you seem to have some formatting errors in your automation, where the things are incorrectly grouped: Rank 1: sysctl_head_finish sysctl table check failed Reported 7 times Only reported for the proprietary madwifi driver More info: http://www.kerneloops.org/search.php?search=sysctl_head_finish Rank 2: __ieee80211_rx Warning at net/mac80211/rx.c:1672 ie notice how that Rank 2 got reported as if it was part of the Rank 1 issue. Shows up correctly here. Random guess is that your MUA treats tabs like spaces when it sees format=flowed. The lines between the different ranks weren't empty but had a single tab. And for whatever reason (I don't see any trailing spaces), Arjan's mail had a format=flowed header. AFAIK, only SPACE CRLF is valid for line-continuation with format=flowed, not TAB CRLF. So the above formatting would be the result of a pointless format=flowed header meeting a MUA that was too eager searching for line continuations. Björn -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Fix dirty page accounting leak with ext3 data=journal
In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0, try_to_free_buffers was changed to bail out if the page was dirty. That caused truncate_complete_page to leak massive amounts of memory, because the dirty bit was only cleared after the call to try_to_free_buffers. So the call to cancel_dirty_page was moved up to have the dirty bit cleared early in 3e67c0987d7567ad41164a153dca9a43b11d. The problem with that fix is, that the page can be redirtied after cancel_dirty_page was called, eg. like this: truncate_complete_page() cancel_dirty_page() // PG_dirty cleared, decr. dirty pages do_invalidatepage() ext3_invalidatepage() journal_invalidatepage() journal_unmap_buffer() __dispose_buffer() __journal_unfile_buffer() __journal_temp_unlink_buffer() mark_buffer_dirty(); // PG_dirty set, incr. dirty pages And then we end up with dirty pages being wrongly accounted. In ecdfc9787fe527491baefc22dce8b2dbd5b2908d the changes to try_to_free_buffers were reverted, so the original reason for the massive memory leak is gone, so we can also revert the move of the call to cancel_dirty_page from truncate_complete_page and get the accounting right again. Signed-off-by: Björn Steinbrink <[EMAIL PROTECTED]> Tested-by: Krzysztof Piotr Oledzki <[EMAIL PROTECTED]> --- I'm not sure if it matters, but opposed to the final check in __remove_from_page_cache, this one also cares about the task io accounting, so maybe we want to use this instead, although it's not quite the clean fix either. diff --git a/mm/truncate.c b/mm/truncate.c index cadc156..2974903 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -98,11 +98,11 @@ truncate_complete_page(struct address_space *mapping, struct page *page) if (page->mapping != mapping) return; - cancel_dirty_page(page, PAGE_CACHE_SIZE); - if (PagePrivate(page)) do_invalidatepage(page, 0); + cancel_dirty_page(page, PAGE_CACHE_SIZE); + remove_from_page_cache(page); ClearPageUptodate(page); ClearPageMappedToDisk(page); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Fix dirty page accounting leak with ext3 data=journal
In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0, try_to_free_buffers was changed to bail out if the page was dirty. That caused truncate_complete_page to leak massive amounts of memory, because the dirty bit was only cleared after the call to try_to_free_buffers. So the call to cancel_dirty_page was moved up to have the dirty bit cleared early in 3e67c0987d7567ad41164a153dca9a43b11d. The problem with that fix is, that the page can be redirtied after cancel_dirty_page was called, eg. like this: truncate_complete_page() cancel_dirty_page() // PG_dirty cleared, decr. dirty pages do_invalidatepage() ext3_invalidatepage() journal_invalidatepage() journal_unmap_buffer() __dispose_buffer() __journal_unfile_buffer() __journal_temp_unlink_buffer() mark_buffer_dirty(); // PG_dirty set, incr. dirty pages And then we end up with dirty pages being wrongly accounted. In ecdfc9787fe527491baefc22dce8b2dbd5b2908d the changes to try_to_free_buffers were reverted, so the original reason for the massive memory leak is gone, so we can also revert the move of the call to cancel_dirty_page from truncate_complete_page and get the accounting right again. Signed-off-by: Björn Steinbrink [EMAIL PROTECTED] Tested-by: Krzysztof Piotr Oledzki [EMAIL PROTECTED] --- I'm not sure if it matters, but opposed to the final check in __remove_from_page_cache, this one also cares about the task io accounting, so maybe we want to use this instead, although it's not quite the clean fix either. diff --git a/mm/truncate.c b/mm/truncate.c index cadc156..2974903 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -98,11 +98,11 @@ truncate_complete_page(struct address_space *mapping, struct page *page) if (page-mapping != mapping) return; - cancel_dirty_page(page, PAGE_CACHE_SIZE); - if (PagePrivate(page)) do_invalidatepage(page, 0); + cancel_dirty_page(page, PAGE_CACHE_SIZE); + remove_from_page_cache(page); ClearPageUptodate(page); ClearPageMappedToDisk(page); -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 9182] Critical memory leak (dirty pages)
On 2007.12.20 08:25:56 -0800, Linus Torvalds wrote: > > > On Thu, 20 Dec 2007, Bj?rn Steinbrink wrote: > > > > OK, so I looked for PG_dirty anyway. > > > > In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0 you made try_to_free_buffers > > bail out if the page is dirty. > > > > Then in 3e67c0987d7567ad41164a153dca9a43b11d, Andrew fixed > > truncate_complete_page, because it called cancel_dirty_page (and thus > > cleared PG_dirty) after try_to_free_buffers was called via > > do_invalidatepage. > > > > Now, if I'm not mistaken, we can end up as follows. > > > > truncate_complete_page() > > cancel_dirty_page() // PG_dirty cleared, decr. dirty pages > > do_invalidatepage() > > ext3_invalidatepage() > > journal_invalidatepage() > > journal_unmap_buffer() > > __dispose_buffer() > > __journal_unfile_buffer() > > __journal_temp_unlink_buffer() > > mark_buffer_dirty(); // PG_dirty set, incr. dirty pages > > Good, this seems to be the exact path that actually triggers it. I got to > journal_unmap_buffer(), but was too lazy to actually then bother to follow > it all the way down - I decided that I didn't actually really even care > what the low-level FS layer did, I had already convinced myself that it > obviously must be dirtying the page some way, since that matched the > symptoms exactly (ie only the journaling case was impacted, and this was > all about the journal). > > But perhaps more importantly: regardless of what the low-level filesystem > did at that point, the VM accounting shouldn't care, and should be robust > in the face of a low-level filesystem doing strange and wonderful things. > But thanks for bothering to go through the whole history and figure out > what exactly is up. Oh well, after seeing the move of cancel_dirty_page, I just went backwards from __set_page_dirty using cscope + some smart guessing and quickly ended up at ext3_invalidatepage, so it wasn't that hard :-) > > As try_to_free_buffers got its ext3 hack back in > > ecdfc9787fe527491baefc22dce8b2dbd5b2908d, maybe > > 3e67c0987d7567ad41164a153dca9a43b11d should be reverted? (Except for > > the accounting fix in cancel_dirty_page, of course). > > Yes, I think we have room for cleanups now, and I agree: we ended up > reinstating some questionable code in the VM just because we didn't really > know or understand what was going on in the ext3 journal code. Hm, you attributed more to my mail than there was actually in it. I didn't even start to think of cleanups (because I don't know jack about the whole ext3/jdb stuff, so I simply cannot come up with any cleanups (yet?)).What I meant is that we only did a half-revert of that hackery. When try_to_free_buffers started to check for PG_dirty, the cancel_dirty_page call had to be called before do_invalidatepage, to "fix" a _huge_ leak. But that caused the accouting breakage we're now seeing, because we never account for the pages that got redirtied during do_invalidatepage. Then the change to try_to_free_buffers got reverted, so we no longer need to call cancel_dirty_page before do_invalidatepage, but still we do. Thus the accounting bug remains. So what I meant to suggest was simply to actually "finish" the revert we started. Or expressed as a patch: diff --git a/mm/truncate.c b/mm/truncate.c index cadc156..2974903 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -98,11 +98,11 @@ truncate_complete_page(struct address_space *mapping, struct page *page) if (page->mapping != mapping) return; - cancel_dirty_page(page, PAGE_CACHE_SIZE); - if (PagePrivate(page)) do_invalidatepage(page, 0); + cancel_dirty_page(page, PAGE_CACHE_SIZE); + remove_from_page_cache(page); ClearPageUptodate(page); ClearPageMappedToDisk(page); I'll be the last one to comment on whether or not that causes inaccurate accouting, so I'll just watch you and Jan battle that out until someone comes up with a post-.24 patch to provide a clean fix for the issue. Krzysztof, could you give this patch a test run? If that "fixes" the problem for now, I'll try to come up with some usable commit message, or if somehow wants to beat me to it, you can already have my Signed-off-by: Björn Steinbrink <[EMAIL PROTECTED]> > > On a side note, before 8368e328dfe1c534957051333a87b3210a12743b the task > > io accounting for cancelled writes happened always happened if the page > > was dirty, regardless of page->mapping. This was also already true for > > the old test_clear_page_dirty code, and the commit log for > > 8368e328dfe1c534957051333a87b3210a12743b
Re: [Bug 9182] Critical memory leak (dirty pages)
On 2007.12.19 09:44:50 -0800, Linus Torvalds wrote: > > > On Sun, 16 Dec 2007, Krzysztof Oledzki wrote: > > > > I'll confirm this tomorrow but it seems that even switching to data=ordered > > (AFAIK default o ext3) is indeed enough to cure this problem. > > Ok, do we actually have any ext3 expert following this? I have no idea > about what the journalling code does, but I have painful memories of ext3 > doing really odd buffer-head-based IO and totally bypassing all the normal > page dirty logic. > > Judging by the symptoms (sorry for not following this well, it came up > while I was mostly away travelling), something probably *does* clear the > dirty bit on the pages, but the dirty *accounting* is not done properly, > so the kernel keeps thinking it has dirty pages. > > Now, a simple "grep" shows that ext3 does not actually do any > ClearPageDirty() or similar on its own, although maybe I missed some other > subtle way this can happen. And the *normal* VFS routines that do > ClearPageDirty should all be doing the proper accounting. > > So I see a couple of possible cases: > > - actually clearing the PG_dirty bit somehow, without doing the >accounting. > >This looks very unlikely. PG_dirty is always cleared by some variant of >"*ClearPageDirty()", and that bit definition isn't used for anything >else in the whole kernel judging by "grep" (the page allocator tests >the bit, that's it). OK, so I looked for PG_dirty anyway. In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0 you made try_to_free_buffers bail out if the page is dirty. Then in 3e67c0987d7567ad41164a153dca9a43b11d, Andrew fixed truncate_complete_page, because it called cancel_dirty_page (and thus cleared PG_dirty) after try_to_free_buffers was called via do_invalidatepage. Now, if I'm not mistaken, we can end up as follows. truncate_complete_page() cancel_dirty_page() // PG_dirty cleared, decr. dirty pages do_invalidatepage() ext3_invalidatepage() journal_invalidatepage() journal_unmap_buffer() __dispose_buffer() __journal_unfile_buffer() __journal_temp_unlink_buffer() mark_buffer_dirty(); // PG_dirty set, incr. dirty pages If journal_unmap_buffer then returns 0, try_to_free_buffers is not called and neither is cancel_dirty_page, so the dirty pages accounting is not decreased again. As try_to_free_buffers got its ext3 hack back in ecdfc9787fe527491baefc22dce8b2dbd5b2908d, maybe 3e67c0987d7567ad41164a153dca9a43b11d should be reverted? (Except for the accounting fix in cancel_dirty_page, of course). On a side note, before 8368e328dfe1c534957051333a87b3210a12743b the task io accounting for cancelled writes happened always happened if the page was dirty, regardless of page->mapping. This was also already true for the old test_clear_page_dirty code, and the commit log for 8368e328dfe1c534957051333a87b3210a12743b doesn't mention that semantic change either, so maybe the "if (account_size)" block should be moved out of the if "(mapping && ...)" block? Björn - not sending patches because he needs sleep and wouldn't have a damn clue about what to write as a commit message anyway -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 9182] Critical memory leak (dirty pages)
On 2007.12.19 09:44:50 -0800, Linus Torvalds wrote: On Sun, 16 Dec 2007, Krzysztof Oledzki wrote: I'll confirm this tomorrow but it seems that even switching to data=ordered (AFAIK default o ext3) is indeed enough to cure this problem. Ok, do we actually have any ext3 expert following this? I have no idea about what the journalling code does, but I have painful memories of ext3 doing really odd buffer-head-based IO and totally bypassing all the normal page dirty logic. Judging by the symptoms (sorry for not following this well, it came up while I was mostly away travelling), something probably *does* clear the dirty bit on the pages, but the dirty *accounting* is not done properly, so the kernel keeps thinking it has dirty pages. Now, a simple grep shows that ext3 does not actually do any ClearPageDirty() or similar on its own, although maybe I missed some other subtle way this can happen. And the *normal* VFS routines that do ClearPageDirty should all be doing the proper accounting. So I see a couple of possible cases: - actually clearing the PG_dirty bit somehow, without doing the accounting. This looks very unlikely. PG_dirty is always cleared by some variant of *ClearPageDirty(), and that bit definition isn't used for anything else in the whole kernel judging by grep (the page allocator tests the bit, that's it). OK, so I looked for PG_dirty anyway. In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0 you made try_to_free_buffers bail out if the page is dirty. Then in 3e67c0987d7567ad41164a153dca9a43b11d, Andrew fixed truncate_complete_page, because it called cancel_dirty_page (and thus cleared PG_dirty) after try_to_free_buffers was called via do_invalidatepage. Now, if I'm not mistaken, we can end up as follows. truncate_complete_page() cancel_dirty_page() // PG_dirty cleared, decr. dirty pages do_invalidatepage() ext3_invalidatepage() journal_invalidatepage() journal_unmap_buffer() __dispose_buffer() __journal_unfile_buffer() __journal_temp_unlink_buffer() mark_buffer_dirty(); // PG_dirty set, incr. dirty pages If journal_unmap_buffer then returns 0, try_to_free_buffers is not called and neither is cancel_dirty_page, so the dirty pages accounting is not decreased again. As try_to_free_buffers got its ext3 hack back in ecdfc9787fe527491baefc22dce8b2dbd5b2908d, maybe 3e67c0987d7567ad41164a153dca9a43b11d should be reverted? (Except for the accounting fix in cancel_dirty_page, of course). On a side note, before 8368e328dfe1c534957051333a87b3210a12743b the task io accounting for cancelled writes happened always happened if the page was dirty, regardless of page-mapping. This was also already true for the old test_clear_page_dirty code, and the commit log for 8368e328dfe1c534957051333a87b3210a12743b doesn't mention that semantic change either, so maybe the if (account_size) block should be moved out of the if (mapping ...) block? Björn - not sending patches because he needs sleep and wouldn't have a damn clue about what to write as a commit message anyway -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 9182] Critical memory leak (dirty pages)
On 2007.12.20 08:25:56 -0800, Linus Torvalds wrote: On Thu, 20 Dec 2007, Bj?rn Steinbrink wrote: OK, so I looked for PG_dirty anyway. In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0 you made try_to_free_buffers bail out if the page is dirty. Then in 3e67c0987d7567ad41164a153dca9a43b11d, Andrew fixed truncate_complete_page, because it called cancel_dirty_page (and thus cleared PG_dirty) after try_to_free_buffers was called via do_invalidatepage. Now, if I'm not mistaken, we can end up as follows. truncate_complete_page() cancel_dirty_page() // PG_dirty cleared, decr. dirty pages do_invalidatepage() ext3_invalidatepage() journal_invalidatepage() journal_unmap_buffer() __dispose_buffer() __journal_unfile_buffer() __journal_temp_unlink_buffer() mark_buffer_dirty(); // PG_dirty set, incr. dirty pages Good, this seems to be the exact path that actually triggers it. I got to journal_unmap_buffer(), but was too lazy to actually then bother to follow it all the way down - I decided that I didn't actually really even care what the low-level FS layer did, I had already convinced myself that it obviously must be dirtying the page some way, since that matched the symptoms exactly (ie only the journaling case was impacted, and this was all about the journal). But perhaps more importantly: regardless of what the low-level filesystem did at that point, the VM accounting shouldn't care, and should be robust in the face of a low-level filesystem doing strange and wonderful things. But thanks for bothering to go through the whole history and figure out what exactly is up. Oh well, after seeing the move of cancel_dirty_page, I just went backwards from __set_page_dirty using cscope + some smart guessing and quickly ended up at ext3_invalidatepage, so it wasn't that hard :-) As try_to_free_buffers got its ext3 hack back in ecdfc9787fe527491baefc22dce8b2dbd5b2908d, maybe 3e67c0987d7567ad41164a153dca9a43b11d should be reverted? (Except for the accounting fix in cancel_dirty_page, of course). Yes, I think we have room for cleanups now, and I agree: we ended up reinstating some questionable code in the VM just because we didn't really know or understand what was going on in the ext3 journal code. Hm, you attributed more to my mail than there was actually in it. I didn't even start to think of cleanups (because I don't know jack about the whole ext3/jdb stuff, so I simply cannot come up with any cleanups (yet?)).What I meant is that we only did a half-revert of that hackery. When try_to_free_buffers started to check for PG_dirty, the cancel_dirty_page call had to be called before do_invalidatepage, to fix a _huge_ leak. But that caused the accouting breakage we're now seeing, because we never account for the pages that got redirtied during do_invalidatepage. Then the change to try_to_free_buffers got reverted, so we no longer need to call cancel_dirty_page before do_invalidatepage, but still we do. Thus the accounting bug remains. So what I meant to suggest was simply to actually finish the revert we started. Or expressed as a patch: diff --git a/mm/truncate.c b/mm/truncate.c index cadc156..2974903 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -98,11 +98,11 @@ truncate_complete_page(struct address_space *mapping, struct page *page) if (page-mapping != mapping) return; - cancel_dirty_page(page, PAGE_CACHE_SIZE); - if (PagePrivate(page)) do_invalidatepage(page, 0); + cancel_dirty_page(page, PAGE_CACHE_SIZE); + remove_from_page_cache(page); ClearPageUptodate(page); ClearPageMappedToDisk(page); I'll be the last one to comment on whether or not that causes inaccurate accouting, so I'll just watch you and Jan battle that out until someone comes up with a post-.24 patch to provide a clean fix for the issue. Krzysztof, could you give this patch a test run? If that fixes the problem for now, I'll try to come up with some usable commit message, or if somehow wants to beat me to it, you can already have my Signed-off-by: Björn Steinbrink [EMAIL PROTECTED] On a side note, before 8368e328dfe1c534957051333a87b3210a12743b the task io accounting for cancelled writes happened always happened if the page was dirty, regardless of page-mapping. This was also already true for the old test_clear_page_dirty code, and the commit log for 8368e328dfe1c534957051333a87b3210a12743b doesn't mention that semantic change either, so maybe the if (account_size) block should be moved out of the if (mapping ...) block? I think the if (account_size) thing was *purely* for me being worried about hugetlb entries, and I think that's the only thing that passes in a zero account size. But hugetlbfs already has BDI_CAP_NO_ACCT_DIRTY set (exactly because we cannot account
Re: Major regression on hackbench with SLUB
On 2007.12.08 17:16:24 -0500, Steven Rostedt wrote: > > Hi Linus, > > > On Fri, 7 Dec 2007, Linus Torvalds wrote: > > > > > > Can you do one run with oprofile, and see exactly where the cost is? It > > > should hopefully be pretty darn obvious, considering your timing. > > The results are here: > > http://people.redhat.com/srostedt/slub/results/slab.op > http://people.redhat.com/srostedt/slub/results/slub.op Hm, you seem to be hitting the "another_slab" stuff in __slab_alloc alot. I wonder if !node_match triggers too often. We always start with the per cpu slab, if that one is on the wrong node, you'll always hit that "another_slab" path. After searching for way too long (given that I have no clue about that stuff anyway and just read the code out of curiousness), I noticed that the the cpu_to_node stuff on x86_64 seems to be initialized to 0xff (arch/x86/mm/numa_64.c), and Google brought me this dmesg output [1], which, AFAICT, shows that the per cpu slab setup is done _before_ cpu_to_node is correctly setup. That would lead to the per cpu slabs all having node == 0xff, which looks pretty bad. Disclaimer: I read the slub/numa/$WHATEVER_I_SAW_THERE for the first time, so this might be total bull ;-) Björn [1] http://linux.derkeiler.com/Mailing-Lists/Kernel/2007-10/msg04648.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Major regression on hackbench with SLUB
On 2007.12.08 17:16:24 -0500, Steven Rostedt wrote: Hi Linus, On Fri, 7 Dec 2007, Linus Torvalds wrote: Can you do one run with oprofile, and see exactly where the cost is? It should hopefully be pretty darn obvious, considering your timing. The results are here: http://people.redhat.com/srostedt/slub/results/slab.op http://people.redhat.com/srostedt/slub/results/slub.op Hm, you seem to be hitting the another_slab stuff in __slab_alloc alot. I wonder if !node_match triggers too often. We always start with the per cpu slab, if that one is on the wrong node, you'll always hit that another_slab path. After searching for way too long (given that I have no clue about that stuff anyway and just read the code out of curiousness), I noticed that the the cpu_to_node stuff on x86_64 seems to be initialized to 0xff (arch/x86/mm/numa_64.c), and Google brought me this dmesg output [1], which, AFAICT, shows that the per cpu slab setup is done _before_ cpu_to_node is correctly setup. That would lead to the per cpu slabs all having node == 0xff, which looks pretty bad. Disclaimer: I read the slub/numa/$WHATEVER_I_SAW_THERE for the first time, so this might be total bull ;-) Björn [1] http://linux.derkeiler.com/Mailing-Lists/Kernel/2007-10/msg04648.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: git guidance
On 2007.12.07 13:53:11 +0300, Al Boldi wrote: > Andreas Ericsson wrote: > > So, to get to the bottom of this, which of the following workflows is it > > you want git to support? > > > > ### WORKFLOW A ### > > edit, edit, edit > > edit, edit, edit > > edit, edit, edit > > Oops I made a mistake and need to hop back to "current - 12". > > edit, edit, edit > > edit, edit, edit > > publish everything, similar to just tarring up your workdir and sending > > out ### END WORKFLOW A ### > > > > ### WORKFLOW B ### > > edit, edit, edit > > ok this looks good, I want to save a checkpoint here > > edit, edit, edit > > looks good again. next checkpoint > > edit, edit, edit > > oh crap, back to checkpoint 2 > > edit, edit, edit > > ooh, that's better. save a checkpoint and publish those checkpoints > > ### END WORKFLOW B ### > > ### WORKFLOW C ### > for every save on a gitfs mounted dir, do an implied checkpoint, commit, or > publish (should be adjustable), on its privately created on-the-fly > repository. > ### END WORKFLOW C ### > > For example: > > echo "// last comment on this file" >> /gitfs.mounted/file > > should do an implied checkpoint, and make these checkpoints immediately > visible under some checkpoint branch of the gitfs mounted dir. > > Note, this way the developer gets version control without even noticing, and > works completely transparent to any kind of application. Ouch... That looks worse than "plain" per-file versioning. Not only do you per definition get "broken" commits if there's a change that affects two dependent files, you also get an insane amount of commits just for testing stuff, or fixing bugs. And unless you use some kind of union-fs on top (or keep ignored files in special unversioned area in your gitfs, which seems somewhat ugly), you'll probably also have to track lots of files in the working directory that are generated, unless you want to re-generate them after each reboot. And that leads to even more absolutely useless revisions. Just thinking of my vim .swp files (which I definitely don't want to loose on a crash/power outtage/pkill -9 . dammit) makes me scream because of the gazillion of commits they will produce (and no, I don't want them in some special out of tree directory). Plus, I have vim setup to _replace_ files on write, so that I can more easily use hard-linked copies with changing all copies at once _unless_ I explicitly want to, meaning that I'd get full remove/add commits, which are absolutely useless. And trying to detect such patterns (rename, then write the changed file with the old name and then delete the renamed file) is probably not worth the trouble, because you coincidently might _want_ to have just these three steps recorded when you happen to perform them manually. And if you go for heuristics, you'll complain each time you get a false-positive/negative. That said, out of pure curiousness I came up with the attached script which just uses inotifywait to watch a directory and issue git commands on certain events. It is extremely stupid, but seems to work. And at least it hasn't got the drawbacks of a real gitfs regarding the need to have a "separate" non-versioned storage area for the working directory, because it simply uses the existing working directory wherever that might be stored. It doesn't use GIT_DIR/WORK_DIR yet, but hey, should be easy to add... Feel free to mess with that thing, hey, maybe you even like it and extend it to match your proposed workflow even more. I for sure won't use or even extend it, so you're likely on your own there. Side-note: Writing that script probably took less time than writing this email and probably less time than was wasted on this topic. Makes me want to use today's preferred "Code talks, b...s... walks" statement, but I'll refrain from that... Just because I lack the credibility to say that, and the script attached is quite crappy ;-) Björn #!/bin/bash inotifywait -m -r --exclude ^\./\.git/.* -e close_write -e move -e create -e delete . 2>/dev/null | while read FILE_PATH EVENT FILE_NAME do FILE_NAME="$FILE_PATH$FILE_NAME" FILE_NAME=${FILE_NAME#./} # git doesn't care about directories if [ -d "$FILE_NAME" ] then continue fi case "$EVENT" in *CLOSE_WRITE*) ACTION=change ;; *MOVED_TO*) ACTION=create ;; *MODIFY*) ACTION=change ;; *DELETE*) ACTION=delete ;; *MOVED_FROM*) ACTION=delete ;; *CREATE*) ACTION=create ;; *) continue ;; esac case $ACTION in create) git add "$FILE_NAME" git commit -m "$FILE_NAME created" ;;
Re: git guidance
On 2007.12.07 13:53:11 +0300, Al Boldi wrote: Andreas Ericsson wrote: So, to get to the bottom of this, which of the following workflows is it you want git to support? ### WORKFLOW A ### edit, edit, edit edit, edit, edit edit, edit, edit Oops I made a mistake and need to hop back to current - 12. edit, edit, edit edit, edit, edit publish everything, similar to just tarring up your workdir and sending out ### END WORKFLOW A ### ### WORKFLOW B ### edit, edit, edit ok this looks good, I want to save a checkpoint here edit, edit, edit looks good again. next checkpoint edit, edit, edit oh crap, back to checkpoint 2 edit, edit, edit ooh, that's better. save a checkpoint and publish those checkpoints ### END WORKFLOW B ### ### WORKFLOW C ### for every save on a gitfs mounted dir, do an implied checkpoint, commit, or publish (should be adjustable), on its privately created on-the-fly repository. ### END WORKFLOW C ### For example: echo // last comment on this file /gitfs.mounted/file should do an implied checkpoint, and make these checkpoints immediately visible under some checkpoint branch of the gitfs mounted dir. Note, this way the developer gets version control without even noticing, and works completely transparent to any kind of application. Ouch... That looks worse than plain per-file versioning. Not only do you per definition get broken commits if there's a change that affects two dependent files, you also get an insane amount of commits just for testing stuff, or fixing bugs. And unless you use some kind of union-fs on top (or keep ignored files in special unversioned area in your gitfs, which seems somewhat ugly), you'll probably also have to track lots of files in the working directory that are generated, unless you want to re-generate them after each reboot. And that leads to even more absolutely useless revisions. Just thinking of my vim .swp files (which I definitely don't want to loose on a crash/power outtage/pkill -9 .ENTER dammit) makes me scream because of the gazillion of commits they will produce (and no, I don't want them in some special out of tree directory). Plus, I have vim setup to _replace_ files on write, so that I can more easily use hard-linked copies with changing all copies at once _unless_ I explicitly want to, meaning that I'd get full remove/add commits, which are absolutely useless. And trying to detect such patterns (rename, then write the changed file with the old name and then delete the renamed file) is probably not worth the trouble, because you coincidently might _want_ to have just these three steps recorded when you happen to perform them manually. And if you go for heuristics, you'll complain each time you get a false-positive/negative. That said, out of pure curiousness I came up with the attached script which just uses inotifywait to watch a directory and issue git commands on certain events. It is extremely stupid, but seems to work. And at least it hasn't got the drawbacks of a real gitfs regarding the need to have a separate non-versioned storage area for the working directory, because it simply uses the existing working directory wherever that might be stored. It doesn't use GIT_DIR/WORK_DIR yet, but hey, should be easy to add... Feel free to mess with that thing, hey, maybe you even like it and extend it to match your proposed workflow even more. I for sure won't use or even extend it, so you're likely on your own there. Side-note: Writing that script probably took less time than writing this email and probably less time than was wasted on this topic. Makes me want to use today's preferred Code talks, b...s... walks statement, but I'll refrain from that... Just because I lack the credibility to say that, and the script attached is quite crappy ;-) Björn #!/bin/bash inotifywait -m -r --exclude ^\./\.git/.* -e close_write -e move -e create -e delete . 2/dev/null | while read FILE_PATH EVENT FILE_NAME do FILE_NAME=$FILE_PATH$FILE_NAME FILE_NAME=${FILE_NAME#./} # git doesn't care about directories if [ -d $FILE_NAME ] then continue fi case $EVENT in *CLOSE_WRITE*) ACTION=change ;; *MOVED_TO*) ACTION=create ;; *MODIFY*) ACTION=change ;; *DELETE*) ACTION=delete ;; *MOVED_FROM*) ACTION=delete ;; *CREATE*) ACTION=create ;; *) continue ;; esac case $ACTION in create) git add $FILE_NAME git commit -m $FILE_NAME created ;; delete) git rm --cached $FILE_NAME
Re: [PATCH] fix adbhid mismerge
On 2007.10.17 18:18:21 +0200, Björn Steinbrink wrote: > On 2007.10.16 19:21:53 -0700, Linus Torvalds wrote: > > > > > > On Tue, 16 Oct 2007, Linus Torvalds wrote: > > > > > > I don't think you did anything wrong. You used both --full-history > > > (implicitly: git-whatchanged) and you made sure to see the diffs for both > > > sides of any merge (-m), and that means that you should see every single > > > diff involved. > > > > Btw, if anybody can come up with a better way to find these kinds of > > mis-merges, I'd love to hear about it. > > > [...] > > > > What I'd actually really like would be something that shows the original > > conflict, but that's really expensive to compute (it basically involves > > re-doing the merge from scratch - finding the proper base commit(s) etc). > > So we never did that. > > So here's what I came up with: > > git grep -l "int keycode, up_flag" \ > $(git-rev-list HEAD --parents -- drivers/macintosh/adbhid.c | \ > egrep '(.{40} ?){3}' | cut -d' ' -f1) \ > -- drivers/macintosh/adbhid.c | grep -o '^[^:]*' > > Which gives: b981d8b3f5e008ff10d993be633ad00564fc22cd > > Then: > git checkout b981d8b3f5e008ff10d993be633ad00564fc22cd^1 > git merge b981d8b3f5e008ff10d993be633ad00564fc22cd^2 > > And you got your merge conflict. > > The idea is, that the above ugliness searches for the last commit that Oops! Obviously I meant to do s/last commit/merge commit/ before sending that email. > produced the bad line. The inner git-rev-list call searches for merge > commits (thanks to Ilari in #git for the egrep trick), then git-grep > looks which of these have the "bad line" and the final grep just filters > the filename out. > > If the bash thing spits out more than one commit hash, you probably want > to use the last one... I guess... And if the given result doesn't > produce the request merge conflict, well, I guess you could replace HEAD > in the git-rev-list call with the sha1 you got in the first run, but I'm > not entirely sure about that. > > Is that helpful? > > Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] fix adbhid mismerge
On 2007.10.16 19:21:53 -0700, Linus Torvalds wrote: > > > On Tue, 16 Oct 2007, Linus Torvalds wrote: > > > > I don't think you did anything wrong. You used both --full-history > > (implicitly: git-whatchanged) and you made sure to see the diffs for both > > sides of any merge (-m), and that means that you should see every single > > diff involved. > > Btw, if anybody can come up with a better way to find these kinds of > mis-merges, I'd love to hear about it. > [...] > > What I'd actually really like would be something that shows the original > conflict, but that's really expensive to compute (it basically involves > re-doing the merge from scratch - finding the proper base commit(s) etc). > So we never did that. So here's what I came up with: git grep -l "int keycode, up_flag" \ $(git-rev-list HEAD --parents -- drivers/macintosh/adbhid.c | \ egrep '(.{40} ?){3}' | cut -d' ' -f1) \ -- drivers/macintosh/adbhid.c | grep -o '^[^:]*' Which gives: b981d8b3f5e008ff10d993be633ad00564fc22cd Then: git checkout b981d8b3f5e008ff10d993be633ad00564fc22cd^1 git merge b981d8b3f5e008ff10d993be633ad00564fc22cd^2 And you got your merge conflict. The idea is, that the above ugliness searches for the last commit that produced the bad line. The inner git-rev-list call searches for merge commits (thanks to Ilari in #git for the egrep trick), then git-grep looks which of these have the "bad line" and the final grep just filters the filename out. If the bash thing spits out more than one commit hash, you probably want to use the last one... I guess... And if the given result doesn't produce the request merge conflict, well, I guess you could replace HEAD in the git-rev-list call with the sha1 you got in the first run, but I'm not entirely sure about that. Is that helpful? Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: arcmsr changelog differs from diffs
Hi Hans, On 2007.10.17 12:41:11 +0200, Hans-Peter Jansen wrote: > Dear Björn, > > Am Mittwoch, 17. Oktober 2007 02:53 schrieb Björn Steinbrink: > > And then, if I may guess, James probably just noticed that there were > > changes left and commited them (while they were now down to just the > > whitespace change), without checking what changes were actually left (no > > offense intended). At least I think that git wouldn't cripple the diff > > if the changes that James checked in were not already whitespace-only at > > the time he commited them, and the git history of his tree seems to > > agree. > > (*) Thanks for the clarification, Björn. What made me stumbling was the > different orginators, while Jeff isn't known with catching attention by > assimilating other peoples contributions. I'm just guessing here, and certainly don't want to blame anyone, but Jeff's original patch was from July, while the only date I can find for Nick's commit is September 13th. So it was probably Nick who integrated Jeff's changes into his commit (maybe thinking that Jeff's patch got lost), not Jeff assimilating a subset of Nick's changes. Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] fix adbhid mismerge
On 2007.10.17 18:18:21 +0200, Björn Steinbrink wrote: On 2007.10.16 19:21:53 -0700, Linus Torvalds wrote: On Tue, 16 Oct 2007, Linus Torvalds wrote: I don't think you did anything wrong. You used both --full-history (implicitly: git-whatchanged) and you made sure to see the diffs for both sides of any merge (-m), and that means that you should see every single diff involved. Btw, if anybody can come up with a better way to find these kinds of mis-merges, I'd love to hear about it. [...] What I'd actually really like would be something that shows the original conflict, but that's really expensive to compute (it basically involves re-doing the merge from scratch - finding the proper base commit(s) etc). So we never did that. So here's what I came up with: git grep -l int keycode, up_flag \ $(git-rev-list HEAD --parents -- drivers/macintosh/adbhid.c | \ egrep '(.{40} ?){3}' | cut -d' ' -f1) \ -- drivers/macintosh/adbhid.c | grep -o '^[^:]*' Which gives: b981d8b3f5e008ff10d993be633ad00564fc22cd Then: git checkout b981d8b3f5e008ff10d993be633ad00564fc22cd^1 git merge b981d8b3f5e008ff10d993be633ad00564fc22cd^2 And you got your merge conflict. The idea is, that the above ugliness searches for the last commit that Oops! Obviously I meant to do s/last commit/merge commit/ before sending that email. produced the bad line. The inner git-rev-list call searches for merge commits (thanks to Ilari in #git for the egrep trick), then git-grep looks which of these have the bad line and the final grep just filters the filename out. If the bash thing spits out more than one commit hash, you probably want to use the last one... I guess... And if the given result doesn't produce the request merge conflict, well, I guess you could replace HEAD in the git-rev-list call with the sha1 you got in the first run, but I'm not entirely sure about that. Is that helpful? Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: arcmsr changelog differs from diffs
Hi Hans, On 2007.10.17 12:41:11 +0200, Hans-Peter Jansen wrote: Dear Björn, Am Mittwoch, 17. Oktober 2007 02:53 schrieb Björn Steinbrink: And then, if I may guess, James probably just noticed that there were changes left and commited them (while they were now down to just the whitespace change), without checking what changes were actually left (no offense intended). At least I think that git wouldn't cripple the diff if the changes that James checked in were not already whitespace-only at the time he commited them, and the git history of his tree seems to agree. (*) Thanks for the clarification, Björn. What made me stumbling was the different orginators, while Jeff isn't known with catching attention by assimilating other peoples contributions. I'm just guessing here, and certainly don't want to blame anyone, but Jeff's original patch was from July, while the only date I can find for Nick's commit is September 13th. So it was probably Nick who integrated Jeff's changes into his commit (maybe thinking that Jeff's patch got lost), not Jeff assimilating a subset of Nick's changes. Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] fix adbhid mismerge
On 2007.10.16 19:21:53 -0700, Linus Torvalds wrote: On Tue, 16 Oct 2007, Linus Torvalds wrote: I don't think you did anything wrong. You used both --full-history (implicitly: git-whatchanged) and you made sure to see the diffs for both sides of any merge (-m), and that means that you should see every single diff involved. Btw, if anybody can come up with a better way to find these kinds of mis-merges, I'd love to hear about it. [...] What I'd actually really like would be something that shows the original conflict, but that's really expensive to compute (it basically involves re-doing the merge from scratch - finding the proper base commit(s) etc). So we never did that. So here's what I came up with: git grep -l int keycode, up_flag \ $(git-rev-list HEAD --parents -- drivers/macintosh/adbhid.c | \ egrep '(.{40} ?){3}' | cut -d' ' -f1) \ -- drivers/macintosh/adbhid.c | grep -o '^[^:]*' Which gives: b981d8b3f5e008ff10d993be633ad00564fc22cd Then: git checkout b981d8b3f5e008ff10d993be633ad00564fc22cd^1 git merge b981d8b3f5e008ff10d993be633ad00564fc22cd^2 And you got your merge conflict. The idea is, that the above ugliness searches for the last commit that produced the bad line. The inner git-rev-list call searches for merge commits (thanks to Ilari in #git for the egrep trick), then git-grep looks which of these have the bad line and the final grep just filters the filename out. If the bash thing spits out more than one commit hash, you probably want to use the last one... I guess... And if the given result doesn't produce the request merge conflict, well, I guess you could replace HEAD in the git-rev-list call with the sha1 you got in the first run, but I'm not entirely sure about that. Is that helpful? Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: arcmsr changelog differs from diffs
On 2007.10.17 00:07:19 +0200, Hans-Peter Jansen wrote: > Hi Jeff, > > while browsing through Linus' current check ins, I stumbled upon: > > [SCSI] arcmsr: irq handler fixes, cleanups, micro-opts: > > --8<-- > 488a5c8a9a3b67ae117784cd0d73bef53a73d57d > drivers/scsi/arcmsr/arcmsr_hba.c |2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c > b/drivers/scsi/arcmsr/arcmsr_hba.c > index 7832a10..f4d2d52 100644 > --- a/drivers/scsi/arcmsr/arcmsr_hba.c > +++ b/drivers/scsi/arcmsr/arcmsr_hba.c > @@ -422,7 +422,7 @@ static int arcmsr_probe(struct pci_dev *pdev, > goto out_release_regions; > > error = request_irq(pdev->irq, arcmsr_do_interrupt, > - IRQF_SHARED, "arcmsr", acb); > + IRQF_SHARED, "arcmsr", acb); > if (error) > goto out_free_ccb_pool; > > -->8-- > > and: [SCSI] arcmsr: Fix hardware wait loops > > --8<-- > 24430458bb924e371ff894e26bfa9f73707f53fb > drivers/scsi/arcmsr/arcmsr_hba.c |2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c > b/drivers/scsi/arcmsr/arcmsr_hba.c > index 50e1310..7832a10 100644 > --- a/drivers/scsi/arcmsr/arcmsr_hba.c > +++ b/drivers/scsi/arcmsr/arcmsr_hba.c > @@ -2092,8 +2092,10 @@ static void arcmsr_iop_reset(struct > AdapterControlBlock *acb) > if (atomic_read(>ccboutstandingcount) != 0) { > /* talk to iop 331 outstanding command aborted */ > arcmsr_abort_allcmd(acb); > + > /* wait for 3 sec for all command aborted*/ > ssleep(3); > + > /* disable all outbound interrupt */ > intmask_org = arcmsr_disable_outbound_ints(acb); > /* clear all outbound posted Q */ > -->8-- > > where both changelogs differ significantly from the actual diffs, which both > are simple WS fixups and nothing else. Does qgit fools me here, or is > anything else wrong on my side? Nothing wrong on your side. I took a look at the second one, and everything but the whitespace changes already found its way into Linus' tree via 1a4f550a09f89e3a15eff1971bc9db977571b9f6. One hunk of the original patch[1] was actually made redundant because the code was removed in that commit, so that's probably what James fixed (see full commit message). And then, if I may guess, James probably just noticed that there were changes left and commited them (while they were now down to just the whitespace change), without checking what changes were actually left (no offense intended). At least I think that git wouldn't cripple the diff if the changes that James checked in were not already whitespace-only at the time he commited them, and the git history of his tree seems to agree. Probably the other commit is similar. Björn [1] http://linux.derkeiler.com/Mailing-Lists/Kernel/2007-07/msg11957.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: arcmsr changelog differs from diffs
On 2007.10.17 00:07:19 +0200, Hans-Peter Jansen wrote: Hi Jeff, while browsing through Linus' current check ins, I stumbled upon: [SCSI] arcmsr: irq handler fixes, cleanups, micro-opts: --8-- 488a5c8a9a3b67ae117784cd0d73bef53a73d57d drivers/scsi/arcmsr/arcmsr_hba.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c b/drivers/scsi/arcmsr/arcmsr_hba.c index 7832a10..f4d2d52 100644 --- a/drivers/scsi/arcmsr/arcmsr_hba.c +++ b/drivers/scsi/arcmsr/arcmsr_hba.c @@ -422,7 +422,7 @@ static int arcmsr_probe(struct pci_dev *pdev, goto out_release_regions; error = request_irq(pdev-irq, arcmsr_do_interrupt, - IRQF_SHARED, arcmsr, acb); + IRQF_SHARED, arcmsr, acb); if (error) goto out_free_ccb_pool; --8-- and: [SCSI] arcmsr: Fix hardware wait loops --8-- 24430458bb924e371ff894e26bfa9f73707f53fb drivers/scsi/arcmsr/arcmsr_hba.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c b/drivers/scsi/arcmsr/arcmsr_hba.c index 50e1310..7832a10 100644 --- a/drivers/scsi/arcmsr/arcmsr_hba.c +++ b/drivers/scsi/arcmsr/arcmsr_hba.c @@ -2092,8 +2092,10 @@ static void arcmsr_iop_reset(struct AdapterControlBlock *acb) if (atomic_read(acb-ccboutstandingcount) != 0) { /* talk to iop 331 outstanding command aborted */ arcmsr_abort_allcmd(acb); + /* wait for 3 sec for all command aborted*/ ssleep(3); + /* disable all outbound interrupt */ intmask_org = arcmsr_disable_outbound_ints(acb); /* clear all outbound posted Q */ --8-- where both changelogs differ significantly from the actual diffs, which both are simple WS fixups and nothing else. Does qgit fools me here, or is anything else wrong on my side? Nothing wrong on your side. I took a look at the second one, and everything but the whitespace changes already found its way into Linus' tree via 1a4f550a09f89e3a15eff1971bc9db977571b9f6. One hunk of the original patch[1] was actually made redundant because the code was removed in that commit, so that's probably what James fixed (see full commit message). And then, if I may guess, James probably just noticed that there were changes left and commited them (while they were now down to just the whitespace change), without checking what changes were actually left (no offense intended). At least I think that git wouldn't cripple the diff if the changes that James checked in were not already whitespace-only at the time he commited them, and the git history of his tree seems to agree. Probably the other commit is similar. Björn [1] http://linux.derkeiler.com/Mailing-Lists/Kernel/2007-07/msg11957.html - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: what is the rationale for "TAINT_USER"?
On 2007.10.12 08:37:04 -0400, Robert P. J. Day wrote: > On Fri, 12 Oct 2007, Björn Steinbrink wrote: > > > On 2007.10.12 08:04:20 -0400, Robert P. J. Day wrote: > > > > > > i can see what the theoretical purpose for it is here: > > > > > > http://kerneltrap.org/node/6656 > > > > > > but it's not clear how it can possibly be set from userland given > > > that: > > > > > > $ grep -r TAINT_USER * > > > include/linux/kernel.h:#define TAINT_USER (1<<6) > > > kernel/panic.c: tainted & TAINT_USER ? 'U' : ' ', > > > $ > > > > > > am i missing something screamingly obvious? > > > > Grepping for "tainted" leads me to: > > > > echo 32 > /proc/sys/kernel/tainted > > ???. i have no idea what you were grepping through to find that > phrase, but TAINT_USER would seem to be equivalent to echo 64, not > echo 32, anyway, no? Oops, yeah, 64... And it didn't lead me to the exact phrase, but the sysctl handler, which (implicitly) led me to that command. Sorry for the confusion. Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: what is the rationale for "TAINT_USER"?
On 2007.10.12 08:04:20 -0400, Robert P. J. Day wrote: > > i can see what the theoretical purpose for it is here: > > http://kerneltrap.org/node/6656 > > but it's not clear how it can possibly be set from userland given > that: > > $ grep -r TAINT_USER * > include/linux/kernel.h:#define TAINT_USER (1<<6) > kernel/panic.c: tainted & TAINT_USER ? 'U' : ' ', > $ > > am i missing something screamingly obvious? Grepping for "tainted" leads me to: echo 32 > /proc/sys/kernel/tainted Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: NMI watchdog
On 2007.10.12 11:18:24 +0200, John Sigler wrote: > Hello, > > I'm experiencing a full system lockup. I'm using an out-of-tree driver > which I suspect is responsible. I'm trying to enable the NMI watchdog. > > # cat /proc/version > Linux version 2.6.22.1-rt9 (gcc version 3.4.6) #1 PREEMPT RT Tue Oct 9 > 12:25:47 CEST 2007 > > # cat /proc/cmdline > ro root=/dev/hdc1 console=ttyS0,57600n8 console=tty0 panic=3 apic=debug > nmi_watchdog=2 > > However, after boot, the NMI count does not change. > > # cat /proc/interrupts ; sleep 10 ; cat /proc/interrupts Try running some cpu hog in the background. The performance counters get increased only when the CPU is actually doing something. On a mostly idle system, it can take quite a while for the next NMI to show up. Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: what is the rationale for TAINT_USER?
On 2007.10.12 08:37:04 -0400, Robert P. J. Day wrote: On Fri, 12 Oct 2007, Björn Steinbrink wrote: On 2007.10.12 08:04:20 -0400, Robert P. J. Day wrote: i can see what the theoretical purpose for it is here: http://kerneltrap.org/node/6656 but it's not clear how it can possibly be set from userland given that: $ grep -r TAINT_USER * include/linux/kernel.h:#define TAINT_USER (16) kernel/panic.c: tainted TAINT_USER ? 'U' : ' ', $ am i missing something screamingly obvious? Grepping for tainted leads me to: echo 32 /proc/sys/kernel/tainted ???. i have no idea what you were grepping through to find that phrase, but TAINT_USER would seem to be equivalent to echo 64, not echo 32, anyway, no? Oops, yeah, 64... And it didn't lead me to the exact phrase, but the sysctl handler, which (implicitly) led me to that command. Sorry for the confusion. Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: NMI watchdog
On 2007.10.12 11:18:24 +0200, John Sigler wrote: Hello, I'm experiencing a full system lockup. I'm using an out-of-tree driver which I suspect is responsible. I'm trying to enable the NMI watchdog. # cat /proc/version Linux version 2.6.22.1-rt9 (gcc version 3.4.6) #1 PREEMPT RT Tue Oct 9 12:25:47 CEST 2007 # cat /proc/cmdline ro root=/dev/hdc1 console=ttyS0,57600n8 console=tty0 panic=3 apic=debug nmi_watchdog=2 However, after boot, the NMI count does not change. # cat /proc/interrupts ; sleep 10 ; cat /proc/interrupts Try running some cpu hog in the background. The performance counters get increased only when the CPU is actually doing something. On a mostly idle system, it can take quite a while for the next NMI to show up. Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: what is the rationale for TAINT_USER?
On 2007.10.12 08:04:20 -0400, Robert P. J. Day wrote: i can see what the theoretical purpose for it is here: http://kerneltrap.org/node/6656 but it's not clear how it can possibly be set from userland given that: $ grep -r TAINT_USER * include/linux/kernel.h:#define TAINT_USER (16) kernel/panic.c: tainted TAINT_USER ? 'U' : ' ', $ am i missing something screamingly obvious? Grepping for tainted leads me to: echo 32 /proc/sys/kernel/tainted Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: time_after - what on earth???
On 2007.09.12 00:10:19 +0100, Adrian McMenamin wrote: > On 12/09/2007, Björn Steinbrink <[EMAIL PROTECTED]> wrote: > > On 2007.09.12 00:19:09 +0200, Rene Herman wrote: > > > On 09/12/2007 12:15 AM, Adrian McMenamin wrote: > > > > > >> On 11/09/2007, Rene Herman <[EMAIL PROTECTED]> wrote: > > >>> On 09/12/2007 12:05 AM, Adrian McMenamin wrote: > > >>> > > >>>> OK, why does this line occasionally return true: > > > > What exactly is "occassionally"? Does it happen more than once per > > boot? If not, and it happens after a certain time after booting, it > > might be wrapping of the jiffie counter (see below). > > > > >>>> > > >>>> if ((maple_dev->interval > 0) && (jiffies >maple_dev->when)) > > >>>> > > >>>> while this one never does (no other changes made): > > >>>> > > >>>> if ((maple_dev->interval > 0) && (time_after(jiffies, > > >>>> maple_dev->when))) > > >>> Is maple_dev->when an unsigned long? > > >>> > > >> Yes. Does that make a difference? > > > > > > If it had been a signed type, it could've wrapped to something you didn't > > > expect, explaining the difference at least... > > > > > > With an unsigned long, the only diference should be that time_after() > > > deals > > > with jiffie wrapping which I assume is not an actual problem here. I'll > > > retreat into the shades again... ;-( > > > > If "occasionally" is limited to once per boot, it might be jiffie > > wrapping. IIRC jiffies are initialized so that they wrap after about 5 > > minutes of uptime to reveal such bugs without forcing you to wait for > > ages just to have the counter wrap for the first time. > > > > No, I mean "works properly" - ie occasionally evaluates as true Ehrm, yeah, I somehow parsed that as if it had a negation in there. Anyway, I looked up the patches you posted. "when" is initialized to 0 and only changed if the above condition evaluates to true. Now, time_after and "<" have different points at which "future" and "past" are separated. time_after splits (about) equally between future and past, so 0 can be either, depending on the value of jiffies. But for "<" 0 is almost always in the past, except for the seldom event of jiffies being 0. Now, given that jiffies start out at a huge value to make the counter wrap around early, 0 happens to be in the "future" for time_after, until the wrap around occurs. So in your case, you just might have to wait those 5 minutes to get the working behaviour, instead of the common case in which it breaks after that time ;-) A fix would likely initialize "when" to jiffies. Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: time_after - what on earth???
On 2007.09.12 00:19:09 +0200, Rene Herman wrote: > On 09/12/2007 12:15 AM, Adrian McMenamin wrote: > >> On 11/09/2007, Rene Herman <[EMAIL PROTECTED]> wrote: >>> On 09/12/2007 12:05 AM, Adrian McMenamin wrote: >>> OK, why does this line occasionally return true: What exactly is "occassionally"? Does it happen more than once per boot? If not, and it happens after a certain time after booting, it might be wrapping of the jiffie counter (see below). if ((maple_dev->interval > 0) && (jiffies >maple_dev->when)) while this one never does (no other changes made): if ((maple_dev->interval > 0) && (time_after(jiffies, maple_dev->when))) >>> Is maple_dev->when an unsigned long? >>> >> Yes. Does that make a difference? > > If it had been a signed type, it could've wrapped to something you didn't > expect, explaining the difference at least... > > With an unsigned long, the only diference should be that time_after() deals > with jiffie wrapping which I assume is not an actual problem here. I'll > retreat into the shades again... ;-( If "occasionally" is limited to once per boot, it might be jiffie wrapping. IIRC jiffies are initialized so that they wrap after about 5 minutes of uptime to reveal such bugs without forcing you to wait for ages just to have the counter wrap for the first time. Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: time_after - what on earth???
On 2007.09.12 00:19:09 +0200, Rene Herman wrote: On 09/12/2007 12:15 AM, Adrian McMenamin wrote: On 11/09/2007, Rene Herman [EMAIL PROTECTED] wrote: On 09/12/2007 12:05 AM, Adrian McMenamin wrote: OK, why does this line occasionally return true: What exactly is occassionally? Does it happen more than once per boot? If not, and it happens after a certain time after booting, it might be wrapping of the jiffie counter (see below). if ((maple_dev-interval 0) (jiffies maple_dev-when)) while this one never does (no other changes made): if ((maple_dev-interval 0) (time_after(jiffies, maple_dev-when))) Is maple_dev-when an unsigned long? Yes. Does that make a difference? If it had been a signed type, it could've wrapped to something you didn't expect, explaining the difference at least... With an unsigned long, the only diference should be that time_after() deals with jiffie wrapping which I assume is not an actual problem here. I'll retreat into the shades again... ;-( If occasionally is limited to once per boot, it might be jiffie wrapping. IIRC jiffies are initialized so that they wrap after about 5 minutes of uptime to reveal such bugs without forcing you to wait for ages just to have the counter wrap for the first time. Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: time_after - what on earth???
On 2007.09.12 00:10:19 +0100, Adrian McMenamin wrote: On 12/09/2007, Björn Steinbrink [EMAIL PROTECTED] wrote: On 2007.09.12 00:19:09 +0200, Rene Herman wrote: On 09/12/2007 12:15 AM, Adrian McMenamin wrote: On 11/09/2007, Rene Herman [EMAIL PROTECTED] wrote: On 09/12/2007 12:05 AM, Adrian McMenamin wrote: OK, why does this line occasionally return true: What exactly is occassionally? Does it happen more than once per boot? If not, and it happens after a certain time after booting, it might be wrapping of the jiffie counter (see below). if ((maple_dev-interval 0) (jiffies maple_dev-when)) while this one never does (no other changes made): if ((maple_dev-interval 0) (time_after(jiffies, maple_dev-when))) Is maple_dev-when an unsigned long? Yes. Does that make a difference? If it had been a signed type, it could've wrapped to something you didn't expect, explaining the difference at least... With an unsigned long, the only diference should be that time_after() deals with jiffie wrapping which I assume is not an actual problem here. I'll retreat into the shades again... ;-( If occasionally is limited to once per boot, it might be jiffie wrapping. IIRC jiffies are initialized so that they wrap after about 5 minutes of uptime to reveal such bugs without forcing you to wait for ages just to have the counter wrap for the first time. No, I mean works properly - ie occasionally evaluates as true Ehrm, yeah, I somehow parsed that as if it had a negation in there. Anyway, I looked up the patches you posted. when is initialized to 0 and only changed if the above condition evaluates to true. Now, time_after and have different points at which future and past are separated. time_after splits (about) equally between future and past, so 0 can be either, depending on the value of jiffies. But for 0 is almost always in the past, except for the seldom event of jiffies being 0. Now, given that jiffies start out at a huge value to make the counter wrap around early, 0 happens to be in the future for time_after, until the wrap around occurs. So in your case, you just might have to wait those 5 minutes to get the working behaviour, instead of the common case in which it breaks after that time ;-) A fix would likely initialize when to jiffies. Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Rescanning SCSI/SATA bus
On 2007.09.05 16:45:29 +0200, noah wrote: > 2007/9/5, Björn Steinbrink <[EMAIL PROTECTED]>: > > On 2007.09.05 16:04:13 +0200, noah wrote: > > > 2007/9/5, Martin K. Petersen <[EMAIL PROTECTED]>: > > > > >>>>> "Jan" == Jan Engelhardt <[EMAIL PROTECTED]> writes: > > > > > > > > Jan> 11:40 sun:../scsi_host/host0 # echo 1 >scan > > > > Jan> -bash: echo: write error: Invalid argument > > > > > > > > Jan> What is the proper way to trigger a rescan, if it comes to be > > > > Jan> necessary? > > > > > > > > echo "- - -" > scan > > ^ > > > > > > > # echo --- > /sys/class/scsi_host/host9/scan > > ^^^ > > > -bash: echo: write error: Invalid argument > > > > Note the spaces between the dashes. > > Thanks. Unfortunately it produced the same error. > Is rescanning even supported with the sata_nv driver? Works just fine here: [EMAIL PROTECTED]:/sys/class/scsi_host/host0# echo - - - > scan [EMAIL PROTECTED]:/sys/class/scsi_host/host0# cat proc_name sata_nv Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Rescanning SCSI/SATA bus
On 2007.09.05 16:04:13 +0200, noah wrote: > 2007/9/5, Martin K. Petersen <[EMAIL PROTECTED]>: > > > "Jan" == Jan Engelhardt <[EMAIL PROTECTED]> writes: > > > > Jan> 11:40 sun:../scsi_host/host0 # echo 1 >scan > > Jan> -bash: echo: write error: Invalid argument > > > > Jan> What is the proper way to trigger a rescan, if it comes to be > > Jan> necessary? > > > > echo "- - -" > scan ^ > > > # echo --- > /sys/class/scsi_host/host9/scan ^^^ > -bash: echo: write error: Invalid argument Note the spaces between the dashes. - - - not --- Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Rescanning SCSI/SATA bus
On 2007.09.05 16:04:13 +0200, noah wrote: 2007/9/5, Martin K. Petersen [EMAIL PROTECTED]: Jan == Jan Engelhardt [EMAIL PROTECTED] writes: Jan 11:40 sun:../scsi_host/host0 # echo 1 scan Jan -bash: echo: write error: Invalid argument Jan What is the proper way to trigger a rescan, if it comes to be Jan necessary? echo - - - scan ^ # echo --- /sys/class/scsi_host/host9/scan ^^^ -bash: echo: write error: Invalid argument Note the spaces between the dashes. - - - not --- Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Rescanning SCSI/SATA bus
On 2007.09.05 16:45:29 +0200, noah wrote: 2007/9/5, Björn Steinbrink [EMAIL PROTECTED]: On 2007.09.05 16:04:13 +0200, noah wrote: 2007/9/5, Martin K. Petersen [EMAIL PROTECTED]: Jan == Jan Engelhardt [EMAIL PROTECTED] writes: Jan 11:40 sun:../scsi_host/host0 # echo 1 scan Jan -bash: echo: write error: Invalid argument Jan What is the proper way to trigger a rescan, if it comes to be Jan necessary? echo - - - scan ^ # echo --- /sys/class/scsi_host/host9/scan ^^^ -bash: echo: write error: Invalid argument Note the spaces between the dashes. Thanks. Unfortunately it produced the same error. Is rescanning even supported with the sata_nv driver? Works just fine here: [EMAIL PROTECTED]:/sys/class/scsi_host/host0# echo - - - scan [EMAIL PROTECTED]:/sys/class/scsi_host/host0# cat proc_name sata_nv Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [OT] good job guys with the anti-spam !
On 2007.09.02 02:36:18 +0200, Oleg Verych wrote: > * Tue, 31 Jul 2007 23:40:11 +0200 > > > [] > > eventhough people often write only when they have something to complain > > about, I for once would like to congratulate Matti and David, our mail > > admins, for the wonderful job they've done with the spams lately. This > > month, I might have seen maybe one or two instead of perhaps 30 per day > > earlier. While it did not bother me that much earlier, I can say that > > the list is more pleasant to read every day, and I think that's great. > > > > Kudos guys ! > > > > Willy > > > > PS: please do not start another one of those long useless threads from now > > on > > > > Instead let me share an idea and see what will happen (if any). > The mailing list policy: > > 1. sent mails must have in-reply-to (rfc2821 SHOULD) with valid, i.e. >existing in archives, message-id; > > 2. otherwise sender gets response about creating new thread. Replying >to that message will enable (1). Ehrm, you want everyone who wants to start a new thread to: - send an email - await response from the mail server - send the same email again as a reply to the first one Did I get that right? If so, good luck running from anyone who doesn't send his patch series as a bunch of replies to a 00/XX message. Although it would likely be fun to see the next guy that sends a 600 emails patch bomb getting buried alive in "Please confirm that you want to create a new thread" emails. Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [OT] good job guys with the anti-spam !
On 2007.09.02 02:36:18 +0200, Oleg Verych wrote: * Tue, 31 Jul 2007 23:40:11 +0200 [] eventhough people often write only when they have something to complain about, I for once would like to congratulate Matti and David, our mail admins, for the wonderful job they've done with the spams lately. This month, I might have seen maybe one or two instead of perhaps 30 per day earlier. While it did not bother me that much earlier, I can say that the list is more pleasant to read every day, and I think that's great. Kudos guys ! Willy PS: please do not start another one of those long useless threads from now on Instead let me share an idea and see what will happen (if any). The mailing list policy: 1. sent mails must have in-reply-to (rfc2821 SHOULD) with valid, i.e. existing in archives, message-id; 2. otherwise sender gets response about creating new thread. Replying to that message will enable (1). Ehrm, you want everyone who wants to start a new thread to: - send an email - await response from the mail server - send the same email again as a reply to the first one Did I get that right? If so, good luck running from anyone who doesn't send his patch series as a bunch of replies to a 00/XX message. Although it would likely be fun to see the next guy that sends a 600 emails patch bomb getting buried alive in Please confirm that you want to create a new thread emails. Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog=2 regression in 2.6.21
On 2007.08.31 17:24:46 -0700, Daniel Walker wrote: > On Fri, 2007-08-31 at 20:06 +0200, Björn Steinbrink wrote: > > > > > something to do with the nmi hertz adjustment that happens after > > > check_nmi_watchdog() .. > > > > Hm hm, does the same thing (watchdog stuck after check) happen with > > older kernels, ie. those before Stephane's changeset that made it use > > PERFCTR1? > > I noticed the frequency gets turned down after check_nmi_watchdog() is > called.. I think it's suppose to trigger once per second, but it's more > like it updates randomly .. It's once per second if the cpu is 100% busy, if it's just idling and halted, the performance counters won't be increased. > In older kernels it's very slow, but it's more consistent .. With the same load on the box? Maybe some other changes caused the box to behave differently (say, CFS), regarding eg. load distribution amongst the cores. > > Here is some output .. > > morning-glory ~ # cat /proc/interrupts >CPU0 CPU1 CPU2 CPU3 > 0:103 0 0 0 IO-APIC-edge timer > 1: 0 0 0 8 IO-APIC-edge i8042 > 4: 2320 0 0 1 IO-APIC-edge serial > 8: 1 0 0 1 IO-APIC-edge rtc > 12: 0 0 0113 IO-APIC-edge i8042 > 14: 1143 0 0 10 IO-APIC-edge ide0 > 16:227 0 0 1 IO-APIC-fasteoi > uhci_hcd:usb2, eth0 > 18: 0 0 0 0 IO-APIC-fasteoi > ehci_hcd:usb1 > 19: 0 0 0 0 IO-APIC-fasteoi > uhci_hcd:usb3 > 20: 0 0 0 1 IO-APIC-fasteoi acpi > NMI:150168124121 > LOC: 6188 6189 6187 6184 > ERR: 0 > MIS: 0 > morning-glory ~ # cat /proc/interrupts >CPU0 CPU1 CPU2 CPU3 > 0:103 0 0 0 IO-APIC-edge timer > 1: 0 0 0 8 IO-APIC-edge i8042 > 4: 2391 0 0 1 IO-APIC-edge serial > 8: 1 0 0 1 IO-APIC-edge rtc > 12: 0 0 0113 IO-APIC-edge i8042 > 14: 1143 0 0 10 IO-APIC-edge ide0 > 16:872 0 0 1 IO-APIC-fasteoi > uhci_hcd:usb2, eth0 > 18: 0 0 0 0 IO-APIC-fasteoi > ehci_hcd:usb1 > 19: 0 0 0 0 IO-APIC-fasteoi > uhci_hcd:usb3 > 20: 0 0 0 1 IO-APIC-fasteoi acpi > NMI:151168124121 > LOC: 21443 21444 21442 21439 > ERR: 0 > MIS: 0 > dwalker2 ~ # > > > If you look at the LOC values you'll notice a lot of time has passed, > with only one NMI and on only one cpu .. > > It's possible this is something else completely tho .. At least from the interrupt side, that box looks pretty idle, so that's expected I'd say. > > > Maybe you could "activate" the Dprintk in write_watchdog_counter32() to > > see which value gets written to the MSR? (I don't see any switch to > > activate it, so maybe just s/Dprintk(/printk(KERN_WHATEVER / ?) > > Here's the only lines printed, > > setting INTEL_ARCH_PERFCTR0 to -0x0131385e > setting INTEL_ARCH_PERFCTR0 to -0x0131385e > setting INTEL_ARCH_PERFCTR0 to -0x0131385e > setting INTEL_ARCH_PERFCTR0 to -0x0131385e Ok, dumb I am. The "interesting" call from p6_rearm passes NULL as desc, and thus the printk is never called that way. But before we start flooding your logs, could you just hog all cores with a simple cpu hog and check if that causes the NMI counter to increase at about 1 Hz? If that doesn't work, we can go back to that debug output. Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog=2 regression in 2.6.21
On 2007.08.31 09:35:23 -0700, Daniel Walker wrote: > On Fri, 2007-08-31 at 09:21 -0700, Stephane Eranian wrote: > > In this patch, the setup_*() routine now extract the MSR from the wd_ops > > to copy them into the nmi_watchdog_ctlblk. This is not done for P4 because > > of the special and ugly case of HT. > > > > With this approach, we can now create a custom wd_ops for CoreDuo that is > > a clone of the intel_arch_wd_ops, except for the MSR. > > > > Could you try this one instead? > > So I tested your patch unchanged and the system boots, and the > check_nmi_watchdog() passes .. However, the nmi stops ticking right > after bootup, > > >>From my /proc/interrupts below, > >CPU0 CPU1 CPU2 CPU3 > 0:108 0 0 0 IO-APIC-edge timer > 1: 0 0 0 8 IO-APIC-edge i8042 > 4: 3427 0 0 1 IO-APIC-edge serial > 8: 1 0 0 1 IO-APIC-edge rtc > 12: 0 0 0113 IO-APIC-edge i8042 > 14: 1128 0 0 10 IO-APIC-edge ide0 > 16: 1664 0 0 1 IO-APIC-fasteoi > uhci_hcd:usb2, eth0 > 18: 0 0 0 0 IO-APIC-fasteoi > ehci_hcd:usb1 > 19: 0 0 0 0 IO-APIC-fasteoi > uhci_hcd:usb3 > 20: 0 0 0 1 IO-APIC-fasteoi acpi > NMI: 1670 1453 1097967 > LOC: 48001 48002 48000 48006 > ERR: 0 > MIS: 0 > > > The NMI field never changes .. > > So I added another change which looked appropriate, > > @@ -674,6 +688,7 @@ unsigned lapic_adjust_nmi_hz(unsigned hz > { > struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk); > if (wd->perfctr_msr == MSR_P6_PERFCTR0 || > + wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR0 || > wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR1) > hz = adjust_for_32bit_ctr(hz); > return hz; > > > Unfortunately that didn't fix anything, but I have a feeling is has That's because MSR_P6_PERFCTR0 is the same as MSR_ARCH_PERFMON_PERFCTR0. But I'd personally add that change anyway, as it makes the code less tricky and the compiler should optimize it away. > something to do with the nmi hertz adjustment that happens after > check_nmi_watchdog() .. Hm hm, does the same thing (watchdog stuck after check) happen with older kernels, ie. those before Stephane's changeset that made it use PERFCTR1? Maybe you could "activate" the Dprintk in write_watchdog_counter32() to see which value gets written to the MSR? (I don't see any switch to activate it, so maybe just s/Dprintk(/printk(KERN_WHATEVER / ?) Thanks, Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog=2 regression in 2.6.21
On 2007.08.31 09:35:23 -0700, Daniel Walker wrote: On Fri, 2007-08-31 at 09:21 -0700, Stephane Eranian wrote: In this patch, the setup_*() routine now extract the MSR from the wd_ops to copy them into the nmi_watchdog_ctlblk. This is not done for P4 because of the special and ugly case of HT. With this approach, we can now create a custom wd_ops for CoreDuo that is a clone of the intel_arch_wd_ops, except for the MSR. Could you try this one instead? So I tested your patch unchanged and the system boots, and the check_nmi_watchdog() passes .. However, the nmi stops ticking right after bootup, From my /proc/interrupts below, CPU0 CPU1 CPU2 CPU3 0:108 0 0 0 IO-APIC-edge timer 1: 0 0 0 8 IO-APIC-edge i8042 4: 3427 0 0 1 IO-APIC-edge serial 8: 1 0 0 1 IO-APIC-edge rtc 12: 0 0 0113 IO-APIC-edge i8042 14: 1128 0 0 10 IO-APIC-edge ide0 16: 1664 0 0 1 IO-APIC-fasteoi uhci_hcd:usb2, eth0 18: 0 0 0 0 IO-APIC-fasteoi ehci_hcd:usb1 19: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb3 20: 0 0 0 1 IO-APIC-fasteoi acpi NMI: 1670 1453 1097967 LOC: 48001 48002 48000 48006 ERR: 0 MIS: 0 The NMI field never changes .. So I added another change which looked appropriate, @@ -674,6 +688,7 @@ unsigned lapic_adjust_nmi_hz(unsigned hz { struct nmi_watchdog_ctlblk *wd = __get_cpu_var(nmi_watchdog_ctlblk); if (wd-perfctr_msr == MSR_P6_PERFCTR0 || + wd-perfctr_msr == MSR_ARCH_PERFMON_PERFCTR0 || wd-perfctr_msr == MSR_ARCH_PERFMON_PERFCTR1) hz = adjust_for_32bit_ctr(hz); return hz; Unfortunately that didn't fix anything, but I have a feeling is has That's because MSR_P6_PERFCTR0 is the same as MSR_ARCH_PERFMON_PERFCTR0. But I'd personally add that change anyway, as it makes the code less tricky and the compiler should optimize it away. something to do with the nmi hertz adjustment that happens after check_nmi_watchdog() .. Hm hm, does the same thing (watchdog stuck after check) happen with older kernels, ie. those before Stephane's changeset that made it use PERFCTR1? Maybe you could activate the Dprintk in write_watchdog_counter32() to see which value gets written to the MSR? (I don't see any switch to activate it, so maybe just s/Dprintk(/printk(KERN_WHATEVER / ?) Thanks, Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog=2 regression in 2.6.21
On 2007.08.31 17:24:46 -0700, Daniel Walker wrote: On Fri, 2007-08-31 at 20:06 +0200, Björn Steinbrink wrote: something to do with the nmi hertz adjustment that happens after check_nmi_watchdog() .. Hm hm, does the same thing (watchdog stuck after check) happen with older kernels, ie. those before Stephane's changeset that made it use PERFCTR1? I noticed the frequency gets turned down after check_nmi_watchdog() is called.. I think it's suppose to trigger once per second, but it's more like it updates randomly .. It's once per second if the cpu is 100% busy, if it's just idling and halted, the performance counters won't be increased. In older kernels it's very slow, but it's more consistent .. With the same load on the box? Maybe some other changes caused the box to behave differently (say, CFS), regarding eg. load distribution amongst the cores. Here is some output .. morning-glory ~ # cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 0:103 0 0 0 IO-APIC-edge timer 1: 0 0 0 8 IO-APIC-edge i8042 4: 2320 0 0 1 IO-APIC-edge serial 8: 1 0 0 1 IO-APIC-edge rtc 12: 0 0 0113 IO-APIC-edge i8042 14: 1143 0 0 10 IO-APIC-edge ide0 16:227 0 0 1 IO-APIC-fasteoi uhci_hcd:usb2, eth0 18: 0 0 0 0 IO-APIC-fasteoi ehci_hcd:usb1 19: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb3 20: 0 0 0 1 IO-APIC-fasteoi acpi NMI:150168124121 LOC: 6188 6189 6187 6184 ERR: 0 MIS: 0 morning-glory ~ # cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 0:103 0 0 0 IO-APIC-edge timer 1: 0 0 0 8 IO-APIC-edge i8042 4: 2391 0 0 1 IO-APIC-edge serial 8: 1 0 0 1 IO-APIC-edge rtc 12: 0 0 0113 IO-APIC-edge i8042 14: 1143 0 0 10 IO-APIC-edge ide0 16:872 0 0 1 IO-APIC-fasteoi uhci_hcd:usb2, eth0 18: 0 0 0 0 IO-APIC-fasteoi ehci_hcd:usb1 19: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb3 20: 0 0 0 1 IO-APIC-fasteoi acpi NMI:151168124121 LOC: 21443 21444 21442 21439 ERR: 0 MIS: 0 dwalker2 ~ # If you look at the LOC values you'll notice a lot of time has passed, with only one NMI and on only one cpu .. It's possible this is something else completely tho .. At least from the interrupt side, that box looks pretty idle, so that's expected I'd say. Maybe you could activate the Dprintk in write_watchdog_counter32() to see which value gets written to the MSR? (I don't see any switch to activate it, so maybe just s/Dprintk(/printk(KERN_WHATEVER / ?) Here's the only lines printed, setting INTEL_ARCH_PERFCTR0 to -0x0131385e setting INTEL_ARCH_PERFCTR0 to -0x0131385e setting INTEL_ARCH_PERFCTR0 to -0x0131385e setting INTEL_ARCH_PERFCTR0 to -0x0131385e Ok, dumb I am. The interesting call from p6_rearm passes NULL as desc, and thus the printk is never called that way. But before we start flooding your logs, could you just hog all cores with a simple cpu hog and check if that causes the NMI counter to increase at about 1 Hz? If that doesn't work, we can go back to that debug output. Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] avoid negative (and full-width) shifts in radix-tree.c, take 3
On 2007.08.29 10:49:12 -0700, Christoph Lameter wrote: > On Wed, 29 Aug 2007, Peter Lund wrote: > > > > > - if (tmp >= RADIX_TREE_INDEX_BITS) > > - index = ~0UL; > > - return index; > > + if (shift < 0) > > + return ~0UL; > > + if (shift >= 8 * sizeof(unsigned long)) > > 8* sizeof(unsigned long) is the constant BITS_PER_LONG. Or in this context: RADIX_TREE_INDEX_BITS, which happens to be the old check, which just needed to be extended to cover negative shifts. Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] avoid negative (and full-width) shifts in radix-tree.c, take 3
On 2007.08.29 10:49:12 -0700, Christoph Lameter wrote: On Wed, 29 Aug 2007, Peter Lund wrote: - if (tmp = RADIX_TREE_INDEX_BITS) - index = ~0UL; - return index; + if (shift 0) + return ~0UL; + if (shift = 8 * sizeof(unsigned long)) 8* sizeof(unsigned long) is the constant BITS_PER_LONG. Or in this context: RADIX_TREE_INDEX_BITS, which happens to be the old check, which just needed to be extended to cover negative shifts. Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog=2 regression in 2.6.21
On 2007.08.07 17:06:49 -0700, Daniel Walker wrote: > > This patch below hangs my system on boot if I set nmi_watchdog=2 . It > shows the NMI as stuck then the system hangs .. nmi_watchdog=1 works > fine, and the system boots without any watchdog options .. > > The machine is an Intel allagash development board, and it has two dual > core Pentium-M cpus. I attached the .config I used. Should be fixed in commit e82f64e5bb0648a13630d752c35be1e7bd8bab96 (2.6.23-rc1 IIRC). Unfornately it didn't make it into 2.6.22, but the patch should work even with 2.6.21 IIRC. Björn > > bf8696ed6dfa561198b4736deaf11ab68dcc4845 is first bad commit > commit bf8696ed6dfa561198b4736deaf11ab68dcc4845 > Author: Stephane Eranian <[EMAIL PROTECTED]> > Date: Wed May 2 19:27:05 2007 +0200 > > [PATCH] i386: i386 make NMI use PERFCTR1 for architectural perfmon (take > 2) > > Hello, > > This patch against 2.6.20-git14 makes the NMI watchdog use > PERFSEL1/PERFCTR1 > instead of PERFSEL0/PERFCTR0 on processors supporting Intel architectural > perfmon, such as Intel Core 2. Although all PMU events can work on > both counters, the Precise Event-Based Sampling (PEBS) requires that the > event be in PERFCTR0 to work correctly (see section 18.14.4.1 in the > IA32 SDM Vol 3b). > > A similar patch for x86-64 is to follow. > > Changelog: > - make the i386 NMI watchdog use PERFSEL1/PERFCTR1 instead of > PERFSEL0/PERFCTR0 > on processors supporting the Intel architectural perfmon (e.g. > Core 2 Duo). > This allows PEBS to work when the NMI watchdog is active. > > signed-off-by: stephane eranian <[EMAIL PROTECTED]> > > Signed-off-by: Andi Kleen <[EMAIL PROTECTED]> > > :04 04 9cee0745798cb56da2ca82032b8ee88a2c32700a > f59ee02e3cd8f13503edb7312a3494f4d7ec0069 M arch > > # > # Automatically generated make config: don't edit > # Linux kernel version: 2.6.21 > # Tue Aug 7 16:40:44 2007 > # > CONFIG_X86_32=y > CONFIG_GENERIC_TIME=y > CONFIG_CLOCKSOURCE_WATCHDOG=y > CONFIG_GENERIC_CLOCKEVENTS=y > CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y > CONFIG_LOCKDEP_SUPPORT=y > CONFIG_STACKTRACE_SUPPORT=y > CONFIG_SEMAPHORE_SLEEPERS=y > CONFIG_X86=y > CONFIG_MMU=y > CONFIG_ZONE_DMA=y > CONFIG_GENERIC_ISA_DMA=y > CONFIG_GENERIC_IOMAP=y > CONFIG_GENERIC_BUG=y > CONFIG_GENERIC_HWEIGHT=y > CONFIG_ARCH_MAY_HAVE_PC_FDC=y > CONFIG_DMI=y > CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" > > # > # Code maturity level options > # > CONFIG_EXPERIMENTAL=y > CONFIG_LOCK_KERNEL=y > CONFIG_INIT_ENV_ARG_LIMIT=32 > > # > # General setup > # > CONFIG_LOCALVERSION="" > # CONFIG_LOCALVERSION_AUTO is not set > CONFIG_SWAP=y > CONFIG_SYSVIPC=y > # CONFIG_IPC_NS is not set > CONFIG_SYSVIPC_SYSCTL=y > # CONFIG_POSIX_MQUEUE is not set > # CONFIG_BSD_PROCESS_ACCT is not set > # CONFIG_TASKSTATS is not set > # CONFIG_UTS_NS is not set > # CONFIG_AUDIT is not set > CONFIG_IKCONFIG=y > CONFIG_IKCONFIG_PROC=y > # CONFIG_CPUSETS is not set > CONFIG_SYSFS_DEPRECATED=y > # CONFIG_RELAY is not set > # CONFIG_BLK_DEV_INITRD is not set > # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set > CONFIG_SYSCTL=y > # CONFIG_EMBEDDED is not set > CONFIG_UID16=y > CONFIG_SYSCTL_SYSCALL=y > CONFIG_KALLSYMS=y > # CONFIG_KALLSYMS_ALL is not set > # CONFIG_KALLSYMS_EXTRA_PASS is not set > CONFIG_HOTPLUG=y > CONFIG_PRINTK=y > CONFIG_BUG=y > CONFIG_ELF_CORE=y > CONFIG_BASE_FULL=y > CONFIG_FUTEX=y > CONFIG_EPOLL=y > CONFIG_SHMEM=y > CONFIG_SLAB=y > CONFIG_VM_EVENT_COUNTERS=y > CONFIG_RT_MUTEXES=y > # CONFIG_TINY_SHMEM is not set > CONFIG_BASE_SMALL=0 > # CONFIG_SLOB is not set > > # > # Loadable module support > # > CONFIG_MODULES=y > CONFIG_MODULE_UNLOAD=y > CONFIG_MODULE_FORCE_UNLOAD=y > # CONFIG_MODVERSIONS is not set > # CONFIG_MODULE_SRCVERSION_ALL is not set > CONFIG_KMOD=y > CONFIG_STOP_MACHINE=y > > # > # Block layer > # > CONFIG_BLOCK=y > # CONFIG_LBD is not set > # CONFIG_BLK_DEV_IO_TRACE is not set > # CONFIG_LSF is not set > > # > # IO Schedulers > # > CONFIG_IOSCHED_NOOP=y > # CONFIG_IOSCHED_AS is not set > # CONFIG_IOSCHED_DEADLINE is not set > CONFIG_IOSCHED_CFQ=y > # CONFIG_DEFAULT_AS is not set > # CONFIG_DEFAULT_DEADLINE is not set > CONFIG_DEFAULT_CFQ=y > # CONFIG_DEFAULT_NOOP is not set > CONFIG_DEFAULT_IOSCHED="cfq" > > # > # Processor type and features > # > # CONFIG_TICK_ONESHOT is not set > # CONFIG_NO_HZ is not set > # CONFIG_HIGH_RES_TIMERS is not set > CONFIG_SMP=y > CONFIG_X86_PC=y > # CONFIG_X86_ELAN is not set > # CONFIG_X86_VOYAGER is not set > # CONFIG_X86_NUMAQ is not set > # CONFIG_X86_SUMMIT is not set > # CONFIG_X86_BIGSMP is not set > # CONFIG_X86_VISWS is not set > # CONFIG_X86_GENERICARCH is not set > # CONFIG_X86_ES7000 is not set > # CONFIG_PARAVIRT is not set > # CONFIG_M386 is not set > # CONFIG_M486 is not set > # CONFIG_M586 is not set > # CONFIG_M586TSC is not set > # CONFIG_M586MMX is not set > # CONFIG_M686 is not set > # CONFIG_MPENTIUMII is
Re: nmi_watchdog=2 regression in 2.6.21
On 2007.08.07 17:06:49 -0700, Daniel Walker wrote: This patch below hangs my system on boot if I set nmi_watchdog=2 . It shows the NMI as stuck then the system hangs .. nmi_watchdog=1 works fine, and the system boots without any watchdog options .. The machine is an Intel allagash development board, and it has two dual core Pentium-M cpus. I attached the .config I used. Should be fixed in commit e82f64e5bb0648a13630d752c35be1e7bd8bab96 (2.6.23-rc1 IIRC). Unfornately it didn't make it into 2.6.22, but the patch should work even with 2.6.21 IIRC. Björn bf8696ed6dfa561198b4736deaf11ab68dcc4845 is first bad commit commit bf8696ed6dfa561198b4736deaf11ab68dcc4845 Author: Stephane Eranian [EMAIL PROTECTED] Date: Wed May 2 19:27:05 2007 +0200 [PATCH] i386: i386 make NMI use PERFCTR1 for architectural perfmon (take 2) Hello, This patch against 2.6.20-git14 makes the NMI watchdog use PERFSEL1/PERFCTR1 instead of PERFSEL0/PERFCTR0 on processors supporting Intel architectural perfmon, such as Intel Core 2. Although all PMU events can work on both counters, the Precise Event-Based Sampling (PEBS) requires that the event be in PERFCTR0 to work correctly (see section 18.14.4.1 in the IA32 SDM Vol 3b). A similar patch for x86-64 is to follow. Changelog: - make the i386 NMI watchdog use PERFSEL1/PERFCTR1 instead of PERFSEL0/PERFCTR0 on processors supporting the Intel architectural perfmon (e.g. Core 2 Duo). This allows PEBS to work when the NMI watchdog is active. signed-off-by: stephane eranian [EMAIL PROTECTED] Signed-off-by: Andi Kleen [EMAIL PROTECTED] :04 04 9cee0745798cb56da2ca82032b8ee88a2c32700a f59ee02e3cd8f13503edb7312a3494f4d7ec0069 M arch # # Automatically generated make config: don't edit # Linux kernel version: 2.6.21 # Tue Aug 7 16:40:44 2007 # CONFIG_X86_32=y CONFIG_GENERIC_TIME=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_X86=y CONFIG_MMU=y CONFIG_ZONE_DMA=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_DMI=y CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config # # Code maturity level options # CONFIG_EXPERIMENTAL=y CONFIG_LOCK_KERNEL=y CONFIG_INIT_ENV_ARG_LIMIT=32 # # General setup # CONFIG_LOCALVERSION= # CONFIG_LOCALVERSION_AUTO is not set CONFIG_SWAP=y CONFIG_SYSVIPC=y # CONFIG_IPC_NS is not set CONFIG_SYSVIPC_SYSCTL=y # CONFIG_POSIX_MQUEUE is not set # CONFIG_BSD_PROCESS_ACCT is not set # CONFIG_TASKSTATS is not set # CONFIG_UTS_NS is not set # CONFIG_AUDIT is not set CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y # CONFIG_CPUSETS is not set CONFIG_SYSFS_DEPRECATED=y # CONFIG_RELAY is not set # CONFIG_BLK_DEV_INITRD is not set # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set CONFIG_SYSCTL=y # CONFIG_EMBEDDED is not set CONFIG_UID16=y CONFIG_SYSCTL_SYSCALL=y CONFIG_KALLSYMS=y # CONFIG_KALLSYMS_ALL is not set # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_SHMEM=y CONFIG_SLAB=y CONFIG_VM_EVENT_COUNTERS=y CONFIG_RT_MUTEXES=y # CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 # CONFIG_SLOB is not set # # Loadable module support # CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_MODULE_FORCE_UNLOAD=y # CONFIG_MODVERSIONS is not set # CONFIG_MODULE_SRCVERSION_ALL is not set CONFIG_KMOD=y CONFIG_STOP_MACHINE=y # # Block layer # CONFIG_BLOCK=y # CONFIG_LBD is not set # CONFIG_BLK_DEV_IO_TRACE is not set # CONFIG_LSF is not set # # IO Schedulers # CONFIG_IOSCHED_NOOP=y # CONFIG_IOSCHED_AS is not set # CONFIG_IOSCHED_DEADLINE is not set CONFIG_IOSCHED_CFQ=y # CONFIG_DEFAULT_AS is not set # CONFIG_DEFAULT_DEADLINE is not set CONFIG_DEFAULT_CFQ=y # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED=cfq # # Processor type and features # # CONFIG_TICK_ONESHOT is not set # CONFIG_NO_HZ is not set # CONFIG_HIGH_RES_TIMERS is not set CONFIG_SMP=y CONFIG_X86_PC=y # CONFIG_X86_ELAN is not set # CONFIG_X86_VOYAGER is not set # CONFIG_X86_NUMAQ is not set # CONFIG_X86_SUMMIT is not set # CONFIG_X86_BIGSMP is not set # CONFIG_X86_VISWS is not set # CONFIG_X86_GENERICARCH is not set # CONFIG_X86_ES7000 is not set # CONFIG_PARAVIRT is not set # CONFIG_M386 is not set # CONFIG_M486 is not set # CONFIG_M586 is not set # CONFIG_M586TSC is not set # CONFIG_M586MMX is not set # CONFIG_M686 is not set # CONFIG_MPENTIUMII is not set # CONFIG_MPENTIUMIII is not set CONFIG_MPENTIUMM=y # CONFIG_MCORE2 is not set # CONFIG_MPENTIUM4 is not set # CONFIG_MK6 is not set # CONFIG_MK7 is not set # CONFIG_MK8
Re: [PATCH] Fix msr register allocation
On 2007.07.30 12:25:54 -0300, Glauber de Oliveira Costa wrote: > Since the value in ret will go through a return statement, ^^^ You mean "err" I guess? > it does not need to be put in eax register directly. Instead, > we let the compiler do his job and choose what to do with it, > potentially getting a better register allocation. > > Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]> > > diff --git a/include/asm-i386/msr.h b/include/asm-i386/msr.h > index df21ea0..cc4263c 100644 > --- a/include/asm-i386/msr.h > +++ b/include/asm-i386/msr.h > @@ -54,7 +54,7 @@ static inline int native_write_msr_safe(unsigned int msr, >" .align 4\n\t" >" .long 2b,3b\n\t" >".previous" > - : "=a" (err) > + : "=r" (err) >: "c" (msr), "0" ((u32)val), "d" ((u32)(val>>32)), > "i" (-EFAULT)); > return err; Note that the EAX output constraint is re-used in the input section for (u32)val, i.e. the lower half of the value to be written. And "wrmsr" needs that in EAX, so you cannot change the output constraint without touching the input constraint. Also, I do not see how the compiler could do any better than having the return value already in EAX, but that doesn't really mean anything ;-) Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix msr register allocation
On 2007.07.30 12:25:54 -0300, Glauber de Oliveira Costa wrote: Since the value in ret will go through a return statement, ^^^ You mean err I guess? it does not need to be put in eax register directly. Instead, we let the compiler do his job and choose what to do with it, potentially getting a better register allocation. Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED] diff --git a/include/asm-i386/msr.h b/include/asm-i386/msr.h index df21ea0..cc4263c 100644 --- a/include/asm-i386/msr.h +++ b/include/asm-i386/msr.h @@ -54,7 +54,7 @@ static inline int native_write_msr_safe(unsigned int msr, .align 4\n\t .long 2b,3b\n\t .previous - : =a (err) + : =r (err) : c (msr), 0 ((u32)val), d ((u32)(val32)), i (-EFAULT)); return err; Note that the EAX output constraint is re-used in the input section for (u32)val, i.e. the lower half of the value to be written. And wrmsr needs that in EAX, so you cannot change the output constraint without touching the input constraint. Also, I do not see how the compiler could do any better than having the return value already in EAX, but that doesn't really mean anything ;-) Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]
On 2007.07.28 01:29:19 +0200, Andi Kleen wrote: > > Any faults in that reasoning? > > GNU sort uses a merge sort with temporary files on disk. Not sure > how much it keeps in memory during that, but it's probably less > than 150MB. At some point the dirty limit should kick in and write back the > data of the temporary files; so it's not quite the same as anonymous memory. > But it's not that different given. Hm, does that change anything? The files need to be read at the end (so they go into the cache) and are delete afterwards (cache gets freed I guess?). > It would be better to measure than to guess. At least Andrew's measurements > on 128MB actually didn't show updatedb being really that big a problem. Here's a before/after memory usage for an updatedb run: [EMAIL PROTECTED]:~# free -m total used free sharedbuffers cached Mem: 2011 1995 15 0269779 -/+ buffers/cache:946 1064 Swap: 1945 0 1945 [EMAIL PROTECTED]:~# updatedb [EMAIL PROTECTED]:~# free -m total used free sharedbuffers cached Mem: 2011 1914 96 0209746 -/+ buffers/cache:958 1052 Swap: 1945 0 1944 81MB more unused RAM afterwards. If anyone can make use of that, here's a snippet from /proc/$PID/smaps of updatedb's sort process, when it was at about its peak memory usage (according to the RSS column in top), which was about 50MB. 2b90ab3c1000-2b90ae4c3000 rw-p 2b90ab3c1000 00:00 0 Size: 50184 kB Rss: 50184 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 50184 kB Referenced:50184 kB > Perhaps some people have much more files or simply a less efficient > updatedb implementation? sort (GNU coreutils) 5.97 GNU updatedb version 4.2.31 > I guess the people who complain here that loudly really need to supply > some real numbers. Just to clarify: I'm not complaining either way, neither about not merging swap prefetch, nor about someone wanting that to be merge. It was rather the "discussion" that caught my attention... Just in case ;-) Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]
On 2007.07.27 20:16:32 +0200, Rene Herman wrote: > On 07/27/2007 07:45 PM, Daniel Hazelton wrote: > >> Updatedb or another process that uses the FS heavily runs on a users >> 256MB P3-800 (when it is idle) and the VFS caches grow, causing memory >> pressure that causes other applications to be swapped to disk. In the >> morning the user has to wait for the system to swap those applications >> back in. >> Questions about it: >> Q) Does swap-prefetch help with this? A) [From all reports I've seen (*)] >> Yes, it does. > > No it does not. If updatedb filled memory to the point of causing swapping > (which noone is reproducing anyway) it HAS FILLED MEMORY and swap-prefetch > hasn't any memory to prefetch into -- updatedb itself doesn't use any > significant memory. > > Here's swap-prefetch's author saying the same: > > http://lkml.org/lkml/2007/2/9/112 > > | It can't help the updatedb scenario. Updatedb leaves the ram full and > | swap prefetch wants to cost as little as possible so it will never > | move anything out of ram in preference for the pages it wants to swap > | back in. > > Now please finally either understand this, or tell us how we're wrong. Con might have been wrong there for boxes with really little memory. My desktop box has not even 300k inodes in use (IIRC someone posted a df -i output showing 1 million inodes in use). Still, the memory footprint of the "sort" process grows up to about 50MB. Assuming that the average filename length stays, that would mean 150MB for the 1 million inode case, just for the "sort" process. Now, sort cannot produce any output before its got all its input, so that RSS usage exists at least as long as the VFS cache is growing due to the ongoing search for files. And then, all that memory that "sort" uses is required, because sort needs to output its results. So if there's memory pressure, the VFS cache is likely to be dropped, because "sort" needs its data, for sorting and producing output. And then sort terminates and leaves that whole lot of memory _unused_. The other actions of updatedb only touch the locate db, which is just a few megs (4.5MB here) big so the cache won't grow that much again. OK, so we got about, say, at least 128MB of totally unused memory, maybe even more. If you look at the vmstat output I sent, you see that I had between 90MB and 128MB free, depending on the swappiness setting, with increased inode usage, that could very well scale up. Conclusion: updatedb does _not_ leave the RAM full. And for a box with little memory (say 256MB) it might even be 50% or more memory that is free after updatedb ran. Might that make swap prefetch kick in? Any faults in that reasoning? Thanks, Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RFT: updatedb morning after problem [was: Re: -mm merge plans for 2.6.23]
On 2007.07.27 20:16:32 +0200, Rene Herman wrote: On 07/27/2007 07:45 PM, Daniel Hazelton wrote: Updatedb or another process that uses the FS heavily runs on a users 256MB P3-800 (when it is idle) and the VFS caches grow, causing memory pressure that causes other applications to be swapped to disk. In the morning the user has to wait for the system to swap those applications back in. Questions about it: Q) Does swap-prefetch help with this? A) [From all reports I've seen (*)] Yes, it does. No it does not. If updatedb filled memory to the point of causing swapping (which noone is reproducing anyway) it HAS FILLED MEMORY and swap-prefetch hasn't any memory to prefetch into -- updatedb itself doesn't use any significant memory. Here's swap-prefetch's author saying the same: http://lkml.org/lkml/2007/2/9/112 | It can't help the updatedb scenario. Updatedb leaves the ram full and | swap prefetch wants to cost as little as possible so it will never | move anything out of ram in preference for the pages it wants to swap | back in. Now please finally either understand this, or tell us how we're wrong. Con might have been wrong there for boxes with really little memory. My desktop box has not even 300k inodes in use (IIRC someone posted a df -i output showing 1 million inodes in use). Still, the memory footprint of the sort process grows up to about 50MB. Assuming that the average filename length stays, that would mean 150MB for the 1 million inode case, just for the sort process. Now, sort cannot produce any output before its got all its input, so that RSS usage exists at least as long as the VFS cache is growing due to the ongoing search for files. And then, all that memory that sort uses is required, because sort needs to output its results. So if there's memory pressure, the VFS cache is likely to be dropped, because sort needs its data, for sorting and producing output. And then sort terminates and leaves that whole lot of memory _unused_. The other actions of updatedb only touch the locate db, which is just a few megs (4.5MB here) big so the cache won't grow that much again. OK, so we got about, say, at least 128MB of totally unused memory, maybe even more. If you look at the vmstat output I sent, you see that I had between 90MB and 128MB free, depending on the swappiness setting, with increased inode usage, that could very well scale up. Conclusion: updatedb does _not_ leave the RAM full. And for a box with little memory (say 256MB) it might even be 50% or more memory that is free after updatedb ran. Might that make swap prefetch kick in? Any faults in that reasoning? Thanks, Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RFT: updatedb morning after problem [was: Re: -mm merge plans for 2.6.23]
On 2007.07.28 01:29:19 +0200, Andi Kleen wrote: Any faults in that reasoning? GNU sort uses a merge sort with temporary files on disk. Not sure how much it keeps in memory during that, but it's probably less than 150MB. At some point the dirty limit should kick in and write back the data of the temporary files; so it's not quite the same as anonymous memory. But it's not that different given. Hm, does that change anything? The files need to be read at the end (so they go into the cache) and are delete afterwards (cache gets freed I guess?). It would be better to measure than to guess. At least Andrew's measurements on 128MB actually didn't show updatedb being really that big a problem. Here's a before/after memory usage for an updatedb run: [EMAIL PROTECTED]:~# free -m total used free sharedbuffers cached Mem: 2011 1995 15 0269779 -/+ buffers/cache:946 1064 Swap: 1945 0 1945 [EMAIL PROTECTED]:~# updatedb [EMAIL PROTECTED]:~# free -m total used free sharedbuffers cached Mem: 2011 1914 96 0209746 -/+ buffers/cache:958 1052 Swap: 1945 0 1944 81MB more unused RAM afterwards. If anyone can make use of that, here's a snippet from /proc/$PID/smaps of updatedb's sort process, when it was at about its peak memory usage (according to the RSS column in top), which was about 50MB. 2b90ab3c1000-2b90ae4c3000 rw-p 2b90ab3c1000 00:00 0 Size: 50184 kB Rss: 50184 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 50184 kB Referenced:50184 kB Perhaps some people have much more files or simply a less efficient updatedb implementation? sort (GNU coreutils) 5.97 GNU updatedb version 4.2.31 I guess the people who complain here that loudly really need to supply some real numbers. Just to clarify: I'm not complaining either way, neither about not merging swap prefetch, nor about someone wanting that to be merge. It was rather the discussion that caught my attention... Just in case ;-) Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: updatedb
On 2007.07.26 11:58:29 +0200, Björn Steinbrink wrote: > Note that the total RSS usage of updatedb+sort was just about 50MB, > nevertheless swap grew to more than 300MB. It's also interesting that > swapping is so aggressive, that the amount of free memory is constantly > growing. I'm a missing something or wouldn't it be smarter to use that > free memory for buffers and cache first? (x86_64 system, so even if > highmem on x86 could be responsible, it's not the case here.) > > Will now go and see what happens if I play with swappiness. Hm, swappiness set to 0 looks even more weird to me, especially the beginning, where (AFAICT) basically buffers and caches are dropped just to get a pretty huge amount of free RAM. With swappiness set to 100, you basically get what you expect: swapping. But at least to me, that swapping looks _a lot_ smarter than what it did for the default swappiness of 60 or the 0 swappiness. Swap is growing, but so are the buffers, and the cache also only shrinks at a single point, probably when the "sort" process starts to grow. Plus, the amount of free memory isn't growing to insane sizes like in the other cases. vmstat output for both cases to be found below. Björn vmstat output for swappiness = 0 procs ---memory-- ---swap-- -io -system-- cpu r b swpd free buff cache si sobibo in cs us sy id wa 1 0 0 25140 37712 6530411 7144 31 0 0 99 0 4 0 0 25132 37736 6531200 038 212 604 1 0 98 1 2 0 0 25132 37736 6531200 0 0 177 479 0 0 100 0 3 1 0 17252 42568 6531200 151623 641 2332 1 3 54 41 3 2 0 15172 45412 5990800 1567 482 585 2051 1 4 50 45 2 2 0 20436 44320 5452400 743 9 368 1196 0 1 50 49 3 2 0 28416 36580 5452000 533 4 312 1016 0 1 50 49 1 2 4 44316 22780 5446400 128 191 240 786 1 1 43 54 3 0 3884 55256 18112 4577200 356 277 318 937 0 3 39 57 1 2 3924 57276 19004 4070000 631 117 347 1160 0 1 50 48 0 1 4108 61328 15348 4034400 768 301 378 1218 0 1 50 49 1 1 4276 60812 15612 402160 300 585 359 344 1045 0 1 50 49 0 2 4276 62184 17484 396240 995 703 1208 370 1188 0 2 50 48 0 2 4296 68244 14616 3626800 36011 275 973 0 1 50 49 2 2 4296 75292 6976 360160 137 667 480 424 1315 0 3 50 47 3 1 4300 78984 6344 3339200 639 635 517 1142 0 3 49 47 2 1 4300 80816 5520 3252001 992 587 683 1479 0 7 48 45 1 2 4556 82244 3704 325040 85 607 659 452 1040 0 3 50 47 0 1 4556 81600 4972 3242000 66544 362 1040 0 1 50 49 0 2 4940 80364 4588 325080 128 552 539 375 995 0 3 50 47 0 1 5004 83116 3576 324200 21 416 591 405 862 0 3 49 47 0 1 5004 80196 5388 3244000 604 0 328 1014 0 2 50 48 0 1 5004 82976 2548 3254400 461 343 363 923 0 4 49 47 1 1 5004 81528 3780 3247200 1124 157 542 1524 1 4 49 47 0 1 5516 81940 4000 324880 171 865 575 489 1356 0 4 49 46 0 2 5804 82412 2380 325160 96 660 423 385 1160 1 6 42 51 0 2 5804 81476 2868 3248000 864 204 493 1242 0 5 50 45 0 1 6268 83124 2088 325080 155 678 551 409 1107 0 7 46 47 0 2 6268 83216 1576 3238000 771 153 450 1330 1 11 43 46 0 3 59420 97888736 322240 17717 300 18035 375 869 0 27 12 60 0 4 176160 214288800 323160 3891323 38917 347 502 0 6 30 64 1 2 176212 242608 2752 327160 17 68395 441 1256 0 4 41 54 1 1 176212 237464 5492 3256800 883 188 452 1488 1 2 50 48 1 0 176212 232296 9628 3263600 1368 263 533 1690 1 2 50 47 0 1 176212 225852 13264 3265200 1212 0 480 1818 1 3 50 46 0 1 176212 202076 31708 3266000 6143 348 1723 5654 2 7 50 41 0 1 176212 177196 49952 3256000 6081 0 1698 5413 1 6 50 43 0 1 176212 147744 58332 3265200 2791 467 884 3565 1 5 50 44 0 1 176212 130604 63788 3266400 1813 407 645 2637 1 5 50 44 0 1 176212 119076 68012 3258400 1408 0 529 2239 0 4 50 45 2 1 176212 74112 83696 3260000 5225 667 1496 7206 5 13 50 33 1 1 176212 29296 99788 3262000 536016 1524 7009 4 15 49 32 0 1 176212 16012 105112 3266000 1773 2901 891 2417 1 4 50 45 1 1 19 16656 117236 326240 6077 4036 6337 1228 6040 6 9 49 36 1 1 194572 15996 122056 325680 43 160743 579 2740 1 5 50 44 0 1 209788 15888 125520 326360 5072 1149
Re: updatedb
On 2007.07.26 08:56:59 +0200, Rene Herman wrote: > On 07/26/2007 08:39 AM, Bongani Hlope wrote: > >> On Thursday 26 July 2007 05:59:53 Rene Herman wrote: > >>> So what's happening? If you sit down with a copy op "top" in one terminal >>> and updatedb in another, what does it show? > >> Just tested that, there's a steady increase in the useage of buff > > Great. Now concentrate on the "swpd" column, as it's the only thing > relevant here. The fact that an updatedb run fills/replaces caches is > completely and utterly unsurprising and not something swap-prefetch helps > with. The only thing it does is bring back stuff from _swap_. But that's with a system that has plenty of RAM available. The following vmstat output is from a run for which I ran a memory hog to simulate a box with just 1GB of RAM (didn't want to reboot ;-)). That (or even less) is probably a more likely amount of RAM for a majority of users. Other than the memory hog, there's a relatively small Firefox process (just about 150MB RSS), Xorg, mutt an apache and some other stuff, leaving about 128MB of RAM "free". procs ---memory-- ---swap-- -io -system-- cpu r b swpd free buff cache si sobibo in cs us sy id wa 0 0696 16360 47608 7460000 7134 30 0 0 99 0 0 0696 16352 47608 7460000 048 213 530 0 0 100 0 0 1796 16024 45516 745480 17 882 160 515 1698 1 3 58 38 0 1 1092 16124 41752 741640 43 193143 660 2219 1 4 50 45 1 1 1548 35096 24224 690360 107 1115 571 473 1616 1 4 50 45 2 1 8980 45560 18552 585800 1324 1069 1324 453 1705 1 4 50 45 2 1 12460 44840 21048 565880 1160 831 1345 403 1351 0 1 51 48 2 1 14348 44220 23016 554080 629 661 947 353 1140 0 2 50 48 [snip] procs ---memory-- ---swap-- -io -system-- cpu r b swpd free buff cache si sobibo in cs us sy id wa 3 1 88904 72160 55368 389080 1377 836 1576 424 1403 0 3 50 47 0 1 96080 74084 57600 386600 2373 747 2559 412 1312 0 2 48 49 1 1 100036 74816 61544 386600 1319 1312 1547 524 1605 1 3 50 47 0 1 107032 72996 64728 377804 2332 1065 2341 461 1686 1 5 50 45 2 1 115036 68944 75908 367680 2660 3731 2941 1133 3721 1 6 49 44 3 0 125160 58768 90548 366280 3375 4883 3798 1458 4606 1 6 50 43 2 1 125176 48560 102364 3653605 3973 1377 1342 3701 1 4 50 46 [snip] procs ---memory-- ---swap-- -io -system-- cpu r b swpd free buff cache si sobibo in cs us sy id wa 5 1 360628 101444 191420 344960 748 1927 760 670 2322 3 3 48 46 1 0 362064 100996 191972 345200 479 184 479 226 654 50 1 41 8 1 0 362064 99752 191980 3452000 0 9 182 594 50 0 50 0 4 0 362064 98728 191980 34520 11011 5 179 588 49 0 50 0 2 0 362064 97528 191988 3452000 015 188 603 50 0 50 0 2 0 362064 95876 191988 34520 4304313 190 603 50 0 49 1 1 0 362064 95008 191996 34520 2102112 183 604 50 0 50 0 2 0 364900 63516 193212 634560 947 408 1281 368 1163 16 3 50 31 0 0 364868 139108 193284 392200069 11213 383 15413 25 8 61 6 1 0 364868 139116 193312 3922000 0 1284 224 595 0 0 98 1 2 0 364868 139240 193320 3922000 0 9 182 553 0 0 100 0 Note that the total RSS usage of updatedb+sort was just about 50MB, nevertheless swap grew to more than 300MB. It's also interesting that swapping is so aggressive, that the amount of free memory is constantly growing. I'm a missing something or wouldn't it be smarter to use that free memory for buffers and cache first? (x86_64 system, so even if highmem on x86 could be responsible, it's not the case here.) Will now go and see what happens if I play with swappiness. Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: updatedb
On 2007.07.26 11:58:29 +0200, Björn Steinbrink wrote: Note that the total RSS usage of updatedb+sort was just about 50MB, nevertheless swap grew to more than 300MB. It's also interesting that swapping is so aggressive, that the amount of free memory is constantly growing. I'm a missing something or wouldn't it be smarter to use that free memory for buffers and cache first? (x86_64 system, so even if highmem on x86 could be responsible, it's not the case here.) Will now go and see what happens if I play with swappiness. Hm, swappiness set to 0 looks even more weird to me, especially the beginning, where (AFAICT) basically buffers and caches are dropped just to get a pretty huge amount of free RAM. With swappiness set to 100, you basically get what you expect: swapping. But at least to me, that swapping looks _a lot_ smarter than what it did for the default swappiness of 60 or the 0 swappiness. Swap is growing, but so are the buffers, and the cache also only shrinks at a single point, probably when the sort process starts to grow. Plus, the amount of free memory isn't growing to insane sizes like in the other cases. vmstat output for both cases to be found below. Björn vmstat output for swappiness = 0 procs ---memory-- ---swap-- -io -system-- cpu r b swpd free buff cache si sobibo in cs us sy id wa 1 0 0 25140 37712 6530411 7144 31 0 0 99 0 4 0 0 25132 37736 6531200 038 212 604 1 0 98 1 2 0 0 25132 37736 6531200 0 0 177 479 0 0 100 0 3 1 0 17252 42568 6531200 151623 641 2332 1 3 54 41 3 2 0 15172 45412 5990800 1567 482 585 2051 1 4 50 45 2 2 0 20436 44320 5452400 743 9 368 1196 0 1 50 49 3 2 0 28416 36580 5452000 533 4 312 1016 0 1 50 49 1 2 4 44316 22780 5446400 128 191 240 786 1 1 43 54 3 0 3884 55256 18112 4577200 356 277 318 937 0 3 39 57 1 2 3924 57276 19004 4070000 631 117 347 1160 0 1 50 48 0 1 4108 61328 15348 4034400 768 301 378 1218 0 1 50 49 1 1 4276 60812 15612 402160 300 585 359 344 1045 0 1 50 49 0 2 4276 62184 17484 396240 995 703 1208 370 1188 0 2 50 48 0 2 4296 68244 14616 3626800 36011 275 973 0 1 50 49 2 2 4296 75292 6976 360160 137 667 480 424 1315 0 3 50 47 3 1 4300 78984 6344 3339200 639 635 517 1142 0 3 49 47 2 1 4300 80816 5520 3252001 992 587 683 1479 0 7 48 45 1 2 4556 82244 3704 325040 85 607 659 452 1040 0 3 50 47 0 1 4556 81600 4972 3242000 66544 362 1040 0 1 50 49 0 2 4940 80364 4588 325080 128 552 539 375 995 0 3 50 47 0 1 5004 83116 3576 324200 21 416 591 405 862 0 3 49 47 0 1 5004 80196 5388 3244000 604 0 328 1014 0 2 50 48 0 1 5004 82976 2548 3254400 461 343 363 923 0 4 49 47 1 1 5004 81528 3780 3247200 1124 157 542 1524 1 4 49 47 0 1 5516 81940 4000 324880 171 865 575 489 1356 0 4 49 46 0 2 5804 82412 2380 325160 96 660 423 385 1160 1 6 42 51 0 2 5804 81476 2868 3248000 864 204 493 1242 0 5 50 45 0 1 6268 83124 2088 325080 155 678 551 409 1107 0 7 46 47 0 2 6268 83216 1576 3238000 771 153 450 1330 1 11 43 46 0 3 59420 97888736 322240 17717 300 18035 375 869 0 27 12 60 0 4 176160 214288800 323160 3891323 38917 347 502 0 6 30 64 1 2 176212 242608 2752 327160 17 68395 441 1256 0 4 41 54 1 1 176212 237464 5492 3256800 883 188 452 1488 1 2 50 48 1 0 176212 232296 9628 3263600 1368 263 533 1690 1 2 50 47 0 1 176212 225852 13264 3265200 1212 0 480 1818 1 3 50 46 0 1 176212 202076 31708 3266000 6143 348 1723 5654 2 7 50 41 0 1 176212 177196 49952 3256000 6081 0 1698 5413 1 6 50 43 0 1 176212 147744 58332 3265200 2791 467 884 3565 1 5 50 44 0 1 176212 130604 63788 3266400 1813 407 645 2637 1 5 50 44 0 1 176212 119076 68012 3258400 1408 0 529 2239 0 4 50 45 2 1 176212 74112 83696 3260000 5225 667 1496 7206 5 13 50 33 1 1 176212 29296 99788 3262000 536016 1524 7009 4 15 49 32 0 1 176212 16012 105112 3266000 1773 2901 891 2417 1 4 50 45 1 1 19 16656 117236 326240 6077 4036 6337 1228 6040 6 9 49 36 1 1 194572 15996 122056 325680 43 160743 579 2740 1 5 50 44 0 1 209788 15888 125520 326360 5072 1149 5520 503 1836 1 4 49 46 0 1 231568
Re: updatedb
On 2007.07.26 08:56:59 +0200, Rene Herman wrote: On 07/26/2007 08:39 AM, Bongani Hlope wrote: On Thursday 26 July 2007 05:59:53 Rene Herman wrote: So what's happening? If you sit down with a copy op top in one terminal and updatedb in another, what does it show? Just tested that, there's a steady increase in the useage of buff Great. Now concentrate on the swpd column, as it's the only thing relevant here. The fact that an updatedb run fills/replaces caches is completely and utterly unsurprising and not something swap-prefetch helps with. The only thing it does is bring back stuff from _swap_. But that's with a system that has plenty of RAM available. The following vmstat output is from a run for which I ran a memory hog to simulate a box with just 1GB of RAM (didn't want to reboot ;-)). That (or even less) is probably a more likely amount of RAM for a majority of users. Other than the memory hog, there's a relatively small Firefox process (just about 150MB RSS), Xorg, mutt an apache and some other stuff, leaving about 128MB of RAM free. procs ---memory-- ---swap-- -io -system-- cpu r b swpd free buff cache si sobibo in cs us sy id wa 0 0696 16360 47608 7460000 7134 30 0 0 99 0 0 0696 16352 47608 7460000 048 213 530 0 0 100 0 0 1796 16024 45516 745480 17 882 160 515 1698 1 3 58 38 0 1 1092 16124 41752 741640 43 193143 660 2219 1 4 50 45 1 1 1548 35096 24224 690360 107 1115 571 473 1616 1 4 50 45 2 1 8980 45560 18552 585800 1324 1069 1324 453 1705 1 4 50 45 2 1 12460 44840 21048 565880 1160 831 1345 403 1351 0 1 51 48 2 1 14348 44220 23016 554080 629 661 947 353 1140 0 2 50 48 [snip] procs ---memory-- ---swap-- -io -system-- cpu r b swpd free buff cache si sobibo in cs us sy id wa 3 1 88904 72160 55368 389080 1377 836 1576 424 1403 0 3 50 47 0 1 96080 74084 57600 386600 2373 747 2559 412 1312 0 2 48 49 1 1 100036 74816 61544 386600 1319 1312 1547 524 1605 1 3 50 47 0 1 107032 72996 64728 377804 2332 1065 2341 461 1686 1 5 50 45 2 1 115036 68944 75908 367680 2660 3731 2941 1133 3721 1 6 49 44 3 0 125160 58768 90548 366280 3375 4883 3798 1458 4606 1 6 50 43 2 1 125176 48560 102364 3653605 3973 1377 1342 3701 1 4 50 46 [snip] procs ---memory-- ---swap-- -io -system-- cpu r b swpd free buff cache si sobibo in cs us sy id wa 5 1 360628 101444 191420 344960 748 1927 760 670 2322 3 3 48 46 1 0 362064 100996 191972 345200 479 184 479 226 654 50 1 41 8 1 0 362064 99752 191980 3452000 0 9 182 594 50 0 50 0 4 0 362064 98728 191980 34520 11011 5 179 588 49 0 50 0 2 0 362064 97528 191988 3452000 015 188 603 50 0 50 0 2 0 362064 95876 191988 34520 4304313 190 603 50 0 49 1 1 0 362064 95008 191996 34520 2102112 183 604 50 0 50 0 2 0 364900 63516 193212 634560 947 408 1281 368 1163 16 3 50 31 0 0 364868 139108 193284 392200069 11213 383 15413 25 8 61 6 1 0 364868 139116 193312 3922000 0 1284 224 595 0 0 98 1 2 0 364868 139240 193320 3922000 0 9 182 553 0 0 100 0 Note that the total RSS usage of updatedb+sort was just about 50MB, nevertheless swap grew to more than 300MB. It's also interesting that swapping is so aggressive, that the amount of free memory is constantly growing. I'm a missing something or wouldn't it be smarter to use that free memory for buffers and cache first? (x86_64 system, so even if highmem on x86 could be responsible, it's not the case here.) Will now go and see what happens if I play with swappiness. Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [20/58] x86: Always probe the NMI watchdog
On 2007.07.19 11:55:05 +0200, Andi Kleen wrote: > > From: [** iso-8859-1 charset **] Bj�rnSteinbrink <[EMAIL PROTECTED]> > > The performance counter allocator relies on the nmi watchdog being > probed, so we have to do that even if the watchdog is not enabled. Are you going to revert your fixes to the msr->bit conversions? Or is this patch still required with them in place? I actually like your fix, as it also fixes the semantics of single_msr_reserve() (see other mail). So, can we just drop this one? Thanks, Bj�rn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [21/58] i386: Reserve the right performance counter for the Intel PerfMon NMI watchdog
On 2007.07.19 11:55:06 +0200, Andi Kleen wrote: > > From: [** iso-8859-1 charset **] Bj�rnSteinbrink <[EMAIL PROTECTED]> > > The Intel PerfMon NMI watchdog was using the generic reservation > function which always reserves the first performance counter. But the > watchdog actually uses the second performance counter, thus we need a > specialised function. Ah, almost forgot about that patch. Actually, thanks to your fix that basically reverted the msr->offset conversation to its 2.6.21 implementation, single_msr_reserve has sane semantics now and does just what the name suggests (before, the wd_ops entries had to store the "base" msrs, so it was really a first_msr_reserve). With wd_ops->perfctr no longer needed to be the base msr, we can just fix that value for the arch perfmon watchdog. (And maybe we should remove the values for those implementations that don't employ the single_msr_reserve() stuff?) Thanks, Bj�rn From: Bj�rn Steinbrink <[EMAIL PROTECTED]> The Intel PerfMon NMI watchdog reserves the first performance counter, but uses the second one. Make it correctly reserve the second one. Signed-off-by: Bj�rn Steinbrink <[EMAIL PROTECTED]> --- diff --git a/arch/i386/kernel/cpu/perfctr-watchdog.c b/arch/i386/kernel/cpu/perfctr-watchdog.c index 4d26d51..30b5e48 100644 --- a/arch/i386/kernel/cpu/perfctr-watchdog.c +++ b/arch/i386/kernel/cpu/perfctr-watchdog.c @@ -599,8 +599,8 @@ static struct wd_ops intel_arch_wd_ops = { .setup = setup_intel_arch_watchdog, .rearm = p6_rearm, .stop = single_msr_stop_watchdog, - .perfctr = MSR_ARCH_PERFMON_PERFCTR0, - .evntsel = MSR_ARCH_PERFMON_EVENTSEL0, + .perfctr = MSR_ARCH_PERFMON_PERFCTR1, + .evntsel = MSR_ARCH_PERFMON_EVENTSEL1, }; static void probe_nmi_watchdog(void) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [21/58] i386: Reserve the right performance counter for the Intel PerfMon NMI watchdog
On 2007.07.19 11:55:06 +0200, Andi Kleen wrote: From: [** iso-8859-1 charset **] Bj�rnSteinbrink [EMAIL PROTECTED] The Intel PerfMon NMI watchdog was using the generic reservation function which always reserves the first performance counter. But the watchdog actually uses the second performance counter, thus we need a specialised function. Ah, almost forgot about that patch. Actually, thanks to your fix that basically reverted the msr-offset conversation to its 2.6.21 implementation, single_msr_reserve has sane semantics now and does just what the name suggests (before, the wd_ops entries had to store the base msrs, so it was really a first_msr_reserve). With wd_ops-perfctr no longer needed to be the base msr, we can just fix that value for the arch perfmon watchdog. (And maybe we should remove the values for those implementations that don't employ the single_msr_reserve() stuff?) Thanks, Bj�rn From: Bj�rn Steinbrink [EMAIL PROTECTED] The Intel PerfMon NMI watchdog reserves the first performance counter, but uses the second one. Make it correctly reserve the second one. Signed-off-by: Bj�rn Steinbrink [EMAIL PROTECTED] --- diff --git a/arch/i386/kernel/cpu/perfctr-watchdog.c b/arch/i386/kernel/cpu/perfctr-watchdog.c index 4d26d51..30b5e48 100644 --- a/arch/i386/kernel/cpu/perfctr-watchdog.c +++ b/arch/i386/kernel/cpu/perfctr-watchdog.c @@ -599,8 +599,8 @@ static struct wd_ops intel_arch_wd_ops = { .setup = setup_intel_arch_watchdog, .rearm = p6_rearm, .stop = single_msr_stop_watchdog, - .perfctr = MSR_ARCH_PERFMON_PERFCTR0, - .evntsel = MSR_ARCH_PERFMON_EVENTSEL0, + .perfctr = MSR_ARCH_PERFMON_PERFCTR1, + .evntsel = MSR_ARCH_PERFMON_EVENTSEL1, }; static void probe_nmi_watchdog(void) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [20/58] x86: Always probe the NMI watchdog
On 2007.07.19 11:55:05 +0200, Andi Kleen wrote: From: [** iso-8859-1 charset **] Bj�rnSteinbrink [EMAIL PROTECTED] The performance counter allocator relies on the nmi watchdog being probed, so we have to do that even if the watchdog is not enabled. Are you going to revert your fixes to the msr-bit conversions? Or is this patch still required with them in place? I actually like your fix, as it also fixes the semantics of single_msr_reserve() (see other mail). So, can we just drop this one? Thanks, Bj�rn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc7: known regressions with patches
On 2007.07.03 14:42:25 -0700, Linus Torvalds wrote: > > > On Tue, 3 Jul 2007, Bj?rn Steinbrink wrote: > > Andi said that one of the regression fixes wasn't critical for .22 and > > that he wants to do a stopgap for the other regression (my patch > > sucked), reverting the code to the .21 version. So you can drop the > > patches and/or me here. > > Can you say which patch should be reverted. This thing really shouldn't > have gone on this long, I would have hopef we had the oprofile thing > sorted out already.. That would be commit 09198e68501a7e34737cd9264d266f42429abcdc, for which there are already a few fixes in your tree. Andi, did you intent to fully revert that, or just certain parts of it? Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc7: known regressions with patches
On 2007.07.03 18:45:24 +0200, Michal Piotrowski wrote: > Subject: OProfile issues > References : http://lkml.org/lkml/2007/6/12/207 > Submitter : Stephane Eranian <[EMAIL PROTECTED]> > Handled-By : Björn Steinbrink <[EMAIL PROTECTED]> > Patch : http://lkml.org/lkml/2007/6/12/392 > http://lkml.org/lkml/2007/6/20/275 > http://lkml.org/lkml/2007/6/20/276 > Status : patches were suggested Andi said that one of the regression fixes wasn't critical for .22 and that he wants to do a stopgap for the other regression (my patch sucked), reverting the code to the .21 version. So you can drop the patches and/or me here. Thanks, Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc7: known regressions with patches
On 2007.07.03 18:45:24 +0200, Michal Piotrowski wrote: Subject: OProfile issues References : http://lkml.org/lkml/2007/6/12/207 Submitter : Stephane Eranian [EMAIL PROTECTED] Handled-By : Björn Steinbrink [EMAIL PROTECTED] Patch : http://lkml.org/lkml/2007/6/12/392 http://lkml.org/lkml/2007/6/20/275 http://lkml.org/lkml/2007/6/20/276 Status : patches were suggested Andi said that one of the regression fixes wasn't critical for .22 and that he wants to do a stopgap for the other regression (my patch sucked), reverting the code to the .21 version. So you can drop the patches and/or me here. Thanks, Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc7: known regressions with patches
On 2007.07.03 14:42:25 -0700, Linus Torvalds wrote: On Tue, 3 Jul 2007, Bj?rn Steinbrink wrote: Andi said that one of the regression fixes wasn't critical for .22 and that he wants to do a stopgap for the other regression (my patch sucked), reverting the code to the .21 version. So you can drop the patches and/or me here. Can you say which patch should be reverted. This thing really shouldn't have gone on this long, I would have hopef we had the oprofile thing sorted out already.. That would be commit 09198e68501a7e34737cd9264d266f42429abcdc, for which there are already a few fixes in your tree. Andi, did you intent to fully revert that, or just certain parts of it? Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Old bug in tg3 driver unfixed?
On 2007.06.30 14:06:14 +0200, Tim Boneko wrote: > Hello! > I am not subscribed to this list so please CC answers to my mail > address. THX! > > I recently replaced the mainboard of one of my servers with a Tyan > Tomcat K8E. The onboard gigabit NIC is a Broadcom BCM5721. After > compiling and loading the tg3 driver in Kernel 2.6.21.5, the interface > could not be configured: "Device not found". > While searching the net i found a few other people with the same problem > but no solution. > > By coincidence i found that a simpe "ifconfig eth1" worked OK and > afterwards the device could be configured and used as desired. After > searching this list, i found this posting > > http://uwsg.iu.edu/hypermail/linux/kernel/0409.0/0224.html > > by someone with obviously the same problem. Sounds more like a iftab/udev problem. Check /etc/iftab and your udev rules please. Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Old bug in tg3 driver unfixed?
On 2007.06.30 14:06:14 +0200, Tim Boneko wrote: Hello! I am not subscribed to this list so please CC answers to my mail address. THX! I recently replaced the mainboard of one of my servers with a Tyan Tomcat K8E. The onboard gigabit NIC is a Broadcom BCM5721. After compiling and loading the tg3 driver in Kernel 2.6.21.5, the interface could not be configured: Device not found. While searching the net i found a few other people with the same problem but no solution. By coincidence i found that a simpe ifconfig eth1 worked OK and afterwards the device could be configured and used as desired. After searching this list, i found this posting http://uwsg.iu.edu/hypermail/linux/kernel/0409.0/0224.html by someone with obviously the same problem. Sounds more like a iftab/udev problem. Check /etc/iftab and your udev rules please. Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [OT] Vim highlighting for trailing spaces
On 2007.06.29 01:42:22 -0700, Josh Triplett wrote: > Jan Engelhardt wrote: > > On Jun 29 2007 00:53, Josh Triplett wrote: > >> And if you really want highlighting, you can always use grep --color. :) > > > > Been there, done that, have GREP_COLOR env variable defined! > > Same here. Now I just need to convince git-grep to use it. You need to convince grep. When piping its output to less, it won't colorize unless forced. Always forcing color via GREP_OPTIONS might break certain use-cases, and git-grep doesn't allow to pass options. So for me, a bash alias it is: alias gg='GREP_OPTIONS=--color=always git-grep' You might need to set LESS=-R in addition to that, to stop less from stripping the color codes. Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [OT] Vim highlighting for trailing spaces
On 2007.06.29 01:42:22 -0700, Josh Triplett wrote: Jan Engelhardt wrote: On Jun 29 2007 00:53, Josh Triplett wrote: And if you really want highlighting, you can always use grep --color. :) Been there, done that, have GREP_COLOR env variable defined! Same here. Now I just need to convince git-grep to use it. You need to convince grep. When piping its output to less, it won't colorize unless forced. Always forcing color via GREP_OPTIONS might break certain use-cases, and git-grep doesn't allow to pass options. So for me, a bash alias it is: alias gg='GREP_OPTIONS=--color=always git-grep' You might need to set LESS=-R in addition to that, to stop less from stripping the color codes. Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: DIE_NMI_IPI to oprofile ?
On 2007.06.25 23:51:51 -0700, Amitabha Roy wrote: > Hi > > >From looking at the default_do_nmi code in traps.c it seems that (at > least for the non BSP case, with reason=0) the die notifier chain gets > called with val=DIE_NMI_IPI. > > However in the profile_exceptions_notify_handler we check for DIE_NMI > (and not DIE_NMI_IPI) to decide if we want to cycle through the > counters. This seems wrong to me unless I am missing something (which > must be the case, otherwise this code is broken :). > > Can anyone help me understand this ? notify_die(DIE_NMI, ...) is also called from nmi_watchdog_tick. So depending on "reason" you call: notify_die(DIE_NMI_IPI, ...) nmi_watchdog_tick(...) ---> notify_die(DIE_NMI, ...) or just notify_die(DIE_NMI, ...) HTH Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: DIE_NMI_IPI to oprofile ?
On 2007.06.25 23:51:51 -0700, Amitabha Roy wrote: Hi From looking at the default_do_nmi code in traps.c it seems that (at least for the non BSP case, with reason=0) the die notifier chain gets called with val=DIE_NMI_IPI. However in the profile_exceptions_notify_handler we check for DIE_NMI (and not DIE_NMI_IPI) to decide if we want to cycle through the counters. This seems wrong to me unless I am missing something (which must be the case, otherwise this code is broken :). Can anyone help me understand this ? notify_die(DIE_NMI, ...) is also called from nmi_watchdog_tick. So depending on reason you call: notify_die(DIE_NMI_IPI, ...) nmi_watchdog_tick(...) --- notify_die(DIE_NMI, ...) or just notify_die(DIE_NMI, ...) HTH Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] Always probe the NMI watchdog
On 2007.06.25 21:36:17 +0200, Andi Kleen wrote: > On Monday 25 June 2007 21:09, Andrew Morton wrote: > > On Wed, 20 Jun 2007 20:34:48 +0200 > > > > Bj__rn Steinbrink <[EMAIL PROTECTED]> wrote: > > > The performance counter allocator relies on the nmi watchdog being > > > probed, so we have to do that even if the watchdog is not enabled. > > > > So... what's the status of this lot? > > > > I've just merged this patch and the second one: > > > > Subject: [PATCH 2/2] Reserve the right performance counter for the Intel > > PerfMon NMI watchdog Message-ID: <[EMAIL PROTECTED]> > > > > but there was no followup discussion afaict. > > > > Andi, Stephane: acks? > > Yes, although I'm still a little uneasy about the always probe one. > > > If acked, do we agree that this is 2.6.22 material? > > The first (always probe) is probably .22 material, but needs more testing > first. Hm, without the second, I expect OProfile to break when the watchdog is enabled. Alternatively to the patch I sent, we could revert the change that makes it use perfctr1 instead of perfctr0. Would you prefer that? Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] Always probe the NMI watchdog
On 2007.06.25 13:01:58 -0700, Stephane Eranian wrote: > Hi, > On Mon, Jun 25, 2007 at 09:36:17PM +0200, Andi Kleen wrote: > > On Monday 25 June 2007 21:09, Andrew Morton wrote: > > > On Wed, 20 Jun 2007 20:34:48 +0200 > > > > > > Bj__rn Steinbrink <[EMAIL PROTECTED]> wrote: > > > > The performance counter allocator relies on the nmi watchdog being > > > > probed, so we have to do that even if the watchdog is not enabled. > > > > > > So... what's the status of this lot? > > > > > > I've just merged this patch and the second one: > > > > > > Subject: [PATCH 2/2] Reserve the right performance counter for the Intel > > > PerfMon NMI watchdog Message-ID: <[EMAIL PROTECTED]> > > > > > > but there was no followup discussion afaict. > > > > > > Andi, Stephane: acks? > > > > Yes, although I'm still a little uneasy about the always probe one. > > > > I looked at the code I have in my tree coming from Bjon's patches and > I am a bit confused by the flow for probing as well. > > The register allocator works globally, i.e., you reserve a register > for all CPUs at once. > > The probe_nmi_watchdog() routine simply probes the CPU type to initialize > the watchdog data structure (wd_ops). This needs to be done once and for all. > Why put it in a route that is called with on_each_cpu()? Ehrm, that's a good question actually... I moved the probing call up into setup_local_APIC in that other big patch and put a check at the start of the probing function, so that it is executed only once. No idea why I did it so weird in this one. > I think the tricky part is that we do want to reserve perfctr1 even > though the NMI watchdog is not active. This comes from the fact that > the NMI watchdog knows about only one counter and if it can't get that > one, it probably fails. By reserving it from the start, we ensure NMI > watchdog will work when eventually activated. Can you enable it later on at all? It failed for me when I tried, because it didn't know which hardware to use. Had to pass the kernel parameter to make the proc files do anything. Seems like it has to be enable at boot to work at all. And AFAICT we never unconditionally reserved a perfctr for the watchdog. > Unlike sharing between Oprofile and perfmon which works by enforcing > mutual exclusion between the two subsystems, the NMI watchdog must > work concurrently with either Oprofile or Perfmon. In 2.6.21 the nmi watchdog, if enabled, just reserved its perfctrs and everything else had to deal with it. Since the cleanup, the watchdog will release its perfctr when disabled, so another subsystem can grab it. But that also means that that other subsystem must release it again before you can reenable the watchdog. > Bjorn, did I understand the constraints correctly? I'll tell you, once I'm sure that I understood them correctly ;-) Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch, 2.6.22-rc6] fix nmi_watchdog=2 bootup hang
On 2007.06.25 08:40:35 -0400, Jeremy Fitzhardinge wrote: > Ingo Molnar wrote: > >* Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > > > >>hm, restoring nmi.c to the v2.6.21 state does not fix the > >>nmi_watchdog=2 hang. I'll do a bisection run. > >> > > > >and after spending an hour on 15 bisection steps: > > > > git-bisect start > > git-bisect good d1be341dba5521506d9e6dccfd66179080705bea > > git-bisect bad a06381fec77bf88ec6c5eb6324457cb04e9ffd69 > > git-bisect bad 794543a236074f49a8af89ef08ef6a753e4777e5 > > git-bisect good 24a77daf3d80bddcece044e6dc3675e427eef3f3 > > git-bisect bad ea62ccd00fd0b6720b033adfc9984f31130ce195 > > git-bisect good 7e20ef030dde0e52dd5a57220ee82fa9facbea4e > > git-bisect bad f19cccf366a07e05703c90038704a3a5ffcb0607 > > git-bisect good 0d08e0d3a97cce22ebf80b54785e00d9b94e1add > > git-bisect bad 856f44ff4af6e57fdc39a8b2bec498c88438bd27 > > git-bisect bad f8822f42019eceed19cc6c0f985a489e17796ed8 > > git-bisect good 1c3d99c11c47c8a1a9ed6a46555dbf6520683c52 > > git-bisect good b239fb2501117bf3aeb4dd6926edd855be92333d > > git-bisect good 98de032b681d8a7532d44dfc66aa5c0c1c755a9d > > git-bisect good 42c24fa22e86365055fc931d833f26165e687c19 > > > >the winner is ... > > > > f8822f42019eceed19cc6c0f985a489e17796ed8 is first bad commit > > commit f8822f42019eceed19cc6c0f985a489e17796ed8 > > Author: Jeremy Fitzhardinge <[EMAIL PROTECTED]> > > Date: Wed May 2 19:27:14 2007 +0200 > > > >[PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to > >make them patchable > > > >... our wonderful paravirt subsystem, honed to eternal perfection by the > >testing-machine x86_64 tree. > > > >reverting -git-curr's paravirt.c, paravirt.h, smp.c and tlbflush.h to > >before the bad commit makes the NMI watchdog work again. Patch against > >-rc6 is below. > > > > Er, wow. I've been running with this stuff for months without a > problem. Do you have CONFIG_PARAVIRT enabled? Do you still get the > hang if you boot with "noreplace-paravirt" to disable the patching? Are you running on AMD hardware? As Intel performance counters are only 32 bits wide, the wrmsrl bug should be a non-issue at least for the NMI watchdog on Intel hardware. AMD uses 48bit wide performance counters, which are probably less happy ;-) Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch, 2.6.22-rc6] fix nmi_watchdog=2 bootup hang
On 2007.06.25 08:49:05 -0400, Jeremy Fitzhardinge wrote: > Björn Steinbrink wrote: > >On 2007.06.25 10:26:52 +0200, Ingo Molnar wrote: > > > >>* Ingo Molnar <[EMAIL PROTECTED]> wrote: > >> > >> > >>>the winner is ... > >>> > >>> f8822f42019eceed19cc6c0f985a489e17796ed8 is first bad commit > >>> commit f8822f42019eceed19cc6c0f985a489e17796ed8 > >>> Author: Jeremy Fitzhardinge <[EMAIL PROTECTED]> > >>> Date: Wed May 2 19:27:14 2007 +0200 > >>> > >>>[PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to > >>>make them patchable > >>> > >>and of course i'm happy to test any patch that is simpler than the > >>brutal revert i sent. > >> > > > >wrmsrl() looks broken, dropping the upper 32bits of the value to be > >written. Does this help? > > > > Crap. That's embarrassing. Does it help, because it seems likely? > (Esp since Ingo didn't even have CONFIG_PARAVIRT enabled, so most of his > revert would have been dead code anyway.) He has. The config Ingo sent was for x86_64, which (AFAICT) doesn't have CONFIG_PARAVIRT, so the config was unfortunately useless. But his bootlog tells us: Booting paravirtualized kernel on bare hardware Björn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch, 2.6.22-rc6] fix nmi_watchdog=2 bootup hang
On 2007.06.25 10:26:52 +0200, Ingo Molnar wrote: > > * Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > the winner is ... > > > > f8822f42019eceed19cc6c0f985a489e17796ed8 is first bad commit > > commit f8822f42019eceed19cc6c0f985a489e17796ed8 > > Author: Jeremy Fitzhardinge <[EMAIL PROTECTED]> > > Date: Wed May 2 19:27:14 2007 +0200 > > > > [PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to > > make them patchable > > and of course i'm happy to test any patch that is simpler than the > brutal revert i sent. wrmsrl() looks broken, dropping the upper 32bits of the value to be written. Does this help? Björn --- diff --git a/include/asm-i386/paravirt.h b/include/asm-i386/paravirt.h index d7a0512..7f846a7 100644 --- a/include/asm-i386/paravirt.h +++ b/include/asm-i386/paravirt.h @@ -539,7 +539,7 @@ static inline int paravirt_write_msr(unsigned msr, unsigned low, unsigned high) val = paravirt_read_msr(msr, &_err);\ } while(0) -#define wrmsrl(msr,val)((void)paravirt_write_msr(msr, val, 0)) +#define wrmsrl(msr,val)wrmsr(msr, (u32)((u64)(val)), ((u64)(val))>>32) #define wrmsr_safe(msr,a,b)paravirt_write_msr(msr, a, b) /* rdmsr with exception handling */ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch, 2.6.22-rc6] fix nmi_watchdog=2 bootup hang
On 2007.06.25 10:26:52 +0200, Ingo Molnar wrote: * Ingo Molnar [EMAIL PROTECTED] wrote: the winner is ... f8822f42019eceed19cc6c0f985a489e17796ed8 is first bad commit commit f8822f42019eceed19cc6c0f985a489e17796ed8 Author: Jeremy Fitzhardinge [EMAIL PROTECTED] Date: Wed May 2 19:27:14 2007 +0200 [PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to make them patchable and of course i'm happy to test any patch that is simpler than the brutal revert i sent. wrmsrl() looks broken, dropping the upper 32bits of the value to be written. Does this help? Björn --- diff --git a/include/asm-i386/paravirt.h b/include/asm-i386/paravirt.h index d7a0512..7f846a7 100644 --- a/include/asm-i386/paravirt.h +++ b/include/asm-i386/paravirt.h @@ -539,7 +539,7 @@ static inline int paravirt_write_msr(unsigned msr, unsigned low, unsigned high) val = paravirt_read_msr(msr, _err);\ } while(0) -#define wrmsrl(msr,val)((void)paravirt_write_msr(msr, val, 0)) +#define wrmsrl(msr,val)wrmsr(msr, (u32)((u64)(val)), ((u64)(val))32) #define wrmsr_safe(msr,a,b)paravirt_write_msr(msr, a, b) /* rdmsr with exception handling */ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch, 2.6.22-rc6] fix nmi_watchdog=2 bootup hang
On 2007.06.25 08:49:05 -0400, Jeremy Fitzhardinge wrote: Björn Steinbrink wrote: On 2007.06.25 10:26:52 +0200, Ingo Molnar wrote: * Ingo Molnar [EMAIL PROTECTED] wrote: the winner is ... f8822f42019eceed19cc6c0f985a489e17796ed8 is first bad commit commit f8822f42019eceed19cc6c0f985a489e17796ed8 Author: Jeremy Fitzhardinge [EMAIL PROTECTED] Date: Wed May 2 19:27:14 2007 +0200 [PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to make them patchable and of course i'm happy to test any patch that is simpler than the brutal revert i sent. wrmsrl() looks broken, dropping the upper 32bits of the value to be written. Does this help? Crap. That's embarrassing. Does it help, because it seems likely? (Esp since Ingo didn't even have CONFIG_PARAVIRT enabled, so most of his revert would have been dead code anyway.) He has. The config Ingo sent was for x86_64, which (AFAICT) doesn't have CONFIG_PARAVIRT, so the config was unfortunately useless. But his bootlog tells us: Booting paravirtualized kernel on bare hardware Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch, 2.6.22-rc6] fix nmi_watchdog=2 bootup hang
On 2007.06.25 08:40:35 -0400, Jeremy Fitzhardinge wrote: Ingo Molnar wrote: * Ingo Molnar [EMAIL PROTECTED] wrote: hm, restoring nmi.c to the v2.6.21 state does not fix the nmi_watchdog=2 hang. I'll do a bisection run. and after spending an hour on 15 bisection steps: git-bisect start git-bisect good d1be341dba5521506d9e6dccfd66179080705bea git-bisect bad a06381fec77bf88ec6c5eb6324457cb04e9ffd69 git-bisect bad 794543a236074f49a8af89ef08ef6a753e4777e5 git-bisect good 24a77daf3d80bddcece044e6dc3675e427eef3f3 git-bisect bad ea62ccd00fd0b6720b033adfc9984f31130ce195 git-bisect good 7e20ef030dde0e52dd5a57220ee82fa9facbea4e git-bisect bad f19cccf366a07e05703c90038704a3a5ffcb0607 git-bisect good 0d08e0d3a97cce22ebf80b54785e00d9b94e1add git-bisect bad 856f44ff4af6e57fdc39a8b2bec498c88438bd27 git-bisect bad f8822f42019eceed19cc6c0f985a489e17796ed8 git-bisect good 1c3d99c11c47c8a1a9ed6a46555dbf6520683c52 git-bisect good b239fb2501117bf3aeb4dd6926edd855be92333d git-bisect good 98de032b681d8a7532d44dfc66aa5c0c1c755a9d git-bisect good 42c24fa22e86365055fc931d833f26165e687c19 the winner is ... f8822f42019eceed19cc6c0f985a489e17796ed8 is first bad commit commit f8822f42019eceed19cc6c0f985a489e17796ed8 Author: Jeremy Fitzhardinge [EMAIL PROTECTED] Date: Wed May 2 19:27:14 2007 +0200 [PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to make them patchable ... our wonderful paravirt subsystem, honed to eternal perfection by the testing-machine x86_64 tree. reverting -git-curr's paravirt.c, paravirt.h, smp.c and tlbflush.h to before the bad commit makes the NMI watchdog work again. Patch against -rc6 is below. Er, wow. I've been running with this stuff for months without a problem. Do you have CONFIG_PARAVIRT enabled? Do you still get the hang if you boot with noreplace-paravirt to disable the patching? Are you running on AMD hardware? As Intel performance counters are only 32 bits wide, the wrmsrl bug should be a non-issue at least for the NMI watchdog on Intel hardware. AMD uses 48bit wide performance counters, which are probably less happy ;-) Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] Always probe the NMI watchdog
On 2007.06.25 13:01:58 -0700, Stephane Eranian wrote: Hi, On Mon, Jun 25, 2007 at 09:36:17PM +0200, Andi Kleen wrote: On Monday 25 June 2007 21:09, Andrew Morton wrote: On Wed, 20 Jun 2007 20:34:48 +0200 Bj__rn Steinbrink [EMAIL PROTECTED] wrote: The performance counter allocator relies on the nmi watchdog being probed, so we have to do that even if the watchdog is not enabled. So... what's the status of this lot? I've just merged this patch and the second one: Subject: [PATCH 2/2] Reserve the right performance counter for the Intel PerfMon NMI watchdog Message-ID: [EMAIL PROTECTED] but there was no followup discussion afaict. Andi, Stephane: acks? Yes, although I'm still a little uneasy about the always probe one. I looked at the code I have in my tree coming from Bjon's patches and I am a bit confused by the flow for probing as well. The register allocator works globally, i.e., you reserve a register for all CPUs at once. The probe_nmi_watchdog() routine simply probes the CPU type to initialize the watchdog data structure (wd_ops). This needs to be done once and for all. Why put it in a route that is called with on_each_cpu()? Ehrm, that's a good question actually... I moved the probing call up into setup_local_APIC in that other big patch and put a check at the start of the probing function, so that it is executed only once. No idea why I did it so weird in this one. I think the tricky part is that we do want to reserve perfctr1 even though the NMI watchdog is not active. This comes from the fact that the NMI watchdog knows about only one counter and if it can't get that one, it probably fails. By reserving it from the start, we ensure NMI watchdog will work when eventually activated. Can you enable it later on at all? It failed for me when I tried, because it didn't know which hardware to use. Had to pass the kernel parameter to make the proc files do anything. Seems like it has to be enable at boot to work at all. And AFAICT we never unconditionally reserved a perfctr for the watchdog. Unlike sharing between Oprofile and perfmon which works by enforcing mutual exclusion between the two subsystems, the NMI watchdog must work concurrently with either Oprofile or Perfmon. In 2.6.21 the nmi watchdog, if enabled, just reserved its perfctrs and everything else had to deal with it. Since the cleanup, the watchdog will release its perfctr when disabled, so another subsystem can grab it. But that also means that that other subsystem must release it again before you can reenable the watchdog. Bjorn, did I understand the constraints correctly? I'll tell you, once I'm sure that I understood them correctly ;-) Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] Always probe the NMI watchdog
On 2007.06.25 21:36:17 +0200, Andi Kleen wrote: On Monday 25 June 2007 21:09, Andrew Morton wrote: On Wed, 20 Jun 2007 20:34:48 +0200 Bj__rn Steinbrink [EMAIL PROTECTED] wrote: The performance counter allocator relies on the nmi watchdog being probed, so we have to do that even if the watchdog is not enabled. So... what's the status of this lot? I've just merged this patch and the second one: Subject: [PATCH 2/2] Reserve the right performance counter for the Intel PerfMon NMI watchdog Message-ID: [EMAIL PROTECTED] but there was no followup discussion afaict. Andi, Stephane: acks? Yes, although I'm still a little uneasy about the always probe one. If acked, do we agree that this is 2.6.22 material? The first (always probe) is probably .22 material, but needs more testing first. Hm, without the second, I expect OProfile to break when the watchdog is enabled. Alternatively to the patch I sent, we could revert the change that makes it use perfctr1 instead of perfctr0. Would you prefer that? Björn - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/