Re: 2.6.24-git: kmap_atomic() WARN_ON()

2008-02-25 Thread Björn Steinbrink
On 2008.02.25 15:08:35 -0500, Jeff Garzik wrote:
> Björn Steinbrink wrote:
>> Hm, do you have lockdep enabled? If not, does lockdep make this go away?
>> Because lockdep will set IRQF_DISABLED for all interrupt handlers, and
>> unless that flag is set, handle_IRQ_event will reenable interrupts while
>> the handler is running. And ahci_interrupt only uses a plain spin_lock,
>> so interrupts keep being enabled. The patch below should help with that.
>>
>> Hmhm, maybe that also solves the deadlock you saw? Dunno...
>>
>> I can't come up with an useful commit message right now, but I'll resend
>> in suitable form for submission if that thing actually works.
>>
>> Björn
>>
>>
>> diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
>> index 1db93b6..ae3dbc8 100644
>> --- a/drivers/ata/ahci.c
>> +++ b/drivers/ata/ahci.c
>> @@ -1739,6 +1739,7 @@ static irqreturn_t ahci_interrupt(int irq, void 
>> *dev_instance)
>>  unsigned int i, handled = 0;
>>  void __iomem *mmio;
>>  u32 irq_stat, irq_ack = 0;
>> +unsigned long flags;
>>  VPRINTK("ENTER\n");
>>  @@ -1751,7 +1752,7 @@ static irqreturn_t ahci_interrupt(int irq, void 
>> *dev_instance)
>>  if (!irq_stat)
>>  return IRQ_NONE;
>>  -   spin_lock(>lock);
>> +spin_lock_irqsave(>lock, flags);
>>  for (i = 0; i < host->n_ports; i++) {
>>  struct ata_port *ap;
>> @@ -1778,7 +1779,7 @@ static irqreturn_t ahci_interrupt(int irq, void 
>> *dev_instance)
>>  handled = 1;
>>  }
>>  -   spin_unlock(>lock);
>> +spin_unlock_irqrestore(>lock, flags);
>
> If this truly fixes the problem, then lockdep is definitely the problem  
> source.

Hm, lockdep keeps the interrupts _disabled_. The code that reenables the
interrupts was already in the first revision of Linus' git tree. So
lockdep would actually just hide the problem, not cause it.

Björn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24-git: kmap_atomic() WARN_ON()

2008-02-25 Thread Björn Steinbrink
On 2008.02.07 00:58:42 +0100, Thomas Gleixner wrote:
> current mainline triggers:
> 
> WARNING: at /home/tglx/work/kernel/x86/linux-2.6/arch/x86/mm/highmem_32.c:52 
> kmap_atomic_prot+0xe5/0x19b()
> Modules linked in: ahci(+) sata_sil libata sd_mod scsi_mod raid1 ext3 jbd 
> ehci_hcd ohci_hcd uhci_hcd
> Pid: 0, comm: swapper Not tainted 2.6.24 #173
>  [] warn_on_slowpath+0x41/0x51
>  [] ? __enqueue_entity+0x9c/0xa4
>  [] ? enqueue_entity+0x124/0x13b
>  [] ? enqueue_task_fair+0x41/0x4c
>  [] ? _spin_lock_irqsave+0x14/0x2e
>  [] ? lock_timer_base+0x1f/0x3e
>  [] kmap_atomic_prot+0xe5/0x19b
>  [] kmap_atomic+0x14/0x16
>  [] ata_scsi_rbuf_get+0x1e/0x2c [libata]
>  [] atapi_qc_complete+0x23f/0x289 [libata]
>  [] __ata_qc_complete+0x8e/0x93 [libata]
>  [] ata_qc_complete+0x115/0x128 [libata]
>  [] ata_qc_complete_multiple+0x86/0xa0 [libata]
>  [] ahci_interrupt+0x370/0x40d [ahci]
>  [] handle_IRQ_event+0x21/0x48
>  [] handle_edge_irq+0xc9/0x10a
>  [] ? handle_edge_irq+0x0/0x10a
>  [] do_IRQ+0x8b/0xb7
>  [] common_interrupt+0x23/0x28
>  [] ? init_chipset_cmd64x+0xb/0x93
>  [] ? mwait_idle_with_hints+0x39/0x3d
>  [] ? mwait_idle+0x0/0xf
>  [] mwait_idle+0xd/0xf
>  [] cpu_idle+0xb0/0xe4
>  [] rest_init+0x5d/0x5f
> 
> This is not a new problem. It was pointed out some time ago already,
> but now the WARN_ON() finally made it into mainline :)
> 
> The fix is not obvious, as this code seems to be called from various
> call sites.

Hm, do you have lockdep enabled? If not, does lockdep make this go away?
Because lockdep will set IRQF_DISABLED for all interrupt handlers, and
unless that flag is set, handle_IRQ_event will reenable interrupts while
the handler is running. And ahci_interrupt only uses a plain spin_lock,
so interrupts keep being enabled. The patch below should help with that.

Hmhm, maybe that also solves the deadlock you saw? Dunno...

I can't come up with an useful commit message right now, but I'll resend
in suitable form for submission if that thing actually works.

Björn


diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 1db93b6..ae3dbc8 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -1739,6 +1739,7 @@ static irqreturn_t ahci_interrupt(int irq, void 
*dev_instance)
unsigned int i, handled = 0;
void __iomem *mmio;
u32 irq_stat, irq_ack = 0;
+   unsigned long flags;
 
VPRINTK("ENTER\n");
 
@@ -1751,7 +1752,7 @@ static irqreturn_t ahci_interrupt(int irq, void 
*dev_instance)
if (!irq_stat)
return IRQ_NONE;
 
-   spin_lock(>lock);
+   spin_lock_irqsave(>lock, flags);
 
for (i = 0; i < host->n_ports; i++) {
struct ata_port *ap;
@@ -1778,7 +1779,7 @@ static irqreturn_t ahci_interrupt(int irq, void 
*dev_instance)
handled = 1;
}
 
-   spin_unlock(>lock);
+   spin_unlock_irqrestore(>lock, flags);
 
VPRINTK("EXIT\n");
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24-git: kmap_atomic() WARN_ON()

2008-02-25 Thread Björn Steinbrink
On 2008.02.07 00:58:42 +0100, Thomas Gleixner wrote:
 current mainline triggers:
 
 WARNING: at /home/tglx/work/kernel/x86/linux-2.6/arch/x86/mm/highmem_32.c:52 
 kmap_atomic_prot+0xe5/0x19b()
 Modules linked in: ahci(+) sata_sil libata sd_mod scsi_mod raid1 ext3 jbd 
 ehci_hcd ohci_hcd uhci_hcd
 Pid: 0, comm: swapper Not tainted 2.6.24 #173
  [c0126b60] warn_on_slowpath+0x41/0x51
  [c011c5eb] ? __enqueue_entity+0x9c/0xa4
  [c011c717] ? enqueue_entity+0x124/0x13b
  [c011cedb] ? enqueue_task_fair+0x41/0x4c
  [c032ce88] ? _spin_lock_irqsave+0x14/0x2e
  [c012e885] ? lock_timer_base+0x1f/0x3e
  [c011ad6d] kmap_atomic_prot+0xe5/0x19b
  [c011ae37] kmap_atomic+0x14/0x16
  [f88ea218] ata_scsi_rbuf_get+0x1e/0x2c [libata]
  [f88ea821] atapi_qc_complete+0x23f/0x289 [libata]
  [f88e3d00] __ata_qc_complete+0x8e/0x93 [libata]
  [f88e3fc7] ata_qc_complete+0x115/0x128 [libata]
  [f88e8d47] ata_qc_complete_multiple+0x86/0xa0 [libata]
  [f8841a5d] ahci_interrupt+0x370/0x40d [ahci]
  [c01512c6] handle_IRQ_event+0x21/0x48
  [c01521ca] handle_edge_irq+0xc9/0x10a
  [c0152101] ? handle_edge_irq+0x0/0x10a
  [c0107518] do_IRQ+0x8b/0xb7
  [c01055db] common_interrupt+0x23/0x28
  [c032007b] ? init_chipset_cmd64x+0xb/0x93
  [c010316e] ? mwait_idle_with_hints+0x39/0x3d
  [c0103172] ? mwait_idle+0x0/0xf
  [c010317f] mwait_idle+0xd/0xf
  [c0103761] cpu_idle+0xb0/0xe4
  [c031b509] rest_init+0x5d/0x5f
 
 This is not a new problem. It was pointed out some time ago already,
 but now the WARN_ON() finally made it into mainline :)
 
 The fix is not obvious, as this code seems to be called from various
 call sites.

Hm, do you have lockdep enabled? If not, does lockdep make this go away?
Because lockdep will set IRQF_DISABLED for all interrupt handlers, and
unless that flag is set, handle_IRQ_event will reenable interrupts while
the handler is running. And ahci_interrupt only uses a plain spin_lock,
so interrupts keep being enabled. The patch below should help with that.

Hmhm, maybe that also solves the deadlock you saw? Dunno...

I can't come up with an useful commit message right now, but I'll resend
in suitable form for submission if that thing actually works.

Björn


diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 1db93b6..ae3dbc8 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -1739,6 +1739,7 @@ static irqreturn_t ahci_interrupt(int irq, void 
*dev_instance)
unsigned int i, handled = 0;
void __iomem *mmio;
u32 irq_stat, irq_ack = 0;
+   unsigned long flags;
 
VPRINTK(ENTER\n);
 
@@ -1751,7 +1752,7 @@ static irqreturn_t ahci_interrupt(int irq, void 
*dev_instance)
if (!irq_stat)
return IRQ_NONE;
 
-   spin_lock(host-lock);
+   spin_lock_irqsave(host-lock, flags);
 
for (i = 0; i  host-n_ports; i++) {
struct ata_port *ap;
@@ -1778,7 +1779,7 @@ static irqreturn_t ahci_interrupt(int irq, void 
*dev_instance)
handled = 1;
}
 
-   spin_unlock(host-lock);
+   spin_unlock_irqrestore(host-lock, flags);
 
VPRINTK(EXIT\n);
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24-git: kmap_atomic() WARN_ON()

2008-02-25 Thread Björn Steinbrink
On 2008.02.25 15:08:35 -0500, Jeff Garzik wrote:
 Björn Steinbrink wrote:
 Hm, do you have lockdep enabled? If not, does lockdep make this go away?
 Because lockdep will set IRQF_DISABLED for all interrupt handlers, and
 unless that flag is set, handle_IRQ_event will reenable interrupts while
 the handler is running. And ahci_interrupt only uses a plain spin_lock,
 so interrupts keep being enabled. The patch below should help with that.

 Hmhm, maybe that also solves the deadlock you saw? Dunno...

 I can't come up with an useful commit message right now, but I'll resend
 in suitable form for submission if that thing actually works.

 Björn


 diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
 index 1db93b6..ae3dbc8 100644
 --- a/drivers/ata/ahci.c
 +++ b/drivers/ata/ahci.c
 @@ -1739,6 +1739,7 @@ static irqreturn_t ahci_interrupt(int irq, void 
 *dev_instance)
  unsigned int i, handled = 0;
  void __iomem *mmio;
  u32 irq_stat, irq_ack = 0;
 +unsigned long flags;
  VPRINTK(ENTER\n);
  @@ -1751,7 +1752,7 @@ static irqreturn_t ahci_interrupt(int irq, void 
 *dev_instance)
  if (!irq_stat)
  return IRQ_NONE;
  -   spin_lock(host-lock);
 +spin_lock_irqsave(host-lock, flags);
  for (i = 0; i  host-n_ports; i++) {
  struct ata_port *ap;
 @@ -1778,7 +1779,7 @@ static irqreturn_t ahci_interrupt(int irq, void 
 *dev_instance)
  handled = 1;
  }
  -   spin_unlock(host-lock);
 +spin_unlock_irqrestore(host-lock, flags);

 If this truly fixes the problem, then lockdep is definitely the problem  
 source.

Hm, lockdep keeps the interrupts _disabled_. The code that reenables the
interrupts was already in the first revision of Linus' git tree. So
lockdep would actually just hide the problem, not cause it.

Björn
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why is the kfree() argument const?

2008-01-18 Thread Björn Steinbrink
On 2008.01.18 10:48:26 +0100, Jakob Oestergaard wrote:
> On Thu, Jan 17, 2008 at 01:25:39PM -0800, Linus Torvalds wrote:
> ...
> > Why do you make that mistake, when it is PROVABLY NOT TRUE!
> > 
> > Try this trivial program:
> > 
> > int main(int argc, char **argv)
> > {
> > int i;
> > const int *c;
> > 
> > i = 5;
> > c = 
> > i = 10;
> > return *c;
> > }
> > 
> > and realize that according to the C rules, if it returns anything but 10, 
> > the compiler is *buggy*.
> 
> That's not how this works (as we obviously agree).
> 
> Please consider a rewrite of your example, demonstrating the usefulness and
> proper application of const pointers:
> 
> extern foo(const int *);
> 
> int main(int argc, char **argv)
> {
>  int i;
> 
>  i = 5;
>  foo();
>  return i;
> }
> 
> Now, if the program returns anything else than 5, it means someone cast away
> const, which is generally considered a bad idea in most other software
> projects, for this very reason.

Not at all.

#include 
#include 

char *lookup[5];

const char *get()
{
*(*lookup = malloc(1)) = '1';
return *lookup;
}

void set(const char *d, char val)
{
for (int i = 0; i < 5; ++i)
if (lookup[i] == d)
*(lookup[i]) = val;
}

int main()
{
const char *p = get();

printf("%c\n", *p);
set(p, '2');
printf("%c\n", *p);

return 0;
}

Do you see anything that casts the const away? No? Me neither. Still,
the memory that p points to was changed, because there was another
pointer and that was not const.

> *That* is the purpose of const pointers.

The only thing that const can tell you is that you should not modify the
value _yourself_, using that pointer _directly_. It's somewhat like a
soft "half" protected/private specifier. You may read this value, but if
you want to write to it, please use the setter function I provide for
you. Because that setter function might do some special stuff, like
counting how often that value was written.

And accepting a pointer to a const as an argument does _only_ say: It's
ok to call this function if you only received a pointer to a const, the
function does the Right Thing for such pointers. It does not guarantee
at all, that the function won't change the memory the pointer is
pointing to. Take a set of functions that manage memory for foo objects:

const struct foo *get(someIdentifier);
struct foo *makeWritable(const struct foo *);
void free(const struct foo *);

get() returns a pointer to a foo object, and it might return a pointer
to a _shared_ instance. Obviously it should make the pointer const, the
caller should not modify the shared instance.

makeWritable() accepts a pointer to a const foo, because you generally
want to pass it such a pointer to get a non-const one instead. The
function might just use ref-counting and see if it needs to create a
copy returning a pointer to points to a different location in memory or
if it can just return its _internal_ _non-const_ pointer. No casts!

free() also accepts a pointer to a const foo, because you obviously want
to be able to free the shared stuff, too. It just happens to not always
go away immediately, because there's some ref-counting going on. But you
are _still_ invalidating your const pointer. You lost all rights to
using it, because you _gave it up_ when you called free(). An important
part of free() is to _invalidate_ the _pointer_. Whether or not the
object it pointed to was modified or not doesn't matter at all, because
you have no valid means of accessing it _anyway_, it's totally out of
scope.

Now I can hear you screaming "But that's refcounting! That's totally
different!". It's _not_. Just "pretend" that get() cannot return the
same thing twice, then your ref-counting only goes up to one and boom,
you've just reinvented malloc/free, with a const pointer being passed to
free().

Passing a pointer to free invalidates that pointer, and all pointers
that share the same value. And then, when no valid ways to access the
object are left, free() is free to use the memory in any way it wants.
Without any stricht need to use the const pointer to perform any writes.
That might be an optimization, but you can do without.

What you're arguing about is actually that you don't want anyone to be
able to use a const pointer to invalidate any other pointer, not that
you don't want an object to be changed through a const pointer, because
that is simply not what happens.

If you want to restrict the set of pointers that can be invalidated by
an other pointer, you'll have to use something else because const does
not talk about invalidating aliasing pointers.

Björn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why is the kfree() argument const?

2008-01-18 Thread Björn Steinbrink
On 2008.01.18 10:48:26 +0100, Jakob Oestergaard wrote:
 On Thu, Jan 17, 2008 at 01:25:39PM -0800, Linus Torvalds wrote:
 ...
  Why do you make that mistake, when it is PROVABLY NOT TRUE!
  
  Try this trivial program:
  
  int main(int argc, char **argv)
  {
  int i;
  const int *c;
  
  i = 5;
  c = i;
  i = 10;
  return *c;
  }
  
  and realize that according to the C rules, if it returns anything but 10, 
  the compiler is *buggy*.
 
 That's not how this works (as we obviously agree).
 
 Please consider a rewrite of your example, demonstrating the usefulness and
 proper application of const pointers:
 
 extern foo(const int *);
 
 int main(int argc, char **argv)
 {
  int i;
 
  i = 5;
  foo(i);
  return i;
 }
 
 Now, if the program returns anything else than 5, it means someone cast away
 const, which is generally considered a bad idea in most other software
 projects, for this very reason.

Not at all.

#include stdio.h
#include stdlib.h

char *lookup[5];

const char *get()
{
*(*lookup = malloc(1)) = '1';
return *lookup;
}

void set(const char *d, char val)
{
for (int i = 0; i  5; ++i)
if (lookup[i] == d)
*(lookup[i]) = val;
}

int main()
{
const char *p = get();

printf(%c\n, *p);
set(p, '2');
printf(%c\n, *p);

return 0;
}

Do you see anything that casts the const away? No? Me neither. Still,
the memory that p points to was changed, because there was another
pointer and that was not const.

 *That* is the purpose of const pointers.

The only thing that const can tell you is that you should not modify the
value _yourself_, using that pointer _directly_. It's somewhat like a
soft half protected/private specifier. You may read this value, but if
you want to write to it, please use the setter function I provide for
you. Because that setter function might do some special stuff, like
counting how often that value was written.

And accepting a pointer to a const as an argument does _only_ say: It's
ok to call this function if you only received a pointer to a const, the
function does the Right Thing for such pointers. It does not guarantee
at all, that the function won't change the memory the pointer is
pointing to. Take a set of functions that manage memory for foo objects:

const struct foo *get(someIdentifier);
struct foo *makeWritable(const struct foo *);
void free(const struct foo *);

get() returns a pointer to a foo object, and it might return a pointer
to a _shared_ instance. Obviously it should make the pointer const, the
caller should not modify the shared instance.

makeWritable() accepts a pointer to a const foo, because you generally
want to pass it such a pointer to get a non-const one instead. The
function might just use ref-counting and see if it needs to create a
copy returning a pointer to points to a different location in memory or
if it can just return its _internal_ _non-const_ pointer. No casts!

free() also accepts a pointer to a const foo, because you obviously want
to be able to free the shared stuff, too. It just happens to not always
go away immediately, because there's some ref-counting going on. But you
are _still_ invalidating your const pointer. You lost all rights to
using it, because you _gave it up_ when you called free(). An important
part of free() is to _invalidate_ the _pointer_. Whether or not the
object it pointed to was modified or not doesn't matter at all, because
you have no valid means of accessing it _anyway_, it's totally out of
scope.

Now I can hear you screaming But that's refcounting! That's totally
different!. It's _not_. Just pretend that get() cannot return the
same thing twice, then your ref-counting only goes up to one and boom,
you've just reinvented malloc/free, with a const pointer being passed to
free().

Passing a pointer to free invalidates that pointer, and all pointers
that share the same value. And then, when no valid ways to access the
object are left, free() is free to use the memory in any way it wants.
Without any stricht need to use the const pointer to perform any writes.
That might be an optimization, but you can do without.

What you're arguing about is actually that you don't want anyone to be
able to use a const pointer to invalidate any other pointer, not that
you don't want an object to be changed through a const pointer, because
that is simply not what happens.

If you want to restrict the set of pointers that can be invalidated by
an other pointer, you'll have to use something else because const does
not talk about invalidating aliasing pointers.

Björn
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: More breakage in native_rdtsc out of line in git-x86

2008-01-09 Thread Björn Steinbrink
On 2008.01.09 23:41:42 +0100, Andi Kleen wrote:
> On Wed, Jan 09, 2008 at 02:35:55PM -0800, Harvey Harrison wrote:
> > On Wed, 2008-01-09 at 23:28 +0100, Andi Kleen wrote:
> > 
> > > Do you have a simple recipe to just update from the the remote branch,
> > > assuming there are no local changes or local branches? 
> > > 
> > > -Andi
> > 
> > For staying up to date I use the following:
> > 
> > # Add Linus's tree as a remote
> > git remote --add linus
> > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
> > 
> > # Add Ingo's tree as a remote
> > git remote --add x86
> > git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
> > 
> > # With that setup, just run the following to get any changes you
> > # don't have.  It will also notice any new branches Ingo/Linus
> > # add to their repo.  Look in .git/config afterwards, the format
> > # to add new remotes is easy to figure out.
> > git remote update
> 
> I'm already cloning the branches; the problem is not getting conflicts etc.
> when updating.

I guess you're using "git pull" to update a local branch? That will try
to merge the new state of x86/mm into your local branch, and that
breaks. If you just want to have a local branch that gives you a "fixed"
state of x86/mm regardless of whether or not you already fetched newer
ones, you can do:

git branch myThing x86/mm # create the branch

work/test/whatever

To fetch a new "state" from the remote:
git fetch x86/mm # or git remote update, or whatever

To update your branch to point to the new state:
git branch - f myThing x86/mm

That basically replaces it with a new branch of the same name, but
pointing to the new x86/mm.

Or if you want to get your working tree to that state at the same time,
you can also do:
git checkout myThing
git reset --hard x86/mm

Björn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: More breakage in native_rdtsc out of line in git-x86

2008-01-09 Thread Björn Steinbrink
On 2008.01.09 18:48:00 +0100, Andi Kleen wrote:
> On Wed, Jan 09, 2008 at 05:30:18PM +0100, Ingo Molnar wrote:
> > then you have a truly ancient x86.git repository ;-)
> 
> I update only infrequently because frankly git's remote branch tracking
> is a mess. At least it doesn't really work for x86#mm here. 
> 
> I usually have to blow away the repository and reclone
> to get back to a sane state.

Someone in #git had a similar problem today. Conclusion was that x86/mm
is not "stable" in the sense that commits are only added, instead
history gets rewritten. That breaks pull/merge/"basic rebase".

Basically, you'll want to "rebase --onto", taking your local commits
from the old branch history to the new branch history. One way to make
that bearable is to have two branches (or a branch and a tag). One
branch is used to keep your work, the other branch (or tag) is used to
"mark" where the old upstream ended and your work started.

Assuming that your remote is called "x86", this could look like this:

git branch myStuff x86/mm
git branch myStuff_start x86/mm

work work work commit commit commit

git fetch x86 mm

git rebase --onto x86/mm myStuff_start myStuff
git branch -f myStuff_start x86/mm


The rebase will take all of the commits that you added to myStuff, on
top of the old x86/mm (referred to by myStuff_start) and try to apply
them on top of the new x86/mm, updating myStuff to point to the rebased
commits afterwards.

Then myStuff_start is updated to point to the now current start of your
stuff, so that you can do the rebase again later.

That's not actually a problem of git's tracking branches, because
they're not made for that style of work at all. For such stuff, rebase
is the only useful option AFAIK.

HTH,
Björn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: More breakage in native_rdtsc out of line in git-x86

2008-01-09 Thread Björn Steinbrink
On 2008.01.09 18:48:00 +0100, Andi Kleen wrote:
 On Wed, Jan 09, 2008 at 05:30:18PM +0100, Ingo Molnar wrote:
  then you have a truly ancient x86.git repository ;-)
 
 I update only infrequently because frankly git's remote branch tracking
 is a mess. At least it doesn't really work for x86#mm here. 
 
 I usually have to blow away the repository and reclone
 to get back to a sane state.

Someone in #git had a similar problem today. Conclusion was that x86/mm
is not stable in the sense that commits are only added, instead
history gets rewritten. That breaks pull/merge/basic rebase.

Basically, you'll want to rebase --onto, taking your local commits
from the old branch history to the new branch history. One way to make
that bearable is to have two branches (or a branch and a tag). One
branch is used to keep your work, the other branch (or tag) is used to
mark where the old upstream ended and your work started.

Assuming that your remote is called x86, this could look like this:

git branch myStuff x86/mm
git branch myStuff_start x86/mm

work work work commit commit commit

git fetch x86 mm

git rebase --onto x86/mm myStuff_start myStuff
git branch -f myStuff_start x86/mm


The rebase will take all of the commits that you added to myStuff, on
top of the old x86/mm (referred to by myStuff_start) and try to apply
them on top of the new x86/mm, updating myStuff to point to the rebased
commits afterwards.

Then myStuff_start is updated to point to the now current start of your
stuff, so that you can do the rebase again later.

That's not actually a problem of git's tracking branches, because
they're not made for that style of work at all. For such stuff, rebase
is the only useful option AFAIK.

HTH,
Björn
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: More breakage in native_rdtsc out of line in git-x86

2008-01-09 Thread Björn Steinbrink
On 2008.01.09 23:41:42 +0100, Andi Kleen wrote:
 On Wed, Jan 09, 2008 at 02:35:55PM -0800, Harvey Harrison wrote:
  On Wed, 2008-01-09 at 23:28 +0100, Andi Kleen wrote:
  
   Do you have a simple recipe to just update from the the remote branch,
   assuming there are no local changes or local branches? 
   
   -Andi
  
  For staying up to date I use the following:
  
  # Add Linus's tree as a remote
  git remote --add linus
  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
  
  # Add Ingo's tree as a remote
  git remote --add x86
  git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
  
  # With that setup, just run the following to get any changes you
  # don't have.  It will also notice any new branches Ingo/Linus
  # add to their repo.  Look in .git/config afterwards, the format
  # to add new remotes is easy to figure out.
  git remote update
 
 I'm already cloning the branches; the problem is not getting conflicts etc.
 when updating.

I guess you're using git pull to update a local branch? That will try
to merge the new state of x86/mm into your local branch, and that
breaks. If you just want to have a local branch that gives you a fixed
state of x86/mm regardless of whether or not you already fetched newer
ones, you can do:

git branch myThing x86/mm # create the branch

work/test/whatever

To fetch a new state from the remote:
git fetch x86/mm # or git remote update, or whatever

To update your branch to point to the new state:
git branch - f myThing x86/mm

That basically replaces it with a new branch of the same name, but
pointing to the new x86/mm.

Or if you want to get your working tree to that state at the same time,
you can also do:
git checkout myThing
git reset --hard x86/mm

Björn
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Fix forcedeth reversing the MAC address on suspend

2008-01-06 Thread Björn Steinbrink
On 2008.01.04 23:26:33 +0100, Björn Steinbrink wrote:
> For cards that initially have the MAC address stored in reverse order,
> the forcedeth driver uses a flag to signal whether the address was
> already corrected, so that it is not reversed again on a subsequent
> probe.
> 
> Unfortunately this flag, which is stored in a register of the card,
> seems to get lost during suspend, resulting in the MAC address being
> reversed again. To fix that, the MAC address needs to be written back in
> reversed order before we suspend and the flag needs to be reset.
> 
> The flag is still required because at least kexec will never write back
> the reversed address and thus needs to know what state the card is in.
> 
> Signed-off-by: Björn Steinbrink <[EMAIL PROTECTED]>
> ---
> On 2008.01.04 15:08:42 +0100, Richard Jonsson wrote:
> > Björn Steinbrink skrev:
> >> Richard, could you give this a spin? And then we'd likely need someone
> >> to test that with kexec...
> >
> > The patch you sent does the trick, works fine now, thanks!
> > I cannot test this with kexec as I barely know what it is, I'll leave that 
> > to someone else.
> 
> Thanks.
> 
> Ayaz, you originally wrote the kexec fix (IIRC), was my analysis of the
> problem correct? If so, I'm quite sure that the patch DTRT. Still it
> should be tested for the rmmod+modprobe and the kexec case. I'll try to
> get my box free for some testing, but that's unlikely in the next few
> days. Plus, I've never used kexec myself either. So I'd be grateful if
> someone else would step up.

Found a few minutes to test, but kexec would just hang for me. So I'm
unable to test that atm. :-(

Björn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: forcedeth: MAC-address reversed on resume from suspend

2008-01-06 Thread Björn Steinbrink
On 2008.01.06 19:49:49 -0200, Adolfo R. Brandes wrote:
>  I have this forcedeth MAC address reversal problem when suspending
> on 2 distinct boxes.  I can confirm Steinbrink's patch fixes the
> problem on only one of them:
> 
> 00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev f3)
> 
>  On the other one the problem persists:
> 
> 00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a1)

Thanks. Leaves me pretty clueless though. Especially since it worked for
Richard, who also has a MCP51. In a private mail, you said that you had
hardcoded the mac address in the source. Any chance that you applied the
patch on your modified sources and didn't get it right?

thanks,
Björn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: forcedeth: MAC-address reversed on resume from suspend

2008-01-06 Thread Björn Steinbrink
On 2008.01.06 19:49:49 -0200, Adolfo R. Brandes wrote:
  I have this forcedeth MAC address reversal problem when suspending
 on 2 distinct boxes.  I can confirm Steinbrink's patch fixes the
 problem on only one of them:
 
 00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev f3)
 
  On the other one the problem persists:
 
 00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a1)

Thanks. Leaves me pretty clueless though. Especially since it worked for
Richard, who also has a MCP51. In a private mail, you said that you had
hardcoded the mac address in the source. Any chance that you applied the
patch on your modified sources and didn't get it right?

thanks,
Björn
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Fix forcedeth reversing the MAC address on suspend

2008-01-06 Thread Björn Steinbrink
On 2008.01.04 23:26:33 +0100, Björn Steinbrink wrote:
 For cards that initially have the MAC address stored in reverse order,
 the forcedeth driver uses a flag to signal whether the address was
 already corrected, so that it is not reversed again on a subsequent
 probe.
 
 Unfortunately this flag, which is stored in a register of the card,
 seems to get lost during suspend, resulting in the MAC address being
 reversed again. To fix that, the MAC address needs to be written back in
 reversed order before we suspend and the flag needs to be reset.
 
 The flag is still required because at least kexec will never write back
 the reversed address and thus needs to know what state the card is in.
 
 Signed-off-by: Björn Steinbrink [EMAIL PROTECTED]
 ---
 On 2008.01.04 15:08:42 +0100, Richard Jonsson wrote:
  Björn Steinbrink skrev:
  Richard, could you give this a spin? And then we'd likely need someone
  to test that with kexec...
 
  The patch you sent does the trick, works fine now, thanks!
  I cannot test this with kexec as I barely know what it is, I'll leave that 
  to someone else.
 
 Thanks.
 
 Ayaz, you originally wrote the kexec fix (IIRC), was my analysis of the
 problem correct? If so, I'm quite sure that the patch DTRT. Still it
 should be tested for the rmmod+modprobe and the kexec case. I'll try to
 get my box free for some testing, but that's unlikely in the next few
 days. Plus, I've never used kexec myself either. So I'd be grateful if
 someone else would step up.

Found a few minutes to test, but kexec would just hang for me. So I'm
unable to test that atm. :-(

Björn
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: forcedeth: MAC-address reversed on resume from suspend

2008-01-04 Thread Björn Steinbrink
On 2008.01.05 07:10:02 +0100, Björn Steinbrink wrote:
> - nv_probe sees 00:11:22:00:00:01
> - doesn't match the vendor stuff
> - becomes 01:00:00:22:11:00
> 
> Oops, that's not quite the expected thing.

Haha, I just noticed that that even is a multicast address, so you'd get
a random MAC address in that case *lol.

Björn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: forcedeth: MAC-address reversed on resume from suspend

2008-01-04 Thread Björn Steinbrink
On 2008.01.04 23:43:52 +0100, Andreas Mohr wrote:
> On Fri, Jan 04, 2008 at 11:17:40AM +0100, Björn Steinbrink wrote:
> > On 2008.01.04 09:45:17 +0100, Andreas Mohr wrote:
> > > And then it needs these card I/O functions wrapped into two
> > > functions which interface with driver- and OS-related MAC
> > > variables (struct variables ALWAYS stored in usual system order,
> > > NOT H/W order!!) which optionally reverse the address (if
> > > needed for a particular card).
> > 
> > Hu? The MAC address is only ever reversed when the card is not in
> > use, why the hell would you want to reverse anything when the rest
> > of the OS is involved? This whole reversing stuff is purely before
> > and after the card is actually used. It's not that you need to
> > reverse the address to communicate with the card, it's just
> > initially wrong on the card.
> 
> Huhrmm? OK, let me ask this then: So what you're saying is that the
> address is only initially wrong (e.g. due to card EEPROM content
> somehow initializing the registers in wrong order on power-on), it's
> not _always_ supposed to be stored in wrong order during operation.
> 
> IOW, the default card state after power-on is _unusable_ (due to
> invalidly ordered MAC address) and has to be _corrected_ by the
> driver, _initially_ only?

Yeah. _But_, all deduced from the code.

> OTOH I know that at least acx100 has a _permanently_ wrong-ordered MAC
> address setup, i.e. it's required to have it in "wrong" order _during
> operation_. And I wouldn't be surprised to see several examples of
> other hardware having a permanently wrongly-ordered address
> requirement, given the amount of MAC reversal in Linux drivers.
>
> Couldn't it be by chance that it's _believed_ to be
> reversed-on-power-on only, whereas those cards should _actually_ have
> it reversed-during-lifetime instead? Such a misunderstanding might
> explain a lot of trouble...

Humm... Wouldn't that have made itself evident? The card's not running
in promisc mode all the time, so there should be some problems if the
card would expect the MAC in the reversed order, right? (At least I see
some code that can switch it into promisc mode, so I assume that it is
not enabled all the time).

There _is_ a comment about some cards not generating any TX interrupts
at all. But that seems to predate any card that stores the MAC address
in correct order (the patch for that came after git, the comment
predates git). So assuming that the card wants the address in reversed
order at runtime, would likely imply that _no_ card would generate TX
interrupts (unless promisc?), and the comment basically invalidates that
assumption.

> > > And then there will never be any confusion about any MAC address
> > > format anywhere any more, right?
> > 
> > At probing time, you'll have to know whether the address you read
> > from the hardware is reversed or not. Unless you write it back in
> > reversed order when you release the card, you need a flag that at
> > least survives until nv_probe is called the next time. kexec does
> > not write it back, so you do need such a flag. But the flag
> > apparently doesn't survive a suspend/resume cycle, so you need to
> > write back the reversed address then. But the flag will survive a
> > rmmod + modprobe cycle, so you need to reset it when writing back
> > the reversed address.
> 
> If it's indeed reversed-on-power-on only, then one probably cannot do
> much about it, thus I'd give up and simply check the MAC address for
> validity (should be very easy to check for a simple reversal by
> checking for the manufacturer-specific fingerprint in reverse order).
> Keeping _any_ sort of state to record the fact that it used to be
> reversed or now is reversed is futile (or rather: catastrophic) given
> the complexity of suspend / virtualisation or whichever other nice
> operations Linux may invent in the future ;)

There was once a patch that checked the vendor specific address part,
long ago. No idea what happened to it. But for kexec, I'd say that that
is broken, too.

- ifconfig eth0 hw ether 00:11:22:00:00:01
- $kexec_incantation

- nv_probe sees 00:11:22:00:00:01
- doesn't match the vendor stuff
- becomes 01:00:00:22:11:00

Oops, that's not quite the expected thing.

No way to tell whether we have to reverse from the address alone AFAICT.

Björn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] Fix forcedeth reversing the MAC address on suspend

2008-01-04 Thread Björn Steinbrink
For cards that initially have the MAC address stored in reverse order,
the forcedeth driver uses a flag to signal whether the address was
already corrected, so that it is not reversed again on a subsequent
probe.

Unfortunately this flag, which is stored in a register of the card,
seems to get lost during suspend, resulting in the MAC address being
reversed again. To fix that, the MAC address needs to be written back in
reversed order before we suspend and the flag needs to be reset.

The flag is still required because at least kexec will never write back
the reversed address and thus needs to know what state the card is in.

Signed-off-by: Björn Steinbrink <[EMAIL PROTECTED]>
---
On 2008.01.04 15:08:42 +0100, Richard Jonsson wrote:
> Björn Steinbrink skrev:
>> Richard, could you give this a spin? And then we'd likely need someone
>> to test that with kexec...
>
> The patch you sent does the trick, works fine now, thanks!
> I cannot test this with kexec as I barely know what it is, I'll leave that 
> to someone else.

Thanks.

Ayaz, you originally wrote the kexec fix (IIRC), was my analysis of the
problem correct? If so, I'm quite sure that the patch DTRT. Still it
should be tested for the rmmod+modprobe and the kexec case. I'll try to
get my box free for some testing, but that's unlikely in the next few
days. Plus, I've never used kexec myself either. So I'd be grateful if
someone else would step up.

diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index a96583c..f84c752 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -5199,10 +5199,6 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, 
const struct pci_device_i
dev->dev_addr[3] = (np->orig_mac[0] >> 16) & 0xff;
dev->dev_addr[4] = (np->orig_mac[0] >>  8) & 0xff;
dev->dev_addr[5] = (np->orig_mac[0] >>  0) & 0xff;
-   /* set permanent address to be correct aswell */
-   np->orig_mac[0] = (dev->dev_addr[0] << 0) + (dev->dev_addr[1] 
<< 8) +
-   (dev->dev_addr[2] << 16) + (dev->dev_addr[3] << 24);
-   np->orig_mac[1] = (dev->dev_addr[4] << 0) + (dev->dev_addr[5] 
<< 8);
writel(txreg|NVREG_TRANSMITPOLL_MAC_ADDR_REV, base + 
NvRegTransmitPoll);
}
memcpy(dev->perm_addr, dev->dev_addr, dev->addr_len);
@@ -5414,6 +5410,8 @@ static void __devexit nv_remove(struct pci_dev *pci_dev)
 */
writel(np->orig_mac[0], base + NvRegMacAddrA);
writel(np->orig_mac[1], base + NvRegMacAddrB);
+   writel(readl(base + NvRegTransmitPoll) & 
~NVREG_TRANSMITPOLL_MAC_ADDR_REV,
+  base + NvRegTransmitPoll);
 
/* free all structures */
free_rings(dev);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: forcedeth: MAC-address reversed on resume from suspend

2008-01-04 Thread Björn Steinbrink
On 2008.01.04 09:45:17 +0100, Andreas Mohr wrote:
> Hi,
> 
> On Fri, Jan 04, 2008 at 04:43:57AM +0100, Björn Steinbrink wrote:
> > On 2008.01.03 01:42:09 +0200, Adrian Bunk wrote:
> > > [ original bug report: http://lkml.org/lkml/2008/1/2/253 ]
> > > 
> > > On Wed, Jan 02, 2008 at 10:48:43PM +0100, Andreas Mohr wrote:
> > > > The nv_probe() function then nicely obeys this flag by reversing the
> > > > address if needed, _however_ there's another function,
> > > > nv_copy_mac_to_hw(), which does _NOT_ obey it and simply copies the
> > > > _raw_ netdev dev_addr to the card register (NvRegMacAddrA etc.).
> > 
> > nv_copy_mac_to_hw() is called from nv_probe() _after_ the MAC address
> > has been fixed, and from nv_set_mac_address() which is supposed to get a
> > correct MAC address anyway. So I don't see any problem there.
> 
> /* set permanent address to be correct aswell */
> ... orig_mac fumbling ...
> 
> Why, then, does this driver do such a nice hack as:
> 
> /* special op: write back the misordered MAC address - otherwise
>  * the next nv_probe would see a wrong address.
>  */
> writel(np->orig_mac[0], base + NvRegMacAddrA);
> writel(np->orig_mac[1], base + NvRegMacAddrB);
> 
> I really don't see why this driver needs to do these things in such a
> messy way.

Sure you can add some layers of wrappers and then have
"nv_store_mac(np->orig_mac)" instead. Congratulations, you saved 2 lines
and added 6 new ones (assuming that you don't go universally u8, that
adds even more). This is the only place that writes the MAC address
besides the existing nv_copy_mac_to_hw wrapper which deals with
net_devices and thus is perfectly usable, internally and externally.

> One thing is for certain: netdev->dev_addr is always in operating system
> order, and order should always be handled properly when copying to/from
> hardware.
> 
> Such a driver needs exactly *one* central helper method
> to reverse an input MAC address in 6 bytes u8 format
> (which could arguably be added to include/linux/etherdevice.h even,
> since a wee bit too many devices seem to be having trouble
> with wrongly ordered MAC addresses).
> Then it needs exactly *one* function for I/O register access
> to read a MAC address from a device and exactly *one* function
> to write it back (both doing raw, unsorted format transfers only,
> and possibly as inline function).

A function to read the MAC address from the hardware? There's only a
single place (nv_probe()) that ever reads it, why bother?

> And then it needs these card I/O functions wrapped into two functions which
> interface with driver- and OS-related MAC variables
> (struct variables ALWAYS stored in usual system order, NOT H/W order!!)
> which optionally reverse the address (if needed for a particular card).

Hu? The MAC address is only ever reversed when the card is not in use,
why the hell would you want to reverse anything when the rest of the OS
is involved? This whole reversing stuff is purely before and after the
card is actually used. It's not that you need to reverse the address to
communicate with the card, it's just initially wrong on the card.

> And then there will never be any confusion about any MAC address format
> anywhere any more, right?

At probing time, you'll have to know whether the address you read from
the hardware is reversed or not. Unless you write it back in reversed
order when you release the card, you need a flag that at least survives
until nv_probe is called the next time. kexec does not write it back, so
you do need such a flag. But the flag apparently doesn't survive a
suspend/resume cycle, so you need to write back the reversed address
then. But the flag will survive a rmmod + modprobe cycle, so you need to
reset it when writing back the reversed address.

> I really don't mean to sound cranky, but this driver did ask for trouble,
> and lots of it ;)
> 
> Thank you for your reply, and especially thank you for this very fast
> patch!

Well, let's see if my analysis was correct and the patch works, first. I
saw the forcedeth code before, but kexec and suspend is totally new to
me and I just made some assumptions based on the reported behaviour and
the commit messages... ;-)

Björn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: forcedeth: MAC-address reversed on resume from suspend

2008-01-04 Thread Björn Steinbrink
On 2008.01.04 09:45:17 +0100, Andreas Mohr wrote:
 Hi,
 
 On Fri, Jan 04, 2008 at 04:43:57AM +0100, Björn Steinbrink wrote:
  On 2008.01.03 01:42:09 +0200, Adrian Bunk wrote:
   [ original bug report: http://lkml.org/lkml/2008/1/2/253 ]
   
   On Wed, Jan 02, 2008 at 10:48:43PM +0100, Andreas Mohr wrote:
The nv_probe() function then nicely obeys this flag by reversing the
address if needed, _however_ there's another function,
nv_copy_mac_to_hw(), which does _NOT_ obey it and simply copies the
_raw_ netdev dev_addr to the card register (NvRegMacAddrA etc.).
  
  nv_copy_mac_to_hw() is called from nv_probe() _after_ the MAC address
  has been fixed, and from nv_set_mac_address() which is supposed to get a
  correct MAC address anyway. So I don't see any problem there.
 
 /* set permanent address to be correct aswell */
 ... orig_mac fumbling ...
 
 Why, then, does this driver do such a nice hack as:
 
 /* special op: write back the misordered MAC address - otherwise
  * the next nv_probe would see a wrong address.
  */
 writel(np-orig_mac[0], base + NvRegMacAddrA);
 writel(np-orig_mac[1], base + NvRegMacAddrB);
 
 I really don't see why this driver needs to do these things in such a
 messy way.

Sure you can add some layers of wrappers and then have
nv_store_mac(np-orig_mac) instead. Congratulations, you saved 2 lines
and added 6 new ones (assuming that you don't go universally u8, that
adds even more). This is the only place that writes the MAC address
besides the existing nv_copy_mac_to_hw wrapper which deals with
net_devices and thus is perfectly usable, internally and externally.

 One thing is for certain: netdev-dev_addr is always in operating system
 order, and order should always be handled properly when copying to/from
 hardware.
 
 Such a driver needs exactly *one* central helper method
 to reverse an input MAC address in 6 bytes u8 format
 (which could arguably be added to include/linux/etherdevice.h even,
 since a wee bit too many devices seem to be having trouble
 with wrongly ordered MAC addresses).
 Then it needs exactly *one* function for I/O register access
 to read a MAC address from a device and exactly *one* function
 to write it back (both doing raw, unsorted format transfers only,
 and possibly as inline function).

A function to read the MAC address from the hardware? There's only a
single place (nv_probe()) that ever reads it, why bother?

 And then it needs these card I/O functions wrapped into two functions which
 interface with driver- and OS-related MAC variables
 (struct variables ALWAYS stored in usual system order, NOT H/W order!!)
 which optionally reverse the address (if needed for a particular card).

Hu? The MAC address is only ever reversed when the card is not in use,
why the hell would you want to reverse anything when the rest of the OS
is involved? This whole reversing stuff is purely before and after the
card is actually used. It's not that you need to reverse the address to
communicate with the card, it's just initially wrong on the card.

 And then there will never be any confusion about any MAC address format
 anywhere any more, right?

At probing time, you'll have to know whether the address you read from
the hardware is reversed or not. Unless you write it back in reversed
order when you release the card, you need a flag that at least survives
until nv_probe is called the next time. kexec does not write it back, so
you do need such a flag. But the flag apparently doesn't survive a
suspend/resume cycle, so you need to write back the reversed address
then. But the flag will survive a rmmod + modprobe cycle, so you need to
reset it when writing back the reversed address.

 I really don't mean to sound cranky, but this driver did ask for trouble,
 and lots of it ;)
 
 Thank you for your reply, and especially thank you for this very fast
 patch!

Well, let's see if my analysis was correct and the patch works, first. I
saw the forcedeth code before, but kexec and suspend is totally new to
me and I just made some assumptions based on the reported behaviour and
the commit messages... ;-)

Björn
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: forcedeth: MAC-address reversed on resume from suspend

2008-01-04 Thread Björn Steinbrink
On 2008.01.04 23:43:52 +0100, Andreas Mohr wrote:
 On Fri, Jan 04, 2008 at 11:17:40AM +0100, Björn Steinbrink wrote:
  On 2008.01.04 09:45:17 +0100, Andreas Mohr wrote:
   And then it needs these card I/O functions wrapped into two
   functions which interface with driver- and OS-related MAC
   variables (struct variables ALWAYS stored in usual system order,
   NOT H/W order!!) which optionally reverse the address (if
   needed for a particular card).
  
  Hu? The MAC address is only ever reversed when the card is not in
  use, why the hell would you want to reverse anything when the rest
  of the OS is involved? This whole reversing stuff is purely before
  and after the card is actually used. It's not that you need to
  reverse the address to communicate with the card, it's just
  initially wrong on the card.
 
 Huhrmm? OK, let me ask this then: So what you're saying is that the
 address is only initially wrong (e.g. due to card EEPROM content
 somehow initializing the registers in wrong order on power-on), it's
 not _always_ supposed to be stored in wrong order during operation.
 
 IOW, the default card state after power-on is _unusable_ (due to
 invalidly ordered MAC address) and has to be _corrected_ by the
 driver, _initially_ only?

Yeah. _But_, all deduced from the code.

 OTOH I know that at least acx100 has a _permanently_ wrong-ordered MAC
 address setup, i.e. it's required to have it in wrong order _during
 operation_. And I wouldn't be surprised to see several examples of
 other hardware having a permanently wrongly-ordered address
 requirement, given the amount of MAC reversal in Linux drivers.

 Couldn't it be by chance that it's _believed_ to be
 reversed-on-power-on only, whereas those cards should _actually_ have
 it reversed-during-lifetime instead? Such a misunderstanding might
 explain a lot of trouble...

Humm... Wouldn't that have made itself evident? The card's not running
in promisc mode all the time, so there should be some problems if the
card would expect the MAC in the reversed order, right? (At least I see
some code that can switch it into promisc mode, so I assume that it is
not enabled all the time).

There _is_ a comment about some cards not generating any TX interrupts
at all. But that seems to predate any card that stores the MAC address
in correct order (the patch for that came after git, the comment
predates git). So assuming that the card wants the address in reversed
order at runtime, would likely imply that _no_ card would generate TX
interrupts (unless promisc?), and the comment basically invalidates that
assumption.

   And then there will never be any confusion about any MAC address
   format anywhere any more, right?
  
  At probing time, you'll have to know whether the address you read
  from the hardware is reversed or not. Unless you write it back in
  reversed order when you release the card, you need a flag that at
  least survives until nv_probe is called the next time. kexec does
  not write it back, so you do need such a flag. But the flag
  apparently doesn't survive a suspend/resume cycle, so you need to
  write back the reversed address then. But the flag will survive a
  rmmod + modprobe cycle, so you need to reset it when writing back
  the reversed address.
 
 If it's indeed reversed-on-power-on only, then one probably cannot do
 much about it, thus I'd give up and simply check the MAC address for
 validity (should be very easy to check for a simple reversal by
 checking for the manufacturer-specific fingerprint in reverse order).
 Keeping _any_ sort of state to record the fact that it used to be
 reversed or now is reversed is futile (or rather: catastrophic) given
 the complexity of suspend / virtualisation or whichever other nice
 operations Linux may invent in the future ;)

There was once a patch that checked the vendor specific address part,
long ago. No idea what happened to it. But for kexec, I'd say that that
is broken, too.

- ifconfig eth0 hw ether 00:11:22:00:00:01
- $kexec_incantation

- nv_probe sees 00:11:22:00:00:01
- doesn't match the vendor stuff
- becomes 01:00:00:22:11:00

Oops, that's not quite the expected thing.

No way to tell whether we have to reverse from the address alone AFAICT.

Björn
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: forcedeth: MAC-address reversed on resume from suspend

2008-01-04 Thread Björn Steinbrink
On 2008.01.05 07:10:02 +0100, Björn Steinbrink wrote:
 - nv_probe sees 00:11:22:00:00:01
 - doesn't match the vendor stuff
 - becomes 01:00:00:22:11:00
 
 Oops, that's not quite the expected thing.

Haha, I just noticed that that even is a multicast address, so you'd get
a random MAC address in that case *lol.

Björn
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: forcedeth: MAC-address reversed on resume from suspend

2008-01-03 Thread Björn Steinbrink
On 2008.01.03 01:42:09 +0200, Adrian Bunk wrote:
> [ original bug report: http://lkml.org/lkml/2008/1/2/253 ]
> 
> On Wed, Jan 02, 2008 at 10:48:43PM +0100, Andreas Mohr wrote:
> > Hi,
> > 
> > On Wed, Jan 02, 2008 at 10:04:49PM +0100, Richard Jonsson wrote:
> > > Bugreport regarding forcedeth driver.
> > >
> > > When returning from suspend-to-RAM the MAC-address byteorder is
> > > reversed.  After another suspend-resume cycle the MAC-address is
> > > again correct. This brings a great deal of pain since the NIC is
> > > assigned a random MAC-address when it is detected as invalid.
> > >
> > > I cannot say if this issue appeared recently or if it's been in
> > > the kernel for a while, as I've been using wireless until
> > > recently.
> > >
> > > I read somewhere on lkml that the mac is read from the device,
> > > then reversed and finally written back to the device. Can it be
> > > that it is read again on resume and reversed, and then again
> > > written to the device? This would explain why it's ok every other
> > > resume cycle.
> > 
> > Uh, Use The Source, Luke?
> > 
> > But no, I think it's simply driver dainbreadness, not a matter of
> > complicated writing and reading back in reversed order.
> > 
> > drivers/net/forcedeth.c has a nice DEV_HAS_CORRECT_MACADDR flag
> > which is being enabled for certain cards (those which need this) and
> > disabled for others.
> > 
> > The nv_probe() function then nicely obeys this flag by reversing the
> > address if needed, _however_ there's another function,
> > nv_copy_mac_to_hw(), which does _NOT_ obey it and simply copies the
> > _raw_ netdev dev_addr to the card register (NvRegMacAddrA etc.).

nv_copy_mac_to_hw() is called from nv_probe() _after_ the MAC address
has been fixed, and from nv_set_mac_address() which is supposed to get a
correct MAC address anyway. So I don't see any problem there.


After some digging, I guess it's related to
5070d3408405ae1941f259acac7a9882045c3be4

That introduced a flag in a register to signal if the MAC address has
been corrected, so that for example kexec won't reverse the adddress
again, when calling nv_probe() again.

As I know neither the suspend nor the kexec code well enough, I'll have
to make a few smart guesses (let's hope that that works out *g*):

a) suspend manages to reverse the MAC address again, so it must call
nv_probe. And we have lost the flag that stops us from reversing the
address. We cannot have lost the fixed MAC address, because then we'd
always get the reversed address, and not go back and forth. I'll assume
that it will also call nv_remove during suspend, because it's nicely
symmetrical then :-)

b) kexec does not call nv_remove, because otherwise, it would see a
reversed address on its nv_probe call anyway, and the flag wouldn't be
necessary.

Now the commit that introduced the flag did also introduce an indirect
change to nv_remove. Although it still says that it writes back the
broken MAC address, that's no longer true, because orig_mac now also
gets the correct address in nv_probe.

Smart guessing time again: That was required because otherwise a "rmood
forcedeth && modprobe forcedeth" would have produced a reversed MAC
address.

But that also causes the problem that we get when we loose either the
flag, we start reversing the fixed address. If we instead return the
card to its initial state in nv_remove, ie. unset the flag and write
back the reversed address, then everyone going through nv_remove _and_
nv_probe should be happy. And kexec, which only goes through nv_probe
again and doesn't loose the flag should be happy, too.

Richard, could you give this a spin? And then we'd likely need someone
to test that with kexec...

thanks,
Björn
---
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index a96583c..f84c752 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -5199,10 +5199,6 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, 
const struct pci_device_i
dev->dev_addr[3] = (np->orig_mac[0] >> 16) & 0xff;
dev->dev_addr[4] = (np->orig_mac[0] >>  8) & 0xff;
dev->dev_addr[5] = (np->orig_mac[0] >>  0) & 0xff;
-   /* set permanent address to be correct aswell */
-   np->orig_mac[0] = (dev->dev_addr[0] << 0) + (dev->dev_addr[1] 
<< 8) +
-   (dev->dev_addr[2] << 16) + (dev->dev_addr[3] << 24);
-   np->orig_mac[1] = (dev->dev_addr[4] << 0) + (dev->dev_addr[5] 
<< 8);
writel(txreg|NVREG_TRANSMITPOLL_MAC_ADDR_REV, base + 
NvRegTransmitPoll);
}
memcpy(dev->perm_addr, dev->dev_addr, dev->addr_len);
@@ -5414,6 +5410,8 @@ static void __devexit nv_remove(struct pci_dev *pci_dev)
 */
writel(np->orig_mac[0], base + NvRegMacAddrA);
writel(np->orig_mac[1], base + NvRegMacAddrB);
+   writel(readl(base + NvRegTransmitPoll) & 
~NVREG_TRANSMITPOLL_MAC_ADDR_REV,
+  base + NvRegTransmitPoll);
 
/* 

Re: Top 9 kernel oopses/warnings for the week of December 29th, 2007

2007-12-29 Thread Björn Steinbrink
On 2007.12.29 11:18:18 -0800, Linus Torvalds wrote:
> 
> 
> On Sat, 29 Dec 2007, Arjan van de Ven wrote:
> >
> > It has been a quiet week due to the holidays, only 55 oops traces
> > have been collected.
> 
> This would be more useful if it was more readable. As it is, you seem to 
> have some formatting errors in your automation, where the things are 
> incorrectly grouped:
> 
> > Rank 1: sysctl_head_finish
> > sysctl table check failed
> > Reported 7 times
> > Only reported for the proprietary madwifi driver
> > More info: 
> > http://www.kerneloops.org/search.php?search=sysctl_head_finish
> > Rank 2: __ieee80211_rx
> > Warning at net/mac80211/rx.c:1672
> 
> ie notice how that "Rank 2" got reported as if it was part of the "Rank 1" 
> issue. 

Shows up correctly here. Random guess is that your MUA treats tabs like
spaces when it sees format=flowed. The lines between the different ranks
weren't empty but had a single tab. And for whatever reason (I don't see
any trailing spaces), Arjan's mail had a format=flowed header.

AFAIK, only SPACE CRLF is valid for line-continuation with
format=flowed, not TAB CRLF.  So the above formatting would be the
result of a pointless format=flowed header meeting a MUA that was too
eager searching for line continuations.

Björn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Top 9 kernel oopses/warnings for the week of December 29th, 2007

2007-12-29 Thread Björn Steinbrink
On 2007.12.29 11:18:18 -0800, Linus Torvalds wrote:
 
 
 On Sat, 29 Dec 2007, Arjan van de Ven wrote:
 
  It has been a quiet week due to the holidays, only 55 oops traces
  have been collected.
 
 This would be more useful if it was more readable. As it is, you seem to 
 have some formatting errors in your automation, where the things are 
 incorrectly grouped:
 
  Rank 1: sysctl_head_finish
  sysctl table check failed
  Reported 7 times
  Only reported for the proprietary madwifi driver
  More info: 
  http://www.kerneloops.org/search.php?search=sysctl_head_finish
  Rank 2: __ieee80211_rx
  Warning at net/mac80211/rx.c:1672
 
 ie notice how that Rank 2 got reported as if it was part of the Rank 1 
 issue. 

Shows up correctly here. Random guess is that your MUA treats tabs like
spaces when it sees format=flowed. The lines between the different ranks
weren't empty but had a single tab. And for whatever reason (I don't see
any trailing spaces), Arjan's mail had a format=flowed header.

AFAIK, only SPACE CRLF is valid for line-continuation with
format=flowed, not TAB CRLF.  So the above formatting would be the
result of a pointless format=flowed header meeting a MUA that was too
eager searching for line continuations.

Björn
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] Fix dirty page accounting leak with ext3 data=journal

2007-12-21 Thread Björn Steinbrink
In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0, try_to_free_buffers was
changed to bail out if the page was dirty. That caused
truncate_complete_page to leak massive amounts of memory, because the
dirty bit was only cleared after the call to try_to_free_buffers. So the
call to cancel_dirty_page was moved up to have the dirty bit cleared
early in 3e67c0987d7567ad41164a153dca9a43b11d.

The problem with that fix is, that the page can be redirtied after
cancel_dirty_page was called, eg. like this:

truncate_complete_page()
  cancel_dirty_page() // PG_dirty cleared, decr. dirty pages
  do_invalidatepage()
ext3_invalidatepage()
  journal_invalidatepage()
journal_unmap_buffer()
  __dispose_buffer()
__journal_unfile_buffer()
  __journal_temp_unlink_buffer()
mark_buffer_dirty(); // PG_dirty set, incr. dirty pages

And then we end up with dirty pages being wrongly accounted.

In ecdfc9787fe527491baefc22dce8b2dbd5b2908d the changes to
try_to_free_buffers were reverted, so the original reason for the
massive memory leak is gone, so we can also revert the move of
the call to cancel_dirty_page from truncate_complete_page and get the
accounting right again.

Signed-off-by: Björn Steinbrink <[EMAIL PROTECTED]>
Tested-by: Krzysztof Piotr Oledzki <[EMAIL PROTECTED]>

---
I'm not sure if it matters, but opposed to the final check in
__remove_from_page_cache, this one also cares about the task io
accounting, so maybe we want to use this instead, although it's not
quite the clean fix either.

diff --git a/mm/truncate.c b/mm/truncate.c
index cadc156..2974903 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -98,11 +98,11 @@ truncate_complete_page(struct address_space *mapping, 
struct page *page)
if (page->mapping != mapping)
return;
 
-   cancel_dirty_page(page, PAGE_CACHE_SIZE);
-
if (PagePrivate(page))
do_invalidatepage(page, 0);
 
+   cancel_dirty_page(page, PAGE_CACHE_SIZE);
+
remove_from_page_cache(page);
ClearPageUptodate(page);
ClearPageMappedToDisk(page);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] Fix dirty page accounting leak with ext3 data=journal

2007-12-21 Thread Björn Steinbrink
In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0, try_to_free_buffers was
changed to bail out if the page was dirty. That caused
truncate_complete_page to leak massive amounts of memory, because the
dirty bit was only cleared after the call to try_to_free_buffers. So the
call to cancel_dirty_page was moved up to have the dirty bit cleared
early in 3e67c0987d7567ad41164a153dca9a43b11d.

The problem with that fix is, that the page can be redirtied after
cancel_dirty_page was called, eg. like this:

truncate_complete_page()
  cancel_dirty_page() // PG_dirty cleared, decr. dirty pages
  do_invalidatepage()
ext3_invalidatepage()
  journal_invalidatepage()
journal_unmap_buffer()
  __dispose_buffer()
__journal_unfile_buffer()
  __journal_temp_unlink_buffer()
mark_buffer_dirty(); // PG_dirty set, incr. dirty pages

And then we end up with dirty pages being wrongly accounted.

In ecdfc9787fe527491baefc22dce8b2dbd5b2908d the changes to
try_to_free_buffers were reverted, so the original reason for the
massive memory leak is gone, so we can also revert the move of
the call to cancel_dirty_page from truncate_complete_page and get the
accounting right again.

Signed-off-by: Björn Steinbrink [EMAIL PROTECTED]
Tested-by: Krzysztof Piotr Oledzki [EMAIL PROTECTED]

---
I'm not sure if it matters, but opposed to the final check in
__remove_from_page_cache, this one also cares about the task io
accounting, so maybe we want to use this instead, although it's not
quite the clean fix either.

diff --git a/mm/truncate.c b/mm/truncate.c
index cadc156..2974903 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -98,11 +98,11 @@ truncate_complete_page(struct address_space *mapping, 
struct page *page)
if (page-mapping != mapping)
return;
 
-   cancel_dirty_page(page, PAGE_CACHE_SIZE);
-
if (PagePrivate(page))
do_invalidatepage(page, 0);
 
+   cancel_dirty_page(page, PAGE_CACHE_SIZE);
+
remove_from_page_cache(page);
ClearPageUptodate(page);
ClearPageMappedToDisk(page);
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 9182] Critical memory leak (dirty pages)

2007-12-20 Thread Björn Steinbrink
On 2007.12.20 08:25:56 -0800, Linus Torvalds wrote:
> 
> 
> On Thu, 20 Dec 2007, Bj?rn Steinbrink wrote:
> > 
> > OK, so I looked for PG_dirty anyway.
> > 
> > In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0 you made try_to_free_buffers
> > bail out if the page is dirty.
> > 
> > Then in 3e67c0987d7567ad41164a153dca9a43b11d, Andrew fixed
> > truncate_complete_page, because it called cancel_dirty_page (and thus
> > cleared PG_dirty) after try_to_free_buffers was called via
> > do_invalidatepage.
> > 
> > Now, if I'm not mistaken, we can end up as follows.
> > 
> > truncate_complete_page()
> >   cancel_dirty_page() // PG_dirty cleared, decr. dirty pages
> >   do_invalidatepage()
> > ext3_invalidatepage()
> >   journal_invalidatepage()
> > journal_unmap_buffer()
> >   __dispose_buffer()
> > __journal_unfile_buffer()
> >   __journal_temp_unlink_buffer()
> > mark_buffer_dirty(); // PG_dirty set, incr. dirty pages
> 
> Good, this seems to be the exact path that actually triggers it. I got to 
> journal_unmap_buffer(), but was too lazy to actually then bother to follow 
> it all the way down - I decided that I didn't actually really even care 
> what the low-level FS layer did, I had already convinced myself that it 
> obviously must be dirtying the page some way, since that matched the 
> symptoms exactly (ie only the journaling case was impacted, and this was 
> all about the journal).
> 
> But perhaps more importantly: regardless of what the low-level filesystem 
> did at that point, the VM accounting shouldn't care, and should be robust 
> in the face of a low-level filesystem doing strange and wonderful things. 
> But thanks for bothering to go through the whole history and figure out 
> what exactly is up.

Oh well, after seeing the move of cancel_dirty_page, I just went
backwards from __set_page_dirty using cscope + some smart guessing and
quickly ended up at ext3_invalidatepage, so it wasn't that hard :-)

> > As try_to_free_buffers got its ext3 hack back in
> > ecdfc9787fe527491baefc22dce8b2dbd5b2908d, maybe
> > 3e67c0987d7567ad41164a153dca9a43b11d should be reverted? (Except for
> > the accounting fix in cancel_dirty_page, of course).
> 
> Yes, I think we have room for cleanups now, and I agree: we ended up 
> reinstating some questionable code in the VM just because we didn't really 
> know or understand what was going on in the ext3 journal code. 

Hm, you attributed more to my mail than there was actually in it. I
didn't even start to think of cleanups (because I don't know jack about
the whole ext3/jdb stuff, so I simply cannot come up with any cleanups
(yet?)).What I meant is that we only did a half-revert of that hackery.

When try_to_free_buffers started to check for PG_dirty, the
cancel_dirty_page call had to be called before do_invalidatepage, to
"fix" a _huge_ leak.  But that caused the accouting breakage we're now
seeing, because we never account for the pages that got redirtied during
do_invalidatepage.

Then the change to try_to_free_buffers got reverted, so we no longer
need to call cancel_dirty_page before do_invalidatepage, but still we
do. Thus the accounting bug remains. So what I meant to suggest was
simply to actually "finish" the revert we started.

Or expressed as a patch:

diff --git a/mm/truncate.c b/mm/truncate.c
index cadc156..2974903 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -98,11 +98,11 @@ truncate_complete_page(struct address_space *mapping, 
struct page *page)
if (page->mapping != mapping)
return;
 
-   cancel_dirty_page(page, PAGE_CACHE_SIZE);
-
if (PagePrivate(page))
do_invalidatepage(page, 0);
 
+   cancel_dirty_page(page, PAGE_CACHE_SIZE);
+
remove_from_page_cache(page);
ClearPageUptodate(page);
ClearPageMappedToDisk(page);

I'll be the last one to comment on whether or not that causes inaccurate
accouting, so I'll just watch you and Jan battle that out until someone
comes up with a post-.24 patch to provide a clean fix for the issue.

Krzysztof, could you give this patch a test run?

If that "fixes" the problem for now, I'll try to come up with some
usable commit message, or if somehow wants to beat me to it, you can
already have my

Signed-off-by: Björn Steinbrink <[EMAIL PROTECTED]>

> > On a side note, before 8368e328dfe1c534957051333a87b3210a12743b the task
> > io accounting for cancelled writes happened always happened if the page
> > was dirty, regardless of page->mapping. This was also already true for
> > the old test_clear_page_dirty code, and the commit log for
> > 8368e328dfe1c534957051333a87b3210a12743b

Re: [Bug 9182] Critical memory leak (dirty pages)

2007-12-20 Thread Björn Steinbrink
On 2007.12.19 09:44:50 -0800, Linus Torvalds wrote:
> 
> 
> On Sun, 16 Dec 2007, Krzysztof Oledzki wrote:
> > 
> > I'll confirm this tomorrow but it seems that even switching to data=ordered
> > (AFAIK default o ext3) is indeed enough to cure this problem.
> 
> Ok, do we actually have any ext3 expert following this? I have no idea 
> about what the journalling code does, but I have painful memories of ext3 
> doing really odd buffer-head-based IO and totally bypassing all the normal 
> page dirty logic.
> 
> Judging by the symptoms (sorry for not following this well, it came up 
> while I was mostly away travelling), something probably *does* clear the 
> dirty bit on the pages, but the dirty *accounting* is not done properly, 
> so the kernel keeps thinking it has dirty pages.
> 
> Now, a simple "grep" shows that ext3 does not actually do any 
> ClearPageDirty() or similar on its own, although maybe I missed some other 
> subtle way this can happen. And the *normal* VFS routines that do 
> ClearPageDirty should all be doing the proper accounting.
> 
> So I see a couple of possible cases:
> 
>  - actually clearing the PG_dirty bit somehow, without doing the 
>accounting.
> 
>This looks very unlikely. PG_dirty is always cleared by some variant of 
>"*ClearPageDirty()", and that bit definition isn't used for anything 
>else in the whole kernel judging by "grep" (the page allocator tests 
>the bit, that's it).

OK, so I looked for PG_dirty anyway.

In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0 you made try_to_free_buffers
bail out if the page is dirty.

Then in 3e67c0987d7567ad41164a153dca9a43b11d, Andrew fixed
truncate_complete_page, because it called cancel_dirty_page (and thus
cleared PG_dirty) after try_to_free_buffers was called via
do_invalidatepage.

Now, if I'm not mistaken, we can end up as follows.

truncate_complete_page()
  cancel_dirty_page() // PG_dirty cleared, decr. dirty pages
  do_invalidatepage()
ext3_invalidatepage()
  journal_invalidatepage()
journal_unmap_buffer()
  __dispose_buffer()
__journal_unfile_buffer()
  __journal_temp_unlink_buffer()
mark_buffer_dirty(); // PG_dirty set, incr. dirty pages

If journal_unmap_buffer then returns 0, try_to_free_buffers is not
called and neither is cancel_dirty_page, so the dirty pages accounting
is not decreased again.

As try_to_free_buffers got its ext3 hack back in
ecdfc9787fe527491baefc22dce8b2dbd5b2908d, maybe
3e67c0987d7567ad41164a153dca9a43b11d should be reverted? (Except for
the accounting fix in cancel_dirty_page, of course).


On a side note, before 8368e328dfe1c534957051333a87b3210a12743b the task
io accounting for cancelled writes happened always happened if the page
was dirty, regardless of page->mapping. This was also already true for
the old test_clear_page_dirty code, and the commit log for
8368e328dfe1c534957051333a87b3210a12743b doesn't mention that semantic
change either, so maybe the "if (account_size)" block should be moved
out of the if "(mapping && ...)" block?

Björn - not sending patches because he needs sleep and wouldn't have a
damn clue about what to write as a commit message anyway
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 9182] Critical memory leak (dirty pages)

2007-12-20 Thread Björn Steinbrink
On 2007.12.19 09:44:50 -0800, Linus Torvalds wrote:
 
 
 On Sun, 16 Dec 2007, Krzysztof Oledzki wrote:
  
  I'll confirm this tomorrow but it seems that even switching to data=ordered
  (AFAIK default o ext3) is indeed enough to cure this problem.
 
 Ok, do we actually have any ext3 expert following this? I have no idea 
 about what the journalling code does, but I have painful memories of ext3 
 doing really odd buffer-head-based IO and totally bypassing all the normal 
 page dirty logic.
 
 Judging by the symptoms (sorry for not following this well, it came up 
 while I was mostly away travelling), something probably *does* clear the 
 dirty bit on the pages, but the dirty *accounting* is not done properly, 
 so the kernel keeps thinking it has dirty pages.
 
 Now, a simple grep shows that ext3 does not actually do any 
 ClearPageDirty() or similar on its own, although maybe I missed some other 
 subtle way this can happen. And the *normal* VFS routines that do 
 ClearPageDirty should all be doing the proper accounting.
 
 So I see a couple of possible cases:
 
  - actually clearing the PG_dirty bit somehow, without doing the 
accounting.
 
This looks very unlikely. PG_dirty is always cleared by some variant of 
*ClearPageDirty(), and that bit definition isn't used for anything 
else in the whole kernel judging by grep (the page allocator tests 
the bit, that's it).

OK, so I looked for PG_dirty anyway.

In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0 you made try_to_free_buffers
bail out if the page is dirty.

Then in 3e67c0987d7567ad41164a153dca9a43b11d, Andrew fixed
truncate_complete_page, because it called cancel_dirty_page (and thus
cleared PG_dirty) after try_to_free_buffers was called via
do_invalidatepage.

Now, if I'm not mistaken, we can end up as follows.

truncate_complete_page()
  cancel_dirty_page() // PG_dirty cleared, decr. dirty pages
  do_invalidatepage()
ext3_invalidatepage()
  journal_invalidatepage()
journal_unmap_buffer()
  __dispose_buffer()
__journal_unfile_buffer()
  __journal_temp_unlink_buffer()
mark_buffer_dirty(); // PG_dirty set, incr. dirty pages

If journal_unmap_buffer then returns 0, try_to_free_buffers is not
called and neither is cancel_dirty_page, so the dirty pages accounting
is not decreased again.

As try_to_free_buffers got its ext3 hack back in
ecdfc9787fe527491baefc22dce8b2dbd5b2908d, maybe
3e67c0987d7567ad41164a153dca9a43b11d should be reverted? (Except for
the accounting fix in cancel_dirty_page, of course).


On a side note, before 8368e328dfe1c534957051333a87b3210a12743b the task
io accounting for cancelled writes happened always happened if the page
was dirty, regardless of page-mapping. This was also already true for
the old test_clear_page_dirty code, and the commit log for
8368e328dfe1c534957051333a87b3210a12743b doesn't mention that semantic
change either, so maybe the if (account_size) block should be moved
out of the if (mapping  ...) block?

Björn - not sending patches because he needs sleep and wouldn't have a
damn clue about what to write as a commit message anyway
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 9182] Critical memory leak (dirty pages)

2007-12-20 Thread Björn Steinbrink
On 2007.12.20 08:25:56 -0800, Linus Torvalds wrote:
 
 
 On Thu, 20 Dec 2007, Bj?rn Steinbrink wrote:
  
  OK, so I looked for PG_dirty anyway.
  
  In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0 you made try_to_free_buffers
  bail out if the page is dirty.
  
  Then in 3e67c0987d7567ad41164a153dca9a43b11d, Andrew fixed
  truncate_complete_page, because it called cancel_dirty_page (and thus
  cleared PG_dirty) after try_to_free_buffers was called via
  do_invalidatepage.
  
  Now, if I'm not mistaken, we can end up as follows.
  
  truncate_complete_page()
cancel_dirty_page() // PG_dirty cleared, decr. dirty pages
do_invalidatepage()
  ext3_invalidatepage()
journal_invalidatepage()
  journal_unmap_buffer()
__dispose_buffer()
  __journal_unfile_buffer()
__journal_temp_unlink_buffer()
  mark_buffer_dirty(); // PG_dirty set, incr. dirty pages
 
 Good, this seems to be the exact path that actually triggers it. I got to 
 journal_unmap_buffer(), but was too lazy to actually then bother to follow 
 it all the way down - I decided that I didn't actually really even care 
 what the low-level FS layer did, I had already convinced myself that it 
 obviously must be dirtying the page some way, since that matched the 
 symptoms exactly (ie only the journaling case was impacted, and this was 
 all about the journal).
 
 But perhaps more importantly: regardless of what the low-level filesystem 
 did at that point, the VM accounting shouldn't care, and should be robust 
 in the face of a low-level filesystem doing strange and wonderful things. 
 But thanks for bothering to go through the whole history and figure out 
 what exactly is up.

Oh well, after seeing the move of cancel_dirty_page, I just went
backwards from __set_page_dirty using cscope + some smart guessing and
quickly ended up at ext3_invalidatepage, so it wasn't that hard :-)

  As try_to_free_buffers got its ext3 hack back in
  ecdfc9787fe527491baefc22dce8b2dbd5b2908d, maybe
  3e67c0987d7567ad41164a153dca9a43b11d should be reverted? (Except for
  the accounting fix in cancel_dirty_page, of course).
 
 Yes, I think we have room for cleanups now, and I agree: we ended up 
 reinstating some questionable code in the VM just because we didn't really 
 know or understand what was going on in the ext3 journal code. 

Hm, you attributed more to my mail than there was actually in it. I
didn't even start to think of cleanups (because I don't know jack about
the whole ext3/jdb stuff, so I simply cannot come up with any cleanups
(yet?)).What I meant is that we only did a half-revert of that hackery.

When try_to_free_buffers started to check for PG_dirty, the
cancel_dirty_page call had to be called before do_invalidatepage, to
fix a _huge_ leak.  But that caused the accouting breakage we're now
seeing, because we never account for the pages that got redirtied during
do_invalidatepage.

Then the change to try_to_free_buffers got reverted, so we no longer
need to call cancel_dirty_page before do_invalidatepage, but still we
do. Thus the accounting bug remains. So what I meant to suggest was
simply to actually finish the revert we started.

Or expressed as a patch:

diff --git a/mm/truncate.c b/mm/truncate.c
index cadc156..2974903 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -98,11 +98,11 @@ truncate_complete_page(struct address_space *mapping, 
struct page *page)
if (page-mapping != mapping)
return;
 
-   cancel_dirty_page(page, PAGE_CACHE_SIZE);
-
if (PagePrivate(page))
do_invalidatepage(page, 0);
 
+   cancel_dirty_page(page, PAGE_CACHE_SIZE);
+
remove_from_page_cache(page);
ClearPageUptodate(page);
ClearPageMappedToDisk(page);

I'll be the last one to comment on whether or not that causes inaccurate
accouting, so I'll just watch you and Jan battle that out until someone
comes up with a post-.24 patch to provide a clean fix for the issue.

Krzysztof, could you give this patch a test run?

If that fixes the problem for now, I'll try to come up with some
usable commit message, or if somehow wants to beat me to it, you can
already have my

Signed-off-by: Björn Steinbrink [EMAIL PROTECTED]

  On a side note, before 8368e328dfe1c534957051333a87b3210a12743b the task
  io accounting for cancelled writes happened always happened if the page
  was dirty, regardless of page-mapping. This was also already true for
  the old test_clear_page_dirty code, and the commit log for
  8368e328dfe1c534957051333a87b3210a12743b doesn't mention that semantic
  change either, so maybe the if (account_size) block should be moved
  out of the if (mapping  ...) block?
 
 I think the if (account_size) thing was *purely* for me being worried 
 about hugetlb entries, and I think that's the only thing that passes in a 
 zero account size.
 
 But hugetlbfs already has BDI_CAP_NO_ACCT_DIRTY set (exactly because we 
 cannot account

Re: Major regression on hackbench with SLUB

2007-12-09 Thread Björn Steinbrink
On 2007.12.08 17:16:24 -0500, Steven Rostedt wrote:
> 
> Hi Linus,
> 
> > On Fri, 7 Dec 2007, Linus Torvalds wrote:
> > >
> > > Can you do one run with oprofile, and see exactly where the cost is? It
> > > should hopefully be pretty darn obvious, considering your timing.
> 
> The results are here:
> 
> http://people.redhat.com/srostedt/slub/results/slab.op
> http://people.redhat.com/srostedt/slub/results/slub.op

Hm, you seem to be hitting the "another_slab" stuff in __slab_alloc
alot. I wonder if !node_match triggers too often. We always start with
the per cpu slab, if that one is on the wrong node, you'll always hit
that "another_slab" path.

After searching for way too long (given that I have no clue about that
stuff anyway and just read the code out of curiousness), I noticed that
the the cpu_to_node stuff on x86_64 seems to be initialized to 0xff
(arch/x86/mm/numa_64.c), and Google brought me this dmesg output [1],
which, AFAICT, shows that the per cpu slab setup is done _before_
cpu_to_node is correctly setup. That would lead to the per cpu slabs all
having node == 0xff, which looks pretty bad.

Disclaimer: I read the slub/numa/$WHATEVER_I_SAW_THERE for the first
time, so this might be total bull ;-)

Björn

[1] http://linux.derkeiler.com/Mailing-Lists/Kernel/2007-10/msg04648.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Major regression on hackbench with SLUB

2007-12-09 Thread Björn Steinbrink
On 2007.12.08 17:16:24 -0500, Steven Rostedt wrote:
 
 Hi Linus,
 
  On Fri, 7 Dec 2007, Linus Torvalds wrote:
  
   Can you do one run with oprofile, and see exactly where the cost is? It
   should hopefully be pretty darn obvious, considering your timing.
 
 The results are here:
 
 http://people.redhat.com/srostedt/slub/results/slab.op
 http://people.redhat.com/srostedt/slub/results/slub.op

Hm, you seem to be hitting the another_slab stuff in __slab_alloc
alot. I wonder if !node_match triggers too often. We always start with
the per cpu slab, if that one is on the wrong node, you'll always hit
that another_slab path.

After searching for way too long (given that I have no clue about that
stuff anyway and just read the code out of curiousness), I noticed that
the the cpu_to_node stuff on x86_64 seems to be initialized to 0xff
(arch/x86/mm/numa_64.c), and Google brought me this dmesg output [1],
which, AFAICT, shows that the per cpu slab setup is done _before_
cpu_to_node is correctly setup. That would lead to the per cpu slabs all
having node == 0xff, which looks pretty bad.

Disclaimer: I read the slub/numa/$WHATEVER_I_SAW_THERE for the first
time, so this might be total bull ;-)

Björn

[1] http://linux.derkeiler.com/Mailing-Lists/Kernel/2007-10/msg04648.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: git guidance

2007-12-07 Thread Björn Steinbrink
On 2007.12.07 13:53:11 +0300, Al Boldi wrote:
> Andreas Ericsson wrote:
> > So, to get to the bottom of this, which of the following workflows is it
> > you want git to support?
> >
> > ### WORKFLOW A ###
> > edit, edit, edit
> > edit, edit, edit
> > edit, edit, edit
> > Oops I made a mistake and need to hop back to "current - 12".
> > edit, edit, edit
> > edit, edit, edit
> > publish everything, similar to just tarring up your workdir and sending
> > out ### END WORKFLOW A ###
> >
> > ### WORKFLOW B ###
> > edit, edit, edit
> > ok this looks good, I want to save a checkpoint here
> > edit, edit, edit
> > looks good again. next checkpoint
> > edit, edit, edit
> > oh crap, back to checkpoint 2
> > edit, edit, edit
> > ooh, that's better. save a checkpoint and publish those checkpoints
> > ### END WORKFLOW B ###
> 
> ### WORKFLOW C ###
> for every save on a gitfs mounted dir, do an implied checkpoint, commit, or 
> publish (should be adjustable), on its privately created on-the-fly 
> repository.
> ### END WORKFLOW C ###
> 
> For example:
> 
>   echo "// last comment on this file" >> /gitfs.mounted/file
> 
> should do an implied checkpoint, and make these checkpoints immediately 
> visible under some checkpoint branch of the gitfs mounted dir.
> 
> Note, this way the developer gets version control without even noticing, and 
> works completely transparent to any kind of application.

Ouch... That looks worse than "plain" per-file versioning. Not only do
you per definition get "broken" commits if there's a change that affects
two dependent files, you also get an insane amount of commits just for
testing stuff, or fixing bugs.

And unless you use some kind of union-fs on top (or keep ignored files
in special unversioned area in your gitfs, which seems somewhat ugly),
you'll probably also have to track lots of files in the working
directory that are generated, unless you want to re-generate them after
each reboot. And that leads to even more absolutely useless revisions.

Just thinking of my vim .swp files (which I definitely don't want to
loose on a crash/power outtage/pkill -9 . dammit) makes me scream
because of the gazillion of commits they will produce (and no, I don't
want them in some special out of tree directory).

Plus, I have vim setup to _replace_ files on write, so that I can more
easily use hard-linked copies with changing all copies at once _unless_
I explicitly want to, meaning that I'd get full remove/add commits,
which are absolutely useless. And trying to detect such patterns
(rename, then write the changed file with the old name and then delete
the renamed file) is probably not worth the trouble, because you
coincidently might _want_ to have just these three steps recorded when
you happen to perform them manually. And if you go for heuristics,
you'll complain each time you get a false-positive/negative.


That said, out of pure curiousness I came up with the attached script
which just uses inotifywait to watch a directory and issue git commands
on certain events. It is extremely stupid, but seems to work. And at
least it hasn't got the drawbacks of a real gitfs regarding the need to
have a "separate" non-versioned storage area for the working directory,
because it simply uses the existing working directory wherever that
might be stored. It doesn't use GIT_DIR/WORK_DIR yet, but hey, should be
easy to add...

Feel free to mess with that thing, hey, maybe you even like it and
extend it to match your proposed workflow even more. I for sure won't
use or even extend it, so you're likely on your own there.

Side-note: Writing that script probably took less time than writing this
email and probably less time than was wasted on this topic. Makes me
want to use today's preferred "Code talks, b...s... walks" statement,
but I'll refrain from that... Just because I lack the credibility to say
that, and the script attached is quite crappy ;-)

Björn
#!/bin/bash
inotifywait -m -r --exclude ^\./\.git/.* -e close_write -e move -e create -e 
delete . 2>/dev/null |
while read FILE_PATH EVENT FILE_NAME
do
FILE_NAME="$FILE_PATH$FILE_NAME"
FILE_NAME=${FILE_NAME#./}

# git doesn't care about directories
if [ -d "$FILE_NAME" ]
then
continue
fi

case "$EVENT" in
*CLOSE_WRITE*)
ACTION=change
;;
*MOVED_TO*)
ACTION=create
;;
*MODIFY*)
ACTION=change
;;
*DELETE*)
ACTION=delete
;;
*MOVED_FROM*)
ACTION=delete
;;
*CREATE*)
ACTION=create
;;
*)
continue
;;
esac

case $ACTION in
create)
git add "$FILE_NAME"
git commit -m "$FILE_NAME created"
;;
   

Re: git guidance

2007-12-07 Thread Björn Steinbrink
On 2007.12.07 13:53:11 +0300, Al Boldi wrote:
 Andreas Ericsson wrote:
  So, to get to the bottom of this, which of the following workflows is it
  you want git to support?
 
  ### WORKFLOW A ###
  edit, edit, edit
  edit, edit, edit
  edit, edit, edit
  Oops I made a mistake and need to hop back to current - 12.
  edit, edit, edit
  edit, edit, edit
  publish everything, similar to just tarring up your workdir and sending
  out ### END WORKFLOW A ###
 
  ### WORKFLOW B ###
  edit, edit, edit
  ok this looks good, I want to save a checkpoint here
  edit, edit, edit
  looks good again. next checkpoint
  edit, edit, edit
  oh crap, back to checkpoint 2
  edit, edit, edit
  ooh, that's better. save a checkpoint and publish those checkpoints
  ### END WORKFLOW B ###
 
 ### WORKFLOW C ###
 for every save on a gitfs mounted dir, do an implied checkpoint, commit, or 
 publish (should be adjustable), on its privately created on-the-fly 
 repository.
 ### END WORKFLOW C ###
 
 For example:
 
   echo // last comment on this file  /gitfs.mounted/file
 
 should do an implied checkpoint, and make these checkpoints immediately 
 visible under some checkpoint branch of the gitfs mounted dir.
 
 Note, this way the developer gets version control without even noticing, and 
 works completely transparent to any kind of application.

Ouch... That looks worse than plain per-file versioning. Not only do
you per definition get broken commits if there's a change that affects
two dependent files, you also get an insane amount of commits just for
testing stuff, or fixing bugs.

And unless you use some kind of union-fs on top (or keep ignored files
in special unversioned area in your gitfs, which seems somewhat ugly),
you'll probably also have to track lots of files in the working
directory that are generated, unless you want to re-generate them after
each reboot. And that leads to even more absolutely useless revisions.

Just thinking of my vim .swp files (which I definitely don't want to
loose on a crash/power outtage/pkill -9 .ENTER dammit) makes me scream
because of the gazillion of commits they will produce (and no, I don't
want them in some special out of tree directory).

Plus, I have vim setup to _replace_ files on write, so that I can more
easily use hard-linked copies with changing all copies at once _unless_
I explicitly want to, meaning that I'd get full remove/add commits,
which are absolutely useless. And trying to detect such patterns
(rename, then write the changed file with the old name and then delete
the renamed file) is probably not worth the trouble, because you
coincidently might _want_ to have just these three steps recorded when
you happen to perform them manually. And if you go for heuristics,
you'll complain each time you get a false-positive/negative.


That said, out of pure curiousness I came up with the attached script
which just uses inotifywait to watch a directory and issue git commands
on certain events. It is extremely stupid, but seems to work. And at
least it hasn't got the drawbacks of a real gitfs regarding the need to
have a separate non-versioned storage area for the working directory,
because it simply uses the existing working directory wherever that
might be stored. It doesn't use GIT_DIR/WORK_DIR yet, but hey, should be
easy to add...

Feel free to mess with that thing, hey, maybe you even like it and
extend it to match your proposed workflow even more. I for sure won't
use or even extend it, so you're likely on your own there.

Side-note: Writing that script probably took less time than writing this
email and probably less time than was wasted on this topic. Makes me
want to use today's preferred Code talks, b...s... walks statement,
but I'll refrain from that... Just because I lack the credibility to say
that, and the script attached is quite crappy ;-)

Björn
#!/bin/bash
inotifywait -m -r --exclude ^\./\.git/.* -e close_write -e move -e create -e 
delete . 2/dev/null |
while read FILE_PATH EVENT FILE_NAME
do
FILE_NAME=$FILE_PATH$FILE_NAME
FILE_NAME=${FILE_NAME#./}

# git doesn't care about directories
if [ -d $FILE_NAME ]
then
continue
fi

case $EVENT in
*CLOSE_WRITE*)
ACTION=change
;;
*MOVED_TO*)
ACTION=create
;;
*MODIFY*)
ACTION=change
;;
*DELETE*)
ACTION=delete
;;
*MOVED_FROM*)
ACTION=delete
;;
*CREATE*)
ACTION=create
;;
*)
continue
;;
esac

case $ACTION in
create)
git add $FILE_NAME
git commit -m $FILE_NAME created
;;
delete)
git rm --cached $FILE_NAME

Re: [PATCH] fix adbhid mismerge

2007-10-17 Thread Björn Steinbrink
On 2007.10.17 18:18:21 +0200, Björn Steinbrink wrote:
> On 2007.10.16 19:21:53 -0700, Linus Torvalds wrote:
> > 
> > 
> > On Tue, 16 Oct 2007, Linus Torvalds wrote:
> > > 
> > > I don't think you did anything wrong. You used both --full-history 
> > > (implicitly: git-whatchanged) and you made sure to see the diffs for both 
> > > sides of any merge (-m), and that means that you should see every single 
> > > diff involved.
> > 
> > Btw, if anybody can come up with a better way to find these kinds of 
> > mis-merges, I'd love to hear about it.
> > 
> [...]
> > 
> > What I'd actually really like would be something that shows the original 
> > conflict, but that's really expensive to compute (it basically involves 
> > re-doing the merge from scratch - finding the proper base commit(s) etc). 
> > So we never did that.
> 
> So here's what I came up with:
> 
> git grep -l "int keycode, up_flag" \
>   $(git-rev-list HEAD --parents -- drivers/macintosh/adbhid.c | \
>   egrep '(.{40} ?){3}' | cut -d' ' -f1) \
>   -- drivers/macintosh/adbhid.c | grep -o '^[^:]*'
> 
> Which gives: b981d8b3f5e008ff10d993be633ad00564fc22cd
> 
> Then:
> git checkout b981d8b3f5e008ff10d993be633ad00564fc22cd^1
> git merge b981d8b3f5e008ff10d993be633ad00564fc22cd^2
> 
> And you got your merge conflict.
> 
> The idea is, that the above ugliness searches for the last commit that

Oops!
Obviously I meant to do s/last commit/merge commit/ before sending that
email.

> produced the bad line. The inner git-rev-list call searches for merge
> commits (thanks to Ilari in #git for the egrep trick), then git-grep
> looks which of these have the "bad line" and the final grep just filters
> the filename out.
> 
> If the bash thing spits out more than one commit hash, you probably want
> to use the last one... I guess... And if the given result doesn't
> produce the request merge conflict, well, I guess you could replace HEAD
> in the git-rev-list call with the sha1 you got in the first run, but I'm
> not entirely sure about that.
> 
> Is that helpful?
> 
> Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] fix adbhid mismerge

2007-10-17 Thread Björn Steinbrink
On 2007.10.16 19:21:53 -0700, Linus Torvalds wrote:
> 
> 
> On Tue, 16 Oct 2007, Linus Torvalds wrote:
> > 
> > I don't think you did anything wrong. You used both --full-history 
> > (implicitly: git-whatchanged) and you made sure to see the diffs for both 
> > sides of any merge (-m), and that means that you should see every single 
> > diff involved.
> 
> Btw, if anybody can come up with a better way to find these kinds of 
> mis-merges, I'd love to hear about it.
> 
[...]
> 
> What I'd actually really like would be something that shows the original 
> conflict, but that's really expensive to compute (it basically involves 
> re-doing the merge from scratch - finding the proper base commit(s) etc). 
> So we never did that.

So here's what I came up with:

git grep -l "int keycode, up_flag" \
$(git-rev-list HEAD --parents -- drivers/macintosh/adbhid.c | \
egrep '(.{40} ?){3}' | cut -d' ' -f1) \
-- drivers/macintosh/adbhid.c | grep -o '^[^:]*'

Which gives: b981d8b3f5e008ff10d993be633ad00564fc22cd

Then:
git checkout b981d8b3f5e008ff10d993be633ad00564fc22cd^1
git merge b981d8b3f5e008ff10d993be633ad00564fc22cd^2

And you got your merge conflict.

The idea is, that the above ugliness searches for the last commit that
produced the bad line. The inner git-rev-list call searches for merge
commits (thanks to Ilari in #git for the egrep trick), then git-grep
looks which of these have the "bad line" and the final grep just filters
the filename out.

If the bash thing spits out more than one commit hash, you probably want
to use the last one... I guess... And if the given result doesn't
produce the request merge conflict, well, I guess you could replace HEAD
in the git-rev-list call with the sha1 you got in the first run, but I'm
not entirely sure about that.

Is that helpful?

Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: arcmsr changelog differs from diffs

2007-10-17 Thread Björn Steinbrink
Hi Hans,

On 2007.10.17 12:41:11 +0200, Hans-Peter Jansen wrote:
> Dear Björn,
> 
> Am Mittwoch, 17. Oktober 2007 02:53 schrieb Björn Steinbrink:
> > And then, if I may guess, James probably just noticed that there were
> > changes left and commited them (while they were now down to just the
> > whitespace change), without checking what changes were actually left (no
> > offense intended). At least I think that git wouldn't cripple the diff
> > if the changes that James checked in were not already whitespace-only at
> > the time he commited them, and the git history of his tree seems to
> > agree.
> 
> (*) Thanks for the clarification, Björn. What made me stumbling was the 
> different orginators, while Jeff isn't known with catching attention by 
> assimilating other peoples contributions. 

I'm just guessing here, and certainly don't want to blame anyone, but
Jeff's original patch was from July, while the only date I can find for
Nick's commit is September 13th. So it was probably Nick who integrated
Jeff's changes into his commit (maybe thinking that Jeff's patch got
lost), not Jeff assimilating a subset of Nick's changes.

Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] fix adbhid mismerge

2007-10-17 Thread Björn Steinbrink
On 2007.10.17 18:18:21 +0200, Björn Steinbrink wrote:
 On 2007.10.16 19:21:53 -0700, Linus Torvalds wrote:
  
  
  On Tue, 16 Oct 2007, Linus Torvalds wrote:
   
   I don't think you did anything wrong. You used both --full-history 
   (implicitly: git-whatchanged) and you made sure to see the diffs for both 
   sides of any merge (-m), and that means that you should see every single 
   diff involved.
  
  Btw, if anybody can come up with a better way to find these kinds of 
  mis-merges, I'd love to hear about it.
  
 [...]
  
  What I'd actually really like would be something that shows the original 
  conflict, but that's really expensive to compute (it basically involves 
  re-doing the merge from scratch - finding the proper base commit(s) etc). 
  So we never did that.
 
 So here's what I came up with:
 
 git grep -l int keycode, up_flag \
   $(git-rev-list HEAD --parents -- drivers/macintosh/adbhid.c | \
   egrep '(.{40} ?){3}' | cut -d' ' -f1) \
   -- drivers/macintosh/adbhid.c | grep -o '^[^:]*'
 
 Which gives: b981d8b3f5e008ff10d993be633ad00564fc22cd
 
 Then:
 git checkout b981d8b3f5e008ff10d993be633ad00564fc22cd^1
 git merge b981d8b3f5e008ff10d993be633ad00564fc22cd^2
 
 And you got your merge conflict.
 
 The idea is, that the above ugliness searches for the last commit that

Oops!
Obviously I meant to do s/last commit/merge commit/ before sending that
email.

 produced the bad line. The inner git-rev-list call searches for merge
 commits (thanks to Ilari in #git for the egrep trick), then git-grep
 looks which of these have the bad line and the final grep just filters
 the filename out.
 
 If the bash thing spits out more than one commit hash, you probably want
 to use the last one... I guess... And if the given result doesn't
 produce the request merge conflict, well, I guess you could replace HEAD
 in the git-rev-list call with the sha1 you got in the first run, but I'm
 not entirely sure about that.
 
 Is that helpful?
 
 Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: arcmsr changelog differs from diffs

2007-10-17 Thread Björn Steinbrink
Hi Hans,

On 2007.10.17 12:41:11 +0200, Hans-Peter Jansen wrote:
 Dear Björn,
 
 Am Mittwoch, 17. Oktober 2007 02:53 schrieb Björn Steinbrink:
  And then, if I may guess, James probably just noticed that there were
  changes left and commited them (while they were now down to just the
  whitespace change), without checking what changes were actually left (no
  offense intended). At least I think that git wouldn't cripple the diff
  if the changes that James checked in were not already whitespace-only at
  the time he commited them, and the git history of his tree seems to
  agree.
 
 (*) Thanks for the clarification, Björn. What made me stumbling was the 
 different orginators, while Jeff isn't known with catching attention by 
 assimilating other peoples contributions. 

I'm just guessing here, and certainly don't want to blame anyone, but
Jeff's original patch was from July, while the only date I can find for
Nick's commit is September 13th. So it was probably Nick who integrated
Jeff's changes into his commit (maybe thinking that Jeff's patch got
lost), not Jeff assimilating a subset of Nick's changes.

Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] fix adbhid mismerge

2007-10-17 Thread Björn Steinbrink
On 2007.10.16 19:21:53 -0700, Linus Torvalds wrote:
 
 
 On Tue, 16 Oct 2007, Linus Torvalds wrote:
  
  I don't think you did anything wrong. You used both --full-history 
  (implicitly: git-whatchanged) and you made sure to see the diffs for both 
  sides of any merge (-m), and that means that you should see every single 
  diff involved.
 
 Btw, if anybody can come up with a better way to find these kinds of 
 mis-merges, I'd love to hear about it.
 
[...]
 
 What I'd actually really like would be something that shows the original 
 conflict, but that's really expensive to compute (it basically involves 
 re-doing the merge from scratch - finding the proper base commit(s) etc). 
 So we never did that.

So here's what I came up with:

git grep -l int keycode, up_flag \
$(git-rev-list HEAD --parents -- drivers/macintosh/adbhid.c | \
egrep '(.{40} ?){3}' | cut -d' ' -f1) \
-- drivers/macintosh/adbhid.c | grep -o '^[^:]*'

Which gives: b981d8b3f5e008ff10d993be633ad00564fc22cd

Then:
git checkout b981d8b3f5e008ff10d993be633ad00564fc22cd^1
git merge b981d8b3f5e008ff10d993be633ad00564fc22cd^2

And you got your merge conflict.

The idea is, that the above ugliness searches for the last commit that
produced the bad line. The inner git-rev-list call searches for merge
commits (thanks to Ilari in #git for the egrep trick), then git-grep
looks which of these have the bad line and the final grep just filters
the filename out.

If the bash thing spits out more than one commit hash, you probably want
to use the last one... I guess... And if the given result doesn't
produce the request merge conflict, well, I guess you could replace HEAD
in the git-rev-list call with the sha1 you got in the first run, but I'm
not entirely sure about that.

Is that helpful?

Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: arcmsr changelog differs from diffs

2007-10-16 Thread Björn Steinbrink
On 2007.10.17 00:07:19 +0200, Hans-Peter Jansen wrote:
> Hi Jeff,
> 
> while browsing through Linus' current check ins, I stumbled upon:
> 
> [SCSI] arcmsr: irq handler fixes, cleanups, micro-opts:
> 
> --8<--
> 488a5c8a9a3b67ae117784cd0d73bef53a73d57d
>  drivers/scsi/arcmsr/arcmsr_hba.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c 
> b/drivers/scsi/arcmsr/arcmsr_hba.c
> index 7832a10..f4d2d52 100644
> --- a/drivers/scsi/arcmsr/arcmsr_hba.c
> +++ b/drivers/scsi/arcmsr/arcmsr_hba.c
> @@ -422,7 +422,7 @@ static int arcmsr_probe(struct pci_dev *pdev,
>   goto out_release_regions;
>  
>   error = request_irq(pdev->irq, arcmsr_do_interrupt,
> - IRQF_SHARED, "arcmsr", acb);
> + IRQF_SHARED, "arcmsr", acb);
>   if (error)
>   goto out_free_ccb_pool;
>  
> -->8--
> 
> and: [SCSI] arcmsr: Fix hardware wait loops
> 
> --8<--
> 24430458bb924e371ff894e26bfa9f73707f53fb
>  drivers/scsi/arcmsr/arcmsr_hba.c |2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c 
> b/drivers/scsi/arcmsr/arcmsr_hba.c
> index 50e1310..7832a10 100644
> --- a/drivers/scsi/arcmsr/arcmsr_hba.c
> +++ b/drivers/scsi/arcmsr/arcmsr_hba.c
> @@ -2092,8 +2092,10 @@ static void arcmsr_iop_reset(struct 
> AdapterControlBlock *acb)
>   if (atomic_read(>ccboutstandingcount) != 0) {
>   /* talk to iop 331 outstanding command aborted */
>   arcmsr_abort_allcmd(acb);
> +
>   /* wait for 3 sec for all command aborted*/
>   ssleep(3);
> +
>   /* disable all outbound interrupt */
>   intmask_org = arcmsr_disable_outbound_ints(acb);
>   /* clear all outbound posted Q */
> -->8--
> 
> where both changelogs differ significantly from the actual diffs, which both 
> are simple WS fixups and nothing else. Does qgit fools me here, or is 
> anything else wrong on my side?

Nothing wrong on your side. I took a look at the second one, and
everything but the whitespace changes already found its way into Linus'
tree via 1a4f550a09f89e3a15eff1971bc9db977571b9f6. One hunk of the
original patch[1] was actually made redundant because the code was
removed in that commit, so that's probably what James fixed (see full
commit message).

And then, if I may guess, James probably just noticed that there were
changes left and commited them (while they were now down to just the
whitespace change), without checking what changes were actually left (no
offense intended). At least I think that git wouldn't cripple the diff
if the changes that James checked in were not already whitespace-only at
the time he commited them, and the git history of his tree seems to
agree.

Probably the other commit is similar.

Björn

[1] http://linux.derkeiler.com/Mailing-Lists/Kernel/2007-07/msg11957.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: arcmsr changelog differs from diffs

2007-10-16 Thread Björn Steinbrink
On 2007.10.17 00:07:19 +0200, Hans-Peter Jansen wrote:
 Hi Jeff,
 
 while browsing through Linus' current check ins, I stumbled upon:
 
 [SCSI] arcmsr: irq handler fixes, cleanups, micro-opts:
 
 --8--
 488a5c8a9a3b67ae117784cd0d73bef53a73d57d
  drivers/scsi/arcmsr/arcmsr_hba.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c 
 b/drivers/scsi/arcmsr/arcmsr_hba.c
 index 7832a10..f4d2d52 100644
 --- a/drivers/scsi/arcmsr/arcmsr_hba.c
 +++ b/drivers/scsi/arcmsr/arcmsr_hba.c
 @@ -422,7 +422,7 @@ static int arcmsr_probe(struct pci_dev *pdev,
   goto out_release_regions;
  
   error = request_irq(pdev-irq, arcmsr_do_interrupt,
 - IRQF_SHARED, arcmsr, acb);
 + IRQF_SHARED, arcmsr, acb);
   if (error)
   goto out_free_ccb_pool;
  
 --8--
 
 and: [SCSI] arcmsr: Fix hardware wait loops
 
 --8--
 24430458bb924e371ff894e26bfa9f73707f53fb
  drivers/scsi/arcmsr/arcmsr_hba.c |2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c 
 b/drivers/scsi/arcmsr/arcmsr_hba.c
 index 50e1310..7832a10 100644
 --- a/drivers/scsi/arcmsr/arcmsr_hba.c
 +++ b/drivers/scsi/arcmsr/arcmsr_hba.c
 @@ -2092,8 +2092,10 @@ static void arcmsr_iop_reset(struct 
 AdapterControlBlock *acb)
   if (atomic_read(acb-ccboutstandingcount) != 0) {
   /* talk to iop 331 outstanding command aborted */
   arcmsr_abort_allcmd(acb);
 +
   /* wait for 3 sec for all command aborted*/
   ssleep(3);
 +
   /* disable all outbound interrupt */
   intmask_org = arcmsr_disable_outbound_ints(acb);
   /* clear all outbound posted Q */
 --8--
 
 where both changelogs differ significantly from the actual diffs, which both 
 are simple WS fixups and nothing else. Does qgit fools me here, or is 
 anything else wrong on my side?

Nothing wrong on your side. I took a look at the second one, and
everything but the whitespace changes already found its way into Linus'
tree via 1a4f550a09f89e3a15eff1971bc9db977571b9f6. One hunk of the
original patch[1] was actually made redundant because the code was
removed in that commit, so that's probably what James fixed (see full
commit message).

And then, if I may guess, James probably just noticed that there were
changes left and commited them (while they were now down to just the
whitespace change), without checking what changes were actually left (no
offense intended). At least I think that git wouldn't cripple the diff
if the changes that James checked in were not already whitespace-only at
the time he commited them, and the git history of his tree seems to
agree.

Probably the other commit is similar.

Björn

[1] http://linux.derkeiler.com/Mailing-Lists/Kernel/2007-07/msg11957.html
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: what is the rationale for "TAINT_USER"?

2007-10-12 Thread Björn Steinbrink
On 2007.10.12 08:37:04 -0400, Robert P. J. Day wrote:
> On Fri, 12 Oct 2007, Björn Steinbrink wrote:
> 
> > On 2007.10.12 08:04:20 -0400, Robert P. J. Day wrote:
> > >
> > >   i can see what the theoretical purpose for it is here:
> > >
> > > http://kerneltrap.org/node/6656
> > >
> > > but it's not clear how it can possibly be set from userland given
> > > that:
> > >
> > > $ grep -r TAINT_USER *
> > > include/linux/kernel.h:#define TAINT_USER   (1<<6)
> > > kernel/panic.c: tainted & TAINT_USER ? 'U' : ' ',
> > > $
> > >
> > >   am i missing something screamingly obvious?
> >
> > Grepping for "tainted" leads me to:
> >
> > echo 32 > /proc/sys/kernel/tainted
> 
> ???.  i have no idea what you were grepping through to find that
> phrase, but TAINT_USER would seem to be equivalent to echo 64, not
> echo 32, anyway, no?

Oops, yeah, 64... And it didn't lead me to the exact phrase, but the
sysctl handler, which (implicitly) led me to that command. Sorry for
the confusion.

Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: what is the rationale for "TAINT_USER"?

2007-10-12 Thread Björn Steinbrink
On 2007.10.12 08:04:20 -0400, Robert P. J. Day wrote:
> 
>   i can see what the theoretical purpose for it is here:
> 
> http://kerneltrap.org/node/6656
> 
> but it's not clear how it can possibly be set from userland given
> that:
> 
> $ grep -r TAINT_USER *
> include/linux/kernel.h:#define TAINT_USER   (1<<6)
> kernel/panic.c: tainted & TAINT_USER ? 'U' : ' ',
> $
> 
>   am i missing something screamingly obvious?

Grepping for "tainted" leads me to:

echo 32 > /proc/sys/kernel/tainted

Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: NMI watchdog

2007-10-12 Thread Björn Steinbrink
On 2007.10.12 11:18:24 +0200, John Sigler wrote:
> Hello,
>
> I'm experiencing a full system lockup. I'm using an out-of-tree driver 
> which I suspect is responsible. I'm trying to enable the NMI watchdog.
>
> # cat /proc/version
> Linux version 2.6.22.1-rt9 (gcc version 3.4.6) #1 PREEMPT RT Tue Oct 9 
> 12:25:47 CEST 2007
>
> # cat /proc/cmdline
> ro root=/dev/hdc1 console=ttyS0,57600n8 console=tty0 panic=3 apic=debug 
> nmi_watchdog=2
>
> However, after boot, the NMI count does not change.
>
> # cat /proc/interrupts ; sleep 10 ; cat /proc/interrupts

Try running some cpu hog in the background. The performance counters get
increased only when the CPU is actually doing something. On a mostly
idle system, it can take quite a while for the next NMI to show up.

Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: what is the rationale for TAINT_USER?

2007-10-12 Thread Björn Steinbrink
On 2007.10.12 08:37:04 -0400, Robert P. J. Day wrote:
 On Fri, 12 Oct 2007, Björn Steinbrink wrote:
 
  On 2007.10.12 08:04:20 -0400, Robert P. J. Day wrote:
  
 i can see what the theoretical purpose for it is here:
  
   http://kerneltrap.org/node/6656
  
   but it's not clear how it can possibly be set from userland given
   that:
  
   $ grep -r TAINT_USER *
   include/linux/kernel.h:#define TAINT_USER   (16)
   kernel/panic.c: tainted  TAINT_USER ? 'U' : ' ',
   $
  
 am i missing something screamingly obvious?
 
  Grepping for tainted leads me to:
 
  echo 32  /proc/sys/kernel/tainted
 
 ???.  i have no idea what you were grepping through to find that
 phrase, but TAINT_USER would seem to be equivalent to echo 64, not
 echo 32, anyway, no?

Oops, yeah, 64... And it didn't lead me to the exact phrase, but the
sysctl handler, which (implicitly) led me to that command. Sorry for
the confusion.

Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: NMI watchdog

2007-10-12 Thread Björn Steinbrink
On 2007.10.12 11:18:24 +0200, John Sigler wrote:
 Hello,

 I'm experiencing a full system lockup. I'm using an out-of-tree driver 
 which I suspect is responsible. I'm trying to enable the NMI watchdog.

 # cat /proc/version
 Linux version 2.6.22.1-rt9 (gcc version 3.4.6) #1 PREEMPT RT Tue Oct 9 
 12:25:47 CEST 2007

 # cat /proc/cmdline
 ro root=/dev/hdc1 console=ttyS0,57600n8 console=tty0 panic=3 apic=debug 
 nmi_watchdog=2

 However, after boot, the NMI count does not change.

 # cat /proc/interrupts ; sleep 10 ; cat /proc/interrupts

Try running some cpu hog in the background. The performance counters get
increased only when the CPU is actually doing something. On a mostly
idle system, it can take quite a while for the next NMI to show up.

Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: what is the rationale for TAINT_USER?

2007-10-12 Thread Björn Steinbrink
On 2007.10.12 08:04:20 -0400, Robert P. J. Day wrote:
 
   i can see what the theoretical purpose for it is here:
 
 http://kerneltrap.org/node/6656
 
 but it's not clear how it can possibly be set from userland given
 that:
 
 $ grep -r TAINT_USER *
 include/linux/kernel.h:#define TAINT_USER   (16)
 kernel/panic.c: tainted  TAINT_USER ? 'U' : ' ',
 $
 
   am i missing something screamingly obvious?

Grepping for tainted leads me to:

echo 32  /proc/sys/kernel/tainted

Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: time_after - what on earth???

2007-09-11 Thread Björn Steinbrink
On 2007.09.12 00:10:19 +0100, Adrian McMenamin wrote:
> On 12/09/2007, Björn Steinbrink <[EMAIL PROTECTED]> wrote:
> > On 2007.09.12 00:19:09 +0200, Rene Herman wrote:
> > > On 09/12/2007 12:15 AM, Adrian McMenamin wrote:
> > >
> > >> On 11/09/2007, Rene Herman <[EMAIL PROTECTED]> wrote:
> > >>> On 09/12/2007 12:05 AM, Adrian McMenamin wrote:
> > >>>
> > >>>> OK, why does this line occasionally return true:
> >
> > What exactly is "occassionally"?  Does it happen more than once per
> > boot? If not, and it happens after a certain time after booting, it
> > might be wrapping of the jiffie counter (see below).
> >
> > >>>>
> > >>>>   if ((maple_dev->interval > 0) && (jiffies >maple_dev->when))
> > >>>>
> > >>>> while this one never does (no other changes made):
> > >>>>
> > >>>> if  ((maple_dev->interval > 0) && (time_after(jiffies,
> > >>>> maple_dev->when)))
> > >>> Is maple_dev->when an unsigned long?
> > >>>
> > >> Yes. Does that make a difference?
> > >
> > > If it had been a signed type, it could've wrapped to something you didn't
> > > expect, explaining the difference at least...
> > >
> > > With an unsigned long, the only diference should be that time_after() 
> > > deals
> > > with jiffie wrapping which I assume is not an actual problem here. I'll
> > > retreat into the shades again... ;-(
> >
> > If "occasionally" is limited to once per boot, it might be jiffie
> > wrapping. IIRC jiffies are initialized so that they wrap after about 5
> > minutes of uptime to reveal such bugs without forcing you to wait for
> > ages just to have the counter wrap for the first time.
> >
> 
> No, I mean "works properly" - ie occasionally evaluates as true

Ehrm, yeah, I somehow parsed that as if it had a negation in there.

Anyway, I looked up the patches you posted. "when" is initialized to 0
and only changed if the above condition evaluates to true. Now,
time_after and "<" have different points at which "future" and "past"
are separated. time_after splits (about) equally between future and
past, so 0 can be either, depending on the value of jiffies. But for "<"
0 is almost always in the past, except for the seldom event of jiffies
being 0.

Now, given that jiffies start out at a huge value to make the counter
wrap around early, 0 happens to be in the "future" for time_after, until
the wrap around occurs. So in your case, you just might have to wait
those 5 minutes to get the working behaviour, instead of the common case
in which it breaks after that time ;-)

A fix would likely initialize "when" to jiffies.

Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: time_after - what on earth???

2007-09-11 Thread Björn Steinbrink
On 2007.09.12 00:19:09 +0200, Rene Herman wrote:
> On 09/12/2007 12:15 AM, Adrian McMenamin wrote:
>
>> On 11/09/2007, Rene Herman <[EMAIL PROTECTED]> wrote:
>>> On 09/12/2007 12:05 AM, Adrian McMenamin wrote:
>>>
 OK, why does this line occasionally return true:

What exactly is "occassionally"?  Does it happen more than once per
boot? If not, and it happens after a certain time after booting, it
might be wrapping of the jiffie counter (see below).


   if ((maple_dev->interval > 0) && (jiffies >maple_dev->when))

 while this one never does (no other changes made):

 if  ((maple_dev->interval > 0) && (time_after(jiffies, 
 maple_dev->when)))
>>> Is maple_dev->when an unsigned long?
>>>
>> Yes. Does that make a difference?
>
> If it had been a signed type, it could've wrapped to something you didn't 
> expect, explaining the difference at least...
>
> With an unsigned long, the only diference should be that time_after() deals 
> with jiffie wrapping which I assume is not an actual problem here. I'll 
> retreat into the shades again... ;-(

If "occasionally" is limited to once per boot, it might be jiffie
wrapping. IIRC jiffies are initialized so that they wrap after about 5
minutes of uptime to reveal such bugs without forcing you to wait for
ages just to have the counter wrap for the first time.

Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: time_after - what on earth???

2007-09-11 Thread Björn Steinbrink
On 2007.09.12 00:19:09 +0200, Rene Herman wrote:
 On 09/12/2007 12:15 AM, Adrian McMenamin wrote:

 On 11/09/2007, Rene Herman [EMAIL PROTECTED] wrote:
 On 09/12/2007 12:05 AM, Adrian McMenamin wrote:

 OK, why does this line occasionally return true:

What exactly is occassionally?  Does it happen more than once per
boot? If not, and it happens after a certain time after booting, it
might be wrapping of the jiffie counter (see below).


   if ((maple_dev-interval  0)  (jiffies maple_dev-when))

 while this one never does (no other changes made):

 if  ((maple_dev-interval  0)  (time_after(jiffies, 
 maple_dev-when)))
 Is maple_dev-when an unsigned long?

 Yes. Does that make a difference?

 If it had been a signed type, it could've wrapped to something you didn't 
 expect, explaining the difference at least...

 With an unsigned long, the only diference should be that time_after() deals 
 with jiffie wrapping which I assume is not an actual problem here. I'll 
 retreat into the shades again... ;-(

If occasionally is limited to once per boot, it might be jiffie
wrapping. IIRC jiffies are initialized so that they wrap after about 5
minutes of uptime to reveal such bugs without forcing you to wait for
ages just to have the counter wrap for the first time.

Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: time_after - what on earth???

2007-09-11 Thread Björn Steinbrink
On 2007.09.12 00:10:19 +0100, Adrian McMenamin wrote:
 On 12/09/2007, Björn Steinbrink [EMAIL PROTECTED] wrote:
  On 2007.09.12 00:19:09 +0200, Rene Herman wrote:
   On 09/12/2007 12:15 AM, Adrian McMenamin wrote:
  
   On 11/09/2007, Rene Herman [EMAIL PROTECTED] wrote:
   On 09/12/2007 12:05 AM, Adrian McMenamin wrote:
  
   OK, why does this line occasionally return true:
 
  What exactly is occassionally?  Does it happen more than once per
  boot? If not, and it happens after a certain time after booting, it
  might be wrapping of the jiffie counter (see below).
 
  
 if ((maple_dev-interval  0)  (jiffies maple_dev-when))
  
   while this one never does (no other changes made):
  
   if  ((maple_dev-interval  0)  (time_after(jiffies,
   maple_dev-when)))
   Is maple_dev-when an unsigned long?
  
   Yes. Does that make a difference?
  
   If it had been a signed type, it could've wrapped to something you didn't
   expect, explaining the difference at least...
  
   With an unsigned long, the only diference should be that time_after() 
   deals
   with jiffie wrapping which I assume is not an actual problem here. I'll
   retreat into the shades again... ;-(
 
  If occasionally is limited to once per boot, it might be jiffie
  wrapping. IIRC jiffies are initialized so that they wrap after about 5
  minutes of uptime to reveal such bugs without forcing you to wait for
  ages just to have the counter wrap for the first time.
 
 
 No, I mean works properly - ie occasionally evaluates as true

Ehrm, yeah, I somehow parsed that as if it had a negation in there.

Anyway, I looked up the patches you posted. when is initialized to 0
and only changed if the above condition evaluates to true. Now,
time_after and  have different points at which future and past
are separated. time_after splits (about) equally between future and
past, so 0 can be either, depending on the value of jiffies. But for 
0 is almost always in the past, except for the seldom event of jiffies
being 0.

Now, given that jiffies start out at a huge value to make the counter
wrap around early, 0 happens to be in the future for time_after, until
the wrap around occurs. So in your case, you just might have to wait
those 5 minutes to get the working behaviour, instead of the common case
in which it breaks after that time ;-)

A fix would likely initialize when to jiffies.

Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Rescanning SCSI/SATA bus

2007-09-05 Thread Björn Steinbrink
On 2007.09.05 16:45:29 +0200, noah wrote:
> 2007/9/5, Björn Steinbrink <[EMAIL PROTECTED]>:
> > On 2007.09.05 16:04:13 +0200, noah wrote:
> > > 2007/9/5, Martin K. Petersen <[EMAIL PROTECTED]>:
> > > > >>>>> "Jan" == Jan Engelhardt <[EMAIL PROTECTED]> writes:
> > > >
> > > > Jan> 11:40 sun:../scsi_host/host0 # echo 1 >scan
> > > > Jan> -bash: echo: write error: Invalid argument
> > > >
> > > > Jan> What is the proper way to trigger a rescan, if it comes to be
> > > > Jan> necessary?
> > > >
> > > > echo "- - -" > scan
> >   ^
> > > >
> > > # echo --- > /sys/class/scsi_host/host9/scan
> >  ^^^
> > > -bash: echo: write error: Invalid argument
> >
> > Note the spaces between the dashes.
> 
> Thanks. Unfortunately it produced the same error.
> Is rescanning even supported with the sata_nv driver?

Works just fine here:
[EMAIL PROTECTED]:/sys/class/scsi_host/host0# echo - - - > scan 
[EMAIL PROTECTED]:/sys/class/scsi_host/host0# cat proc_name 
sata_nv

Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Rescanning SCSI/SATA bus

2007-09-05 Thread Björn Steinbrink
On 2007.09.05 16:04:13 +0200, noah wrote:
> 2007/9/5, Martin K. Petersen <[EMAIL PROTECTED]>:
> > > "Jan" == Jan Engelhardt <[EMAIL PROTECTED]> writes:
> >
> > Jan> 11:40 sun:../scsi_host/host0 # echo 1 >scan
> > Jan> -bash: echo: write error: Invalid argument
> >
> > Jan> What is the proper way to trigger a rescan, if it comes to be
> > Jan> necessary?
> >
> > echo "- - -" > scan
  ^
> >
> # echo --- > /sys/class/scsi_host/host9/scan
 ^^^
> -bash: echo: write error: Invalid argument

Note the spaces between the dashes.
 - - -
not
 ---

Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Rescanning SCSI/SATA bus

2007-09-05 Thread Björn Steinbrink
On 2007.09.05 16:04:13 +0200, noah wrote:
 2007/9/5, Martin K. Petersen [EMAIL PROTECTED]:
   Jan == Jan Engelhardt [EMAIL PROTECTED] writes:
 
  Jan 11:40 sun:../scsi_host/host0 # echo 1 scan
  Jan -bash: echo: write error: Invalid argument
 
  Jan What is the proper way to trigger a rescan, if it comes to be
  Jan necessary?
 
  echo - - -  scan
  ^
 
 # echo ---  /sys/class/scsi_host/host9/scan
 ^^^
 -bash: echo: write error: Invalid argument

Note the spaces between the dashes.
 - - -
not
 ---

Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Rescanning SCSI/SATA bus

2007-09-05 Thread Björn Steinbrink
On 2007.09.05 16:45:29 +0200, noah wrote:
 2007/9/5, Björn Steinbrink [EMAIL PROTECTED]:
  On 2007.09.05 16:04:13 +0200, noah wrote:
   2007/9/5, Martin K. Petersen [EMAIL PROTECTED]:
 Jan == Jan Engelhardt [EMAIL PROTECTED] writes:
   
Jan 11:40 sun:../scsi_host/host0 # echo 1 scan
Jan -bash: echo: write error: Invalid argument
   
Jan What is the proper way to trigger a rescan, if it comes to be
Jan necessary?
   
echo - - -  scan
^
   
   # echo ---  /sys/class/scsi_host/host9/scan
   ^^^
   -bash: echo: write error: Invalid argument
 
  Note the spaces between the dashes.
 
 Thanks. Unfortunately it produced the same error.
 Is rescanning even supported with the sata_nv driver?

Works just fine here:
[EMAIL PROTECTED]:/sys/class/scsi_host/host0# echo - - -  scan 
[EMAIL PROTECTED]:/sys/class/scsi_host/host0# cat proc_name 
sata_nv

Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [OT] good job guys with the anti-spam !

2007-09-01 Thread Björn Steinbrink
On 2007.09.02 02:36:18 +0200, Oleg Verych wrote:
> * Tue, 31 Jul 2007 23:40:11 +0200
> >
> []
> > eventhough people often write only when they have something to complain
> > about, I for once would like to congratulate Matti and David, our mail
> > admins, for the wonderful job they've done with the spams lately. This
> > month, I might have seen maybe one or two instead of perhaps 30 per day
> > earlier. While it did not bother me that much earlier, I can say that
> > the list is more pleasant to read every day, and I think that's great.
> >
> > Kudos guys !
> >
> > Willy
> >
> > PS: please do not start another one of those long useless threads from now 
> > on
> >
> 
> Instead let me share an idea and see what will happen (if any).
> The mailing list policy:
> 
> 1. sent mails must have in-reply-to (rfc2821 SHOULD) with valid, i.e.
>existing in archives, message-id;
> 
> 2. otherwise sender gets response about creating new thread. Replying
>to that message will enable (1).

Ehrm, you want everyone who wants to start a new thread to:

 - send an email
 - await response from the mail server
 - send the same email again as a reply to the first one

Did I get that right? If so, good luck running from anyone who doesn't
send his patch series as a bunch of replies to a 00/XX message.

Although it would likely be fun to see the next guy that sends a 600
emails patch bomb getting buried alive in "Please confirm that you want
to create a new thread" emails.

Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [OT] good job guys with the anti-spam !

2007-09-01 Thread Björn Steinbrink
On 2007.09.02 02:36:18 +0200, Oleg Verych wrote:
 * Tue, 31 Jul 2007 23:40:11 +0200
 
 []
  eventhough people often write only when they have something to complain
  about, I for once would like to congratulate Matti and David, our mail
  admins, for the wonderful job they've done with the spams lately. This
  month, I might have seen maybe one or two instead of perhaps 30 per day
  earlier. While it did not bother me that much earlier, I can say that
  the list is more pleasant to read every day, and I think that's great.
 
  Kudos guys !
 
  Willy
 
  PS: please do not start another one of those long useless threads from now 
  on
 
 
 Instead let me share an idea and see what will happen (if any).
 The mailing list policy:
 
 1. sent mails must have in-reply-to (rfc2821 SHOULD) with valid, i.e.
existing in archives, message-id;
 
 2. otherwise sender gets response about creating new thread. Replying
to that message will enable (1).

Ehrm, you want everyone who wants to start a new thread to:

 - send an email
 - await response from the mail server
 - send the same email again as a reply to the first one

Did I get that right? If so, good luck running from anyone who doesn't
send his patch series as a bunch of replies to a 00/XX message.

Although it would likely be fun to see the next guy that sends a 600
emails patch bomb getting buried alive in Please confirm that you want
to create a new thread emails.

Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nmi_watchdog=2 regression in 2.6.21

2007-08-31 Thread Björn Steinbrink
On 2007.08.31 17:24:46 -0700, Daniel Walker wrote:
> On Fri, 2007-08-31 at 20:06 +0200, Björn Steinbrink wrote:
> 
> 
> > > something to do with the nmi hertz adjustment that happens after
> > > check_nmi_watchdog() ..
> > 
> > Hm hm, does the same thing (watchdog stuck after check) happen with
> > older kernels, ie. those before Stephane's changeset that made it use
> > PERFCTR1?
> 
> I noticed the frequency gets turned down after check_nmi_watchdog() is
> called.. I think it's suppose to trigger once per second, but it's more
> like it updates randomly ..

It's once per second if the cpu is 100% busy, if it's just idling and
halted, the performance counters won't be increased.

> In older kernels it's very slow, but it's more consistent ..

With the same load on the box? Maybe some other changes caused the box
to behave differently (say, CFS), regarding eg. load distribution
amongst the cores.

> 
> Here is some output ..
> 
> morning-glory ~ # cat /proc/interrupts
>CPU0   CPU1   CPU2   CPU3
>   0:103  0  0  0   IO-APIC-edge  timer
>   1:  0  0  0  8   IO-APIC-edge  i8042
>   4:   2320  0  0  1   IO-APIC-edge  serial
>   8:  1  0  0  1   IO-APIC-edge  rtc
>  12:  0  0  0113   IO-APIC-edge  i8042
>  14:   1143  0  0 10   IO-APIC-edge  ide0
>  16:227  0  0  1   IO-APIC-fasteoi   
> uhci_hcd:usb2, eth0
>  18:  0  0  0  0   IO-APIC-fasteoi   
> ehci_hcd:usb1
>  19:  0  0  0  0   IO-APIC-fasteoi   
> uhci_hcd:usb3
>  20:  0  0  0  1   IO-APIC-fasteoi   acpi
> NMI:150168124121
> LOC:   6188   6189   6187   6184
> ERR:  0
> MIS:  0
> morning-glory ~ # cat /proc/interrupts 
>CPU0   CPU1   CPU2   CPU3   
>   0:103  0  0  0   IO-APIC-edge  timer
>   1:  0  0  0  8   IO-APIC-edge  i8042
>   4:   2391  0  0  1   IO-APIC-edge  serial
>   8:  1  0  0  1   IO-APIC-edge  rtc
>  12:  0  0  0113   IO-APIC-edge  i8042
>  14:   1143  0  0 10   IO-APIC-edge  ide0
>  16:872  0  0  1   IO-APIC-fasteoi   
> uhci_hcd:usb2, eth0
>  18:  0  0  0  0   IO-APIC-fasteoi   
> ehci_hcd:usb1
>  19:  0  0  0  0   IO-APIC-fasteoi   
> uhci_hcd:usb3
>  20:  0  0  0  1   IO-APIC-fasteoi   acpi
> NMI:151168124121 
> LOC:  21443  21444  21442  21439 
> ERR:  0
> MIS:  0
> dwalker2 ~ # 
> 
> 
> If you look at the LOC values you'll notice a lot of time has passed,
> with only one NMI and on only one cpu ..
> 
> It's possible this is something else completely tho ..

At least from the interrupt side, that box looks pretty idle, so that's
expected I'd say.

> 
> > Maybe you could "activate" the Dprintk in write_watchdog_counter32() to
> > see which value gets written to the MSR? (I don't see any switch to
> > activate it, so maybe just s/Dprintk(/printk(KERN_WHATEVER / ?)
> 
> Here's the only lines printed,
> 
> setting INTEL_ARCH_PERFCTR0 to -0x0131385e
> setting INTEL_ARCH_PERFCTR0 to -0x0131385e
> setting INTEL_ARCH_PERFCTR0 to -0x0131385e
> setting INTEL_ARCH_PERFCTR0 to -0x0131385e

Ok, dumb I am.

The "interesting" call from p6_rearm passes NULL as desc, and thus the
printk is never called that way. But before we start flooding your logs,
could you just hog all cores with a simple cpu hog and check if that
causes the NMI counter to increase at about 1 Hz? If that doesn't work,
we can go back to that debug output.

Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nmi_watchdog=2 regression in 2.6.21

2007-08-31 Thread Björn Steinbrink
On 2007.08.31 09:35:23 -0700, Daniel Walker wrote:
> On Fri, 2007-08-31 at 09:21 -0700, Stephane Eranian wrote:
> > In this patch, the setup_*() routine now extract the MSR from the wd_ops
> > to copy them into the nmi_watchdog_ctlblk. This is not done for P4 because
> > of the special and ugly case of HT. 
> > 
> > With this approach, we can now create a custom wd_ops for CoreDuo that is
> > a clone of the intel_arch_wd_ops, except for the MSR.
> > 
> > Could you try this one instead?
> 
> So I tested your patch unchanged and the system boots, and the
> check_nmi_watchdog() passes .. However, the nmi stops ticking right
> after bootup,
> 
> >>From my /proc/interrupts below,
> 
>CPU0   CPU1   CPU2   CPU3   
>   0:108  0  0  0   IO-APIC-edge  timer
>   1:  0  0  0  8   IO-APIC-edge  i8042
>   4:   3427  0  0  1   IO-APIC-edge  serial
>   8:  1  0  0  1   IO-APIC-edge  rtc
>  12:  0  0  0113   IO-APIC-edge  i8042
>  14:   1128  0  0 10   IO-APIC-edge  ide0
>  16:   1664  0  0  1   IO-APIC-fasteoi   
> uhci_hcd:usb2, eth0
>  18:  0  0  0  0   IO-APIC-fasteoi   
> ehci_hcd:usb1
>  19:  0  0  0  0   IO-APIC-fasteoi   
> uhci_hcd:usb3
>  20:  0  0  0  1   IO-APIC-fasteoi   acpi
> NMI:   1670   1453   1097967 
> LOC:  48001  48002  48000  48006 
> ERR:  0
> MIS:  0
> 
> 
> The NMI field never changes ..
> 
> So I added another change which looked appropriate,
> 
> @@ -674,6 +688,7 @@ unsigned lapic_adjust_nmi_hz(unsigned hz
>  {
> struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
> if (wd->perfctr_msr == MSR_P6_PERFCTR0 ||
> +   wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR0 ||
> wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR1)
> hz = adjust_for_32bit_ctr(hz);
> return hz;
> 
> 
> Unfortunately that didn't fix anything, but I have a feeling is has

That's because MSR_P6_PERFCTR0 is the same as MSR_ARCH_PERFMON_PERFCTR0.
But I'd personally add that change anyway, as it makes the code less
tricky and the compiler should optimize it away.

> something to do with the nmi hertz adjustment that happens after
> check_nmi_watchdog() ..

Hm hm, does the same thing (watchdog stuck after check) happen with
older kernels, ie. those before Stephane's changeset that made it use
PERFCTR1?

Maybe you could "activate" the Dprintk in write_watchdog_counter32() to
see which value gets written to the MSR? (I don't see any switch to
activate it, so maybe just s/Dprintk(/printk(KERN_WHATEVER / ?)

Thanks,
Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nmi_watchdog=2 regression in 2.6.21

2007-08-31 Thread Björn Steinbrink
On 2007.08.31 09:35:23 -0700, Daniel Walker wrote:
 On Fri, 2007-08-31 at 09:21 -0700, Stephane Eranian wrote:
  In this patch, the setup_*() routine now extract the MSR from the wd_ops
  to copy them into the nmi_watchdog_ctlblk. This is not done for P4 because
  of the special and ugly case of HT. 
  
  With this approach, we can now create a custom wd_ops for CoreDuo that is
  a clone of the intel_arch_wd_ops, except for the MSR.
  
  Could you try this one instead?
 
 So I tested your patch unchanged and the system boots, and the
 check_nmi_watchdog() passes .. However, the nmi stops ticking right
 after bootup,
 
 From my /proc/interrupts below,
 
CPU0   CPU1   CPU2   CPU3   
   0:108  0  0  0   IO-APIC-edge  timer
   1:  0  0  0  8   IO-APIC-edge  i8042
   4:   3427  0  0  1   IO-APIC-edge  serial
   8:  1  0  0  1   IO-APIC-edge  rtc
  12:  0  0  0113   IO-APIC-edge  i8042
  14:   1128  0  0 10   IO-APIC-edge  ide0
  16:   1664  0  0  1   IO-APIC-fasteoi   
 uhci_hcd:usb2, eth0
  18:  0  0  0  0   IO-APIC-fasteoi   
 ehci_hcd:usb1
  19:  0  0  0  0   IO-APIC-fasteoi   
 uhci_hcd:usb3
  20:  0  0  0  1   IO-APIC-fasteoi   acpi
 NMI:   1670   1453   1097967 
 LOC:  48001  48002  48000  48006 
 ERR:  0
 MIS:  0
 
 
 The NMI field never changes ..
 
 So I added another change which looked appropriate,
 
 @@ -674,6 +688,7 @@ unsigned lapic_adjust_nmi_hz(unsigned hz
  {
 struct nmi_watchdog_ctlblk *wd = __get_cpu_var(nmi_watchdog_ctlblk);
 if (wd-perfctr_msr == MSR_P6_PERFCTR0 ||
 +   wd-perfctr_msr == MSR_ARCH_PERFMON_PERFCTR0 ||
 wd-perfctr_msr == MSR_ARCH_PERFMON_PERFCTR1)
 hz = adjust_for_32bit_ctr(hz);
 return hz;
 
 
 Unfortunately that didn't fix anything, but I have a feeling is has

That's because MSR_P6_PERFCTR0 is the same as MSR_ARCH_PERFMON_PERFCTR0.
But I'd personally add that change anyway, as it makes the code less
tricky and the compiler should optimize it away.

 something to do with the nmi hertz adjustment that happens after
 check_nmi_watchdog() ..

Hm hm, does the same thing (watchdog stuck after check) happen with
older kernels, ie. those before Stephane's changeset that made it use
PERFCTR1?

Maybe you could activate the Dprintk in write_watchdog_counter32() to
see which value gets written to the MSR? (I don't see any switch to
activate it, so maybe just s/Dprintk(/printk(KERN_WHATEVER / ?)

Thanks,
Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nmi_watchdog=2 regression in 2.6.21

2007-08-31 Thread Björn Steinbrink
On 2007.08.31 17:24:46 -0700, Daniel Walker wrote:
 On Fri, 2007-08-31 at 20:06 +0200, Björn Steinbrink wrote:
 
 
   something to do with the nmi hertz adjustment that happens after
   check_nmi_watchdog() ..
  
  Hm hm, does the same thing (watchdog stuck after check) happen with
  older kernels, ie. those before Stephane's changeset that made it use
  PERFCTR1?
 
 I noticed the frequency gets turned down after check_nmi_watchdog() is
 called.. I think it's suppose to trigger once per second, but it's more
 like it updates randomly ..

It's once per second if the cpu is 100% busy, if it's just idling and
halted, the performance counters won't be increased.

 In older kernels it's very slow, but it's more consistent ..

With the same load on the box? Maybe some other changes caused the box
to behave differently (say, CFS), regarding eg. load distribution
amongst the cores.

 
 Here is some output ..
 
 morning-glory ~ # cat /proc/interrupts
CPU0   CPU1   CPU2   CPU3
   0:103  0  0  0   IO-APIC-edge  timer
   1:  0  0  0  8   IO-APIC-edge  i8042
   4:   2320  0  0  1   IO-APIC-edge  serial
   8:  1  0  0  1   IO-APIC-edge  rtc
  12:  0  0  0113   IO-APIC-edge  i8042
  14:   1143  0  0 10   IO-APIC-edge  ide0
  16:227  0  0  1   IO-APIC-fasteoi   
 uhci_hcd:usb2, eth0
  18:  0  0  0  0   IO-APIC-fasteoi   
 ehci_hcd:usb1
  19:  0  0  0  0   IO-APIC-fasteoi   
 uhci_hcd:usb3
  20:  0  0  0  1   IO-APIC-fasteoi   acpi
 NMI:150168124121
 LOC:   6188   6189   6187   6184
 ERR:  0
 MIS:  0
 morning-glory ~ # cat /proc/interrupts 
CPU0   CPU1   CPU2   CPU3   
   0:103  0  0  0   IO-APIC-edge  timer
   1:  0  0  0  8   IO-APIC-edge  i8042
   4:   2391  0  0  1   IO-APIC-edge  serial
   8:  1  0  0  1   IO-APIC-edge  rtc
  12:  0  0  0113   IO-APIC-edge  i8042
  14:   1143  0  0 10   IO-APIC-edge  ide0
  16:872  0  0  1   IO-APIC-fasteoi   
 uhci_hcd:usb2, eth0
  18:  0  0  0  0   IO-APIC-fasteoi   
 ehci_hcd:usb1
  19:  0  0  0  0   IO-APIC-fasteoi   
 uhci_hcd:usb3
  20:  0  0  0  1   IO-APIC-fasteoi   acpi
 NMI:151168124121 
 LOC:  21443  21444  21442  21439 
 ERR:  0
 MIS:  0
 dwalker2 ~ # 
 
 
 If you look at the LOC values you'll notice a lot of time has passed,
 with only one NMI and on only one cpu ..
 
 It's possible this is something else completely tho ..

At least from the interrupt side, that box looks pretty idle, so that's
expected I'd say.

 
  Maybe you could activate the Dprintk in write_watchdog_counter32() to
  see which value gets written to the MSR? (I don't see any switch to
  activate it, so maybe just s/Dprintk(/printk(KERN_WHATEVER / ?)
 
 Here's the only lines printed,
 
 setting INTEL_ARCH_PERFCTR0 to -0x0131385e
 setting INTEL_ARCH_PERFCTR0 to -0x0131385e
 setting INTEL_ARCH_PERFCTR0 to -0x0131385e
 setting INTEL_ARCH_PERFCTR0 to -0x0131385e

Ok, dumb I am.

The interesting call from p6_rearm passes NULL as desc, and thus the
printk is never called that way. But before we start flooding your logs,
could you just hog all cores with a simple cpu hog and check if that
causes the NMI counter to increase at about 1 Hz? If that doesn't work,
we can go back to that debug output.

Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] avoid negative (and full-width) shifts in radix-tree.c, take 3

2007-08-29 Thread Björn Steinbrink
On 2007.08.29 10:49:12 -0700, Christoph Lameter wrote:
> On Wed, 29 Aug 2007, Peter Lund wrote:
> 
> >  
> > -   if (tmp >= RADIX_TREE_INDEX_BITS)
> > -   index = ~0UL;
> > -   return index;
> > +   if (shift < 0)
> > +   return ~0UL;
> > +   if (shift >= 8 * sizeof(unsigned long))
> 
> 8* sizeof(unsigned long) is the constant BITS_PER_LONG.

Or in this context: RADIX_TREE_INDEX_BITS, which happens to be the old
check, which just needed to be extended to cover negative shifts.

Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] avoid negative (and full-width) shifts in radix-tree.c, take 3

2007-08-29 Thread Björn Steinbrink
On 2007.08.29 10:49:12 -0700, Christoph Lameter wrote:
 On Wed, 29 Aug 2007, Peter Lund wrote:
 
   
  -   if (tmp = RADIX_TREE_INDEX_BITS)
  -   index = ~0UL;
  -   return index;
  +   if (shift  0)
  +   return ~0UL;
  +   if (shift = 8 * sizeof(unsigned long))
 
 8* sizeof(unsigned long) is the constant BITS_PER_LONG.

Or in this context: RADIX_TREE_INDEX_BITS, which happens to be the old
check, which just needed to be extended to cover negative shifts.

Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nmi_watchdog=2 regression in 2.6.21

2007-08-08 Thread Björn Steinbrink
On 2007.08.07 17:06:49 -0700, Daniel Walker wrote:
> 
> This patch below hangs my system on boot if I set nmi_watchdog=2 . It
> shows the NMI as stuck then the system hangs .. nmi_watchdog=1 works
> fine, and the system boots without any watchdog options ..
> 
> The machine is an Intel allagash development board, and it has two dual
> core Pentium-M cpus. I attached the .config I used.

Should be fixed in commit e82f64e5bb0648a13630d752c35be1e7bd8bab96
(2.6.23-rc1 IIRC).

Unfornately it didn't make it into 2.6.22, but the patch should work
even with 2.6.21 IIRC.

Björn

> 
> bf8696ed6dfa561198b4736deaf11ab68dcc4845 is first bad commit
> commit bf8696ed6dfa561198b4736deaf11ab68dcc4845
> Author: Stephane Eranian <[EMAIL PROTECTED]>
> Date:   Wed May 2 19:27:05 2007 +0200
> 
> [PATCH] i386: i386 make NMI use PERFCTR1 for architectural perfmon (take 
> 2)
> 
> Hello,
> 
> This patch against 2.6.20-git14 makes the NMI watchdog use 
> PERFSEL1/PERFCTR1
> instead of PERFSEL0/PERFCTR0 on processors supporting Intel architectural
> perfmon, such as Intel Core 2. Although all PMU events can work on
> both counters, the Precise Event-Based Sampling (PEBS) requires that the
> event be in PERFCTR0 to work correctly (see section 18.14.4.1 in the
> IA32 SDM Vol 3b).
> 
> A similar patch for x86-64 is to follow.
> 
> Changelog:
> - make the i386 NMI watchdog use PERFSEL1/PERFCTR1 instead of 
> PERFSEL0/PERFCTR0
>   on processors supporting the Intel architectural perfmon (e.g. 
> Core 2 Duo).
>   This allows PEBS to work when the NMI watchdog is active.
> 
> signed-off-by: stephane eranian <[EMAIL PROTECTED]>
> 
> Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
> 
> :04 04 9cee0745798cb56da2ca82032b8ee88a2c32700a 
> f59ee02e3cd8f13503edb7312a3494f4d7ec0069 M  arch
> 

> #
> # Automatically generated make config: don't edit
> # Linux kernel version: 2.6.21
> # Tue Aug  7 16:40:44 2007
> #
> CONFIG_X86_32=y
> CONFIG_GENERIC_TIME=y
> CONFIG_CLOCKSOURCE_WATCHDOG=y
> CONFIG_GENERIC_CLOCKEVENTS=y
> CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
> CONFIG_LOCKDEP_SUPPORT=y
> CONFIG_STACKTRACE_SUPPORT=y
> CONFIG_SEMAPHORE_SLEEPERS=y
> CONFIG_X86=y
> CONFIG_MMU=y
> CONFIG_ZONE_DMA=y
> CONFIG_GENERIC_ISA_DMA=y
> CONFIG_GENERIC_IOMAP=y
> CONFIG_GENERIC_BUG=y
> CONFIG_GENERIC_HWEIGHT=y
> CONFIG_ARCH_MAY_HAVE_PC_FDC=y
> CONFIG_DMI=y
> CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
> 
> #
> # Code maturity level options
> #
> CONFIG_EXPERIMENTAL=y
> CONFIG_LOCK_KERNEL=y
> CONFIG_INIT_ENV_ARG_LIMIT=32
> 
> #
> # General setup
> #
> CONFIG_LOCALVERSION=""
> # CONFIG_LOCALVERSION_AUTO is not set
> CONFIG_SWAP=y
> CONFIG_SYSVIPC=y
> # CONFIG_IPC_NS is not set
> CONFIG_SYSVIPC_SYSCTL=y
> # CONFIG_POSIX_MQUEUE is not set
> # CONFIG_BSD_PROCESS_ACCT is not set
> # CONFIG_TASKSTATS is not set
> # CONFIG_UTS_NS is not set
> # CONFIG_AUDIT is not set
> CONFIG_IKCONFIG=y
> CONFIG_IKCONFIG_PROC=y
> # CONFIG_CPUSETS is not set
> CONFIG_SYSFS_DEPRECATED=y
> # CONFIG_RELAY is not set
> # CONFIG_BLK_DEV_INITRD is not set
> # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
> CONFIG_SYSCTL=y
> # CONFIG_EMBEDDED is not set
> CONFIG_UID16=y
> CONFIG_SYSCTL_SYSCALL=y
> CONFIG_KALLSYMS=y
> # CONFIG_KALLSYMS_ALL is not set
> # CONFIG_KALLSYMS_EXTRA_PASS is not set
> CONFIG_HOTPLUG=y
> CONFIG_PRINTK=y
> CONFIG_BUG=y
> CONFIG_ELF_CORE=y
> CONFIG_BASE_FULL=y
> CONFIG_FUTEX=y
> CONFIG_EPOLL=y
> CONFIG_SHMEM=y
> CONFIG_SLAB=y
> CONFIG_VM_EVENT_COUNTERS=y
> CONFIG_RT_MUTEXES=y
> # CONFIG_TINY_SHMEM is not set
> CONFIG_BASE_SMALL=0
> # CONFIG_SLOB is not set
> 
> #
> # Loadable module support
> #
> CONFIG_MODULES=y
> CONFIG_MODULE_UNLOAD=y
> CONFIG_MODULE_FORCE_UNLOAD=y
> # CONFIG_MODVERSIONS is not set
> # CONFIG_MODULE_SRCVERSION_ALL is not set
> CONFIG_KMOD=y
> CONFIG_STOP_MACHINE=y
> 
> #
> # Block layer
> #
> CONFIG_BLOCK=y
> # CONFIG_LBD is not set
> # CONFIG_BLK_DEV_IO_TRACE is not set
> # CONFIG_LSF is not set
> 
> #
> # IO Schedulers
> #
> CONFIG_IOSCHED_NOOP=y
> # CONFIG_IOSCHED_AS is not set
> # CONFIG_IOSCHED_DEADLINE is not set
> CONFIG_IOSCHED_CFQ=y
> # CONFIG_DEFAULT_AS is not set
> # CONFIG_DEFAULT_DEADLINE is not set
> CONFIG_DEFAULT_CFQ=y
> # CONFIG_DEFAULT_NOOP is not set
> CONFIG_DEFAULT_IOSCHED="cfq"
> 
> #
> # Processor type and features
> #
> # CONFIG_TICK_ONESHOT is not set
> # CONFIG_NO_HZ is not set
> # CONFIG_HIGH_RES_TIMERS is not set
> CONFIG_SMP=y
> CONFIG_X86_PC=y
> # CONFIG_X86_ELAN is not set
> # CONFIG_X86_VOYAGER is not set
> # CONFIG_X86_NUMAQ is not set
> # CONFIG_X86_SUMMIT is not set
> # CONFIG_X86_BIGSMP is not set
> # CONFIG_X86_VISWS is not set
> # CONFIG_X86_GENERICARCH is not set
> # CONFIG_X86_ES7000 is not set
> # CONFIG_PARAVIRT is not set
> # CONFIG_M386 is not set
> # CONFIG_M486 is not set
> # CONFIG_M586 is not set
> # CONFIG_M586TSC is not set
> # CONFIG_M586MMX is not set
> # CONFIG_M686 is not set
> # CONFIG_MPENTIUMII is 

Re: nmi_watchdog=2 regression in 2.6.21

2007-08-08 Thread Björn Steinbrink
On 2007.08.07 17:06:49 -0700, Daniel Walker wrote:
 
 This patch below hangs my system on boot if I set nmi_watchdog=2 . It
 shows the NMI as stuck then the system hangs .. nmi_watchdog=1 works
 fine, and the system boots without any watchdog options ..
 
 The machine is an Intel allagash development board, and it has two dual
 core Pentium-M cpus. I attached the .config I used.

Should be fixed in commit e82f64e5bb0648a13630d752c35be1e7bd8bab96
(2.6.23-rc1 IIRC).

Unfornately it didn't make it into 2.6.22, but the patch should work
even with 2.6.21 IIRC.

Björn

 
 bf8696ed6dfa561198b4736deaf11ab68dcc4845 is first bad commit
 commit bf8696ed6dfa561198b4736deaf11ab68dcc4845
 Author: Stephane Eranian [EMAIL PROTECTED]
 Date:   Wed May 2 19:27:05 2007 +0200
 
 [PATCH] i386: i386 make NMI use PERFCTR1 for architectural perfmon (take 
 2)
 
 Hello,
 
 This patch against 2.6.20-git14 makes the NMI watchdog use 
 PERFSEL1/PERFCTR1
 instead of PERFSEL0/PERFCTR0 on processors supporting Intel architectural
 perfmon, such as Intel Core 2. Although all PMU events can work on
 both counters, the Precise Event-Based Sampling (PEBS) requires that the
 event be in PERFCTR0 to work correctly (see section 18.14.4.1 in the
 IA32 SDM Vol 3b).
 
 A similar patch for x86-64 is to follow.
 
 Changelog:
 - make the i386 NMI watchdog use PERFSEL1/PERFCTR1 instead of 
 PERFSEL0/PERFCTR0
   on processors supporting the Intel architectural perfmon (e.g. 
 Core 2 Duo).
   This allows PEBS to work when the NMI watchdog is active.
 
 signed-off-by: stephane eranian [EMAIL PROTECTED]
 
 Signed-off-by: Andi Kleen [EMAIL PROTECTED]
 
 :04 04 9cee0745798cb56da2ca82032b8ee88a2c32700a 
 f59ee02e3cd8f13503edb7312a3494f4d7ec0069 M  arch
 

 #
 # Automatically generated make config: don't edit
 # Linux kernel version: 2.6.21
 # Tue Aug  7 16:40:44 2007
 #
 CONFIG_X86_32=y
 CONFIG_GENERIC_TIME=y
 CONFIG_CLOCKSOURCE_WATCHDOG=y
 CONFIG_GENERIC_CLOCKEVENTS=y
 CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
 CONFIG_LOCKDEP_SUPPORT=y
 CONFIG_STACKTRACE_SUPPORT=y
 CONFIG_SEMAPHORE_SLEEPERS=y
 CONFIG_X86=y
 CONFIG_MMU=y
 CONFIG_ZONE_DMA=y
 CONFIG_GENERIC_ISA_DMA=y
 CONFIG_GENERIC_IOMAP=y
 CONFIG_GENERIC_BUG=y
 CONFIG_GENERIC_HWEIGHT=y
 CONFIG_ARCH_MAY_HAVE_PC_FDC=y
 CONFIG_DMI=y
 CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config
 
 #
 # Code maturity level options
 #
 CONFIG_EXPERIMENTAL=y
 CONFIG_LOCK_KERNEL=y
 CONFIG_INIT_ENV_ARG_LIMIT=32
 
 #
 # General setup
 #
 CONFIG_LOCALVERSION=
 # CONFIG_LOCALVERSION_AUTO is not set
 CONFIG_SWAP=y
 CONFIG_SYSVIPC=y
 # CONFIG_IPC_NS is not set
 CONFIG_SYSVIPC_SYSCTL=y
 # CONFIG_POSIX_MQUEUE is not set
 # CONFIG_BSD_PROCESS_ACCT is not set
 # CONFIG_TASKSTATS is not set
 # CONFIG_UTS_NS is not set
 # CONFIG_AUDIT is not set
 CONFIG_IKCONFIG=y
 CONFIG_IKCONFIG_PROC=y
 # CONFIG_CPUSETS is not set
 CONFIG_SYSFS_DEPRECATED=y
 # CONFIG_RELAY is not set
 # CONFIG_BLK_DEV_INITRD is not set
 # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
 CONFIG_SYSCTL=y
 # CONFIG_EMBEDDED is not set
 CONFIG_UID16=y
 CONFIG_SYSCTL_SYSCALL=y
 CONFIG_KALLSYMS=y
 # CONFIG_KALLSYMS_ALL is not set
 # CONFIG_KALLSYMS_EXTRA_PASS is not set
 CONFIG_HOTPLUG=y
 CONFIG_PRINTK=y
 CONFIG_BUG=y
 CONFIG_ELF_CORE=y
 CONFIG_BASE_FULL=y
 CONFIG_FUTEX=y
 CONFIG_EPOLL=y
 CONFIG_SHMEM=y
 CONFIG_SLAB=y
 CONFIG_VM_EVENT_COUNTERS=y
 CONFIG_RT_MUTEXES=y
 # CONFIG_TINY_SHMEM is not set
 CONFIG_BASE_SMALL=0
 # CONFIG_SLOB is not set
 
 #
 # Loadable module support
 #
 CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
 CONFIG_MODULE_FORCE_UNLOAD=y
 # CONFIG_MODVERSIONS is not set
 # CONFIG_MODULE_SRCVERSION_ALL is not set
 CONFIG_KMOD=y
 CONFIG_STOP_MACHINE=y
 
 #
 # Block layer
 #
 CONFIG_BLOCK=y
 # CONFIG_LBD is not set
 # CONFIG_BLK_DEV_IO_TRACE is not set
 # CONFIG_LSF is not set
 
 #
 # IO Schedulers
 #
 CONFIG_IOSCHED_NOOP=y
 # CONFIG_IOSCHED_AS is not set
 # CONFIG_IOSCHED_DEADLINE is not set
 CONFIG_IOSCHED_CFQ=y
 # CONFIG_DEFAULT_AS is not set
 # CONFIG_DEFAULT_DEADLINE is not set
 CONFIG_DEFAULT_CFQ=y
 # CONFIG_DEFAULT_NOOP is not set
 CONFIG_DEFAULT_IOSCHED=cfq
 
 #
 # Processor type and features
 #
 # CONFIG_TICK_ONESHOT is not set
 # CONFIG_NO_HZ is not set
 # CONFIG_HIGH_RES_TIMERS is not set
 CONFIG_SMP=y
 CONFIG_X86_PC=y
 # CONFIG_X86_ELAN is not set
 # CONFIG_X86_VOYAGER is not set
 # CONFIG_X86_NUMAQ is not set
 # CONFIG_X86_SUMMIT is not set
 # CONFIG_X86_BIGSMP is not set
 # CONFIG_X86_VISWS is not set
 # CONFIG_X86_GENERICARCH is not set
 # CONFIG_X86_ES7000 is not set
 # CONFIG_PARAVIRT is not set
 # CONFIG_M386 is not set
 # CONFIG_M486 is not set
 # CONFIG_M586 is not set
 # CONFIG_M586TSC is not set
 # CONFIG_M586MMX is not set
 # CONFIG_M686 is not set
 # CONFIG_MPENTIUMII is not set
 # CONFIG_MPENTIUMIII is not set
 CONFIG_MPENTIUMM=y
 # CONFIG_MCORE2 is not set
 # CONFIG_MPENTIUM4 is not set
 # CONFIG_MK6 is not set
 # CONFIG_MK7 is not set
 # CONFIG_MK8 

Re: [PATCH] Fix msr register allocation

2007-07-30 Thread Björn Steinbrink
On 2007.07.30 12:25:54 -0300, Glauber de Oliveira Costa wrote:
> Since the value in ret will go through a return statement,
 ^^^
You mean "err" I guess?

> it does not need to be put in eax register directly. Instead,
> we let the compiler do his job and choose what to do with it,
> potentially getting a better register allocation.
> 
> Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
> 
> diff --git a/include/asm-i386/msr.h b/include/asm-i386/msr.h
> index df21ea0..cc4263c 100644
> --- a/include/asm-i386/msr.h
> +++ b/include/asm-i386/msr.h
> @@ -54,7 +54,7 @@ static inline int native_write_msr_safe(unsigned int msr,
>"   .align 4\n\t"
>"   .long  2b,3b\n\t"
>".previous"
> -  : "=a" (err)
> +  : "=r" (err)
>: "c" (msr), "0" ((u32)val), "d" ((u32)(val>>32)),
>  "i" (-EFAULT));
>   return err;

Note that the EAX output constraint is re-used in the input section for
(u32)val, i.e. the lower half of the value to be written. And "wrmsr"
needs that in EAX, so you cannot change the output constraint without
touching the input constraint.

Also, I do not see how the compiler could do any better than having the
return value already in EAX, but that doesn't really mean anything ;-)

Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Fix msr register allocation

2007-07-30 Thread Björn Steinbrink
On 2007.07.30 12:25:54 -0300, Glauber de Oliveira Costa wrote:
 Since the value in ret will go through a return statement,
 ^^^
You mean err I guess?

 it does not need to be put in eax register directly. Instead,
 we let the compiler do his job and choose what to do with it,
 potentially getting a better register allocation.
 
 Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
 
 diff --git a/include/asm-i386/msr.h b/include/asm-i386/msr.h
 index df21ea0..cc4263c 100644
 --- a/include/asm-i386/msr.h
 +++ b/include/asm-i386/msr.h
 @@ -54,7 +54,7 @@ static inline int native_write_msr_safe(unsigned int msr,
   .align 4\n\t
   .long  2b,3b\n\t
.previous
 -  : =a (err)
 +  : =r (err)
: c (msr), 0 ((u32)val), d ((u32)(val32)),
  i (-EFAULT));
   return err;

Note that the EAX output constraint is re-used in the input section for
(u32)val, i.e. the lower half of the value to be written. And wrmsr
needs that in EAX, so you cannot change the output constraint without
touching the input constraint.

Also, I do not see how the compiler could do any better than having the
return value already in EAX, but that doesn't really mean anything ;-)

Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]

2007-07-27 Thread Björn Steinbrink
On 2007.07.28 01:29:19 +0200, Andi Kleen wrote:
> > Any faults in that reasoning?
> 
> GNU sort uses a merge sort with temporary files on disk. Not sure
> how much it keeps in memory during that, but it's probably less
> than 150MB. At some point the dirty limit should kick in and write back the 
> data of the temporary files; so it's not quite the same as anonymous memory. 
> But it's not that different given.

Hm, does that change anything? The files need to be read at the end (so
they go into the cache) and are delete afterwards (cache gets freed I
guess?).

> It would be better to measure than to guess. At least Andrew's measurements
> on 128MB actually didn't show updatedb being really that big a problem.

Here's a before/after memory usage for an updatedb run:
[EMAIL PROTECTED]:~# free -m
 total   used   free sharedbuffers cached
Mem:  2011   1995 15  0269779
-/+ buffers/cache:946   1064
Swap: 1945  0   1945
[EMAIL PROTECTED]:~# updatedb
[EMAIL PROTECTED]:~# free -m
 total   used   free sharedbuffers cached
Mem:  2011   1914 96  0209746
-/+ buffers/cache:958   1052
Swap: 1945  0   1944

81MB more unused RAM afterwards.

If anyone can make use of that, here's a snippet from /proc/$PID/smaps
of updatedb's sort process, when it was at about its peak memory usage
(according to the RSS column in top), which was about 50MB.

2b90ab3c1000-2b90ae4c3000 rw-p 2b90ab3c1000 00:00 0 
Size:  50184 kB
Rss:   50184 kB
Shared_Clean:  0 kB
Shared_Dirty:  0 kB
Private_Clean: 0 kB
Private_Dirty: 50184 kB
Referenced:50184 kB

> Perhaps some people have much more files or simply a less efficient
> updatedb implementation?

sort (GNU coreutils) 5.97

GNU updatedb version 4.2.31

> I guess the people who complain here that loudly really need to supply
> some real numbers. 

Just to clarify: I'm not complaining either way, neither about not
merging swap prefetch, nor about someone wanting that to be merge. It
was rather the "discussion" that caught my attention... Just in case ;-)

Björn

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]

2007-07-27 Thread Björn Steinbrink
On 2007.07.27 20:16:32 +0200, Rene Herman wrote:
> On 07/27/2007 07:45 PM, Daniel Hazelton wrote:
>
>> Updatedb or another process that uses the FS heavily runs on a users
>> 256MB P3-800 (when it is idle) and the VFS caches grow, causing memory
>> pressure that causes other applications to be swapped to disk. In the
>> morning the user has to wait for the system to swap those applications
>> back in.
>> Questions about it:
>> Q) Does swap-prefetch help with this? A) [From all reports I've seen (*)] 
>> Yes, it does. 
>
> No it does not. If updatedb filled memory to the point of causing swapping 
> (which noone is reproducing anyway) it HAS FILLED MEMORY and swap-prefetch 
> hasn't any memory to prefetch into -- updatedb itself doesn't use any 
> significant memory.
>
> Here's swap-prefetch's author saying the same:
>
> http://lkml.org/lkml/2007/2/9/112
>
> | It can't help the updatedb scenario. Updatedb leaves the ram full and
> | swap prefetch wants to cost as little as possible so it will never
> | move anything out of ram in preference for the pages it wants to swap
> | back in.
>
> Now please finally either understand this, or tell us how we're wrong.

Con might have been wrong there for boxes with really little memory.

My desktop box has not even 300k inodes in use (IIRC someone posted a df
-i output showing 1 million inodes in use). Still, the memory footprint
of the "sort" process grows up to about 50MB. Assuming that the average
filename length stays, that would mean 150MB for the 1 million inode
case, just for the "sort" process.

Now, sort cannot produce any output before its got all its input, so
that RSS usage exists at least as long as the VFS cache is growing due
to the ongoing search for files.

And then, all that memory that "sort" uses is required, because sort
needs to output its results. So if there's memory pressure, the VFS
cache is likely to be dropped, because "sort" needs its data, for
sorting and producing output. And then sort terminates and leaves that
whole lot of memory _unused_. The other actions of updatedb only touch
the locate db, which is just a few megs (4.5MB here) big so the cache
won't grow that much again.

OK, so we got about, say, at least 128MB of totally unused memory, maybe
even more. If you look at the vmstat output I sent, you see that I had
between 90MB and 128MB free, depending on the swappiness setting, with
increased inode usage, that could very well scale up.

Conclusion: updatedb does _not_ leave the RAM full. And for a box with
little memory (say 256MB) it might even be 50% or more memory that is
free after updatedb ran. Might that make swap prefetch kick in?


Any faults in that reasoning?

Thanks,
Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFT: updatedb morning after problem [was: Re: -mm merge plans for 2.6.23]

2007-07-27 Thread Björn Steinbrink
On 2007.07.27 20:16:32 +0200, Rene Herman wrote:
 On 07/27/2007 07:45 PM, Daniel Hazelton wrote:

 Updatedb or another process that uses the FS heavily runs on a users
 256MB P3-800 (when it is idle) and the VFS caches grow, causing memory
 pressure that causes other applications to be swapped to disk. In the
 morning the user has to wait for the system to swap those applications
 back in.
 Questions about it:
 Q) Does swap-prefetch help with this? A) [From all reports I've seen (*)] 
 Yes, it does. 

 No it does not. If updatedb filled memory to the point of causing swapping 
 (which noone is reproducing anyway) it HAS FILLED MEMORY and swap-prefetch 
 hasn't any memory to prefetch into -- updatedb itself doesn't use any 
 significant memory.

 Here's swap-prefetch's author saying the same:

 http://lkml.org/lkml/2007/2/9/112

 | It can't help the updatedb scenario. Updatedb leaves the ram full and
 | swap prefetch wants to cost as little as possible so it will never
 | move anything out of ram in preference for the pages it wants to swap
 | back in.

 Now please finally either understand this, or tell us how we're wrong.

Con might have been wrong there for boxes with really little memory.

My desktop box has not even 300k inodes in use (IIRC someone posted a df
-i output showing 1 million inodes in use). Still, the memory footprint
of the sort process grows up to about 50MB. Assuming that the average
filename length stays, that would mean 150MB for the 1 million inode
case, just for the sort process.

Now, sort cannot produce any output before its got all its input, so
that RSS usage exists at least as long as the VFS cache is growing due
to the ongoing search for files.

And then, all that memory that sort uses is required, because sort
needs to output its results. So if there's memory pressure, the VFS
cache is likely to be dropped, because sort needs its data, for
sorting and producing output. And then sort terminates and leaves that
whole lot of memory _unused_. The other actions of updatedb only touch
the locate db, which is just a few megs (4.5MB here) big so the cache
won't grow that much again.

OK, so we got about, say, at least 128MB of totally unused memory, maybe
even more. If you look at the vmstat output I sent, you see that I had
between 90MB and 128MB free, depending on the swappiness setting, with
increased inode usage, that could very well scale up.

Conclusion: updatedb does _not_ leave the RAM full. And for a box with
little memory (say 256MB) it might even be 50% or more memory that is
free after updatedb ran. Might that make swap prefetch kick in?


Any faults in that reasoning?

Thanks,
Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFT: updatedb morning after problem [was: Re: -mm merge plans for 2.6.23]

2007-07-27 Thread Björn Steinbrink
On 2007.07.28 01:29:19 +0200, Andi Kleen wrote:
  Any faults in that reasoning?
 
 GNU sort uses a merge sort with temporary files on disk. Not sure
 how much it keeps in memory during that, but it's probably less
 than 150MB. At some point the dirty limit should kick in and write back the 
 data of the temporary files; so it's not quite the same as anonymous memory. 
 But it's not that different given.

Hm, does that change anything? The files need to be read at the end (so
they go into the cache) and are delete afterwards (cache gets freed I
guess?).

 It would be better to measure than to guess. At least Andrew's measurements
 on 128MB actually didn't show updatedb being really that big a problem.

Here's a before/after memory usage for an updatedb run:
[EMAIL PROTECTED]:~# free -m
 total   used   free sharedbuffers cached
Mem:  2011   1995 15  0269779
-/+ buffers/cache:946   1064
Swap: 1945  0   1945
[EMAIL PROTECTED]:~# updatedb
[EMAIL PROTECTED]:~# free -m
 total   used   free sharedbuffers cached
Mem:  2011   1914 96  0209746
-/+ buffers/cache:958   1052
Swap: 1945  0   1944

81MB more unused RAM afterwards.

If anyone can make use of that, here's a snippet from /proc/$PID/smaps
of updatedb's sort process, when it was at about its peak memory usage
(according to the RSS column in top), which was about 50MB.

2b90ab3c1000-2b90ae4c3000 rw-p 2b90ab3c1000 00:00 0 
Size:  50184 kB
Rss:   50184 kB
Shared_Clean:  0 kB
Shared_Dirty:  0 kB
Private_Clean: 0 kB
Private_Dirty: 50184 kB
Referenced:50184 kB

 Perhaps some people have much more files or simply a less efficient
 updatedb implementation?

sort (GNU coreutils) 5.97

GNU updatedb version 4.2.31

 I guess the people who complain here that loudly really need to supply
 some real numbers. 

Just to clarify: I'm not complaining either way, neither about not
merging swap prefetch, nor about someone wanting that to be merge. It
was rather the discussion that caught my attention... Just in case ;-)

Björn

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: updatedb

2007-07-26 Thread Björn Steinbrink
On 2007.07.26 11:58:29 +0200, Björn Steinbrink wrote:
> Note that the total RSS usage of updatedb+sort was just about 50MB,
> nevertheless swap grew to more than 300MB. It's also interesting that
> swapping is so aggressive, that the amount of free memory is constantly
> growing. I'm a missing something or wouldn't it be smarter to use that
> free memory for buffers and cache first? (x86_64 system, so even if
> highmem on x86 could be responsible, it's not the case here.)
> 
> Will now go and see what happens if I play with swappiness.

Hm, swappiness set to 0 looks even more weird to me, especially the
beginning, where (AFAICT) basically buffers and caches are dropped just
to get a pretty huge amount of free RAM.

With swappiness set to 100, you basically get what you expect: swapping.
But at least to me, that swapping looks _a lot_ smarter than what it did
for the default swappiness of 60 or the 0 swappiness. Swap is growing,
but so are the buffers, and the cache also only shrinks at a single
point, probably when the "sort" process starts to grow. Plus, the amount
of free memory isn't growing to insane sizes like in the other cases.

vmstat output for both cases to be found below.

Björn

vmstat output for swappiness = 0

procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
 1  0  0  25140  37712  6530411 7144   31  0  0 99  0
 4  0  0  25132  37736  6531200 038  212  604  1  0 98  1
 2  0  0  25132  37736  6531200 0 0  177  479  0  0 100  0
 3  1  0  17252  42568  6531200  151623  641 2332  1  3 54 41
 3  2  0  15172  45412  5990800  1567   482  585 2051  1  4 50 45
 2  2  0  20436  44320  5452400   743 9  368 1196  0  1 50 49
 3  2  0  28416  36580  5452000   533 4  312 1016  0  1 50 49
 1  2  4  44316  22780  5446400   128   191  240  786  1  1 43 54
 3  0   3884  55256  18112  4577200   356   277  318  937  0  3 39 57
 1  2   3924  57276  19004  4070000   631   117  347 1160  0  1 50 48
 0  1   4108  61328  15348  4034400   768   301  378 1218  0  1 50 49
 1  1   4276  60812  15612  402160  300   585   359  344 1045  0  1 50 49
 0  2   4276  62184  17484  396240  995   703  1208  370 1188  0  2 50 48
 0  2   4296  68244  14616  3626800   36011  275  973  0  1 50 49
 2  2   4296  75292   6976  360160  137   667   480  424 1315  0  3 50 47
 3  1   4300  78984   6344  3339200   639   635  517 1142  0  3 49 47
 2  1   4300  80816   5520  3252001   992   587  683 1479  0  7 48 45
 1  2   4556  82244   3704  325040   85   607   659  452 1040  0  3 50 47
 0  1   4556  81600   4972  3242000   66544  362 1040  0  1 50 49
 0  2   4940  80364   4588  325080  128   552   539  375  995  0  3 50 47
 0  1   5004  83116   3576  324200   21   416   591  405  862  0  3 49 47
 0  1   5004  80196   5388  3244000   604 0  328 1014  0  2 50 48
 0  1   5004  82976   2548  3254400   461   343  363  923  0  4 49 47
 1  1   5004  81528   3780  3247200  1124   157  542 1524  1  4 49 47
 0  1   5516  81940   4000  324880  171   865   575  489 1356  0  4 49 46
 0  2   5804  82412   2380  325160   96   660   423  385 1160  1  6 42 51
 0  2   5804  81476   2868  3248000   864   204  493 1242  0  5 50 45
 0  1   6268  83124   2088  325080  155   678   551  409 1107  0  7 46 47
 0  2   6268  83216   1576  3238000   771   153  450 1330  1 11 43 46
 0  3  59420  97888736  322240 17717   300 18035  375  869  0 27 12 60
 0  4 176160 214288800  323160 3891323 38917  347  502  0  6 30 64
 1  2 176212 242608   2752  327160   17   68395  441 1256  0  4 41 54
 1  1 176212 237464   5492  3256800   883   188  452 1488  1  2 50 48
 1  0 176212 232296   9628  3263600  1368   263  533 1690  1  2 50 47
 0  1 176212 225852  13264  3265200  1212 0  480 1818  1  3 50 46
 0  1 176212 202076  31708  3266000  6143   348 1723 5654  2  7 50 41
 0  1 176212 177196  49952  3256000  6081 0 1698 5413  1  6 50 43
 0  1 176212 147744  58332  3265200  2791   467  884 3565  1  5 50 44
 0  1 176212 130604  63788  3266400  1813   407  645 2637  1  5 50 44
 0  1 176212 119076  68012  3258400  1408 0  529 2239  0  4 50 45
 2  1 176212  74112  83696  3260000  5225   667 1496 7206  5 13 50 33
 1  1 176212  29296  99788  3262000  536016 1524 7009  4 15 49 32
 0  1 176212  16012 105112  3266000  1773  2901  891 2417  1  4 50 45
 1  1 19  16656 117236  326240 6077  4036  6337 1228 6040  6  9 49 36
 1  1 194572  15996 122056  325680   43  160743  579 2740  1  5 50 44
 0  1 209788  15888 125520  326360 5072  1149

Re: updatedb

2007-07-26 Thread Björn Steinbrink
On 2007.07.26 08:56:59 +0200, Rene Herman wrote:
> On 07/26/2007 08:39 AM, Bongani Hlope wrote:
>
>> On Thursday 26 July 2007 05:59:53 Rene Herman wrote:
>
>>> So what's happening? If you sit down with a copy op "top" in one terminal
>>> and updatedb in another, what does it show?
>
>> Just tested that, there's a steady increase in the useage of buff
>
> Great. Now concentrate on the "swpd" column, as it's the only thing 
> relevant here. The fact that an updatedb run fills/replaces caches is 
> completely and utterly unsurprising and not something swap-prefetch helps 
> with. The only thing it does is bring back stuff from _swap_.

But that's with a system that has plenty of RAM available.

The following vmstat output is from a run for which I ran a memory hog
to simulate a box with just 1GB of RAM (didn't want to reboot ;-)). That
(or even less) is probably a more likely amount of RAM for a majority of
users.

Other than the memory hog, there's a relatively small Firefox process
(just about 150MB RSS), Xorg, mutt an apache and some other stuff,
leaving about 128MB of RAM "free".

procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
 0  0696  16360  47608  7460000 7134   30  0  0 99  0
 0  0696  16352  47608  7460000 048  213  530  0  0 100  0
 0  1796  16024  45516  745480   17   882   160  515 1698  1  3 58 38
 0  1   1092  16124  41752  741640   43  193143  660 2219  1  4 50 45
 1  1   1548  35096  24224  690360  107  1115   571  473 1616  1  4 50 45
 2  1   8980  45560  18552  585800 1324  1069  1324  453 1705  1  4 50 45
 2  1  12460  44840  21048  565880 1160   831  1345  403 1351  0  1 51 48
 2  1  14348  44220  23016  554080  629   661   947  353 1140  0  2 50 48
 [snip]
procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
 3  1  88904  72160  55368  389080 1377   836  1576  424 1403  0  3 50 47
 0  1  96080  74084  57600  386600 2373   747  2559  412 1312  0  2 48 49
 1  1 100036  74816  61544  386600 1319  1312  1547  524 1605  1  3 50 47
 0  1 107032  72996  64728  377804 2332  1065  2341  461 1686  1  5 50 45
 2  1 115036  68944  75908  367680 2660  3731  2941 1133 3721  1  6 49 44
 3  0 125160  58768  90548  366280 3375  4883  3798 1458 4606  1  6 50 43
 2  1 125176  48560 102364  3653605  3973  1377 1342 3701  1  4 50 46
 [snip]
procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
 5  1 360628 101444 191420  344960  748  1927   760  670 2322  3  3 48 46
 1  0 362064 100996 191972  345200  479   184   479  226  654 50  1 41  8
 1  0 362064  99752 191980  3452000 0 9  182  594 50  0 50  0
 4  0 362064  98728 191980  34520   11011 5  179  588 49  0 50  0
 2  0 362064  97528 191988  3452000 015  188  603 50  0 50  0
 2  0 362064  95876 191988  34520   4304313  190  603 50  0 49  1
 1  0 362064  95008 191996  34520   2102112  183  604 50  0 50  0
 2  0 364900  63516 193212  634560  947   408  1281  368 1163 16  3 50 31
 0  0 364868 139108 193284  392200069 11213  383 15413 25  8 61  6
 1  0 364868 139116 193312  3922000 0  1284  224  595  0  0 98  1
 2  0 364868 139240 193320  3922000 0 9  182  553  0  0 100  0


Note that the total RSS usage of updatedb+sort was just about 50MB,
nevertheless swap grew to more than 300MB. It's also interesting that
swapping is so aggressive, that the amount of free memory is constantly
growing. I'm a missing something or wouldn't it be smarter to use that
free memory for buffers and cache first? (x86_64 system, so even if
highmem on x86 could be responsible, it's not the case here.)

Will now go and see what happens if I play with swappiness.

Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: updatedb

2007-07-26 Thread Björn Steinbrink
On 2007.07.26 11:58:29 +0200, Björn Steinbrink wrote:
 Note that the total RSS usage of updatedb+sort was just about 50MB,
 nevertheless swap grew to more than 300MB. It's also interesting that
 swapping is so aggressive, that the amount of free memory is constantly
 growing. I'm a missing something or wouldn't it be smarter to use that
 free memory for buffers and cache first? (x86_64 system, so even if
 highmem on x86 could be responsible, it's not the case here.)
 
 Will now go and see what happens if I play with swappiness.

Hm, swappiness set to 0 looks even more weird to me, especially the
beginning, where (AFAICT) basically buffers and caches are dropped just
to get a pretty huge amount of free RAM.

With swappiness set to 100, you basically get what you expect: swapping.
But at least to me, that swapping looks _a lot_ smarter than what it did
for the default swappiness of 60 or the 0 swappiness. Swap is growing,
but so are the buffers, and the cache also only shrinks at a single
point, probably when the sort process starts to grow. Plus, the amount
of free memory isn't growing to insane sizes like in the other cases.

vmstat output for both cases to be found below.

Björn

vmstat output for swappiness = 0

procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
 1  0  0  25140  37712  6530411 7144   31  0  0 99  0
 4  0  0  25132  37736  6531200 038  212  604  1  0 98  1
 2  0  0  25132  37736  6531200 0 0  177  479  0  0 100  0
 3  1  0  17252  42568  6531200  151623  641 2332  1  3 54 41
 3  2  0  15172  45412  5990800  1567   482  585 2051  1  4 50 45
 2  2  0  20436  44320  5452400   743 9  368 1196  0  1 50 49
 3  2  0  28416  36580  5452000   533 4  312 1016  0  1 50 49
 1  2  4  44316  22780  5446400   128   191  240  786  1  1 43 54
 3  0   3884  55256  18112  4577200   356   277  318  937  0  3 39 57
 1  2   3924  57276  19004  4070000   631   117  347 1160  0  1 50 48
 0  1   4108  61328  15348  4034400   768   301  378 1218  0  1 50 49
 1  1   4276  60812  15612  402160  300   585   359  344 1045  0  1 50 49
 0  2   4276  62184  17484  396240  995   703  1208  370 1188  0  2 50 48
 0  2   4296  68244  14616  3626800   36011  275  973  0  1 50 49
 2  2   4296  75292   6976  360160  137   667   480  424 1315  0  3 50 47
 3  1   4300  78984   6344  3339200   639   635  517 1142  0  3 49 47
 2  1   4300  80816   5520  3252001   992   587  683 1479  0  7 48 45
 1  2   4556  82244   3704  325040   85   607   659  452 1040  0  3 50 47
 0  1   4556  81600   4972  3242000   66544  362 1040  0  1 50 49
 0  2   4940  80364   4588  325080  128   552   539  375  995  0  3 50 47
 0  1   5004  83116   3576  324200   21   416   591  405  862  0  3 49 47
 0  1   5004  80196   5388  3244000   604 0  328 1014  0  2 50 48
 0  1   5004  82976   2548  3254400   461   343  363  923  0  4 49 47
 1  1   5004  81528   3780  3247200  1124   157  542 1524  1  4 49 47
 0  1   5516  81940   4000  324880  171   865   575  489 1356  0  4 49 46
 0  2   5804  82412   2380  325160   96   660   423  385 1160  1  6 42 51
 0  2   5804  81476   2868  3248000   864   204  493 1242  0  5 50 45
 0  1   6268  83124   2088  325080  155   678   551  409 1107  0  7 46 47
 0  2   6268  83216   1576  3238000   771   153  450 1330  1 11 43 46
 0  3  59420  97888736  322240 17717   300 18035  375  869  0 27 12 60
 0  4 176160 214288800  323160 3891323 38917  347  502  0  6 30 64
 1  2 176212 242608   2752  327160   17   68395  441 1256  0  4 41 54
 1  1 176212 237464   5492  3256800   883   188  452 1488  1  2 50 48
 1  0 176212 232296   9628  3263600  1368   263  533 1690  1  2 50 47
 0  1 176212 225852  13264  3265200  1212 0  480 1818  1  3 50 46
 0  1 176212 202076  31708  3266000  6143   348 1723 5654  2  7 50 41
 0  1 176212 177196  49952  3256000  6081 0 1698 5413  1  6 50 43
 0  1 176212 147744  58332  3265200  2791   467  884 3565  1  5 50 44
 0  1 176212 130604  63788  3266400  1813   407  645 2637  1  5 50 44
 0  1 176212 119076  68012  3258400  1408 0  529 2239  0  4 50 45
 2  1 176212  74112  83696  3260000  5225   667 1496 7206  5 13 50 33
 1  1 176212  29296  99788  3262000  536016 1524 7009  4 15 49 32
 0  1 176212  16012 105112  3266000  1773  2901  891 2417  1  4 50 45
 1  1 19  16656 117236  326240 6077  4036  6337 1228 6040  6  9 49 36
 1  1 194572  15996 122056  325680   43  160743  579 2740  1  5 50 44
 0  1 209788  15888 125520  326360 5072  1149  5520  503 1836  1  4 49 46
 0  1 231568

Re: updatedb

2007-07-26 Thread Björn Steinbrink
On 2007.07.26 08:56:59 +0200, Rene Herman wrote:
 On 07/26/2007 08:39 AM, Bongani Hlope wrote:

 On Thursday 26 July 2007 05:59:53 Rene Herman wrote:

 So what's happening? If you sit down with a copy op top in one terminal
 and updatedb in another, what does it show?

 Just tested that, there's a steady increase in the useage of buff

 Great. Now concentrate on the swpd column, as it's the only thing 
 relevant here. The fact that an updatedb run fills/replaces caches is 
 completely and utterly unsurprising and not something swap-prefetch helps 
 with. The only thing it does is bring back stuff from _swap_.

But that's with a system that has plenty of RAM available.

The following vmstat output is from a run for which I ran a memory hog
to simulate a box with just 1GB of RAM (didn't want to reboot ;-)). That
(or even less) is probably a more likely amount of RAM for a majority of
users.

Other than the memory hog, there's a relatively small Firefox process
(just about 150MB RSS), Xorg, mutt an apache and some other stuff,
leaving about 128MB of RAM free.

procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
 0  0696  16360  47608  7460000 7134   30  0  0 99  0
 0  0696  16352  47608  7460000 048  213  530  0  0 100  0
 0  1796  16024  45516  745480   17   882   160  515 1698  1  3 58 38
 0  1   1092  16124  41752  741640   43  193143  660 2219  1  4 50 45
 1  1   1548  35096  24224  690360  107  1115   571  473 1616  1  4 50 45
 2  1   8980  45560  18552  585800 1324  1069  1324  453 1705  1  4 50 45
 2  1  12460  44840  21048  565880 1160   831  1345  403 1351  0  1 51 48
 2  1  14348  44220  23016  554080  629   661   947  353 1140  0  2 50 48
 [snip]
procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
 3  1  88904  72160  55368  389080 1377   836  1576  424 1403  0  3 50 47
 0  1  96080  74084  57600  386600 2373   747  2559  412 1312  0  2 48 49
 1  1 100036  74816  61544  386600 1319  1312  1547  524 1605  1  3 50 47
 0  1 107032  72996  64728  377804 2332  1065  2341  461 1686  1  5 50 45
 2  1 115036  68944  75908  367680 2660  3731  2941 1133 3721  1  6 49 44
 3  0 125160  58768  90548  366280 3375  4883  3798 1458 4606  1  6 50 43
 2  1 125176  48560 102364  3653605  3973  1377 1342 3701  1  4 50 46
 [snip]
procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
 5  1 360628 101444 191420  344960  748  1927   760  670 2322  3  3 48 46
 1  0 362064 100996 191972  345200  479   184   479  226  654 50  1 41  8
 1  0 362064  99752 191980  3452000 0 9  182  594 50  0 50  0
 4  0 362064  98728 191980  34520   11011 5  179  588 49  0 50  0
 2  0 362064  97528 191988  3452000 015  188  603 50  0 50  0
 2  0 362064  95876 191988  34520   4304313  190  603 50  0 49  1
 1  0 362064  95008 191996  34520   2102112  183  604 50  0 50  0
 2  0 364900  63516 193212  634560  947   408  1281  368 1163 16  3 50 31
 0  0 364868 139108 193284  392200069 11213  383 15413 25  8 61  6
 1  0 364868 139116 193312  3922000 0  1284  224  595  0  0 98  1
 2  0 364868 139240 193320  3922000 0 9  182  553  0  0 100  0


Note that the total RSS usage of updatedb+sort was just about 50MB,
nevertheless swap grew to more than 300MB. It's also interesting that
swapping is so aggressive, that the amount of free memory is constantly
growing. I'm a missing something or wouldn't it be smarter to use that
free memory for buffers and cache first? (x86_64 system, so even if
highmem on x86 could be responsible, it's not the case here.)

Will now go and see what happens if I play with swappiness.

Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [20/58] x86: Always probe the NMI watchdog

2007-07-19 Thread Björn Steinbrink
On 2007.07.19 11:55:05 +0200, Andi Kleen wrote:
> 
> From: [** iso-8859-1 charset **] Bj�rnSteinbrink <[EMAIL PROTECTED]>
> 
> The performance counter allocator relies on the nmi watchdog being
> probed, so we have to do that even if the watchdog is not enabled.

Are you going to revert your fixes to the msr->bit conversions? Or is
this patch still required with them in place? I actually like your fix,
as it also fixes the semantics of single_msr_reserve() (see other mail).

So, can we just drop this one?

Thanks,
Bj�rn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [21/58] i386: Reserve the right performance counter for the Intel PerfMon NMI watchdog

2007-07-19 Thread Björn Steinbrink
On 2007.07.19 11:55:06 +0200, Andi Kleen wrote:
> 
> From: [** iso-8859-1 charset **] Bj�rnSteinbrink <[EMAIL PROTECTED]>
> 
> The Intel PerfMon NMI watchdog was using the generic reservation
> function which always reserves the first performance counter. But the
> watchdog actually uses the second performance counter, thus we need a
> specialised function.

Ah, almost forgot about that patch. Actually, thanks to your fix that
basically reverted the msr->offset conversation to its 2.6.21
implementation, single_msr_reserve has sane semantics now and does just
what the name suggests (before, the wd_ops entries had to store the
"base" msrs, so it was really a first_msr_reserve).

With wd_ops->perfctr no longer needed to be the base msr, we can just
fix that value for the arch perfmon watchdog. (And maybe we should
remove the values for those implementations that don't employ the
single_msr_reserve() stuff?)

Thanks,
Bj�rn



From: Bj�rn Steinbrink <[EMAIL PROTECTED]>

The Intel PerfMon NMI watchdog reserves the first performance counter,
but uses the second one. Make it correctly reserve the second one.

Signed-off-by: Bj�rn Steinbrink <[EMAIL PROTECTED]>
---
diff --git a/arch/i386/kernel/cpu/perfctr-watchdog.c 
b/arch/i386/kernel/cpu/perfctr-watchdog.c
index 4d26d51..30b5e48 100644
--- a/arch/i386/kernel/cpu/perfctr-watchdog.c
+++ b/arch/i386/kernel/cpu/perfctr-watchdog.c
@@ -599,8 +599,8 @@ static struct wd_ops intel_arch_wd_ops = {
.setup = setup_intel_arch_watchdog,
.rearm = p6_rearm,
.stop = single_msr_stop_watchdog,
-   .perfctr = MSR_ARCH_PERFMON_PERFCTR0,
-   .evntsel = MSR_ARCH_PERFMON_EVENTSEL0,
+   .perfctr = MSR_ARCH_PERFMON_PERFCTR1,
+   .evntsel = MSR_ARCH_PERFMON_EVENTSEL1,
 };
 
 static void probe_nmi_watchdog(void)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [21/58] i386: Reserve the right performance counter for the Intel PerfMon NMI watchdog

2007-07-19 Thread Björn Steinbrink
On 2007.07.19 11:55:06 +0200, Andi Kleen wrote:
 
 From: [** iso-8859-1 charset **] Bj�rnSteinbrink [EMAIL PROTECTED]
 
 The Intel PerfMon NMI watchdog was using the generic reservation
 function which always reserves the first performance counter. But the
 watchdog actually uses the second performance counter, thus we need a
 specialised function.

Ah, almost forgot about that patch. Actually, thanks to your fix that
basically reverted the msr-offset conversation to its 2.6.21
implementation, single_msr_reserve has sane semantics now and does just
what the name suggests (before, the wd_ops entries had to store the
base msrs, so it was really a first_msr_reserve).

With wd_ops-perfctr no longer needed to be the base msr, we can just
fix that value for the arch perfmon watchdog. (And maybe we should
remove the values for those implementations that don't employ the
single_msr_reserve() stuff?)

Thanks,
Bj�rn



From: Bj�rn Steinbrink [EMAIL PROTECTED]

The Intel PerfMon NMI watchdog reserves the first performance counter,
but uses the second one. Make it correctly reserve the second one.

Signed-off-by: Bj�rn Steinbrink [EMAIL PROTECTED]
---
diff --git a/arch/i386/kernel/cpu/perfctr-watchdog.c 
b/arch/i386/kernel/cpu/perfctr-watchdog.c
index 4d26d51..30b5e48 100644
--- a/arch/i386/kernel/cpu/perfctr-watchdog.c
+++ b/arch/i386/kernel/cpu/perfctr-watchdog.c
@@ -599,8 +599,8 @@ static struct wd_ops intel_arch_wd_ops = {
.setup = setup_intel_arch_watchdog,
.rearm = p6_rearm,
.stop = single_msr_stop_watchdog,
-   .perfctr = MSR_ARCH_PERFMON_PERFCTR0,
-   .evntsel = MSR_ARCH_PERFMON_EVENTSEL0,
+   .perfctr = MSR_ARCH_PERFMON_PERFCTR1,
+   .evntsel = MSR_ARCH_PERFMON_EVENTSEL1,
 };
 
 static void probe_nmi_watchdog(void)
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [20/58] x86: Always probe the NMI watchdog

2007-07-19 Thread Björn Steinbrink
On 2007.07.19 11:55:05 +0200, Andi Kleen wrote:
 
 From: [** iso-8859-1 charset **] Bj�rnSteinbrink [EMAIL PROTECTED]
 
 The performance counter allocator relies on the nmi watchdog being
 probed, so we have to do that even if the watchdog is not enabled.

Are you going to revert your fixes to the msr-bit conversions? Or is
this patch still required with them in place? I actually like your fix,
as it also fixes the semantics of single_msr_reserve() (see other mail).

So, can we just drop this one?

Thanks,
Bj�rn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc7: known regressions with patches

2007-07-03 Thread Björn Steinbrink
On 2007.07.03 14:42:25 -0700, Linus Torvalds wrote:
> 
> 
> On Tue, 3 Jul 2007, Bj?rn Steinbrink wrote:
> > Andi said that one of the regression fixes wasn't critical for .22 and
> > that he wants to do a stopgap for the other regression (my patch
> > sucked), reverting the code to the .21 version. So you can drop the
> > patches and/or me here.
> 
> Can you say which patch should be reverted. This thing really shouldn't 
> have gone on this long, I would have hopef we had the oprofile thing 
> sorted out already..

That would be commit 09198e68501a7e34737cd9264d266f42429abcdc, for which
there are already a few fixes in your tree. Andi, did you intent to
fully revert that, or just certain parts of it?

Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc7: known regressions with patches

2007-07-03 Thread Björn Steinbrink
On 2007.07.03 18:45:24 +0200, Michal Piotrowski wrote:
> Subject: OProfile issues
> References : http://lkml.org/lkml/2007/6/12/207
> Submitter  : Stephane Eranian <[EMAIL PROTECTED]>
> Handled-By : Björn Steinbrink <[EMAIL PROTECTED]>
> Patch  : http://lkml.org/lkml/2007/6/12/392
>  http://lkml.org/lkml/2007/6/20/275
>  http://lkml.org/lkml/2007/6/20/276
> Status : patches were suggested

Andi said that one of the regression fixes wasn't critical for .22 and
that he wants to do a stopgap for the other regression (my patch
sucked), reverting the code to the .21 version. So you can drop the
patches and/or me here.

Thanks,
Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc7: known regressions with patches

2007-07-03 Thread Björn Steinbrink
On 2007.07.03 18:45:24 +0200, Michal Piotrowski wrote:
 Subject: OProfile issues
 References : http://lkml.org/lkml/2007/6/12/207
 Submitter  : Stephane Eranian [EMAIL PROTECTED]
 Handled-By : Björn Steinbrink [EMAIL PROTECTED]
 Patch  : http://lkml.org/lkml/2007/6/12/392
  http://lkml.org/lkml/2007/6/20/275
  http://lkml.org/lkml/2007/6/20/276
 Status : patches were suggested

Andi said that one of the regression fixes wasn't critical for .22 and
that he wants to do a stopgap for the other regression (my patch
sucked), reverting the code to the .21 version. So you can drop the
patches and/or me here.

Thanks,
Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc7: known regressions with patches

2007-07-03 Thread Björn Steinbrink
On 2007.07.03 14:42:25 -0700, Linus Torvalds wrote:
 
 
 On Tue, 3 Jul 2007, Bj?rn Steinbrink wrote:
  Andi said that one of the regression fixes wasn't critical for .22 and
  that he wants to do a stopgap for the other regression (my patch
  sucked), reverting the code to the .21 version. So you can drop the
  patches and/or me here.
 
 Can you say which patch should be reverted. This thing really shouldn't 
 have gone on this long, I would have hopef we had the oprofile thing 
 sorted out already..

That would be commit 09198e68501a7e34737cd9264d266f42429abcdc, for which
there are already a few fixes in your tree. Andi, did you intent to
fully revert that, or just certain parts of it?

Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Old bug in tg3 driver unfixed?

2007-07-02 Thread Björn Steinbrink
On 2007.06.30 14:06:14 +0200, Tim Boneko wrote:
> Hello!
> I am not subscribed to this list so please CC answers to my mail
> address. THX!
> 
> I recently replaced the mainboard of one of my servers with a Tyan
> Tomcat K8E. The onboard gigabit NIC is a Broadcom BCM5721. After
> compiling and loading the tg3 driver in Kernel 2.6.21.5, the interface
> could not be configured: "Device not found".
> While searching the net i found a few other people with the same problem
> but no solution.
> 
> By coincidence i found that a simpe "ifconfig eth1" worked OK and
> afterwards the device could be configured and used as desired. After
> searching this list, i found this posting
> 
> http://uwsg.iu.edu/hypermail/linux/kernel/0409.0/0224.html
> 
> by someone with obviously the same problem.

Sounds more like a iftab/udev problem. Check /etc/iftab and your udev
rules please.

Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Old bug in tg3 driver unfixed?

2007-07-02 Thread Björn Steinbrink
On 2007.06.30 14:06:14 +0200, Tim Boneko wrote:
 Hello!
 I am not subscribed to this list so please CC answers to my mail
 address. THX!
 
 I recently replaced the mainboard of one of my servers with a Tyan
 Tomcat K8E. The onboard gigabit NIC is a Broadcom BCM5721. After
 compiling and loading the tg3 driver in Kernel 2.6.21.5, the interface
 could not be configured: Device not found.
 While searching the net i found a few other people with the same problem
 but no solution.
 
 By coincidence i found that a simpe ifconfig eth1 worked OK and
 afterwards the device could be configured and used as desired. After
 searching this list, i found this posting
 
 http://uwsg.iu.edu/hypermail/linux/kernel/0409.0/0224.html
 
 by someone with obviously the same problem.

Sounds more like a iftab/udev problem. Check /etc/iftab and your udev
rules please.

Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [OT] Vim highlighting for trailing spaces

2007-06-29 Thread Björn Steinbrink
On 2007.06.29 01:42:22 -0700, Josh Triplett wrote:
> Jan Engelhardt wrote:
> > On Jun 29 2007 00:53, Josh Triplett wrote:
> >> And if you really want highlighting, you can always use grep --color. :)
> > 
> > Been there, done that, have GREP_COLOR env variable defined!
> 
> Same here.  Now I just need to convince git-grep to use it.

You need to convince grep. When piping its output to less, it won't
colorize unless forced. Always forcing color via GREP_OPTIONS might
break certain use-cases, and git-grep doesn't allow to pass options. So
for me, a bash alias it is:

alias gg='GREP_OPTIONS=--color=always git-grep'

You might need to set LESS=-R in addition to that, to stop less from
stripping the color codes.

Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [OT] Vim highlighting for trailing spaces

2007-06-29 Thread Björn Steinbrink
On 2007.06.29 01:42:22 -0700, Josh Triplett wrote:
 Jan Engelhardt wrote:
  On Jun 29 2007 00:53, Josh Triplett wrote:
  And if you really want highlighting, you can always use grep --color. :)
  
  Been there, done that, have GREP_COLOR env variable defined!
 
 Same here.  Now I just need to convince git-grep to use it.

You need to convince grep. When piping its output to less, it won't
colorize unless forced. Always forcing color via GREP_OPTIONS might
break certain use-cases, and git-grep doesn't allow to pass options. So
for me, a bash alias it is:

alias gg='GREP_OPTIONS=--color=always git-grep'

You might need to set LESS=-R in addition to that, to stop less from
stripping the color codes.

Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: DIE_NMI_IPI to oprofile ?

2007-06-26 Thread Björn Steinbrink
On 2007.06.25 23:51:51 -0700, Amitabha Roy wrote:
> Hi
> 
> >From looking at the default_do_nmi code in traps.c it seems that (at
> least for the non BSP case, with reason=0) the die notifier chain gets
> called with val=DIE_NMI_IPI.
> 
> However in the profile_exceptions_notify_handler we check for DIE_NMI
> (and not DIE_NMI_IPI) to decide if we want to cycle through the
> counters. This seems wrong to me unless I am missing something (which
> must be the case, otherwise this code is broken :).
> 
> Can anyone help me understand this ?

notify_die(DIE_NMI, ...) is also called from nmi_watchdog_tick. So
depending on "reason" you call:

notify_die(DIE_NMI_IPI, ...)
nmi_watchdog_tick(...)
---> notify_die(DIE_NMI, ...)

or just

notify_die(DIE_NMI, ...)

HTH
Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: DIE_NMI_IPI to oprofile ?

2007-06-26 Thread Björn Steinbrink
On 2007.06.25 23:51:51 -0700, Amitabha Roy wrote:
 Hi
 
 From looking at the default_do_nmi code in traps.c it seems that (at
 least for the non BSP case, with reason=0) the die notifier chain gets
 called with val=DIE_NMI_IPI.
 
 However in the profile_exceptions_notify_handler we check for DIE_NMI
 (and not DIE_NMI_IPI) to decide if we want to cycle through the
 counters. This seems wrong to me unless I am missing something (which
 must be the case, otherwise this code is broken :).
 
 Can anyone help me understand this ?

notify_die(DIE_NMI, ...) is also called from nmi_watchdog_tick. So
depending on reason you call:

notify_die(DIE_NMI_IPI, ...)
nmi_watchdog_tick(...)
--- notify_die(DIE_NMI, ...)

or just

notify_die(DIE_NMI, ...)

HTH
Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] Always probe the NMI watchdog

2007-06-25 Thread Björn Steinbrink
On 2007.06.25 21:36:17 +0200, Andi Kleen wrote:
> On Monday 25 June 2007 21:09, Andrew Morton wrote:
> > On Wed, 20 Jun 2007 20:34:48 +0200
> >
> > Bj__rn Steinbrink <[EMAIL PROTECTED]> wrote:
> > > The performance counter allocator relies on the nmi watchdog being
> > > probed, so we have to do that even if the watchdog is not enabled.
> >
> > So...  what's the status of this lot?
> >
> > I've just merged this patch and the second one:
> >
> > Subject: [PATCH 2/2] Reserve the right performance counter for the Intel
> > PerfMon NMI watchdog Message-ID: <[EMAIL PROTECTED]>
> >
> > but there was no followup discussion afaict.
> >
> > Andi, Stephane: acks?
> 
> Yes, although I'm still a little uneasy about the always probe one.
> 
> > If acked, do we agree that this is 2.6.22 material?
> 
> The first (always probe) is probably .22 material, but needs more testing 
> first.

Hm, without the second, I expect OProfile to break when the watchdog is
enabled. Alternatively to the patch I sent, we could revert the change
that makes it use perfctr1 instead of perfctr0. Would you prefer that?

Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] Always probe the NMI watchdog

2007-06-25 Thread Björn Steinbrink
On 2007.06.25 13:01:58 -0700, Stephane Eranian wrote:
> Hi,
> On Mon, Jun 25, 2007 at 09:36:17PM +0200, Andi Kleen wrote:
> > On Monday 25 June 2007 21:09, Andrew Morton wrote:
> > > On Wed, 20 Jun 2007 20:34:48 +0200
> > >
> > > Bj__rn Steinbrink <[EMAIL PROTECTED]> wrote:
> > > > The performance counter allocator relies on the nmi watchdog being
> > > > probed, so we have to do that even if the watchdog is not enabled.
> > >
> > > So...  what's the status of this lot?
> > >
> > > I've just merged this patch and the second one:
> > >
> > > Subject: [PATCH 2/2] Reserve the right performance counter for the Intel
> > > PerfMon NMI watchdog Message-ID: <[EMAIL PROTECTED]>
> > >
> > > but there was no followup discussion afaict.
> > >
> > > Andi, Stephane: acks?
> > 
> > Yes, although I'm still a little uneasy about the always probe one.
> > 
> 
> I looked at the code I have in my tree coming from Bjon's patches and
> I am a bit confused by the flow for probing as well.
> 
> The register allocator works globally, i.e., you reserve a register
> for all CPUs at once.
> 
> The probe_nmi_watchdog() routine simply probes the CPU type to initialize
> the watchdog data structure (wd_ops). This needs to be done once and for all.
> Why put it in a route that is called with on_each_cpu()?

Ehrm, that's a good question actually... I moved the probing call up
into setup_local_APIC in that other big patch and put a check at the
start of the probing function, so that it is executed only once. No idea
why I did it so weird in this one.

> I think the tricky part is that we do want to reserve perfctr1 even
> though the NMI watchdog is not active. This comes from the fact that
> the NMI watchdog knows about only one counter and if it can't get that
> one, it probably fails.  By reserving it from the start, we ensure NMI
> watchdog will work when eventually activated.

Can you enable it later on at all? It failed for me when I tried,
because it didn't know which hardware to use. Had to pass the kernel
parameter to make the proc files do anything. Seems like it has to be
enable at boot to work at all.

And AFAICT we never unconditionally reserved a perfctr for the watchdog.

> Unlike sharing between Oprofile and perfmon which works by enforcing
> mutual exclusion between the two subsystems, the NMI watchdog must
> work concurrently with either Oprofile or Perfmon.

In 2.6.21 the nmi watchdog, if enabled, just reserved its perfctrs and
everything else had to deal with it. Since the cleanup, the watchdog
will release its perfctr when disabled, so another subsystem can grab
it. But that also means that that other subsystem must release it again
before you can reenable the watchdog.

> Bjorn, did I understand the constraints correctly?

I'll tell you, once I'm sure that I understood them correctly ;-)

Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch, 2.6.22-rc6] fix nmi_watchdog=2 bootup hang

2007-06-25 Thread Björn Steinbrink
On 2007.06.25 08:40:35 -0400, Jeremy Fitzhardinge wrote:
> Ingo Molnar wrote:
> >* Ingo Molnar <[EMAIL PROTECTED]> wrote:
> >
> >  
> >>hm, restoring nmi.c to the v2.6.21 state does not fix the 
> >>nmi_watchdog=2 hang. I'll do a bisection run.
> >>
> >
> >and after spending an hour on 15 bisection steps:
> >
> > git-bisect start
> > git-bisect good d1be341dba5521506d9e6dccfd66179080705bea
> > git-bisect bad a06381fec77bf88ec6c5eb6324457cb04e9ffd69
> > git-bisect bad 794543a236074f49a8af89ef08ef6a753e4777e5
> > git-bisect good 24a77daf3d80bddcece044e6dc3675e427eef3f3
> > git-bisect bad ea62ccd00fd0b6720b033adfc9984f31130ce195
> > git-bisect good 7e20ef030dde0e52dd5a57220ee82fa9facbea4e
> > git-bisect bad f19cccf366a07e05703c90038704a3a5ffcb0607
> > git-bisect good 0d08e0d3a97cce22ebf80b54785e00d9b94e1add
> > git-bisect bad 856f44ff4af6e57fdc39a8b2bec498c88438bd27
> > git-bisect bad f8822f42019eceed19cc6c0f985a489e17796ed8
> > git-bisect good 1c3d99c11c47c8a1a9ed6a46555dbf6520683c52
> > git-bisect good b239fb2501117bf3aeb4dd6926edd855be92333d
> > git-bisect good 98de032b681d8a7532d44dfc66aa5c0c1c755a9d
> > git-bisect good 42c24fa22e86365055fc931d833f26165e687c19
> >
> >the winner is ...
> >
> > f8822f42019eceed19cc6c0f985a489e17796ed8 is first bad commit
> > commit f8822f42019eceed19cc6c0f985a489e17796ed8
> > Author: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
> > Date:   Wed May 2 19:27:14 2007 +0200
> >
> >[PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to 
> >make them patchable
> >
> >... our wonderful paravirt subsystem, honed to eternal perfection by the 
> >testing-machine x86_64 tree.
> >
> >reverting -git-curr's paravirt.c, paravirt.h, smp.c and tlbflush.h to 
> >before the bad commit makes the NMI watchdog work again. Patch against 
> >-rc6 is below.
> >  
> 
> Er, wow.  I've been running with this stuff for months without a 
> problem.   Do you have CONFIG_PARAVIRT enabled?  Do you still get the 
> hang if you boot with "noreplace-paravirt" to disable the patching? 

Are you running on AMD hardware? As Intel performance counters are only
32 bits wide, the wrmsrl bug should be a non-issue at least for the NMI
watchdog on Intel hardware. AMD uses 48bit wide performance counters,
which are probably less happy ;-)

Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch, 2.6.22-rc6] fix nmi_watchdog=2 bootup hang

2007-06-25 Thread Björn Steinbrink
On 2007.06.25 08:49:05 -0400, Jeremy Fitzhardinge wrote:
> Björn Steinbrink wrote:
> >On 2007.06.25 10:26:52 +0200, Ingo Molnar wrote:
> >  
> >>* Ingo Molnar <[EMAIL PROTECTED]> wrote:
> >>
> >>
> >>>the winner is ...
> >>>
> >>> f8822f42019eceed19cc6c0f985a489e17796ed8 is first bad commit
> >>> commit f8822f42019eceed19cc6c0f985a489e17796ed8
> >>> Author: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
> >>> Date:   Wed May 2 19:27:14 2007 +0200
> >>>
> >>>[PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to 
> >>>make them patchable
> >>>  
> >>and of course i'm happy to test any patch that is simpler than the 
> >>brutal revert i sent.
> >>
> >
> >wrmsrl() looks broken, dropping the upper 32bits of the value to be
> >written. Does this help?
> >  
> 
> Crap.  That's embarrassing.  Does it help, because it seems likely?  
> (Esp since Ingo didn't even have CONFIG_PARAVIRT enabled, so most of his 
> revert would have been dead code anyway.)

He has. The config Ingo sent was for x86_64, which (AFAICT) doesn't have
CONFIG_PARAVIRT, so the config was unfortunately useless. But his
bootlog tells us:

Booting paravirtualized kernel on bare hardware

Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch, 2.6.22-rc6] fix nmi_watchdog=2 bootup hang

2007-06-25 Thread Björn Steinbrink
On 2007.06.25 10:26:52 +0200, Ingo Molnar wrote:
> 
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> 
> > the winner is ...
> > 
> >  f8822f42019eceed19cc6c0f985a489e17796ed8 is first bad commit
> >  commit f8822f42019eceed19cc6c0f985a489e17796ed8
> >  Author: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
> >  Date:   Wed May 2 19:27:14 2007 +0200
> > 
> > [PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to 
> > make them patchable
> 
> and of course i'm happy to test any patch that is simpler than the 
> brutal revert i sent.

wrmsrl() looks broken, dropping the upper 32bits of the value to be
written. Does this help?

Björn
---
diff --git a/include/asm-i386/paravirt.h b/include/asm-i386/paravirt.h
index d7a0512..7f846a7 100644
--- a/include/asm-i386/paravirt.h
+++ b/include/asm-i386/paravirt.h
@@ -539,7 +539,7 @@ static inline int paravirt_write_msr(unsigned msr, unsigned 
low, unsigned high)
val = paravirt_read_msr(msr, &_err);\
 } while(0)
 
-#define wrmsrl(msr,val)((void)paravirt_write_msr(msr, val, 0))
+#define wrmsrl(msr,val)wrmsr(msr, (u32)((u64)(val)), 
((u64)(val))>>32)
 #define wrmsr_safe(msr,a,b)paravirt_write_msr(msr, a, b)
 
 /* rdmsr with exception handling */
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch, 2.6.22-rc6] fix nmi_watchdog=2 bootup hang

2007-06-25 Thread Björn Steinbrink
On 2007.06.25 10:26:52 +0200, Ingo Molnar wrote:
 
 * Ingo Molnar [EMAIL PROTECTED] wrote:
 
  the winner is ...
  
   f8822f42019eceed19cc6c0f985a489e17796ed8 is first bad commit
   commit f8822f42019eceed19cc6c0f985a489e17796ed8
   Author: Jeremy Fitzhardinge [EMAIL PROTECTED]
   Date:   Wed May 2 19:27:14 2007 +0200
  
  [PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to 
  make them patchable
 
 and of course i'm happy to test any patch that is simpler than the 
 brutal revert i sent.

wrmsrl() looks broken, dropping the upper 32bits of the value to be
written. Does this help?

Björn
---
diff --git a/include/asm-i386/paravirt.h b/include/asm-i386/paravirt.h
index d7a0512..7f846a7 100644
--- a/include/asm-i386/paravirt.h
+++ b/include/asm-i386/paravirt.h
@@ -539,7 +539,7 @@ static inline int paravirt_write_msr(unsigned msr, unsigned 
low, unsigned high)
val = paravirt_read_msr(msr, _err);\
 } while(0)
 
-#define wrmsrl(msr,val)((void)paravirt_write_msr(msr, val, 0))
+#define wrmsrl(msr,val)wrmsr(msr, (u32)((u64)(val)), 
((u64)(val))32)
 #define wrmsr_safe(msr,a,b)paravirt_write_msr(msr, a, b)
 
 /* rdmsr with exception handling */
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch, 2.6.22-rc6] fix nmi_watchdog=2 bootup hang

2007-06-25 Thread Björn Steinbrink
On 2007.06.25 08:49:05 -0400, Jeremy Fitzhardinge wrote:
 Björn Steinbrink wrote:
 On 2007.06.25 10:26:52 +0200, Ingo Molnar wrote:
   
 * Ingo Molnar [EMAIL PROTECTED] wrote:
 
 
 the winner is ...
 
  f8822f42019eceed19cc6c0f985a489e17796ed8 is first bad commit
  commit f8822f42019eceed19cc6c0f985a489e17796ed8
  Author: Jeremy Fitzhardinge [EMAIL PROTECTED]
  Date:   Wed May 2 19:27:14 2007 +0200
 
 [PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to 
 make them patchable
   
 and of course i'm happy to test any patch that is simpler than the 
 brutal revert i sent.
 
 
 wrmsrl() looks broken, dropping the upper 32bits of the value to be
 written. Does this help?
   
 
 Crap.  That's embarrassing.  Does it help, because it seems likely?  
 (Esp since Ingo didn't even have CONFIG_PARAVIRT enabled, so most of his 
 revert would have been dead code anyway.)

He has. The config Ingo sent was for x86_64, which (AFAICT) doesn't have
CONFIG_PARAVIRT, so the config was unfortunately useless. But his
bootlog tells us:

Booting paravirtualized kernel on bare hardware

Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch, 2.6.22-rc6] fix nmi_watchdog=2 bootup hang

2007-06-25 Thread Björn Steinbrink
On 2007.06.25 08:40:35 -0400, Jeremy Fitzhardinge wrote:
 Ingo Molnar wrote:
 * Ingo Molnar [EMAIL PROTECTED] wrote:
 
   
 hm, restoring nmi.c to the v2.6.21 state does not fix the 
 nmi_watchdog=2 hang. I'll do a bisection run.
 
 
 and after spending an hour on 15 bisection steps:
 
  git-bisect start
  git-bisect good d1be341dba5521506d9e6dccfd66179080705bea
  git-bisect bad a06381fec77bf88ec6c5eb6324457cb04e9ffd69
  git-bisect bad 794543a236074f49a8af89ef08ef6a753e4777e5
  git-bisect good 24a77daf3d80bddcece044e6dc3675e427eef3f3
  git-bisect bad ea62ccd00fd0b6720b033adfc9984f31130ce195
  git-bisect good 7e20ef030dde0e52dd5a57220ee82fa9facbea4e
  git-bisect bad f19cccf366a07e05703c90038704a3a5ffcb0607
  git-bisect good 0d08e0d3a97cce22ebf80b54785e00d9b94e1add
  git-bisect bad 856f44ff4af6e57fdc39a8b2bec498c88438bd27
  git-bisect bad f8822f42019eceed19cc6c0f985a489e17796ed8
  git-bisect good 1c3d99c11c47c8a1a9ed6a46555dbf6520683c52
  git-bisect good b239fb2501117bf3aeb4dd6926edd855be92333d
  git-bisect good 98de032b681d8a7532d44dfc66aa5c0c1c755a9d
  git-bisect good 42c24fa22e86365055fc931d833f26165e687c19
 
 the winner is ...
 
  f8822f42019eceed19cc6c0f985a489e17796ed8 is first bad commit
  commit f8822f42019eceed19cc6c0f985a489e17796ed8
  Author: Jeremy Fitzhardinge [EMAIL PROTECTED]
  Date:   Wed May 2 19:27:14 2007 +0200
 
 [PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to 
 make them patchable
 
 ... our wonderful paravirt subsystem, honed to eternal perfection by the 
 testing-machine x86_64 tree.
 
 reverting -git-curr's paravirt.c, paravirt.h, smp.c and tlbflush.h to 
 before the bad commit makes the NMI watchdog work again. Patch against 
 -rc6 is below.
   
 
 Er, wow.  I've been running with this stuff for months without a 
 problem.   Do you have CONFIG_PARAVIRT enabled?  Do you still get the 
 hang if you boot with noreplace-paravirt to disable the patching? 

Are you running on AMD hardware? As Intel performance counters are only
32 bits wide, the wrmsrl bug should be a non-issue at least for the NMI
watchdog on Intel hardware. AMD uses 48bit wide performance counters,
which are probably less happy ;-)

Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] Always probe the NMI watchdog

2007-06-25 Thread Björn Steinbrink
On 2007.06.25 13:01:58 -0700, Stephane Eranian wrote:
 Hi,
 On Mon, Jun 25, 2007 at 09:36:17PM +0200, Andi Kleen wrote:
  On Monday 25 June 2007 21:09, Andrew Morton wrote:
   On Wed, 20 Jun 2007 20:34:48 +0200
  
   Bj__rn Steinbrink [EMAIL PROTECTED] wrote:
The performance counter allocator relies on the nmi watchdog being
probed, so we have to do that even if the watchdog is not enabled.
  
   So...  what's the status of this lot?
  
   I've just merged this patch and the second one:
  
   Subject: [PATCH 2/2] Reserve the right performance counter for the Intel
   PerfMon NMI watchdog Message-ID: [EMAIL PROTECTED]
  
   but there was no followup discussion afaict.
  
   Andi, Stephane: acks?
  
  Yes, although I'm still a little uneasy about the always probe one.
  
 
 I looked at the code I have in my tree coming from Bjon's patches and
 I am a bit confused by the flow for probing as well.
 
 The register allocator works globally, i.e., you reserve a register
 for all CPUs at once.
 
 The probe_nmi_watchdog() routine simply probes the CPU type to initialize
 the watchdog data structure (wd_ops). This needs to be done once and for all.
 Why put it in a route that is called with on_each_cpu()?

Ehrm, that's a good question actually... I moved the probing call up
into setup_local_APIC in that other big patch and put a check at the
start of the probing function, so that it is executed only once. No idea
why I did it so weird in this one.

 I think the tricky part is that we do want to reserve perfctr1 even
 though the NMI watchdog is not active. This comes from the fact that
 the NMI watchdog knows about only one counter and if it can't get that
 one, it probably fails.  By reserving it from the start, we ensure NMI
 watchdog will work when eventually activated.

Can you enable it later on at all? It failed for me when I tried,
because it didn't know which hardware to use. Had to pass the kernel
parameter to make the proc files do anything. Seems like it has to be
enable at boot to work at all.

And AFAICT we never unconditionally reserved a perfctr for the watchdog.

 Unlike sharing between Oprofile and perfmon which works by enforcing
 mutual exclusion between the two subsystems, the NMI watchdog must
 work concurrently with either Oprofile or Perfmon.

In 2.6.21 the nmi watchdog, if enabled, just reserved its perfctrs and
everything else had to deal with it. Since the cleanup, the watchdog
will release its perfctr when disabled, so another subsystem can grab
it. But that also means that that other subsystem must release it again
before you can reenable the watchdog.

 Bjorn, did I understand the constraints correctly?

I'll tell you, once I'm sure that I understood them correctly ;-)

Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] Always probe the NMI watchdog

2007-06-25 Thread Björn Steinbrink
On 2007.06.25 21:36:17 +0200, Andi Kleen wrote:
 On Monday 25 June 2007 21:09, Andrew Morton wrote:
  On Wed, 20 Jun 2007 20:34:48 +0200
 
  Bj__rn Steinbrink [EMAIL PROTECTED] wrote:
   The performance counter allocator relies on the nmi watchdog being
   probed, so we have to do that even if the watchdog is not enabled.
 
  So...  what's the status of this lot?
 
  I've just merged this patch and the second one:
 
  Subject: [PATCH 2/2] Reserve the right performance counter for the Intel
  PerfMon NMI watchdog Message-ID: [EMAIL PROTECTED]
 
  but there was no followup discussion afaict.
 
  Andi, Stephane: acks?
 
 Yes, although I'm still a little uneasy about the always probe one.
 
  If acked, do we agree that this is 2.6.22 material?
 
 The first (always probe) is probably .22 material, but needs more testing 
 first.

Hm, without the second, I expect OProfile to break when the watchdog is
enabled. Alternatively to the patch I sent, we could revert the change
that makes it use perfctr1 instead of perfctr0. Would you prefer that?

Björn
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   >