Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-25 Thread Andi Kleen

> This seems to imply that we need the lock prefix even when updating
> different bits within the same long (not just the same byte). Is this
> interpretation correct?

Yes, that is what I said. The consistency applies to loads/stores
(so e.g. for 128bit SSE2 load/stores too), not bytes.

The current IA32 manuals have a more detailed description of the rules 
in volume 3

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-25 Thread Fernando Luis Vázquez Cao
On Tue, 2007-04-24 at 16:22 +0200, Andi Kleen wrote:
> > Why would you need any kind of lock when just changing a single bit,
> > if it didn't affect other bits of the same word?  Just as you don't
> > need a lock when simply assigning a word, setting a bit to 0 or 1
> > is simple in itself (it doesn't matter if it was 0 or 1 before).
> > 
> > > But, I think that concurrent bit operation on different bits
> > > is just like OR operation , so lock prefix is not needed.
> > 
> > I firmly believe that it is; but I'm not a processor expert.
> 
> Think of the CPU cache like the page cache. The VFS cannot change
> anything directly on disk - it always has to read a page (or block);
> change it there even if it's only a single bit and back.
> 
> Now imagine multiple independent kernels to that disk doing this in parallel
> without any locking. You will lose data on simple changes because
> of data races.
> 
> The CPU is similar. The memory is the disk; the protocol talking to 
> the memory only knows cache lines.  There are multiple CPUs talking
> to that memory. The CPUs cannot change anything 
> without reading  a full cache line first and then later writing it back. 
> There is "just do OR operation" when talking to memory, just like
> disks don't have such a operation; just read and write.
> 
> [there are some special cases with uncached mappings, but they are not
> relevant here]
> 
> With lock the CPU ensures the read-modify-write cycle is atomic, 
> without it doesn't.
>  
> The CPU also guarantees you that multiple writes don't get lost
> even when they hit the same cache line (otherwise you would need lock
> for anything in the same cache line), but it doesn't guarantee that
> for a word. The details for that vary by architecture; on x86 it is 
> any memory access, on others it can be only guaranteed for long stores.
> 
> The exact rules for all this can be quite complicated (and also vary
> by architecture), this is a simplification but should be enough as a rough 
> guide.

Intel 80386 Reference Programmer's Manual reads:

" When accessing a bit in memory, the processor may access four bytes
starting from the memory address given by:

   Effective Address + (4 * (BitOffset DIV 32))
for a 32-bit operand size, or two bytes starting from the memory address
given by: 
   Effective Address + (2 * (BitOffset DIV 16))
for a 16-bit operand size. It may do this even when only a single byte
needs to be accessed in order to get at the given bit. "

This seems to imply that we need the lock prefix even when updating
different bits within the same long (not just the same byte). Is this
interpretation correct?

 - Fernando

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-25 Thread Fernando Luis Vázquez Cao
On Tue, 2007-04-24 at 16:22 +0200, Andi Kleen wrote:
  Why would you need any kind of lock when just changing a single bit,
  if it didn't affect other bits of the same word?  Just as you don't
  need a lock when simply assigning a word, setting a bit to 0 or 1
  is simple in itself (it doesn't matter if it was 0 or 1 before).
  
   But, I think that concurrent bit operation on different bits
   is just like OR operation , so lock prefix is not needed.
  
  I firmly believe that it is; but I'm not a processor expert.
 
 Think of the CPU cache like the page cache. The VFS cannot change
 anything directly on disk - it always has to read a page (or block);
 change it there even if it's only a single bit and back.
 
 Now imagine multiple independent kernels to that disk doing this in parallel
 without any locking. You will lose data on simple changes because
 of data races.
 
 The CPU is similar. The memory is the disk; the protocol talking to 
 the memory only knows cache lines.  There are multiple CPUs talking
 to that memory. The CPUs cannot change anything 
 without reading  a full cache line first and then later writing it back. 
 There is just do OR operation when talking to memory, just like
 disks don't have such a operation; just read and write.
 
 [there are some special cases with uncached mappings, but they are not
 relevant here]
 
 With lock the CPU ensures the read-modify-write cycle is atomic, 
 without it doesn't.
  
 The CPU also guarantees you that multiple writes don't get lost
 even when they hit the same cache line (otherwise you would need lock
 for anything in the same cache line), but it doesn't guarantee that
 for a word. The details for that vary by architecture; on x86 it is 
 any memory access, on others it can be only guaranteed for long stores.
 
 The exact rules for all this can be quite complicated (and also vary
 by architecture), this is a simplification but should be enough as a rough 
 guide.

Intel 80386 Reference Programmer's Manual reads:

 When accessing a bit in memory, the processor may access four bytes
starting from the memory address given by:

   Effective Address + (4 * (BitOffset DIV 32))
for a 32-bit operand size, or two bytes starting from the memory address
given by: 
   Effective Address + (2 * (BitOffset DIV 16))
for a 16-bit operand size. It may do this even when only a single byte
needs to be accessed in order to get at the given bit. 

This seems to imply that we need the lock prefix even when updating
different bits within the same long (not just the same byte). Is this
interpretation correct?

 - Fernando

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-25 Thread Andi Kleen

 This seems to imply that we need the lock prefix even when updating
 different bits within the same long (not just the same byte). Is this
 interpretation correct?

Yes, that is what I said. The consistency applies to loads/stores
(so e.g. for 128bit SSE2 load/stores too), not bytes.

The current IA32 manuals have a more detailed description of the rules 
in volume 3

-Andi
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-24 Thread Andi Kleen

> Why would you need any kind of lock when just changing a single bit,
> if it didn't affect other bits of the same word?  Just as you don't
> need a lock when simply assigning a word, setting a bit to 0 or 1
> is simple in itself (it doesn't matter if it was 0 or 1 before).
> 
> > But, I think that concurrent bit operation on different bits
> > is just like OR operation , so lock prefix is not needed.
> 
> I firmly believe that it is; but I'm not a processor expert.

Think of the CPU cache like the page cache. The VFS cannot change
anything directly on disk - it always has to read a page (or block);
change it there even if it's only a single bit and back.

Now imagine multiple independent kernels to that disk doing this in parallel
without any locking. You will lose data on simple changes because
of data races.

The CPU is similar. The memory is the disk; the protocol talking to 
the memory only knows cache lines.  There are multiple CPUs talking
to that memory. The CPUs cannot change anything 
without reading  a full cache line first and then later writing it back. 
There is "just do OR operation" when talking to memory, just like
disks don't have such a operation; just read and write.

[there are some special cases with uncached mappings, but they are not
relevant here]

With lock the CPU ensures the read-modify-write cycle is atomic, 
without it doesn't.
 
The CPU also guarantees you that multiple writes don't get lost
even when they hit the same cache line (otherwise you would need lock
for anything in the same cache line), but it doesn't guarantee that
for a word. The details for that vary by architecture; on x86 it is 
any memory access, on others it can be only guaranteed for long stores.

The exact rules for all this can be quite complicated (and also vary
by architecture), this is a simplification but should be enough as a rough 
guide.

Hope this helps,

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-24 Thread Hugh Dickins
On Tue, 24 Apr 2007, Hisashi Hifumi wrote:
> At 22:42 07/04/23, Hugh Dickins wrote:
> >On Mon, 23 Apr 2007, Hisashi Hifumi wrote:
> > > >No.  The PG_lru flag bit is just one bit amongst many others:
> > > >what of concurrent operations changing other bits in that same
> > > >unsigned long e.g. trying to lock the page by setting PG_locked?
> > > >There are some places where such micro-optimizations can be made
> > > >(typically while first allocating the page); but in general, no.
> > >
> > > In i386 and x86_64, btsl is used to change page flag. In this case,
> > > if btsl without lock prefix
> > > set PG_locked and PG_lru flag concurrently, does only one operation
> > > succeed ?
> >
> >That's right: on an SMP machine, without the lock prefix, the operation
> >is no longer atomic: what's stored back may be missing the result of
> >one or the other of the racing operations.
> 
> In the case that changing the same bit concurrently, lock prefix or other
> spinlock is needed.

Why would you need any kind of lock when just changing a single bit,
if it didn't affect other bits of the same word?  Just as you don't
need a lock when simply assigning a word, setting a bit to 0 or 1
is simple in itself (it doesn't matter if it was 0 or 1 before).

> But, I think that concurrent bit operation on different bits
> is just like OR operation , so lock prefix is not needed.

I firmly believe that it is; but I'm not a processor expert.

> AMD instruction manual says about bts that ,
> 
> "Copies a bit, specified by bit index in a register or 8-bit immediate value
> (second operand), from a bit
> string (first operand), also called the bit base, to the carry flag (CF) of
> the rFLAGS register, and then
> sets the bit in the bit string to 1."
> 
> BTS instruction is read-modify-write instruction on bit unit. So concurrent
> bit operation on different bits may be possible.

read-modify-write indeed.  That's exactly what the "lock" prefix is
needed for, isn't it?  To lock together the read-modify-write cycles
to make the entire operation atomic on SMP.  Without which you may
lose the changes made concurrently to the same word by another cpu,
or it lose yours.

(If another cpu is merely changing the same bit as you are, no problem:
either you're both changing it to the same value, or you're racing to
set it to opposite values, and one of you wins - fair enough, however
much locking you add, you're still left with the situation that two
are racing and one wins in the end.)

You now seem to be arguing, never mind your original PageLRU case,
that include/asm-i386/bitops.h and include/asm-x86_64/bitops.h
don't need the LOCK_PREFIX before the "btsl" in set_bit().

I disagree with you; but let someone more authoritative speak up.
Mabye Andi can explain it a lot better than I can?

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-24 Thread Nick Piggin

Hisashi Hifumi wrote:


At 11:47 07/04/24, Nick Piggin wrote:

 >As Hugh points out, we must have atomic ops here, so changing the generic
 >code to use the __ version is wrong. However if there is a faster way 
that
 >i386 can perform the atomic variant, then doing so will speed up the 
generic

 >code without breaking other architectures.
 >

Do you mean writing page-flags.h specific for i386 so improving generic 
code

and without breaking other architectures ?


I meant improving the i386 bitops specific code.

However if there is some variant of operation that is not captured
with the current bitop API, but could provide a useful speedup of
common page flag manipulations, then you might consider extending
the bitop API and making page-flags.h use that new operation.

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-24 Thread Hisashi Hifumi


At 11:47 07/04/24, Nick Piggin wrote:

>As Hugh points out, we must have atomic ops here, so changing the generic
>code to use the __ version is wrong. However if there is a faster way that
>i386 can perform the atomic variant, then doing so will speed up the generic
>code without breaking other architectures.
>

Do you mean writing page-flags.h specific for i386 so improving generic code
and without breaking other architectures ?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-24 Thread Hisashi Hifumi


At 11:47 07/04/24, Nick Piggin wrote:

As Hugh points out, we must have atomic ops here, so changing the generic
code to use the __ version is wrong. However if there is a faster way that
i386 can perform the atomic variant, then doing so will speed up the generic
code without breaking other architectures.


Do you mean writing page-flags.h specific for i386 so improving generic code
and without breaking other architectures ?

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-24 Thread Nick Piggin

Hisashi Hifumi wrote:


At 11:47 07/04/24, Nick Piggin wrote:

 As Hugh points out, we must have atomic ops here, so changing the generic
 code to use the __ version is wrong. However if there is a faster way 
that
 i386 can perform the atomic variant, then doing so will speed up the 
generic

 code without breaking other architectures.
 

Do you mean writing page-flags.h specific for i386 so improving generic 
code

and without breaking other architectures ?


I meant improving the i386 bitops specific code.

However if there is some variant of operation that is not captured
with the current bitop API, but could provide a useful speedup of
common page flag manipulations, then you might consider extending
the bitop API and making page-flags.h use that new operation.

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-24 Thread Hugh Dickins
On Tue, 24 Apr 2007, Hisashi Hifumi wrote:
 At 22:42 07/04/23, Hugh Dickins wrote:
 On Mon, 23 Apr 2007, Hisashi Hifumi wrote:
   No.  The PG_lru flag bit is just one bit amongst many others:
   what of concurrent operations changing other bits in that same
   unsigned long e.g. trying to lock the page by setting PG_locked?
   There are some places where such micro-optimizations can be made
   (typically while first allocating the page); but in general, no.
  
   In i386 and x86_64, btsl is used to change page flag. In this case,
   if btsl without lock prefix
   set PG_locked and PG_lru flag concurrently, does only one operation
   succeed ?
 
 That's right: on an SMP machine, without the lock prefix, the operation
 is no longer atomic: what's stored back may be missing the result of
 one or the other of the racing operations.
 
 In the case that changing the same bit concurrently, lock prefix or other
 spinlock is needed.

Why would you need any kind of lock when just changing a single bit,
if it didn't affect other bits of the same word?  Just as you don't
need a lock when simply assigning a word, setting a bit to 0 or 1
is simple in itself (it doesn't matter if it was 0 or 1 before).

 But, I think that concurrent bit operation on different bits
 is just like OR operation , so lock prefix is not needed.

I firmly believe that it is; but I'm not a processor expert.

 AMD instruction manual says about bts that ,
 
 Copies a bit, specified by bit index in a register or 8-bit immediate value
 (second operand), from a bit
 string (first operand), also called the bit base, to the carry flag (CF) of
 the rFLAGS register, and then
 sets the bit in the bit string to 1.
 
 BTS instruction is read-modify-write instruction on bit unit. So concurrent
 bit operation on different bits may be possible.

read-modify-write indeed.  That's exactly what the lock prefix is
needed for, isn't it?  To lock together the read-modify-write cycles
to make the entire operation atomic on SMP.  Without which you may
lose the changes made concurrently to the same word by another cpu,
or it lose yours.

(If another cpu is merely changing the same bit as you are, no problem:
either you're both changing it to the same value, or you're racing to
set it to opposite values, and one of you wins - fair enough, however
much locking you add, you're still left with the situation that two
are racing and one wins in the end.)

You now seem to be arguing, never mind your original PageLRU case,
that include/asm-i386/bitops.h and include/asm-x86_64/bitops.h
don't need the LOCK_PREFIX before the btsl in set_bit().

I disagree with you; but let someone more authoritative speak up.
Mabye Andi can explain it a lot better than I can?

Hugh
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-24 Thread Andi Kleen

 Why would you need any kind of lock when just changing a single bit,
 if it didn't affect other bits of the same word?  Just as you don't
 need a lock when simply assigning a word, setting a bit to 0 or 1
 is simple in itself (it doesn't matter if it was 0 or 1 before).
 
  But, I think that concurrent bit operation on different bits
  is just like OR operation , so lock prefix is not needed.
 
 I firmly believe that it is; but I'm not a processor expert.

Think of the CPU cache like the page cache. The VFS cannot change
anything directly on disk - it always has to read a page (or block);
change it there even if it's only a single bit and back.

Now imagine multiple independent kernels to that disk doing this in parallel
without any locking. You will lose data on simple changes because
of data races.

The CPU is similar. The memory is the disk; the protocol talking to 
the memory only knows cache lines.  There are multiple CPUs talking
to that memory. The CPUs cannot change anything 
without reading  a full cache line first and then later writing it back. 
There is just do OR operation when talking to memory, just like
disks don't have such a operation; just read and write.

[there are some special cases with uncached mappings, but they are not
relevant here]

With lock the CPU ensures the read-modify-write cycle is atomic, 
without it doesn't.
 
The CPU also guarantees you that multiple writes don't get lost
even when they hit the same cache line (otherwise you would need lock
for anything in the same cache line), but it doesn't guarantee that
for a word. The details for that vary by architecture; on x86 it is 
any memory access, on others it can be only guaranteed for long stores.

The exact rules for all this can be quite complicated (and also vary
by architecture), this is a simplification but should be enough as a rough 
guide.

Hope this helps,

-Andi
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-23 Thread Nick Piggin

Hisashi Hifumi wrote:


At 22:42 07/04/23, Hugh Dickins wrote:
 >On Mon, 23 Apr 2007, Hisashi Hifumi wrote:
 >> >No.  The PG_lru flag bit is just one bit amongst many others:
 >> >what of concurrent operations changing other bits in that same
 >> >unsigned long e.g. trying to lock the page by setting PG_locked?
 >> >There are some places where such micro-optimizations can be made
 >> >(typically while first allocating the page); but in general, no.
 >>
 >> In i386 and x86_64, btsl is used to change page flag. In this case, 
if btsl

 >> without lock prefix
 >> set PG_locked and PG_lru flag concurrently, does only one operation
 >> succeed ?
 >
 >That's right: on an SMP machine, without the lock prefix, the operation
 >is no longer atomic: what's stored back may be missing the result of
 >one or the other of the racing operations.
 >

In the case that changing the same bit concurrently, lock prefix or other
spinlock is needed. But, I think that concurrent bit operation on 
different bits

is just like OR operation , so lock prefix is not needed.

AMD instruction manual says about bts that ,

"Copies a bit, specified by bit index in a register or 8-bit immediate 
value (second operand), from a bit
string (first operand), also called the bit base, to the carry flag (CF) 
of the rFLAGS register, and then

sets the bit in the bit string to 1."

BTS instruction is read-modify-write instruction on bit unit. So 
concurrent bit operation on different

bits may be possible.



No matter what actual instruction is used, the SetPageLRU operation (ie.
without the double underscore prefix) must be atomic, and the __SetPageLRU
operation *can* be non-atomic if that would be faster.

As Hugh points out, we must have atomic ops here, so changing the generic
code to use the __ version is wrong. However if there is a faster way that
i386 can perform the atomic variant, then doing so will speed up the generic
code without breaking other architectures.

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-23 Thread KAMEZAWA Hiroyuki
On Tue, 24 Apr 2007 10:54:27 +0900
Hisashi Hifumi <[EMAIL PROTECTED]> wrote:
> In the case that changing the same bit concurrently, lock prefix or other
> spinlock is needed. But, I think that concurrent bit operation on different 
> bits
> is just like OR operation , so lock prefix is not needed.
> 
> AMD instruction manual says about bts that ,
> 
> "Copies a bit, specified by bit index in a register or 8-bit immediate 
> value (second operand), from a bit
> string (first operand), also called the bit base, to the carry flag (CF) of 
> the rFLAGS register, and then
> sets the bit in the bit string to 1."
> 
> BTS instruction is read-modify-write instruction on bit unit. So concurrent 
> bit operation on different
> bits may be possible.
> 
This is ia64's __set_bit() hehe..
==
static __inline__ void
__set_bit (int nr, volatile void *addr)
{
*((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
}
==

Bye.
-Kame

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-23 Thread Hisashi Hifumi


At 22:42 07/04/23, Hugh Dickins wrote:
>On Mon, 23 Apr 2007, Hisashi Hifumi wrote:
>> >No.  The PG_lru flag bit is just one bit amongst many others:
>> >what of concurrent operations changing other bits in that same
>> >unsigned long e.g. trying to lock the page by setting PG_locked?
>> >There are some places where such micro-optimizations can be made
>> >(typically while first allocating the page); but in general, no.
>>
>> In i386 and x86_64, btsl is used to change page flag. In this case, if btsl
>> without lock prefix
>> set PG_locked and PG_lru flag concurrently, does only one operation
>> succeed ?
>
>That's right: on an SMP machine, without the lock prefix, the operation
>is no longer atomic: what's stored back may be missing the result of
>one or the other of the racing operations.
>

In the case that changing the same bit concurrently, lock prefix or other
spinlock is needed. But, I think that concurrent bit operation on different 
bits

is just like OR operation , so lock prefix is not needed.

AMD instruction manual says about bts that ,

"Copies a bit, specified by bit index in a register or 8-bit immediate 
value (second operand), from a bit
string (first operand), also called the bit base, to the carry flag (CF) of 
the rFLAGS register, and then

sets the bit in the bit string to 1."

BTS instruction is read-modify-write instruction on bit unit. So concurrent 
bit operation on different

bits may be possible.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-23 Thread Hugh Dickins
On Mon, 23 Apr 2007, Hisashi Hifumi wrote:
> >No.  The PG_lru flag bit is just one bit amongst many others:
> >what of concurrent operations changing other bits in that same
> >unsigned long e.g. trying to lock the page by setting PG_locked?
> >There are some places where such micro-optimizations can be made
> >(typically while first allocating the page); but in general, no.
> 
> In i386 and x86_64, btsl is used to change page flag. In this case, if btsl
> without lock prefix
> set PG_locked and PG_lru flag concurrently, does only one operation
> succeed ?

That's right: on an SMP machine, without the lock prefix, the operation
is no longer atomic: what's stored back may be missing the result of
one or the other of the racing operations.

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-23 Thread Hisashi Hifumi


>No.  The PG_lru flag bit is just one bit amongst many others:
>what of concurrent operations changing other bits in that same
>unsigned long e.g. trying to lock the page by setting PG_locked?
>There are some places where such micro-optimizations can be made
>(typically while first allocating the page); but in general, no.

In i386 and x86_64, btsl is used to change page flag. In this case, if btsl 
without lock prefix

set PG_locked and PG_lru flag concurrently, does only one operation
succeed ?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-23 Thread Hugh Dickins
On Mon, 23 Apr 2007, Hisashi Hifumi wrote:
> 
> PageLRU flag operation is protected by zone->lru_lock, so
> SetPageLRU/ClearPageLRU
> can be replaced to __SetPageLRU/__ClearPageLRU non-atomic bit operation.

No.  The PG_lru flag bit is just one bit amongst many others:
what of concurrent operations changing other bits in that same
unsigned long e.g. trying to lock the page by setting PG_locked?
There are some places where such micro-optimizations can be made
(typically while first allocating the page); but in general, no.

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-23 Thread Hisashi Hifumi

Hi

PageLRU flag operation is protected by zone->lru_lock, so 
SetPageLRU/ClearPageLRU
can be replaced to __SetPageLRU/__ClearPageLRU non-atomic bit operation.

Thanks.

Signed-off-by :Hisashi Hifumi <[EMAIL PROTECTED]>

diff -Nrup linux-2.6.21-rc7.org/include/linux/page-flags.h 
linux-2.6.21-rc7/include/linux/page-flags.h
--- linux-2.6.21-rc7.org/include/linux/page-flags.h	2007-04-17 
16:36:05.0 +0900
+++ linux-2.6.21-rc7/include/linux/page-flags.h	2007-04-23 
18:28:19.0 +0900

@@ -148,8 +148,7 @@ static inline void SetPageUptodate(struc
 #define TestClearPageDirty(page) test_and_clear_bit(PG_dirty, &(page)->flags)

 #define PageLRU(page)  test_bit(PG_lru, &(page)->flags)
-#define SetPageLRU(page)   set_bit(PG_lru, &(page)->flags)
-#define ClearPageLRU(page) clear_bit(PG_lru, &(page)->flags)
+#define __SetPageLRU(page) __set_bit(PG_lru, &(page)->flags)
 #define __ClearPageLRU(page)   __clear_bit(PG_lru, &(page)->flags)

 #define PageActive(page)   test_bit(PG_active, &(page)->flags)
diff -Nrup linux-2.6.21-rc7.org/mm/migrate.c linux-2.6.21-rc7/mm/migrate.c
--- linux-2.6.21-rc7.org/mm/migrate.c   2007-04-17 16:36:05.0 +0900
+++ linux-2.6.21-rc7/mm/migrate.c   2007-04-23 18:30:43.0 +0900
@@ -52,7 +52,7 @@ int isolate_lru_page(struct page *page,
if (PageLRU(page)) {
ret = 0;
get_page(page);
-   ClearPageLRU(page);
+   __ClearPageLRU(page);
if (PageActive(page))
del_page_from_active_list(zone, page);
else
diff -Nrup linux-2.6.21-rc7.org/mm/swap.c linux-2.6.21-rc7/mm/swap.c
--- linux-2.6.21-rc7.org/mm/swap.c  2007-02-05 03:44:54.0 +0900
+++ linux-2.6.21-rc7/mm/swap.c  2007-04-23 18:29:51.0 +0900
@@ -364,7 +364,7 @@ void __pagevec_lru_add(struct pagevec *p
spin_lock_irq(>lru_lock);
}
VM_BUG_ON(PageLRU(page));
-   SetPageLRU(page);
+   __SetPageLRU(page);
add_page_to_inactive_list(zone, page);
}
if (zone)
@@ -391,7 +391,7 @@ void __pagevec_lru_add_active(struct pag
spin_lock_irq(>lru_lock);
}
VM_BUG_ON(PageLRU(page));
-   SetPageLRU(page);
+   __SetPageLRU(page);
VM_BUG_ON(PageActive(page));
SetPageActive(page);
add_page_to_active_list(zone, page);
diff -Nrup linux-2.6.21-rc7.org/mm/vmscan.c linux-2.6.21-rc7/mm/vmscan.c
--- linux-2.6.21-rc7.org/mm/vmscan.c2007-04-17 16:36:05.0 +0900
+++ linux-2.6.21-rc7/mm/vmscan.c2007-04-23 18:30:30.0 +0900
@@ -642,7 +642,7 @@ static unsigned long isolate_lru_pages(u
 * sure the page is not being freed elsewhere -- the
 * page release code relies on it.
 */
-   ClearPageLRU(page);
+   __ClearPageLRU(page);
target = dst;
nr_taken++;
} /* else it is being freed elsewhere */
@@ -704,7 +704,7 @@ static unsigned long shrink_inactive_lis
while (!list_empty(_list)) {
page = lru_to_page(_list);
VM_BUG_ON(PageLRU(page));
-   SetPageLRU(page);
+   __SetPageLRU(page);
list_del(>lru);
if (PageActive(page))
add_page_to_active_list(zone, page);
@@ -851,7 +851,7 @@ force_reclaim_mapped:
page = lru_to_page(_inactive);
prefetchw_prev_lru_page(page, _inactive, flags);
VM_BUG_ON(PageLRU(page));
-   SetPageLRU(page);
+   __SetPageLRU(page);
VM_BUG_ON(!PageActive(page));
ClearPageActive(page);

@@ -881,7 +881,7 @@ force_reclaim_mapped:
page = lru_to_page(_active);
prefetchw_prev_lru_page(page, _active, flags);
VM_BUG_ON(PageLRU(page));
-   SetPageLRU(page);
+   __SetPageLRU(page);
VM_BUG_ON(!PageActive(page));
list_move(>lru, >active_list);
 		pgmoved++; 


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-23 Thread Hisashi Hifumi

Hi

PageLRU flag operation is protected by zone-lru_lock, so 
SetPageLRU/ClearPageLRU
can be replaced to __SetPageLRU/__ClearPageLRU non-atomic bit operation.

Thanks.

Signed-off-by :Hisashi Hifumi [EMAIL PROTECTED]

diff -Nrup linux-2.6.21-rc7.org/include/linux/page-flags.h 
linux-2.6.21-rc7/include/linux/page-flags.h
--- linux-2.6.21-rc7.org/include/linux/page-flags.h	2007-04-17 
16:36:05.0 +0900
+++ linux-2.6.21-rc7/include/linux/page-flags.h	2007-04-23 
18:28:19.0 +0900

@@ -148,8 +148,7 @@ static inline void SetPageUptodate(struc
 #define TestClearPageDirty(page) test_and_clear_bit(PG_dirty, (page)-flags)

 #define PageLRU(page)  test_bit(PG_lru, (page)-flags)
-#define SetPageLRU(page)   set_bit(PG_lru, (page)-flags)
-#define ClearPageLRU(page) clear_bit(PG_lru, (page)-flags)
+#define __SetPageLRU(page) __set_bit(PG_lru, (page)-flags)
 #define __ClearPageLRU(page)   __clear_bit(PG_lru, (page)-flags)

 #define PageActive(page)   test_bit(PG_active, (page)-flags)
diff -Nrup linux-2.6.21-rc7.org/mm/migrate.c linux-2.6.21-rc7/mm/migrate.c
--- linux-2.6.21-rc7.org/mm/migrate.c   2007-04-17 16:36:05.0 +0900
+++ linux-2.6.21-rc7/mm/migrate.c   2007-04-23 18:30:43.0 +0900
@@ -52,7 +52,7 @@ int isolate_lru_page(struct page *page,
if (PageLRU(page)) {
ret = 0;
get_page(page);
-   ClearPageLRU(page);
+   __ClearPageLRU(page);
if (PageActive(page))
del_page_from_active_list(zone, page);
else
diff -Nrup linux-2.6.21-rc7.org/mm/swap.c linux-2.6.21-rc7/mm/swap.c
--- linux-2.6.21-rc7.org/mm/swap.c  2007-02-05 03:44:54.0 +0900
+++ linux-2.6.21-rc7/mm/swap.c  2007-04-23 18:29:51.0 +0900
@@ -364,7 +364,7 @@ void __pagevec_lru_add(struct pagevec *p
spin_lock_irq(zone-lru_lock);
}
VM_BUG_ON(PageLRU(page));
-   SetPageLRU(page);
+   __SetPageLRU(page);
add_page_to_inactive_list(zone, page);
}
if (zone)
@@ -391,7 +391,7 @@ void __pagevec_lru_add_active(struct pag
spin_lock_irq(zone-lru_lock);
}
VM_BUG_ON(PageLRU(page));
-   SetPageLRU(page);
+   __SetPageLRU(page);
VM_BUG_ON(PageActive(page));
SetPageActive(page);
add_page_to_active_list(zone, page);
diff -Nrup linux-2.6.21-rc7.org/mm/vmscan.c linux-2.6.21-rc7/mm/vmscan.c
--- linux-2.6.21-rc7.org/mm/vmscan.c2007-04-17 16:36:05.0 +0900
+++ linux-2.6.21-rc7/mm/vmscan.c2007-04-23 18:30:30.0 +0900
@@ -642,7 +642,7 @@ static unsigned long isolate_lru_pages(u
 * sure the page is not being freed elsewhere -- the
 * page release code relies on it.
 */
-   ClearPageLRU(page);
+   __ClearPageLRU(page);
target = dst;
nr_taken++;
} /* else it is being freed elsewhere */
@@ -704,7 +704,7 @@ static unsigned long shrink_inactive_lis
while (!list_empty(page_list)) {
page = lru_to_page(page_list);
VM_BUG_ON(PageLRU(page));
-   SetPageLRU(page);
+   __SetPageLRU(page);
list_del(page-lru);
if (PageActive(page))
add_page_to_active_list(zone, page);
@@ -851,7 +851,7 @@ force_reclaim_mapped:
page = lru_to_page(l_inactive);
prefetchw_prev_lru_page(page, l_inactive, flags);
VM_BUG_ON(PageLRU(page));
-   SetPageLRU(page);
+   __SetPageLRU(page);
VM_BUG_ON(!PageActive(page));
ClearPageActive(page);

@@ -881,7 +881,7 @@ force_reclaim_mapped:
page = lru_to_page(l_active);
prefetchw_prev_lru_page(page, l_active, flags);
VM_BUG_ON(PageLRU(page));
-   SetPageLRU(page);
+   __SetPageLRU(page);
VM_BUG_ON(!PageActive(page));
list_move(page-lru, zone-active_list);
 		pgmoved++; 


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-23 Thread Hugh Dickins
On Mon, 23 Apr 2007, Hisashi Hifumi wrote:
 
 PageLRU flag operation is protected by zone-lru_lock, so
 SetPageLRU/ClearPageLRU
 can be replaced to __SetPageLRU/__ClearPageLRU non-atomic bit operation.

No.  The PG_lru flag bit is just one bit amongst many others:
what of concurrent operations changing other bits in that same
unsigned long e.g. trying to lock the page by setting PG_locked?
There are some places where such micro-optimizations can be made
(typically while first allocating the page); but in general, no.

Hugh
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-23 Thread Hisashi Hifumi


No.  The PG_lru flag bit is just one bit amongst many others:
what of concurrent operations changing other bits in that same
unsigned long e.g. trying to lock the page by setting PG_locked?
There are some places where such micro-optimizations can be made
(typically while first allocating the page); but in general, no.

In i386 and x86_64, btsl is used to change page flag. In this case, if btsl 
without lock prefix

set PG_locked and PG_lru flag concurrently, does only one operation
succeed ?

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-23 Thread Hugh Dickins
On Mon, 23 Apr 2007, Hisashi Hifumi wrote:
 No.  The PG_lru flag bit is just one bit amongst many others:
 what of concurrent operations changing other bits in that same
 unsigned long e.g. trying to lock the page by setting PG_locked?
 There are some places where such micro-optimizations can be made
 (typically while first allocating the page); but in general, no.
 
 In i386 and x86_64, btsl is used to change page flag. In this case, if btsl
 without lock prefix
 set PG_locked and PG_lru flag concurrently, does only one operation
 succeed ?

That's right: on an SMP machine, without the lock prefix, the operation
is no longer atomic: what's stored back may be missing the result of
one or the other of the racing operations.

Hugh
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-23 Thread Hisashi Hifumi


At 22:42 07/04/23, Hugh Dickins wrote:
On Mon, 23 Apr 2007, Hisashi Hifumi wrote:
 No.  The PG_lru flag bit is just one bit amongst many others:
 what of concurrent operations changing other bits in that same
 unsigned long e.g. trying to lock the page by setting PG_locked?
 There are some places where such micro-optimizations can be made
 (typically while first allocating the page); but in general, no.

 In i386 and x86_64, btsl is used to change page flag. In this case, if btsl
 without lock prefix
 set PG_locked and PG_lru flag concurrently, does only one operation
 succeed ?

That's right: on an SMP machine, without the lock prefix, the operation
is no longer atomic: what's stored back may be missing the result of
one or the other of the racing operations.


In the case that changing the same bit concurrently, lock prefix or other
spinlock is needed. But, I think that concurrent bit operation on different 
bits

is just like OR operation , so lock prefix is not needed.

AMD instruction manual says about bts that ,

Copies a bit, specified by bit index in a register or 8-bit immediate 
value (second operand), from a bit
string (first operand), also called the bit base, to the carry flag (CF) of 
the rFLAGS register, and then

sets the bit in the bit string to 1.

BTS instruction is read-modify-write instruction on bit unit. So concurrent 
bit operation on different

bits may be possible.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-23 Thread KAMEZAWA Hiroyuki
On Tue, 24 Apr 2007 10:54:27 +0900
Hisashi Hifumi [EMAIL PROTECTED] wrote:
 In the case that changing the same bit concurrently, lock prefix or other
 spinlock is needed. But, I think that concurrent bit operation on different 
 bits
 is just like OR operation , so lock prefix is not needed.
 
 AMD instruction manual says about bts that ,
 
 Copies a bit, specified by bit index in a register or 8-bit immediate 
 value (second operand), from a bit
 string (first operand), also called the bit base, to the carry flag (CF) of 
 the rFLAGS register, and then
 sets the bit in the bit string to 1.
 
 BTS instruction is read-modify-write instruction on bit unit. So concurrent 
 bit operation on different
 bits may be possible.
 
This is ia64's __set_bit() hehe..
==
static __inline__ void
__set_bit (int nr, volatile void *addr)
{
*((__u32 *) addr + (nr  5)) |= (1  (nr  31));
}
==

Bye.
-Kame

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-23 Thread Nick Piggin

Hisashi Hifumi wrote:


At 22:42 07/04/23, Hugh Dickins wrote:
 On Mon, 23 Apr 2007, Hisashi Hifumi wrote:
  No.  The PG_lru flag bit is just one bit amongst many others:
  what of concurrent operations changing other bits in that same
  unsigned long e.g. trying to lock the page by setting PG_locked?
  There are some places where such micro-optimizations can be made
  (typically while first allocating the page); but in general, no.
 
  In i386 and x86_64, btsl is used to change page flag. In this case, 
if btsl

  without lock prefix
  set PG_locked and PG_lru flag concurrently, does only one operation
  succeed ?
 
 That's right: on an SMP machine, without the lock prefix, the operation
 is no longer atomic: what's stored back may be missing the result of
 one or the other of the racing operations.
 

In the case that changing the same bit concurrently, lock prefix or other
spinlock is needed. But, I think that concurrent bit operation on 
different bits

is just like OR operation , so lock prefix is not needed.

AMD instruction manual says about bts that ,

Copies a bit, specified by bit index in a register or 8-bit immediate 
value (second operand), from a bit
string (first operand), also called the bit base, to the carry flag (CF) 
of the rFLAGS register, and then

sets the bit in the bit string to 1.

BTS instruction is read-modify-write instruction on bit unit. So 
concurrent bit operation on different

bits may be possible.



No matter what actual instruction is used, the SetPageLRU operation (ie.
without the double underscore prefix) must be atomic, and the __SetPageLRU
operation *can* be non-atomic if that would be faster.

As Hugh points out, we must have atomic ops here, so changing the generic
code to use the __ version is wrong. However if there is a faster way that
i386 can perform the atomic variant, then doing so will speed up the generic
code without breaking other architectures.

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/