Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-18 Thread Mikulas Patocka
> > > > You already must not place any data structures into WC memory --- for 
> > > > example, spinlocks wouldn't work there.
> > > 
> > > What do you mean "already"?
> > 
> > I mean "in current kernel" (I checked it in 2.6.22)
> 
> Ahh, that's not "current kernel", though ;)
> 
> 4071c718555d955a35e9651f77086096ad87d498
>
> > So drivers can't assume that wmb() works on write-combining memory.
> 
> Drivers should be able to assume that wmb() orders _everything_ (except
> some whacky Altix thing, which I really want to fold under wmb at some
> point anyway).
> 
> So I decided that old x86 semantics isn't right, and now it really is a
> lock op / sfence everywhere.

I see. I'm just curious --- is there any real use for WC memory other than 
graphics card memory?

Mikulas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-17 Thread Nick Piggin
On Wed, Oct 17, 2007 at 01:51:17PM +0800, Herbert Xu wrote:
> Nick Piggin <[EMAIL PROTECTED]> wrote:
> >
> > Also, for non-wb memory. I don't think the Intel document referenced
> > says anything about this, but the AMD document says that loads can pass
> > loads (page 8, rule b).
> > 
> > This is why our rmb() is still an lfence.
> 
> BTW, Xen (in particular, the code in drivers/xen) uses mb/rmb/wmb
> instead of smp_mb/smp_rmb/smp_wmb when it accesses memory that's
> shared with other Xen domains or the hypervisor.
> 
> The reason this is necessary is because even if a Xen domain is
> UP the hypervisor might be SMP.
> 
> It would be nice if we can have these adopt the new SMP barriers
> on x86 instead of the IO ones as they currently do.

That's a good point actually. Something like raw_smp_*mb, which
always orders memory, but only for regular WB operations. I could
put that on the todo list...
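
A rough userspace sketch of that idea, assuming GCC or Clang. The
raw_smp_*mb names, the ring layout and RING_SIZE are hypothetical, and the
compiler's __atomic_thread_fence builtin stands in for the real kernel
barriers. The point is only that these barriers are never compiled away on
a UP build, while still avoiding the heavier IO-ordering instructions:

```c
#include <assert.h>

/* Hypothetical names modelled on the raw_smp_*mb idea. On x86 the
 * release/acquire fences typically emit no instruction (compiler barrier
 * only); the seq_cst fence emits a real fencing instruction. */
#define raw_smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
#define raw_smp_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
#define raw_smp_mb()  __atomic_thread_fence(__ATOMIC_SEQ_CST)

#define RING_SIZE 4u

/* Ring living in ordinary (WB) memory shared with another domain. */
struct shared_ring {
    int slot[RING_SIZE];
    volatile unsigned prod;   /* written by the producer only */
    volatile unsigned cons;   /* written by the consumer only */
};

/* Returns 1 on success, 0 if the ring is full. */
static int ring_push(struct shared_ring *r, int v)
{
    assert(r);
    unsigned prod = r->prod;
    unsigned cons = r->cons;

    if (prod - cons == RING_SIZE)
        return 0;
    r->slot[prod % RING_SIZE] = v;
    raw_smp_wmb();            /* payload must be visible before the index */
    r->prod = prod + 1;
    return 1;
}

/* Returns 1 and stores the value in *out, or 0 if the ring is empty. */
static int ring_pop(struct shared_ring *r, int *out)
{
    assert(r && out);
    unsigned prod = r->prod;
    unsigned cons = r->cons;

    if (prod == cons)
        return 0;
    raw_smp_rmb();            /* read the index before the payload */
    *out = r->slot[cons % RING_SIZE];
    raw_smp_mb();             /* finish reading before freeing the slot */
    r->cons = cons + 1;
    return 1;
}
```

A UP guest would call ring_push() exactly as an SMP one does; the barrier
stays in place because the other side of the ring may be on another CPU.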



Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-17 Thread Nick Piggin
On Wed, Oct 17, 2007 at 02:30:32AM +0200, Mikulas Patocka wrote:
> > > You already must not place any data structures into WC memory --- for 
> > > example, spinlocks wouldn't work there.
> > 
> > What do you mean "already"?
> 
> I mean "in current kernel" (I checked it in 2.6.22)

Ahh, that's not "current kernel", though ;)

4071c718555d955a35e9651f77086096ad87d498

 
> > If we already have drivers loading data from
> > WC memory, then rmb() needs to order them, whether or not they actually
> > need it. If that were prohibitively costly, then we'd introduce a new
> > barrier which does not order WC memory, right?
> > 
> > 
> > > wmb() also won't work on WC 
> > > memory, because it assumes that writes are ordered.
> > 
> > You mean the one defined like this:
> >   #define wmb()   asm volatile("sfence" ::: "memory")
> > ? If it assumed writes are ordered, then it would just be a barrier().
> 
> You read wrong part of the include file. Really, it is 
> (2.6.22,include/asm-i386/system.h):
> #ifdef CONFIG_X86_OOSTORE
> #define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", 
> X86_FEATURE_XMM)
> #else
> #define wmb()   __asm__ __volatile__ ("": : :"memory")
> #endif
> 
> CONFIG_X86_OOSTORE is dependent on MWINCHIP3D || MWINCHIP2 || MWINCHIPC6
> --- so on Intel and AMD, it is really just barrier().
> 
> So drivers can't assume that wmb() works on write-combining memory.

Drivers should be able to assume that wmb() orders _everything_ (except
some whacky Altix thing, which I really want to fold under wmb at some
point anyway).

So I decided that old x86 semantics isn't right, and now it really is a
lock op / sfence everywhere.




Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Herbert Xu
Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> Also, for non-wb memory. I don't think the Intel document referenced
> says anything about this, but the AMD document says that loads can pass
> loads (page 8, rule b).
> 
> This is why our rmb() is still an lfence.

BTW, Xen (in particular, the code in drivers/xen) uses mb/rmb/wmb
instead of smp_mb/smp_rmb/smp_wmb when it accesses memory that's
shared with other Xen domains or the hypervisor.

The reason this is necessary is because even if a Xen domain is
UP the hypervisor might be SMP.

It would be nice if we can have these adopt the new SMP barriers
on x86 instead of the IO ones as they currently do.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Mikulas Patocka
> > You already must not place any data structures into WC memory --- for 
> > example, spinlocks wouldn't work there.
> 
> What do you mean "already"?

I mean "in current kernel" (I checked it in 2.6.22)

> If we already have drivers loading data from
> WC memory, then rmb() needs to order them, whether or not they actually
> need it. If that were prohibitively costly, then we'd introduce a new
> barrier which does not order WC memory, right?
> 
> 
> > wmb() also won't work on WC 
> > memory, because it assumes that writes are ordered.
> 
> You mean the one defined like this:
>   #define wmb()   asm volatile("sfence" ::: "memory")
> ? If it assumed writes are ordered, then it would just be a barrier().

You read the wrong part of the include file. In reality, it is 
(2.6.22, include/asm-i386/system.h):
#ifdef CONFIG_X86_OOSTORE
#define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", 
X86_FEATURE_XMM)
#else
#define wmb()   __asm__ __volatile__ ("": : :"memory")
#endif

CONFIG_X86_OOSTORE is dependent on MWINCHIP3D || MWINCHIP2 || MWINCHIPC6
--- so on Intel and AMD, it is really just barrier().

So drivers can't assume that wmb() works on write-combining memory.
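
To make the difference between the two definitions concrete, here is a
minimal userspace sketch (not kernel code), assuming GCC or Clang. The
names compiler_barrier, store_fence and struct mailbox are made up for the
illustration. Both publish functions behave identically on WB memory; only
the second one emits an instruction that would also order stores to WC
memory:

```c
#include <assert.h>

/* Compiler-only barrier: emits no instruction, it only stops the compiler
 * from moving memory accesses across it. This is what the #else branch of
 * the 2.6.22 wmb() quoted above expands to. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hardware store fence. On x86 this is sfence; on other architectures the
 * sketch falls back to a full fence from the compiler builtin. */
#if defined(__x86_64__) || defined(__i386__)
#define store_fence() __asm__ __volatile__("sfence" ::: "memory")
#else
#define store_fence() __sync_synchronize()
#endif

struct mailbox {
    int payload;
    int ready;
};

/* Enough for WB memory on x86, where stores are not reordered with
 * other stores. */
static void publish_compiler_only(struct mailbox *m, int v)
{
    assert(m);
    m->payload = v;
    compiler_barrier();
    m->ready = 1;
}

/* Also orders the two stores if the mailbox sat in WC memory. */
static void publish_fenced(struct mailbox *m, int v)
{
    assert(m);
    m->payload = v;
    store_fence();
    m->ready = 1;
}
```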

> > > Doing that would lead to an unmaintainable mess. If drivers don't 
> > > need rmb, then they don't call it.
> > 
> > If wmb() doesn't currently work on write-combining memory, why should 
> > rmb() work there?
> 
> I don't understand why you say wmb() doesn't work on WC memory.

Because it is defined as __asm__ __volatile__ ("": : :"memory")

And WC memory can reorder writes (WB memory can't).
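
The closest thing userspace can do to produce such weakly ordered writes
is the non-temporal store (movnti) mentioned elsewhere in this thread,
which uses the write-combining method even on WB memory. A minimal sketch,
assuming GCC or Clang and the SSE2 intrinsics; publish_nt is a made-up
name, and builds without SSE2 fall back to plain stores plus a full fence:

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si32 (movnti) and _mm_sfence */
#endif

/* Copy n ints to dst with non-temporal stores, then set *ready.
 * Without the fence, the flag store could become visible before the
 * write-combined data. */
static void publish_nt(int *dst, const int *src, size_t n,
                       volatile int *ready)
{
    assert(dst && src && ready);
#if defined(__SSE2__)
    for (size_t i = 0; i < n; i++)
        _mm_stream_si32(&dst[i], src[i]);  /* weakly ordered store */
    _mm_sfence();        /* order the streamed stores before the flag */
#else
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    __sync_synchronize();
#endif
    *ready = 1;
}
```

This is the case where sfence is needed even though the destination is
ordinary WB memory.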

> > The purpose of rmb() is to enforce ordering on architectures that don't 
> > force it in hardware --- that is not the case of x86.
> 
> Well it clearly is the case because I just pointed you to a document
> that says they can go out of order.

> If you want to argue that existing
> implementations do not, then by all means go ahead and send a patch to
> Linus and see what he says about it ;)

I mean this: wmb() assumes that the data to be ordered are not in WC 
memory. rmb() assumes that the data can be in WC memory (lfence is only 
useful on WC --- it doesn't have any effect on other memory types).

Mikulas


Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Nick Piggin
On Wed, Oct 17, 2007 at 01:05:16AM +0200, Mikulas Patocka wrote:
> > > I see, AMD says that WC memory loads can be out-of-order.
> > > 
> > > There is very little usability to it --- framebuffer and AGP aperture is 
> > > the only piece of memory that is WC and no kernel structures are placed 
> > > there, so it is possible to remove that lfence.
> > 
> > No. In Linux kernel, rmb() means that all previous loads, including to
> > any IO regions, will be executed before any subsequent load.
> 
> You already must not place any data structures into WC memory --- for 
> example, spinlocks wouldn't work there.

What do you mean "already"? If we already have drivers loading data from
WC memory, then rmb() needs to order them, whether or not they actually
need it. If that were prohibitively costly, then we'd introduce a new
barrier which does not order WC memory, right?


> wmb() also won't work on WC 
> memory, because it assumes that writes are ordered.

You mean the one defined like this:
  #define wmb()   asm volatile("sfence" ::: "memory")
? If it assumed writes are ordered, then it would just be a barrier().


> > How can you possibly get rid of lfence from there just because you may
> > happen to *know* that it isn't used (btw. the IO serialisation isn't for
> > kernel data structures, it is for actual IO operations, generally).
> 
> IO regions are in uncached memory, and x86 already serializes it fine. It 
> flushes any write buffers on access to uncached memory.
> 
> (BTW. what is the general portable rule for serializing writel() and 
> readl()? On x86 they are serialized in hardware, but what on other archs?)

Most tend to order them strongly these days. There are also relaxed
variants for architectures that can take advantage of them.


> > Doing that would lead to an unmaintainable mess. If drivers don't need rmb,
> > then they don't call it.
> 
> If wmb() doesn't currently work on write-combining memory, why should 
> rmb() work there?

I don't understand why you say wmb() doesn't work on WC memory. What part
of which spec are you reading (or, given your mistrust of specs, what CPU
are you seeing failures with)?
 

> The purpose of rmb() is to enforce ordering on architectures that don't 
> force it in hardware --- that is not the case of x86.

Well it clearly is the case because I just pointed you to a document
that says they can go out of order. If you want to argue that existing
implementations do not, then by all means go ahead and send a patch to
Linus and see what he says about it ;)



Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Mikulas Patocka
> > I see, AMD says that WC memory loads can be out-of-order.
> > 
> > There is very little usability to it --- framebuffer and AGP aperture is 
> > the only piece of memory that is WC and no kernel structures are placed 
> > there, so it is possible to remove that lfence.
> 
> No. In Linux kernel, rmb() means that all previous loads, including to
> any IO regions, will be executed before any subsequent load.

You already must not place any data structures into WC memory --- for 
example, spinlocks wouldn't work there. wmb() also won't work on WC 
memory, because it assumes that writes are ordered.

> How can you possibly get rid of lfence from there just because you may
> happen to *know* that it isn't used (btw. the IO serialisation isn't for
> kernel data structures, it is for actual IO operations, generally).

IO regions are in uncached memory, and x86 already serializes it fine. It 
flushes any write buffers on access to uncached memory.

(BTW. what is the general portable rule for serializing writel() and 
readl()? On x86 they are serialized in hardware, but what on other archs?)

> Doing that would lead to an unmaintainable mess. If drivers don't need rmb,
> then they don't call it.

If wmb() doesn't currently work on write-combining memory, why should 
rmb() work there?

The purpose of rmb() is to enforce ordering on architectures that don't 
force it in hardware --- that is not the case of x86.

Mikulas


Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Nick Piggin
On Tue, Oct 16, 2007 at 12:33:54PM +0200, Mikulas Patocka wrote:
> 
> 
> On Tue, 16 Oct 2007, Nick Piggin wrote:
> 
> > > > The cpus also have an explicit set of instructions that deliberately do 
> > > > unordered stores/loads, and s/lfence etc are mostly designed for those.
> > > 
> > > I know about unordered stores (movnti & similar) --- they basically use 
> > > write-combining method on memory that is normally write-back --- and they 
> > > need sfence. But which one instruction does unordered load and needs 
> > > lfence?
> > 
> > Also, for non-wb memory. I don't think the Intel document referenced
> > says anything about this, but the AMD document says that loads can pass
> > loads (page 8, rule b).
> > 
> > This is why our rmb() is still an lfence.
> 
> I see, AMD says that WC memory loads can be out-of-order.
> 
> There is very little usability to it --- framebuffer and AGP aperture is 
> the only piece of memory that is WC and no kernel structures are placed 
> there, so it is possible to remove that lfence.

No. In Linux kernel, rmb() means that all previous loads, including to
any IO regions, will be executed before any subsequent load.

How can you possibly get rid of lfence from there just because you may
happen to *know* that it isn't used (btw. the IO serialisation isn't for
kernel data structures, it is for actual IO operations, generally).

Doing that would lead to an unmaintainable mess. If drivers don't need rmb,
then they don't call it.
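
As an illustration of the kind of driver code that relies on this rmb()
contract, here is a minimal userspace sketch, assuming GCC or Clang. The
descriptor layout, DESC_DONE, rx_poll and rmb_sketch are all hypothetical;
a real driver would use the kernel's rmb() and its own descriptor format:

```c
#include <assert.h>

/* Stand-in for rmb(): lfence on x86, an acquire fence elsewhere. */
#if defined(__x86_64__) || defined(__i386__)
#define rmb_sketch() __asm__ __volatile__("lfence" ::: "memory")
#else
#define rmb_sketch() __atomic_thread_fence(__ATOMIC_ACQUIRE)
#endif

#define DESC_DONE 0x1u   /* hypothetical "device has filled this in" bit */

/* Hypothetical receive descriptor that a device writes into. */
struct rx_desc {
    volatile unsigned status;
    volatile unsigned len;
};

/* Returns 1 and stores the length in *len if the descriptor is complete,
 * 0 otherwise. */
static int rx_poll(struct rx_desc *d, unsigned *len)
{
    assert(d && len);
    if (!(d->status & DESC_DONE))
        return 0;
    rmb_sketch();        /* the status load must complete before the len load */
    *len = d->len;
    return 1;
}
```

The driver author does not need to know which memory type the descriptor
ends up in; the barrier is expected to order the two loads either way.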


Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Mikulas Patocka


On Tue, 16 Oct 2007, Nick Piggin wrote:

> > > The cpus also have an explicit set of instructions that deliberately do 
> > > unordered stores/loads, and s/lfence etc are mostly designed for those.
> > 
> > I know about unordered stores (movnti & similar) --- they basically use 
> > write-combining method on memory that is normally write-back --- and they 
> > need sfence. But which one instruction does unordered load and needs 
> > lfence?
> 
> Also, for non-wb memory. I don't think the Intel document referenced
> says anything about this, but the AMD document says that loads can pass
> loads (page 8, rule b).
> 
> This is why our rmb() is still an lfence.

I see, AMD says that WC memory loads can be out-of-order.

That is of very little practical use --- the framebuffer and the AGP 
aperture are the only memory regions that are WC, and no kernel structures 
are placed there, so it is possible to remove that lfence.

Mikulas


Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Mikulas Patocka
On Mon, 15 Oct 2007, H. Peter Anvin wrote:

> Mikulas Patocka wrote:
> > 
> > I know about unordered stores (movnti & similar) --- they basically use
> > write-combining method on memory that is normally write-back --- and they
> > need sfence. But which one instruction does unordered load and needs
> > lfence?
> > 
> 
> PREFETCHNTA.

PREFETCH* doesn't change program semantics. The processor is allowed to 
ignore a prefetch instruction if it doesn't have the resources needed for 
the prefetch. It is not ordered wrt. fences.

PREFETCHNTA was implemented as a prefetch into the L1 cache, bypassing the 
L2 cache, on Pentium 3 and M --- and it is implemented as a prefetch into 
the L2 cache on other processors --- so it doesn't really use any special 
buffers.
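
A small sketch of the "doesn't change program semantics" point, assuming
GCC or Clang. __builtin_prefetch with locality 0 is the compiler's
non-temporal hint (on x86 it is typically emitted as prefetchnta); the
function names and the prefetch distance of 16 elements are arbitrary
choices for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Plain summation, no hints. */
static long sum_plain(const int *a, size_t n)
{
    assert(a);
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Same loop with a non-temporal read prefetch 16 elements ahead.
 * The hint may be dropped by the CPU; the result is the same either way. */
static long sum_prefetch(const int *a, size_t n)
{
    assert(a);
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16], 0, 0);  /* read, no locality */
        s += a[i];
    }
    return s;
}
```

Both functions return the same value for any input; only the timing can
differ.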

Mikulas

>   -hpa
> 


Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Mikulas Patocka
On Mon, 15 Oct 2007, H. Peter Anvin wrote:

 Mikulas Patocka wrote:
  
  I know about unordered stores (movnti  similar) --- they basically use
  write-combining method on memory that is normally write-back --- and they
  need sfence. But which one instruction does unordered load and needs
  lefence?
  
 
 PREFETCHNTA.

PREFETCH* doesn't change program semantics. The processor is allowed to 
ignore prefetch instruction if it doesn't have resources needed for 
prefetch. It not ordered wrt. fences.

PREFETCHNTA was implemented as prefetch into L1 cache and omitting L2 
cache on Pentium 3 and M --- and it is implemented as prefetch into L2 
cache on other --- do it doesn't really use any special buffers.

Mikulas

   -hpa
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Mikulas Patocka


On Tue, 16 Oct 2007, Nick Piggin wrote:

   The cpus also have an explicit set of instructions that deliberately do 
   unordered stores/loads, and s/lfence etc are mostly designed for those.
  
  I know about unordered stores (movnti  similar) --- they basically use 
  write-combining method on memory that is normally write-back --- and they 
  need sfence. But which one instruction does unordered load and needs 
  lefence?
 
 Also, for non-wb memory. I don't think the Intel document referenced
 says anything about this, but the AMD document says that loads can pass
 loads (page 8, rule b).
 
 This is why our rmb() is still an lfence.

I see, AMD says that WC memory loads can be out-of-order.

There is very little usability to it --- framebuffer and AGP aperture is 
the only piece of memory that is WC and no kernel structures are placed 
there, so it is possible to remove that lfence.

Mikulas
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Nick Piggin
On Tue, Oct 16, 2007 at 12:33:54PM +0200, Mikulas Patocka wrote:
 
 
 On Tue, 16 Oct 2007, Nick Piggin wrote:
 
The cpus also have an explicit set of instructions that deliberately do 
unordered stores/loads, and s/lfence etc are mostly designed for those.
   
   I know about unordered stores (movnti  similar) --- they basically use 
   write-combining method on memory that is normally write-back --- and they 
   need sfence. But which one instruction does unordered load and needs 
   lefence?
  
  Also, for non-wb memory. I don't think the Intel document referenced
  says anything about this, but the AMD document says that loads can pass
  loads (page 8, rule b).
  
  This is why our rmb() is still an lfence.
 
 I see, AMD says that WC memory loads can be out-of-order.
 
 There is very little usability to it --- framebuffer and AGP aperture is 
 the only piece of memory that is WC and no kernel structures are placed 
 there, so it is possible to remove that lfence.

No. In Linux kernel, rmb() means that all previous loads, including to
any IO regions, will be executed before any subsequent load.

How can you possibly get rid of lfence from there just because you may
happen to *know* that it isn't used (btw. the IO serialisation isn't for
kernel data structures, it is for actual IO operations, generally).

Doing that would lead to an unmaintainable mess. If drivers don't need rmb,
then they don't call it.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Mikulas Patocka
  I see, AMD says that WC memory loads can be out-of-order.
  
  There is very little usability to it --- framebuffer and AGP aperture is 
  the only piece of memory that is WC and no kernel structures are placed 
  there, so it is possible to remove that lfence.
 
 No. In Linux kernel, rmb() means that all previous loads, including to
 any IO regions, will be executed before any subsequent load.

You already must not place any data structures into WC memory --- for 
example, spinlocks wouldn't work there. wmb() also won't work on WC 
memory, because it assumes that writes are ordered.

 How can you possibly get rid of lfence from there just because you may
 happen to *know* that it isn't used (btw. the IO serialisation isn't for
 kernel data structures, it is for actual IO operations, generally).

IO regions are in uncached memory, and x86 already serializes it fine. It 
flushes any write buffers on access to uncached memory.

(BTW. what is the general portable rule for serializing writel() and 
readl()? On x86 they are serialized in hardware, but what on other archs?)

 Doing that would lead to an unmaintainable mess. If drivers don't need rmb,
 then they don't call it.

If wmb() doesn't currently work on write-combining memory, why should 
rmb() work there?

The purpose of rmb() is to enforce ordering on architectures that don't 
force it in hardware --- that is not the case of x86.

Mikulas
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Nick Piggin
On Wed, Oct 17, 2007 at 01:05:16AM +0200, Mikulas Patocka wrote:
   I see, AMD says that WC memory loads can be out-of-order.
   
   There is very little usability to it --- framebuffer and AGP aperture is 
   the only piece of memory that is WC and no kernel structures are placed 
   there, so it is possible to remove that lfence.
  
  No. In Linux kernel, rmb() means that all previous loads, including to
  any IO regions, will be executed before any subsequent load.
 
 You already must not place any data structures into WC memory --- for 
 example, spinlocks wouldn't work there.

What do you mean already? If we already have drivers loading data from
WC memory, then rmb() needs to order them, whether or not they actually
need it. If that were prohibitively costly, then we'd introduce a new
barrier which does not order WC memory, right?


 wmb() also won't work on WC 
 memory, because it assumes that writes are ordered.

You mean the one defined like this:
  #define wmb()   asm volatile(sfence ::: memory)
? If it assumed writes are ordered, then it would just be a barrier().


  How can you possibly get rid of lfence from there just because you may
  happen to *know* that it isn't used (btw. the IO serialisation isn't for
  kernel data structures, it is for actual IO operations, generally).
 
 IO regions are in uncached memory, and x86 already serializes it fine. It 
 flushes any write buffers on access to uncached memory.
 
 (BTW. what is the general portable rule for serializing writel() and 
 readl()? On x86 they are serialized in hardware, but what on other archs?)

Most tend to order them strongly these days. There are also relaxed
variants for architectures that can take advantage of them.


  Doing that would lead to an unmaintainable mess. If drivers don't need rmb,
  then they don't call it.
 
 If wmb() doesn't currently work on write-combining memory, why should 
 rmb() work there?

I don't understand why you say wmb() doesn't work on WC memory. What part
of which spec are you reading (or, given your mistrust of specs, what CPU
are you seeing failures with)?
 

 The purpose of rmb() is to enforce ordering on architectures that don't 
 force it in hardware --- that is not the case of x86.

Well it clearly is the case because I just pointed you to a document
that says they can go out of order. If you want to argue that existing
implementations do not, then by all means go ahead and send a patch to
Linus and see what he says about it ;)

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Mikulas Patocka
  You already must not place any data structures into WC memory --- for 
  example, spinlocks wouldn't work there.
 
 What do you mean already?

I mean in current kernel (I checked it in 2.6.22)

 If we already have drivers loading data from
 WC memory, then rmb() needs to order them, whether or not they actually
 need it. If that were prohibitively costly, then we'd introduce a new
 barrier which does not order WC memory, right?
 
 
  wmb() also won't work on WC 
  memory, because it assumes that writes are ordered.
 
 You mean the one defined like this:
   #define wmb()   asm volatile(sfence ::: memory)
 ? If it assumed writes are ordered, then it would just be a barrier().

You read wrong part of the include file. Really, it is 
(2.6.22,include/asm-i386/system.h):
#ifdef CONFIG_X86_OOSTORE
#define wmb() alternative(lock; addl $0,0(%%esp), sfence, 
X86_FEATURE_XMM)
#else
#define wmb()   __asm__ __volatile__ (: : :memory)
#endif

CONFIG_X86_OOSTORE is dependent on MWINCHIP3D || MWINCHIP2 || MWINCHIPC6
--- so on Intel and AMD, it is really just barrier().

So drivers can't assume that wmb() works on write-combining memory.

   Doing that would lead to an unmaintainable mess. If drivers don't 
   need rmb, then they don't call it.
  
  If wmb() doesn't currently work on write-combining memory, why should 
  rmb() work there?
 
 I don't understand why you say wmb() doesn't work on WC memory.

Because it is defined as __asm__ __volatile__ (: : :memory)

And WC memory can reorder writes (WB memory can't).

  The purpose of rmb() is to enforce ordering on architectures that don't 
  force it in hardware --- that is not the case of x86.
 
 Well it clearly is the case because I just pointed you to a document
 that says they can go out of order.

 If you want to argue that existing
 implementations do not, then by all means go ahead and send a patch to
 Linus and see what he says about it ;)

I mean this: wmb() assumes that the data to be ordered are not in WC 
memory. rmb() assumes that the data can be in WC memory (lfence is only 
useful on WC --- it doesn't have any effect on other memory types).
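
To make the asymmetry concrete, here is a minimal user-space sketch, assuming x86 and GCC-style inline asm. The names compiler_barrier(), wc_wmb(), wc_rmb(), publish() and consume() are made up for this example and are not the kernel's. compiler_barrier() corresponds to the non-OOSTORE wmb() quoted above; the sfence/lfence variants are what a barrier would need if the data could sit in WC memory.

```c
/* Illustrative stand-ins only; none of these names exist in the kernel. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

#if defined(__x86_64__) || defined(__i386__)
/* x86_64 always has SSE2; on i386 these need an SSE2-capable CPU. */
#define wc_wmb() __asm__ __volatile__("sfence" ::: "memory")
#define wc_rmb() __asm__ __volatile__("lfence" ::: "memory")
#else
/* Fallback so the sketch still builds elsewhere (GCC/Clang builtin). */
#define wc_wmb() __sync_synchronize()
#define wc_rmb() __sync_synchronize()
#endif

static int payload;
static int ready;

/* Writer: store the data, then the flag that announces it. */
static void publish(int value)
{
    payload = value;
    /* With compiler_barrier() here, only the compiler is constrained;
     * that is enough on write-back memory but not on WC memory. */
    wc_wmb();
    ready = 1;
}

/* Reader: check the flag, then load the data it guards. */
static int consume(void)
{
    if (!ready)
        return -1;
    wc_rmb();   /* keep the payload load after the flag load */
    return payload;
}
```

Calling publish(42) and then consume() returns 42; before any publish(), consume() returns -1. Run single-threaded this only shows where the barriers sit, it does not demonstrate a reordering.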

Mikulas


Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Herbert Xu
Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> Also, for non-wb memory. I don't think the Intel document referenced
> says anything about this, but the AMD document says that loads can pass
> loads (page 8, rule b).
>
> This is why our rmb() is still an lfence.

BTW, Xen (in particular, the code in drivers/xen) uses mb/rmb/wmb
instead of smp_mb/smp_rmb/smp_wmb when it accesses memory that's
shared with other Xen domains or the hypervisor.

This is necessary because even if a Xen domain is UP, the hypervisor
might be SMP.

It would be nice if these could adopt the new SMP barriers on x86
instead of the IO ones they currently use.
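
As a rough illustration of what that could look like, here is a sketch of a producer/consumer ring in memory shared with another domain. Everything in it (struct shared_ring, ring_put(), ring_get(), the shared_*() macros) is hypothetical and not taken from drivers/xen. The point is only that the barriers must not compile away on a UP build, while on x86 write-back memory they can be as cheap as the SMP ones, i.e. a compiler barrier for store/store, load/load and load/store ordering.

```c
/* Hypothetical barriers for memory shared with another domain. They are
 * defined unconditionally (not tied to CONFIG_SMP). On x86 write-back
 * memory a compiler barrier is assumed to be enough for the three
 * orderings used below; other architectures get a full barrier. */
#if defined(__x86_64__) || defined(__i386__)
#define shared_wmb() __asm__ __volatile__("" ::: "memory")
#define shared_rmb() __asm__ __volatile__("" ::: "memory")
#define shared_lsb() __asm__ __volatile__("" ::: "memory") /* load->store */
#else
#define shared_wmb() __sync_synchronize()
#define shared_rmb() __sync_synchronize()
#define shared_lsb() __sync_synchronize()
#endif

#define RING_SIZE 8u   /* illustrative; must be a power of two */

struct shared_ring {
    volatile unsigned int prod;   /* advanced by the producer only */
    volatile unsigned int cons;   /* advanced by the consumer only */
    int slots[RING_SIZE];
};

/* Returns 0 on success, -1 if the ring is full. */
static int ring_put(struct shared_ring *r, int value)
{
    if (r->prod - r->cons == RING_SIZE)
        return -1;
    r->slots[r->prod % RING_SIZE] = value;
    shared_wmb();   /* payload must be visible before the new index */
    r->prod = r->prod + 1;
    return 0;
}

/* Returns 0 on success, -1 if the ring is empty. */
static int ring_get(struct shared_ring *r, int *value)
{
    if (r->cons == r->prod)
        return -1;
    shared_rmb();   /* read the index before the payload it guards */
    *value = r->slots[r->cons % RING_SIZE];
    shared_lsb();   /* finish reading the slot before releasing it */
    r->cons = r->cons + 1;
    return 0;
}
```

With a zero-initialised ring, ring_put(&r, 7) followed by ring_get(&r, &v) leaves v == 7, and a second ring_get() reports the ring empty.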

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-15 Thread Nick Piggin
On Tue, Oct 16, 2007 at 12:08:01AM +0200, Mikulas Patocka wrote:
> > On Mon, 15 Oct 2007 22:47:42 +0200 (CEST)
> > Mikulas Patocka <[EMAIL PROTECTED]> wrote:
> > 
> > > > According to latest memory ordering specification documents from
> > > > Intel and AMD, both manufacturers are committed to in-order loads
> > > > from cacheable memory for the x86 architecture. Hence, smp_rmb()
> > > > may be a simple barrier.
> > > >
> > > > http://developer.intel.com/products/processor/manuals/318147.pdf 
> > > > http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf
> > > 
> > > Hi
> > > 
> > > I'm just wondering about one thing --- what is LFENCE instruction
> > > good for?
> > > 
> > > SFENCE is for enforcing ordering in write-combining buffers (it
> > > doesn't have sense in write-back cache mode).
> > > MFENCE is for preventing of moving stores past loads.
> > > 
> > > But what is LFENCE for? I read the above documents and they already
> > > say that CPUs have ordered loads.
> > > 
> > 
> > The cpus also have an explicit set of instructions that deliberately do 
> > unordered stores/loads, and s/lfence etc are mostly designed for those.
> 
> I know about unordered stores (movnti & similar) --- they basically use 
> write-combining method on memory that is normally write-back --- and they 
> need sfence. But which instruction does an unordered load and needs 
> lfence?

Also, for non-wb memory. I don't think the Intel document referenced
says anything about this, but the AMD document says that loads can pass
loads (page 8, rule b).

This is why our rmb() is still an lfence.



Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-15 Thread H. Peter Anvin

Mikulas Patocka wrote:
> I know about unordered stores (movnti & similar) --- they basically use
> write-combining method on memory that is normally write-back --- and they
> need sfence. But which instruction does an unordered load and needs
> lfence?

PREFETCHNTA.

-hpa


Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-15 Thread Mikulas Patocka
> On Mon, 15 Oct 2007 22:47:42 +0200 (CEST)
> Mikulas Patocka <[EMAIL PROTECTED]> wrote:
> 
> > > According to latest memory ordering specification documents from
> > > Intel and AMD, both manufacturers are committed to in-order loads
> > > from cacheable memory for the x86 architecture. Hence, smp_rmb()
> > > may be a simple barrier.
> > >
> > > http://developer.intel.com/products/processor/manuals/318147.pdf 
> > > http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf
> > 
> > Hi
> > 
> > I'm just wondering about one thing --- what is LFENCE instruction
> > good for?
> > 
> > SFENCE is for enforcing ordering in write-combining buffers (it
> > doesn't have sense in write-back cache mode).
> > MFENCE is for preventing of moving stores past loads.
> > 
> > But what is LFENCE for? I read the above documents and they already
> > say that CPUs have ordered loads.
> > 
> 
> The cpus also have an explicit set of instructions that deliberately do 
> unordered stores/loads, and s/lfence etc are mostly designed for those.

I know about unordered stores (movnti & similar) --- they basically use 
write-combining method on memory that is normally write-back --- and they 
need sfence. But which instruction does an unordered load and needs 
lfence?
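
For reference, the store side looks roughly like this in user space. This is a sketch using the SSE2 intrinsics _mm_stream_si32() (which compiles to MOVNTI) and _mm_sfence(); the helper name fill_and_publish() and its parameters are invented for the example.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si32 (MOVNTI); pulls in _mm_sfence */
#endif

/* Hypothetical helper: fill dst[0..n) with non-temporal stores, then set
 * *done so that a reader who sees the flag also sees the data. */
static void fill_and_publish(int *dst, size_t n, int value, volatile int *done)
{
    size_t i;

#if defined(__SSE2__)
    for (i = 0; i < n; i++)
        _mm_stream_si32(&dst[i], value);   /* weakly ordered store */
    _mm_sfence();   /* order the streaming stores before the flag store */
#else
    /* Non-x86 fallback so the sketch still builds: plain stores plus a
     * full barrier (GCC/Clang builtin). */
    for (i = 0; i < n; i++)
        dst[i] = value;
    __sync_synchronize();
#endif
    *done = 1;
}
```

Without the sfence, the flag store could become visible to another CPU before the streamed data does, which is the case the instruction exists for.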

Mikulas


Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-15 Thread Arjan van de Ven
On Mon, 15 Oct 2007 22:47:42 +0200 (CEST)
Mikulas Patocka <[EMAIL PROTECTED]> wrote:

> > According to latest memory ordering specification documents from
> > Intel and AMD, both manufacturers are committed to in-order loads
> > from cacheable memory for the x86 architecture. Hence, smp_rmb()
> > may be a simple barrier.
> >
> > http://developer.intel.com/products/processor/manuals/318147.pdf 
> > http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf
> 
> Hi
> 
> I'm just wondering about one thing --- what is LFENCE instruction
> good for?
> 
> SFENCE is for enforcing ordering in write-combining buffers (it
> doesn't have sense in write-back cache mode).
> MFENCE is for preventing of moving stores past loads.
> 
> But what is LFENCE for? I read the above documents and they already
> say that CPUs have ordered loads.
> 

The cpus also have an explicit set of instructions that deliberately do
unordered stores/loads, and s/lfence etc are mostly designed for those.

