Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
> > > > You already must not place any data structures into WC memory --- for
> > > > example, spinlocks wouldn't work there.
> > >
> > > What do you mean "already"?
> >
> > I mean "in current kernel" (I checked it in 2.6.22)
>
> Ahh, that's not "current kernel", though ;)
>
> 4071c718555d955a35e9651f77086096ad87d498
>
> > So drivers can't assume that wmb() works on write-combining memory.
>
> Drivers should be able to assume that wmb() orders _everything_ (except
> some whacky Altix thing, which I really want to fold under wmb at some
> point anyway).
>
> So I decided that old x86 semantics isn't right, and now it really is a
> lock op / sfence everywhere.

I see.

I'm just curious --- is there any real usage for WC memory, except
graphics card memory?

Mikulas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
On Wed, Oct 17, 2007 at 01:51:17PM +0800, Herbert Xu wrote:
> Nick Piggin <[EMAIL PROTECTED]> wrote:
> >
> > Also, for non-wb memory. I don't think the Intel document referenced
> > says anything about this, but the AMD document says that loads can pass
> > loads (page 8, rule b).
> >
> > This is why our rmb() is still an lfence.
>
> BTW, Xen (in particular, the code in drivers/xen) uses mb/rmb/wmb
> instead of smp_mb/smp_rmb/smp_wmb when it accesses memory that's
> shared with other Xen domains or the hypervisor.
>
> The reason this is necessary is because even if a Xen domain is
> UP the hypervisor might be SMP.
>
> It would be nice if we can have these adopt the new SMP barriers
> on x86 instead of the IO ones as they currently do.

That's a good point actually. Something like raw_smp_*mb, which always
orders memory, but only for regular WB operations. I could put that on
the todo list...
Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
On Wed, Oct 17, 2007 at 02:30:32AM +0200, Mikulas Patocka wrote:
> > > You already must not place any data structures into WC memory --- for
> > > example, spinlocks wouldn't work there.
> >
> > What do you mean "already"?
>
> I mean "in current kernel" (I checked it in 2.6.22)

Ahh, that's not "current kernel", though ;)

4071c718555d955a35e9651f77086096ad87d498

> > If we already have drivers loading data from
> > WC memory, then rmb() needs to order them, whether or not they actually
> > need it. If that were prohibitively costly, then we'd introduce a new
> > barrier which does not order WC memory, right?
> >
> >
> > > wmb() also won't work on WC
> > > memory, because it assumes that writes are ordered.
> >
> > You mean the one defined like this:
> >   #define wmb() asm volatile("sfence" ::: "memory")
> > ? If it assumed writes are ordered, then it would just be a barrier().
>
> You read wrong part of the include file. Really, it is
> (2.6.22,include/asm-i386/system.h):
> #ifdef CONFIG_X86_OOSTORE
> #define wmb() alternative("lock; addl $0,0(%%esp)", "sfence",
> X86_FEATURE_XMM)
> #else
> #define wmb() __asm__ __volatile__ ("": : :"memory")
> #endif
>
> CONFIG_X86_OOSTORE is dependent on MWINCHIP3D || MWINCHIP2 || MWINCHIPC6
> --- so on Intel and AMD, it is really just barrier().
>
> So drivers can't assume that wmb() works on write-combining memory.

Drivers should be able to assume that wmb() orders _everything_ (except
some whacky Altix thing, which I really want to fold under wmb at some
point anyway).

So I decided that old x86 semantics isn't right, and now it really is a
lock op / sfence everywhere.
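[Editorial sketch, not part of the original message.] The two wmb() flavours
argued over above can be compared in a small user-space example. It is only
an illustration under stated assumptions: the names compiler_barrier(),
wmb_fence(), struct mailbox and publish() are hypothetical and are not
kernel identifiers, and the non-x86 branch falls back to the GCC/Clang
builtin __sync_synchronize().

```c
#include <stdint.h>

/* Compiler-only barrier: prevents the compiler from moving memory
 * accesses across this point but emits no instruction. This is what the
 * quoted 2.6.22 wmb() expands to when CONFIG_X86_OOSTORE is not set. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Store fence (hypothetical name). On x86 this emits sfence, which also
 * orders write-combining and non-temporal stores; elsewhere it falls
 * back to a full barrier builtin so the sketch still compiles. */
#if defined(__x86_64__) || defined(__i386__)
#define wmb_fence() __asm__ __volatile__("sfence" ::: "memory")
#else
#define wmb_fence() __sync_synchronize()
#endif

/* Hypothetical mailbox: payload must be visible before the ready flag. */
struct mailbox {
    uint32_t payload;
    volatile uint32_t ready;
};

static inline void publish(struct mailbox *m, uint32_t value)
{
    m->payload = value;
    wmb_fence();        /* with compiler_barrier() here instead, stores to
                         * WC memory could still be reordered by the CPU */
    m->ready = 1;
}
```

For ordinary write-back memory on x86 both variants give the same store
ordering; the difference only matters when the mailbox lives in
write-combining memory or is written with non-temporal stores.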
Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> Also, for non-wb memory. I don't think the Intel document referenced
> says anything about this, but the AMD document says that loads can pass
> loads (page 8, rule b).
>
> This is why our rmb() is still an lfence.

BTW, Xen (in particular, the code in drivers/xen) uses mb/rmb/wmb
instead of smp_mb/smp_rmb/smp_wmb when it accesses memory that's
shared with other Xen domains or the hypervisor.

The reason this is necessary is because even if a Xen domain is
UP the hypervisor might be SMP.

It would be nice if we can have these adopt the new SMP barriers
on x86 instead of the IO ones as they currently do.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
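[Editorial sketch, not part of the original message.] The problem described
above is that smp_*-style barriers may compile down to a compiler barrier
on a UP build, which is not enough when the other side of the shared memory
runs on a different CPU. A rough user-space illustration with C11 fences
follows. Everything named sketch_* or shared_* is hypothetical, SKETCH_SMP
merely stands in for a CONFIG_SMP-like switch, and the fences only order
ordinary cacheable memory, which is the limited guarantee the proposed
always-on barrier would give.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical build switch standing in for an SMP/UP kernel config. */
#ifndef SKETCH_SMP
#define SKETCH_SMP 0
#endif

/* smp_*-style barrier: degrades to a compiler barrier on a UP build. */
#if SKETCH_SMP
#define sketch_smp_wmb() atomic_thread_fence(memory_order_release)
#else
#define sketch_smp_wmb() atomic_signal_fence(memory_order_seq_cst)
#endif

/* Always-emitted variants for memory shared with another domain: they
 * never degrade on UP builds. */
#define sketch_raw_wmb() atomic_thread_fence(memory_order_release)
#define sketch_raw_rmb() atomic_thread_fence(memory_order_acquire)

/* Hypothetical page shared with a peer that may run on another CPU. */
struct shared_page {
    uint32_t data;
    _Atomic uint32_t seq;   /* 0 = empty, 1 = data is valid */
};

static inline void shared_write(struct shared_page *p, uint32_t v)
{
    p->data = v;
    sketch_raw_wmb();       /* data must be visible before seq */
    atomic_store_explicit(&p->seq, 1, memory_order_relaxed);
}

/* Returns 1 and fills *out if data was published, 0 otherwise. */
static inline int shared_read(struct shared_page *p, uint32_t *out)
{
    if (atomic_load_explicit(&p->seq, memory_order_relaxed) == 0)
        return 0;
    sketch_raw_rmb();       /* read seq before reading data */
    *out = p->data;
    return 1;
}
```

Using sketch_smp_wmb() in shared_write() would be fine between threads of a
UP guest but gives no CPU-level ordering against an SMP peer, which is the
case the message is about.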
Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
> > You already must not place any data structures into WC memory --- for
> > example, spinlocks wouldn't work there.
>
> What do you mean "already"?

I mean "in current kernel" (I checked it in 2.6.22)

> If we already have drivers loading data from
> WC memory, then rmb() needs to order them, whether or not they actually
> need it. If that were prohibitively costly, then we'd introduce a new
> barrier which does not order WC memory, right?
>
>
> > wmb() also won't work on WC
> > memory, because it assumes that writes are ordered.
>
> You mean the one defined like this:
>   #define wmb() asm volatile("sfence" ::: "memory")
> ? If it assumed writes are ordered, then it would just be a barrier().

You read wrong part of the include file. Really, it is
(2.6.22,include/asm-i386/system.h):
#ifdef CONFIG_X86_OOSTORE
#define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", X86_FEATURE_XMM)
#else
#define wmb() __asm__ __volatile__ ("": : :"memory")
#endif

CONFIG_X86_OOSTORE is dependent on MWINCHIP3D || MWINCHIP2 || MWINCHIPC6
--- so on Intel and AMD, it is really just barrier().

So drivers can't assume that wmb() works on write-combining memory.

> > > Doing that would lead to an unmaintainable mess. If drivers don't
> > > need rmb, then they don't call it.
> >
> > If wmb() doesn't currently work on write-combining memory, why should
> > rmb() work there?
>
> I don't understand why you say wmb() doesn't work on WC memory.

Because it is defined as __asm__ __volatile__ ("": : :"memory")
And WC memory can reorder writes (WB memory can't).

> > The purpose of rmb() is to enforce ordering on architectures that don't
> > force it in hardware --- that is not the case of x86.
>
> Well it clearly is the case because I just pointed you to a document
> that says they can go out of order. If you want to argue that existing
> implementations do not, then by all means go ahead and send a patch to
> Linus and see what he says about it ;)

I mean this: wmb() assumes that the data to be ordered are not in WC
memory. rmb() assumes that the data can be in WC memory (lfence is only
useful on WC --- it doesn't have any effect on other memory types).

Mikulas
Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
On Wed, Oct 17, 2007 at 01:05:16AM +0200, Mikulas Patocka wrote:
> > > I see, AMD says that WC memory loads can be out-of-order.
> > >
> > > There is very little usability to it --- framebuffer and AGP aperture is
> > > the only piece of memory that is WC and no kernel structures are placed
> > > there, so it is possible to remove that lfence.
> >
> > No. In Linux kernel, rmb() means that all previous loads, including to
> > any IO regions, will be executed before any subsequent load.
>
> You already must not place any data structures into WC memory --- for
> example, spinlocks wouldn't work there.

What do you mean "already"? If we already have drivers loading data from
WC memory, then rmb() needs to order them, whether or not they actually
need it. If that were prohibitively costly, then we'd introduce a new
barrier which does not order WC memory, right?

> wmb() also won't work on WC
> memory, because it assumes that writes are ordered.

You mean the one defined like this:
  #define wmb() asm volatile("sfence" ::: "memory")
? If it assumed writes are ordered, then it would just be a barrier().

> > How can you possibly get rid of lfence from there just because you may
> > happen to *know* that it isn't used (btw. the IO serialisation isn't for
> > kernel data structures, it is for actual IO operations, generally).
>
> IO regions are in uncached memory, and x86 already serializes it fine. It
> flushes any write buffers on access to uncached memory.
>
> (BTW. what is the general portable rule for serializing writel() and
> readl()? On x86 they are serialized in hardware, but what on other archs?)

Most tend to order them strongly these days. There are also relaxed
variants for architectures that can take advantage of them.

> > Doing that would lead to an unmaintainable mess. If drivers don't need rmb,
> > then they don't call it.
>
> If wmb() doesn't currently work on write-combining memory, why should
> rmb() work there?

I don't understand why you say wmb() doesn't work on WC memory. What part
of which spec are you reading (or, given your mistrust of specs, what CPU
are you seeing failures with)?

> The purpose of rmb() is to enforce ordering on architectures that don't
> force it in hardware --- that is not the case of x86.

Well it clearly is the case because I just pointed you to a document
that says they can go out of order. If you want to argue that existing
implementations do not, then by all means go ahead and send a patch to
Linus and see what he says about it ;)
Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
> > I see, AMD says that WC memory loads can be out-of-order.
> >
> > There is very little usability to it --- framebuffer and AGP aperture is
> > the only piece of memory that is WC and no kernel structures are placed
> > there, so it is possible to remove that lfence.
>
> No. In Linux kernel, rmb() means that all previous loads, including to
> any IO regions, will be executed before any subsequent load.

You already must not place any data structures into WC memory --- for
example, spinlocks wouldn't work there. wmb() also won't work on WC
memory, because it assumes that writes are ordered.

> How can you possibly get rid of lfence from there just because you may
> happen to *know* that it isn't used (btw. the IO serialisation isn't for
> kernel data structures, it is for actual IO operations, generally).

IO regions are in uncached memory, and x86 already serializes it fine. It
flushes any write buffers on access to uncached memory.

(BTW. what is the general portable rule for serializing writel() and
readl()? On x86 they are serialized in hardware, but what on other archs?)

> Doing that would lead to an unmaintainable mess. If drivers don't need rmb,
> then they don't call it.

If wmb() doesn't currently work on write-combining memory, why should
rmb() work there?

The purpose of rmb() is to enforce ordering on architectures that don't
force it in hardware --- that is not the case of x86.

Mikulas
Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
On Tue, Oct 16, 2007 at 12:33:54PM +0200, Mikulas Patocka wrote:
>
>
> On Tue, 16 Oct 2007, Nick Piggin wrote:
>
> > > > The cpus also have an explicit set of instructions that deliberately do
> > > > unordered stores/loads, and s/lfence etc are mostly designed for those.
> > >
> > > I know about unordered stores (movnti & similar) --- they basically use
> > > write-combining method on memory that is normally write-back --- and they
> > > need sfence. But which one instruction does unordered load and needs
> > > lfence?
> >
> > Also, for non-wb memory. I don't think the Intel document referenced
> > says anything about this, but the AMD document says that loads can pass
> > loads (page 8, rule b).
> >
> > This is why our rmb() is still an lfence.
>
> I see, AMD says that WC memory loads can be out-of-order.
>
> There is very little usability to it --- framebuffer and AGP aperture is
> the only piece of memory that is WC and no kernel structures are placed
> there, so it is possible to remove that lfence.

No. In Linux kernel, rmb() means that all previous loads, including to
any IO regions, will be executed before any subsequent load.

How can you possibly get rid of lfence from there just because you may
happen to *know* that it isn't used (btw. the IO serialisation isn't for
kernel data structures, it is for actual IO operations, generally).

Doing that would lead to an unmaintainable mess. If drivers don't need rmb,
then they don't call it.
Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
On Tue, 16 Oct 2007, Nick Piggin wrote:

> > > The cpus also have an explicit set of instructions that deliberately do
> > > unordered stores/loads, and s/lfence etc are mostly designed for those.
> >
> > I know about unordered stores (movnti & similar) --- they basically use
> > write-combining method on memory that is normally write-back --- and they
> > need sfence. But which one instruction does unordered load and needs
> > lfence?
>
> Also, for non-wb memory. I don't think the Intel document referenced
> says anything about this, but the AMD document says that loads can pass
> loads (page 8, rule b).
>
> This is why our rmb() is still an lfence.

I see, AMD says that WC memory loads can be out-of-order.

There is very little usability to it --- framebuffer and AGP aperture is
the only piece of memory that is WC and no kernel structures are placed
there, so it is possible to remove that lfence.

Mikulas
Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
On Mon, 15 Oct 2007, H. Peter Anvin wrote:

> Mikulas Patocka wrote:
> >
> > I know about unordered stores (movnti & similar) --- they basically use
> > write-combining method on memory that is normally write-back --- and they
> > need sfence. But which one instruction does unordered load and needs
> > lfence?
> >
>
> PREFETCHNTA.

PREFETCH* doesn't change program semantics. The processor is allowed to
ignore a prefetch instruction if it doesn't have the resources needed for
the prefetch. It is not ordered wrt. fences.

PREFETCHNTA was implemented as a prefetch into L1 cache, omitting L2
cache, on Pentium 3 and M --- and it is implemented as a prefetch into L2
cache on others --- so it doesn't really use any special buffers.

Mikulas

> -hpa
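[Editorial sketch, not part of the original message.] The claim that a
prefetch is only a hint and does not change program semantics can be
checked with a small example. It assumes GCC or Clang, whose
__builtin_prefetch(addr, rw, locality) builtin with locality 0 typically
maps to PREFETCHNTA on x86. The function names and the lookahead distance
of 16 elements are arbitrary choices for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference loop without any prefetch hint. */
static uint64_t sum_plain(const uint32_t *a, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Same loop with a non-temporal read prefetch issued ahead of use.
 * The hint may be dropped by the CPU; the result must not depend on it. */
static uint64_t sum_prefetch(const uint32_t *a, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16], 0, 0); /* read, no temporal locality */
        s += a[i];
    }
    return s;
}
```

Both functions return the same value for any input; only cache behaviour
can differ, which is why no fence is needed around the hint.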
Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
On Tue, Oct 16, 2007 at 12:08:01AM +0200, Mikulas Patocka wrote:
> > On Mon, 15 Oct 2007 22:47:42 +0200 (CEST)
> > Mikulas Patocka <[EMAIL PROTECTED]> wrote:
> >
> > > > According to latest memory ordering specification documents from
> > > > Intel and AMD, both manufacturers are committed to in-order loads
> > > > from cacheable memory for the x86 architecture. Hence, smp_rmb()
> > > > may be a simple barrier.
> > > >
> > > > http://developer.intel.com/products/processor/manuals/318147.pdf
> > > > http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf
> > >
> > > Hi
> > >
> > > I'm just wondering about one thing --- what is LFENCE instruction
> > > good for?
> > >
> > > SFENCE is for enforcing ordering in write-combining buffers (it
> > > doesn't have sense in write-back cache mode).
> > > MFENCE is for preventing of moving stores past loads.
> > >
> > > But what is LFENCE for? I read the above documents and they already
> > > say that CPUs have ordered loads.
> > >
> >
> > The cpus also have an explicit set of instructions that deliberately do
> > unordered stores/loads, and s/lfence etc are mostly designed for those.
>
> I know about unordered stores (movnti & similar) --- they basically use
> write-combining method on memory that is normally write-back --- and they
> need sfence. But which one instruction does unordered load and needs
> lfence?

Also, for non-wb memory. I don't think the Intel document referenced
says anything about this, but the AMD document says that loads can pass
loads (page 8, rule b). This is why our rmb() is still an lfence.
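A rough sketch of the split described here: for cacheable memory a compiler barrier is enough on x86, while a reader of non-write-back memory keeps a real lfence. The names `sketch_rmb`, `sketch_smp_rmb`, `struct message`, `read_cacheable` and `read_device` are hypothetical, this is user-space C rather than the kernel's definitions, and the non-x86 branch is an assumed fallback using the GCC/Clang builtin `__sync_synchronize()`.

```c
#include <assert.h>
#include <stddef.h>

#define barrier() __asm__ __volatile__("" ::: "memory")

#if defined(__x86_64__)
/* Loads from cacheable memory are already ordered by the hardware, so
 * the SMP read barrier only has to restrain the compiler. */
#define sketch_smp_rmb() barrier()
/* Loads from non-write-back memory may pass each other, so the
 * mandatory read barrier stays a real lfence. */
#define sketch_rmb() __asm__ __volatile__("lfence" ::: "memory")
#else
/* Assumed fallback: a full fence for both on other architectures. */
#define sketch_smp_rmb() __sync_synchronize()
#define sketch_rmb() __sync_synchronize()
#endif

struct message {
    int ready;
    int data;
};

/* Reader for ordinary cacheable (write-back) memory. */
static int read_cacheable(const volatile struct message *m, int *out)
{
    assert(m != NULL && out != NULL);
    if (!m->ready)
        return 0;
    sketch_smp_rmb();   /* flag load ordered before data load */
    *out = m->data;
    return 1;
}

/* Reader for memory that might be mapped write-combining or uncached. */
static int read_device(const volatile struct message *m, int *out)
{
    assert(m != NULL && out != NULL);
    if (!m->ready)
        return 0;
    sketch_rmb();
    *out = m->data;
    return 1;
}
```

Both readers return the same values in a single-threaded run; they differ only in the instruction emitted between the two loads.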
Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
Mikulas Patocka wrote:
> I know about unordered stores (movnti & similar) --- they basically use
> write-combining method on memory that is normally write-back --- and they
> need sfence. But which one instruction does unordered load and needs
> lfence?

PREFETCHNTA.

-hpa
Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
> On Mon, 15 Oct 2007 22:47:42 +0200 (CEST)
> Mikulas Patocka <[EMAIL PROTECTED]> wrote:
>
> > > According to latest memory ordering specification documents from
> > > Intel and AMD, both manufacturers are committed to in-order loads
> > > from cacheable memory for the x86 architecture. Hence, smp_rmb()
> > > may be a simple barrier.
> > >
> > > http://developer.intel.com/products/processor/manuals/318147.pdf
> > > http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf
> >
> > Hi
> >
> > I'm just wondering about one thing --- what is LFENCE instruction
> > good for?
> >
> > SFENCE is for enforcing ordering in write-combining buffers (it
> > doesn't have sense in write-back cache mode).
> > MFENCE is for preventing of moving stores past loads.
> >
> > But what is LFENCE for? I read the above documents and they already
> > say that CPUs have ordered loads.
> >
>
> The cpus also have an explicit set of instructions that deliberately do
> unordered stores/loads, and s/lfence etc are mostly designed for those.

I know about unordered stores (movnti & similar) --- they basically use
write-combining method on memory that is normally write-back --- and they
need sfence. But which one instruction does unordered load and needs
lfence?

Mikulas
Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
On Mon, 15 Oct 2007 22:47:42 +0200 (CEST)
Mikulas Patocka <[EMAIL PROTECTED]> wrote:

> > According to latest memory ordering specification documents from
> > Intel and AMD, both manufacturers are committed to in-order loads
> > from cacheable memory for the x86 architecture. Hence, smp_rmb()
> > may be a simple barrier.
> >
> > http://developer.intel.com/products/processor/manuals/318147.pdf
> > http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf
>
> Hi
>
> I'm just wondering about one thing --- what is LFENCE instruction
> good for?
>
> SFENCE is for enforcing ordering in write-combining buffers (it
> doesn't have sense in write-back cache mode).
> MFENCE is for preventing of moving stores past loads.
>
> But what is LFENCE for? I read the above documents and they already
> say that CPUs have ordered loads.
>

The cpus also have an explicit set of instructions that deliberately do
unordered stores/loads, and s/lfence etc are mostly designed for those.