Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
Venki Pallipadi <[EMAIL PROTECTED]> writes: > Checking the manual for this. You are right, we had missed some steps here. > Actually, manual says on MP, PAT MSR on all CPUs must be consistent (even when > they are not really using it in their page tables. > So, this will change the init and shutdown parts significantly and there may > be > some challenges with CPU offline and KEXEC. We will redo this part in next > iteration. Well the normal kexec path is no worse then reboot. The kdump path is a mess but only a minor one, and with us only changing the UC- case we can probably just ignore it and leave the system started with that pat register set to WC :) What we are doing really should be no worse the MTRR setup except that disabling it at reboot is polite. CPU online and offline that is weird, but so far it is always weird, and I don't think ever quite correct. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
Andi Kleen <[EMAIL PROTECTED]> writes: >> I do know we need to use the low 4 pat mappings to avoid most of the PAT >> errata issues. > > They don't really matter. These are all very old systems who have run > fine for many years without PAT. It is no problem to let them > continue to do so and just disable PAT for them. So just clear pat bit in > CPU initialization for any CPUs with non trivial erratas in this > area. > > PAT is only really needed on modern boxes. > > Just someone needs to go through the old errata sheets and find > out on which CPUs it is needed to clear the bit. It has been ages now, but my impression when I wrote the patch that current cores still had a few outstanding errata with using the extended pat bits. Further it was my impression was that if we just changed UC- to WC we work on essentially everything, because PAT is always enabled on the cores that support it. Therefore since we only have 3 interesting caching modes. WB, WC, UC. We should be very careful about reprogramming it and we can ignore the errors. As for the pat class errata about inconsistent mappings those are reoccurring issues, that happen across all cpu types (x86/ppc/fred), and every major core overhaul is likely to have them again. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PAT 64b: Basic PAT implementation
On 14 Dec, 00:50, Andi Kleen <[EMAIL PROTECTED]> wrote: > > +void __cpuinit pat_init(void) > > +{ > > + /* Set PWT+PCD to Write-Combining. All other bits stay the same */ > > + if (cpu_has_pat) { > > All the old CPUs (PPro etc.) with known PAT bugs need to clear this flag > now in their CPU init functions. It is fine to be aggressive there > because these old systems have lived so long without PAT they can do > so forever. So perhaps it's best to just white list it only for newer > CPUs on the Intel side at least. > Another problem is that there are some popular modules (ATI, Nvidia for once) > who reprogram the PAT registers on their own, likely different. Need some way > to detect > that case I guess, otherwise lots of users will see strange malfunctions. > Maybe recheck after module load? This may not be as big problem as thought, since sane and at least one vendor driver (Quadrics QsNetII) searches the PAT slots for a WC entry - where this has already been setup by the kernel, it'll use the right one. > > + ||| > > + 000 WB default > > + 010 UC_MINUS _PAGE_PCD > > + 011 WC _PAGE_WC > > + PAT bit unused */ > > + pat = PAT(0,WB) | PAT(1,WT) | PAT(2,UC_MINUS) | PAT(3,WC) | > > +PAT(4,WB) | PAT(5,WT) | PAT(6,UC_MINUS) | PAT(7,WC); > > + rdmsrl(MSR_IA32_CR_PAT, boot_pat_state); > > + wrmsrl(MSR_IA32_CR_PAT, pat); > > + __flush_tlb_all(); > > + asm volatile("wbinvd"); > > Have you double checked this is the full procedure from the manual? iirc there > were some steps missing. > > -Andi -- Daniel J Blueman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
* Siddha, Suresh B <[EMAIL PROTECTED]> wrote: > > Ok. I will send a separate patch fixing ioremap_nocache on x86. > > Appended the patch. x86 folks, please consider for x86 mm git tree. > Thanks. thanks, applied. Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
On Fri, Dec 14, 2007 at 01:10:39PM -0800, Siddha, Suresh B wrote: > On Thu, Dec 13, 2007 at 09:23:26PM -0700, Eric W. Biederman wrote: > > [EMAIL PROTECTED] (Eric W. Biederman) writes: > > Ok. My analysis here was wrong. Currently pgprot_noncached and > > ioremap_nocache are out of sync. With ioremap_nocache only specifying > > _PAGE_PCD and pgprot_noncached specifying _PAGE_PCD | _PAGE_PWT. > > > > So I don't have a clue how someone could reprogram the mtrrs currently > > and expect things to work. > > > > ... > > > > If we bother to ask ioremap for memory that is not cached, the last > > thing in the world we want is the MTRRs upgrading that to write combining. > > So ioremap_nocache has been slightly buggy for ages. ioremap_nocache > > and PAGE_KERNEL_NOCACHE should get _PAGE_PWT added to their > > definitions. > > > > Could we please get a cleanup patch at the beginning of this patchset > > or that comes before it that fixes ioremap_nocache on x86? > > > > That will make us a lot more git-bisect safe. > > Ok. I will send a separate patch fixing ioremap_nocache on x86. Appended the patch. x86 folks, please consider for x86 mm git tree. Thanks. --- [patch] x86: Set strong uncacheable where UC is really desired Also use _PAGE_PWT for all the mappings which need uncache mapping. Instead of existing PAT2 which is UC- (and can be overwritten by MTRRs), we now use PAT3 which is strong uncacheable. This makes it consistent with pgprot_noncached() Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]> --- diff --git a/arch/x86/mm/ioremap_32.c b/arch/x86/mm/ioremap_32.c index 0b27831..ef0f6a4 100644 --- a/arch/x86/mm/ioremap_32.c +++ b/arch/x86/mm/ioremap_32.c @@ -119,7 +119,7 @@ EXPORT_SYMBOL(__ioremap); void __iomem *ioremap_nocache (unsigned long phys_addr, unsigned long size) { unsigned long last_addr; - void __iomem *p = __ioremap(phys_addr, size, _PAGE_PCD); + void __iomem *p = __ioremap(phys_addr, size, _PAGE_PCD | _PAGE_PWT); if (!p) return p; diff --git a/arch/x86/mm/ioremap_64.c b/arch/x86/mm/ioremap_64.c index 6cac90a..8be3062 100644 --- a/arch/x86/mm/ioremap_64.c +++ b/arch/x86/mm/ioremap_64.c @@ -158,7 +158,7 @@ EXPORT_SYMBOL(__ioremap); void __iomem *ioremap_nocache (unsigned long phys_addr, unsigned long size) { - return __ioremap(phys_addr, size, _PAGE_PCD); + return __ioremap(phys_addr, size, _PAGE_PCD | _PAGE_PWT); } EXPORT_SYMBOL(ioremap_nocache); diff --git a/include/asm-x86/pgtable_32.h b/include/asm-x86/pgtable_32.h index ed3e70d..b1215e1 100644 --- a/include/asm-x86/pgtable_32.h +++ b/include/asm-x86/pgtable_32.h @@ -156,7 +156,7 @@ void paging_init(void); extern unsigned long long __PAGE_KERNEL, __PAGE_KERNEL_EXEC; #define __PAGE_KERNEL_RO (__PAGE_KERNEL & ~_PAGE_RW) #define __PAGE_KERNEL_RX (__PAGE_KERNEL_EXEC & ~_PAGE_RW) -#define __PAGE_KERNEL_NOCACHE (__PAGE_KERNEL | _PAGE_PCD) +#define __PAGE_KERNEL_NOCACHE (__PAGE_KERNEL | _PAGE_PCD | _PAGE_PWT) #define __PAGE_KERNEL_LARGE(__PAGE_KERNEL | _PAGE_PSE) #define __PAGE_KERNEL_LARGE_EXEC (__PAGE_KERNEL_EXEC | _PAGE_PSE) diff --git a/include/asm-x86/pgtable_64.h b/include/asm-x86/pgtable_64.h index 9b0ff47..4e4dcc4 100644 --- a/include/asm-x86/pgtable_64.h +++ b/include/asm-x86/pgtable_64.h @@ -185,13 +185,13 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, unsigned long #define __PAGE_KERNEL_EXEC \ (_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED) #define __PAGE_KERNEL_NOCACHE \ - (_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_PCD | _PAGE_ACCESSED | _PAGE_NX) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_PCD | _PAGE_PWT | _PAGE_ACCESSED | _PAGE_NX) #define __PAGE_KERNEL_RO \ (_PAGE_PRESENT | _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_NX) #define __PAGE_KERNEL_VSYSCALL \ (_PAGE_PRESENT | _PAGE_USER | _PAGE_ACCESSED) #define __PAGE_KERNEL_VSYSCALL_NOCACHE \ - (_PAGE_PRESENT | _PAGE_USER | _PAGE_ACCESSED | _PAGE_PCD) + (_PAGE_PRESENT | _PAGE_USER | _PAGE_ACCESSED | _PAGE_PCD | _PAGE_PWT) #define __PAGE_KERNEL_LARGE \ (__PAGE_KERNEL | _PAGE_PSE) #define __PAGE_KERNEL_LARGE_EXEC \ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
On Thu, Dec 13, 2007 at 09:23:26PM -0700, Eric W. Biederman wrote: > [EMAIL PROTECTED] (Eric W. Biederman) writes: > Ok. My analysis here was wrong. Currently pgprot_noncached and > ioremap_nocache are out of sync. With ioremap_nocache only specifying > _PAGE_PCD and pgprot_noncached specifying _PAGE_PCD | _PAGE_PWT. > > So I don't have a clue how someone could reprogram the mtrrs currently > and expect things to work. > > ... > > If we bother to ask ioremap for memory that is not cached, the last > thing in the world we want is the MTRRs upgrading that to write combining. > So ioremap_nocache has been slightly buggy for ages. ioremap_nocache > and PAGE_KERNEL_NOCACHE should get _PAGE_PWT added to their > definitions. > > Could we please get a cleanup patch at the beginning of this patchset > or that comes before it that fixes ioremap_nocache on x86? > > That will make us a lot more git-bisect safe. Ok. I will send a separate patch fixing ioremap_nocache on x86. thanks, suresh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
On Thu, Dec 13, 2007 at 08:48:45PM -0700, Eric W. Biederman wrote: > > + pat = PAT(0,WB) | PAT(1,WT) | PAT(2,UC_MINUS) | PAT(3,WC) | > > + PAT(4,WB) | PAT(5,WT) | PAT(6,UC_MINUS) | PAT(7,WC); > > I strongly object to this configuration. > > The caching modes of interest are: > PAT_WB write-back or a close as the MTRRs will allow >used for WC today. > PAT_UC completely uncachable not overridable by MTRRs >and what we use today for pgprot_noncached > PAT_WC what isn't available for current use. > > We should use: > > + pat = PAT(0,WB) | PAT(1,WT) | PAT(2,WC) | PAT(3,UC) | > > + PAT(4,WB) | PAT(5,WT) | PAT(6,WC) | PAT(7,UC); > > Changing the UC- which currently allows write-combining if the MTRRs specify > it, > to WC. This grandfathers in all of our current usage and changes the one > PAT type that could today and in legacy mode specify WC to really specify WC. That seems reasonable. But looking at mainline kernel, ioremap_nocache() actually uses UC_MINUS. Wonder why it is not using UC (like pgprot_noncached). I think it is ok to change ioremap_nocache() to use UC. thanks, suresh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
Andi Kleen wrote: I do know we need to use the low 4 pat mappings to avoid most of the PAT errata issues. They don't really matter. These are all very old systems who have run fine for many years without PAT. It is no problem to let them continue to do so and just disable PAT for them. So just clear pat bit in CPU initialization for any CPUs with non trivial erratas in this area. PAT is only really needed on modern boxes. How many mapping types do we actually need? The only ones which are likely to be used in practice are WB, UC, WC, which still leaves a spare. (Any intended users of WP or WT?) -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
On Fri, Dec 14, 2007 at 01:42:12AM +0100, Andi Kleen wrote: > > +void __cpuinit pat_init(void) > > +{ > > + /* Set PWT+PCD to Write-Combining. All other bits stay the same */ > > + if (cpu_has_pat) { > > All the old CPUs (PPro etc.) with known PAT bugs need to clear this flag > now in their CPU init functions. It is fine to be aggressive there > because these old systems have lived so long without PAT they can do > so forever. So perhaps it's best to just white list it only for newer > CPUs on the Intel side at least. Yes. Enabling this only on relatively newer CPUs is safer. Will do that in next iteration of the patches. > Another problem is that there are some popular modules (ATI, Nvidia for once) > who reprogram the PAT registers on their own, likely different. Need some way > to detect > that case I guess, otherwise lots of users will see strange malfunctions. > Maybe recheck after module load? Yes. We can check that at load time. But they can still do bad things at runt ime, like say when 3D gets enabled etc?? > > + ||| > > + 000 WB default > > + 010 UC_MINUS _PAGE_PCD > > + 011 WC _PAGE_WC > > + PAT bit unused */ > > + pat = PAT(0,WB) | PAT(1,WT) | PAT(2,UC_MINUS) | PAT(3,WC) | > > + PAT(4,WB) | PAT(5,WT) | PAT(6,UC_MINUS) | PAT(7,WC); > > + rdmsrl(MSR_IA32_CR_PAT, boot_pat_state); > > + wrmsrl(MSR_IA32_CR_PAT, pat); > > + __flush_tlb_all(); > > + asm volatile("wbinvd"); > > Have you double checked this is the full procedure from the manual? iirc there > were some steps missing. Checking the manual for this. You are right, we had missed some steps here. Actually, manual says on MP, PAT MSR on all CPUs must be consistent (even when they are not really using it in their page tables. So, this will change the init and shutdown parts significantly and there may be some challenges with CPU offline and KEXEC. We will redo this part in next iteration. Thanks, Venki -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
> I do know we need to use the low 4 pat mappings to avoid most of the PAT > errata issues. They don't really matter. These are all very old systems who have run fine for many years without PAT. It is no problem to let them continue to do so and just disable PAT for them. So just clear pat bit in CPU initialization for any CPUs with non trivial erratas in this area. PAT is only really needed on modern boxes. Just someone needs to go through the old errata sheets and find out on which CPUs it is needed to clear the bit. > As for Andi's concern about modules playing games with the PAT mappings > if we don't redefine how we use the page table entries our exposure to > badly behaved modules more limited. I would just recheck them after module load and if it happens print a nasty message and program them back. e.g. kernel debuggers need an after module notifier anyways, so it would be fine to just add one and hook into that. -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
[EMAIL PROTECTED] (Eric W. Biederman) writes: > We should use: >> +pat = PAT(0,WB) | PAT(1,WT) | PAT(2,WC) | PAT(3,UC) | >> + PAT(4,WB) | PAT(5,WT) | PAT(6,WC) | PAT(7,UC); > > Changing the UC- which currently allows write-combining if the MTRRs specify > it, > to WC. This grandfathers in all of our current usage and changes the one > PAT type that could today and in legacy mode specify WC to really specify WC. > > I don't know if we need to set the high half or not, that would depend > on the state of the PAT errata. > > I do know we need to use the low 4 pat mappings to avoid most of the PAT > errata issues. > > As for Andi's concern about modules playing games with the PAT mappings > if we don't redefine how we use the page table entries our exposure to > badly behaved modules more limited. Ok. My analysis here was wrong. Currently pgprot_noncached and ioremap_nocache are out of sync. With ioremap_nocache only specifying _PAGE_PCD and pgprot_noncached specifying _PAGE_PCD | _PAGE_PWT. So I don't have a clue how someone could reprogram the mtrrs currently and expect things to work. ... If we bother to ask ioremap for memory that is not cached, the last thing in the world we want is the MTRRs upgrading that to write combining. So ioremap_nocache has been slightly buggy for ages. ioremap_nocache and PAGE_KERNEL_NOCACHE should get _PAGE_PWT added to their definitions. Could we please get a cleanup patch at the beginning of this patchset or that comes before it that fixes ioremap_nocache on x86? That will make us a lot more git-bisect safe. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
[EMAIL PROTECTED] writes: > Originally based on a patch from Eric Biederman, but heavily changed. > > Forward port of pat-base.patch to x86 tree, with a bug fix. > Code was using 'PCD|PWT' i.e., PAT3 for WC mapping. So set the WC mapping at > correct PAT fields PA3/PA7. Well that wasn't from my original tested patch. Grr. > TBD: KEXEC and other CPU offline paths may need pat_shutdown()? > Index: linux-2.6/arch/x86/mm/Makefile_64 > === > --- linux-2.6.orig/arch/x86/mm/Makefile_64 2007-12-11 03:30:34.0 -0800 > +++ linux-2.6/arch/x86/mm/Makefile_64 2007-12-11 03:42:08.0 -0800 > @@ -2,7 +2,7 @@ > # Makefile for the linux x86_64-specific parts of the memory manager. > # > > -obj-y := init_64.o fault_64.o ioremap_64.o extable_64.o pageattr_64.o > mmap_64.o > +obj-y := init_64.o fault_64.o ioremap_64.o extable_64.o pageattr_64.o > mmap_64.o > pat.o > obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o > obj-$(CONFIG_NUMA) += numa_64.o > obj-$(CONFIG_K8_NUMA) += k8topology_64.o > Index: linux-2.6/arch/x86/mm/pat.c > === > --- /dev/null 1970-01-01 00:00:00.0 + > +++ linux-2.6/arch/x86/mm/pat.c 2007-12-11 04:12:47.0 -0800 > @@ -0,0 +1,57 @@ > +/* Handle caching attributes in page tables (PAT) */ > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +static u64 boot_pat_state; > + > +enum { > + PAT_UC = 0, /* uncached */ > + PAT_WC = 1, /* Write combining */ > + PAT_WT = 4, /* Write Through */ > + PAT_WP = 5, /* Write Protected */ > + PAT_WB = 6, /* Write Back (default) */ > + PAT_UC_MINUS = 7, /* UC, but can be overriden by MTRR */ > +}; > + > +#define PAT(x,y) ((u64)PAT_ ## y << ((x)*8)) > + > +void __cpuinit pat_init(void) > +{ > + /* Set PWT+PCD to Write-Combining. All other bits stay the same */ > + if (cpu_has_pat) { > + u64 pat; > + /* PTE encoding used in Linux: > + PAT > + |PCD > + ||PWT > + ||| > +000 WB default > +010 UC_MINUS _PAGE_PCD > +011 WC _PAGE_WC > +PAT bit unused */ > + pat = PAT(0,WB) | PAT(1,WT) | PAT(2,UC_MINUS) | PAT(3,WC) | > + PAT(4,WB) | PAT(5,WT) | PAT(6,UC_MINUS) | PAT(7,WC); I strongly object to this configuration. The caching modes of interest are: PAT_WB write-back or a close as the MTRRs will allow used for WC today. PAT_UC completely uncachable not overridable by MTRRs and what we use today for pgprot_noncached PAT_WC what isn't available for current use. We should use: > + pat = PAT(0,WB) | PAT(1,WT) | PAT(2,WC) | PAT(3,UC) | > + PAT(4,WB) | PAT(5,WT) | PAT(6,WC) | PAT(7,UC); Changing the UC- which currently allows write-combining if the MTRRs specify it, to WC. This grandfathers in all of our current usage and changes the one PAT type that could today and in legacy mode specify WC to really specify WC. I don't know if we need to set the high half or not, that would depend on the state of the PAT errata. I do know we need to use the low 4 pat mappings to avoid most of the PAT errata issues. As for Andi's concern about modules playing games with the PAT mappings if we don't redefine how we use the page table entries our exposure to badly behaved modules more limited. > + rdmsrl(MSR_IA32_CR_PAT, boot_pat_state); > + wrmsrl(MSR_IA32_CR_PAT, pat); > + __flush_tlb_all(); > + asm volatile("wbinvd"); > + } > +} > + > +#undef PAT > + > +void pat_shutdown(void) > +{ > + /* Restore CPU default pat state */ > + if (cpu_has_pat) { > + wrmsrl(MSR_IA32_CR_PAT, boot_pat_state); > + __flush_tlb_all(); > + asm volatile("wbinvd"); > + } > +} > + Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
> +void __cpuinit pat_init(void) > +{ > + /* Set PWT+PCD to Write-Combining. All other bits stay the same */ > + if (cpu_has_pat) { All the old CPUs (PPro etc.) with known PAT bugs need to clear this flag now in their CPU init functions. It is fine to be aggressive there because these old systems have lived so long without PAT they can do so forever. So perhaps it's best to just white list it only for newer CPUs on the Intel side at least. Another problem is that there are some popular modules (ATI, Nvidia for once) who reprogram the PAT registers on their own, likely different. Need some way to detect that case I guess, otherwise lots of users will see strange malfunctions. Maybe recheck after module load? > + ||| > +000 WB default > +010 UC_MINUS _PAGE_PCD > +011 WC _PAGE_WC > +PAT bit unused */ > + pat = PAT(0,WB) | PAT(1,WT) | PAT(2,UC_MINUS) | PAT(3,WC) | > + PAT(4,WB) | PAT(5,WT) | PAT(6,UC_MINUS) | PAT(7,WC); > + rdmsrl(MSR_IA32_CR_PAT, boot_pat_state); > + wrmsrl(MSR_IA32_CR_PAT, pat); > + __flush_tlb_all(); > + asm volatile("wbinvd"); Have you double checked this is the full procedure from the manual? iirc there were some steps missing. -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC PATCH 02/12] PAT 64b: Basic PAT implementation
Originally based on a patch from Eric Biederman, but heavily changed. Forward port of pat-base.patch to x86 tree, with a bug fix. Code was using 'PCD|PWT' i.e., PAT3 for WC mapping. So set the WC mapping at correct PAT fields PA3/PA7. TBD: KEXEC and other CPU offline paths may need pat_shutdown()? Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]> --- Index: linux-2.6/arch/x86/kernel/setup64.c === --- linux-2.6.orig/arch/x86/kernel/setup64.c2007-12-11 03:30:46.0 -0800 +++ linux-2.6/arch/x86/kernel/setup64.c 2007-12-11 03:42:08.0 -0800 @@ -291,9 +291,11 @@ fpu_init(); + pat_init(); raw_local_save_flags(kernel_eflags); } void cpu_shutdown(void) { + pat_shutdown(); } Index: linux-2.6/arch/x86/mm/Makefile_64 === --- linux-2.6.orig/arch/x86/mm/Makefile_64 2007-12-11 03:30:34.0 -0800 +++ linux-2.6/arch/x86/mm/Makefile_64 2007-12-11 03:42:08.0 -0800 @@ -2,7 +2,7 @@ # Makefile for the linux x86_64-specific parts of the memory manager. # -obj-y := init_64.o fault_64.o ioremap_64.o extable_64.o pageattr_64.o mmap_64.o +obj-y := init_64.o fault_64.o ioremap_64.o extable_64.o pageattr_64.o mmap_64.o pat.o obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o obj-$(CONFIG_NUMA) += numa_64.o obj-$(CONFIG_K8_NUMA) += k8topology_64.o Index: linux-2.6/arch/x86/mm/pat.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6/arch/x86/mm/pat.c 2007-12-11 04:12:47.0 -0800 @@ -0,0 +1,57 @@ +/* Handle caching attributes in page tables (PAT) */ +#include +#include +#include +#include +#include +#include +#include + +static u64 boot_pat_state; + +enum { + PAT_UC = 0, /* uncached */ + PAT_WC = 1, /* Write combining */ + PAT_WT = 4, /* Write Through */ + PAT_WP = 5, /* Write Protected */ + PAT_WB = 6, /* Write Back (default) */ + PAT_UC_MINUS = 7, /* UC, but can be overriden by MTRR */ +}; + +#define PAT(x,y) ((u64)PAT_ ## y << ((x)*8)) + +void __cpuinit pat_init(void) +{ + /* Set PWT+PCD to Write-Combining. All other bits stay the same */ + if (cpu_has_pat) { + u64 pat; + /* PTE encoding used in Linux: + PAT + |PCD + ||PWT + ||| + 000 WB default + 010 UC_MINUS _PAGE_PCD + 011 WC _PAGE_WC + PAT bit unused */ + pat = PAT(0,WB) | PAT(1,WT) | PAT(2,UC_MINUS) | PAT(3,WC) | + PAT(4,WB) | PAT(5,WT) | PAT(6,UC_MINUS) | PAT(7,WC); + rdmsrl(MSR_IA32_CR_PAT, boot_pat_state); + wrmsrl(MSR_IA32_CR_PAT, pat); + __flush_tlb_all(); + asm volatile("wbinvd"); + } +} + +#undef PAT + +void pat_shutdown(void) +{ + /* Restore CPU default pat state */ + if (cpu_has_pat) { + wrmsrl(MSR_IA32_CR_PAT, boot_pat_state); + __flush_tlb_all(); + asm volatile("wbinvd"); + } +} + Index: linux-2.6/arch/x86/pci/i386.c === --- linux-2.6.orig/arch/x86/pci/i386.c 2007-12-11 03:30:34.0 -0800 +++ linux-2.6/arch/x86/pci/i386.c 2007-12-11 03:42:08.0 -0800 @@ -300,8 +300,6 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma, enum pci_mmap_state mmap_state, int write_combine) { - unsigned long prot; - /* I/O space cannot be accessed via normal processor loads and * stores on this platform. */ @@ -311,14 +309,11 @@ /* Leave vm_pgoff as-is, the PCI space address is the physical * address on this platform. */ - prot = pgprot_val(vma->vm_page_prot); - if (boot_cpu_data.x86 > 3) - prot |= _PAGE_PCD | _PAGE_PWT; - vma->vm_page_prot = __pgprot(prot); + if (write_combine) + vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); + else + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); - /* Write-combine setting is ignored, it is changed via the mtrr -* interfaces on this platform. -*/ if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, vma->vm_end - vma->vm_start, vma->vm_page_prot)) Index: linux-2.6/include/asm-x86/cpufeature_32.h === --- linux-2.6.orig/include/asm-x86/cpufeature_32.h 2007-12-11 03:30:34.0 -0800 +++ linux-2.6/include/asm