Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation

2007-12-17 Thread Eric W. Biederman
Venki Pallipadi <[EMAIL PROTECTED]> writes:

> Checking the manual for this. You are right, we had missed some steps here.
> Actually, the manual says that on MP systems the PAT MSR on all CPUs must be
> consistent (even when they are not really using it in their page tables).
> So, this will change the init and shutdown parts significantly, and there may
> be some challenges with CPU offline and KEXEC. We will redo this part in the
> next iteration.

Well, the normal kexec path is no worse than reboot. The kdump path is a
mess, but only a minor one, and since we are only changing the UC- case we
can probably just ignore it and leave the newly started system with that
PAT entry set to WC :)

What we are doing really should be no worse than the MTRR setup, except
that disabling it at reboot is polite.

CPU online and offline are weird, but so far they have always been weird,
and I don't think they have ever been quite correct.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation

2007-12-17 Thread Eric W. Biederman
Andi Kleen <[EMAIL PROTECTED]> writes:

>> I do know we need to use the low 4 pat mappings to avoid most of the PAT
>> errata issues.
>
> They don't really matter. These are all very old systems that have run
> fine for many years without PAT. It is no problem to let them
> continue to do so and just disable PAT for them. So just clear the PAT bit in
> CPU initialization for any CPUs with non-trivial errata in this
> area.
>
> PAT is only really needed on modern boxes.
>
> Someone just needs to go through the old errata sheets and find
> out on which CPUs the bit needs to be cleared.

It has been ages now, but my impression when I wrote the patch was that
current cores still had a few outstanding errata when using the
extended PAT bits.

Further, my impression was that if we just changed UC- to WC we
would work on essentially everything, because PAT is always enabled
on the cores that support it.

Therefore, since we only have three interesting caching modes (WB, WC,
UC), we should be very careful about reprogramming it, and then we can
ignore the errata.

As for the PAT-class errata about inconsistent mappings, those are
recurring issues that happen across all CPU types (x86/ppc/fred),
and every major core overhaul is likely to have them again.

Eric


Re: PAT 64b: Basic PAT implementation

2007-12-17 Thread Daniel J Blueman
On 14 Dec, 00:50, Andi Kleen <[EMAIL PROTECTED]> wrote:
> > +void __cpuinit pat_init(void)
> > +{
> > +  /* Set PWT+PCD to Write-Combining. All other bits stay the same */
> > +  if (cpu_has_pat) {
>
> All the old CPUs (PPro etc.) with known PAT bugs need to clear this flag
> now in their CPU init functions. It is fine to be aggressive there
> because these old systems have lived so long without PAT they can do
> so forever. So perhaps it's best to just white list it only for newer
> CPUs on the Intel side at least.

> Another problem is that there are some popular modules (ATI, Nvidia for one)
> that reprogram the PAT registers on their own, likely differently. Need some
> way to detect that case I guess, otherwise lots of users will see strange
> malfunctions. Maybe recheck after module load?

This may not be as big a problem as thought, since sane drivers, and at
least one vendor driver (Quadrics QsNetII), search the PAT slots for a WC
entry; where this has already been set up by the kernel, they'll use the
right one.
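A minimal userspace sketch of the slot search described above, assuming the PAT type encodings from the patch's enum and that the IA32_PAT value has already been read (in the kernel, via rdmsrl()). find_wc_slot() and pat_entry() are hypothetical names for illustration, not code from the QsNetII driver.

```c
#include <assert.h>
#include <stdint.h>

/* PAT memory-type encodings, as in the enum from the patch under discussion. */
enum { PAT_UC = 0, PAT_WC = 1, PAT_WT = 4, PAT_WP = 5, PAT_WB = 6, PAT_UC_MINUS = 7 };

/* Memory type held in PAT entry idx (0..7). Each entry occupies one byte
 * of the MSR, of which only the low 3 bits encode the type. */
static unsigned pat_entry(uint64_t pat, unsigned idx)
{
	assert(idx < 8);
	return (unsigned)((pat >> (idx * 8)) & 0x7);
}

/* Return the lowest PAT index currently programmed as WC, or -1 if none.
 * A driver using this approach reuses an existing WC slot instead of
 * rewriting IA32_PAT itself. */
static int find_wc_slot(uint64_t pat)
{
	for (unsigned idx = 0; idx < 8; idx++)
		if (pat_entry(pat, idx) == PAT_WC)
			return (int)idx;
	return -1;
}
```

With the RFC layout (0x0107040601070406) the search returns 3; with the power-on default (0x0007040600070406) there is no WC slot to reuse and it returns -1.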

> > +   |||
> > + 000 WB default
> > + 010 UC_MINUS   _PAGE_PCD
> > + 011 WC _PAGE_WC
> > + PAT bit unused */
> > +  pat = PAT(0,WB) | PAT(1,WT) | PAT(2,UC_MINUS) | PAT(3,WC) |
> > +PAT(4,WB) | PAT(5,WT) | PAT(6,UC_MINUS) | PAT(7,WC);
> > +  rdmsrl(MSR_IA32_CR_PAT, boot_pat_state);
> > +  wrmsrl(MSR_IA32_CR_PAT, pat);
> > +  __flush_tlb_all();
> > +  asm volatile("wbinvd");
>
> Have you double checked this is the full procedure from the manual? iirc there
> were some steps missing.
>
> -Andi
-- 
Daniel J Blueman


Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation

2007-12-14 Thread Ingo Molnar

* Siddha, Suresh B <[EMAIL PROTECTED]> wrote:

> > Ok. I will send a separate patch fixing ioremap_nocache on x86.
> 
> Appended the patch. x86 folks, please consider for x86 mm git tree. 
> Thanks.

thanks, applied.

Ingo


Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation

2007-12-14 Thread Siddha, Suresh B
On Fri, Dec 14, 2007 at 01:10:39PM -0800, Siddha, Suresh B wrote:
> On Thu, Dec 13, 2007 at 09:23:26PM -0700, Eric W. Biederman wrote:
> > [EMAIL PROTECTED] (Eric W. Biederman) writes:
> > Ok.  My analysis here was wrong.  Currently pgprot_noncached and
> > ioremap_nocache are out of sync: ioremap_nocache specifies only
> > _PAGE_PCD, while pgprot_noncached specifies _PAGE_PCD | _PAGE_PWT.
> > 
> > So I don't have a clue how someone could reprogram the mtrrs currently
> > and expect things to work.
> > 
> > ...
> > 
> > If we bother to ask ioremap for memory that is not cached, the last
> > thing in the world we want is the MTRRs upgrading that to write combining.
> > So ioremap_nocache has been slightly buggy for ages.  ioremap_nocache
> > and PAGE_KERNEL_NOCACHE should get _PAGE_PWT added to their
> > definitions.
> > 
> > Could we please get a cleanup patch at the beginning of this patchset
> > or that comes before it that fixes ioremap_nocache on x86?
> > 
> > That will make us a lot more git-bisect safe.
> 
> Ok. I will send a separate patch  fixing ioremap_nocache on x86.

Appended the patch. x86 folks, please consider for x86 mm git tree. Thanks.

---
[patch] x86: Set strong uncacheable where UC is really desired

Use _PAGE_PWT in addition to _PAGE_PCD for all the mappings that need an
uncached mapping. Instead of the existing PAT2, which is UC- (and can be
overridden by MTRRs), we now use PAT3, which is strong uncacheable.

This makes it consistent with pgprot_noncached().

Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>
---

diff --git a/arch/x86/mm/ioremap_32.c b/arch/x86/mm/ioremap_32.c
index 0b27831..ef0f6a4 100644
--- a/arch/x86/mm/ioremap_32.c
+++ b/arch/x86/mm/ioremap_32.c
@@ -119,7 +119,7 @@ EXPORT_SYMBOL(__ioremap);
 void __iomem *ioremap_nocache (unsigned long phys_addr, unsigned long size)
 {
 	unsigned long last_addr;
-	void __iomem *p = __ioremap(phys_addr, size, _PAGE_PCD);
+	void __iomem *p = __ioremap(phys_addr, size, _PAGE_PCD | _PAGE_PWT);
 	if (!p) 
 		return p; 
 
diff --git a/arch/x86/mm/ioremap_64.c b/arch/x86/mm/ioremap_64.c
index 6cac90a..8be3062 100644
--- a/arch/x86/mm/ioremap_64.c
+++ b/arch/x86/mm/ioremap_64.c
@@ -158,7 +158,7 @@ EXPORT_SYMBOL(__ioremap);
 
 void __iomem *ioremap_nocache (unsigned long phys_addr, unsigned long size)
 {
-   return __ioremap(phys_addr, size, _PAGE_PCD);
+   return __ioremap(phys_addr, size, _PAGE_PCD | _PAGE_PWT);
 }
 EXPORT_SYMBOL(ioremap_nocache);
 
diff --git a/include/asm-x86/pgtable_32.h b/include/asm-x86/pgtable_32.h
index ed3e70d..b1215e1 100644
--- a/include/asm-x86/pgtable_32.h
+++ b/include/asm-x86/pgtable_32.h
@@ -156,7 +156,7 @@ void paging_init(void);
 extern unsigned long long __PAGE_KERNEL, __PAGE_KERNEL_EXEC;
 #define __PAGE_KERNEL_RO   (__PAGE_KERNEL & ~_PAGE_RW)
 #define __PAGE_KERNEL_RX   (__PAGE_KERNEL_EXEC & ~_PAGE_RW)
-#define __PAGE_KERNEL_NOCACHE  (__PAGE_KERNEL | _PAGE_PCD)
+#define __PAGE_KERNEL_NOCACHE  (__PAGE_KERNEL | _PAGE_PCD | _PAGE_PWT)
 #define __PAGE_KERNEL_LARGE	(__PAGE_KERNEL | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_EXEC   (__PAGE_KERNEL_EXEC | _PAGE_PSE)
 
diff --git a/include/asm-x86/pgtable_64.h b/include/asm-x86/pgtable_64.h
index 9b0ff47..4e4dcc4 100644
--- a/include/asm-x86/pgtable_64.h
+++ b/include/asm-x86/pgtable_64.h
@@ -185,13 +185,13 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, unsigned long
 #define __PAGE_KERNEL_EXEC \
 	(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED)
 #define __PAGE_KERNEL_NOCACHE \
-	(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_PCD | _PAGE_ACCESSED | _PAGE_NX)
+	(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_PCD | _PAGE_PWT | _PAGE_ACCESSED | _PAGE_NX)
 #define __PAGE_KERNEL_RO \
 	(_PAGE_PRESENT | _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_NX)
 #define __PAGE_KERNEL_VSYSCALL \
 	(_PAGE_PRESENT | _PAGE_USER | _PAGE_ACCESSED)
 #define __PAGE_KERNEL_VSYSCALL_NOCACHE \
-	(_PAGE_PRESENT | _PAGE_USER | _PAGE_ACCESSED | _PAGE_PCD)
+	(_PAGE_PRESENT | _PAGE_USER | _PAGE_ACCESSED | _PAGE_PCD | _PAGE_PWT)
 #define __PAGE_KERNEL_LARGE \
 	(__PAGE_KERNEL | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_EXEC \


Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation

2007-12-14 Thread Siddha, Suresh B
On Thu, Dec 13, 2007 at 09:23:26PM -0700, Eric W. Biederman wrote:
> [EMAIL PROTECTED] (Eric W. Biederman) writes:
> Ok.  My analysis here was wrong.  Currently pgprot_noncached and
> ioremap_nocache are out of sync: ioremap_nocache specifies only
> _PAGE_PCD, while pgprot_noncached specifies _PAGE_PCD | _PAGE_PWT.
> 
> So I don't have a clue how someone could reprogram the mtrrs currently
> and expect things to work.
> 
> ...
> 
> If we bother to ask ioremap for memory that is not cached, the last
> thing in the world we want is the MTRRs upgrading that to write combining.
> So ioremap_nocache has been slightly buggy for ages.  ioremap_nocache
> and PAGE_KERNEL_NOCACHE should get _PAGE_PWT added to their
> definitions.
> 
> Could we please get a cleanup patch at the beginning of this patchset
> or that comes before it that fixes ioremap_nocache on x86?
> 
> That will make us a lot more git-bisect safe.

Ok. I will send a separate patch  fixing ioremap_nocache on x86.

thanks,
suresh


Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation

2007-12-14 Thread Siddha, Suresh B
On Thu, Dec 13, 2007 at 08:48:45PM -0700, Eric W. Biederman wrote:
> > +   pat = PAT(0,WB) | PAT(1,WT) | PAT(2,UC_MINUS) | PAT(3,WC) |
> > + PAT(4,WB) | PAT(5,WT) | PAT(6,UC_MINUS) | PAT(7,WC);
> 
> I strongly object to this configuration.
> 
> The caching modes of interest are:
> PAT_WB write-back, or as close as the MTRRs will allow;
>    used for WC today.
> PAT_UC completely uncacheable, not overridable by MTRRs,
>and what we use today for pgprot_noncached
> PAT_WC what isn't available for current use.
>
> We should use:
> > +   pat = PAT(0,WB) | PAT(1,WT) | PAT(2,WC) | PAT(3,UC) |
> > + PAT(4,WB) | PAT(5,WT) | PAT(6,WC) | PAT(7,UC);
> 
> Changing the UC- which currently allows write-combining if the MTRRs specify it,
> to WC.  This grandfathers in all of our current usage and changes the one
> PAT type that could today and in legacy mode specify WC to really specify WC.

That seems reasonable. But looking at the mainline kernel, ioremap_nocache()
actually uses UC_MINUS. I wonder why it is not using UC (like
pgprot_noncached). I think it is ok to change ioremap_nocache() to use UC.

thanks,
suresh


Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation

2007-12-14 Thread H. Peter Anvin

Andi Kleen wrote:

>> I do know we need to use the low 4 pat mappings to avoid most of the PAT
>> errata issues.
>
> They don't really matter. These are all very old systems that have run
> fine for many years without PAT. It is no problem to let them
> continue to do so and just disable PAT for them. So just clear the PAT bit in
> CPU initialization for any CPUs with non-trivial errata in this
> area.
>
> PAT is only really needed on modern boxes.


How many mapping types do we actually need?  The only ones which are 
likely to be used in practice are WB, UC, WC, which still leaves a 
spare.  (Any intended users of WP or WT?)


-hpa


Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation

2007-12-14 Thread Venki Pallipadi
On Fri, Dec 14, 2007 at 01:42:12AM +0100, Andi Kleen wrote:
> > +void __cpuinit pat_init(void)
> > +{
> > +   /* Set PWT+PCD to Write-Combining. All other bits stay the same */
> > +   if (cpu_has_pat) {
> 
> All the old CPUs (PPro etc.) with known PAT bugs need to clear this flag 
> now in their CPU init functions. It is fine to be aggressive there
> because these old systems have lived so long without PAT they can do 
> so forever. So perhaps it's best to just white list it only for newer
> CPUs on the Intel side at least.

Yes. Enabling this only on relatively newer CPUs is safer. We will do that in
the next iteration of the patches.
 
> Another problem is that there are some popular modules (ATI, Nvidia for one)
> that reprogram the PAT registers on their own, likely differently. Need some
> way to detect that case I guess, otherwise lots of users will see strange
> malfunctions. Maybe recheck after module load?

Yes. We can check that at load time. But they can still do bad things at run
time, for example when 3D gets enabled, etc.?

 
> > +   |||
> > +  000 WB default
> > +  010 UC_MINUS   _PAGE_PCD
> > +  011 WC _PAGE_WC
> > +  PAT bit unused */
> > +   pat = PAT(0,WB) | PAT(1,WT) | PAT(2,UC_MINUS) | PAT(3,WC) |
> > + PAT(4,WB) | PAT(5,WT) | PAT(6,UC_MINUS) | PAT(7,WC);
> > +   rdmsrl(MSR_IA32_CR_PAT, boot_pat_state);
> > +   wrmsrl(MSR_IA32_CR_PAT, pat);
> > +   __flush_tlb_all();
> > +   asm volatile("wbinvd");
> 
> Have you double checked this is the full procedure from the manual? iirc there
> were some steps missing.


Checking the manual for this. You are right, we had missed some steps here.
Actually, the manual says that on MP systems the PAT MSR on all CPUs must be
consistent (even when they are not really using it in their page tables).
So, this will change the init and shutdown parts significantly, and there may
be some challenges with CPU offline and KEXEC. We will redo this part in the
next iteration.
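The consistency requirement can be stated as a simple invariant across CPUs. A minimal sketch, assuming the per-CPU IA32_PAT values have already been collected (for example by reading the MSR on each CPU); first_inconsistent_cpu() is a hypothetical helper, not existing kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Given the IA32_PAT value read back on each CPU, check that every CPU
 * matches the boot CPU (index 0). Returns the index of the first
 * mismatching CPU, or -1 if all CPUs agree. */
static int first_inconsistent_cpu(const uint64_t *pat_per_cpu, size_t ncpus)
{
	assert(pat_per_cpu != NULL && ncpus > 0);
	for (size_t cpu = 1; cpu < ncpus; cpu++)
		if (pat_per_cpu[cpu] != pat_per_cpu[0])
			return (int)cpu;
	return -1;
}
```

A CPU brought online after boot would show up here as a mismatch until pat_init() has run on it, which is one reason the hotplug path needs attention.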

Thanks,
Venki


Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation

2007-12-14 Thread Andi Kleen
> I do know we need to use the low 4 pat mappings to avoid most of the PAT
> errata issues.

They don't really matter. These are all very old systems that have run
fine for many years without PAT. It is no problem to let them
continue to do so and just disable PAT for them. So just clear the PAT bit in
CPU initialization for any CPUs with non-trivial errata in this
area.

PAT is only really needed on modern boxes.

Someone just needs to go through the old errata sheets and find
out on which CPUs the bit needs to be cleared.

> As for Andi's concern about modules playing games with the PAT mappings,
> if we don't redefine how we use the page table entries, our exposure to
> badly behaved modules is more limited.

I would just recheck them after module load and, if it happens,
print a nasty message and program them back. E.g. kernel debuggers
need an after-module-load notifier anyway, so it would be fine
to just add one and hook into that.
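A userspace model of the recheck-and-restore idea, assuming the kernel's chosen PAT value is known and replacing the MSR with a plain variable. fake_pat_msr, kernel_pat and pat_recheck_after_module_load() are hypothetical; in the kernel this would presumably be driven from a module notifier registered with register_module_notifier(), and it would have to run on every CPU.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for IA32_PAT; in the kernel this would be rdmsrl()/wrmsrl(). */
static uint64_t fake_pat_msr;

/* Value the kernel programmed at boot (hypothetical global). */
static uint64_t kernel_pat;

/* Hypothetical post-module-load hook: if a module rewrote the PAT,
 * print a warning and restore the kernel's value.
 * Returns 1 if a fixup happened, 0 otherwise. */
static int pat_recheck_after_module_load(const char *modname)
{
	assert(modname != NULL);
	if (fake_pat_msr == kernel_pat)
		return 0;
	fprintf(stderr, "PAT: module %s changed IA32_PAT from %#llx to %#llx, restoring\n",
		modname, (unsigned long long)kernel_pat,
		(unsigned long long)fake_pat_msr);
	fake_pat_msr = kernel_pat;
	/* A real implementation would also need the cache/TLB flush
	 * sequence and would have to repeat this on every CPU. */
	return 1;
}
```

As Venki notes, a load-time check like this only catches modules that change the PAT while loading; a driver that rewrites it later, for example when 3D is enabled, would slip past it.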

-Andi



Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation

2007-12-13 Thread Eric W. Biederman
[EMAIL PROTECTED] (Eric W. Biederman) writes:


> We should use:
>> +pat = PAT(0,WB) | PAT(1,WT) | PAT(2,WC) | PAT(3,UC) |
>> +  PAT(4,WB) | PAT(5,WT) | PAT(6,WC) | PAT(7,UC);
>
> Changing the UC- which currently allows write-combining if the MTRRs specify it,
> to WC.  This grandfathers in all of our current usage and changes the one
> PAT type that could today and in legacy mode specify WC to really specify WC.
>
> I don't know if we need to set the high half or not, that would depend
> on the state of the PAT errata.
>
> I do know we need to use the low 4 pat mappings to avoid most of the PAT
> errata issues.
>
> As for Andi's concern about modules playing games with the PAT mappings,
> if we don't redefine how we use the page table entries, our exposure to
> badly behaved modules is more limited.

Ok.  My analysis here was wrong.  Currently pgprot_noncached and
ioremap_nocache are out of sync: ioremap_nocache specifies only
_PAGE_PCD, while pgprot_noncached specifies _PAGE_PCD | _PAGE_PWT.
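The mismatch is easiest to see from how the CPU picks a PAT entry for a 4 KiB PTE: the index is PAT<<2 | PCD<<1 | PWT. A minimal sketch, with PTE_PWT, PTE_PCD and PTE_PAT as local stand-ins for the kernel's _PAGE_PWT, _PAGE_PCD and the 4 KiB PAT bit (bit 7); pat_index() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* x86 4 KiB PTE cache-control bits (local stand-ins for the kernel names). */
#define PTE_PWT (1ULL << 3)
#define PTE_PCD (1ULL << 4)
#define PTE_PAT (1ULL << 7)   /* PAT bit position in 4 KiB PTEs */

/* The CPU selects PAT entry (PAT << 2) | (PCD << 1) | PWT. */
static unsigned pat_index(uint64_t pte)
{
	unsigned idx = 0;

	if (pte & PTE_PWT)
		idx |= 1;
	if (pte & PTE_PCD)
		idx |= 2;
	if (pte & PTE_PAT)
		idx |= 4;
	return idx;
}
```

With the power-on default PAT layout (WB, WT, UC-, UC in entries 0-3), _PAGE_PCD alone selects entry 2 (UC-), while _PAGE_PCD | _PAGE_PWT selects entry 3 (UC).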

So I don't have a clue how someone could reprogram the mtrrs currently
and expect things to work.

...

If we bother to ask ioremap for memory that is not cached, the last
thing in the world we want is the MTRRs upgrading that to write combining.
So ioremap_nocache has been slightly buggy for ages.  ioremap_nocache
and PAGE_KERNEL_NOCACHE should get _PAGE_PWT added to their
definitions.
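Why UC- is the problem: the effective memory type combines the MTRR type with the PAT type, and UC- is the one uncached PAT type that an MTRR WC range can relax. The sketch below models only the PAT types discussed in this thread (UC, UC-, WC, WB), summarized from the MTRR/PAT combination table in the Intel SDM; check the manual for the full table. effective_type() is a hypothetical helper, and PAT WT/WP entries are deliberately left out.

```c
#include <assert.h>

/* MTRRs only use UC, WC, WT, WP and WB; UC- exists only as a PAT type. */
enum mem_type { MT_UC, MT_UC_MINUS, MT_WC, MT_WT, MT_WP, MT_WB };

/* Simplified effective memory type for a given MTRR type and PAT type. */
static enum mem_type effective_type(enum mem_type mtrr, enum mem_type pat)
{
	switch (pat) {
	case MT_UC:
		return MT_UC;           /* strong UC: MTRRs cannot relax it */
	case MT_WC:
		return MT_WC;           /* PAT WC applies regardless of the MTRR */
	case MT_UC_MINUS:
		/* UC- lets an MTRR WC range "upgrade" the mapping to WC. */
		return mtrr == MT_WC ? MT_WC : MT_UC;
	case MT_WB:
		return mtrr;            /* PAT WB defers to the MTRR type */
	default:
		assert(!"PAT WT/WP not modelled in this sketch");
		return MT_UC;
	}
}
```

This is the case Eric describes: an ioremap_nocache() mapping that uses only _PAGE_PCD lands on UC- and becomes write-combining inside an MTRR WC range, while PCD|PWT (UC) stays uncached.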

Could we please get a cleanup patch at the beginning of this patchset
or that comes before it that fixes ioremap_nocache on x86?

That will make us a lot more git-bisect safe.


Eric


Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation

2007-12-13 Thread Eric W. Biederman
[EMAIL PROTECTED] writes:

> Originally based on a patch from Eric Biederman, but heavily changed.
>
> Forward port of pat-base.patch to x86 tree, with a bug fix.
> Code was using 'PCD|PWT' i.e., PAT3 for WC mapping. So set the WC mapping at
> correct PAT fields PA3/PA7.

Well that wasn't from my original tested patch. Grr.

> TBD: KEXEC and other CPU offline paths may need pat_shutdown()?

> Index: linux-2.6/arch/x86/mm/Makefile_64
> ===
> --- linux-2.6.orig/arch/x86/mm/Makefile_64 2007-12-11 03:30:34.0 -0800
> +++ linux-2.6/arch/x86/mm/Makefile_64 2007-12-11 03:42:08.0 -0800
> @@ -2,7 +2,7 @@
>  # Makefile for the linux x86_64-specific parts of the memory manager.
>  #
>  
> -obj-y := init_64.o fault_64.o ioremap_64.o extable_64.o pageattr_64.o mmap_64.o
> +obj-y := init_64.o fault_64.o ioremap_64.o extable_64.o pageattr_64.o mmap_64.o pat.o
>  obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
>  obj-$(CONFIG_NUMA) += numa_64.o
>  obj-$(CONFIG_K8_NUMA) += k8topology_64.o
> Index: linux-2.6/arch/x86/mm/pat.c
> ===
> --- /dev/null 1970-01-01 00:00:00.0 +
> +++ linux-2.6/arch/x86/mm/pat.c   2007-12-11 04:12:47.0 -0800
> @@ -0,0 +1,57 @@
> +/* Handle caching attributes in page tables (PAT) */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static u64 boot_pat_state;
> +
> +enum {
> + PAT_UC = 0, /* uncached */
> + PAT_WC = 1, /* Write combining */
> + PAT_WT = 4, /* Write Through */
> + PAT_WP = 5, /* Write Protected */
> + PAT_WB = 6, /* Write Back (default) */
> + PAT_UC_MINUS = 7,   /* UC, but can be overriden by MTRR */
> +};
> +
> +#define PAT(x,y) ((u64)PAT_ ## y << ((x)*8))
> +
> +void __cpuinit pat_init(void)
> +{
> + /* Set PWT+PCD to Write-Combining. All other bits stay the same */
> + if (cpu_has_pat) {
> + u64 pat;
> + /* PTE encoding used in Linux:
> +   PAT
> +   |PCD
> +   ||PWT
> +   |||
> +000 WB default
> +010 UC_MINUS   _PAGE_PCD
> +011 WC _PAGE_WC
> +PAT bit unused */
> + pat = PAT(0,WB) | PAT(1,WT) | PAT(2,UC_MINUS) | PAT(3,WC) |
> +   PAT(4,WB) | PAT(5,WT) | PAT(6,UC_MINUS) | PAT(7,WC);

I strongly object to this configuration.

The caching modes of interest are:
PAT_WB write-back, or as close as the MTRRs will allow;
   used for WC today.
PAT_UC completely uncacheable, not overridable by MTRRs,
   and what we use today for pgprot_noncached
PAT_WC what isn't available for current use.

We should use:
> + pat = PAT(0,WB) | PAT(1,WT) | PAT(2,WC) | PAT(3,UC) |
> +   PAT(4,WB) | PAT(5,WT) | PAT(6,WC) | PAT(7,UC);

Changing the UC- which currently allows write-combining if the MTRRs specify it,
to WC.  This grandfathers in all of our current usage and changes the one
PAT type that could today and in legacy mode specify WC to really specify WC.
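To make the two proposals concrete, the sketch below computes the IA32_PAT value for the power-on default layout (WB, WT, UC-, UC in each half), the layout in the RFC patch, and the layout proposed above, reusing the patch's PAT() macro and type encodings. pat_default, pat_rfc, pat_proposed and changed_entries() are illustrative names, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

enum { PAT_UC = 0, PAT_WC = 1, PAT_WT = 4, PAT_WP = 5, PAT_WB = 6, PAT_UC_MINUS = 7 };

/* Same helper as in the patch: place memory type y in PAT entry x. */
#define PAT(x, y) ((uint64_t)PAT_ ## y << ((x) * 8))

/* Power-on default layout. */
static const uint64_t pat_default =
	PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
	PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);

/* RFC patch: entries 3/7 (PCD|PWT) become WC. */
static const uint64_t pat_rfc =
	PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, WC) |
	PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, WC);

/* Proposed: only entries 2/6 (PCD alone, formerly UC-) become WC. */
static const uint64_t pat_proposed =
	PAT(0, WB) | PAT(1, WT) | PAT(2, WC) | PAT(3, UC) |
	PAT(4, WB) | PAT(5, WT) | PAT(6, WC) | PAT(7, UC);

/* Bitmask of PAT entries whose byte differs between two layouts. */
static unsigned changed_entries(uint64_t a, uint64_t b)
{
	unsigned mask = 0;

	for (unsigned i = 0; i < 8; i++)
		if (((a >> (i * 8)) & 0xff) != ((b >> (i * 8)) & 0xff))
			mask |= 1u << i;
	return mask;
}
```

The proposed layout differs from the default only in entries 2 and 6 (mask 0x44), the ones selected by _PAGE_PCD alone, so existing _PAGE_PCD | _PAGE_PWT mappings stay strongly uncached. The RFC layout changes entries 3 and 7 (mask 0x88), so anything using _PAGE_PCD | _PAGE_PWT, such as pgprot_noncached(), would silently become write-combining.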

I don't know if we need to set the high half or not, that would depend
on the state of the PAT errata.

I do know we need to use the low 4 pat mappings to avoid most of the PAT
errata issues.

As for Andi's concern about modules playing games with the PAT mappings,
if we don't redefine how we use the page table entries, our exposure to
badly behaved modules is more limited.

> + rdmsrl(MSR_IA32_CR_PAT, boot_pat_state);
> + wrmsrl(MSR_IA32_CR_PAT, pat);
> + __flush_tlb_all();
> + asm volatile("wbinvd");
> + }
> +}
> +
> +#undef PAT
> +
> +void pat_shutdown(void)
> +{
> + /* Restore CPU default pat state */
> + if (cpu_has_pat) {
> + wrmsrl(MSR_IA32_CR_PAT, boot_pat_state);
> + __flush_tlb_all();
> + asm volatile("wbinvd");
> + }
> +}
> +


Eric


Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation

2007-12-13 Thread Andi Kleen
> +void __cpuinit pat_init(void)
> +{
> + /* Set PWT+PCD to Write-Combining. All other bits stay the same */
> + if (cpu_has_pat) {

All the old CPUs (PPro etc.) with known PAT bugs need to clear this flag 
now in their CPU init functions. It is fine to be aggressive there
because these old systems have lived so long without PAT they can do 
so forever. So perhaps it's best to just white list it only for newer
CPUs on the Intel side at least.
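A default-deny whitelist along these lines might look like the following sketch. The vendor/family/model table is a placeholder: the numbers are NOT taken from any errata sheet and would have to come from the review Andi describes. struct cpu_id, pat_whitelist and pat_allowed() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

struct cpu_id {
	enum vendor vendor;
	unsigned family;
	unsigned model;
};

/* Hypothetical whitelist entry: PAT allowed for this vendor/family
 * from min_model onwards. */
struct pat_ok {
	enum vendor vendor;
	unsigned family;
	unsigned min_model;
};

/* Placeholder values only, not derived from errata sheets. */
static const struct pat_ok pat_whitelist[] = {
	{ VENDOR_INTEL, 6, 15 },
	{ VENDOR_INTEL, 15, 0 },
};

/* Default-deny: PAT stays off unless the CPU advertises it AND matches
 * a whitelist entry. */
static bool pat_allowed(const struct cpu_id *c, bool cpu_has_pat)
{
	assert(c != NULL);
	if (!cpu_has_pat)
		return false;
	for (size_t i = 0; i < sizeof(pat_whitelist) / sizeof(pat_whitelist[0]); i++) {
		const struct pat_ok *w = &pat_whitelist[i];

		if (c->vendor == w->vendor && c->family == w->family &&
		    c->model >= w->min_model)
			return true;
	}
	return false;
}
```

A Pentium Pro (family 6, model 1) falls below the placeholder threshold and keeps running without PAT. Vendors with no entry are denied, so a real policy would also need an explicit decision for non-Intel CPUs.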

Another problem is that there are some popular modules (ATI, Nvidia for one)
that reprogram the PAT registers on their own, likely differently. Need some
way to detect that case I guess, otherwise lots of users will see strange
malfunctions. Maybe recheck after module load?

> +   |||
> +000 WB default
> +010 UC_MINUS   _PAGE_PCD
> +011 WC _PAGE_WC
> +PAT bit unused */
> + pat = PAT(0,WB) | PAT(1,WT) | PAT(2,UC_MINUS) | PAT(3,WC) |
> +   PAT(4,WB) | PAT(5,WT) | PAT(6,UC_MINUS) | PAT(7,WC);
> + rdmsrl(MSR_IA32_CR_PAT, boot_pat_state);
> + wrmsrl(MSR_IA32_CR_PAT, pat);
> + __flush_tlb_all();
> + asm volatile("wbinvd");

Have you double checked this is the full procedure from the manual? iirc there
were some steps missing.
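For comparison, a sketch of the ordering a fuller update might follow. The steps are modelled on the MTRR update procedure in the Intel SDM (Vol. 3, memory cache control chapter), which the PAT section points to for keeping caches and TLBs consistent; treat the exact list as an assumption to verify against the manual. Each step is a hypothetical stub that only records its name, so the order can be inspected in userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Each step only records its name. In the kernel these would be
 * interrupt disabling, CR0/CR4 writes, wbinvd, __flush_tlb_all(),
 * wrmsrl() and a rendezvous across all CPUs. */
#define MAX_STEPS 16
static const char *steps[MAX_STEPS];
static size_t nsteps;

static void step(const char *name)
{
	assert(nsteps < MAX_STEPS);
	steps[nsteps++] = name;
}

/* Sketch of a per-CPU PAT update, executed by every CPU in lockstep. */
static void pat_update_sequence(void)
{
	nsteps = 0;
	step("disable interrupts");
	step("wait for all CPUs");
	step("enter no-fill cache mode (CR0.CD=1, CR0.NW=0)");
	step("wbinvd");
	step("flush TLBs (toggle CR4.PGE or reload CR3)");
	step("write IA32_PAT");
	step("wbinvd");
	step("flush TLBs");
	step("leave no-fill cache mode (CR0.CD=0)");
	step("wait for all CPUs");
	step("enable interrupts");
}

/* Position of the first occurrence of a step, or -1 if absent. */
static int step_index(const char *name)
{
	for (size_t i = 0; i < nsteps; i++)
		if (strcmp(steps[i], name) == 0)
			return (int)i;
	return -1;
}
```

Compared with the patch, which writes the MSR and only then flushes, the main differences are that caches and TLBs are flushed before the write as well as after, caching is disabled around the write, and all CPUs perform the sequence together.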

-Andi



[RFC PATCH 02/12] PAT 64b: Basic PAT implementation

2007-12-13 Thread venkatesh . pallipadi
Originally based on a patch from Eric Biederman, but heavily changed.

Forward port of pat-base.patch to x86 tree, with a bug fix.
Code was using 'PCD|PWT' i.e., PAT3 for WC mapping. So set the WC mapping at
correct PAT fields PA3/PA7.

TBD: KEXEC and other CPU offline paths may need pat_shutdown()?

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>
Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>
---
Index: linux-2.6/arch/x86/kernel/setup64.c
===
--- linux-2.6.orig/arch/x86/kernel/setup64.c	2007-12-11 03:30:46.0 -0800
+++ linux-2.6/arch/x86/kernel/setup64.c 2007-12-11 03:42:08.0 -0800
@@ -291,9 +291,11 @@
 
fpu_init(); 
 
+   pat_init();
raw_local_save_flags(kernel_eflags);
 }
 
 void cpu_shutdown(void)
 {
+   pat_shutdown();
 }
Index: linux-2.6/arch/x86/mm/Makefile_64
===
--- linux-2.6.orig/arch/x86/mm/Makefile_64	2007-12-11 03:30:34.0 -0800
+++ linux-2.6/arch/x86/mm/Makefile_64   2007-12-11 03:42:08.0 -0800
@@ -2,7 +2,7 @@
 # Makefile for the linux x86_64-specific parts of the memory manager.
 #
 
-obj-y	:= init_64.o fault_64.o ioremap_64.o extable_64.o pageattr_64.o mmap_64.o
+obj-y	:= init_64.o fault_64.o ioremap_64.o extable_64.o pageattr_64.o mmap_64.o pat.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_NUMA) += numa_64.o
 obj-$(CONFIG_K8_NUMA) += k8topology_64.o
Index: linux-2.6/arch/x86/mm/pat.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6/arch/x86/mm/pat.c 2007-12-11 04:12:47.0 -0800
@@ -0,0 +1,57 @@
+/* Handle caching attributes in page tables (PAT) */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static u64 boot_pat_state;
+
+enum {
+   PAT_UC = 0, /* uncached */
+   PAT_WC = 1, /* Write combining */
+   PAT_WT = 4, /* Write Through */
+   PAT_WP = 5, /* Write Protected */
+   PAT_WB = 6, /* Write Back (default) */
+   PAT_UC_MINUS = 7,   /* UC, but can be overriden by MTRR */
+};
+
+#define PAT(x,y) ((u64)PAT_ ## y << ((x)*8))
+
+void __cpuinit pat_init(void)
+{
+   /* Set PWT+PCD to Write-Combining. All other bits stay the same */
+   if (cpu_has_pat) {
+   u64 pat;
+   /* PTE encoding used in Linux:
+   PAT
+   |PCD
+   ||PWT
+   |||
+  000 WB default
+  010 UC_MINUS   _PAGE_PCD
+  011 WC _PAGE_WC
+  PAT bit unused */
+   pat = PAT(0,WB) | PAT(1,WT) | PAT(2,UC_MINUS) | PAT(3,WC) |
+ PAT(4,WB) | PAT(5,WT) | PAT(6,UC_MINUS) | PAT(7,WC);
+   rdmsrl(MSR_IA32_CR_PAT, boot_pat_state);
+   wrmsrl(MSR_IA32_CR_PAT, pat);
+   __flush_tlb_all();
+   asm volatile("wbinvd");
+   }
+}
+
+#undef PAT
+
+void pat_shutdown(void)
+{
+   /* Restore CPU default pat state */
+   if (cpu_has_pat) {
+   wrmsrl(MSR_IA32_CR_PAT, boot_pat_state);
+   __flush_tlb_all();
+   asm volatile("wbinvd");
+   }
+}
+
Index: linux-2.6/arch/x86/pci/i386.c
===
--- linux-2.6.orig/arch/x86/pci/i386.c  2007-12-11 03:30:34.0 -0800
+++ linux-2.6/arch/x86/pci/i386.c   2007-12-11 03:42:08.0 -0800
@@ -300,8 +300,6 @@
 int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
enum pci_mmap_state mmap_state, int write_combine)
 {
-   unsigned long prot;
-
/* I/O space cannot be accessed via normal processor loads and
 * stores on this platform.
 */
@@ -311,14 +309,11 @@
/* Leave vm_pgoff as-is, the PCI space address is the physical
 * address on this platform.
 */
-   prot = pgprot_val(vma->vm_page_prot);
-   if (boot_cpu_data.x86 > 3)
-   prot |= _PAGE_PCD | _PAGE_PWT;
-   vma->vm_page_prot = __pgprot(prot);
+   if (write_combine)
+   vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+   else
+   vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 
-   /* Write-combine setting is ignored, it is changed via the mtrr
-* interfaces on this platform.
-*/
if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
   vma->vm_end - vma->vm_start,
   vma->vm_page_prot))
Index: linux-2.6/include/asm-x86/cpufeature_32.h
===
--- linux-2.6.orig/include/asm-x86/cpufeature_32.h	2007-12-11 03:30:34.0 -0800
+++ linux-2.6/include/asm