Re: [PATCH v6 7/7] x86/vmware: Add TDX hypercall support

2024-01-22 Thread H. Peter Anvin
On January 22, 2024 4:04:33 PM PST, Alexey Makhalov 
 wrote:
>
>
>On 1/22/24 10:28 AM, H. Peter Anvin wrote:
>> On January 22, 2024 8:32:22 AM PST, Dave Hansen  
>> wrote:
>>> On 1/9/24 00:40, Alexey Makhalov wrote:
>>>> +#ifdef CONFIG_INTEL_TDX_GUEST
>>>> +unsigned long vmware_tdx_hypercall(unsigned long cmd,
>>>> + struct tdx_module_args *args)
>>>> +{
>>>> +  if (!hypervisor_is_type(X86_HYPER_VMWARE))
>>>> +  return ULONG_MAX;
>>>> +
>>>> +  if (cmd & ~VMWARE_CMD_MASK) {
>>>> +  pr_warn_once("Out of range command %lx\n", cmd);
>>>> +  return ULONG_MAX;
>>>> +  }
>>>> +
>>>> +  args->r10 = VMWARE_TDX_VENDOR_LEAF;
>>>> +  args->r11 = VMWARE_TDX_HCALL_FUNC;
>>>> +  args->r12 = VMWARE_HYPERVISOR_MAGIC;
>>>> +  args->r13 = cmd;
>>>> +  args->r15 = 0; /* CPL */
>>>> +
>>>> +  __tdx_hypercall(args);
>>>> +
>>>> +  return args->r12;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(vmware_tdx_hypercall);
>>>> +#endif
>>> 
>>> This is the kind of wrapper that I was hoping for.  Thanks.
>>> 
>>> Acked-by: Dave Hansen 
>>> 
>> 
>> I'm slightly confused by this TBH.
>> 
>> Why are the arguments passed in as a structure, which is modified by the 
>> wrapper to boot? This is analogous to a system call interface.
>> 
>> Furthermore, this is an out-of-line function; it should never be called with 
>> !X86_HYPER_VMWARE or you are introducing overhead for other hypervisors; I 
>> believe a pr_warn_once() is in order at least, just as you have for the 
>> out-of-range test.
>> 
>
>This patch series introduces vmware_hypercall family of functions similar to 
>kvm_hypercall. Similarity: both vmware and kvm implementations are static 
>inline functions and both of them use __tdx_hypercall (global not exported 
>symbol). Difference: kvm_hypercall functions are used _only_ within the 
>kernel, but vmware_hypercall are also used by modules.
>Exporting __tdx_hypercall function is an original Dave's concern.
>So we ended up with exporting wrapper, not generic, but VMware specific with 
>added checks against arbitrary use.
>vmware_tdx_hypercall is not designed for !X86_HYPER_VMWARE callers. But such a 
>calls are not forbidden.
>Arguments in a structure is an API for __tdx_hypercall(). Input and output 
>argument handling are done by vmware_hypercall callers, while VMware specific 
>dress up is inside the wrapper.
>
>Peter, do you think code comments are required to make it clear for the reader?
>
>

TBH that explanation didn't make much sense to me...


Re: [PATCH v6 7/7] x86/vmware: Add TDX hypercall support

2024-01-22 Thread H. Peter Anvin
On January 22, 2024 8:32:22 AM PST, Dave Hansen  wrote:
>On 1/9/24 00:40, Alexey Makhalov wrote:
>> +#ifdef CONFIG_INTEL_TDX_GUEST
>> +unsigned long vmware_tdx_hypercall(unsigned long cmd,
>> +   struct tdx_module_args *args)
>> +{
>> +if (!hypervisor_is_type(X86_HYPER_VMWARE))
>> +return ULONG_MAX;
>> +
>> +if (cmd & ~VMWARE_CMD_MASK) {
>> +pr_warn_once("Out of range command %lx\n", cmd);
>> +return ULONG_MAX;
>> +}
>> +
>> +args->r10 = VMWARE_TDX_VENDOR_LEAF;
>> +args->r11 = VMWARE_TDX_HCALL_FUNC;
>> +args->r12 = VMWARE_HYPERVISOR_MAGIC;
>> +args->r13 = cmd;
>> +args->r15 = 0; /* CPL */
>> +
>> +__tdx_hypercall(args);
>> +
>> +return args->r12;
>> +}
>> +EXPORT_SYMBOL_GPL(vmware_tdx_hypercall);
>> +#endif
>
>This is the kind of wrapper that I was hoping for.  Thanks.
>
>Acked-by: Dave Hansen 
>

I'm slightly confused by this TBH.

Why are the arguments passed in as a structure, which is modified by the 
wrapper to boot? This is analogous to a system call interface.

Furthermore, this is an out-of-line function; it should never be called with 
!X86_HYPER_VMWARE or you are introducing overhead for other hypervisors; I 
believe a pr_warn_once() is in order at least, just as you have for the 
out-of-range test.





[PATCH] drm/bochs: add Bochs PCI ID for Simics model

2021-09-09 Thread H. Peter Anvin (Intel)
Current (and older) Simics models for the Bochs VGA used the wrong PCI
vendor ID (0x4321 instead of 0x1234).  Although this can hopefully be
fixed in the future, it is a problem for users of the current version,
not the least because to update the device ID the BIOS has to be
rebuilt in order to see BIOS output.

Add support for the 4321: device number in addition to the
1234: one.

Signed-off-by: H. Peter Anvin (Intel) 
---
 drivers/gpu/drm/tiny/bochs.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/tiny/bochs.c b/drivers/gpu/drm/tiny/bochs.c
index 73415fa9ae0f..2ce3bd903b70 100644
--- a/drivers/gpu/drm/tiny/bochs.c
+++ b/drivers/gpu/drm/tiny/bochs.c
@@ -63,6 +63,7 @@ MODULE_PARM_DESC(defy, "default y resolution");
 
 enum bochs_types {
BOCHS_QEMU_STDVGA,
+   BOCHS_SIMICS,
BOCHS_UNKNOWN,
 };
 
@@ -695,6 +696,13 @@ static const struct pci_device_id bochs_pci_tbl[] = {
.subdevice   = PCI_ANY_ID,
.driver_data = BOCHS_UNKNOWN,
},
+   {
+   .vendor  = 0x4321,
+   .device  = 0x,
+   .subvendor   = PCI_ANY_ID,
+   .subdevice   = PCI_ANY_ID,
+   .driver_data = BOCHS_SIMICS,
+   },
{ /* end of list */ }
 };
 
-- 
2.31.1



[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-12 Thread H. Peter Anvin
On January 11, 2016 3:28:01 AM PST, Chris Wilson  
wrote:
>On Sat, Jan 09, 2016 at 02:36:03PM -0800, Andy Lutomirski wrote:
>> On Sat, Jan 9, 2016 at 12:01 AM, Chris Wilson
> wrote:
>> > On Thu, Jan 07, 2016 at 02:32:23PM -0800, H. Peter Anvin wrote:
>> >> On 01/07/16 14:29, H. Peter Anvin wrote:
>> >> >
>> >> > I would be very interested in knowing if replacing the final
>clflushopt
>> >> > with a clflush would resolve your problems (in which case the
>last mb()
>> >> > shouldn't be necessary either.)
>> >> >
>> >>
>> >> Nevermind.  CLFLUSH is not ordered with regards to CLFLUSHOPT to
>the
>> >> same cache line.
>> >>
>> >> Could you add a sync_cpu(); call to the end (can replace the final
>mb())
>> >> and see if that helps your case?
>> >
>> > s/sync_cpu()/sync_core()/
>> >
>> > No. I still see failures on Baytrail and Braswell (Pineview is not
>> > affected) with the final mb() replaced with sync_core(). I can
>reproduce
>> > failures on Pineview by tweaking the clflush_cache_range()
>parameters,
>> > so I am fairly confident that it is validating the current code.
>> >
>> > iirc sync_core() is cpuid, a heavy serialising instruction, an
>> > alternative to mfence.  Is there anything that else I can infer
>about
>> > the nature of my bug from this result?
>> 
>> No clue, but I don't know much about the underlying architecture.
>> 
>> Can you try clflush_cache_ranging one cacheline less and then
>manually
>> doing clflushopt; mb on the last cache line, just to make sure that
>> the helper is really doing the right thing?  You could also try
>> clflush instead of clflushopt to see if that makes a difference.
>
>I had looked at increasing the range over which clflush_cache_range()
>runs (using roundup/rounddown by cache lines), but it took something
>like +/- 256 bytes to pass all the tests. And also did
>s/clflushopt/clflush/ to confirm that made no differnce.
>
>Bizarrely,
>
>diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
>index 6000ad7..cf074400 100644
>--- a/arch/x86/mm/pageattr.c
>+++ b/arch/x86/mm/pageattr.c
>@@ -141,6 +141,7 @@ void clflush_cache_range(void *vaddr, unsigned int
>size)
>for (; p < vend; p += clflush_size)
>clflushopt(p);
> 
>+   clflushopt(vend-1);
>mb();
> }
> EXPORT_SYMBOL_GPL(clflush_cache_range);
>
>works like a charm.
>-Chris

That clflushopt touches a cache line already touched and therefore serializes 
with it.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.


[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-08 Thread H. Peter Anvin
On 01/07/16 14:32, H. Peter Anvin wrote:
>
> Nevermind.  CLFLUSH is not ordered with regards to CLFLUSHOPT to the
> same cache line.
> 

*Except* to the same cache line, dammit.

-hpa




[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-07 Thread H. Peter Anvin
On 01/07/16 14:29, H. Peter Anvin wrote:
> 
> I would be very interested in knowing if replacing the final clflushopt
> with a clflush would resolve your problems (in which case the last mb()
> shouldn't be necessary either.)
> 

Nevermind.  CLFLUSH is not ordered with regards to CLFLUSHOPT to the
same cache line.

Could you add a sync_cpu(); call to the end (can replace the final mb())
and see if that helps your case?

-hpa




[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-07 Thread H. Peter Anvin
On 01/07/16 13:54, Chris Wilson wrote:
> 
> Whilst you are looking at this asm, note that we reload
> boot_cpu_data.x86_cflush_size everytime around the loop. That's a small
> but noticeable extra cost (especially when we are only flushing a single
> cacheline).
> 

I did notice that; I don't know if this is a compiler failure to do an
obvious hoisting (which should then be merged with the other load) or a
consequence of the volatile.  It might be useful to report this to the
gcc bugzilla to let them look at it.

Either way, the diff looks good and you can add my:

Acked-by: H. Peter Anvin 

However, I see absolutely nothing wrong with the assembly other than
minor optimization possibilities: since gcc generates an early-out test
as well as a late-out test anyway, we could add an explicit:

if (p < vend)
return;

before the first mb() at no additional cost (assuming gcc is smart
enough to skip the second test; otherwise that can easily be done
manually by replacing the for loop with a do { } while loop.

I would be very interested in knowing if replacing the final clflushopt
with a clflush would resolve your problems (in which case the last mb()
shouldn't be necessary either.)

Something like:

void clflush_cache_range(void *vaddr, unsigned int size)
{
unsigned long clflush_size = boot_cpu_data.x86_clflush_size;
char *vend = (char *)vaddr + size;
char *p = (char *)((unsigned long)vaddr & ~(clflush_size - 1);

if (p >= vend)
return;

mb();

for (; p < vend - clflush_size; p += clflush_size)
clflushopt(p);

clflush(p); /* Serializing and thus a barrier */
}



[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-07 Thread H. Peter Anvin
On 01/07/16 11:44, Chris Wilson wrote:
> 
> Now I feel silly. Looking at the .s, there is no difference with the
> addition of the barrier to clflush_cache_range(). And sure enough
> letting the test run for longer, we see a failure. I fell for a placebo.
> 
> The failing assertion is always on the last cacheline and is always one
> value behind. Oh well, back to wondering where we miss the flush.
> -Chris
> 

Could you include the assembly here?

-hpa



[REPOST PATCH 1/8] fence: dma-buf cross-device synchronization (v17)

2014-06-19 Thread H. Peter Anvin
On 06/19/2014 01:01 PM, Greg KH wrote:
> On Thu, Jun 19, 2014 at 09:15:36PM +0200, Daniel Vetter wrote:
>> On Thu, Jun 19, 2014 at 7:00 PM, Greg KH  
>> wrote:
>> + BUG_ON(f1->context != f2->context);
>
> Nice, you just crashed the kernel, making it impossible to debug or
> recover :(

 agreed, that should probably be 'if (WARN_ON(...)) return NULL;'

 (but at least I wouldn't expect to hit that under console_lock so you
 should at least see the last N lines of the backtrace on the screen
 ;-))
>>>
>>> Lots of devices don't have console screens :)
>>
>> Aside: This is a pet peeve of mine and recently I've switched to
>> rejecting all patch that have a BUG_ON, period.
> 
> Please do, I have been for a few years now as well for the same reasons
> you cite.
> 

I'm actually concerned about this trend.  Downgrading things to WARN_ON
can allow a security bug in the kernel to continue to exist, for
example, or make the error message disappear.

I am wondering if the right thing here isn't to have a user (command
line?) settable policy as to how to proceed on an assert violation,
instead of hardcoding it at compile time.

-hpa




[PATCH] drm: Missed clflushopt in drm_clflush_virt_range

2014-05-15 Thread H. Peter Anvin
On 05/15/2014 05:38 AM, Daniel Vetter wrote:
> On Wed, May 14, 2014 at 09:41:12AM -0600, Ross Zwisler wrote:
>> With this commit:
>>
>> 2a0788dc9bc4 x86: Use clflushopt in drm_clflush_virt_range
>>
>> If clflushopt is available on the system, we use it instead of clflush
>> in drm_clflush_virt_range.  There were two calls to clflush in this
>> function, but only one was changed to clflushopt.  This patch changes
>> the other clflush call to clflushopt.
>>
>> Signed-off-by: Ross Zwisler 
>> Reported-by: Matthew Wilcox 
>>
>> Cc: David Airlie 
>> Cc: dri-devel at lists.freedesktop.org
>> Cc: H Peter Anvin 
>> Cc: Ingo Molnar 
>> Cc: Thomas Gleixner 
> 
> Picked to my topic/core-stuff drm branch so it doesn't get lost.
> -Daniel
> 

Does this mean you're picking this up, or do you want me to put it into
-tip?

-hpa




Info: mapping multiple BARs. Your kernel is fine.

2014-02-25 Thread H. Peter Anvin
On 02/24/2014 12:19 PM, Borislav Petkov wrote:
> Btw,
> 
> I don't know whether the following observation is related or not, but it
> so happens that after resume from suspend-to-disk, I see the booting up
> of the resume kernel on the console but when it is time for the original
> kernel to take over and switch to graphics, the screen remains black but
> the machine is responsive over the network.
> 
> And this doesn't happen on every resume but only sporadically.
> 
> And yep, -rc3 was fine.
> 
> On Mon, Feb 24, 2014 at 05:24:00PM +0100, Borislav Petkov wrote:
>> This started happening this morning after booting -rc4+tip, let's
>> add *everybody* to CC :-)
>>
>> We have intel_uncore_init, snb_uncore_imc_init_box, uncore_pci_probe and
>> other goodies on the stack.
>>

snb_uncore_imc_init_box() is introduced new in tip:perf/core, and is a
relatively recent commit (b9e1ab6d4c0582cad97699285a6b3cf992251b00), so
I suspect that that wasn't in whatever -rc3 mix you were testing.

I am wondering if backing/disabling out that support (perhaps by
removing the relevant PCI ID) fixes the problem?

-hpa



[PATCH v2 00/14] Platform Framebuffers and SimpleDRM

2013-07-04 Thread H. Peter Anvin
On 07/04/2013 05:25 AM, David Herrmann wrote:
>   - What FB formats are common on x86 that we should add to SIMPLEFB_FORMATS?
> (other than ARGB/XRGB32)

The common pixel formats on x86 are:

- Palettized 4-bit planar (bigendian, i.e. MSB to the left)
- Palettized 8-bit packed (one byte per pixel)
- 16-bit RGB555 (16-bit littleendian words with R=14:10, G=9:5, B=4:0)
- 16-bit RGB565 (16-bit littleendian words with R=15:11, G=10:5, B=4:0)
- 24-bit RGB888 in littleendian order (first byte in memory is B,
   second is G, third is R)
- 32-bit ARGB (first byte in memory is B, second G, third R, fourth
   unused in the framebuffer proper)
- 32-bit RGB10:10:10 (I *believe* 32-bit littleendian words with
   R=29:20, G=19:10, B=9:0)

-hpa



Re: [PATCH v2 00/14] Platform Framebuffers and SimpleDRM

2013-07-04 Thread H. Peter Anvin

On 07/04/2013 05:25 AM, David Herrmann wrote:

  - What FB formats are common on x86 that we should add to SIMPLEFB_FORMATS?
(other than ARGB/XRGB32)


The common pixel formats on x86 are:

- Palettized 4-bit planar (bigendian, i.e. MSB to the left)
- Palettized 8-bit packed (one byte per pixel)
- 16-bit RGB555 (16-bit littleendian words with R=14:10, G=9:5, B=4:0)
- 16-bit RGB565 (16-bit littleendian words with R=15:11, G=10:5, B=4:0)
- 24-bit RGB888 in littleendian order (first byte in memory is B,
  second is G, third is R)
- 32-bit ARGB (first byte in memory is B, second G, third R, fourth
  unused in the framebuffer proper)
- 32-bit RGB10:10:10 (I *believe* 32-bit littleendian words with
  R=29:20, G=19:10, B=9:0)

-hpa

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
The aliasing doesn't matter for Linux because we map the high and low half the 
same.

Henrique de Moraes Holschuh  wrote:

>On Sun, 23 Jun 2013, H. Peter Anvin wrote:
>> On 06/23/2013 02:56 PM, Henrique de Moraes Holschuh wrote:
>> > 
>> > And as far as I could find from Intel's not-that-complete public
>> > "specification updates", we are applying the errata workaround to a
>few more
>> > processors than strictly required, but since I have no idea how to
>write a
>> > test case, I can't whitelist the 3rd-gen Pentium M on my T43, nor
>can I get
>> > ThinkPad owners to test it for us on 1st and 2nd-gen Pentium M and
>report
>> > back.
>> 
>> Which specific erratum are you referring to, here?  The "WC becomes
>UC"
>> erratum?  I don't think there is a sane testcase for it since it
>needs a
>> very complicated setup to trigger.
>
>There are at least two different nasty PAT issues that are not always
>critical, and one that outright hangs the processor (if the unsupported
>aliasing of WB with UC/WC happens).
>
>Interestingly enough, most of the P4-Xeons and P4 do not appear to have
>the
>"WC becomes UC" errata.
>
>However, LOTS of P4, M-P4, Xeon PIII, Xeon, and Pentium M have a bug
>where
>the four highest entries in the PAT table are inactive (aliased to the
>four
>lowest entries) in mode B (PSE) and mode C (PAE) for 4k pages.  They
>work
>fine for large pages.
>
>Also, lots of them can hang if you ever alias WB with UC or WC (which
>is
>apparently an unsupported configuration anyway, or so it says in the
>errata).
>
>There are other weird aliasing nasties, such as one where you get
>memory
>corruption if you alias WB data with code (being accessed as UC or WC)
>in
>the same cacheline, and some stuff such as weirdness should the page
>table
>be on WC memory...
>
>I can track down most of the CPUIDs involved if you want, but someone
>from
>Intel would be better (I assume they actually have access to the errata
>documentation in some less idiotic way than reading a ton of badly
>indexed
>PDFs that take forever to find in their site).

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
On 06/23/2013 02:56 PM, Henrique de Moraes Holschuh wrote:
> 
> And as far as I could find from Intel's not-that-complete public
> "specification updates", we are applying the errata workaround to a few more
> processors than strictly required, but since I have no idea how to write a
> test case, I can't whitelist the 3rd-gen Pentium M on my T43, nor can I get
> ThinkPad owners to test it for us on 1st and 2nd-gen Pentium M and report
> back.
> 

Which specific erratum are you referring to, here?  The "WC becomes UC"
erratum?  I don't think there is a sane testcase for it since it needs a
very complicated setup to trigger.

-hpa



MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
On 06/23/2013 01:54 PM, Dave Airlie wrote:
>>> breaking old boxes just because, is just going to get reverted when I
>>> get the first regression report that you broke old boxes.
>>>
>>
>> Not "just because", but *if* the choice is between breaking old boxes
>> and breaking new boxes I'll take the latter.
>>
> 
> But Linus won't so your choice doesn't matter.

I hate to break it to you, but we regress on ancient hardware all the
time.  Optimization work gets done on modern machines, so the sweet spot
keeps moving.  In particular, if supporting ancient hardware means
leaving a lot of performance on modern hardware on the table, we may
have to take that penalty.

Fortunately, most of the time we don't have to.

-hpa





MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
On 06/23/2013 01:30 PM, Dave Airlie wrote:
 Why do you care about performance when PAT is disabled?
> 
> breaking old boxes just because, is just going to get reverted when I
> get the first regression report that you broke old boxes.
> 

Not "just because", but *if* the choice is between breaking old boxes
and breaking new boxes I'll take the latter.

> Andy Lutomirski just submitted a bunch of patches to clean up the DRM
> usage of mtrrs, they are in drm-next, afaik we no longer add them on
> PAT systems.

Fantastic news.  No issue, then, and no need to break anything.

The only problem I see with having ioremap_wc() installing an MTRR on
non-PAT, rather than pushing that into the drivers which is clearly not
the right thing, is that we will need a hook to uninstall it when the
mapping is destroyed.

-hpa





MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
On 06/23/2013 12:29 PM, Henrique de Moraes Holschuh wrote:
> On Sun, 23 Jun 2013, H. Peter Anvin wrote:
>> Why do you care about performance when PAT is disabled?
> 
> It will regress already slow boxes.  We blacklist a LOT of P4s, PMs, etc and
> nobody ever took the pain to track down which ones of those actually have
> PAT+MTRR aliasing bugs.
> 
> These boxes have boards like the Radeon X300, which needs either PAT or MTRR
> to not become unusable...
> 

We're talking hardware which is now many years old, but this is causing
very serious problems on real, modern hardware.  As far as I understand
it, too, the blacklisting was precautionary (the only bug that I
personally know about is a performance bug, where WC would be
incorrectly converted to UC.)

We need a way forward here.  If it is the only way I think we would have
to sacrifice the old machines, but perhaps something can be worked out
(e.g. if PAT is disabled, fall back to MTRRs if available for ioremap_wc()).

-hpa



MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
Why do you care about performance when PAT is disabled?

Brice Goglin  wrote:

>Le 21/06/2013 07:00, H. Peter Anvin a ?crit :
>> An awful lot of drivers, mostly DRI drivers, are still mucking with
>> MTRRs directly as opposed to using ioremap_wc() or similar
>interfaces.
>> In addition to the architecture dependency, this is really
>undesirable
>> because MTRRs are a limited resource, whereas page table attributes
>are not.
>>
>> Furthermore, this perpetuates the need for the horrific hack known as
>> "MTRR cleanup".
>>
>> What, if anything, can we do to clean up this mess?
>>
>>  -hpa
>>
>
>The first network driver that used ioremap_wc() back in 2008 (myri10ge)
>had to keep using MTRR because ioremap_wc() silently falls back to
>ioremap_nocache() when PAT is disabled.
>
>I asked about this in https://lkml.org/lkml/2008/5/31/42 and there was
>some talk about putting the MTRR addition in the nocache fallback path
>but I guess nobody implemented the idea.
>
>Brice

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Re: MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
Why do you care about performance when PAT is disabled?

Brice Goglin brice.gog...@gmail.com wrote:

Le 21/06/2013 07:00, H. Peter Anvin a écrit :
 An awful lot of drivers, mostly DRI drivers, are still mucking with
 MTRRs directly as opposed to using ioremap_wc() or similar
interfaces.
 In addition to the architecture dependency, this is really
undesirable
 because MTRRs are a limited resource, whereas page table attributes
are not.

 Furthermore, this perpetuates the need for the horrific hack known as
 MTRR cleanup.

 What, if anything, can we do to clean up this mess?

  -hpa


The first network driver that used ioremap_wc() back in 2008 (myri10ge)
had to keep using MTRR because ioremap_wc() silently falls back to
ioremap_nocache() when PAT is disabled.

I asked about this in https://lkml.org/lkml/2008/5/31/42 and there was
some talk about putting the MTRR addition in the nocache fallback path
but I guess nobody implemented the idea.

Brice

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
On 06/23/2013 12:29 PM, Henrique de Moraes Holschuh wrote:
 On Sun, 23 Jun 2013, H. Peter Anvin wrote:
 Why do you care about performance when PAT is disabled?
 
 It will regress already slow boxes.  We blacklist a LOT of P4s, PMs, etc and
 nobody ever took the pain to track down which ones of those actually have
 PAT+MTRR aliasing bugs.
 
 These boxes have boards like the Radeon X300, which needs either PAT or MTRR
 to not become unusable...
 

We're talking hardware which is now many years old, but this is causing
very serious problems on real, modern hardware.  As far as I understand
it, too, the blacklisting was precautionary (the only bug that I
personally know about is a performance bug, where WC would be
incorrectly converted to UC.)

We need a way forward here.  If it is the only way I think we would have
to sacrifice the old machines, but perhaps something can be worked out
(e.g. if PAT is disabled, fall back to MTRRs if available for ioremap_wc()).

-hpa

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
On 06/23/2013 01:30 PM, Dave Airlie wrote:
 Why do you care about performance when PAT is disabled?
 
 breaking old boxes just because, is just going to get reverted when I
 get the first regression report that you broke old boxes.
 

Not just because, but *if* the choice is between breaking old boxes
and breaking new boxes I'll take the latter.

 Andy Lutomirski just submitted a bunch of patches to clean up the DRM
 usage of mtrrs, they are in drm-next, afaik we no longer add them on
 PAT systems.

Fantastic news.  No issue, then, and no need to break anything.

The only problem I see with having ioremap_wc() installing an MTRR on
non-PAT, rather than pushing that into the drivers which is clearly not
the right thing, is that we will need a hook to uninstall it when the
mapping is destroyed.

-hpa



___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
On 06/23/2013 01:54 PM, Dave Airlie wrote:
 breaking old boxes just because, is just going to get reverted when I
 get the first regression report that you broke old boxes.


 Not just because, but *if* the choice is between breaking old boxes
 and breaking new boxes I'll take the latter.

 
 But Linus won't so your choice doesn't matter.

I hate to break it to you, but we regress on ancient hardware all the
time.  Optimization work gets done on modern machines, so the sweet spot
keeps moving.  In particular, if supporting ancient hardware means
leaving a lot of performance on modern hardware on the table, we may
have to take that penalty.

Fortunately, most of the time we don't have to.

-hpa



___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
The aliasing doesn't matter for Linux because we map the high and low half the 
same.

Henrique de Moraes Holschuh h...@hmh.eng.br wrote:

On Sun, 23 Jun 2013, H. Peter Anvin wrote:
 On 06/23/2013 02:56 PM, Henrique de Moraes Holschuh wrote:
  
  And as far as I could find from Intel's not-that-complete public
  specification updates, we are applying the errata workaround to a
few more
  processors than strictly required, but since I have no idea how to
write a
  test case, I can't whitelist the 3rd-gen Pentium M on my T43, nor
can I get
  ThinkPad owners to test it for us on 1st and 2nd-gen Pentium M and
report
  back.
 
 Which specific erratum are you referring to, here?  The WC becomes
UC
 erratum?  I don't think there is a sane testcase for it since it
needs a
 very complicated setup to trigger.

There are at least two different nasty PAT issues that are not always
critical, and one that outright hangs the processor (if the unsupported
aliasing of WB with UC/WC happens).

Interestingly enough, most of the P4-Xeons and P4 do not appear to have
the
WC becomes UC errata.

However, LOTS of P4, M-P4, Xeon PIII, Xeon, and Pentium M have a bug
where
the four highest entries in the PAT table are inactive (aliased to the
four
lowest entries) in mode B (PSE) and mode C (PAE) for 4k pages.  They
work
fine for large pages.

Also, lots of them can hang if you ever alias WB with UC or WC (which
is
apparently an unsupported configuration anyway, or so it says in the
errata).

There are other weird aliasing nasties, such as one where you get
memory
corruption if you alias WB data with code (being accessed as UC or WC)
in
the same cacheline, and some stuff such as weirdness should the page
table
be on WC memory...

I can track down most of the CPUIDs involved if you want, but someone
from
Intel would be better (I assume they actually have access to the errata
documentation in some less idiotic way than reading a ton of badly
indexed
PDFs that take forever to find in their site).

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


MTRR use in drivers

2013-06-20 Thread H. Peter Anvin
An awful lot of drivers, mostly DRI drivers, are still mucking with
MTRRs directly as opposed to using ioremap_wc() or similar interfaces.
In addition to the architecture dependency, this is really undesirable
because MTRRs are a limited resource, whereas page table attributes are not.

Furthermore, this perpetuates the need for the horrific hack known as
"MTRR cleanup".

What, if anything, can we do to clean up this mess?

-hpa


MTRR use in drivers

2013-06-20 Thread H. Peter Anvin
An awful lot of drivers, mostly DRI drivers, are still mucking with
MTRRs directly as opposed to using ioremap_wc() or similar interfaces.
In addition to the architecture dependency, this is really undesirable
because MTRRs are a limited resource, whereas page table attributes are not.

Furthermore, this perpetuates the need for the horrific hack known as
MTRR cleanup.

What, if anything, can we do to clean up this mess?

-hpa
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped

2013-03-11 Thread H. Peter Anvin
The problem is that the code will be broken, and so it makes no sense.  The #if 
0 is really confusing.

Daniel Vetter  wrote:

>On Thu, Mar 07, 2013 at 10:09:26PM -0800, H. Peter Anvin wrote:
>> On 03/07/2013 09:28 PM, Tejun Heo wrote:
>> > On Thu, Mar 7, 2013 at 9:27 PM, Yinghai Lu 
>wrote:
>> >> They are not using memblock_find_in_range(), so 1ULL<< will not
>help.
>> >>
>> >> Really hope i915 drm guys could clean that hacks.
>> > 
>> > The code isn't being used.  Just leave it alone.  Maybe add a
>comment.
>> >  The change is just making things more confusing.
>> > 
>> 
>> Indeed, but...
>> 
>> Daniel: can you guys clean this up or can we just remove the #if 0
>clause?
>
>I guess we could just put this into a comment explaining where stolen
>memory for the gfx devices is at on gen2. But tbh I don't mind if we
>just
>keep the #if 0 code around. For all newer platforms we can get at that
>offset through mch bar registers, so I don't really care.
>-Daniel

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Re: [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped

2013-03-11 Thread H. Peter Anvin
The problem is that the code will be broken, and so it makes no sense.  The #if 
0 is really confusing.

Daniel Vetter dan...@ffwll.ch wrote:

On Thu, Mar 07, 2013 at 10:09:26PM -0800, H. Peter Anvin wrote:
 On 03/07/2013 09:28 PM, Tejun Heo wrote:
  On Thu, Mar 7, 2013 at 9:27 PM, Yinghai Lu ying...@kernel.org
wrote:
  They are not using memblock_find_in_range(), so 1ULL will not
help.
 
  Really hope i915 drm guys could clean that hacks.
  
  The code isn't being used.  Just leave it alone.  Maybe add a
comment.
   The change is just making things more confusing.
  
 
 Indeed, but...
 
 Daniel: can you guys clean this up or can we just remove the #if 0
clause?

I guess we could just put this into a comment explaining where stolen
memory for the gfx devices is at on gen2. But tbh I don't mind if we
just
keep the #if 0 code around. For all newer platforms we can get at that
offset through mch bar registers, so I don't really care.
-Daniel

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped

2013-03-07 Thread H. Peter Anvin
On 03/07/2013 09:28 PM, Tejun Heo wrote:
> On Thu, Mar 7, 2013 at 9:27 PM, Yinghai Lu  wrote:
>> They are not using memblock_find_in_range(), so 1ULL<< will not help.
>>
>> Really hope i915 drm guys could clean that hacks.
> 
> The code isn't being used.  Just leave it alone.  Maybe add a comment.
>  The change is just making things more confusing.
> 

Indeed, but...

Daniel: can you guys clean this up or can we just remove the #if 0 clause?

-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped

2013-03-07 Thread H. Peter Anvin
On 03/07/2013 09:28 PM, Tejun Heo wrote:
 On Thu, Mar 7, 2013 at 9:27 PM, Yinghai Lu ying...@kernel.org wrote:
 They are not using memblock_find_in_range(), so 1ULL will not help.

 Really hope i915 drm guys could clean that hacks.
 
 The code isn't being used.  Just leave it alone.  Maybe add a comment.
  The change is just making things more confusing.
 

Indeed, but...

Daniel: can you guys clean this up or can we just remove the #if 0 clause?

-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH] vgacon/vt: clear buffer attributes when we load a 512 character font

2013-01-24 Thread H. Peter Anvin
It would be nice to support more than 512 characters when using a graphical 
console.  We should be able to support up to 2048 at least.

Dave Airlie  wrote:

>On Thu, Jan 24, 2013 at 10:18 PM, H. Peter Anvin  wrote:
>> The characters will morph anyway, it is just a matter off having them
>randomly scream with the intensity bit.
>>
>> 512-character mode is definitely useful... we get much wider language
>coverage with 512 than with 256, which is why most distros use a 512
>console font.
>
>Yeah really use a graphics console :-)
>
>But I've sent a v2 patch that should clear the attribute space going
>both directions.
>
>Dave.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


[PATCH] vgacon/vt: clear buffer attributes when we load a 512 character font

2013-01-24 Thread H. Peter Anvin
The characters will morph anyway, it is just a matter off having them randomly 
scream with the intensity bit.

512-character mode is definitely useful... we get much wider language coverage 
with 512 than with 256, which is why most distros use a 512 console font.

Dave Airlie  wrote:

>On Thu, Jan 24, 2013 at 3:20 PM, H. Peter Anvin  wrote:
>> We should clear this bit presumably on switching either from or to
>512-char mode, since the bit doesn't really make sense either way.
>
>Yeah the only problem going from 512-char is that chars above 256 will
>morph into garbage chars below 256, like ctrl chars, I don't really
>want to face into doing some sort of safe translation out of 512-char
>mode, from reading around the net, 512 char mode doesn't seem all that
>brilliantly useful.
>
>Dave.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Re: [PATCH] vgacon/vt: clear buffer attributes when we load a 512 character font

2013-01-24 Thread H. Peter Anvin
The characters will morph anyway, it is just a matter off having them randomly 
scream with the intensity bit.

512-character mode is definitely useful... we get much wider language coverage 
with 512 than with 256, which is why most distros use a 512 console font.

Dave Airlie airl...@gmail.com wrote:

On Thu, Jan 24, 2013 at 3:20 PM, H. Peter Anvin h...@zytor.com wrote:
 We should clear this bit presumably on switching either from or to
512-char mode, since the bit doesn't really make sense either way.

Yeah the only problem going from 512-char is that chars above 256 will
morph into garbage chars below 256, like ctrl chars, I don't really
want to face into doing some sort of safe translation out of 512-char
mode, from reading around the net, 512 char mode doesn't seem all that
brilliantly useful.

Dave.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] vgacon/vt: clear buffer attributes when we load a 512 character font

2013-01-24 Thread H. Peter Anvin
It would be nice to support more than 512 characters when using a graphical 
console.  We should be able to support up to 2048 at least.

Dave Airlie airl...@gmail.com wrote:

On Thu, Jan 24, 2013 at 10:18 PM, H. Peter Anvin h...@zytor.com wrote:
 The characters will morph anyway, it is just a matter off having them
randomly scream with the intensity bit.

 512-character mode is definitely useful... we get much wider language
coverage with 512 than with 256, which is why most distros use a 512
console font.

Yeah really use a graphics console :-)

But I've sent a v2 patch that should clear the attribute space going
both directions.

Dave.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH] vgacon/vt: clear buffer attributes when we load a 512 character font

2013-01-23 Thread H. Peter Anvin
We should clear this bit presumably on switching either from or to 512-char 
mode, since the bit doesn't really make sense either way.

Dave Airlie  wrote:

>From: Dave Airlie 
>
>When we switch from 256->512 byte font rendering mode, it means the
>current contents of the screen is being reinterpreted. The bit that
>holds
>the high bit of the 9-bit font, may have been previously set, and thus
>the new font misrenders.
>
>The problem case we see is grub2 writes spaces with the bit set, so it
>ends up with data like 0x820, which gets reinterpreted into 0x120 char
>which the font translates into G with a circumflex. This flashes up on
>screen at boot and is quite ugly.
>
>A current side effect of this patch though is that any rendering on the
>screen changes color to a slightly darker color, but at least the
>screen
>no longer corrupts.
>
>Signed-off-by: Dave Airlie 
>---
> drivers/tty/vt/vt.c|  2 +-
> drivers/video/console/vgacon.c | 19 ---
> include/linux/vt_kern.h|  1 +
> 3 files changed, 14 insertions(+), 8 deletions(-)
>
>diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
>index 8fd8968..c8067ae 100644
>--- a/drivers/tty/vt/vt.c
>+++ b/drivers/tty/vt/vt.c
>@@ -638,7 +638,7 @@ static inline void save_screen(struct vc_data *vc)
>  *Redrawing of screen
>  */
> 
>-static void clear_buffer_attributes(struct vc_data *vc)
>+void clear_buffer_attributes(struct vc_data *vc)
> {
>   unsigned short *p = (unsigned short *)vc->vc_origin;
>   int count = vc->vc_screenbuf_size / 2;
>diff --git a/drivers/video/console/vgacon.c
>b/drivers/video/console/vgacon.c
>index d449a74..271b5d0 100644
>--- a/drivers/video/console/vgacon.c
>+++ b/drivers/video/console/vgacon.c
>@@ -1064,7 +1064,7 @@ static int vgacon_do_font_op(struct vgastate
>*state,char *arg,int set,int ch512)
>   unsigned short video_port_status = vga_video_port_reg + 6;
>   int font_select = 0x00, beg, i;
>   char *charmap;
>-  
>+  bool clear_attribs = false;
>   if (vga_video_type != VIDEO_TYPE_EGAM) {
>   charmap = (char *) VGA_MAP_MEM(colourmap, 0);
>   beg = 0x0e;
>@@ -1169,12 +1169,6 @@ static int vgacon_do_font_op(struct vgastate
>*state,char *arg,int set,int ch512)
> 
>   /* if 512 char mode is already enabled don't re-enable it. */
>   if ((set) && (ch512 != vga_512_chars)) {
>-  /* attribute controller */
>-  for (i = 0; i < MAX_NR_CONSOLES; i++) {
>-  struct vc_data *c = vc_cons[i].d;
>-  if (c && c->vc_sw == _con)
>-  c->vc_hi_font_mask = ch512 ? 0x0800 : 0;
>-  }
>   vga_512_chars = ch512;
>   /* 256-char: enable intensity bit
>  512-char: disable intensity bit */
>@@ -1185,8 +1179,19 @@ static int vgacon_do_font_op(struct vgastate
>*state,char *arg,int set,int ch512)
>  it means, but it works, and it appears necessary */
>   inb_p(video_port_status);
>   vga_wattr(state->vgabase, VGA_AR_ENABLE_DISPLAY, 0);
>+  clear_attribs = true;
>   }
>   raw_spin_unlock_irq(_lock);
>+
>+  if (clear_attribs) {
>+  for (i = 0; i < MAX_NR_CONSOLES; i++) {
>+  struct vc_data *c = vc_cons[i].d;
>+  if (c && c->vc_sw == _con) {
>+  clear_buffer_attributes(c);
>+  c->vc_hi_font_mask = ch512 ? 0x0800 : 0;
>+  }
>+  }
>+  }
>   return 0;
> }
> 
>diff --git a/include/linux/vt_kern.h b/include/linux/vt_kern.h
>index 50ae7d0..1f55665 100644
>--- a/include/linux/vt_kern.h
>+++ b/include/linux/vt_kern.h
>@@ -47,6 +47,7 @@ int con_set_cmap(unsigned char __user *cmap);
> int con_get_cmap(unsigned char __user *cmap);
> void scrollback(struct vc_data *vc, int lines);
> void scrollfront(struct vc_data *vc, int lines);
>+void clear_buffer_attributes(struct vc_data *vc);
>void update_region(struct vc_data *vc, unsigned long start, int count);
> void redraw_screen(struct vc_data *vc, int is_switch);
> #define update_screen(x) redraw_screen(x, 0)

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


(Short?) merge window reminder

2011-05-24 Thread H. Peter Anvin
On 05/24/2011 08:07 AM, jonsmirl at gmail.com wrote:
> On Tue, May 24, 2011 at 10:43 AM, Alan Cox  
> wrote:
>> Can we drop most of MCA, EISA and ISA bus if we are going to have a big
>> version change ? A driver spring clean is much overdue and it's all in
>> git in case someone wishes to sneak out at midnight and bring some crawly
>> horror back from the dead.
> 
> 2.8 could mark the beginning of the great cleanup
>   --- work out the details of what needs to be cleaned and set a goal
>   --- remove old buses/driver, switch to device tree, graphics, 32/64
> merges, etc
> 3.0 would mark its completion
> 

I think this whole discussion misses the essence of the new development
model, which is that we no longer do these kinds of feature-based major
milestones.  If we want to to deprecate lots of drivers (which I
personally would advocate against -- I have built systems specifically
to run a real floppy drive since the Linux floppy driver is amazingly
flexible and can read/write a lot of formats that nothing else can,
including USB floppies) then we should do that in the normal course of
action, incrementally, and listed in feature-removal-schedule.txt, not
all at once due to some arbitrary milestone.

We have found it works better this way.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: (Short?) merge window reminder

2011-05-24 Thread H. Peter Anvin
On 05/24/2011 08:07 AM, jonsm...@gmail.com wrote:
 On Tue, May 24, 2011 at 10:43 AM, Alan Cox a...@lxorguk.ukuu.org.uk wrote:
 Can we drop most of MCA, EISA and ISA bus if we are going to have a big
 version change ? A driver spring clean is much overdue and it's all in
 git in case someone wishes to sneak out at midnight and bring some crawly
 horror back from the dead.
 
 2.8 could mark the beginning of the great cleanup
   --- work out the details of what needs to be cleaned and set a goal
   --- remove old buses/driver, switch to device tree, graphics, 32/64
 merges, etc
 3.0 would mark its completion
 

I think this whole discussion misses the essence of the new development
model, which is that we no longer do these kinds of feature-based major
milestones.  If we want to to deprecate lots of drivers (which I
personally would advocate against -- I have built systems specifically
to run a real floppy drive since the Linux floppy driver is amazingly
flexible and can read/write a lot of formats that nothing else can,
including USB floppies) then we should do that in the normal course of
action, incrementally, and listed in feature-removal-schedule.txt, not
all at once due to some arbitrary milestone.

We have found it works better this way.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


(Short?) merge window reminder

2011-05-23 Thread H. Peter Anvin
On 05/23/2011 04:17 PM, Ted Ts'o wrote:
> On Mon, May 23, 2011 at 01:33:48PM -0700, Linus Torvalds wrote:
>> On Mon, May 23, 2011 at 12:20 PM, Ingo Molnar  wrote:
>>>
>>> I really hope there's also a voice that tells you to wait until .42 before
>>> cutting 3.0.0! :-)
>>
>> So I'm toying with 3.0 (and in that case, it really would be "3.0",
>> not "3.0.0" - the stable team would get the third digit rather than
>> the fourth one.
> 
> If we change from 2.6.X to 3.X, then if we don't change anything else,
> then successive stable release will cause the LINUX_VERSION_CODE to be
> incremented.  This isn't necessary bad, but it would be a different
> from what we have now.
> 

That sounds like a good thing.

-hpa


Linux 2.6.39-rc3

2011-04-15 Thread H. Peter Anvin
On 04/15/2011 12:18 PM, Yinghai Lu wrote:
> On 04/15/2011 12:06 PM, Ingo Molnar wrote:
> 
>>
>> Joerg, mind submitting it with a changelog that includes everything we 
>> learned 
>> about this bug and all the Tested-by's in place?
>>
>> Is anyone of the opinion that we should try to revert the allocation 
>> order/alignment changes in addition to this fix?
> 
> We should figure out what is written to 0xa0001000 (main memory) by GPU 
> before internal GART is setup.
> 
> Joerg,
> can you insert some dump code in the drm/radon code to find out which 
> function cause the problem?
> 

Yes, I would like to make sure we don't just paper over a real bug
(again).  I think we still should talk Joerg's patch since it seems to
be the right thing to do anyway, but I do want to make sure we don't
have a memory-overwrite bug in the kernel that we're papering over.

-hpa


Linux 2.6.39-rc3

2011-04-14 Thread H. Peter Anvin
On 04/14/2011 02:11 AM, Ingo Molnar wrote:
> 
> I'd strongly suggest we revert back to the old and proven allocation order, 
> as 
> long as it results in valid layouts. Even if we figure out this particular 
> GART/GTT assumption there might be a dozen others in other types of hardware.
> 

Yes, but we might also be hiding a real bug which bites other hardware.
 We have found real and very serious bugs in the kernel this way before
-- things where drivers scribble over random memory and allocation order
exposed the failure in a predictable way, as opposed to random crashes.

    -hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Linux 2.6.39-rc3

2011-04-14 Thread H. Peter Anvin
On 04/13/2011 07:07 PM, Dave Airlie wrote:
>>
>> Okay, staring at this, it definitely seems toxic to overlay the GART
>> over memory areas reserved by the BIOS.  If I were to guess, I would say
>> that the problem here seems to be that the kernel thinks it is
>> overlaying 64 MiB of memory, but the actual GART is in fact 512 MiB in
>> size -- 131072 CPU pages -- which now overlaps the BIOS reserved areas.
>>
>> Alex D., could you comment on the "num cpu pages" bit?
> 
> These are not CPU addresses. I think we've stated that already. Not the
> droids.
> 
> the num cpu pages is how many CPU pages would be needed to fill the GPU
> GTT, for those crazy cases where CPU pagesize != GPU pagesize.
> 

OK, well, something is still weird.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: Linux 2.6.39-rc3

2011-04-14 Thread H. Peter Anvin
On 04/13/2011 07:07 PM, Dave Airlie wrote:

 Okay, staring at this, it definitely seems toxic to overlay the GART
 over memory areas reserved by the BIOS.  If I were to guess, I would say
 that the problem here seems to be that the kernel thinks it is
 overlaying 64 MiB of memory, but the actual GART is in fact 512 MiB in
 size -- 131072 CPU pages -- which now overlaps the BIOS reserved areas.

 Alex D., could you comment on the num cpu pages bit?
 
 These are not CPU addresses. I think we've stated that already. Not the
 droids.
 
 the num cpu pages is how many CPU pages would be needed to fill the GPU
 GTT, for those crazy cases where CPU pagesize != GPU pagesize.
 

OK, well, something is still weird.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Linux 2.6.39-rc3

2011-04-14 Thread H. Peter Anvin
On 04/14/2011 02:11 AM, Ingo Molnar wrote:
 
 I'd strongly suggest we revert back to the old and proven allocation order, 
 as 
 long as it results in valid layouts. Even if we figure out this particular 
 GART/GTT assumption there might be a dozen others in other types of hardware.
 

Yes, but we might also be hiding a real bug which bites other hardware.
 We have found real and very serious bugs in the kernel this way before
-- things where drivers scribble over random memory and allocation order
exposed the failure in a predictable way, as opposed to random crashes.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 04:39 PM, Linus Torvalds wrote:
> 
>  - Choice #2: understand exactly _what_ goes wrong, and fix it
> analytically (ie by _understanding_ the problem, and being able to
> solve it exactly, and in a way you can argue about without having to
> resort to "magic happens").
> 
> Now, the whole analytic approach (aka "computer sciency" approach),
> where you can actually think about the problem without having any
> pesky "reality" impact the solution is obviously the one we tend to
> prefer. Sadly, it's seldom the one we can use in reality when it comes
> to things like resource allocation, since we end up starting off with
> often buggy approximations of what the actual hardware is all about
> (ie broken firmware tables).
> 
> So I'd love to know exactly why one random number works, and why
> another one doesn't. But as long as we do _not_ know the "Why" of it,
> we will have to revert.
> 

Yes.  However, even if we *do* revert (and the time is running short on
not reverting) I would like to understand this particular one, simply
because I think it may very well be a problem that is manifesting itself
in other ways on other systems.

The other thing that this has uncovered is that we already have a bunch
of complete b*llsh*t magic numbers in this path, some of which are
trivially shown to be wrong or at least completely arbitrary, so there
are more issues here :(

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 12:14 PM, Yinghai Lu wrote:
> 
> so those two patches uncover some problems.
> 
> [0.00] Checking aperture...
> [0.00] No AGP bridge found
> [0.00] Node 0: aperture @ a000 size 32 MB
> [0.00] Aperture pointing to e820 RAM. Ignoring.
> [0.00] Your BIOS doesn't leave a aperture memory hole
> [0.00] Please enable the IOMMU option in the BIOS setup
> [0.00] This costs you 64 MB of RAM
> [0.00] memblock_x86_reserve_range: [0xa000-0xa3ff]   
> aperture64
> [0.00] Mapping aperture over 65536 KB of RAM @ a000
> 
> so kernel try to reallocate apperture. because BIOS allocated is pointed to 
> RAM or size is too small.
> 
> but your radeon does use [0xa000, 0xbfff)
> 
> [4.281993] radeon :01:05.0: VRAM: 320M 0xC000 - 
> 0xD3FF (320M used)
> [4.290672] radeon :01:05.0: GTT: 512M 0xA000 - 
> 0xBFFF
> [4.298550] [drm] Detected VRAM RAM=320M, BAR=256M
> [4.309857] [drm] RAM width 32bits DDR
> [4.313748] [TTM] Zone  kernel: Available graphics memory: 1896524 kiB.
> [4.320379] [TTM] Initializing pool allocator.
> [4.324948] [drm] radeon: 320M of VRAM memory ready
> [4.329832] [drm] radeon: 512M of GTT memory ready.
> 
> and the one seems working:
> 
> [0.00] Checking aperture...
> [0.00] No AGP bridge found
> [0.00] Node 0: aperture @ a000 size 32 MB
> [0.00] Aperture pointing to e820 RAM. Ignoring.
> [0.00] Your BIOS doesn't leave a aperture memory hole
> [0.00] Please enable the IOMMU option in the BIOS setup
> [0.00] This costs you 64 MB of RAM
> [0.00] memblock_x86_reserve_range: [0x8000-0x83ff]   
> aperture64
> [0.00] Mapping aperture over 65536 KB of RAM @ 8000
> [0.00] memblock_x86_reserve_range: [0xacb6bdc0-0xacb6bddf]
>   BOOTMEM
> 
> will use different position...
> 
> [4.250159] radeon :01:05.0: VRAM: 320M 0xC000 - 
> 0xD3FF (320M used)
> [4.258830] radeon :01:05.0: GTT: 512M 0xA000 - 
> 0xBFFF
> [4.266742] [drm] Detected VRAM RAM=320M, BAR=256M
> [4.271549] [drm] RAM width 32bits DDR
> [4.275435] [TTM] Zone  kernel: Available graphics memory: 1896526 kiB.
> [4.282066] [TTM] Initializing pool allocator.
> [4.282085] usb 7-2: new full speed USB device number 2 using ohci_hcd
> [4.293076] [drm] radeon: 320M of VRAM memory ready
> [4.298277] [drm] radeon: 512M of GTT memory ready.
> [4.303218] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
> [4.309854] [drm] Driver supports precise vblank timestamp query.
> [4.315970] [drm] radeon: irq initialized.
> [4.320094] [drm] GART: num cpu pages 131072, num gpu pages 131072
> 
> So question is why radeon is using the address [0xa000 - 0xc00], and 
> in E820 it is RAM 
> 
> [0.00]  BIOS-e820: 0010 - acb8d000 (usable)
> [0.00]  BIOS-e820: acb8d000 - acb8f000 (reserved)
> [0.00]  BIOS-e820: acb8f000 - afce9000 (usable)
> [0.00]  BIOS-e820: afce9000 - afd21000 (reserved)
> [0.00]  BIOS-e820: afd21000 - afd4f000 (usable)
> [0.00]  BIOS-e820: afd4f000 - afdcf000 (reserved)
> [0.00]  BIOS-e820: afdcf000 - afecf000 (ACPI NVS)
> [0.00]  BIOS-e820: afecf000 - afeff000 (ACPI data)
> [0.00]  BIOS-e820: afeff000 - aff0 (usable)
> 
> so looks bios program wrong address to the radon card?
> 

Okay, staring at this, it definitely seems toxic to overlay the GART
over memory areas reserved by the BIOS.  If I were to guess, I would say
that the problem here seems to be that the kernel thinks it is
overlaying 64 MiB of memory, but the actual GART is in fact 512 MiB in
size -- 131072 CPU pages -- which now overlaps the BIOS reserved areas.

Alex D., could you comment on the "num cpu pages" bit?

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 03:22 PM, Joerg Roedel wrote:
> On Wed, Apr 13, 2011 at 03:01:10PM -0700, H. Peter Anvin wrote:
>> On 04/13/2011 02:50 PM, Joerg Roedel wrote:
>>> On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
>>>> -  addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
>>>> +  addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
>>>
>>> Btw, while looking at this code I wondered why the 512M goal is enforced
>>> by the alignment. Start could be set to 512M instead and the alignment
>>> can be aper_size as it should. Any reason for such a big alignment?
>>>
>>> Joerg
>>>
>>> P.S.: The box is still in the office, I will try this debug-patch
>>>   tomorrow.
>>
>> The only reason that I can think of is that the aperture itself can be
>> huge, and perhaps 512 MiB is the biggest such known. 
> 
> Well, that would work as well by just using aper_size as alignment, the
> aperture needs to be aligned on its size anyway. This code only runs
> when Linux allocates the aperture itself and if I am mistaken is uses
> always 64MB when doing this.

Yes, I would agree with that.  The sane thing would be to set the base
to whatever address needs to be guarded against (WHICH SHOULD BE
MOTIVATED), and use aper_size as alignment, *unless* we are only using
the initial portion of a much larger hardware structure that needs
natural alignment (which isn't clear to me, I do know we sometimes use
only a fraction of the GART, but that doesn't mean we need to
naturally-align the entire thing, nor that 512 MiB is sufficient to do so.)

-hpa




Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 02:59 PM, Yinghai Lu wrote:
> On 04/13/2011 02:50 PM, Joerg Roedel wrote:
>> On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
>>> -   addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
>>> +   addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
>>
>> Btw, while looking at this code I wondered why the 512M goal is enforced
>> by the alignment. Start could be set to 512M instead and the alignment
>> can be aper_size as it should. Any reason for such a big alignment?
>>
> 
> when using bootmem, try to use big alignment (512M ), so we could avoid take 
> ram range below 512M.
> 

Yes, his question was why on Earth are you using 0 as start if that is
the purpose.

On top of that, where the hell does the magic 512 MiB come from?  It
looks like it is either completly ad hoc, or it has something to do with
where the kexec kernel was allocated once upon a time.

-hpa


Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 02:50 PM, Joerg Roedel wrote:
> On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
>> -addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
>> +addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
> 
> Btw, while looking at this code I wondered why the 512M goal is enforced
> by the alignment. Start could be set to 512M instead and the alignment
> can be aper_size as it should. Any reason for such a big alignment?
> 
>   Joerg
> 
> P.S.: The box is still in the office, I will try this debug-patch
>   tomorrow.

The only reason that I can think of is that the aperture itself can be
huge, and perhaps 512 MiB is the biggest such known.  512ULL<<21 is of
course a particularly moronic way to write 1 GiB, but it was a debug patch.

The value 512 MiB apparently comes from
7677b2ef6c0c4fddc84f6473f3863f40eb71821b, which is apparently totally ad
hoc; effectively it tries to prevent a collision with kexec by
hardcoding the kdump allocation as it sat at that point in time in the
GART assignment rules.

Yeah.  Brilliant.

-hpa



Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 10:21 AM, Joerg Roedel wrote:
> 
> First of all, I bisected between v2.6.37-rc2..f005fe12b90c which where
> only a couple of patches and merged v2.6.38-rc4 in at every step. There
> was no failure found.
> Then I tried this again, but this time I merged v2.6.38-rc5 at every
> step and was successful. The bad commit in this branch turned out to be
> 
>   1a4a678b12c84db9ae5dce424e0e97f0559bb57c
> 
> which is related to memblock.
> 
> Then I tried to find out which change between 2.6.38-rc4 and 2.6.38-rc5
> is needed to trigger the failure, so I used f005fe12b90c as a base,
> bisected between v2.6.38-rc4..v2.6.38-rc5 and merged every bisect step
> into the base and tested. Here the bad commit turned out to be
> 
>   e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20
> 
> which is related to gart. It turned out that the gart aperture on that
> box is on another position with these patches. Before it was as
> 0xa400 and now it is at 0xa000. It seems like this has something
> to do with the root-cause.
> 
> Reverting commit 1a4a678b12c84db9ae5dce424e0e97f0559bb57c fixes the
> problem btw. and booting with iommu=soft also works, but I have no idea
> yet why the aperture at that address is a problem (with the patch
> reverted the aperture lands at 0x8000).
> 

Does reverting e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20 solve the
problem for you?

1a4a678b12c84db9ae5dce424e0e97f0559bb57c is a memory-allocation-order
patch, which have a nasty tendency to unmask bugs elsewhere in the
kernel.  However, e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20 looks
positively strange (and it doesn't exactly help that the description is
written in Yinghai-ese and is therefore nearly impossible to decode,
never mind tell if it is remotely correct.)

-hpa




Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 10:21 AM, Joerg Roedel wrote:
> On Wed, Apr 13, 2011 at 08:46:09AM +0200, Ingo Molnar wrote:
>> Could you please send the before/after bootlog (in particular all memory 
>> init 
>> messages included) and your .config?
>>
>>  before:  f005fe12b90c: x86-64: Move out cleanup higmap [_brk_end, _end) out 
>> of init_memory_mapping()
>>   after:  d2137d5af425: Merge branch 'linus' into x86/bootmem
>>
>> I've Cc:-ed more people who might have an idea about it.
> 
> Okay, I have done some more bisecting and debugging today.
> 

First of all, *huge* thanks for this effort.  At least we need to track
down the bits that need to be reverted -- it is past rc3, and it's time
to see what we should revert and tell the submitter to try again next cycle.

This looks to be the same issue as in bugzilla 33012:

https://bugzilla.kernel.org/show_bug.cgi?id=33012

... so it would be good if we could keep the information in there.

-hpa


Re: Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 10:21 AM, Joerg Roedel wrote:
 
 First of all, I bisected between v2.6.37-rc2..f005fe12b90c which where
 only a couple of patches and merged v2.6.38-rc4 in at every step. There
 was no failure found.
 Then I tried this again, but this time I merged v2.6.38-rc5 at every
 step and was successful. The bad commit in this branch turned out to be
 
   1a4a678b12c84db9ae5dce424e0e97f0559bb57c
 
 which is related to memblock.
 
 Then I tried to find out which change between 2.6.38-rc4 and 2.6.38-rc5
 is needed to trigger the failure, so I used f005fe12b90c as a base,
 bisected between v2.6.38-rc4..v2.6.38-rc5 and merged every bisect step
 into the base and tested. Here the bad commit turned out to be
 
   e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20
 
 which is related to gart. It turned out that the gart aperture on that
 box is on another position with these patches. Before it was as
 0xa400 and now it is at 0xa000. It seems like this has something
 to do with the root-cause.
 
 Reverting commit 1a4a678b12c84db9ae5dce424e0e97f0559bb57c fixes the
 problem btw. and booting with iommu=soft also works, but I have no idea
 yet why the aperture at that address is a problem (with the patch
 reverted the aperture lands at 0x8000).
 

Does reverting e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20 solve the
problem for you?

1a4a678b12c84db9ae5dce424e0e97f0559bb57c is a memory-allocation-order
patch, which have a nasty tendency to unmask bugs elsewhere in the
kernel.  However, e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20 looks
positively strange (and it doesn't exactly help that the description is
written in Yinghai-ese and is therefore nearly impossible to decode,
never mind tell if it is remotely correct.)

-hpa


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 10:21 AM, Joerg Roedel wrote:
 On Wed, Apr 13, 2011 at 08:46:09AM +0200, Ingo Molnar wrote:
 Could you please send the before/after bootlog (in particular all memory 
 init 
 messages included) and your .config?

  before:  f005fe12b90c: x86-64: Move out cleanup higmap [_brk_end, _end) out 
 of init_memory_mapping()
   after:  d2137d5af425: Merge branch 'linus' into x86/bootmem

 I've Cc:-ed more people who might have an idea about it.
 
 Okay, I have done some more bisecting and debugging today.
 

First of all, *huge* thanks for this effort.  At least we need to track
down the bits that need to be reverted -- it is past rc3, and it's time
to see what we should revert and tell the submitter to try again next cycle.

This looks to be the same issue as in bugzilla 33012:

https://bugzilla.kernel.org/show_bug.cgi?id=33012

... so it would be good if we could keep the information in there.

-hpa
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 02:50 PM, Joerg Roedel wrote:
 On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
 -addr = memblock_find_in_range(0, 1ULL32, aper_size, 512ULL20);
 +addr = memblock_find_in_range(0, 1ULL32, aper_size, 512ULL21);
 
 Btw, while looking at this code I wondered why the 512M goal is enforced
 by the alignment. Start could be set to 512M instead and the alignment
 can be aper_size as it should. Any reason for such a big alignment?
 
   Joerg
 
 P.S.: The box is still in the office, I will try this debug-patch
   tomorrow.

The only reason that I can think of is that the aperture itself can be
huge, and perhaps 512 MiB is the biggest such known.  512ULL21 is of
course a particularly moronic way to write 1 GiB, but it was a debug patch.

The value 512 MiB apparently comes from
7677b2ef6c0c4fddc84f6473f3863f40eb71821b, which is apparently totally ad
hoc; effectively it tries to prevent a collision with kexec by
hardcoding the kdump allocation as it sat at that point in time in the
GART assignment rules.

Yeah.  Brilliant.

-hpa

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 02:59 PM, Yinghai Lu wrote:
 On 04/13/2011 02:50 PM, Joerg Roedel wrote:
 On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
 -   addr = memblock_find_in_range(0, 1ULL32, aper_size, 512ULL20);
 +   addr = memblock_find_in_range(0, 1ULL32, aper_size, 512ULL21);

 Btw, while looking at this code I wondered why the 512M goal is enforced
 by the alignment. Start could be set to 512M instead and the alignment
 can be aper_size as it should. Any reason for such a big alignment?

 
 when using bootmem, try to use big alignment (512M ), so we could avoid take 
 ram range below 512M.
 

Yes, his question was why on Earth are you using 0 as start if that is
the purpose.

On top of that, where the hell does the magic 512 MiB come from?  It
looks like it is either completly ad hoc, or it has something to do with
where the kexec kernel was allocated once upon a time.

-hpa
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 03:22 PM, Joerg Roedel wrote:
 On Wed, Apr 13, 2011 at 03:01:10PM -0700, H. Peter Anvin wrote:
 On 04/13/2011 02:50 PM, Joerg Roedel wrote:
 On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
 -  addr = memblock_find_in_range(0, 1ULL32, aper_size, 512ULL20);
 +  addr = memblock_find_in_range(0, 1ULL32, aper_size, 512ULL21);

 Btw, while looking at this code I wondered why the 512M goal is enforced
 by the alignment. Start could be set to 512M instead and the alignment
 can be aper_size as it should. Any reason for such a big alignment?

 Joerg

 P.S.: The box is still in the office, I will try this debug-patch
   tomorrow.

 The only reason that I can think of is that the aperture itself can be
 huge, and perhaps 512 MiB is the biggest such known. 
 
 Well, that would work as well by just using aper_size as alignment, the
 aperture needs to be aligned on its size anyway. This code only runs
 when Linux allocates the aperture itself and if I am mistaken is uses
 always 64MB when doing this.

Yes, I would agree with that.  The sane thing would be to set the base
to whatever address needs to be guarded against (WHICH SHOULD BE
MOTIVATED), and use aper_size as alignment, *unless* we are only using
the initial portion of a much larger hardware structure that needs
natural alignment (which isn't clear to me, I do know we sometimes use
only a fraction of the GART, but that doesn't mean we need to
naturally-align the entire thing, nor that 512 MiB is sufficient to do so.)

-hpa


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 12:14 PM, Yinghai Lu wrote:
 
 so those two patches uncover some problems.
 
 [0.00] Checking aperture...
 [0.00] No AGP bridge found
 [0.00] Node 0: aperture @ a000 size 32 MB
 [0.00] Aperture pointing to e820 RAM. Ignoring.
 [0.00] Your BIOS doesn't leave a aperture memory hole
 [0.00] Please enable the IOMMU option in the BIOS setup
 [0.00] This costs you 64 MB of RAM
 [0.00] memblock_x86_reserve_range: [0xa000-0xa3ff]   
 aperture64
 [0.00] Mapping aperture over 65536 KB of RAM @ a000
 
 so kernel try to reallocate apperture. because BIOS allocated is pointed to 
 RAM or size is too small.
 
 but your radeon does use [0xa000, 0xbfff)
 
 [4.281993] radeon :01:05.0: VRAM: 320M 0xC000 - 
 0xD3FF (320M used)
 [4.290672] radeon :01:05.0: GTT: 512M 0xA000 - 
 0xBFFF
 [4.298550] [drm] Detected VRAM RAM=320M, BAR=256M
 [4.309857] [drm] RAM width 32bits DDR
 [4.313748] [TTM] Zone  kernel: Available graphics memory: 1896524 kiB.
 [4.320379] [TTM] Initializing pool allocator.
 [4.324948] [drm] radeon: 320M of VRAM memory ready
 [4.329832] [drm] radeon: 512M of GTT memory ready.
 
 and the one seems working:
 
 [0.00] Checking aperture...
 [0.00] No AGP bridge found
 [0.00] Node 0: aperture @ a000 size 32 MB
 [0.00] Aperture pointing to e820 RAM. Ignoring.
 [0.00] Your BIOS doesn't leave a aperture memory hole
 [0.00] Please enable the IOMMU option in the BIOS setup
 [0.00] This costs you 64 MB of RAM
 [0.00] memblock_x86_reserve_range: [0x8000-0x83ff]   
 aperture64
 [0.00] Mapping aperture over 65536 KB of RAM @ 8000
 [0.00] memblock_x86_reserve_range: [0xacb6bdc0-0xacb6bddf]
   BOOTMEM
 
 will use different position...
 
 [4.250159] radeon :01:05.0: VRAM: 320M 0xC000 - 
 0xD3FF (320M used)
 [4.258830] radeon :01:05.0: GTT: 512M 0xA000 - 
 0xBFFF
 [4.266742] [drm] Detected VRAM RAM=320M, BAR=256M
 [4.271549] [drm] RAM width 32bits DDR
 [4.275435] [TTM] Zone  kernel: Available graphics memory: 1896526 kiB.
 [4.282066] [TTM] Initializing pool allocator.
 [4.282085] usb 7-2: new full speed USB device number 2 using ohci_hcd
 [4.293076] [drm] radeon: 320M of VRAM memory ready
 [4.298277] [drm] radeon: 512M of GTT memory ready.
 [4.303218] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
 [4.309854] [drm] Driver supports precise vblank timestamp query.
 [4.315970] [drm] radeon: irq initialized.
 [4.320094] [drm] GART: num cpu pages 131072, num gpu pages 131072
 
 So question is why radeon is using the address [0xa000 - 0xc00], and 
 in E820 it is RAM 
 
 [0.00]  BIOS-e820: 0010 - acb8d000 (usable)
 [0.00]  BIOS-e820: acb8d000 - acb8f000 (reserved)
 [0.00]  BIOS-e820: acb8f000 - afce9000 (usable)
 [0.00]  BIOS-e820: afce9000 - afd21000 (reserved)
 [0.00]  BIOS-e820: afd21000 - afd4f000 (usable)
 [0.00]  BIOS-e820: afd4f000 - afdcf000 (reserved)
 [0.00]  BIOS-e820: afdcf000 - afecf000 (ACPI NVS)
 [0.00]  BIOS-e820: afecf000 - afeff000 (ACPI data)
 [0.00]  BIOS-e820: afeff000 - aff0 (usable)
 
 so looks bios program wrong address to the radon card?
 

Okay, staring at this, it definitely seems toxic to overlay the GART
over memory areas reserved by the BIOS.  If I were to guess, I would say
that the problem here seems to be that the kernel thinks it is
overlaying 64 MiB of memory, but the actual GART is in fact 512 MiB in
size -- 131072 CPU pages -- which now overlaps the BIOS reserved areas.

Alex D., could you comment on the num cpu pages bit?

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 04:39 PM, Linus Torvalds wrote:
 
  - Choice #2: understand exactly _what_ goes wrong, and fix it
 analytically (ie by _understanding_ the problem, and being able to
 solve it exactly, and in a way you can argue about without having to
 resort to magic happens).
 
 Now, the whole analytic approach (aka computer sciency approach),
 where you can actually think about the problem without having any
 pesky reality impact the solution is obviously the one we tend to
 prefer. Sadly, it's seldom the one we can use in reality when it comes
 to things like resource allocation, since we end up starting off with
 often buggy approximations of what the actual hardware is all about
 (ie broken firmware tables).
 
 So I'd love to know exactly why one random number works, and why
 another one doesn't. But as long as we do _not_ know the Why of it,
 we will have to revert.
 

Yes.  However, even if we *do* revert (and the time is running short on
not reverting) I would like to understand this particular one, simply
because I think it may very well be a problem that is manifesting itself
in other ways on other systems.

The other thing that this has uncovered is that we already have a bunch
of complete b*llsh*t magic numbers in this path, some of which are
trivially shown to be wrong or at least completely arbitrary, so there
are more issues here :(

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel