Re: [Qemu-devel] [PATCH RFC] memory: drop _overlap variant

2013-02-19 Thread Michael S. Tsirkin
On Tue, Feb 19, 2013 at 05:58:38PM +0200, Avi Kivity wrote:
> On Tue, Feb 19, 2013 at 4:41 PM, Michael S. Tsirkin  wrote:
> > On Thu, Feb 14, 2013 at 08:23:04PM +0200, Avi Kivity wrote:
> >> On Thu, Feb 14, 2013 at 8:12 PM, Michael S. Tsirkin wrote:
> >> >>
> >> >> Is there an actual real problem that needs fixing?
> >> >
> >> > Yes. Guests sometimes cause device BARs to temporarily overlap
> >> > the APIC range during BAR sizing. It works fine on a physical
> >> > system but fails on KVM since pci has same priority.
> >> >
> >> > See the report:
> >> > [BUG] Guest OS hangs on boot when 64bit BAR present
> >> >
> >>
> >> Is PCI_COMMAND_MEMORY set while this is going on?
> >
> > I think Linux never clears PCI_COMMAND_MEMORY because
> > it's buggy in some devices.
> 
> Ok.  Then I recommend defining the MSI message area as overlapped with
> sufficient priority.  It should probably be a child of the PCI address
> space.
> 
> The IOAPIC is actually closer to ISA, but again it's sufficient to
> move it to the PCI address space.  I doubt its priority matters.

Well moving IOAPIC to PCI seems strange, it's not a PCI thing,
and I think it can be moved outside PCI though guests don't do it.
So I think ideally we really should have it look something like:

sysbus -> ioapic
       -> pci -> msi

-- 
MST



Re: [Qemu-devel] [PATCH RFC] memory: drop _overlap variant

2013-02-19 Thread Avi Kivity
On Tue, Feb 19, 2013 at 6:08 PM, Michael S. Tsirkin  wrote:
>>
>> The IOAPIC is actually closer to ISA, but again it's sufficient to
>> move it to the PCI address space.  I doubt its priority matters.
>
> Well moving IOAPIC to PCI seems strange, it's not a PCI thing,
> and I think it can be moved outside PCI though guests don't do it.

Look at the 440fx/piix datasheets.  It's connected to the piix which
decodes its address.  So it's definitely part of the pci address
space.


> So I think ideally we really should have it look something like:
>
> sysbus -> ioapic
>        -> pci -> msi
>



Re: [Qemu-devel] [PATCH RFC] memory: drop _overlap variant

2013-02-19 Thread Avi Kivity
On Tue, Feb 19, 2013 at 4:41 PM, Michael S. Tsirkin  wrote:
> On Thu, Feb 14, 2013 at 08:23:04PM +0200, Avi Kivity wrote:
>> On Thu, Feb 14, 2013 at 8:12 PM, Michael S. Tsirkin  wrote:
>> >>
>> >> Is there an actual real problem that needs fixing?
>> >
>> > Yes. Guests sometimes cause device BARs to temporarily overlap
>> > the APIC range during BAR sizing. It works fine on a physical
>> > system but fails on KVM since pci has same priority.
>> >
>> > See the report:
>> > [BUG] Guest OS hangs on boot when 64bit BAR present
>> >
>>
>> Is PCI_COMMAND_MEMORY set while this is going on?
>
> I think Linux never clears PCI_COMMAND_MEMORY because
> it's buggy in some devices.

Ok.  Then I recommend defining the MSI message area as overlapped with
sufficient priority.  It should probably be a child of the PCI address
space.

The IOAPIC is actually closer to ISA, but again it's sufficient to
move it to the PCI address space.  I doubt its priority matters.



Re: [Qemu-devel] [PATCH RFC] memory: drop _overlap variant

2013-02-19 Thread Michael S. Tsirkin
On Thu, Feb 14, 2013 at 08:23:04PM +0200, Avi Kivity wrote:
> On Thu, Feb 14, 2013 at 8:12 PM, Michael S. Tsirkin  wrote:
> >>
> >> Is there an actual real problem that needs fixing?
> >
> > > Yes. Guests sometimes cause device BARs to temporarily overlap
> > the APIC range during BAR sizing. It works fine on a physical
> > system but fails on KVM since pci has same priority.
> >
> > See the report:
> > [BUG] Guest OS hangs on boot when 64bit BAR present
> >
> 
> Is PCI_COMMAND_MEMORY set while this is going on?

I think Linux never clears PCI_COMMAND_MEMORY because
it's buggy in some devices.



Re: [Qemu-devel] [PATCH RFC] memory: drop _overlap variant

2013-02-14 Thread Avi Kivity
On Thu, Feb 14, 2013 at 8:12 PM, Michael S. Tsirkin  wrote:
>>
>> Is there an actual real problem that needs fixing?
>
> Yes. Guests sometimes cause device BARs to temporarily overlap
> the APIC range during BAR sizing. It works fine on a physical
> system but fails on KVM since pci has same priority.
>
> See the report:
> [BUG] Guest OS hangs on boot when 64bit BAR present
>

Is PCI_COMMAND_MEMORY set while this is going on?



Re: [Qemu-devel] [PATCH RFC] memory: drop _overlap variant

2013-02-14 Thread Michael S. Tsirkin
On Thu, Feb 14, 2013 at 07:02:15PM +0200, Avi Kivity wrote:
> On Thu, Feb 14, 2013 at 6:50 PM, Michael S. Tsirkin  wrote:
> >> > As you see, ioapic at 0xfec0 overlaps pci-hole.
> >> > ioapic is guest programmable in theory - should use _overlap?
> >> > pci-hole is not but can overlap with ioapic.
> >> > So also _overlap?
> >>
> >> It's a bug.  The ioapic is in the pci address space, not the system
> >> address space.  And yes it's overlappable.
> >
> > So you want to put it where? Under pci-hole?
> 
> No, under the pci address space.  Look at the 440fx block diagram.
> 
> > And we'll have to teach all machine types
> > creating pci-hole about it?
> 
> No.
> 
> >
> >> >
> >> > Let's imagine someone writes a guest programmable device for
> >> > ARM. Now we should update all ARM devices from regular to _overlap?
> >>
> >> It's sufficient to update the programmable device.
> >
> > Then the device can be higher priority (works for apic)
> > but not lower priority. Make priority signed?
> 
> Is there an actual real problem that needs fixing?

Yes. Guests sometimes cause device BARs to temporarily overlap
the APIC range during BAR sizing. It works fine on a physical
system but fails on KVM since pci has same priority.

See the report:
[BUG] Guest OS hangs on boot when 64bit BAR present 
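
The temporary overlap during BAR sizing can be illustrated with a small model. This is only a sketch under stated assumptions: it uses a 32-bit memory BAR rather than the 64-bit one in the report, ignores the low flag bits, assumes memory decoding stays enabled throughout, and takes the IOAPIC window to be 0xfec00000 with size 0x1000. The class and function names are hypothetical and are not QEMU or Linux code.

```python
IOAPIC_BASE = 0xFEC00000  # assumed conventional PC IOAPIC address
IOAPIC_SIZE = 0x1000


class MemBar32:
    """Toy 32-bit memory BAR: address bits below `size` are hardwired to 0."""

    def __init__(self, size, base):
        assert size & (size - 1) == 0, "BAR sizes are powers of two"
        self.size = size
        self.value = 0
        self.write(base)

    def write(self, v):
        self.value = v & ~(self.size - 1) & 0xFFFFFFFF

    def read(self):
        return self.value

    def window(self):
        # Half-open [start, end) range the BAR currently decodes.
        return (self.value, self.value + self.size)


def overlaps_ioapic(bar):
    start, end = bar.window()
    return start < IOAPIC_BASE + IOAPIC_SIZE and IOAPIC_BASE < end


def size_bar(bar, observe):
    """Classic sizing probe: save, write all-ones, read back, restore."""
    saved = bar.read()
    bar.write(0xFFFFFFFF)
    observe(bar)  # decoding is still on, so the BAR is live at this address
    probed = bar.read()
    bar.write(saved)
    return (~probed + 1) & 0xFFFFFFFF


seen = []
bar = MemBar32(size=0x02000000, base=0xE0000000)  # 32 MiB BAR, illustrative base
size = size_bar(bar, lambda b: seen.append(overlaps_ioapic(b)))
```

In this model the all-ones write briefly places the BAR at 0xfe000000, so its window covers the IOAPIC until the saved value is restored.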

-- 
MST



Re: [Qemu-devel] [PATCH RFC] memory: drop _overlap variant

2013-02-14 Thread Avi Kivity
On Thu, Feb 14, 2013 at 6:50 PM, Michael S. Tsirkin  wrote:
>> > As you see, ioapic at 0xfec0 overlaps pci-hole.
>> > ioapic is guest programmable in theory - should use _overlap?
>> > pci-hole is not but can overlap with ioapic.
>> > So also _overlap?
>>
>> It's a bug.  The ioapic is in the pci address space, not the system
>> address space.  And yes it's overlappable.
>
> So you want to put it where? Under pci-hole?

No, under the pci address space.  Look at the 440fx block diagram.

> And we'll have to teach all machine types
> creating pci-hole about it?

No.

>
>> >
>> > Let's imagine someone writes a guest programmable device for
>> > ARM. Now we should update all ARM devices from regular to _overlap?
>>
>> It's sufficient to update the programmable device.
>
> Then the device can be higher priority (works for apic)
> but not lower priority. Make priority signed?

Is there an actual real problem that needs fixing?



Re: [Qemu-devel] [PATCH RFC] memory: drop _overlap variant

2013-02-14 Thread Michael S. Tsirkin
On Thu, Feb 14, 2013 at 05:07:02PM +0200, Avi Kivity wrote:
> On Thu, Feb 14, 2013 at 4:40 PM, Michael S. Tsirkin  wrote:
> > On Thu, Feb 14, 2013 at 04:14:39PM +0200, Avi Kivity wrote:
> >
> > But some parents are system created and shared by many devices so children for
> > such have no idea who their siblings are.
> >
> > Please take a look at the typical map in this mail:
> > '[BUG] Guest OS hangs on boot when 64bit BAR present'
> >
> > system overlap 0 pri 0 [0x0 - 0x7fff]
> >  kvmvapic-rom overlap 1 pri 1000 [0xca000 - 0xcd000]
> >  pc.ram overlap 0 pri 0 [0xca000 - 0xcd000]
> >  ++ pc.ram [0xca000 - 0xcd000] is added to view
> >  
> >  smram-region overlap 1 pri 1 [0xa - 0xc]
> >  pci overlap 0 pri 0 [0xa - 0xc]
> >  cirrus-lowmem-container overlap 1 pri 1 [0xa - 0xc]
> >  cirrus-low-memory overlap 0 pri 0 [0xa - 0xc]
> > ++cirrus-low-memory [0xa - 0xc] is added to view
> >  kvm-ioapic overlap 0 pri 0 [0xfec0 - 0xfec01000]
> > ++kvm-ioapic [0xfec0 - 0xfec01000] is added to view
> >  pci-hole64 overlap 0 pri 0 [0x1 - 0x4001]
> >  pci overlap 0 pri 0 [0x1 - 0x4001]
> >  pci-hole overlap 0 pri 0 [0x7d00 - 0x1]
> >  pci overlap 0 pri 0 [0x7d00 - 0x1]
> >  ivshmem-bar2-container overlap 1 pri 1 [0xfe00 - 0x1]
> >  ivshmem.bar2 overlap 0 pri 0 [0xfe00 - 0x1]
> > ++ivshmem.bar2 [0xfe00 - 0xfec0] is added to view
> > ++ivshmem.bar2  [0xfec01000 - 0x1] is added to view
> >
> > As you see, ioapic at 0xfec0 overlaps pci-hole.
> > ioapic is guest programmable in theory - should use _overlap?
> > pci-hole is not but can overlap with ioapic.
> > So also _overlap?
> 
> It's a bug.  The ioapic is in the pci address space, not the system
> address space.  And yes it's overlappable.

So you want to put it where? Under pci-hole?
And we'll have to teach all machine types
creating pci-hole about it?

> >
> > Let's imagine someone writes a guest programmable device for
> > ARM. Now we should update all ARM devices from regular to _overlap?
> 
> It's sufficient to update the programmable device.

Then the device can be higher priority (works for apic)
but not lower priority. Make priority signed?

> >> >
> >> > Non overlapping is not a common case at all.  E.g. with normal PCI
> >> > devices you have no way to know nothing overlaps - addresses are guest
> >> > programmable.
> >>
> >> Non overlapping is mostly useful for embedded platforms.
> >
> > Maybe it should have a longer name like _nonoverlap then?
> > Current API makes people assume _overlap is only for special
> > cases and default should be non overlap.
> 
> The assumption is correct.



Re: [Qemu-devel] [PATCH RFC] memory: drop _overlap variant

2013-02-14 Thread Avi Kivity
On Thu, Feb 14, 2013 at 4:40 PM, Michael S. Tsirkin  wrote:
> On Thu, Feb 14, 2013 at 04:14:39PM +0200, Avi Kivity wrote:
>
> But some parents are system created and shared by many devices so children for
> such have no idea who their siblings are.
>
> Please take a look at the typical map in this mail:
> '[BUG] Guest OS hangs on boot when 64bit BAR present'
>
> system overlap 0 pri 0 [0x0 - 0x7fff]
>  kvmvapic-rom overlap 1 pri 1000 [0xca000 - 0xcd000]
>  pc.ram overlap 0 pri 0 [0xca000 - 0xcd000]
>  ++ pc.ram [0xca000 - 0xcd000] is added to view
>  
>  smram-region overlap 1 pri 1 [0xa - 0xc]
>  pci overlap 0 pri 0 [0xa - 0xc]
>  cirrus-lowmem-container overlap 1 pri 1 [0xa - 0xc]
>  cirrus-low-memory overlap 0 pri 0 [0xa - 0xc]
> ++cirrus-low-memory [0xa - 0xc] is added to view
>  kvm-ioapic overlap 0 pri 0 [0xfec0 - 0xfec01000]
> ++kvm-ioapic [0xfec0 - 0xfec01000] is added to view
>  pci-hole64 overlap 0 pri 0 [0x1 - 0x4001]
>  pci overlap 0 pri 0 [0x1 - 0x4001]
>  pci-hole overlap 0 pri 0 [0x7d00 - 0x1]
>  pci overlap 0 pri 0 [0x7d00 - 0x1]
>  ivshmem-bar2-container overlap 1 pri 1 [0xfe00 - 0x1]
>  ivshmem.bar2 overlap 0 pri 0 [0xfe00 - 0x1]
> ++ivshmem.bar2 [0xfe00 - 0xfec0] is added to view
> ++ivshmem.bar2  [0xfec01000 - 0x1] is added to view
>
> As you see, ioapic at 0xfec0 overlaps pci-hole.
> ioapic is guest programmable in theory - should use _overlap?
> pci-hole is not but can overlap with ioapic.
> So also _overlap?

It's a bug.  The ioapic is in the pci address space, not the system
address space.  And yes it's overlappable.

>
> Let's imagine someone writes a guest programmable device for
> ARM. Now we should update all ARM devices from regular to _overlap?

It's sufficient to update the programmable device.

>> >
>> > Non overlapping is not a common case at all.  E.g. with normal PCI
>> > devices you have no way to know nothing overlaps - addresses are guest
>> > programmable.
>>
>> Non overlapping is mostly useful for embedded platforms.
>
> Maybe it should have a longer name like _nonoverlap then?
> Current API makes people assume _overlap is only for special
> cases and default should be non overlap.

The assumption is correct.



Re: [Qemu-devel] [PATCH RFC] memory: drop _overlap variant

2013-02-14 Thread Michael S. Tsirkin
On Thu, Feb 14, 2013 at 02:34:20PM +, Peter Maydell wrote:
> On 14 February 2013 14:02, Michael S. Tsirkin  wrote:
> > Well that's the status quo. One of the issues is, you have
> > no idea what else uses each priority. With this change,
> > at least you can grep for it.
> 
> No, because most of the code you find will be setting
> priorities for completely irrelevant containers (for
> instance PCI doesn't care at all about priorities used
> by the v7m NVIC).
> 
> > Imagine the specific example: ioapic and pci devices. ioapic has
> > an address within the pci hole but it is not a subregion.
> > If priority has no meaning how would you decide which one
> > to use?
> 
> I don't know about the specifics of the PC's memory layout,
> but *something* has to manage the address space that is
> being set up. I would expect something like:
> 
>  * PCI host controller has a memory region (container) which
>    all the PCI devices are mapped into as per guest programming
>  * ioapic has a memory region
>  * there is another container which contains both these
>    memory regions. The code that controls and sets up that
>    container [which is probably the pc board model] gets to
>    decide priorities, which are purely local to it

This assumes we set up devices in code.
We are trying to move away from that, and have
APIs that let you set up boards from command line.


> (It's possible that at the moment the "another container" is
> the get_system_memory() system address space. If it makes life
> easier you can always invent another container to give you a
> fresh level of indirection.)
> 
> > Also, on a PC many addresses are guest programmable. We need to behave
> > in some defined way if guest programs addresses to something silly.
> 
> Yes, this is the job of the code controlling the container(s)
> into which those memory regions may be mapped.

Some containers don't know what is mapped into them.

> >> If the guest can
> >> program overlap then presumably PCI specifies semantics
> >> for what happens then, and there need to be PCI specific
> >> wrappers that enforce those semantics and they can call
> >> the relevant _overlap functions when mapping things.
> >> In any case this isn't a concern for the PCI *device*,
> >> which can just provide its memory regions. It's a problem
> >> the PCI *host adaptor* has to deal with when it's figuring
> >> out how to map those regions into the container which
> >> corresponds to its area of the address space.
> >
> > Issue is, a PCI device overlapping something else suddenly
> > becomes this something else's problem.
> 
> Nope, because the PCI host controller model should be in
> complete control of the container all the PCI devices live
> in, and it is the thing doing the mapping and unmapping
> so it gets to set priorities and mark things as OK to
> overlap. Also, memory.c permits overlap if either of the
> two memory regions in question is marked as may-overlap;
> they don't both have to be marked.

That's undocumented, isn't it?
And then which one wins?


> >> > We could add a wrapper for MEMORY_PRIO_LOWEST - will that address
> >> > your concern?
> >>
> >> Well, I'm entirely happy with the memory API we have at
> >> the moment, and I'm trying to figure out why you want to
> >> change it...
> >
> > I am guessing your systems all have hardcoded addresses
> > not controlled by guest.
> 
> Nope. omap_gpmc.c for instance has guest programmable subregions;
> it uses a container so the guest's manipulation of these can't
> leak out and cause weird things to happen to other bits of QEMU.
> [I think we don't implement the correct guest-facing behaviour
> when the guest asks for overlapping regions, but we shouldn't
> hit the memory.c overlapping-region issue, or if we do it's
> a bug to be fixed.]
> 
> There's also PCI on the versatilepb, but PCI devices can't
> just appear anywhere, the PCI memory windows are at known
> addresses and the PCI device can't escape from the wrong
> side of the PCI controller.

But there are devices whose addresses can overlap the PCI
window.


> >> >> Maybe we should take the printf() about subregion collisions
> >> >> in memory_region_add_subregion_common() out of the #if 0
> >> >> that it currently sits in?
> >>
> >> > This is just a debugging tool, it won't fix anything.
> >>
> >> It might tell us what bits of code are currently erroneously
> >> mapping regions that overlap without using the _overlap()
> >> function. Then we could fix them.
> 
> > When there is a single guest programmable device,
> > any address can be overlapped by it.
> 
> Do we really have an example of a guest programmable
> device where the *device itself* decides where it lives
> in the address space, rather than the guest being able to
> program a host controller/bus fabric/equivalent thing to
> specify where the device should live, or the device
> effectively negotiating with its bus controller? That
> seems very implausible to me just because hardware itself
> gener

Re: [Qemu-devel] [PATCH RFC] memory: drop _overlap variant

2013-02-14 Thread Michael S. Tsirkin
On Thu, Feb 14, 2013 at 04:14:39PM +0200, Avi Kivity wrote:
> On Thu, Feb 14, 2013 at 3:09 PM, Michael S. Tsirkin  wrote:
> > On Thu, Feb 14, 2013 at 12:56:02PM +, Peter Maydell wrote:
> >> On 14 February 2013 12:45, Michael S. Tsirkin  wrote:
> >> > overlap flag in the region is currently unused, most devices have no
> >> > idea whether their region overlaps with anything, so drop it,
> >> > assume that all regions can overlap and always require priority.
> >>
> >> Devices themselves shouldn't care, for the most part -- they just
> >> provide a memory region and it's their parent that has to map it
> >> and know whether it overlaps or not. Similarly, parents should
> >> generally be in control of the container they're mapping the
> >> memory region into, and know whether it will be an overlapping
> >> map or not.
> >>
> >> > It's also not clear how devices should allocate priorities.
> >>
> >> Up to the parent which controls the region being mapped into.
> >
> > We could just assume same priority as parent but what happens if it
> > has to be different?
> 
> Priority is only considered relative to siblings.  The parent's
> priority is only considered wrt the parent's siblings, not its
> children.

But some parents are system created and shared by many devices so children for
such have no idea who their siblings are.

Please take a look at the typical map in this mail:
'[BUG] Guest OS hangs on boot when 64bit BAR present'

system overlap 0 pri 0 [0x0 - 0x7fff]
 kvmvapic-rom overlap 1 pri 1000 [0xca000 - 0xcd000]
 pc.ram overlap 0 pri 0 [0xca000 - 0xcd000]
 ++ pc.ram [0xca000 - 0xcd000] is added to view
 
 smram-region overlap 1 pri 1 [0xa - 0xc]
 pci overlap 0 pri 0 [0xa - 0xc]
 cirrus-lowmem-container overlap 1 pri 1 [0xa - 0xc]
 cirrus-low-memory overlap 0 pri 0 [0xa - 0xc]
++cirrus-low-memory [0xa - 0xc] is added to view
 kvm-ioapic overlap 0 pri 0 [0xfec0 - 0xfec01000]
++kvm-ioapic [0xfec0 - 0xfec01000] is added to view
 pci-hole64 overlap 0 pri 0 [0x1 - 0x4001]
 pci overlap 0 pri 0 [0x1 - 0x4001]
 pci-hole overlap 0 pri 0 [0x7d00 - 0x1]
 pci overlap 0 pri 0 [0x7d00 - 0x1]
 ivshmem-bar2-container overlap 1 pri 1 [0xfe00 - 0x1]
 ivshmem.bar2 overlap 0 pri 0 [0xfe00 - 0x1]
++ivshmem.bar2 [0xfe00 - 0xfec0] is added to view
++ivshmem.bar2  [0xfec01000 - 0x1] is added to view

As you see, ioapic at 0xfec0 overlaps pci-hole.
ioapic is guest programmable in theory - should use _overlap?
pci-hole is not but can overlap with ioapic.
So also _overlap?
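
The "++ ... is added to view" lines in the map above show ivshmem.bar2 being split around kvm-ioapic. A minimal sketch of that kind of flattening follows, assuming regions arrive already sorted in render order (earlier entries win) and assuming the truncated addresses in the dump stand for 0xfe000000, 0xfec00000, 0xfec01000 and 0x100000000. The `flatten` function is a hypothetical model, not the memory.c implementation.

```python
def flatten(regions):
    """regions: list of (name, start, end) in render order, half-open ranges.

    Returns a sorted list of (start, end, name) pieces in which every address
    is claimed by the first region in render order that covers it.
    """
    view = []
    for name, start, end in regions:
        pieces = [(start, end)]
        # Clip the new region against everything already in the view.
        for vs, ve, _ in view:
            remaining = []
            for ps, pe in pieces:
                if pe <= vs or ps >= ve:
                    remaining.append((ps, pe))  # no intersection
                    continue
                if ps < vs:
                    remaining.append((ps, vs))  # part below the claimed range
                if pe > ve:
                    remaining.append((ve, pe))  # part above the claimed range
            pieces = remaining
        view.extend((ps, pe, name) for ps, pe in pieces)
    return sorted(view)


view = flatten([
    ("kvm-ioapic", 0xFEC00000, 0xFEC01000),    # rendered first, so it wins
    ("ivshmem.bar2", 0xFE000000, 0x100000000),
])
for start, end, name in view:
    print(f"++ {name} [{start:#x} - {end:#x}] is added to view")
```

With these inputs the BAR ends up as two pieces on either side of the IOAPIC page, which is the shape of the dump quoted in this mail.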

Let's imagine someone writes a guest programmable device for
ARM. Now we should update all ARM devices from regular to _overlap?


> > There are also aliases so a region
> > can have multiple parents. Presumably it will have to have
> > different priorities depending on what the parent does?
> 
> The alias region has its own priority
> 
> > Here's a list of instances using priority != 0.
> >
> > hw/armv7m_nvic.c:MEMORY_PRIO_LOW);
> > hw/cirrus_vga.c:MEMORY_PRIO_LOW);
> > hw/cirrus_vga.c:MEMORY_PRIO_LOW);
> > hw/cirrus_vga.c:&s->low_mem_container, MEMORY_PRIO_LOW);
> > hw/kvm/pci-assign.c: &r_dev->mmio, MEMORY_PRIO_LOW);
> > hw/kvmvapic.c:memory_region_add_subregion(as, rom_paddr, &s->rom, MEMORY_PRIO_HIGH);
> > hw/lpc_ich9.c:MEMORY_PRIO_LOW);
> > hw/onenand.c:&s->mapped_ram, MEMORY_PRIO_LOW);
> > hw/pam.c:MEMORY_PRIO_LOW);
> > hw/pc.c:MEMORY_PRIO_LOW);
> > hw/pc_sysfw.c:isa_bios, MEMORY_PRIO_LOW);
> > hw/pc_sysfw.c:isa_bios, MEMORY_PRIO_LOW);
> > hw/pci/pci.c:MEMORY_PRIO_LOW);
> > hw/pci/pci_bridge.c:memory_region_add_subregion(parent_space, base, alias, MEMORY_PRIO_LOW);
> > hw/piix_pci.c:MEMORY_PRIO_LOW);
> > hw/piix_pci.c:&d->rcr_mem, MEMORY_PRIO_LOW);
> > hw/q35.c:&mch->smram_region, MEMORY_PRIO_LOW);
> > hw/vga-isa.c:MEMORY_PRIO_LOW);
> > hw/vga.c:MEMORY_PRIO_MEDIUM);
> > hw/vga.c:vga_io_memory, MEMORY_PRIO_LOW);
> > hw/xen_pt_msi.c:MEMORY_PRIO_MEDIUM); /* Priority: pci default + 1
> >
> > Making priority relative to parent but not the same just seems like a 
> > recip

Re: [Qemu-devel] [PATCH RFC] memory: drop _overlap variant

2013-02-14 Thread Peter Maydell
On 14 February 2013 14:02, Michael S. Tsirkin  wrote:
> Well that's the status quo. One of the issues is, you have
> no idea what else uses each priority. With this change,
> at least you can grep for it.

No, because most of the code you find will be setting
priorities for completely irrelevant containers (for
instance PCI doesn't care at all about priorities used
by the v7m NVIC).

> Imagine the specific example: ioapic and pci devices. ioapic has
> an address within the pci hole but it is not a subregion.
> If priority has no meaning how would you decide which one
> to use?

I don't know about the specifics of the PC's memory layout,
but *something* has to manage the address space that is
being set up. I would expect something like:

 * PCI host controller has a memory region (container) which
   all the PCI devices are mapped into as per guest programming
 * ioapic has a memory region
 * there is another container which contains both these
   memory regions. The code that controls and sets up that
   container [which is probably the pc board model] gets to
   decide priorities, which are purely local to it

(It's possible that at the moment the "another container" is
the get_system_memory() system address space. If it makes life
easier you can always invent another container to give you a
fresh level of indirection.)

> Also, on a PC many addresses are guest programmable. We need to behave
> in some defined way if guest programs addresses to something silly.

Yes, this is the job of the code controlling the container(s)
into which those memory regions may be mapped.

>> If the guest can
>> program overlap then presumably PCI specifies semantics
>> for what happens then, and there need to be PCI specific
>> wrappers that enforce those semantics and they can call
>> the relevant _overlap functions when mapping things.
>> In any case this isn't a concern for the PCI *device*,
>> which can just provide its memory regions. It's a problem
>> the PCI *host adaptor* has to deal with when it's figuring
>> out how to map those regions into the container which
>> corresponds to its area of the address space.
>
> Issue is, a PCI device overlapping something else suddenly
> becomes this something else's problem.

Nope, because the PCI host controller model should be in
complete control of the container all the PCI devices live
in, and it is the thing doing the mapping and unmapping
so it gets to set priorities and mark things as OK to
overlap. Also, memory.c permits overlap if either of the
two memory regions in question is marked as may-overlap;
they don't both have to be marked.
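
The "either side may be marked" rule can be sketched as follows. This is a simplified Python model of the check described above, not the actual memory.c code; the `Sub` type and `collisions` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sub:
    name: str
    offset: int
    size: int
    may_overlap: bool = False


def collisions(existing: List[Sub], new: Sub) -> List[str]:
    """Names of siblings that intersect `new` with neither side opted in."""
    flagged = []
    for other in existing:
        if new.may_overlap or other.may_overlap:
            continue  # one marked region is enough to permit the overlap
        disjoint = (new.offset >= other.offset + other.size
                    or new.offset + new.size <= other.offset)
        if not disjoint:
            flagged.append(other.name)
    return flagged


siblings = [Sub("ram", 0x0, 0x10000), Sub("rom", 0x8000, 0x1000, may_overlap=True)]

plain = Sub("dev", 0x8000, 0x100)                       # not marked
marked = Sub("dev", 0x8000, 0x100, may_overlap=True)    # marked

plain_hits = collisions(siblings, plain)    # collides with "ram" only
marked_hits = collisions(siblings, marked)  # nothing flagged
```

Note that this only answers whether an overlap is acceptable; which region is visible is still decided by priority.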

>> > We could add a wrapper for MEMORY_PRIO_LOWEST - will that address
>> > your concern?
>>
>> Well, I'm entirely happy with the memory API we have at
>> the moment, and I'm trying to figure out why you want to
>> change it...
>
> I am guessing your systems all have hardcoded addresses
> not controlled by guest.

Nope. omap_gpmc.c for instance has guest programmable subregions;
it uses a container so the guest's manipulation of these can't
leak out and cause weird things to happen to other bits of QEMU.
[I think we don't implement the correct guest-facing behaviour
when the guest asks for overlapping regions, but we shouldn't
hit the memory.c overlapping-region issue, or if we do it's
a bug to be fixed.]

There's also PCI on the versatilepb, but PCI devices can't
just appear anywhere, the PCI memory windows are at known
addresses and the PCI device can't escape from the wrong
side of the PCI controller.

>> >> Maybe we should take the printf() about subregion collisions
>> >> in memory_region_add_subregion_common() out of the #if 0
>> >> that it currently sits in?
>>
>> > This is just a debugging tool, it won't fix anything.
>>
>> It might tell us what bits of code are currently erroneously
>> mapping regions that overlap without using the _overlap()
>> function. Then we could fix them.

> When there is a single guest programmable device,
> any address can be overlapped by it.

Do we really have an example of a guest programmable
device where the *device itself* decides where it lives
in the address space, rather than the guest being able to
program a host controller/bus fabric/equivalent thing to
specify where the device should live, or the device
effectively negotiating with its bus controller? That
seems very implausible to me just because hardware itself
generally has some kind of hierarchy of buses and it's not
really possible for a leaf node to make itself appear
anywhere in the hierarchy; all it can do is by agreement
with the thing above it appear at some different address at
the same level.

[of course there are trivial systems with a totally flat
bus but that's just a degenerate case of the above where
there's only one thing (the board) managing a single
layer, and typically those systems have everything at
a fixed address anyhow.]

-- PMM



Re: [Qemu-devel] [PATCH RFC] memory: drop _overlap variant

2013-02-14 Thread Avi Kivity
On Thu, Feb 14, 2013 at 4:02 PM, Michael S. Tsirkin  wrote:
> On Thu, Feb 14, 2013 at 01:22:18PM +, Peter Maydell wrote:
>> On 14 February 2013 13:09, Michael S. Tsirkin  wrote:
>> > On Thu, Feb 14, 2013 at 12:56:02PM +, Peter Maydell wrote:
>> >> Up to the parent which controls the region being mapped into.
>> >
>> > We could just assume same priority as parent
>>
>> Er, no. I mean the code in control of the parent MR sets the
>> priority, when it calls memory_region_add_subregion_overlap().
>>
>> > but what happens if it
>> > has to be different? There are also aliases so a region
>> > can have multiple parents.
>>
>> The alias has its own priority.
>
> Well that's the status quo. One of the issues is, you have
> no idea what else uses each priority. With this change,
> at least you can grep for it.

The question "what priorities do aliases of this region have" is not
an interesting question.  Priority is a local attribute, not an
attribute of the region being prioritized.

>
>> > Presumably it will have to have
>> > different priorities depending on what the parent does?
>> > Here's a list of instances using priority != 0.
>> >
>> > hw/armv7m_nvic.c:MEMORY_PRIO_LOW);
>>
>> So this one I know about, and it's a good example of what
>> I'm talking about. This function sets up a container memory
>> region ("nvic"), and it is in total control of what is
>> mapped into that container. Specifically, it puts in a
>> "nvic_sysregs" background region which covers the whole
>> 0x1000 size of the container (at an implicit priority of
>> zero). It then layers over that an alias of the GIC
>> registers ("nvic-gic") at a specific address and with
>> a priority of 1 so it appears above the background region.
>> Nobody else ever puts anything in this container, so
>> the only thing we care about is that the priority of
>> the nvic-gic region is higher than that of the nvic_sysregs
>> region; and it's clear from the code that we do that.
>> Priority is a local question whose meaning is only relevant
>> within a particular container region, not system-wide, and
>> having system-wide MEMORY_PRIO_ defines obscures that IMHO.
>
> Well that's not how it seems to work, and I don't see how it *could*
> work. Imagine the specific example: ioapic and pci devices. ioapic has
> an address within the pci hole but it is not a subregion.
> If priority has no meaning how would you decide which one
> to use?

Like PMM said.  You look at the semantics of the hardware, and map
that onto the API.  If the pci controller says that BARs hide the
ioapic, then you give them higher priority.  If it says that the
ioapic hides BARs, then that gets higher priority.  If it doesn't say
anything, take your pick (or give them the same priority).

>
> Also, on a PC many addresses are guest programmable. We need to behave
> in some defined way if guest programs addresses to something silly.

That's why _overlap exists.

> The only reason it works sometimes is because some systems
> use fixed addresses which never overlap.

That's why the no overlap API exists.

>
>>
>> >> I definitely don't like making the priority argument mandatory:
>> >> this is just introducing pointless boilerplate for the common
>> >> case where nothing overlaps and you know nothing overlaps.
>> >
>> > Non overlapping is not a common case at all.  E.g. with normal PCI
>> > devices you have no way to know nothing overlaps - addresses are guest
>> > programmable.
>>
>> That means PCI is a special case :-)
>> If the guest can
>> program overlap then presumably PCI specifies semantics
>> for what happens then, and there need to be PCI specific
>> wrappers that enforce those semantics and they can call
>> the relevant _overlap functions when mapping things.
>> In any case this isn't a concern for the PCI *device*,
>> which can just provide its memory regions. It's a problem
>> the PCI *host adaptor* has to deal with when it's figuring
>> out how to map those regions into the container which
>> corresponds to its area of the address space.
>
> Issue is, a PCI device overlapping something else suddenly
> becomes this something else's problem.

It is not a problem at all.

>
>> >> Maybe we should take the printf() about subregion collisions
>> >> in memory_region_add_subregion_common() out of the #if 0
>> >> that it currently sits in?
>>
>> > This is just a debugging tool, it won't fix anything.
>>
>> It might tell us what bits of code are currently erroneously
>> mapping regions that overlap without using the _overlap()
>> function. Then we could fix them.
>>
>> -- PMM
>
> When there is a single guest programmable device,
> any address can be overlapped by it.

No.  Only addresses within the same container.  Other containers work
fine without overlap.

> We could invent rules like 'non overlappable is higher
> priority' but it seems completely arbitrary, a single
> priority is clearer.

It's just noise for the xx% of cases which don't need it.



Re: [Qemu-devel] [PATCH RFC] memory: drop _overlap variant

2013-02-14 Thread Avi Kivity
On Thu, Feb 14, 2013 at 3:09 PM, Michael S. Tsirkin  wrote:
> On Thu, Feb 14, 2013 at 12:56:02PM +, Peter Maydell wrote:
>> On 14 February 2013 12:45, Michael S. Tsirkin  wrote:
>> > overlap flag in the region is currently unused, most devices have no
>> > idea whether their region overlaps with anything, so drop it,
>> > assume that all regions can overlap and always require priority.
>>
>> Devices themselves shouldn't care, for the most part -- they just
>> provide a memory region and it's their parent that has to map it
>> and know whether it overlaps or not. Similarly, parents should
>> generally be in control of the container they're mapping the
>> memory region into, and know whether it will be an overlapping
>> map or not.
>>
>> > It's also not clear how devices should allocate priorities.
>>
>> Up to the parent which controls the region being mapped into.
>
> We could just assume same priority as parent but what happens if it
> has to be different?

Priority is only considered relative to siblings.  The parent's
priority is only considered wrt the parent's siblings, not its
children.
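
To make the sibling-only rule concrete, here is a minimal sketch of a toy
region tree. It is not QEMU's memory API: ToyRegion, toy_resolve() and all
addresses below are made up for illustration, children are assumed to be
listed in descending priority order, and a region with no children is
treated as a leaf.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model only; these names are hypothetical and not part of QEMU. */
typedef struct ToyRegion ToyRegion;
struct ToyRegion {
    const char *name;
    uint64_t offset;             /* offset within the parent container */
    uint64_t size;
    int priority;                /* meaningful only among siblings */
    ToyRegion *const *children;  /* assumed sorted by descending priority */
    size_t nchildren;
};

/* Return the leaf handling 'addr' (relative to 'r'), or NULL for a hole. */
const ToyRegion *toy_resolve(const ToyRegion *r, uint64_t addr)
{
    if (addr >= r->size) {
        return NULL;
    }
    if (r->nchildren == 0) {
        return r;                /* leaf: it handles the access */
    }
    for (size_t i = 0; i < r->nchildren; i++) {
        const ToyRegion *c = r->children[i];
        if (addr >= c->offset && addr - c->offset < c->size) {
            const ToyRegion *hit = toy_resolve(c, addr - c->offset);
            if (hit) {
                return hit;      /* first (highest-priority) sibling wins */
            }
        }
    }
    return NULL;                 /* no child covers this address */
}

/* Example tree: "bar" has priority 100, but only inside "pci". */
ToyRegion toy_bar  = { "bar",  0x100, 0x100,  100, NULL, 0 };
ToyRegion *const toy_pci_kids[] = { &toy_bar };
ToyRegion toy_pci  = { "pci",  0x000, 0x1000, 0,   toy_pci_kids, 1 };
ToyRegion toy_apic = { "apic", 0x100, 0x100,  1,   NULL, 0 };
ToyRegion *const toy_root_kids[] = { &toy_apic, &toy_pci };
ToyRegion toy_root = { "root", 0x000, 0x1000, 0,   toy_root_kids, 2 };
```

In this sketch, toy_resolve(&toy_root, 0x180) lands on "apic": it beats its
sibling "pci" (1 vs 0), and the 100 on "bar" never enters that comparison
because "bar" is not a sibling of "apic".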

> There are also aliases so a region
> can have multiple parents. Presumably it will have to have
> different priorities depending on what the parent does?

The alias region has its own priority.

> Here's a list of instances using priority != 0.
>
> hw/armv7m_nvic.c:MEMORY_PRIO_LOW);
> hw/cirrus_vga.c:MEMORY_PRIO_LOW);
> hw/cirrus_vga.c:MEMORY_PRIO_LOW);
> hw/cirrus_vga.c:&s->low_mem_container, MEMORY_PRIO_LOW);
> hw/kvm/pci-assign.c: &r_dev->mmio, MEMORY_PRIO_LOW);
> hw/kvmvapic.c:memory_region_add_subregion(as, rom_paddr, &s->rom, MEMORY_PRIO_HIGH);
> hw/lpc_ich9.c:MEMORY_PRIO_LOW);
> hw/onenand.c:&s->mapped_ram, MEMORY_PRIO_LOW);
> hw/pam.c:MEMORY_PRIO_LOW);
> hw/pc.c:MEMORY_PRIO_LOW);
> hw/pc_sysfw.c:isa_bios, MEMORY_PRIO_LOW);
> hw/pc_sysfw.c:isa_bios, MEMORY_PRIO_LOW);
> hw/pci/pci.c:MEMORY_PRIO_LOW);
> hw/pci/pci_bridge.c:memory_region_add_subregion(parent_space, base, alias, MEMORY_PRIO_LOW);
> hw/piix_pci.c:MEMORY_PRIO_LOW);
> hw/piix_pci.c:&d->rcr_mem, MEMORY_PRIO_LOW);
> hw/q35.c:&mch->smram_region, MEMORY_PRIO_LOW);
> hw/vga-isa.c:MEMORY_PRIO_LOW);
> hw/vga.c:MEMORY_PRIO_MEDIUM);
> hw/vga.c:vga_io_memory, MEMORY_PRIO_LOW);
> hw/xen_pt_msi.c:MEMORY_PRIO_MEDIUM); /* Priority: pci default + 1
>
> Making priority relative to parent but not the same just seems like a
> recipe for disaster.
>
>> I definitely don't like making the priority argument mandatory:
>> this is just introducing pointless boilerplate for the common
>> case where nothing overlaps and you know nothing overlaps.
>
> Non overlapping is not a common case at all.  E.g. with normal PCI
> devices you have no way to know nothing overlaps - addresses are guest
> programmable.

Non overlapping is mostly useful for embedded platforms.



Re: [Qemu-devel] [PATCH RFC] memory: drop _overlap variant

2013-02-14 Thread Michael S. Tsirkin
On Thu, Feb 14, 2013 at 01:22:18PM +, Peter Maydell wrote:
> On 14 February 2013 13:09, Michael S. Tsirkin wrote:
> > On Thu, Feb 14, 2013 at 12:56:02PM +, Peter Maydell wrote:
> >> Up to the parent which controls the region being mapped into.
> >
> > We could just assume same priority as parent
> 
> Er, no. I mean the code in control of the parent MR sets the
> priority, when it calls memory_region_add_subregion_overlap().
> 
> > but what happens if it
> > has to be different? There are also aliases so a region
> > can have multiple parents.
> 
> The alias has its own priority.

Well that's the status quo. One of the issues is, you have
no idea what else uses each priority. With this change,
at least you can grep for it.

> > Presumably it will have to have
> > different priorities depending on what the parent does?
> > Here's a list of instances using priority != 0.
> >
> > hw/armv7m_nvic.c:MEMORY_PRIO_LOW);
> 
> So this one I know about, and it's a good example of what
> I'm talking about. This function sets up a container memory
> region ("nvic"), and it is in total control of what is
> mapped into that container. Specifically, it puts in a
> "nvic_sysregs" background region which covers the whole
> 0x1000 size of the container (at an implicit priority of
> zero). It then layers over that an alias of the GIC
> registers ("nvic-gic") at a specific address and with
> a priority of 1 so it appears above the background region.
> Nobody else ever puts anything in this container, so
> the only thing we care about is that the priority of
> the nvic-gic region is higher than that of the nvic_sysregs
> region; and it's clear from the code that we do that.
> Priority is a local question whose meaning is only relevant
> within a particular container region, not system-wide,
> having system-wide MEMORY_PRIO_ defines obscures that IMHO.

Well that's not how it seems to work, and I don't see how it *could*
work. Imagine the specific example: ioapic and pci devices. ioapic has
an address within the pci hole but it is not a subregion.
If priority has no meaning how would you decide which one
to use?

Also, on a PC many addresses are guest programmable. We need to behave
in some defined way if guest programs addresses to something silly.

The only reason it works sometimes is because some systems
use fixed addresses which never overlap.

> 
> >> I definitely don't like making the priority argument mandatory:
> >> this is just introducing pointless boilerplate for the common
> >> case where nothing overlaps and you know nothing overlaps.
> >
> > Non overlapping is not a common case at all.  E.g. with normal PCI
> > devices you have no way to know nothing overlaps - addresses are guest
> > programmable.
> 
> That means PCI is a special case :-) If the guest can
> program overlap then presumably PCI specifies semantics
> for what happens then, and there need to be PCI specific
> wrappers that enforce those semantics and they can call
> the relevant _overlap functions when mapping things.
> In any case this isn't a concern for the PCI *device*,
> which can just provide its memory regions. It's a problem
> the PCI *host adaptor* has to deal with when it's figuring
> out how to map those regions into the container which
> corresponds to its area of the address space.

Issue is, a PCI device overlapping something else suddenly
becomes this something else's problem.

> > We could add a wrapper for MEMORY_PRIO_LOWEST - will that address
> > your concern?
> 
> Well, I'm entirely happy with the memory API we have at
> the moment, and I'm trying to figure out why you want to
> change it...

I am guessing your systems all have hardcoded addresses
not controlled by guest.

> >> Maybe we should take the printf() about subregion collisions
> >> in memory_region_add_subregion_common() out of the #if 0
> >> that it currently sits in?
> 
> > This is just a debugging tool, it won't fix anything.
> 
> It might tell us what bits of code are currently erroneously
> mapping regions that overlap without using the _overlap()
> function. Then we could fix them.
> 
> -- PMM

When there is a single guest programmable device,
any address can be overlapped by it.
We could invent rules like 'non overlappable is higher
priority' but it seems completely arbitrary, a single
priority is clearer.

-- 
MST



Re: [Qemu-devel] [PATCH RFC] memory: drop _overlap variant

2013-02-14 Thread Peter Maydell
On 14 February 2013 13:09, Michael S. Tsirkin wrote:
> On Thu, Feb 14, 2013 at 12:56:02PM +, Peter Maydell wrote:
>> Up to the parent which controls the region being mapped into.
>
> We could just assume same priority as parent

Er, no. I mean the code in control of the parent MR sets the
priority, when it calls memory_region_add_subregion_overlap().

> but what happens if it
> has to be different? There are also aliases so a region
> can have multiple parents.

The alias has its own priority.

> Presumably it will have to have
> different priorities depending on what the parent does?
> Here's a list of instances using priority != 0.
>
> hw/armv7m_nvic.c:MEMORY_PRIO_LOW);

So this one I know about, and it's a good example of what
I'm talking about. This function sets up a container memory
region ("nvic"), and it is in total control of what is
mapped into that container. Specifically, it puts in a
"nvic_sysregs" background region which covers the whole
0x1000 size of the container (at an implicit priority of
zero). It then layers over that an alias of the GIC
registers ("nvic-gic") at a specific address and with
a priority of 1 so it appears above the background region.
Nobody else ever puts anything in this container, so
the only thing we care about is that the priority of
the nvic-gic region is higher than that of the nvic_sysregs
region; and it's clear from the code that we do that.
Priority is a local question whose meaning is only relevant
within a particular container region, not system-wide, and
having system-wide MEMORY_PRIO_ defines obscures that IMHO.
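
As a rough illustration of that layout, the sketch below models one
container holding a background region and a higher-priority overlay. It is
a toy, not the armv7m_nvic.c code: ToyMapping and toy_pick() are
hypothetical names, and the overlay's offset and size are placeholders,
since only the 0x1000 container size and the priorities 0 and 1 are stated
above.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of mappings inside a single container; not QEMU code. */
typedef struct {
    const char *name;
    uint64_t offset;    /* offset within the container */
    uint64_t size;
    int priority;       /* local to this container only */
} ToyMapping;

const ToyMapping toy_nvic_container[] = {
    { "nvic_sysregs", 0x000, 0x1000, 0 },  /* background, whole container */
    { "nvic-gic",     0x100, 0xc00,  1 },  /* overlay; offset/size assumed */
};

/* Return the highest-priority mapping covering 'addr', or NULL if none. */
const ToyMapping *toy_pick(const ToyMapping *maps, size_t n, uint64_t addr)
{
    const ToyMapping *best = NULL;

    for (size_t i = 0; i < n; i++) {
        const ToyMapping *m = &maps[i];
        if (addr < m->offset || addr - m->offset >= m->size) {
            continue;               /* this mapping does not cover addr */
        }
        if (best == NULL || m->priority > best->priority) {
            best = m;
        }
    }
    return best;
}
```

The only comparison that ever happens is 1 against 0 inside this one
container, which is why a local constant reads more clearly here than a
system-wide define would.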

>> I definitely don't like making the priority argument mandatory:
>> this is just introducing pointless boilerplate for the common
>> case where nothing overlaps and you know nothing overlaps.
>
> Non overlapping is not a common case at all.  E.g. with normal PCI
> devices you have no way to know nothing overlaps - addresses are guest
> programmable.

That means PCI is a special case :-) If the guest can
program overlap then presumably PCI specifies semantics
for what happens then, and there need to be PCI specific
wrappers that enforce those semantics and they can call
the relevant _overlap functions when mapping things.
In any case this isn't a concern for the PCI *device*,
which can just provide its memory regions. It's a problem
the PCI *host adaptor* has to deal with when it's figuring
out how to map those regions into the container which
corresponds to its area of the address space.

> We could add a wrapper for MEMORY_PRIO_LOWEST - will that address
> your concern?

Well, I'm entirely happy with the memory API we have at
the moment, and I'm trying to figure out why you want to
change it...

>> Maybe we should take the printf() about subregion collisions
>> in memory_region_add_subregion_common() out of the #if 0
>> that it currently sits in?

> This is just a debugging tool, it won't fix anything.

It might tell us what bits of code are currently erroneously
mapping regions that overlap without using the _overlap()
function. Then we could fix them.
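
A minimal sketch of what such a collision report could look like, written
against a toy sibling list and not against the real
memory_region_add_subregion_common(). ToySub, toy_ranges_collide() and
toy_report_collisions() are hypothetical names; the may_overlap flag stands
in for "was mapped with the _overlap() function".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Toy description of one subregion in a container; not QEMU code. */
typedef struct {
    const char *name;
    uint64_t offset;
    uint64_t size;
    bool may_overlap;   /* true if mapped via an _overlap()-style call */
} ToySub;

/* Do the half-open ranges [off, off + size) intersect? Overflow-safe. */
bool toy_ranges_collide(uint64_t a_off, uint64_t a_size,
                        uint64_t b_off, uint64_t b_size)
{
    if (a_size == 0 || b_size == 0) {
        return false;
    }
    return a_off < b_off ? (b_off - a_off < a_size)
                         : (a_off - b_off < b_size);
}

/*
 * Warn about each existing sibling that 'new_sub' collides with when
 * neither side was declared overlappable. Returns the number of warnings.
 */
size_t toy_report_collisions(const ToySub *siblings, size_t n,
                             const ToySub *new_sub)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++) {
        const ToySub *other = &siblings[i];
        if (new_sub->may_overlap || other->may_overlap) {
            continue;   /* overlap was requested explicitly */
        }
        if (toy_ranges_collide(new_sub->offset, new_sub->size,
                               other->offset, other->size)) {
            fprintf(stderr, "warning: subregion collision %s vs %s\n",
                    new_sub->name, other->name);
            hits++;
        }
    }
    return hits;
}
```

A check like this only reports; it does not decide which region wins, so
it helps find the callers to fix without changing behaviour.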

-- PMM



Re: [Qemu-devel] [PATCH RFC] memory: drop _overlap variant

2013-02-14 Thread Michael S. Tsirkin
On Thu, Feb 14, 2013 at 12:56:02PM +, Peter Maydell wrote:
> On 14 February 2013 12:45, Michael S. Tsirkin wrote:
> > overlap flag in the region is currently unused, most devices have no
> > idea whether their region overlaps with anything, so drop it,
> > assume that all regions can overlap and always require priority.
> 
> Devices themselves shouldn't care, for the most part -- they just
> provide a memory region and it's their parent that has to map it
> and know whether it overlaps or not. Similarly, parents should
> generally be in control of the container they're mapping the
> memory region into, and know whether it will be an overlapping
> map or not.
> 
> > It's also not clear how should devices allocate priorities.
> 
> Up to the parent which controls the region being mapped into.

We could just assume same priority as parent but what happens if it
has to be different? There are also aliases so a region
can have multiple parents. Presumably it will have to have
different priorities depending on what the parent does?
Here's a list of instances using priority != 0.

hw/armv7m_nvic.c:MEMORY_PRIO_LOW);
hw/cirrus_vga.c:MEMORY_PRIO_LOW);
hw/cirrus_vga.c:MEMORY_PRIO_LOW);
hw/cirrus_vga.c:&s->low_mem_container, MEMORY_PRIO_LOW);
hw/kvm/pci-assign.c: &r_dev->mmio, MEMORY_PRIO_LOW);
hw/kvmvapic.c:memory_region_add_subregion(as, rom_paddr, &s->rom, MEMORY_PRIO_HIGH);
hw/lpc_ich9.c:MEMORY_PRIO_LOW);
hw/onenand.c:&s->mapped_ram, MEMORY_PRIO_LOW);
hw/pam.c:MEMORY_PRIO_LOW);
hw/pc.c:MEMORY_PRIO_LOW);
hw/pc_sysfw.c:isa_bios, MEMORY_PRIO_LOW);
hw/pc_sysfw.c:isa_bios, MEMORY_PRIO_LOW);
hw/pci/pci.c:MEMORY_PRIO_LOW);
hw/pci/pci_bridge.c:memory_region_add_subregion(parent_space, base, alias, MEMORY_PRIO_LOW);
hw/piix_pci.c:MEMORY_PRIO_LOW);
hw/piix_pci.c:&d->rcr_mem, MEMORY_PRIO_LOW);
hw/q35.c:&mch->smram_region, MEMORY_PRIO_LOW);
hw/vga-isa.c:MEMORY_PRIO_LOW);
hw/vga.c:MEMORY_PRIO_MEDIUM);
hw/vga.c:vga_io_memory, MEMORY_PRIO_LOW);
hw/xen_pt_msi.c:MEMORY_PRIO_MEDIUM); /* Priority: pci default + 1

Making priority relative to parent but not the same just seems like a
recipe for disaster.

> I definitely don't like making the priority argument mandatory:
> this is just introducing pointless boilerplate for the common
> case where nothing overlaps and you know nothing overlaps.

Non overlapping is not a common case at all.  E.g. with normal PCI
devices you have no way to know nothing overlaps - addresses are guest
programmable.

See also recent discussion about 64 bit BARs.

We could add a wrapper for MEMORY_PRIO_LOWEST - will that address
your concern?

> Maybe we should take the printf() about subregion collisions
> in memory_region_add_subregion_common() out of the #if 0
> that it currently sits in?
> 
> -- PMM

This is just a debugging tool, it won't fix anything.

-- 
MST



Re: [Qemu-devel] [PATCH RFC] memory: drop _overlap variant

2013-02-14 Thread Peter Maydell
On 14 February 2013 12:45, Michael S. Tsirkin wrote:
> overlap flag in the region is currently unused, most devices have no
> idea whether their region overlaps with anything, so drop it,
> assume that all regions can overlap and always require priority.

Devices themselves shouldn't care, for the most part -- they just
provide a memory region and it's their parent that has to map it
and know whether it overlaps or not. Similarly, parents should
generally be in control of the container they're mapping the
memory region into, and know whether it will be an overlapping
map or not.

> It's also not clear how should devices allocate priorities.

Up to the parent which controls the region being mapped into.
I definitely don't like making the priority argument mandatory:
this is just introducing pointless boilerplate for the common
case where nothing overlaps and you know nothing overlaps.

Maybe we should take the printf() about subregion collisions
in memory_region_add_subregion_common() out of the #if 0
that it currently sits in?

-- PMM