Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2019-01-17 Thread Yu Zhang
On Tue, Jan 15, 2019 at 03:13:14PM +0800, Yu Zhang wrote:
> On Fri, Dec 28, 2018 at 11:29:41PM -0200, Eduardo Habkost wrote:
> > On Fri, Dec 28, 2018 at 10:32:59AM +0800, Yu Zhang wrote:
> > > On Thu, Dec 27, 2018 at 01:14:11PM -0200, Eduardo Habkost wrote:
> > > > On Wed, Dec 26, 2018 at 01:30:00PM +0800, Yu Zhang wrote:
> > > > > On Tue, Dec 25, 2018 at 11:56:19AM -0500, Michael S. Tsirkin wrote:
> > > > > > On Sat, Dec 22, 2018 at 09:11:26AM +0800, Yu Zhang wrote:
> > > > > > > On Fri, Dec 21, 2018 at 02:02:28PM -0500, Michael S. Tsirkin 
> > > > > > > wrote:
> > > > > > > > On Sat, Dec 22, 2018 at 01:37:58AM +0800, Yu Zhang wrote:
> > > > > > > > > On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin 
> > > > > > > > > wrote:
> > > > > > > > > > On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote:
> > > > > > > > > > > Well, my understanding of the vt-d spec is that the
> > > > > > > > > > > address limitation in DMAR refers to the same concept
> > > > > > > > > > > as CPUID.MAXPHYSADDR. I do not think there's any
> > > > > > > > > > > difference in the native scenario. :)
> > > > > > > > > > 
> > > > > > > > > > I think native machines exist on which the two values are 
> > > > > > > > > > different.
> > > > > > > > > > Is that true?
> > > > > > > > > 
> > > > > > > > > I think the answer is no. My understanding is that the
> > > > > > > > > HAW (host address width) is the maximum physical address
> > > > > > > > > width a CPU can detect (via cpuid.0x80000008).
> > > > > > > > > 
> > > > > > > > > I agree there are some addresses the CPU does not touch, but 
> > > > > > > > > they are still in
> > > > > > > > > the physical address space, and there's only one physical 
> > > > > > > > > address space...
> > > > > > > > > 
> > > > > > > > > B.R.
> > > > > > > > > Yu
> > > > > > > > 
> > > > > > > > Ouch, I thought we were talking about the virtual address
> > > > > > > > size. I think I did have a box where VT-d's virtual address
> > > > > > > > size was smaller than the CPU's.
> > > > > > > > For the physical one, we just need to make it as big as the
> > > > > > > > max supported memory, right?
> > > > > > > 
> > > > > > > Well, my understanding of the physical one is the maximum
> > > > > > > physical address width. Sorry, that explanation was unclear...
> > > > > > > I mean, it's not just about the max supported memory; it also
> > > > > > > covers MMIO. It should be detectable from CPUID or from ACPI's
> > > > > > > DMAR table, rather than calculated from the max memory size.
> > > > > > > One common use of this value is to tell the paging structure
> > > > > > > entries (the CPU's or the IOMMU's) which bits shall be
> > > > > > > reserved. There are also some registers, e.g. the APIC base
> > > > > > > register, whose contents are physical addresses and therefore
> > > > > > > need to follow a similar requirement for the reserved bits.
> > > > > > > 
> > > > > > > So I think the correct direction might be to define this property 
> > > > > > > in the
> > > > > > > machine status level, instead of the CPU level. Is this 
> > > > > > > reasonable to you?
> > > > > > 
> > > > > > At that level yes. But isn't this already specified by 
> > > > > > "pci-hole64-end"?
> > > > > 
> > > > > But this value is set by guest firmware? Will PCI hotplug change this 
> > > > > address?
> > > > > 
> > > > > @Eduardo, do you have any plan to calculate the phys-bits by 
> > > > > "pci-hole64-end"?
> > > > > Or introduce another property, say "max-phys-bits" in machine status?
> > > > 
> > > > I agree it may make sense to make the machine code control
> > > > phys-bits instead of the CPU object.  A machine property sounds
> > > > like the simplest solution.
> > > > 
> > > > But I don't think we can have a meaningful discussion about
> > > > implementation if we don't agree about the command-line
> > > > interface.  We must decide what will happen to the CPU and iommu
> > > > physical address width in cases like:
> > > 
> > > Thanks, Eduardo.
> > > 
> > > What about just using "-machine phys-bits=52", and removing the
> > > "phys-bits" CPU option?
> > 
> > Maybe we can deprecate it, but we can't remove it immediately.
> > We still need to decide what to do on the cases below, while the
> > option is still available.
> 
> I saw that the ACPI DMAR table is initialized in acpi_build(), which is
> called by pc_machine_done(). I guess this happens after the initialization
> of the vCPUs and the vIOMMU.
> 
> So I am wondering, instead of moving "phys-bits" from X86CPU into the
> MachineState, maybe we could:
> 
> 1> Define a "phys_bits" field in MachineState or PCMachineState (not sure
> which one is more suitable).
> 
> 2> Set ms->phys_bits in x86_cpu_realizefn().
> 
> 3> Since DMAR is created after vCPU creation, we can build DMAR table
> with ms->phys_bits.
> 
> 4> Also, we can reset the hardware address width for the vIOMMU (and the
> vtd_paging_entry_rsvd_field array) in pc_machine_done(), based on the value
> of ms->phys_bits, or from the ACPI DMAR table (from the spec's point of view,
> the address width limitation of the IOMMU shall come from DMAR, yet I have
> not figured out any simple approach to probe that ACPI property).

Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2019-01-14 Thread Yu Zhang
On Fri, Dec 28, 2018 at 11:29:41PM -0200, Eduardo Habkost wrote:
> On Fri, Dec 28, 2018 at 10:32:59AM +0800, Yu Zhang wrote:
> > On Thu, Dec 27, 2018 at 01:14:11PM -0200, Eduardo Habkost wrote:
> > > On Wed, Dec 26, 2018 at 01:30:00PM +0800, Yu Zhang wrote:
> > > > On Tue, Dec 25, 2018 at 11:56:19AM -0500, Michael S. Tsirkin wrote:
> > > > > On Sat, Dec 22, 2018 at 09:11:26AM +0800, Yu Zhang wrote:
> > > > > > On Fri, Dec 21, 2018 at 02:02:28PM -0500, Michael S. Tsirkin wrote:
> > > > > > > On Sat, Dec 22, 2018 at 01:37:58AM +0800, Yu Zhang wrote:
> > > > > > > > On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin 
> > > > > > > > wrote:
> > > > > > > > > On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote:
> > > > > > > > > > Well, my understanding of the vt-d spec is that the
> > > > > > > > > > address limitation in DMAR refers to the same concept
> > > > > > > > > > as CPUID.MAXPHYSADDR. I do not think there's any
> > > > > > > > > > difference in the native scenario. :)
> > > > > > > > > 
> > > > > > > > > I think native machines exist on which the two values are 
> > > > > > > > > different.
> > > > > > > > > Is that true?
> > > > > > > > 
> > > > > > > > I think the answer is no. My understanding is that the
> > > > > > > > HAW (host address width) is the maximum physical address
> > > > > > > > width a CPU can detect (via cpuid.0x80000008).
> > > > > > > > 
> > > > > > > > I agree there are some addresses the CPU does not touch, but 
> > > > > > > > they are still in
> > > > > > > > the physical address space, and there's only one physical 
> > > > > > > > address space...
> > > > > > > > 
> > > > > > > > B.R.
> > > > > > > > Yu
> > > > > > > 
> > > > > > > Ouch, I thought we were talking about the virtual address
> > > > > > > size. I think I did have a box where VT-d's virtual address
> > > > > > > size was smaller than the CPU's.
> > > > > > > For the physical one, we just need to make it as big as the
> > > > > > > max supported memory, right?
> > > > > > 
> > > > > > Well, my understanding of the physical one is the maximum
> > > > > > physical address width. Sorry, that explanation was unclear...
> > > > > > I mean, it's not just about the max supported memory; it also
> > > > > > covers MMIO. It should be detectable from CPUID or from ACPI's
> > > > > > DMAR table, rather than calculated from the max memory size.
> > > > > > One common use of this value is to tell the paging structure
> > > > > > entries (the CPU's or the IOMMU's) which bits shall be
> > > > > > reserved. There are also some registers, e.g. the APIC base
> > > > > > register, whose contents are physical addresses and therefore
> > > > > > need to follow a similar requirement for the reserved bits.
> > > > > > 
> > > > > > So I think the correct direction might be to define this property 
> > > > > > in the
> > > > > > machine status level, instead of the CPU level. Is this reasonable 
> > > > > > to you?
> > > > > 
> > > > > At that level yes. But isn't this already specified by 
> > > > > "pci-hole64-end"?
> > > > 
> > > > But this value is set by guest firmware? Will PCI hotplug change this 
> > > > address?
> > > > 
> > > > @Eduardo, do you have any plan to calculate the phys-bits by 
> > > > "pci-hole64-end"?
> > > > Or introduce another property, say "max-phys-bits" in machine status?
> > > 
> > > I agree it may make sense to make the machine code control
> > > phys-bits instead of the CPU object.  A machine property sounds
> > > like the simplest solution.
> > > 
> > > But I don't think we can have a meaningful discussion about
> > > implementation if we don't agree about the command-line
> > > interface.  We must decide what will happen to the CPU and iommu
> > > physical address width in cases like:
> > 
> > Thanks, Eduardo.
> > 
> > What about just using "-machine phys-bits=52", and removing the
> > "phys-bits" CPU option?
> 
> Maybe we can deprecate it, but we can't remove it immediately.
> We still need to decide what to do on the cases below, while the
> option is still available.

I saw that the ACPI DMAR table is initialized in acpi_build(), which is
called by pc_machine_done(). I guess this happens after the initialization
of the vCPUs and the vIOMMU.

So I am wondering, instead of moving "phys-bits" from X86CPU into the
MachineState, maybe we could:

1> Define a "phys_bits" field in MachineState or PCMachineState (not sure
which one is more suitable).

2> Set ms->phys_bits in x86_cpu_realizefn().

3> Since DMAR is created after vCPU creation, we can build DMAR table
with ms->phys_bits.

4> Also, we can reset the hardware address width for the vIOMMU (and the
vtd_paging_entry_rsvd_field array) in pc_machine_done(), based on the value
of ms->phys_bits, or from the ACPI DMAR table (from the spec's point of view,
the address width limitation of the IOMMU shall come from DMAR, yet I have
not figured out any simple approach to probe that ACPI property).

This way, 

Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-28 Thread Eduardo Habkost
On Fri, Dec 28, 2018 at 10:32:59AM +0800, Yu Zhang wrote:
> On Thu, Dec 27, 2018 at 01:14:11PM -0200, Eduardo Habkost wrote:
> > On Wed, Dec 26, 2018 at 01:30:00PM +0800, Yu Zhang wrote:
> > > On Tue, Dec 25, 2018 at 11:56:19AM -0500, Michael S. Tsirkin wrote:
> > > > On Sat, Dec 22, 2018 at 09:11:26AM +0800, Yu Zhang wrote:
> > > > > On Fri, Dec 21, 2018 at 02:02:28PM -0500, Michael S. Tsirkin wrote:
> > > > > > On Sat, Dec 22, 2018 at 01:37:58AM +0800, Yu Zhang wrote:
> > > > > > > On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin 
> > > > > > > wrote:
> > > > > > > > On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote:
> > > > > > > > > Well, my understanding of the vt-d spec is that the
> > > > > > > > > address limitation in DMAR refers to the same concept
> > > > > > > > > as CPUID.MAXPHYSADDR. I do not think there's any
> > > > > > > > > difference in the native scenario. :)
> > > > > > > > 
> > > > > > > > I think native machines exist on which the two values are 
> > > > > > > > different.
> > > > > > > > Is that true?
> > > > > > > 
> > > > > > > I think the answer is no. My understanding is that the
> > > > > > > HAW (host address width) is the maximum physical address
> > > > > > > width a CPU can detect (via cpuid.0x80000008).
> > > > > > > 
> > > > > > > I agree there are some addresses the CPU does not touch, but they 
> > > > > > > are still in
> > > > > > > the physical address space, and there's only one physical address 
> > > > > > > space...
> > > > > > > 
> > > > > > > B.R.
> > > > > > > Yu
> > > > > > 
> > > > > > Ouch, I thought we were talking about the virtual address
> > > > > > size. I think I did have a box where VT-d's virtual address
> > > > > > size was smaller than the CPU's.
> > > > > > For the physical one, we just need to make it as big as the
> > > > > > max supported memory, right?
> > > > > 
> > > > > Well, my understanding of the physical one is the maximum
> > > > > physical address width. Sorry, that explanation was unclear...
> > > > > I mean, it's not just about the max supported memory; it also
> > > > > covers MMIO. It should be detectable from CPUID or from ACPI's
> > > > > DMAR table, rather than calculated from the max memory size.
> > > > > One common use of this value is to tell the paging structure
> > > > > entries (the CPU's or the IOMMU's) which bits shall be
> > > > > reserved. There are also some registers, e.g. the APIC base
> > > > > register, whose contents are physical addresses and therefore
> > > > > need to follow a similar requirement for the reserved bits.
> > > > > 
> > > > > So I think the correct direction might be to define this property in 
> > > > > the
> > > > > machine status level, instead of the CPU level. Is this reasonable to 
> > > > > you?
> > > > 
> > > > At that level yes. But isn't this already specified by "pci-hole64-end"?
> > > 
> > > But this value is set by guest firmware? Will PCI hotplug change this 
> > > address?
> > > 
> > > @Eduardo, do you have any plan to calculate the phys-bits by 
> > > "pci-hole64-end"?
> > > Or introduce another property, say "max-phys-bits" in machine status?
> > 
> > I agree it may make sense to make the machine code control
> > phys-bits instead of the CPU object.  A machine property sounds
> > like the simplest solution.
> > 
> > But I don't think we can have a meaningful discussion about
> > implementation if we don't agree about the command-line
> > interface.  We must decide what will happen to the CPU and iommu
> > physical address width in cases like:
> 
> Thanks, Eduardo.
> 
> What about just using "-machine phys-bits=52", and removing the
> "phys-bits" CPU option?

Maybe we can deprecate it, but we can't remove it immediately.
We still need to decide what to do on the cases below, while the
option is still available.

> 
> > 
> >   $QEMU -device intel-iommu
> >   $QEMU -cpu ...,phys-bits=50 -device intel-iommu
> >   $QEMU -cpu ...,host-phys-bits=on -device intel-iommu
> >   $QEMU -machine phys-bits=50 -device intel-iommu
> >   $QEMU -machine phys-bits=50 -cpu ...,phys-bits=48 -device intel-iommu
> > 
> > -- 
> > Eduardo
> > 
> 
> B.R.
> Yu

-- 
Eduardo



Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-28 Thread Igor Mammedov
On Thu, 27 Dec 2018 12:54:02 -0200
Eduardo Habkost  wrote:

> On Fri, Dec 21, 2018 at 03:13:25PM +0100, Igor Mammedov wrote:
> > On Thu, 20 Dec 2018 19:18:01 -0200
> > Eduardo Habkost  wrote:
> > 
> > > On Wed, Dec 19, 2018 at 11:40:37AM +0100, Igor Mammedov wrote:
> > > > On Wed, 19 Dec 2018 10:57:17 +0800
> > > > Yu Zhang  wrote:
> > > >   
> > > > > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote:  
> > > > > > On Tue, 18 Dec 2018 17:27:23 +0800
> > > > > > Yu Zhang  wrote:
> > > > > > 
> > > > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote:
> > > > > > > > On Wed, 12 Dec 2018 21:05:38 +0800
> > > > > > > > Yu Zhang  wrote:
> > > > > > > > 
> > > > > > > > > Currently, vIOMMU uses the value of the IOVA address
> > > > > > > > > width, instead of the host address width (HAW), to
> > > > > > > > > calculate the number of reserved bits in data structures
> > > > > > > > > such as root entries, context entries, and entries of the
> > > > > > > > > DMA paging structures, etc.
> > > > > > > > > 
> > > > > > > > > However, the values of the IOVA address width and of the
> > > > > > > > > HAW may not be equal. For example, a 48-bit IOVA can only
> > > > > > > > > be mapped to host addresses no wider than 46 bits. Using
> > > > > > > > > 48, instead of 46, to calculate the reserved bits may
> > > > > > > > > result in an invalid IOVA being accepted.
> > > > > > > > > 
> > > > > > > > > To fix this, a new field - haw_bits is introduced in struct 
> > > > > > > > > IntelIOMMUState,
> > > > > > > > > whose value is initialized based on the maximum physical 
> > > > > > > > > address set to
> > > > > > > > > guest CPU.
> > > > > > > > 
> > > > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are 
> > > > > > > > > renamed
> > > > > > > > > to clarify.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Yu Zhang 
> > > > > > > > > Reviewed-by: Peter Xu 
> > > > > > > > > ---
> > > > > > > > [...]
> > > > > > > > 
> > > > > > > > > @@ -3100,6 +3104,8 @@ static void 
> > > > > > > > > vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier 
> > > > > > > > > *n)
> > > > > > > > >  static void vtd_init(IntelIOMMUState *s)
> > > > > > > > >  {
> > > > > > > > >  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> > > > > > > > > +CPUState *cs = first_cpu;
> > > > > > > > > +X86CPU *cpu = X86_CPU(cs);
> > > > > > > > >  
> > > > > > > > >  memset(s->csr, 0, DMAR_REG_SIZE);
> > > > > > > > >  memset(s->wmask, 0, DMAR_REG_SIZE);
> > > > > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState 
> > > > > > > > > *s)
> > > > > > > > >  s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
> > > > > > > > >   VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
> > > > > > > > >   VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> > > > > > > > > -if (s->aw_bits == VTD_HOST_AW_48BIT) {
> > > > > > > > > +if (s->aw_bits == VTD_AW_48BIT) {
> > > > > > > > >  s->cap |= VTD_CAP_SAGAW_48bit;
> > > > > > > > >  }
> > > > > > > > >  s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> > > > > > > > > +s->haw_bits = cpu->phys_bits;
> > > > > > > > Is it possible to avoid accessing CPU fields directly or cpu 
> > > > > > > > altogether
> > > > > > > > and set phys_bits when iommu is created?
> > > > > > > 
> > > > > > > Thanks for your comments, Igor.
> > > > > > > 
> > > > > > > Well, I guess you would prefer not to query the CPU
> > > > > > > capabilities while deciding the vIOMMU features. But to me,
> > > > > > > they are not that unrelated. :)
> > > > > > > 
> > > > > > > Here the hardware address width in vt-d, and the one in 
> > > > > > > cpuid.MAXPHYSADDR
> > > > > > > are referring to the same concept. In VM, both are the maximum 
> > > > > > > guest physical
> > > > > > > address width. If we do not check the CPU field here, we will 
> > > > > > > still have to
> > > > > > > check the CPU field in other places such as build_dmar_q35(), and 
> > > > > > > reset the
> > > > > > > s->haw_bits again.
> > > > > > > 
> > > > > > > Is this explanation convincing enough? :)
> > > > > > current build_dmar_q35() doesn't do it; it's all new code in
> > > > > > this series that contains an unacceptable direct access from one
> > > > > > device (iommu) to another (cpu). The proper way would be for the
> > > > > > owner of the iommu to fetch the limits from somewhere and set
> > > > > > the values during iommu creation.
> > > > > 
> > > > > Well, current build_dmar_q35() doesn't do it because it is using
> > > > > the incorrect value. :) According to the spec, the host address
> > > > > width is the maximum physical address width, yet the current
> > > > > implementation uses the DMA address width. To me, this is not
> > > > > only wrong but also insecure. On this point, I think we all agree
> > > > > this needs to be fixed.

Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-27 Thread Yu Zhang
On Thu, Dec 27, 2018 at 01:14:11PM -0200, Eduardo Habkost wrote:
> On Wed, Dec 26, 2018 at 01:30:00PM +0800, Yu Zhang wrote:
> > On Tue, Dec 25, 2018 at 11:56:19AM -0500, Michael S. Tsirkin wrote:
> > > On Sat, Dec 22, 2018 at 09:11:26AM +0800, Yu Zhang wrote:
> > > > On Fri, Dec 21, 2018 at 02:02:28PM -0500, Michael S. Tsirkin wrote:
> > > > > On Sat, Dec 22, 2018 at 01:37:58AM +0800, Yu Zhang wrote:
> > > > > > On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin wrote:
> > > > > > > On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote:
> > > > > > > > Well, my understanding of the vt-d spec is that the
> > > > > > > > address limitation in DMAR refers to the same concept
> > > > > > > > as CPUID.MAXPHYSADDR. I do not think there's any
> > > > > > > > difference in the native scenario. :)
> > > > > > > 
> > > > > > > I think native machines exist on which the two values are 
> > > > > > > different.
> > > > > > > Is that true?
> > > > > > 
> > > > > > I think the answer is no. My understanding is that the
> > > > > > HAW (host address width) is the maximum physical address
> > > > > > width a CPU can detect (via cpuid.0x80000008).
> > > > > > 
> > > > > > I agree there are some addresses the CPU does not touch, but they 
> > > > > > are still in
> > > > > > the physical address space, and there's only one physical address 
> > > > > > space...
> > > > > > 
> > > > > > B.R.
> > > > > > Yu
> > > > > 
> > > > > Ouch, I thought we were talking about the virtual address
> > > > > size. I think I did have a box where VT-d's virtual address
> > > > > size was smaller than the CPU's.
> > > > > For the physical one, we just need to make it as big as the
> > > > > max supported memory, right?
> > > > 
> > > > Well, my understanding of the physical one is the maximum
> > > > physical address width. Sorry, that explanation was unclear...
> > > > I mean, it's not just about the max supported memory; it also
> > > > covers MMIO. It should be detectable from CPUID or from ACPI's
> > > > DMAR table, rather than calculated from the max memory size.
> > > > One common use of this value is to tell the paging structure
> > > > entries (the CPU's or the IOMMU's) which bits shall be
> > > > reserved. There are also some registers, e.g. the APIC base
> > > > register, whose contents are physical addresses and therefore
> > > > need to follow a similar requirement for the reserved bits.
> > > > 
> > > > So I think the correct direction might be to define this property in the
> > > > machine status level, instead of the CPU level. Is this reasonable to 
> > > > you?
> > > 
> > > At that level yes. But isn't this already specified by "pci-hole64-end"?
> > 
> > But this value is set by guest firmware? Will PCI hotplug change this 
> > address?
> > 
> > @Eduardo, do you have any plan to calculate the phys-bits by 
> > "pci-hole64-end"?
> > Or introduce another property, say "max-phys-bits" in machine status?
> 
> I agree it may make sense to make the machine code control
> phys-bits instead of the CPU object.  A machine property sounds
> like the simplest solution.
> 
> But I don't think we can have a meaningful discussion about
> implementation if we don't agree about the command-line
> interface.  We must decide what will happen to the CPU and iommu
> physical address width in cases like:

Thanks, Eduardo.

What about just using "-machine phys-bits=52", and removing the
"phys-bits" CPU option?

> 
>   $QEMU -device intel-iommu
>   $QEMU -cpu ...,phys-bits=50 -device intel-iommu
>   $QEMU -cpu ...,host-phys-bits=on -device intel-iommu
>   $QEMU -machine phys-bits=50 -device intel-iommu
>   $QEMU -machine phys-bits=50 -cpu ...,phys-bits=48 -device intel-iommu
> 
> -- 
> Eduardo
> 

B.R.
Yu



Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-27 Thread Eduardo Habkost
On Wed, Dec 26, 2018 at 01:30:00PM +0800, Yu Zhang wrote:
> On Tue, Dec 25, 2018 at 11:56:19AM -0500, Michael S. Tsirkin wrote:
> > On Sat, Dec 22, 2018 at 09:11:26AM +0800, Yu Zhang wrote:
> > > On Fri, Dec 21, 2018 at 02:02:28PM -0500, Michael S. Tsirkin wrote:
> > > > On Sat, Dec 22, 2018 at 01:37:58AM +0800, Yu Zhang wrote:
> > > > > On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin wrote:
> > > > > > On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote:
> > > > > > > Well, my understanding of the vt-d spec is that the
> > > > > > > address limitation in DMAR refers to the same concept
> > > > > > > as CPUID.MAXPHYSADDR. I do not think there's any
> > > > > > > difference in the native scenario. :)
> > > > > > 
> > > > > > I think native machines exist on which the two values are different.
> > > > > > Is that true?
> > > > > 
> > > > > I think the answer is no. My understanding is that the
> > > > > HAW (host address width) is the maximum physical address
> > > > > width a CPU can detect (via cpuid.0x80000008).
> > > > > 
> > > > > I agree there are some addresses the CPU does not touch, but they are 
> > > > > still in
> > > > > the physical address space, and there's only one physical address 
> > > > > space...
> > > > > 
> > > > > B.R.
> > > > > Yu
> > > > 
> > > > Ouch, I thought we were talking about the virtual address
> > > > size. I think I did have a box where VT-d's virtual address
> > > > size was smaller than the CPU's.
> > > > For the physical one, we just need to make it as big as the
> > > > max supported memory, right?
> > > 
> > > Well, my understanding of the physical one is the maximum
> > > physical address width. Sorry, that explanation was unclear...
> > > I mean, it's not just about the max supported memory; it also
> > > covers MMIO. It should be detectable from CPUID or from ACPI's
> > > DMAR table, rather than calculated from the max memory size.
> > > One common use of this value is to tell the paging structure
> > > entries (the CPU's or the IOMMU's) which bits shall be
> > > reserved. There are also some registers, e.g. the APIC base
> > > register, whose contents are physical addresses and therefore
> > > need to follow a similar requirement for the reserved bits.
> > > 
> > > So I think the correct direction might be to define this property in the
> > > machine status level, instead of the CPU level. Is this reasonable to you?
> > 
> > At that level yes. But isn't this already specified by "pci-hole64-end"?
> 
> But this value is set by guest firmware? Will PCI hotplug change this address?
> 
> @Eduardo, do you have any plan to calculate the phys-bits by "pci-hole64-end"?
> Or introduce another property, say "max-phys-bits" in machine status?

I agree it may make sense to make the machine code control
phys-bits instead of the CPU object.  A machine property sounds
like the simplest solution.

But I don't think we can have a meaningful discussion about
implementation if we don't agree about the command-line
interface.  We must decide what will happen to the CPU and iommu
physical address width in cases like:

  $QEMU -device intel-iommu
  $QEMU -cpu ...,phys-bits=50 -device intel-iommu
  $QEMU -cpu ...,host-phys-bits=on -device intel-iommu
  $QEMU -machine phys-bits=50 -device intel-iommu
  $QEMU -machine phys-bits=50 -cpu ...,phys-bits=48 -device intel-iommu

-- 
Eduardo



Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-27 Thread Eduardo Habkost
On Fri, Dec 21, 2018 at 03:13:25PM +0100, Igor Mammedov wrote:
> On Thu, 20 Dec 2018 19:18:01 -0200
> Eduardo Habkost  wrote:
> 
> > On Wed, Dec 19, 2018 at 11:40:37AM +0100, Igor Mammedov wrote:
> > > On Wed, 19 Dec 2018 10:57:17 +0800
> > > Yu Zhang  wrote:
> > >   
> > > > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote:  
> > > > > On Tue, 18 Dec 2018 17:27:23 +0800
> > > > > Yu Zhang  wrote:
> > > > > 
> > > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote:
> > > > > > > On Wed, 12 Dec 2018 21:05:38 +0800
> > > > > > > Yu Zhang  wrote:
> > > > > > > 
> > > > > > > > Currently, vIOMMU uses the value of the IOVA address
> > > > > > > > width, instead of the host address width (HAW), to
> > > > > > > > calculate the number of reserved bits in data structures
> > > > > > > > such as root entries, context entries, and entries of the
> > > > > > > > DMA paging structures, etc.
> > > > > > > > 
> > > > > > > > However, the values of the IOVA address width and of the
> > > > > > > > HAW may not be equal. For example, a 48-bit IOVA can only
> > > > > > > > be mapped to host addresses no wider than 46 bits. Using
> > > > > > > > 48, instead of 46, to calculate the reserved bits may
> > > > > > > > result in an invalid IOVA being accepted.
> > > > > > > > 
> > > > > > > > To fix this, a new field - haw_bits is introduced in struct 
> > > > > > > > IntelIOMMUState,
> > > > > > > > whose value is initialized based on the maximum physical 
> > > > > > > > address set to
> > > > > > > > guest CPU.
> > > > > > > 
> > > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> > > > > > > > to clarify.
> > > > > > > > 
> > > > > > > > Signed-off-by: Yu Zhang 
> > > > > > > > Reviewed-by: Peter Xu 
> > > > > > > > ---
> > > > > > > [...]
> > > > > > > 
> > > > > > > > @@ -3100,6 +3104,8 @@ static void 
> > > > > > > > vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
> > > > > > > >  static void vtd_init(IntelIOMMUState *s)
> > > > > > > >  {
> > > > > > > >  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> > > > > > > > +CPUState *cs = first_cpu;
> > > > > > > > +X86CPU *cpu = X86_CPU(cs);
> > > > > > > >  
> > > > > > > >  memset(s->csr, 0, DMAR_REG_SIZE);
> > > > > > > >  memset(s->wmask, 0, DMAR_REG_SIZE);
> > > > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s)
> > > > > > > >  s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
> > > > > > > >   VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
> > > > > > > >   VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> > > > > > > > -if (s->aw_bits == VTD_HOST_AW_48BIT) {
> > > > > > > > +if (s->aw_bits == VTD_AW_48BIT) {
> > > > > > > >  s->cap |= VTD_CAP_SAGAW_48bit;
> > > > > > > >  }
> > > > > > > >  s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> > > > > > > > +s->haw_bits = cpu->phys_bits;
> > > > > > > Is it possible to avoid accessing CPU fields directly or cpu 
> > > > > > > altogether
> > > > > > > and set phys_bits when iommu is created?
> > > > > > 
> > > > > > Thanks for your comments, Igor.
> > > > > > 
> > > > > > Well, I guess you would prefer not to query the CPU
> > > > > > capabilities while deciding the vIOMMU features. But to me,
> > > > > > they are not that unrelated. :)
> > > > > > 
> > > > > > Here the hardware address width in vt-d, and the one in 
> > > > > > cpuid.MAXPHYSADDR
> > > > > > are referring to the same concept. In VM, both are the maximum 
> > > > > > guest physical
> > > > > > address width. If we do not check the CPU field here, we will still 
> > > > > > have to
> > > > > > check the CPU field in other places such as build_dmar_q35(), and 
> > > > > > reset the
> > > > > > s->haw_bits again.
> > > > > > 
> > > > > > Is this explanation convincing enough? :)
> > > > > current build_dmar_q35() doesn't do it; it's all new code in
> > > > > this series that contains an unacceptable direct access from one
> > > > > device (iommu) to another (cpu). The proper way would be for the
> > > > > owner of the iommu to fetch the limits from somewhere and set
> > > > > the values during iommu creation.
> > > > 
> > > > Well, current build_dmar_q35() doesn't do it because it is using
> > > > the incorrect value. :) According to the spec, the host address
> > > > width is the maximum physical address width, yet the current
> > > > implementation uses the DMA address width. To me, this is not
> > > > only wrong but also insecure. On this point, I think we all agree
> > > > this needs to be fixed.
> > > > 
> > > > As to how to fix it (should we query the CPU fields?), I still do
> > > > not understand why that is not acceptable. :)
> > > > 
> > > > I had thought of other approaches before, yet I did not choose:
> > > >   
> > > > 1> Introduce a new parameter, say, 

Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-25 Thread Yu Zhang
On Tue, Dec 25, 2018 at 11:56:19AM -0500, Michael S. Tsirkin wrote:
> On Sat, Dec 22, 2018 at 09:11:26AM +0800, Yu Zhang wrote:
> > On Fri, Dec 21, 2018 at 02:02:28PM -0500, Michael S. Tsirkin wrote:
> > > On Sat, Dec 22, 2018 at 01:37:58AM +0800, Yu Zhang wrote:
> > > > On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin wrote:
> > > > > On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote:
> > > > > > Well, my understanding of the vt-d spec is that the address
> > > > > > limitation in DMAR refers to the same concept as CPUID.MAXPHYSADDR.
> > > > > > I do not think there's any difference in the native scenario. :)
> > > > > 
> > > > > I think native machines exist on which the two values are different.
> > > > > Is that true?
> > > > 
> > > > I think the answer is no. My understanding is that HAW (host address
> > > > width) is the maximum physical address width a CPU can detect (via
> > > > CPUID.0x80000008).
> > > > 
> > > > I agree there are some addresses the CPU does not touch, but they are 
> > > > still in
> > > > the physical address space, and there's only one physical address 
> > > > space...
> > > > 
> > > > B.R.
> > > > Yu
> > > 
> > > Ouch, I thought we were talking about the virtual address size.
> > > I think I did have a box where VTD's virtual address size was
> > > smaller than the CPU's.
> > > For the physical one - we just need to make it as big as the max
> > > supported memory, right?
> > 
> > Well, my understanding of the physical one is the maximum physical
> > address width. Sorry, that explanation sounds like nonsense... I mean,
> > it's not just about the max supported memory; it also covers MMIO. It
> > shall be detectable from CPUID, or from ACPI's DMAR table, instead of
> > being calculated from the max memory size. One common usage of this value
> > is to tell the paging structure entries (the CPU's or the IOMMU's) which
> > bits shall be reserved. There are also some registers, e.g. the APIC base
> > register, whose contents are physical addresses and therefore also need
> > to follow a similar requirement for the reserved bits.
> > 
> > So I think the correct direction might be to define this property at the
> > machine state level, instead of the CPU level. Is this reasonable to you?
> 
> At that level yes. But isn't this already specified by "pci-hole64-end"?

But isn't this value set by guest firmware? And will PCI hotplug change this
address?

@Eduardo, do you have any plan to calculate the phys-bits from
"pci-hole64-end"? Or to introduce another property, say "max-phys-bits", in
the machine state?

> 
> 
> 
> > > 
> > > -- 
> > > MST
> > 
> > B.R.
> > Yu
> 

B.R.
Yu



Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-25 Thread Michael S. Tsirkin
On Sat, Dec 22, 2018 at 09:11:26AM +0800, Yu Zhang wrote:
> On Fri, Dec 21, 2018 at 02:02:28PM -0500, Michael S. Tsirkin wrote:
> > On Sat, Dec 22, 2018 at 01:37:58AM +0800, Yu Zhang wrote:
> > > On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin wrote:
> > > > On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote:
> > > > > Well, my understanding of the vt-d spec is that the address
> > > > > limitation in DMAR refers to the same concept as CPUID.MAXPHYSADDR.
> > > > > I do not think there's any difference in the native scenario. :)
> > > > 
> > > > I think native machines exist on which the two values are different.
> > > > Is that true?
> > > 
> > > I think the answer is no. My understanding is that HAW (host address
> > > width) is the maximum physical address width a CPU can detect (via
> > > CPUID.0x80000008).
> > > 
> > > I agree there are some addresses the CPU does not touch, but they are 
> > > still in
> > > the physical address space, and there's only one physical address space...
> > > 
> > > B.R.
> > > Yu
> > 
> > Ouch, I thought we were talking about the virtual address size.
> > I think I did have a box where VTD's virtual address size was
> > smaller than the CPU's.
> > For the physical one - we just need to make it as big as the max
> > supported memory, right?
> 
> Well, my understanding of the physical one is the maximum physical address
> width. Sorry, that explanation sounds like nonsense... I mean, it's not
> just about the max supported memory; it also covers MMIO. It shall be
> detectable from CPUID, or from ACPI's DMAR table, instead of being
> calculated from the max memory size. One common usage of this value is to
> tell the paging structure entries (the CPU's or the IOMMU's) which bits
> shall be reserved. There are also some registers, e.g. the APIC base
> register, whose contents are physical addresses and therefore also need to
> follow a similar requirement for the reserved bits.
> 
> So I think the correct direction might be to define this property at the
> machine state level, instead of the CPU level. Is this reasonable to you?

At that level yes. But isn't this already specified by "pci-hole64-end"?



> > 
> > -- 
> > MST
> 
> B.R.
> Yu



Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-21 Thread Yu Zhang
On Fri, Dec 21, 2018 at 02:02:28PM -0500, Michael S. Tsirkin wrote:
> On Sat, Dec 22, 2018 at 01:37:58AM +0800, Yu Zhang wrote:
> > On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin wrote:
> > > On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote:
> > > > Well, my understanding of the vt-d spec is that the address
> > > > limitation in DMAR refers to the same concept as CPUID.MAXPHYSADDR.
> > > > I do not think there's any difference in the native scenario. :)
> > > 
> > > I think native machines exist on which the two values are different.
> > > Is that true?
> > 
> > I think the answer is no. My understanding is that HAW (host address
> > width) is the maximum physical address width a CPU can detect (via
> > CPUID.0x80000008).
> > 
> > I agree there are some addresses the CPU does not touch, but they are still 
> > in
> > the physical address space, and there's only one physical address space...
> > 
> > B.R.
> > Yu
> 
> Ouch, I thought we were talking about the virtual address size.
> I think I did have a box where VTD's virtual address size was
> smaller than the CPU's.
> For the physical one - we just need to make it as big as the max
> supported memory, right?

Well, my understanding of the physical one is the maximum physical address
width. Sorry, that explanation sounds like nonsense... I mean, it's not just
about the max supported memory; it also covers MMIO. It shall be detectable
from CPUID, or from ACPI's DMAR table, instead of being calculated from the
max memory size. One common usage of this value is to tell the paging
structure entries (the CPU's or the IOMMU's) which bits shall be reserved.
There are also some registers, e.g. the APIC base register, whose contents
are physical addresses and therefore also need to follow a similar
requirement for the reserved bits.

So I think the correct direction might be to define this property at the
machine state level, instead of the CPU level. Is this reasonable to you?

> 
> -- 
> MST

B.R.
Yu



Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-21 Thread Eduardo Habkost
On Fri, Dec 21, 2018 at 02:02:28PM -0500, Michael S. Tsirkin wrote:
> On Sat, Dec 22, 2018 at 01:37:58AM +0800, Yu Zhang wrote:
> > On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin wrote:
> > > On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote:
> > > > Well, my understanding of the vt-d spec is that the address
> > > > limitation in DMAR refers to the same concept as CPUID.MAXPHYSADDR.
> > > > I do not think there's any difference in the native scenario. :)
> > > 
> > > I think native machines exist on which the two values are different.
> > > Is that true?
> > 
> > I think the answer is no. My understanding is that HAW (host address
> > width) is the maximum physical address width a CPU can detect (via
> > CPUID.0x80000008).
> > 
> > I agree there are some addresses the CPU does not touch, but they are still 
> > in
> > the physical address space, and there's only one physical address space...
> > 
> > B.R.
> > Yu
> 
> Ouch, I thought we were talking about the virtual address size.
> I think I did have a box where VTD's virtual address size was
> smaller than the CPU's.
> For the physical one - we just need to make it as big as the max
> supported memory, right?

What exactly do you mean by "max supported memory"?

-- 
Eduardo



Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-21 Thread Michael S. Tsirkin
On Sat, Dec 22, 2018 at 01:37:58AM +0800, Yu Zhang wrote:
> On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin wrote:
> > On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote:
> > > Well, my understanding of the vt-d spec is that the address limitation
> > > in DMAR refers to the same concept as CPUID.MAXPHYSADDR. I do not think
> > > there's any difference in the native scenario. :)
> > 
> > I think native machines exist on which the two values are different.
> > Is that true?
> 
> I think the answer is no. My understanding is that HAW (host address width)
> is the maximum physical address width a CPU can detect (via CPUID.0x80000008).
> 
> I agree there are some addresses the CPU does not touch, but they are still in
> the physical address space, and there's only one physical address space...
> 
> B.R.
> Yu

Ouch, I thought we were talking about the virtual address size.
I think I did have a box where VTD's virtual address size was
smaller than the CPU's.
For the physical one - we just need to make it as big as the max
supported memory, right?

-- 
MST



Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-21 Thread Yu Zhang
On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin wrote:
> On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote:
> > Well, my understanding of the vt-d spec is that the address limitation in
> > DMAR refers to the same concept as CPUID.MAXPHYSADDR. I do not think
> > there's any difference in the native scenario. :)
> 
> I think native machines exist on which the two values are different.
> Is that true?

I think the answer is no. My understanding is that HAW (host address width) is
the maximum physical address width a CPU can detect (via CPUID.0x80000008).

I agree there are some addresses the CPU does not touch, but they are still in
the physical address space, and there's only one physical address space...

B.R.
Yu



Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-21 Thread Michael S. Tsirkin
On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote:
> Well, my understanding of the vt-d spec is that the address limitation in
> DMAR refers to the same concept as CPUID.MAXPHYSADDR. I do not think
> there's any difference in the native scenario. :)

I think native machines exist on which the two values are different.
Is that true?



Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-21 Thread Yu Zhang
On Fri, Dec 21, 2018 at 03:13:25PM +0100, Igor Mammedov wrote:
> On Thu, 20 Dec 2018 19:18:01 -0200
> Eduardo Habkost  wrote:
> 
> > On Wed, Dec 19, 2018 at 11:40:37AM +0100, Igor Mammedov wrote:
> > > On Wed, 19 Dec 2018 10:57:17 +0800
> > > Yu Zhang  wrote:
> > >   
> > > > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote:  
> > > > > On Tue, 18 Dec 2018 17:27:23 +0800
> > > > > Yu Zhang  wrote:
> > > > > 
> > > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote:
> > > > > > > On Wed, 12 Dec 2018 21:05:38 +0800
> > > > > > > Yu Zhang  wrote:
> > > > > > > 
> > > > > > > > Currently, vIOMMU is using the value of IOVA address width, 
> > > > > > > > instead of
> > > > > > > > the host address width(HAW) to calculate the number of reserved 
> > > > > > > > bits in
> > > > > > > > data structures such as root entries, context entries, and 
> > > > > > > > entries of
> > > > > > > > DMA paging structures etc.
> > > > > > > > 
> > > > > > > > However, the values of the IOVA address width and of the HAW
> > > > > > > > may not be equal. For example, a 48-bit IOVA can only be
> > > > > > > > mapped to host addresses no wider than 46 bits. Using 48
> > > > > > > > instead of 46 to calculate the reserved bits may result in an
> > > > > > > > invalid IOVA being accepted.
> > > > > > > > 
> > > > > > > > To fix this, a new field - haw_bits is introduced in struct 
> > > > > > > > IntelIOMMUState,
> > > > > > > > whose value is initialized based on the maximum physical 
> > > > > > > > address set to
> > > > > > > > guest CPU.
> > > > > > > 
> > > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> > > > > > > > to clarify.
> > > > > > > > 
> > > > > > > > Signed-off-by: Yu Zhang 
> > > > > > > > Reviewed-by: Peter Xu 
> > > > > > > > ---
> > > > > > > [...]
> > > > > > > 
> > > > > > > > @@ -3100,6 +3104,8 @@ static void 
> > > > > > > > vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
> > > > > > > >  static void vtd_init(IntelIOMMUState *s)
> > > > > > > >  {
> > > > > > > >  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> > > > > > > > +CPUState *cs = first_cpu;
> > > > > > > > +X86CPU *cpu = X86_CPU(cs);
> > > > > > > >  
> > > > > > > >  memset(s->csr, 0, DMAR_REG_SIZE);
> > > > > > > >  memset(s->wmask, 0, DMAR_REG_SIZE);
> > > > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s)
> > > > > > > >  s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
> > > > > > > >   VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
> > > > > > > >   VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> > > > > > > > -if (s->aw_bits == VTD_HOST_AW_48BIT) {
> > > > > > > > +if (s->aw_bits == VTD_AW_48BIT) {
> > > > > > > >  s->cap |= VTD_CAP_SAGAW_48bit;
> > > > > > > >  }
> > > > > > > >  s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> > > > > > > > +s->haw_bits = cpu->phys_bits;
> > > > > > > Is it possible to avoid accessing CPU fields directly or cpu 
> > > > > > > altogether
> > > > > > > and set phys_bits when iommu is created?
> > > > > > 
> > > > > > Thanks for your comments, Igor.
> > > > > > 
> > > > > > Well, I guess you prefer not to query the CPU capabilities while
> > > > > > deciding the vIOMMU features. But to me, they are not that
> > > > > > irrelevant. :)
> > > > > > 
> > > > > > Here the hardware address width in vt-d and the one in
> > > > > > cpuid.MAXPHYSADDR refer to the same concept. In a VM, both are the
> > > > > > maximum guest physical address width. If we do not check the CPU
> > > > > > field here, we will still have to check the CPU field in other
> > > > > > places such as build_dmar_q35(), and reset s->haw_bits again.
> > > > > > 
> > > > > > Is this explanation convincing enough? :)
> > > > > current build_dmar_q35() doesn't do it; it's all new code in this
> > > > > series that contains unacceptable direct access from one device
> > > > > (iommu) to another (cpu).
> > > > > The proper way would be for the owner of the iommu to fetch the
> > > > > limits from somewhere and set the values during iommu creation.
> > > > 
> > > > Well, current build_dmar_q35() doesn't do it because it is using the
> > > > incorrect value. :)
> > > > According to the spec, the host address width is the maximum physical
> > > > address width, yet the current implementation is using the DMA address
> > > > width. To me, this is not only wrong, but also insecure. On this
> > > > point, I think we all agree it needs to be fixed.
> > > > 
> > > > As to how to fix it - whether we should query the cpu fields, I still
> > > > do not understand why this is not acceptable. :)
> > > > 
> > > > I had thought of other approaches before, yet I did not choose them:
> > > >   
> > > > 1> Introduce a new parameter, say, 

Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-21 Thread Igor Mammedov
On Thu, 20 Dec 2018 19:18:01 -0200
Eduardo Habkost  wrote:

> On Wed, Dec 19, 2018 at 11:40:37AM +0100, Igor Mammedov wrote:
> > On Wed, 19 Dec 2018 10:57:17 +0800
> > Yu Zhang  wrote:
> >   
> > > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote:  
> > > > On Tue, 18 Dec 2018 17:27:23 +0800
> > > > Yu Zhang  wrote:
> > > > 
> > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote:
> > > > > > On Wed, 12 Dec 2018 21:05:38 +0800
> > > > > > Yu Zhang  wrote:
> > > > > > 
> > > > > > > Currently, vIOMMU is using the value of IOVA address width, 
> > > > > > > instead of
> > > > > > > the host address width(HAW) to calculate the number of reserved 
> > > > > > > bits in
> > > > > > > data structures such as root entries, context entries, and 
> > > > > > > entries of
> > > > > > > DMA paging structures etc.
> > > > > > > 
> > > > > > > However, the values of the IOVA address width and of the HAW
> > > > > > > may not be equal. For example, a 48-bit IOVA can only be mapped
> > > > > > > to host addresses no wider than 46 bits. Using 48 instead of 46
> > > > > > > to calculate the reserved bits may result in an invalid IOVA
> > > > > > > being accepted.
> > > > > > > 
> > > > > > > To fix this, a new field - haw_bits is introduced in struct 
> > > > > > > IntelIOMMUState,
> > > > > > > whose value is initialized based on the maximum physical address 
> > > > > > > set to
> > > > > > > guest CPU.
> > > > > > 
> > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> > > > > > > to clarify.
> > > > > > > 
> > > > > > > Signed-off-by: Yu Zhang 
> > > > > > > Reviewed-by: Peter Xu 
> > > > > > > ---
> > > > > > [...]
> > > > > > 
> > > > > > > @@ -3100,6 +3104,8 @@ static void 
> > > > > > > vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
> > > > > > >  static void vtd_init(IntelIOMMUState *s)
> > > > > > >  {
> > > > > > >  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> > > > > > > +CPUState *cs = first_cpu;
> > > > > > > +X86CPU *cpu = X86_CPU(cs);
> > > > > > >  
> > > > > > >  memset(s->csr, 0, DMAR_REG_SIZE);
> > > > > > >  memset(s->wmask, 0, DMAR_REG_SIZE);
> > > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s)
> > > > > > >  s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
> > > > > > >   VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
> > > > > > >   VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> > > > > > > -if (s->aw_bits == VTD_HOST_AW_48BIT) {
> > > > > > > +if (s->aw_bits == VTD_AW_48BIT) {
> > > > > > >  s->cap |= VTD_CAP_SAGAW_48bit;
> > > > > > >  }
> > > > > > >  s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> > > > > > > +s->haw_bits = cpu->phys_bits;
> > > > > > Is it possible to avoid accessing CPU fields directly or cpu 
> > > > > > altogether
> > > > > > and set phys_bits when iommu is created?
> > > > > 
> > > > > Thanks for your comments, Igor.
> > > > > 
> > > > > Well, I guess you prefer not to query the CPU capabilities while
> > > > > deciding the vIOMMU features. But to me, they are not that
> > > > > irrelevant. :)
> > > > > 
> > > > > Here the hardware address width in vt-d and the one in
> > > > > cpuid.MAXPHYSADDR refer to the same concept. In a VM, both are the
> > > > > maximum guest physical address width. If we do not check the CPU
> > > > > field here, we will still have to check the CPU field in other
> > > > > places such as build_dmar_q35(), and reset s->haw_bits again.
> > > > > 
> > > > > Is this explanation convincing enough? :)
> > > > current build_dmar_q35() doesn't do it; it's all new code in this
> > > > series that contains unacceptable direct access from one device
> > > > (iommu) to another (cpu).
> > > > The proper way would be for the owner of the iommu to fetch the
> > > > limits from somewhere and set the values during iommu creation.
> > > 
> > > Well, current build_dmar_q35() doesn't do it because it is using the
> > > incorrect value. :)
> > > According to the spec, the host address width is the maximum physical
> > > address width, yet the current implementation is using the DMA address
> > > width. To me, this is not only wrong, but also insecure. On this point,
> > > I think we all agree it needs to be fixed.
> > > 
> > > As to how to fix it - whether we should query the cpu fields, I still
> > > do not understand why this is not acceptable. :)
> > > 
> > > I had thought of other approaches before, yet I did not choose them:
> > >   
> > > 1> Introduce a new parameter, say, "x-haw-bits", which is used for the
> > > iommu to limit its physical address width (similar to "x-aw-bits" for
> > > the IOVA). But should we check this parameter against "phys-bits" or
> > > not? And what if it is set to something different from "phys-bits"?
> > >   
> > > 

Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-20 Thread Eduardo Habkost
On Wed, Dec 19, 2018 at 11:40:37AM +0100, Igor Mammedov wrote:
> On Wed, 19 Dec 2018 10:57:17 +0800
> Yu Zhang  wrote:
> 
> > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote:
> > > On Tue, 18 Dec 2018 17:27:23 +0800
> > > Yu Zhang  wrote:
> > >   
> > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote:  
> > > > > On Wed, 12 Dec 2018 21:05:38 +0800
> > > > > Yu Zhang  wrote:
> > > > >   
> > > > > > Currently, vIOMMU is using the value of IOVA address width, instead 
> > > > > > of
> > > > > > the host address width(HAW) to calculate the number of reserved 
> > > > > > bits in
> > > > > > data structures such as root entries, context entries, and entries 
> > > > > > of
> > > > > > DMA paging structures etc.
> > > > > > 
> > > > > > However, the values of the IOVA address width and of the HAW may
> > > > > > not be equal. For example, a 48-bit IOVA can only be mapped to
> > > > > > host addresses no wider than 46 bits. Using 48 instead of 46 to
> > > > > > calculate the reserved bits may result in an invalid IOVA being
> > > > > > accepted.
> > > > > > 
> > > > > > To fix this, a new field - haw_bits is introduced in struct 
> > > > > > IntelIOMMUState,
> > > > > > whose value is initialized based on the maximum physical address 
> > > > > > set to
> > > > > > guest CPU.  
> > > > >   
> > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> > > > > > to clarify.
> > > > > > 
> > > > > > Signed-off-by: Yu Zhang 
> > > > > > Reviewed-by: Peter Xu 
> > > > > > ---  
> > > > > [...]
> > > > >   
> > > > > > @@ -3100,6 +3104,8 @@ static void 
> > > > > > vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
> > > > > >  static void vtd_init(IntelIOMMUState *s)
> > > > > >  {
> > > > > >  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> > > > > > +CPUState *cs = first_cpu;
> > > > > > +X86CPU *cpu = X86_CPU(cs);
> > > > > >  
> > > > > >  memset(s->csr, 0, DMAR_REG_SIZE);
> > > > > >  memset(s->wmask, 0, DMAR_REG_SIZE);
> > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s)
> > > > > >  s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
> > > > > >   VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
> > > > > >   VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> > > > > > -if (s->aw_bits == VTD_HOST_AW_48BIT) {
> > > > > > +if (s->aw_bits == VTD_AW_48BIT) {
> > > > > >  s->cap |= VTD_CAP_SAGAW_48bit;
> > > > > >  }
> > > > > >  s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> > > > > > +s->haw_bits = cpu->phys_bits;  
> > > > > Is it possible to avoid accessing CPU fields directly or cpu 
> > > > > altogether
> > > > > and set phys_bits when iommu is created?  
> > > > 
> > > > Thanks for your comments, Igor.
> > > > 
> > > > Well, I guess you prefer not to query the CPU capabilities while
> > > > deciding the vIOMMU features. But to me, they are not that
> > > > irrelevant. :)
> > > > 
> > > > Here the hardware address width in vt-d and the one in
> > > > cpuid.MAXPHYSADDR refer to the same concept. In a VM, both are the
> > > > maximum guest physical address width. If we do not check the CPU
> > > > field here, we will still have to check the CPU field in other places
> > > > such as build_dmar_q35(), and reset s->haw_bits again.
> > > > 
> > > > Is this explanation convincing enough? :)  
> > > current build_dmar_q35() doesn't do it; it's all new code in this
> > > series that contains unacceptable direct access from one device (iommu)
> > > to another (cpu).
> > > The proper way would be for the owner of the iommu to fetch the limits
> > > from somewhere and set the values during iommu creation.
> > 
> > Well, current build_dmar_q35() doesn't do it because it is using the
> > incorrect value. :)
> > According to the spec, the host address width is the maximum physical
> > address width, yet the current implementation is using the DMA address
> > width. To me, this is not only wrong, but also insecure. On this point, I
> > think we all agree it needs to be fixed.
> > 
> > As to how to fix it - whether we should query the cpu fields, I still do
> > not understand why this is not acceptable. :)
> > 
> > I had thought of other approaches before, yet I did not choose them:
> > 
> > 1> Introduce a new parameter, say, "x-haw-bits", which is used for the
> > iommu to limit its physical address width (similar to "x-aw-bits" for the
> > IOVA). But should we check this parameter against "phys-bits" or not? And
> > what if it is set to something different from "phys-bits"?
> > 
> > 2> Another choice I had thought of is to query the physical iommu. I
> > abandoned this idea because my understanding is that the vIOMMU is not a
> > passthrough device; it is emulated.
> 
> > So Igor, may I ask why you think checking against the cpu fields is so
> > unacceptable? :)
> Because accessing private 

Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-20 Thread Eduardo Habkost
On Tue, Dec 18, 2018 at 05:27:23PM +0800, Yu Zhang wrote:
> On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote:
> > On Wed, 12 Dec 2018 21:05:38 +0800
> > Yu Zhang  wrote:
> > 
> > > Currently, vIOMMU is using the value of IOVA address width, instead of
> > > the host address width(HAW) to calculate the number of reserved bits in
> > > data structures such as root entries, context entries, and entries of
> > > DMA paging structures etc.
> > > 
> > > However, the values of the IOVA address width and of the HAW may not be
> > > equal. For example, a 48-bit IOVA can only be mapped to host addresses
> > > no wider than 46 bits. Using 48 instead of 46 to calculate the reserved
> > > bits may result in an invalid IOVA being accepted.
> > > 
> > > To fix this, a new field - haw_bits is introduced in struct 
> > > IntelIOMMUState,
> > > whose value is initialized based on the maximum physical address set to
> > > guest CPU.
> > 
> > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> > > to clarify.
> > > 
> > > Signed-off-by: Yu Zhang 
> > > Reviewed-by: Peter Xu 
> > > ---
> > [...]
> > 
> > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion 
> > > *iommu_mr, IOMMUNotifier *n)
> > >  static void vtd_init(IntelIOMMUState *s)
> > >  {
> > >  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> > > +CPUState *cs = first_cpu;
> > > +X86CPU *cpu = X86_CPU(cs);
> > >  
> > >  memset(s->csr, 0, DMAR_REG_SIZE);
> > >  memset(s->wmask, 0, DMAR_REG_SIZE);
> > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s)
> > >  s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
> > >   VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
> > >   VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> > > -if (s->aw_bits == VTD_HOST_AW_48BIT) {
> > > +if (s->aw_bits == VTD_AW_48BIT) {
> > >  s->cap |= VTD_CAP_SAGAW_48bit;
> > >  }
> > >  s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> > > +s->haw_bits = cpu->phys_bits;
> > Is it possible to avoid accessing CPU fields directly or cpu altogether
> > and set phys_bits when iommu is created?
> 
> Thanks for your comments, Igor.
> 
> Well, I guess you prefer not to query the CPU capabilities while deciding
> the vIOMMU features. But to me, they are not that irrelevant. :)
> 
> Here the hardware address width in vt-d and the one in cpuid.MAXPHYSADDR
> refer to the same concept. In a VM, both are the maximum guest physical
> address width. If we do not check the CPU field here, we will still have to
> check the CPU field in other places such as build_dmar_q35(), and reset
> s->haw_bits again.
> 
> Is this explanation convincing enough? :)
> 
> > 
> > Perhaps Eduardo
> >  can suggest better approach, since he's more familiar with phys_bits topic
> 
> @Eduardo, any comments? Thanks!

Configuring IOMMU phys-bits automatically depending on the
configured CPU is OK, but accessing first_cpu directly in iommu
code is not.  I suggest delegating this to the machine object, e.g.:

  uint32_t pc_max_phys_bits(PCMachineState *pcms)
  {
      return object_property_get_uint(OBJECT(first_cpu), "phys-bits",
                                      &error_abort);
  }

as the machine itself is responsible for creating the CPU
objects, and I believe there are other places in PC code where we
do physical address calculations that could be affected by the
physical address space size.

-- 
Eduardo



Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-19 Thread Yu Zhang
On Wed, Dec 19, 2018 at 11:47:23AM -0500, Michael S. Tsirkin wrote:
> On Wed, Dec 19, 2018 at 11:40:37AM +0100, Igor Mammedov wrote:
> > On Wed, 19 Dec 2018 10:57:17 +0800
> > Yu Zhang  wrote:
> > 
> > > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote:
> > > > On Tue, 18 Dec 2018 17:27:23 +0800
> > > > Yu Zhang  wrote:
> > > >   
> > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote:  
> > > > > > On Wed, 12 Dec 2018 21:05:38 +0800
> > > > > > Yu Zhang  wrote:
> > > > > >   
> > > > > > > Currently, vIOMMU is using the value of IOVA address width, 
> > > > > > > instead of
> > > > > > > the host address width(HAW) to calculate the number of reserved 
> > > > > > > bits in
> > > > > > > data structures such as root entries, context entries, and 
> > > > > > > entries of
> > > > > > > DMA paging structures etc.
> > > > > > > 
> > > > > > > However, the values of the IOVA address width and of the HAW
> > > > > > > may not be equal. For example, a 48-bit IOVA can only be mapped
> > > > > > > to host addresses no wider than 46 bits. Using 48 instead of 46
> > > > > > > to calculate the reserved bits may result in an invalid IOVA
> > > > > > > being accepted.
> > > > > > > 
> > > > > > > To fix this, a new field - haw_bits is introduced in struct 
> > > > > > > IntelIOMMUState,
> > > > > > > whose value is initialized based on the maximum physical address 
> > > > > > > set to
> > > > > > > guest CPU.  
> > > > > >   
> > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> > > > > > > to clarify.
> > > > > > > 
> > > > > > > Signed-off-by: Yu Zhang 
> > > > > > > Reviewed-by: Peter Xu 
> > > > > > > ---  
> > > > > > [...]
> > > > > >   
> > > > > > > @@ -3100,6 +3104,8 @@ static void 
> > > > > > > vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
> > > > > > >  static void vtd_init(IntelIOMMUState *s)
> > > > > > >  {
> > > > > > >  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> > > > > > > +CPUState *cs = first_cpu;
> > > > > > > +X86CPU *cpu = X86_CPU(cs);
> > > > > > >  
> > > > > > >  memset(s->csr, 0, DMAR_REG_SIZE);
> > > > > > >  memset(s->wmask, 0, DMAR_REG_SIZE);
> > > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s)
> > > > > > >  s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
> > > > > > >   VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
> > > > > > >   VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> > > > > > > -if (s->aw_bits == VTD_HOST_AW_48BIT) {
> > > > > > > +if (s->aw_bits == VTD_AW_48BIT) {
> > > > > > >  s->cap |= VTD_CAP_SAGAW_48bit;
> > > > > > >  }
> > > > > > >  s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> > > > > > > +s->haw_bits = cpu->phys_bits;  
> > > > > > Is it possible to avoid accessing CPU fields directly or cpu 
> > > > > > altogether
> > > > > > and set phys_bits when iommu is created?  
> > > > > 
> > > > > Thanks for your comments, Igor.
> > > > > 
> > > > > Well, I guess you would prefer not to query the CPU capabilities when
> > > > > deciding the vIOMMU features. But to me, they are closely related. :)
> > > > > 
> > > > > Here the host address width in VT-d and the one in CPUID.MAXPHYSADDR
> > > > > refer to the same concept. In a VM, both are the maximum guest physical
> > > > > address width. If we do not check the CPU field here, we will still
> > > > > have to check it in other places such as build_dmar_q35(), and set
> > > > > s->haw_bits again there.
> > > > > 
> > > > > Is this explanation convincing enough? :)  
> > > > Current build_dmar_q35() doesn't do it; it's all new code in this series
> > > > that contains an unacceptable direct access from one device (iommu) to
> > > > another (cpu). The proper way would be for the owner of the iommu to
> > > > fish the limits from somewhere and set the values during iommu creation.
> > > 
> > > Well, current build_dmar_q35() doesn't do it because it is using the
> > > incorrect value. :)
> > > According to the spec, the host address width is the maximum physical
> > > address width, yet the current implementation uses the DMA address width.
> > > To me, this is not only wrong but also insecure. On this point, I think
> > > we all agree it needs to be fixed.
> > > 
> > > As to how to fix it - whether we should query the cpu fields - I still
> > > do not understand why this is not acceptable. :)
> > > 
> > > I had thought of other approaches before, yet I did not choose them:
> > > 
> > > 1> Introduce a new parameter, say "x-haw-bits", which the iommu uses to
> > > limit its physical address width (similar to "x-aw-bits" for the IOVA).
> > > But should we validate this parameter or not? And what if it is set to
> > > something different from "phys-bits"?
> > > 
> > > 2> Another choice I had 

Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-19 Thread Michael S. Tsirkin
On Wed, Dec 19, 2018 at 11:40:37AM +0100, Igor Mammedov wrote:
> On Wed, 19 Dec 2018 10:57:17 +0800
> Yu Zhang  wrote:
> 
> > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote:
> > > On Tue, 18 Dec 2018 17:27:23 +0800
> > > Yu Zhang  wrote:
> > >   
> > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote:  
> > > > > On Wed, 12 Dec 2018 21:05:38 +0800
> > > > > Yu Zhang  wrote:
> > > > >   
> > > > > > Currently, vIOMMU is using the value of IOVA address width, instead 
> > > > > > of
> > > > > > the host address width(HAW) to calculate the number of reserved 
> > > > > > bits in
> > > > > > data structures such as root entries, context entries, and entries 
> > > > > > of
> > > > > > DMA paging structures etc.
> > > > > > 
> > > > > > However values of IOVA address width and of the HAW may not equal. 
> > > > > > For
> > > > > > example, a 48-bit IOVA can only be mapped to host addresses no 
> > > > > > wider than
> > > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may 
> > > > > > result
> > > > > > in an invalid IOVA being accepted.
> > > > > > 
> > > > > > To fix this, a new field - haw_bits is introduced in struct 
> > > > > > IntelIOMMUState,
> > > > > > whose value is initialized based on the maximum physical address 
> > > > > > set to
> > > > > > guest CPU.  
> > > > >   
> > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> > > > > > to clarify.
> > > > > > 
> > > > > > Signed-off-by: Yu Zhang 
> > > > > > Reviewed-by: Peter Xu 
> > > > > > ---  
> > > > > [...]
> > > > >   
> > > > > > @@ -3100,6 +3104,8 @@ static void 
> > > > > > vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
> > > > > >  static void vtd_init(IntelIOMMUState *s)
> > > > > >  {
> > > > > >  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> > > > > > +CPUState *cs = first_cpu;
> > > > > > +X86CPU *cpu = X86_CPU(cs);
> > > > > >  
> > > > > >  memset(s->csr, 0, DMAR_REG_SIZE);
> > > > > >  memset(s->wmask, 0, DMAR_REG_SIZE);
> > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s)
> > > > > >  s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
> > > > > >   VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
> > > > > >   VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> > > > > > -if (s->aw_bits == VTD_HOST_AW_48BIT) {
> > > > > > +if (s->aw_bits == VTD_AW_48BIT) {
> > > > > >  s->cap |= VTD_CAP_SAGAW_48bit;
> > > > > >  }
> > > > > >  s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> > > > > > +s->haw_bits = cpu->phys_bits;  
> > > > > Is it possible to avoid accessing CPU fields directly or cpu 
> > > > > altogether
> > > > > and set phys_bits when iommu is created?  
> > > > 
> > > > Thanks for your comments, Igor.
> > > > 
> > > > Well, I guess you prefer not to query the CPU capabilities while 
> > > > deciding
> > > > the vIOMMU features. But to me, they are not that irrelevant.:)
> > > > 
> > > > Here the hardware address width in vt-d, and the one in 
> > > > cpuid.MAXPHYSADDR
> > > > are referring to the same concept. In VM, both are the maximum guest 
> > > > physical
> > > > address width. If we do not check the CPU field here, we will still 
> > > > have to
> > > > check the CPU field in other places such as build_dmar_q35(), and reset 
> > > > the
> > > > s->haw_bits again.
> > > > 
> > > > Is this explanation convincing enough? :)  
> > > current build_dmar_q35() doesn't do it, it's all new code in this series 
> > > that
> > > contains not acceptable direct access from one device (iommu) to another 
> > > (cpu).   
> > > Proper way would be for the owner of iommu to fish limits from somewhere 
> > > and set
> > > values during iommu creation.  
> > 
> > Well, current build_dmar_q35() doesn't do it, because it is using the 
> > incorrect value. :)
> > According to the spec, the host address width is the maximum physical 
> > address width,
> > yet current implementation is using the DMA address width. For me, this is 
> > not only
> > wrong, but also unsecure. For this point, I think we all agree this need to 
> > be fixed.
> > 
> > As to how to fix it - should we query the cpu fields, I still do not 
> > understand why
> > this is not acceptable. :)
> > 
> > I had thought of other approaches before, yet I did not choose:
> > 
> > 1> Introduce a new parameter, say, "x-haw-bits" which is used for iommu to 
> > limit its  
> > physical address width(similar to the "x-aw-bits" for IOVA). But what 
> > should we check
> > this parameter or not? What if this parameter is set to sth. different than 
> > the "phys-bits"
> > or not?
> > 
> > 2> Another choice I had thought of is to query the physical iommu. I
> > abandoned this idea because my understanding is that the vIOMMU is not a
> > passed-through device; it is emulated.
> 
> > So Igor, may I ask why you think checking against the cpu fields is so
> > unacceptable? :)
> Because accessing private 

Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-19 Thread Michael S. Tsirkin
On Wed, Dec 19, 2018 at 02:28:10PM +0800, Yu Zhang wrote:
> On Tue, Dec 18, 2018 at 10:12:45PM -0500, Michael S. Tsirkin wrote:
> > On Wed, Dec 19, 2018 at 11:03:58AM +0800, Yu Zhang wrote:
> > > On Tue, Dec 18, 2018 at 09:58:35AM -0500, Michael S. Tsirkin wrote:
> > > > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote:
> > > > > On Tue, 18 Dec 2018 17:27:23 +0800
> > > > > Yu Zhang  wrote:
> > > > > 
> > > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote:
> > > > > > > On Wed, 12 Dec 2018 21:05:38 +0800
> > > > > > > Yu Zhang  wrote:
> > > > > > > 
> > > > > > > > Currently, vIOMMU is using the value of IOVA address width, 
> > > > > > > > instead of
> > > > > > > > the host address width(HAW) to calculate the number of reserved 
> > > > > > > > bits in
> > > > > > > > data structures such as root entries, context entries, and 
> > > > > > > > entries of
> > > > > > > > DMA paging structures etc.
> > > > > > > > 
> > > > > > > > However values of IOVA address width and of the HAW may not 
> > > > > > > > equal. For
> > > > > > > > example, a 48-bit IOVA can only be mapped to host addresses no 
> > > > > > > > wider than
> > > > > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit 
> > > > > > > > may result
> > > > > > > > in an invalid IOVA being accepted.
> > > > > > > > 
> > > > > > > > To fix this, a new field - haw_bits is introduced in struct 
> > > > > > > > IntelIOMMUState,
> > > > > > > > whose value is initialized based on the maximum physical 
> > > > > > > > address set to
> > > > > > > > guest CPU.
> > > > > > > 
> > > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> > > > > > > > to clarify.
> > > > > > > > 
> > > > > > > > Signed-off-by: Yu Zhang 
> > > > > > > > Reviewed-by: Peter Xu 
> > > > > > > > ---
> > > > > > > [...]
> > > > > > > 
> > > > > > > > @@ -3100,6 +3104,8 @@ static void 
> > > > > > > > vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
> > > > > > > >  static void vtd_init(IntelIOMMUState *s)
> > > > > > > >  {
> > > > > > > >  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> > > > > > > > +CPUState *cs = first_cpu;
> > > > > > > > +X86CPU *cpu = X86_CPU(cs);
> > > > > > > >  
> > > > > > > >  memset(s->csr, 0, DMAR_REG_SIZE);
> > > > > > > >  memset(s->wmask, 0, DMAR_REG_SIZE);
> > > > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s)
> > > > > > > >  s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
> > > > > > > >   VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
> > > > > > > >   VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> > > > > > > > -if (s->aw_bits == VTD_HOST_AW_48BIT) {
> > > > > > > > +if (s->aw_bits == VTD_AW_48BIT) {
> > > > > > > >  s->cap |= VTD_CAP_SAGAW_48bit;
> > > > > > > >  }
> > > > > > > >  s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> > > > > > > > +s->haw_bits = cpu->phys_bits;
> > > > > > > Is it possible to avoid accessing CPU fields directly or cpu 
> > > > > > > altogether
> > > > > > > and set phys_bits when iommu is created?
> > > > > > 
> > > > > > Thanks for your comments, Igor.
> > > > > > 
> > > > > > Well, I guess you prefer not to query the CPU capabilities while 
> > > > > > deciding
> > > > > > the vIOMMU features. But to me, they are not that irrelevant.:)
> > > > > > 
> > > > > > Here the hardware address width in vt-d, and the one in 
> > > > > > cpuid.MAXPHYSADDR
> > > > > > are referring to the same concept. In VM, both are the maximum 
> > > > > > guest physical
> > > > > > address width. If we do not check the CPU field here, we will still 
> > > > > > have to
> > > > > > check the CPU field in other places such as build_dmar_q35(), and 
> > > > > > reset the
> > > > > > s->haw_bits again.
> > > > > > 
> > > > > > Is this explanation convincing enough? :)
> > > > > current build_dmar_q35() doesn't do it, it's all new code in this 
> > > > > series that
> > > > > contains not acceptable direct access from one device (iommu) to 
> > > > > another (cpu).   
> > > > > Proper way would be for the owner of iommu to fish limits from 
> > > > > somewhere and set
> > > > > values during iommu creation.
> > > > 
> > > > Maybe it's a good idea to add documentation for now.
> > > 
> > > Thanks Michael. So what kind of documentation do you refer? 
> > 
> > The idea would be to have two properties, AW for the CPU and
> > the IOMMU. In the documentation explain that they
> > should normally be set to the same value.
> > 
> > > > 
> > > > It would be nice not to push this stuff up the stack,
> > > > it's unfortunate that our internal APIs make it hard.
> > > 
> > > Sorry, I do not quite get it. What do you mean "internal APIs make it 
> > > hard"? :)
> > 
> > The API doesn't actually guarantee any initialization order.
> > CPU happens to be initialized first but I do not
> > think there's a guarantee that it will keep being the case.
> > This makes it hard 

Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-19 Thread Igor Mammedov
On Wed, 19 Dec 2018 10:57:17 +0800
Yu Zhang  wrote:

> On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote:
> > On Tue, 18 Dec 2018 17:27:23 +0800
> > Yu Zhang  wrote:
> >   
> > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote:  
> > > > On Wed, 12 Dec 2018 21:05:38 +0800
> > > > Yu Zhang  wrote:
> > > >   
> > > > > Currently, vIOMMU is using the value of IOVA address width, instead of
> > > > > the host address width(HAW) to calculate the number of reserved bits 
> > > > > in
> > > > > data structures such as root entries, context entries, and entries of
> > > > > DMA paging structures etc.
> > > > > 
> > > > > However values of IOVA address width and of the HAW may not equal. For
> > > > > example, a 48-bit IOVA can only be mapped to host addresses no wider 
> > > > > than
> > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may 
> > > > > result
> > > > > in an invalid IOVA being accepted.
> > > > > 
> > > > > To fix this, a new field - haw_bits is introduced in struct 
> > > > > IntelIOMMUState,
> > > > > whose value is initialized based on the maximum physical address set 
> > > > > to
> > > > > guest CPU.  
> > > >   
> > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> > > > > to clarify.
> > > > > 
> > > > > Signed-off-by: Yu Zhang 
> > > > > Reviewed-by: Peter Xu 
> > > > > ---  
> > > > [...]
> > > >   
> > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion 
> > > > > *iommu_mr, IOMMUNotifier *n)
> > > > >  static void vtd_init(IntelIOMMUState *s)
> > > > >  {
> > > > >  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> > > > > +CPUState *cs = first_cpu;
> > > > > +X86CPU *cpu = X86_CPU(cs);
> > > > >  
> > > > >  memset(s->csr, 0, DMAR_REG_SIZE);
> > > > >  memset(s->wmask, 0, DMAR_REG_SIZE);
> > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s)
> > > > >  s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
> > > > >   VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
> > > > >   VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> > > > > -if (s->aw_bits == VTD_HOST_AW_48BIT) {
> > > > > +if (s->aw_bits == VTD_AW_48BIT) {
> > > > >  s->cap |= VTD_CAP_SAGAW_48bit;
> > > > >  }
> > > > >  s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> > > > > +s->haw_bits = cpu->phys_bits;  
> > > > Is it possible to avoid accessing CPU fields directly or cpu altogether
> > > > and set phys_bits when iommu is created?  
> > > 
> > > Thanks for your comments, Igor.
> > > 
> > > Well, I guess you prefer not to query the CPU capabilities while deciding
> > > the vIOMMU features. But to me, they are not that irrelevant.:)
> > > 
> > > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR
> > > are referring to the same concept. In VM, both are the maximum guest 
> > > physical
> > > address width. If we do not check the CPU field here, we will still have 
> > > to
> > > check the CPU field in other places such as build_dmar_q35(), and reset 
> > > the
> > > s->haw_bits again.
> > > 
> > > Is this explanation convincing enough? :)  
> > current build_dmar_q35() doesn't do it, it's all new code in this series 
> > that
> > contains not acceptable direct access from one device (iommu) to another 
> > (cpu).   
> > Proper way would be for the owner of iommu to fish limits from somewhere 
> > and set
> > values during iommu creation.  
> 
> Well, current build_dmar_q35() doesn't do it, because it is using the 
> incorrect value. :)
> According to the spec, the host address width is the maximum physical address 
> width,
> yet the current implementation uses the DMA address width. To me, this is
> not only wrong but also insecure. On this point, I think we all agree it
> needs to be fixed.
> 
> As to how to fix it - should we query the cpu fields, I still do not 
> understand why
> this is not acceptable. :)
> 
> I had thought of other approaches before, yet I did not choose:
> 
> 1> Introduce a new parameter, say "x-haw-bits", which the iommu uses to
> limit its physical address width (similar to "x-aw-bits" for the IOVA).
> But should we validate this parameter or not? And what if it is set to
> something different from "phys-bits"?
> 
> 2> Another choice I had thought of is to query the physical iommu. I
> abandoned this idea because my understanding is that the vIOMMU is not a
> passed-through device; it is emulated.

> So Igor, may I ask why you think checking against the cpu fields is so
> unacceptable? :)
Because accessing private fields of one device from another, unrelated device is
not robust and is subject to breaking in unpredictable ways when the field's
meaning or the initialization order changes. (Analogy to bare metal: one does
not solder a wire to a CPU die to let some random device read a piece of its
data.)

I've looked at the intel-iommu code and how it's created, so here is a way to do
the thing

Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-18 Thread Yu Zhang
On Tue, Dec 18, 2018 at 10:12:45PM -0500, Michael S. Tsirkin wrote:
> On Wed, Dec 19, 2018 at 11:03:58AM +0800, Yu Zhang wrote:
> > On Tue, Dec 18, 2018 at 09:58:35AM -0500, Michael S. Tsirkin wrote:
> > > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote:
> > > > On Tue, 18 Dec 2018 17:27:23 +0800
> > > > Yu Zhang  wrote:
> > > > 
> > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote:
> > > > > > On Wed, 12 Dec 2018 21:05:38 +0800
> > > > > > Yu Zhang  wrote:
> > > > > > 
> > > > > > > Currently, vIOMMU is using the value of IOVA address width, 
> > > > > > > instead of
> > > > > > > the host address width(HAW) to calculate the number of reserved 
> > > > > > > bits in
> > > > > > > data structures such as root entries, context entries, and 
> > > > > > > entries of
> > > > > > > DMA paging structures etc.
> > > > > > > 
> > > > > > > However values of IOVA address width and of the HAW may not 
> > > > > > > equal. For
> > > > > > > example, a 48-bit IOVA can only be mapped to host addresses no 
> > > > > > > wider than
> > > > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit 
> > > > > > > may result
> > > > > > > in an invalid IOVA being accepted.
> > > > > > > 
> > > > > > > To fix this, a new field - haw_bits is introduced in struct 
> > > > > > > IntelIOMMUState,
> > > > > > > whose value is initialized based on the maximum physical address 
> > > > > > > set to
> > > > > > > guest CPU.
> > > > > > 
> > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> > > > > > > to clarify.
> > > > > > > 
> > > > > > > Signed-off-by: Yu Zhang 
> > > > > > > Reviewed-by: Peter Xu 
> > > > > > > ---
> > > > > > [...]
> > > > > > 
> > > > > > > @@ -3100,6 +3104,8 @@ static void 
> > > > > > > vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
> > > > > > >  static void vtd_init(IntelIOMMUState *s)
> > > > > > >  {
> > > > > > >  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> > > > > > > +CPUState *cs = first_cpu;
> > > > > > > +X86CPU *cpu = X86_CPU(cs);
> > > > > > >  
> > > > > > >  memset(s->csr, 0, DMAR_REG_SIZE);
> > > > > > >  memset(s->wmask, 0, DMAR_REG_SIZE);
> > > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s)
> > > > > > >  s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
> > > > > > >   VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
> > > > > > >   VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> > > > > > > -if (s->aw_bits == VTD_HOST_AW_48BIT) {
> > > > > > > +if (s->aw_bits == VTD_AW_48BIT) {
> > > > > > >  s->cap |= VTD_CAP_SAGAW_48bit;
> > > > > > >  }
> > > > > > >  s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> > > > > > > +s->haw_bits = cpu->phys_bits;
> > > > > > Is it possible to avoid accessing CPU fields directly or cpu 
> > > > > > altogether
> > > > > > and set phys_bits when iommu is created?
> > > > > 
> > > > > Thanks for your comments, Igor.
> > > > > 
> > > > > Well, I guess you prefer not to query the CPU capabilities while 
> > > > > deciding
> > > > > the vIOMMU features. But to me, they are not that irrelevant.:)
> > > > > 
> > > > > Here the hardware address width in vt-d, and the one in 
> > > > > cpuid.MAXPHYSADDR
> > > > > are referring to the same concept. In VM, both are the maximum guest 
> > > > > physical
> > > > > address width. If we do not check the CPU field here, we will still 
> > > > > have to
> > > > > check the CPU field in other places such as build_dmar_q35(), and 
> > > > > reset the
> > > > > s->haw_bits again.
> > > > > 
> > > > > Is this explanation convincing enough? :)
> > > > current build_dmar_q35() doesn't do it, it's all new code in this 
> > > > series that
> > > > contains not acceptable direct access from one device (iommu) to 
> > > > another (cpu).   
> > > > Proper way would be for the owner of iommu to fish limits from 
> > > > somewhere and set
> > > > values during iommu creation.
> > > 
> > > Maybe it's a good idea to add documentation for now.
> > 
> > Thanks Michael. So what kind of documentation do you refer? 
> 
> The idea would be to have two properties, AW for the CPU and
> the IOMMU. In the documentation explain that they
> should normally be set to the same value.
> 
> > > 
> > > It would be nice not to push this stuff up the stack,
> > > it's unfortunate that our internal APIs make it hard.
> > 
> > Sorry, I do not quite get it. What do you mean "internal APIs make it 
> > hard"? :)
> 
> The API doesn't actually guarantee any initialization order.
> CPU happens to be initialized first but I do not
> think there's a guarantee that it will keep being the case.
> This makes it hard to get properties from one device
> and use in another one.
> 

Oops...
Then there seems to be no easy way to guarantee this at runtime. BTW, could we
initialize the CPU before the other components? Is that hard to do, or just not
reasonable?

I plan to draft a doc in QEMU on 

Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-18 Thread Yu Zhang
On Tue, Dec 18, 2018 at 09:58:35AM -0500, Michael S. Tsirkin wrote:
> On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote:
> > On Tue, 18 Dec 2018 17:27:23 +0800
> > Yu Zhang  wrote:
> > 
> > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote:
> > > > On Wed, 12 Dec 2018 21:05:38 +0800
> > > > Yu Zhang  wrote:
> > > > 
> > > > > Currently, vIOMMU is using the value of IOVA address width, instead of
> > > > > the host address width(HAW) to calculate the number of reserved bits 
> > > > > in
> > > > > data structures such as root entries, context entries, and entries of
> > > > > DMA paging structures etc.
> > > > > 
> > > > > However values of IOVA address width and of the HAW may not equal. For
> > > > > example, a 48-bit IOVA can only be mapped to host addresses no wider 
> > > > > than
> > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may 
> > > > > result
> > > > > in an invalid IOVA being accepted.
> > > > > 
> > > > > To fix this, a new field - haw_bits is introduced in struct 
> > > > > IntelIOMMUState,
> > > > > whose value is initialized based on the maximum physical address set 
> > > > > to
> > > > > guest CPU.
> > > > 
> > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> > > > > to clarify.
> > > > > 
> > > > > Signed-off-by: Yu Zhang 
> > > > > Reviewed-by: Peter Xu 
> > > > > ---
> > > > [...]
> > > > 
> > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion 
> > > > > *iommu_mr, IOMMUNotifier *n)
> > > > >  static void vtd_init(IntelIOMMUState *s)
> > > > >  {
> > > > >  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> > > > > +CPUState *cs = first_cpu;
> > > > > +X86CPU *cpu = X86_CPU(cs);
> > > > >  
> > > > >  memset(s->csr, 0, DMAR_REG_SIZE);
> > > > >  memset(s->wmask, 0, DMAR_REG_SIZE);
> > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s)
> > > > >  s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
> > > > >   VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
> > > > >   VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> > > > > -if (s->aw_bits == VTD_HOST_AW_48BIT) {
> > > > > +if (s->aw_bits == VTD_AW_48BIT) {
> > > > >  s->cap |= VTD_CAP_SAGAW_48bit;
> > > > >  }
> > > > >  s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> > > > > +s->haw_bits = cpu->phys_bits;
> > > > Is it possible to avoid accessing CPU fields directly or cpu altogether
> > > > and set phys_bits when iommu is created?
> > > 
> > > Thanks for your comments, Igor.
> > > 
> > > Well, I guess you prefer not to query the CPU capabilities while deciding
> > > the vIOMMU features. But to me, they are not that irrelevant.:)
> > > 
> > > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR
> > > are referring to the same concept. In VM, both are the maximum guest 
> > > physical
> > > address width. If we do not check the CPU field here, we will still have 
> > > to
> > > check the CPU field in other places such as build_dmar_q35(), and reset 
> > > the
> > > s->haw_bits again.
> > > 
> > > Is this explanation convincing enough? :)
> > current build_dmar_q35() doesn't do it, it's all new code in this series 
> > that
> > contains not acceptable direct access from one device (iommu) to another 
> > (cpu).   
> > Proper way would be for the owner of iommu to fish limits from somewhere 
> > and set
> > values during iommu creation.
> 
> Maybe it's a good idea to add documentation for now.

Thanks Michael. So what kind of documentation are you referring to? 

> 
> It would be nice not to push this stuff up the stack,
> it's unfortunate that our internal APIs make it hard.

Sorry, I do not quite get it. What do you mean by "internal APIs make it hard"? :)

> 
> 
> > > > 
> > > > Perhaps Eduardo
> > > >  can suggest better approach, since he's more familiar with phys_bits 
> > > > topic
> > > 
> > > @Eduardo, any comments? Thanks!
> > > 
> > > > 
> > > > >  /*
> > > > >   * Rsvd field masks for spte
> > > > >   */
> > > > >  vtd_paging_entry_rsvd_field[0] = ~0ULL;
> > > > > -vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits);
> > > > > -vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits);
> > > > > -vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits);
> > > > > -vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits);
> > > > > -vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits);
> > > > > -vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits);
> > > > > -vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits);
> > > > > -vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits);
> > > > > +vtd_paging_entry_rsvd_field[1] = 
> > > > > 

Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-18 Thread Michael S. Tsirkin
On Wed, Dec 19, 2018 at 11:03:58AM +0800, Yu Zhang wrote:
> On Tue, Dec 18, 2018 at 09:58:35AM -0500, Michael S. Tsirkin wrote:
> > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote:
> > > On Tue, 18 Dec 2018 17:27:23 +0800
> > > Yu Zhang  wrote:
> > > 
> > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote:
> > > > > On Wed, 12 Dec 2018 21:05:38 +0800
> > > > > Yu Zhang  wrote:
> > > > > 
> > > > > > Currently, vIOMMU is using the value of IOVA address width, instead 
> > > > > > of
> > > > > > the host address width(HAW) to calculate the number of reserved 
> > > > > > bits in
> > > > > > data structures such as root entries, context entries, and entries 
> > > > > > of
> > > > > > DMA paging structures etc.
> > > > > > 
> > > > > > However values of IOVA address width and of the HAW may not equal. 
> > > > > > For
> > > > > > example, a 48-bit IOVA can only be mapped to host addresses no 
> > > > > > wider than
> > > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may 
> > > > > > result
> > > > > > in an invalid IOVA being accepted.
> > > > > > 
> > > > > > To fix this, a new field - haw_bits is introduced in struct 
> > > > > > IntelIOMMUState,
> > > > > > whose value is initialized based on the maximum physical address 
> > > > > > set to
> > > > > > guest CPU.
> > > > > 
> > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> > > > > > to clarify.
> > > > > > 
> > > > > > Signed-off-by: Yu Zhang 
> > > > > > Reviewed-by: Peter Xu 
> > > > > > ---
> > > > > [...]
> > > > > 
> > > > > > @@ -3100,6 +3104,8 @@ static void 
> > > > > > vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
> > > > > >  static void vtd_init(IntelIOMMUState *s)
> > > > > >  {
> > > > > >  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> > > > > > +CPUState *cs = first_cpu;
> > > > > > +X86CPU *cpu = X86_CPU(cs);
> > > > > >  
> > > > > >  memset(s->csr, 0, DMAR_REG_SIZE);
> > > > > >  memset(s->wmask, 0, DMAR_REG_SIZE);
> > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s)
> > > > > >  s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
> > > > > >   VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
> > > > > >   VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> > > > > > -if (s->aw_bits == VTD_HOST_AW_48BIT) {
> > > > > > +if (s->aw_bits == VTD_AW_48BIT) {
> > > > > >  s->cap |= VTD_CAP_SAGAW_48bit;
> > > > > >  }
> > > > > >  s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> > > > > > +s->haw_bits = cpu->phys_bits;
> > > > > Is it possible to avoid accessing CPU fields directly or cpu 
> > > > > altogether
> > > > > and set phys_bits when iommu is created?
> > > > 
> > > > Thanks for your comments, Igor.
> > > > 
> > > > Well, I guess you prefer not to query the CPU capabilities while 
> > > > deciding
> > > > the vIOMMU features. But to me, they are not that irrelevant.:)
> > > > 
> > > > Here the hardware address width in vt-d, and the one in 
> > > > cpuid.MAXPHYSADDR
> > > > are referring to the same concept. In VM, both are the maximum guest 
> > > > physical
> > > > address width. If we do not check the CPU field here, we will still 
> > > > have to
> > > > check the CPU field in other places such as build_dmar_q35(), and reset 
> > > > the
> > > > s->haw_bits again.
> > > > 
> > > > Is this explanation convincing enough? :)
> > > current build_dmar_q35() doesn't do it, it's all new code in this series 
> > > that
> > > contains not acceptable direct access from one device (iommu) to another 
> > > (cpu).   
> > > Proper way would be for the owner of iommu to fish limits from somewhere 
> > > and set
> > > values during iommu creation.
> > 
> > Maybe it's a good idea to add documentation for now.
> 
> Thanks Michael. So what kind of documentation do you refer? 

The idea would be to have two properties, an address width (AW) for the CPU
and one for the IOMMU, and to explain in the documentation that they should
normally be set to the same value.

> > 
> > It would be nice not to push this stuff up the stack,
> > it's unfortunate that our internal APIs make it hard.
> 
> Sorry, I do not quite get it. What do you mean by "internal APIs make it
> hard"? :)

The API doesn't actually guarantee any initialization order. The CPU happens
to be initialized first, but I do not think there's a guarantee that this
will keep being the case. That makes it hard to get properties from one
device and use them in another.

> > 
> > 
> > > > > 
> > > > > Perhaps Eduardo can suggest a better approach, since he's more
> > > > > familiar with the phys_bits topic
> > > > 
> > > > @Eduardo, any comments? Thanks!
> > > > 
> > > > > 
> > > > > >  /*
> > > > > >   * Rsvd field masks for spte
> > > > > >   */
> > > > > >  vtd_paging_entry_rsvd_field[0] = ~0ULL;
> > > > > > -vtd_paging_entry_rsvd_field[1] = 
> > > > > > VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits);
> > > > > > -

Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-18 Thread Yu Zhang
On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote:
> On Tue, 18 Dec 2018 17:27:23 +0800
> Yu Zhang  wrote:
> 
> > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote:
> > > On Wed, 12 Dec 2018 21:05:38 +0800
> > > Yu Zhang  wrote:
> > > 
> > > > Currently, vIOMMU is using the value of IOVA address width, instead of
> > > > the host address width(HAW) to calculate the number of reserved bits in
> > > > data structures such as root entries, context entries, and entries of
> > > > DMA paging structures etc.
> > > > 
> > > > However, the values of the IOVA address width and of the HAW may not be
> > > > equal. For example, a 48-bit IOVA can only be mapped to host addresses
> > > > no wider than 46 bits. Using 48, instead of 46, to calculate the
> > > > reserved bits may result in an invalid IOVA being accepted.
> > > > 
> > > > To fix this, a new field, haw_bits, is introduced in struct
> > > > IntelIOMMUState, whose value is initialized based on the maximum
> > > > physical address set for the guest CPU.
> > > 
> > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> > > > to clarify.
> > > > 
> > > > Signed-off-by: Yu Zhang 
> > > > Reviewed-by: Peter Xu 
> > > > ---
> > > [...]
> > > 
> > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion 
> > > > *iommu_mr, IOMMUNotifier *n)
> > > >  static void vtd_init(IntelIOMMUState *s)
> > > >  {
> > > >  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> > > > +CPUState *cs = first_cpu;
> > > > +X86CPU *cpu = X86_CPU(cs);
> > > >  
> > > >  memset(s->csr, 0, DMAR_REG_SIZE);
> > > >  memset(s->wmask, 0, DMAR_REG_SIZE);
> > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s)
> > > >  s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
> > > >   VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
> > > >   VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> > > > -if (s->aw_bits == VTD_HOST_AW_48BIT) {
> > > > +if (s->aw_bits == VTD_AW_48BIT) {
> > > >  s->cap |= VTD_CAP_SAGAW_48bit;
> > > >  }
> > > >  s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> > > > +s->haw_bits = cpu->phys_bits;
> > > Is it possible to avoid accessing CPU fields directly or cpu altogether
> > > and set phys_bits when iommu is created?
> > 
> > Thanks for your comments, Igor.
> > 
> > Well, I guess you prefer not to query the CPU capabilities while deciding
> > the vIOMMU features. But to me, they are not that irrelevant.:)
> > 
> > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR
> > are referring to the same concept. In VM, both are the maximum guest physical
> > address width. If we do not check the CPU field here, we will still have to
> > check the CPU field in other places such as build_dmar_q35(), and reset the
> > s->haw_bits again.
> > 
> > Is this explanation convincing enough? :)
> current build_dmar_q35() doesn't do it; it's all new code in this series
> that contains unacceptable direct access from one device (iommu) to another
> (cpu). The proper way would be for the owner of the iommu to fish the limits
> from somewhere and set the values during iommu creation.

Well, current build_dmar_q35() doesn't do it because it is using the
incorrect value. :)
According to the spec, the host address width is the maximum physical
address width, yet the current implementation is using the DMA address
width. To me, this is not only wrong, but also insecure. On this point, I
think we all agree it needs to be fixed.

As to how to fix it - whether we should query the cpu fields - I still do
not understand why this is not acceptable. :)

I had thought of other approaches before, yet did not choose them:

1> Introduce a new parameter, say, "x-haw-bits", which the iommu would use
to limit its physical address width (similar to "x-aw-bits" for the IOVA).
But should we validate this parameter or not? And what if it is set to
something different from "phys-bits"?

2> Another choice I had thought of is to query the physical iommu. I
abandoned this idea because my understanding is that the vIOMMU is not a
passed-through device; it is emulated.

So Igor, may I ask why you think checking against the cpu fields is not
acceptable? :)

> 
> > > 
> > > Perhaps Eduardo can suggest a better approach, since he's more familiar
> > > with the phys_bits topic
> > 
> > @Eduardo, any comments? Thanks!
> > 
> > > 
> > > >  /*
> > > >   * Rsvd field masks for spte
> > > >   */
> > > >  vtd_paging_entry_rsvd_field[0] = ~0ULL;
> > > > -vtd_paging_entry_rsvd_field[1] = 
> > > > VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits);
> > > > -vtd_paging_entry_rsvd_field[2] = 
> > > > VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits);
> > > > -vtd_paging_entry_rsvd_field[3] = 
> > > > VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits);
> > > > -vtd_paging_entry_rsvd_field[4] = 
> > > > VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits);
> > > > -

Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-18 Thread Michael S. Tsirkin
On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote:
> On Tue, 18 Dec 2018 17:27:23 +0800
> Yu Zhang  wrote:
> 
> > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote:
> > > On Wed, 12 Dec 2018 21:05:38 +0800
> > > Yu Zhang  wrote:
> > > 
> > > > Currently, vIOMMU is using the value of IOVA address width, instead of
> > > > the host address width(HAW) to calculate the number of reserved bits in
> > > > data structures such as root entries, context entries, and entries of
> > > > DMA paging structures etc.
> > > > 
> > > > However, the values of the IOVA address width and of the HAW may not
> > > > be equal. For example, a 48-bit IOVA can only be mapped to host
> > > > addresses no wider than 46 bits. Using 48, instead of 46, to calculate
> > > > the reserved bits may result in an invalid IOVA being accepted.
> > > > 
> > > > To fix this, a new field, haw_bits, is introduced in struct
> > > > IntelIOMMUState, whose value is initialized based on the maximum
> > > > physical address set for the guest CPU.
> > > 
> > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> > > > to clarify.
> > > > 
> > > > Signed-off-by: Yu Zhang 
> > > > Reviewed-by: Peter Xu 
> > > > ---
> > > [...]
> > > 
> > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion 
> > > > *iommu_mr, IOMMUNotifier *n)
> > > >  static void vtd_init(IntelIOMMUState *s)
> > > >  {
> > > >  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> > > > +CPUState *cs = first_cpu;
> > > > +X86CPU *cpu = X86_CPU(cs);
> > > >  
> > > >  memset(s->csr, 0, DMAR_REG_SIZE);
> > > >  memset(s->wmask, 0, DMAR_REG_SIZE);
> > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s)
> > > >  s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
> > > >   VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
> > > >   VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> > > > -if (s->aw_bits == VTD_HOST_AW_48BIT) {
> > > > +if (s->aw_bits == VTD_AW_48BIT) {
> > > >  s->cap |= VTD_CAP_SAGAW_48bit;
> > > >  }
> > > >  s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> > > > +s->haw_bits = cpu->phys_bits;
> > > Is it possible to avoid accessing CPU fields directly or cpu altogether
> > > and set phys_bits when iommu is created?
> > 
> > Thanks for your comments, Igor.
> > 
> > Well, I guess you prefer not to query the CPU capabilities while deciding
> > the vIOMMU features. But to me, they are not that irrelevant.:)
> > 
> > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR
> > are referring to the same concept. In VM, both are the maximum guest physical
> > address width. If we do not check the CPU field here, we will still have to
> > check the CPU field in other places such as build_dmar_q35(), and reset the
> > s->haw_bits again.
> > 
> > Is this explanation convincing enough? :)
> current build_dmar_q35() doesn't do it; it's all new code in this series
> that contains unacceptable direct access from one device (iommu) to another
> (cpu). The proper way would be for the owner of the iommu to fish the limits
> from somewhere and set the values during iommu creation.

Maybe it's a good idea to add documentation for now.

It would be nice not to push this stuff up the stack,
it's unfortunate that our internal APIs make it hard.


> > > 
> > > Perhaps Eduardo can suggest a better approach, since he's more familiar
> > > with the phys_bits topic
> > 
> > @Eduardo, any comments? Thanks!
> > 
> > > 
> > > >  /*
> > > >   * Rsvd field masks for spte
> > > >   */
> > > >  vtd_paging_entry_rsvd_field[0] = ~0ULL;
> > > > -vtd_paging_entry_rsvd_field[1] = 
> > > > VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits);
> > > > -vtd_paging_entry_rsvd_field[2] = 
> > > > VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits);
> > > > -vtd_paging_entry_rsvd_field[3] = 
> > > > VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits);
> > > > -vtd_paging_entry_rsvd_field[4] = 
> > > > VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits);
> > > > -vtd_paging_entry_rsvd_field[5] = 
> > > > VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits);
> > > > -vtd_paging_entry_rsvd_field[6] = 
> > > > VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits);
> > > > -vtd_paging_entry_rsvd_field[7] = 
> > > > VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits);
> > > > -vtd_paging_entry_rsvd_field[8] = 
> > > > VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits);
> > > > +vtd_paging_entry_rsvd_field[1] = 
> > > > VTD_SPTE_PAGE_L1_RSVD_MASK(s->haw_bits);
> > > > +vtd_paging_entry_rsvd_field[2] = 
> > > > VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits);
> > > > +vtd_paging_entry_rsvd_field[3] = 
> > > > VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits);
> > > > +vtd_paging_entry_rsvd_field[4] = 
> > > > VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits);
> > > > +vtd_paging_entry_rsvd_field[5] = 
> > > > VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits);
> > > > +vtd_paging_entry_rsvd_field[6] = 
> > > > 

Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-18 Thread Igor Mammedov
On Tue, 18 Dec 2018 17:27:23 +0800
Yu Zhang  wrote:

> On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote:
> > On Wed, 12 Dec 2018 21:05:38 +0800
> > Yu Zhang  wrote:
> > 
> > > Currently, vIOMMU is using the value of IOVA address width, instead of
> > > the host address width(HAW) to calculate the number of reserved bits in
> > > data structures such as root entries, context entries, and entries of
> > > DMA paging structures etc.
> > > 
> > > However, the values of the IOVA address width and of the HAW may not be
> > > equal. For example, a 48-bit IOVA can only be mapped to host addresses no
> > > wider than 46 bits. Using 48, instead of 46, to calculate the reserved
> > > bits may result in an invalid IOVA being accepted.
> > > 
> > > To fix this, a new field, haw_bits, is introduced in struct
> > > IntelIOMMUState, whose value is initialized based on the maximum physical
> > > address set for the guest CPU.
> > 
> > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> > > to clarify.
> > > 
> > > Signed-off-by: Yu Zhang 
> > > Reviewed-by: Peter Xu 
> > > ---
> > [...]
> > 
> > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion 
> > > *iommu_mr, IOMMUNotifier *n)
> > >  static void vtd_init(IntelIOMMUState *s)
> > >  {
> > >  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> > > +CPUState *cs = first_cpu;
> > > +X86CPU *cpu = X86_CPU(cs);
> > >  
> > >  memset(s->csr, 0, DMAR_REG_SIZE);
> > >  memset(s->wmask, 0, DMAR_REG_SIZE);
> > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s)
> > >  s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
> > >   VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
> > >   VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> > > -if (s->aw_bits == VTD_HOST_AW_48BIT) {
> > > +if (s->aw_bits == VTD_AW_48BIT) {
> > >  s->cap |= VTD_CAP_SAGAW_48bit;
> > >  }
> > >  s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> > > +s->haw_bits = cpu->phys_bits;
> > Is it possible to avoid accessing CPU fields directly or cpu altogether
> > and set phys_bits when iommu is created?
> 
> Thanks for your comments, Igor.
> 
> Well, I guess you prefer not to query the CPU capabilities while deciding
> the vIOMMU features. But to me, they are not that irrelevant.:)
> 
> Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR
> are referring to the same concept. In VM, both are the maximum guest physical
> address width. If we do not check the CPU field here, we will still have to
> check the CPU field in other places such as build_dmar_q35(), and reset the
> s->haw_bits again.
> 
> Is this explanation convincing enough? :)
current build_dmar_q35() doesn't do it; it's all new code in this series that
contains unacceptable direct access from one device (iommu) to another (cpu).
The proper way would be for the owner of the iommu to fish the limits from
somewhere and set the values during iommu creation.

> > 
> > Perhaps Eduardo can suggest a better approach, since he's more familiar
> > with the phys_bits topic
> 
> @Eduardo, any comments? Thanks!
> 
> > 
> > >  /*
> > >   * Rsvd field masks for spte
> > >   */
> > >  vtd_paging_entry_rsvd_field[0] = ~0ULL;
> > > -vtd_paging_entry_rsvd_field[1] = 
> > > VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits);
> > > -vtd_paging_entry_rsvd_field[2] = 
> > > VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits);
> > > -vtd_paging_entry_rsvd_field[3] = 
> > > VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits);
> > > -vtd_paging_entry_rsvd_field[4] = 
> > > VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits);
> > > -vtd_paging_entry_rsvd_field[5] = 
> > > VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits);
> > > -vtd_paging_entry_rsvd_field[6] = 
> > > VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits);
> > > -vtd_paging_entry_rsvd_field[7] = 
> > > VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits);
> > > -vtd_paging_entry_rsvd_field[8] = 
> > > VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits);
> > > +vtd_paging_entry_rsvd_field[1] = 
> > > VTD_SPTE_PAGE_L1_RSVD_MASK(s->haw_bits);
> > > +vtd_paging_entry_rsvd_field[2] = 
> > > VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits);
> > > +vtd_paging_entry_rsvd_field[3] = 
> > > VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits);
> > > +vtd_paging_entry_rsvd_field[4] = 
> > > VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits);
> > > +vtd_paging_entry_rsvd_field[5] = 
> > > VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits);
> > > +vtd_paging_entry_rsvd_field[6] = 
> > > VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits);
> > > +vtd_paging_entry_rsvd_field[7] = 
> > > VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits);
> > > +vtd_paging_entry_rsvd_field[8] = 
> > > VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits);
> > >  
> > >  if (x86_iommu->intr_supported) {
> > >  s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV;
> > > @@ -3261,10 +3268,10 @@ static bool vtd_decide_config(IntelIOMMUState *s, 
> > > Error **errp)
> > >  }
> > >  
> > >  /* Currently only address widths 

Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-18 Thread Michael S. Tsirkin
On Tue, Dec 18, 2018 at 05:27:23PM +0800, Yu Zhang wrote:
> On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote:
> > On Wed, 12 Dec 2018 21:05:38 +0800
> > Yu Zhang  wrote:
> > 
> > > Currently, vIOMMU is using the value of IOVA address width, instead of
> > > the host address width(HAW) to calculate the number of reserved bits in
> > > data structures such as root entries, context entries, and entries of
> > > DMA paging structures etc.
> > > 
> > > However, the values of the IOVA address width and of the HAW may not be
> > > equal. For example, a 48-bit IOVA can only be mapped to host addresses no
> > > wider than 46 bits. Using 48, instead of 46, to calculate the reserved
> > > bits may result in an invalid IOVA being accepted.
> > > 
> > > To fix this, a new field, haw_bits, is introduced in struct
> > > IntelIOMMUState, whose value is initialized based on the maximum physical
> > > address set for the guest CPU.
> > 
> > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> > > to clarify.
> > > 
> > > Signed-off-by: Yu Zhang 
> > > Reviewed-by: Peter Xu 
> > > ---
> > [...]
> > 
> > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion 
> > > *iommu_mr, IOMMUNotifier *n)
> > >  static void vtd_init(IntelIOMMUState *s)
> > >  {
> > >  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> > > +CPUState *cs = first_cpu;
> > > +X86CPU *cpu = X86_CPU(cs);
> > >  
> > >  memset(s->csr, 0, DMAR_REG_SIZE);
> > >  memset(s->wmask, 0, DMAR_REG_SIZE);
> > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s)
> > >  s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
> > >   VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
> > >   VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> > > -if (s->aw_bits == VTD_HOST_AW_48BIT) {
> > > +if (s->aw_bits == VTD_AW_48BIT) {
> > >  s->cap |= VTD_CAP_SAGAW_48bit;
> > >  }
> > >  s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> > > +s->haw_bits = cpu->phys_bits;
> > Is it possible to avoid accessing CPU fields directly or cpu altogether
> > and set phys_bits when iommu is created?
> 
> Thanks for your comments, Igor.
> 
> Well, I guess you prefer not to query the CPU capabilities while deciding
> the vIOMMU features. But to me, they are not that irrelevant.:)
> 
> Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR
> are referring to the same concept. In VM, both are the maximum guest physical
> address width. If we do not check the CPU field here, we will still have to
> check the CPU field in other places such as build_dmar_q35(), and reset the
> s->haw_bits again.
> 
> Is this explanation convincing enough? :)

So what happens if these don't match? I guess the guest can configure the vtd
to put data into some memory which isn't then accessible to the cpu, or the
cpu can use some memory not accessible to devices.

I guess some guests might be confused - is this what you observe? If yes,
some comments that tell people which guests get confused would be nice. Is
Windows happy? Is Linux happy?



> > 
> > Perhaps Eduardo can suggest a better approach, since he's more familiar
> > with the phys_bits topic
> 
> @Eduardo, any comments? Thanks!
> 
> > 
> > >  /*
> > >   * Rsvd field masks for spte
> > >   */
> > >  vtd_paging_entry_rsvd_field[0] = ~0ULL;
> > > -vtd_paging_entry_rsvd_field[1] = 
> > > VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits);
> > > -vtd_paging_entry_rsvd_field[2] = 
> > > VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits);
> > > -vtd_paging_entry_rsvd_field[3] = 
> > > VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits);
> > > -vtd_paging_entry_rsvd_field[4] = 
> > > VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits);
> > > -vtd_paging_entry_rsvd_field[5] = 
> > > VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits);
> > > -vtd_paging_entry_rsvd_field[6] = 
> > > VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits);
> > > -vtd_paging_entry_rsvd_field[7] = 
> > > VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits);
> > > -vtd_paging_entry_rsvd_field[8] = 
> > > VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits);
> > > +vtd_paging_entry_rsvd_field[1] = 
> > > VTD_SPTE_PAGE_L1_RSVD_MASK(s->haw_bits);
> > > +vtd_paging_entry_rsvd_field[2] = 
> > > VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits);
> > > +vtd_paging_entry_rsvd_field[3] = 
> > > VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits);
> > > +vtd_paging_entry_rsvd_field[4] = 
> > > VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits);
> > > +vtd_paging_entry_rsvd_field[5] = 
> > > VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits);
> > > +vtd_paging_entry_rsvd_field[6] = 
> > > VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits);
> > > +vtd_paging_entry_rsvd_field[7] = 
> > > VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits);
> > > +vtd_paging_entry_rsvd_field[8] = 
> > > VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits);
> > >  
> > >  if (x86_iommu->intr_supported) {
> > >  s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV;
> > > @@ -3261,10 +3268,10 @@ static bool 

Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-18 Thread Yu Zhang
On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote:
> On Wed, 12 Dec 2018 21:05:38 +0800
> Yu Zhang  wrote:
> 
> > Currently, vIOMMU is using the value of IOVA address width, instead of
> > the host address width(HAW) to calculate the number of reserved bits in
> > data structures such as root entries, context entries, and entries of
> > DMA paging structures etc.
> > 
> > However, the values of the IOVA address width and of the HAW may not be
> > equal. For example, a 48-bit IOVA can only be mapped to host addresses no
> > wider than 46 bits. Using 48, instead of 46, to calculate the reserved bits
> > may result in an invalid IOVA being accepted.
> > 
> > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState,
> > whose value is initialized based on the maximum physical address set to
> > guest CPU.
> 
> > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> > to clarify.
> > 
> > Signed-off-by: Yu Zhang 
> > Reviewed-by: Peter Xu 
> > ---
> [...]
> 
> > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion 
> > *iommu_mr, IOMMUNotifier *n)
> >  static void vtd_init(IntelIOMMUState *s)
> >  {
> >  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> > +CPUState *cs = first_cpu;
> > +X86CPU *cpu = X86_CPU(cs);
> >  
> >  memset(s->csr, 0, DMAR_REG_SIZE);
> >  memset(s->wmask, 0, DMAR_REG_SIZE);
> > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s)
> >  s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
> >   VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
> >   VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> > -if (s->aw_bits == VTD_HOST_AW_48BIT) {
> > +if (s->aw_bits == VTD_AW_48BIT) {
> >  s->cap |= VTD_CAP_SAGAW_48bit;
> >  }
> >  s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> > +s->haw_bits = cpu->phys_bits;
> Is it possible to avoid accessing CPU fields directly or cpu altogether
> and set phys_bits when iommu is created?

Thanks for your comments, Igor.

Well, I guess you prefer not to query the CPU capabilities while deciding
the vIOMMU features. But to me, they are not that irrelevant.:)

Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR
are referring to the same concept. In VM, both are the maximum guest physical
address width. If we do not check the CPU field here, we will still have to
check the CPU field in other places such as build_dmar_q35(), and reset the
s->haw_bits again.

Is this explanation convincing enough? :)

> 
> Perhaps Eduardo can suggest a better approach, since he's more familiar with
> the phys_bits topic

@Eduardo, any comments? Thanks!

> 
> >  /*
> >   * Rsvd field masks for spte
> >   */
> >  vtd_paging_entry_rsvd_field[0] = ~0ULL;
> > -vtd_paging_entry_rsvd_field[1] = 
> > VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits);
> > -vtd_paging_entry_rsvd_field[2] = 
> > VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits);
> > -vtd_paging_entry_rsvd_field[3] = 
> > VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits);
> > -vtd_paging_entry_rsvd_field[4] = 
> > VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits);
> > -vtd_paging_entry_rsvd_field[5] = 
> > VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits);
> > -vtd_paging_entry_rsvd_field[6] = 
> > VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits);
> > -vtd_paging_entry_rsvd_field[7] = 
> > VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits);
> > -vtd_paging_entry_rsvd_field[8] = 
> > VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits);
> > +vtd_paging_entry_rsvd_field[1] = 
> > VTD_SPTE_PAGE_L1_RSVD_MASK(s->haw_bits);
> > +vtd_paging_entry_rsvd_field[2] = 
> > VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits);
> > +vtd_paging_entry_rsvd_field[3] = 
> > VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits);
> > +vtd_paging_entry_rsvd_field[4] = 
> > VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits);
> > +vtd_paging_entry_rsvd_field[5] = 
> > VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits);
> > +vtd_paging_entry_rsvd_field[6] = 
> > VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits);
> > +vtd_paging_entry_rsvd_field[7] = 
> > VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits);
> > +vtd_paging_entry_rsvd_field[8] = 
> > VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits);
> >  
> >  if (x86_iommu->intr_supported) {
> >  s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV;
> > @@ -3261,10 +3268,10 @@ static bool vtd_decide_config(IntelIOMMUState *s, 
> > Error **errp)
> >  }
> >  
> >  /* Currently only address widths supported are 39 and 48 bits */
> > -if ((s->aw_bits != VTD_HOST_AW_39BIT) &&
> > -(s->aw_bits != VTD_HOST_AW_48BIT)) {
> > +if ((s->aw_bits != VTD_AW_39BIT) &&
> > +(s->aw_bits != VTD_AW_48BIT)) {
> >  error_setg(errp, "Supported values for x-aw-bits are: %d, %d",
> > -   VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT);
> > +   VTD_AW_39BIT, VTD_AW_48BIT);
> >  return false;
> >  }
> >  
> > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> > index 

Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-17 Thread Igor Mammedov
On Wed, 12 Dec 2018 21:05:38 +0800
Yu Zhang  wrote:

> Currently, vIOMMU is using the value of IOVA address width, instead of
> the host address width(HAW) to calculate the number of reserved bits in
> data structures such as root entries, context entries, and entries of
> DMA paging structures etc.
> 
> However, the values of the IOVA address width and of the HAW may not be
> equal. For example, a 48-bit IOVA can only be mapped to host addresses no
> wider than 46 bits. Using 48, instead of 46, to calculate the reserved bits
> may result in an invalid IOVA being accepted.
> 
> To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState,
> whose value is initialized based on the maximum physical address set to
> guest CPU.

> Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> to clarify.
> 
> Signed-off-by: Yu Zhang 
> Reviewed-by: Peter Xu 
> ---
[...]

> @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion 
> *iommu_mr, IOMMUNotifier *n)
>  static void vtd_init(IntelIOMMUState *s)
>  {
>  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> +CPUState *cs = first_cpu;
> +X86CPU *cpu = X86_CPU(cs);
>  
>  memset(s->csr, 0, DMAR_REG_SIZE);
>  memset(s->wmask, 0, DMAR_REG_SIZE);
> @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s)
>  s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
>   VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
>   VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> -if (s->aw_bits == VTD_HOST_AW_48BIT) {
> +if (s->aw_bits == VTD_AW_48BIT) {
>  s->cap |= VTD_CAP_SAGAW_48bit;
>  }
>  s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> +s->haw_bits = cpu->phys_bits;
Is it possible to avoid accessing CPU fields directly or cpu altogether
and set phys_bits when iommu is created?

Perhaps Eduardo can suggest a better approach, since he's more familiar with
the phys_bits topic

>  /*
>   * Rsvd field masks for spte
>   */
>  vtd_paging_entry_rsvd_field[0] = ~0ULL;
> -vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits);
> -vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits);
> -vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits);
> -vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits);
> -vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits);
> -vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits);
> -vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits);
> -vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits);
> +vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->haw_bits);
> +vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits);
> +vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits);
> +vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits);
> +vtd_paging_entry_rsvd_field[5] = 
> VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits);
> +vtd_paging_entry_rsvd_field[6] = 
> VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits);
> +vtd_paging_entry_rsvd_field[7] = 
> VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits);
> +vtd_paging_entry_rsvd_field[8] = 
> VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits);
>  
>  if (x86_iommu->intr_supported) {
>  s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV;
> @@ -3261,10 +3268,10 @@ static bool vtd_decide_config(IntelIOMMUState *s, 
> Error **errp)
>  }
>  
>  /* Currently only address widths supported are 39 and 48 bits */
> -if ((s->aw_bits != VTD_HOST_AW_39BIT) &&
> -(s->aw_bits != VTD_HOST_AW_48BIT)) {
> +if ((s->aw_bits != VTD_AW_39BIT) &&
> +(s->aw_bits != VTD_AW_48BIT)) {
>  error_setg(errp, "Supported values for x-aw-bits are: %d, %d",
> -   VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT);
> +   VTD_AW_39BIT, VTD_AW_48BIT);
>  return false;
>  }
>  
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index ed4e758..820451c 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -47,9 +47,9 @@
>  #define VTD_SID_TO_DEVFN(sid)   ((sid) & 0xff)
>  
>  #define DMAR_REG_SIZE   0x230
> -#define VTD_HOST_AW_39BIT   39
> -#define VTD_HOST_AW_48BIT   48
> -#define VTD_HOST_ADDRESS_WIDTH  VTD_HOST_AW_39BIT
> +#define VTD_AW_39BIT39
> +#define VTD_AW_48BIT48
> +#define VTD_ADDRESS_WIDTH   VTD_AW_39BIT
>  #define VTD_HAW_MASK(aw)((1ULL << (aw)) - 1)
>  
>  #define DMAR_REPORT_F_INTR  (1)
> @@ -244,7 +244,8 @@ struct IntelIOMMUState {
>  bool intr_eime; /* Extended interrupt mode enabled */
>  OnOffAuto intr_eim; /* Toggle for EIM cabability */
>  bool buggy_eim; /* Force buggy EIM unless eim=off */
> -uint8_t 

[Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

2018-12-12 Thread Yu Zhang
Currently, vIOMMU is using the value of IOVA address width, instead of
the host address width(HAW) to calculate the number of reserved bits in
data structures such as root entries, context entries, and entries of
DMA paging structures etc.

However, the values of the IOVA address width and of the HAW may not be
equal. For example, a 48-bit IOVA can only be mapped to host addresses no
wider than 46 bits. Using 48, instead of 46, to calculate the reserved bits
may result in an invalid IOVA being accepted.

To fix this, a new field, haw_bits, is introduced in struct IntelIOMMUState,
whose value is initialized based on the maximum physical address set for the
guest CPU. Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
for clarity.

Signed-off-by: Yu Zhang 
Reviewed-by: Peter Xu 
---
Cc: "Michael S. Tsirkin"  
Cc: Igor Mammedov  
Cc: Marcel Apfelbaum 
Cc: Paolo Bonzini  
Cc: Richard Henderson  
Cc: Eduardo Habkost 
Cc: Peter Xu 
---
 hw/i386/acpi-build.c  |  2 +-
 hw/i386/intel_iommu.c | 55 ---
 include/hw/i386/intel_iommu.h |  9 +++
 3 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 236a20e..b989523 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2431,7 +2431,7 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
 }
 
 dmar = acpi_data_push(table_data, sizeof(*dmar));
-dmar->host_address_width = intel_iommu->aw_bits - 1;
+dmar->host_address_width = intel_iommu->haw_bits - 1;
 dmar->flags = dmar_flags;
 
 /* DMAR Remapping Hardware Unit Definition structure */
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index d97bcbc..0e88c63 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -707,7 +707,8 @@ static VTDBus *vtd_find_as_from_bus_num(IntelIOMMUState *s, 
uint8_t bus_num)
  */
 static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
  uint64_t *slptep, uint32_t *slpte_level,
- bool *reads, bool *writes, uint8_t aw_bits)
+ bool *reads, bool *writes, uint8_t aw_bits,
+ uint8_t haw_bits)
 {
 dma_addr_t addr = vtd_ce_get_slpt_base(ce);
 uint32_t level = vtd_ce_get_level(ce);
@@ -760,7 +761,7 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t 
iova, bool is_write,
 *slpte_level = level;
 return 0;
 }
-addr = vtd_get_slpte_addr(slpte, aw_bits);
+addr = vtd_get_slpte_addr(slpte, haw_bits);
 level--;
 }
 }
@@ -783,6 +784,7 @@ typedef struct {
 void *private;
 bool notify_unmap;
 uint8_t aw;
+uint8_t haw;
 uint16_t domain_id;
 } vtd_page_walk_info;
 
@@ -925,7 +927,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t 
start,
  * This is a valid PDE (or even bigger than PDE).  We need
  * to walk one further level.
  */
-ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, info->aw),
+ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, info->haw),
   iova, MIN(iova_next, end), level - 1,
   read_cur, write_cur, info);
 } else {
@@ -942,7 +944,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t 
start,
 entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
 entry.addr_mask = ~subpage_mask;
 /* NOTE: this is only meaningful if entry_valid == true */
-entry.translated_addr = vtd_get_slpte_addr(slpte, info->aw);
+entry.translated_addr = vtd_get_slpte_addr(slpte, info->haw);
ret = vtd_page_walk_one(&entry, info);
 }
 
@@ -1002,7 +1004,7 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, 
uint8_t bus_num,
 return -VTD_FR_ROOT_ENTRY_P;
 }
 
-if (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD(s->aw_bits))) {
+if (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD(s->haw_bits))) {
 trace_vtd_re_invalid(re.rsvd, re.val);
 return -VTD_FR_ROOT_ENTRY_RSVD;
 }
@@ -1019,7 +1021,7 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, 
uint8_t bus_num,
 }
 
 if ((ce->hi & VTD_CONTEXT_ENTRY_RSVD_HI) ||
-   (ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO(s->aw_bits))) {
+   (ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO(s->haw_bits))) {
 trace_vtd_ce_invalid(ce->hi, ce->lo);
 return -VTD_FR_CONTEXT_ENTRY_RSVD;
 }
@@ -1056,6 +1058,7 @@ static int 
vtd_sync_shadow_page_table_range(VTDAddressSpace *vtd_as,
.private = (void *)&vtd_as->iommu,
 .notify_unmap = true,
 .aw = s->aw_bits,
+.haw = s->haw_bits,
 .as = vtd_as,
 .domain_id = VTD_CONTEXT_ENTRY_DID(ce->hi),
 };
@@ -1360,7 +1363,7 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
 }
 
 ret_fr =