Han, Weidong wrote:
>> If we devolve this to the iommu API, the same io page table can be
>> shared by all iommus, so long as they all use the same page table
>> format. 
>>     
>
> I don't understand how to handle this by iommu API. Let me explain my
> thoughts more clearly: 
>
> VT-d spec says: 
>       Context-entries programmed with the same domain identifier must
> always reference the same address translation structure (through the ASR
> field). Similarly, context-entries referencing the same address
> translation structure must be programmed with the same domain id. 
>
> In native VT-d driver, dmar_domain is per device, and has its own VT-d
> page table, which is dynamically setup before each DMA. So it is
> impossible that the same VT-d page table is shared by all iommus.
> Moreover, different iommus in the system may have different page table
> levels. 

Right.  This use case is in essence to prevent unintended sharing.  It
is also likely to have a low page table height, since DMA sizes are
relatively small.

> I think it's enough that iommu API tells us the iommu of a
> device. 
>   

While this is tangential to our conversation, why?  Even for the device
driver use case, this only makes the API more complex.  If the API hides
the existence of multiple iommus, it's easier to use and harder to make
a mistake.

> Whereas on the KVM side, the same VT-d page table can be shared by the
> devices which are under the same iommu and assigned to the same guest,
> because all of the guest's memory is statically mapped in the VT-d page
> table. But it needs to wrap dmar_domain; this patch wraps it with a
> reference count so that multiple devices can relate to the same dmar_domain.
>
> This patch already adds an API (intel_iommu_device_get_iommu()) in
> intel-iommu.c, which returns the iommu of a device. 

There is a missed optimization here.  Suppose we have two devices each
under a different iommu.  With the patch, each will be in a different
dmar_domain and so will have a different page table.  The amount of
memory used is doubled.

Suppose the iommu API hides the existence of multiple iommus.  You
allocate a translation and add devices to it.  When you add a device,
the iommu API checks which iommu is needed and programs it accordingly,
but only one io page table is used.

The other benefit is that iommu developers understand these issues while
kvm developers don't, so it's best managed by the iommu API.  This way,
if things change (as usual, becoming more complicated), the iommu
developers can make the changes in their code and hide the complexity
from kvm and other users.

I'm probably (badly) duplicating Joerg's iommu API here, but this is how
it could go:

iommu_translation_create() - creates an iommu translation object; this
allocates the page tables but doesn't do anything with them
iommu_translation_map() - adds pages to the translation
iommu_translation_attach() - attach a device to the translation; this
locates the iommu and programs it
_detach(), _unmap(), and _free() undo these operations.
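To make the proposed lifecycle concrete, here is a minimal user-space
sketch of that API.  All names and types below are hypothetical mocks
(struct io_pgtable, struct device, the iommu_id field) standing in for
the real kernel structures; the point is only to show how one
translation object owns a single page table that every attached device
is programmed to use, regardless of which iommu it sits behind.

```c
#include <assert.h>
#include <stdlib.h>

/* Mock stand-in for the shared io page table. */
struct io_pgtable {
	unsigned long maps;	/* number of pages mapped (mock) */
};

struct iommu_translation {
	struct io_pgtable *pgtable;	/* the single shared page table */
	int refcount;			/* devices currently attached */
};

/* Mock device; iommu_id says which iommu it sits behind. */
struct device {
	int iommu_id;
	const struct io_pgtable *programmed;	/* what its context entry points at */
};

/* Create a translation object; allocates the page tables but
 * doesn't touch any iommu yet. */
struct iommu_translation *iommu_translation_create(void)
{
	struct iommu_translation *t = calloc(1, sizeof(*t));
	t->pgtable = calloc(1, sizeof(*t->pgtable));
	return t;
}

/* Add pages to the translation; real code would build ptes. */
void iommu_translation_map(struct iommu_translation *t,
			   unsigned long iova, unsigned long hpa)
{
	(void)iova;
	(void)hpa;
	t->pgtable->maps++;
}

/* Attach a device: locate its iommu (here just dev->iommu_id) and
 * program its context entry to reference the shared page table. */
void iommu_translation_attach(struct iommu_translation *t, struct device *dev)
{
	dev->programmed = t->pgtable;
	t->refcount++;
}

void iommu_translation_detach(struct iommu_translation *t, struct device *dev)
{
	dev->programmed = NULL;
	t->refcount--;
}
```

With this shape, attaching two devices that sit behind different iommus
still ends up with both context entries referencing the same page
table, so the memory doubling described above goes away and the
per-iommu programming stays hidden inside the API.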

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.
