On Wed, Jun 13, 2012 at 10:37:41AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2012-06-12 at 12:46 +0300, Avi Kivity wrote:
> > > I think that transformation function lives in the bus layer
> > > MemoryRegion. It's a bit tricky though because you need some sort of
> > > notion of "who is asking". So you need:
> > >
> > > dma_memory_write(MemoryRegion *parent, DeviceState *caller,
> > >                  const void *data, size_t size);
> >
> > It is not the parent here, but rather the root of the memory hierarchy
> > as viewed from the device (the enigmatically named 'pcibm' above). The
> > pci memory region simply doesn't have the information about where system
> > memory lives, because it is a sibling region.
>
> Right and it has to be hierarchical, you can have CPU -> PCI transform
> followed by PCI -> AXI (or whatever stupid bus they use on the Broadcom
> wireless cards), etc...
>
> There can be any amount of transform. There's also the need at each
> level to handle sibling decoding. IE. It's the same BARs used for
> downstream and upstream access that will decode an access at any given
> level.
>
> So it's not a separate hierarchy, it's the same hierarchy walked both
> ways with potentially different transforms depending on what direction
> it's walked .
>
> > Note that the address transformations are not necessarily symmetric (for
> > example, iommus transform device->system transactions, but not
> > cpu->device transactions). Each initiator has a separate DAG to follow.
>
> Right. Or rather they might transform CPU -> device but differently (ie,
> we do have several windows with different offsets on power for example
> etc...) so it's a different transform which -might- be an iommu of some
> sort as well.
>
> I think the whole mechanism should be symetrical, with a fast path for
> transforms that can be represented by a direct map + offset (ie no iommu
> case).
>
> > > This could be simplified at each layer via:
> > >
> > > void pci_device_write(PCIDevice *dev, const void *data, size_t size) {
> > >     dma_memory_write(dev->bus->mr, DEVICE(dev), data, size);
> > > }
> > >
> > >> To be true to the HW, each bridge should have its memory region, so a
> > >> setup with
> > >>
> > >>   /pci-host
> > >>      |
> > >>      |--/p2p
> > >>           |
> > >>           |--/device
> > >>
> > >> Any DMA done by the device would walk through the p2p region to the host
> > >> which would contain a region with transform ops.
> > >>
> > >> However, at each level, you'd have to search for sibling regions that
> > >> may decode the address at that level before moving up, ie implement
> > >> essentially the equivalent of the PCI substractive decoding scheme.
> > >
> > > Not quite... subtractive decoding only happens for very specific
> > > devices IIUC. For instance, an PCI-ISA bridge. Normally, it's positive
> > > decoding and a bridge has to describe the full region of MMIO/PIO that
> > > it handles.
> > >
> > > So it's only necessary to transverse down the tree again for the very
> > > special case of PCI-ISA bridges. Normally you can tell just by looking
> > > at siblings.
> > >
> > >> That will be a significant overhead for your DMA ops I believe, though
> > >> doable.
> > >
> > > Worst case scenario, 256 devices with what, a 3 level deep hierarchy?
> > > we're still talking about 24 simple address compares. That shouldn't be
> > > so bad.
> >
> > Or just lookup the device-local phys_map.
> >
> > >> Then we'd have to add map/unmap to MemoryRegion as well, with the
> > >> understanding that they may not be supported at every level...
> > >
> > > map/unmap can always fall back to bounce buffers.
> > >
> > >> So yeah, it sounds doable and it would handle what DMAContext doesn't
> > >> handle which is access to peer devices without going all the way back to
> > >> the "top level", but it's complex and ... I need something in qemu
> > >> 1.2 :-)
> > >
> > > I think we need a longer term vision here. We can find incremental
> > > solutions for the short term but I'm pretty nervous about having two
> > > parallel APIs only to discover that we need to converge in 2 years.
> >
> > The API already exists, we just need to fill up the data structures.
>
> Not really no, we don't have proper DMA APIs to shoot from devices.

TBH, I don't understand the "upstream" access discussion, nor the
specifics of DMA accesses on the memory/system bus.

When a device such as a DMA unit accesses the memory/system bus, it does
so, AFAIK, from a different port (its master port). That port is NOT the
same as its slave port. There is no reverse decoding of the addresses;
the access is made top-down.

If you are talking about NoCs, we need a completely different
infrastructure, but I don't think that is the case.

I agree with Avi, most of it is already in place.

Cheers
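
P.S. To make the top-down point concrete, here is a minimal sketch of what
I mean. It is not QEMU's MemoryRegion API: Region, AddressSpace, as_decode
and master_write are hypothetical names made up for illustration, and the
model assumes a flat list of ranges with no overlap, priorities or IOMMU.
A write issued on the master port is decoded from the root of the address
space the device sees, never by walking back up from the device's own
slave region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat model for illustration, not QEMU's MemoryRegion API. */
typedef struct Region {
    uint64_t base;     /* start address as seen from the root */
    uint64_t size;     /* length in bytes */
    uint8_t *backing;  /* storage behind this range */
} Region;

/* The view a master port has of the bus: a root with its ranges. */
typedef struct AddressSpace {
    const Region *regions;
    size_t count;
} AddressSpace;

/* Top-down decode: return the region that fully contains
 * [addr, addr + len), or NULL if nothing decodes it. */
static const Region *as_decode(const AddressSpace *as,
                               uint64_t addr, size_t len)
{
    assert(as != NULL);
    for (size_t i = 0; i < as->count; i++) {
        const Region *r = &as->regions[i];
        if (addr >= r->base && len <= r->size &&
            addr - r->base <= r->size - len) {
            return r;
        }
    }
    return NULL;
}

/* A write issued on a device's master port.
 * Returns 0 on success, -1 if no region decodes the access. */
static int master_write(const AddressSpace *as, uint64_t addr,
                        const void *data, size_t len)
{
    const Region *r = as_decode(as, addr, len);
    if (r == NULL) {
        return -1;
    }
    memcpy(r->backing + (addr - r->base), data, len);
    return 0;
}
```

The initiator only needs a pointer to its own AddressSpace; different
masters could be handed different roots if their views of the bus differ.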