Re: [Qemu-devel] Re: PATCH, RFC: Generic DMA framework

2007-09-15 Thread Blue Swirl
I made a first implementation of this concept. The CPU->bus direction uses
the southbound functions, the device->CPU (DMA) direction the northbound ones.

The system is not symmetric; the device address range allocation could
well be separate.

What do you think?
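
For illustration, here is a minimal, hedged sketch of how a flat system (no
IOMMU in the DMA path) might instantiate such a bus with the API proposed in
the patch below; the handler and function names are made up for this example:

static void system_dma_handler(void *opaque, target_phys_addr_t addr,
                               uint8_t *buf, unsigned int len, int is_write)
{
    /* Simplest possible northbound handler: no translation, devices see
     * the CPU physical address space directly. */
    cpu_physical_memory_rw(addr, buf, len, is_write);
}

static qemu_bus *example_system_bus_init(void)
{
    /* 32-bit device-visible address space, nothing in between. */
    return bus_init(32, system_dma_handler, NULL);
}

A device on such a bus would do its DMA with bus_write_north(bus, addr, buf,
len) and would not notice when an IOMMU-backed handler is plugged in instead.
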
Index: qemu/cpu-all.h
===
--- qemu.orig/cpu-all.h	2007-09-15 15:34:51.000000000 +0000
+++ qemu/cpu-all.h	2007-09-15 16:05:04.000000000 +0000
@@ -861,6 +861,54 @@
 void dump_exec_info(FILE *f,
 int (*cpu_fprintf)(FILE *f, const char *fmt, ...));
 
+/* Bus operations */
+typedef struct qemu_bus qemu_bus;
+
+typedef void (*qemu_mem_rw_handler)(void *opaque,
+                                    target_phys_addr_t addr,
+                                    uint8_t *buf, unsigned int len,
+                                    int is_write);
+
+qemu_bus *bus_init(unsigned int bus_bits, qemu_mem_rw_handler handler,
+                   void *handler_opaque);
+void bus_register_physical_memory(qemu_bus *bus,
+                                  target_phys_addr_t start_addr,
+                                  unsigned long size,
+                                  unsigned long phys_offset);
+int bus_register_io_memory(qemu_bus *bus,
+                           int io_index,
+                           CPUReadMemoryFunc **mem_read,
+                           CPUWriteMemoryFunc **mem_write,
+                           void *opaque);
+/* Direction CPU->bridge->device/memory */
+void bus_rw_south(qemu_bus *bus, target_phys_addr_t addr, uint8_t *buf,
+                  unsigned int len, int is_write);
+static inline void bus_read_south(qemu_bus *bus, target_phys_addr_t addr,
+                                  uint8_t *buf, unsigned int len)
+{
+    bus_rw_south(bus, addr, buf, len, 0);
+}
+static inline void bus_write_south(qemu_bus *bus, target_phys_addr_t addr,
+                                   const uint8_t *buf, unsigned int len)
+{
+    bus_rw_south(bus, addr, (uint8_t *)buf, len, 1);
+}
+void bus_write_south_rom(qemu_bus *bus, target_phys_addr_t addr,
+                         const uint8_t *buf, unsigned int len);
+/* From device towards CPU/memory (DMA) */
+void bus_rw_north(qemu_bus *bus, target_phys_addr_t addr, uint8_t *buf,
+                  unsigned int len, int is_write);
+static inline void bus_read_north(qemu_bus *bus, target_phys_addr_t addr,
+                                  uint8_t *buf, unsigned int len)
+{
+    bus_rw_north(bus, addr, buf, len, 0);
+}
+static inline void bus_write_north(qemu_bus *bus, target_phys_addr_t addr,
+                                   const uint8_t *buf, unsigned int len)
+{
+    bus_rw_north(bus, addr, (uint8_t *)buf, len, 1);
+}
+
 /***/
 /* host CPU ticks (if available) */
 
Index: qemu/exec.c
===
--- qemu.orig/exec.c	2007-09-15 15:34:51.000000000 +0000
+++ qemu/exec.c	2007-09-15 16:05:54.000000000 +0000
@@ -2905,3 +2905,258 @@
 #undef env
 
 #endif
+
+typedef struct BusPageDesc {
+    /* offset in host memory of the page + io_index in the low 12 bits */
+    target_phys_addr_t phys_offset;
+} BusPageDesc;
+
+struct qemu_bus {
+    /* Northbound access handler */
+    qemu_mem_rw_handler north_handler;
+    void *handler_opaque;
+    /* Southbound access management */
+    BusPageDesc **l1_bus_map;
+    unsigned int bus_bits, l1_bits, l2_bits, l1_size, l2_size;
+    CPUWriteMemoryFunc *io_mem_write[IO_MEM_NB_ENTRIES][4];
+    CPUReadMemoryFunc *io_mem_read[IO_MEM_NB_ENTRIES][4];
+    void *io_mem_opaque[IO_MEM_NB_ENTRIES];
+    unsigned int io_mem_nb;
+};
+
+static BusPageDesc *bus_page_find_alloc(qemu_bus *bus,
+                                        target_phys_addr_t index, int alloc)
+{
+    void **lp, **p;
+    BusPageDesc *pd;
+
+    p = (void **)bus->l1_bus_map;
+#if TARGET_PHYS_ADDR_SPACE_BITS > 32
+    lp = p + ((index >> (bus->l1_bits + bus->l2_bits)) & (bus->l1_size - 1));
+    p = *lp;
+    if (!p) {
+        /* allocate if not found */
+        if (!alloc)
+            return NULL;
+        p = qemu_vmalloc(sizeof(void *) * bus->l1_size);
+        memset(p, 0, sizeof(void *) * bus->l1_size);
+        *lp = p;
+    }
+#endif
+    lp = p + ((index >> bus->l2_bits) & (bus->l1_size - 1));
+    pd = *lp;
+    if (!pd) {
+        unsigned int i;
+        /* allocate if not found */
+        if (!alloc)
+            return NULL;
+        pd = qemu_vmalloc(sizeof(BusPageDesc) * bus->l2_size);
+        *lp = pd;
+        for (i = 0; i < bus->l2_size; i++)
+            pd[i].phys_offset = IO_MEM_UNASSIGNED;
+    }
+    return ((BusPageDesc *)pd) + (index & (bus->l2_size - 1));
+}
+
+static inline BusPageDesc *bus_page_find(qemu_bus *bus,
+                                         target_phys_addr_t index)
+{
+    return bus_page_find_alloc(bus, index, 0);
+}
+
+int bus_register_io_memory(qemu_bus *bus,
+   int io_index,
+   

Re: [Qemu-devel] Re: PATCH, RFC: Generic DMA framework

2007-09-08 Thread Blue Swirl
On 8/30/07, Paul Brook [EMAIL PROTECTED] wrote:
  If this is the case, it means we don't need anything complicated. Devices
  map themselves straight into the system address space at the appropriate
  slot address (no plug-n-play to worry about), and device DMA goes via the
  IOMMU.

 Further searching by google suggests I may be wrong.

 The alternative is that the controller maps the 32-bit VA onto a device
 select+28-bit address, using some as-yet undiscovered mechanism.
 There are then a couple of different options for how the CPU/memory bus is
 accessed:
 a) The IOMMU is one or more slave devices, that feed the 28-bit address
 possibly plus a few other bits from the device ID into the translation table.
 This effectively allows you to map a proportion of the SBus 32-bit master VA
 space onto CPU address space via the IOMMU, and map the remainder onto
 devices on the same bus. For a system with <=8 slots per bus a fixed mapping
 using the first 2G as 256MB for each slot and the top 2G for IOMMU is
 entirely feasible.
 b) The 32-bit SBus VA is looked up directly into the IOMMU. Each IOMMU entry
 can refer to either a CPU address, or a device+28-bit address on the local

From DMA2.txt, NCR89C100.txt, NCR89C105.txt and turbosparc.pdf I
gather the following:
- CPU and IOMMU always perform slave accesses
- Slave accesses use the 28-bit address bus to select the device
- Slave accesses are not translated by IOMMU
- NCR master devices (Lance, ESP) use an internal DREQ-style signal to
indicate their need for DMA to their DMA controller
- Master accesses use the 32-bit SBus data signals for both address and data
- DMA controller is the master for NCR89C100+NCR89C105 combination
- Master accesses are translated and controlled by IOMMU
- Slave devices may or may not support master access cycles (not
supported in the NCR case)
- IOMMU can give direct bus access for intelligent masters (no devices known)

We could model this using two buses: A slave bus between the CPU and
the devices, and a master bus between devices and IOMMU. The slave bus
translates the 36-bit CPU/memory bus addresses to 28-bit SBus bus
addresses. The master bus uses IOMMU to translate 32-bit DVMA
addresses to 36-bit CPU/memory bus addresses. Slave devices are
connected to the slave bus and DREQs. Master devices and DMA
controllers take the DREQs and both buses. Devices register the
address ranges they serve on each bus.
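
As a concrete, hedged illustration of this split, a master/slave device model
could register and DMA roughly as sketched below, reusing the bus API from the
patch at the top of this archive; the device struct, the 0x1000 register
window and the io_index convention are assumptions, not part of that patch:

/* Rough sketch only: names and sizes invented for illustration. */
typedef struct ExampleSBusDevice {
    qemu_bus *slave_bus;   /* CPU -> device: 28-bit SBus slave space   */
    qemu_bus *master_bus;  /* device -> memory: 32-bit DVMA, via IOMMU */
} ExampleSBusDevice;

static void example_device_init(ExampleSBusDevice *s,
                                qemu_bus *slave_bus, qemu_bus *master_bus,
                                target_phys_addr_t slot_base,
                                CPUReadMemoryFunc **reg_read,
                                CPUWriteMemoryFunc **reg_write)
{
    int io;

    s->slave_bus = slave_bus;
    s->master_bus = master_bus;

    /* Register the device's register window on the slave bus only. */
    io = bus_register_io_memory(slave_bus, 0, reg_read, reg_write, s);
    bus_register_physical_memory(slave_bus, slot_base, 0x1000, io);
}

static void example_device_dma_out(ExampleSBusDevice *s, uint32_t dvma_addr,
                                   const uint8_t *buf, unsigned int len)
{
    /* DMA goes north through the master bus, i.e. via the IOMMU. */
    bus_write_north(s->master_bus, dvma_addr, buf, len);
}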

On Sun4c (without IOMMU) there would be just one bus for both purposes
(with the MMU quirk).

For the Sparc64 PCI bus which has an IOMMU, a similar dual bus
arrangement would be needed. On PC/PPC systems the two buses would be
again one.

Maybe even the IO port access and MMIO could be unified (one generic
bus for both)? That could simplify the device registration.

Alternatively, the generic bus could support several access modes, so
there would be no need for many virtual buses.

Comments?




Re: [Qemu-devel] Re: PATCH, RFC: Generic DMA framework

2007-09-08 Thread Paul Brook
 From DMA2.txt, NCR89C100.txt, NCR89C105.txt and turbosparc.pdf I
 gather the following:
 - CPU and IOMMU always perform slave accesses
 - Slave accesses use the 28-bit address bus to select the device

I thought device selection was separate from the 28-bit SBus slave address 
space. ie. each device has exclusive ownership of the whole 28-bit address 
space, and it's effectively just multiplexing per-slave busses over a single 
electrical connection.

 - Slave accesses are not translated by IOMMU
 - NCR master devices (Lance, ESP) use an internal DREQ-style signal to
 indicate their need for DMA to their DMA controller
 - Master accesses use the 32-bit SBus data signals for both address and
 data - DMA controller is the master for NCR89C100+NCR89C105 combination -
 Master accesses are translated and controlled by IOMMU
 - Slave devices may or may not support master access cycles (not
 supported in the NCR case)
 - IOMMU can give direct bus access for intelligent masters (no devices
 known)

 We could model this using two buses: A slave bus between the CPU and
 the devices, and a master bus between devices and IOMMU. The slave bus
 translates the 36-bit CPU/memory bus addresses to 28-bit SBus bus
 addresses. The master bus uses IOMMU to translate 32-bit DVMA
 addresses to 36-bit CPU/memory bus addresses. Slave devices are
 connected to the slave bus and DREQs. Master devices and DMA
 controllers take the DREQs and both buses. Devices register the
 address ranges they serve on each bus.

IIUC devices never register addresses on the master bus. The only thing that 
responds on that bus is the IOMMU.

 On Sun4c (without IOMMU) there would be just one bus for both purposes
 (with the MMU quirk).

 For the Sparc64 PCI bus which has an IOMMU, a similar dual bus
 arrangement would be needed. On PC/PPC systems the two buses would be
 again one.

PCI shouldn't need a dual bus setup. You just have one bus for PCI and one bus 
for CPU/memory.

IMHO the whole point of having a generic bus infrastructure is that we can 
define address mapping in terms of [asymmetric] translations from one bus 
address space to another. This isolates the device from needing to care about
bridges and IOMMUs.

If we're assuming 1:1 or symmetric address space mapping there doesn't seem 
much point modelling separate busses. Instead push everything into the device 
registration and DMA routines.

Paul




Re: [Qemu-devel] Re: PATCH, RFC: Generic DMA framework

2007-09-08 Thread Blue Swirl
On 9/8/07, Paul Brook [EMAIL PROTECTED] wrote:
  From DMA2.txt, NCR89C100.txt, NCR89C105.txt and turbosparc.pdf I
  gather the following:
  - CPU and IOMMU always perform slave accesses
  - Slave accesses use the 28-bit address bus to select the device

 I thought device selection was separate from the 28-bit SBus slave address
 space. ie. each device has exclusive ownership of the whole 28-bit address
 space, and it's effectively just multiplexing per-slave busses over a single
 electrical connection.

At least the NCR slave devices use SBus bus signals for device select.
I don't know whether this applies generally.

  - Slave accesses are not translated by IOMMU
  - NCR master devices (Lance, ESP) use an internal DREQ-style signal to
  indicate their need for DMA to their DMA controller
  - Master accesses use the 32-bit SBus data signals for both address and
  data - DMA controller is the master for NCR89C100+NCR89C105 combination -
  Master accesses are translated and controlled by IOMMU
  - Slave devices may or may not support master access cycles (not
  supported in the NCR case)
  - IOMMU can give direct bus access for intelligent masters (no devices
  known)
 
  We could model this using two buses: A slave bus between the CPU and
  the devices, and a master bus between devices and IOMMU. The slave bus
  translates the 36-bit CPU/memory bus addresses to 28-bit SBus bus
  addresses. The master bus uses IOMMU to translate 32-bit DVMA
  addresses to 36-bit CPU/memory bus addresses. Slave devices are
  connected to the slave bus and DREQs. Master devices and DMA
  controllers take the DREQs and both buses. Devices register the
  address ranges they serve on each bus.

 IIUC devices never register addresses on the master bus. The only thing that
 responds on that bus is the IOMMU.

Generally yes, but these intelligent masters and their targets would
register on both buses. The only case I can think of is a
video grabber; its frame memory could be accessed directly by other
IO devices.

  On Sun4c (without IOMMU) there would be just one bus for both purposes
  (with the MMU quirk).
 
  For the Sparc64 PCI bus which has an IOMMU, a similar dual bus
  arrangement would be needed. On PC/PPC systems the two buses would be
  again one.

 PCI shouldn't need a dual bus setup. You just have one bus for PCI and one bus
 for CPU/memory.

Then how would the Sparc64 IOMMU intercept the device DMA? I'd think that
PCI bus mastering works similarly to SBus, or doesn't it?

 IMHO the whole point of having a generic bus infrastructure is that we can
 define address mapping in terms of [asymmetric] translations from one bus
 address space to another. This isolates the device from needing to care about
 bridges and IOMMUs.

 If we're assuming 1:1 or symmetric address space mapping there doesn't seem
 much point modelling separate busses. Instead push everything into the device
 registration and DMA routines.

Agreed. For Sparc32/64 cases there isn't much choice, there is no symmetry.




Re: [Qemu-devel] Re: PATCH, RFC: Generic DMA framework

2007-09-08 Thread Paul Brook
  IIUC devices never register addresses on the master bus. The only thing
  that responds on that bus is the IOMMU.

 Generally yes, but these intelligent masters and their targets would
 register on both buses. The only case I can think of is a
 video grabber; its frame memory could be accessed directly by other
 IO devices.

Ah, I think we've got different interpretations of what registering a device 
involves. To a first approximation a master doesn't need to register with a 
bus at all. Slave devices can't/don't need to identify which master initiated 
the transaction. Plus all bus transactions are atomic w.r.t. other bus 
traffic, so no arbitration is needed.

Masters will need to register with a bus insofar as they need to get a 
reference to identify which bus they're talking to. They don't generally 
reserve any bus address space though. Most devices are actually dual function 
master/slave devices, so already have a bus handle from registering the slave 
device.

   For the Sparc64 PCI bus which has an IOMMU, a similar dual bus
   arrangement would be needed. On PC/PPC systems the two buses would be
   again one.
 
  PCI shouldn't need a dual bus setup. You just have one bus for PCI and
  one bus for CPU/memory.

 Then how would the Sparc64 IOMMU intercept the device DMA? I'd think that
 PCI bus mastering works similarly to SBus, or doesn't it?

A PCI host controller effectively consists of two bridges.
The CPU->PCI bridge responds to requests on the CPU bus, using simple linear 
address translation to create PCI requests.  The PCI->CPU bridge responds to 
requests on the PCI bus (ie. device DMA), using an IOMMU to translate these 
into CPU requests.

The interesting bits of a generic bus infrastructure are the bridges between 
the busses, not the busses themselves.

Conceptually each access starts on the bus local to that device (the system 
bus for the CPU, PCI bus for device DMA, etc), then recursively walks bus->bus 
bridges until it finds a device. Walking over a bridge is what causes address 
translation, and that translation is sensitive to direction.
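
A naive, purely illustrative sketch of that recursive walk is below; none of
these types exist in QEMU, and asymmetric translation is modelled here by
giving each crossing direction of a physical bridge its own Bridge object:

#include <stdint.h>

/* A real version would also have to avoid walking back across the bridge
 * the access arrived through, and do better than a linear search. */
typedef struct Bus Bus;
typedef struct Bridge Bridge;

struct Bridge {
    Bus *other_side;
    /* Direction-sensitive translation; returns 0 if the bridge does not
     * claim this address when crossed in this direction. */
    int (*translate)(Bridge *br, uint64_t addr, uint64_t *out);
};

struct Bus {
    /* Returns nonzero if a device directly on this bus claims the address
     * and completes the access. */
    int (*local_access)(Bus *bus, uint64_t addr, uint8_t *buf,
                        unsigned int len, int is_write);
    Bridge *bridges[8];
    int nb_bridges;
};

/* Start on the bus local to the initiator and recursively walk bridges,
 * translating the address at each crossing. */
static int bus_access(Bus *bus, uint64_t addr, uint8_t *buf,
                      unsigned int len, int is_write)
{
    uint64_t xaddr;
    int i;

    if (bus->local_access(bus, addr, buf, len, is_write))
        return 1;
    for (i = 0; i < bus->nb_bridges; i++) {
        Bridge *br = bus->bridges[i];

        if (br->translate(br, addr, &xaddr))
            return bus_access(br->other_side, xaddr, buf, len, is_write);
    }
    return 0; /* access not claimed by anything reachable */
}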

I admit I haven't figured out how to implement this efficiently.

Paul




Re: [Qemu-devel] Re: PATCH, RFC: Generic DMA framework

2007-08-29 Thread Blue Swirl
On 8/28/07, Paul Brook [EMAIL PROTECTED] wrote:
  On second thought, there is a huge difference between a write access
  originating from CPU destined for the device and the device writing to
  main memory. The CPU address could be 0xf000 1000, which may translate
  to a bus address of 0x1000, as an example. The device could write to
  main memory using the same bus address 0x1000, but this time the IOMMU
  would map this to for example 0x1234 5000, or without an IOMMU it
  would be just 0x1000.

 While your concern is valid, your example is not.

 You can't have the same bus address mapping onto both a device and main
 memory. Your example works if e.g. IO bus address 0x2000 1000 (or worse still
 0xf000 1000) maps onto system memory 0x1234 5000.

This is a bit mysterious for me too. SBus address space is 28 bits
(256MB). Usually each slot maps to a different area. So the CPU sees
one slot for example at 0x3000 0000 and the other at 0x4000 0000.

IOMMU can map max 2G of memory, usually a 32 or 64MB region. For the
devices, this device virtual memory access (DVMA) space exists at the
top of address space (for example 0xfc00 0000). Each page can map to a
different address. But these mappings cannot be seen from the CPU, for
example the boot prom is located at 0xffd0 0000. I wonder how the
devices access the DVMA space in case of 256M DVMA.

The device can't obviously supply the address bits 28-31, I don't know
where they come from (=1?). But from tracing Linux I'm pretty sure
that the bus address can be 0 disregarding the higher bits and also
the device (or device FCode prom more likely) can exist at that
location. How? Maybe IOMMU does not see CPU accesses at all and the
devices see neither each other nor themselves, so it's not really a
shared bus?

 Conceptually you can have a separate IOMMU on every bus->bus or bus/host
 bridge, with asymmetric mappings depending where the transaction originates.

IOMMU on Sun4m maps DVMA addresses to physical addresses, which (I
think) in turn can be other devices' registers or memory, but the
mappings are the same for all devices.
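
For what it's worth, a hedged sketch of the kind of lookup such an IOMMU
performs on a northbound access is below; the 4K page size and the
PTE layout are assumptions for illustration, not the real Sun4m format:

#include <stdint.h>

/* Translate a 32-bit DVMA address through a flat page table indexed by
 * DVMA page number. The physical page number is assumed to sit in the
 * low bits of the PTE, purely for the sake of the example. */
static uint64_t iommu_translate_sketch(const uint32_t *page_table,
                                       uint32_t dvma_base, uint32_t dvma_addr)
{
    uint32_t pte = page_table[(dvma_addr - dvma_base) >> 12];

    return ((uint64_t)(pte & 0x00ffffff) << 12) | (dvma_addr & 0xfff);
}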




Re: [Qemu-devel] Re: PATCH, RFC: Generic DMA framework

2007-08-29 Thread Paul Brook
 This is a bit mysterious for me too. SBus address space is 28 bits
 (256MB). Usually each slot maps to a different area. So the CPU sees
 one slot for example at 0x3000 0000 and the other at 0x4000 0000.

 IOMMU can map max 2G of memory, usually a 32 or 64MB region. For the
 devices, this device virtual memory access (DVMA) space exists at the
 top of address space (for example 0xfc00 0000). Each page can map to a
 different address. But these mappings cannot be seen from the CPU, for
 example the boot prom is located at 0xffd0 0000. I wonder how the
 devices access the DVMA space in case of 256M DVMA.

 The device can't obviously supply the address bits 28-31, I don't know 
 where they come from (=1?). But from tracing Linux I'm pretty sure
 that the bus address can be 0 disregarding the higher bits and also
 the device (or device FCode prom more likely) can exist at that
 location. How? Maybe IOMMU does not see CPU accesses at all and the
 devices see neither each other nor themselves, so it's not really a
 shared bus?

I can't find a copy of the SBus specification, so I'm guessing how this fits 
together.

The key bit is that SBus controller performs device selection. c.f. PCI and 
ISA where each device does full address decoding.
What information I've found indicates that SBus supports an unlimited number 
of slave devices, and master devices use a 32-bit virtual address space.

This leads me to the conclusion that it's as if each slave device is on its 
own 28-bit bus, and the SBus devices' master transactions go via the IOMMU 
onto the CPU bus. From there they may be routed back to an SBus device.
Actual implementation may need to do some short-circuiting to prevent 
deadlock, so I'm not entirely sure about this.

If this is the case, it means we don't need anything complicated. Devices map 
themselves straight into the system address space at the appropriate slot 
address (no plug-n-play to worry about), and device DMA goes via the IOMMU.

Because devices do not do address decoding I suspect this isn't going to 
nicely fit into a generic bus framework that would work for most systems.

Paul




Re: [Qemu-devel] Re: PATCH, RFC: Generic DMA framework

2007-08-29 Thread Paul Brook
 If this is the case, it means we don't need anything complicated. Devices
 map themselves straight into the system address space at the appropriate
 slot address (no plug-n-play to worry about), and device DMA goes via the
 IOMMU.

Further searching by google suggests I may be wrong.

The alternative is that the controller maps the 32-bit VA onto a device 
select+28-bit address, using some as-yet undiscovered mechanism.
There are then a couple of different options for how the CPU/memory bus is 
accessed:
a) The IOMMU is one or more slave devices, that feed the 28-bit address 
possibly plus a few other bits from the device ID into the translation table. 
This effectively allows you to map a proportion of the SBus 32-bit master VA 
space onto CPU address space via the IOMMU, and map the remainder onto 
devices on the same bus. For a system with <=8 slots per bus a fixed mapping 
using the first 2G as 256MB for each slot and the top 2G for IOMMU is 
entirely feasible.
b) The 32-bit SBus VA is looked up directly into the IOMMU. Each IOMMU entry 
can refer to either a CPU address, or a device+28-bit address on the local 
SBUS.

Paul




Re: [Qemu-devel] Re: PATCH, RFC: Generic DMA framework

2007-08-28 Thread Blue Swirl
On 8/26/07, Blue Swirl [EMAIL PROTECTED] wrote:
 On 8/26/07, Fabrice Bellard [EMAIL PROTECTED] wrote:
  Paul Brook wrote:
   pci_gdma.diff: Convert PCI devices and targets
  
   Any comments? The patches are a bit intrusive and I can't test the
   targets except that they compile.
   Shouldn't the PCI DMA object be a property of the PCI bus?
   ie. we don't want/need to pass it round as a separate parameter. It can
be inferred from the device/bus.
   I agree. Moreover the DMA is bus specific so I don't see a need to add a
   generic DMA layer.
  
   I can see use for a generic DMA interface. It has some nice possibilities 
   for
   devices which can connect via a variety of busses and maybe for layering
   different busses within a system.
  
   However I don't know how well this will work in practice for the machines 
   qemu
   currently emulates.
 
  I can see more uses for a simple bus interface which could be used at
  least for ISA devices. The API should include bus read/write functions
  (which can be used to implement DMA) and functions to allocate/free a
  memory region as we have for the CPU bus.
 
  Of course the same must be added for PCI buses so that the PCI memory
  area can be mapped at any position in the CPU address space.

 Nice idea. The functions in exec.c could be made more generic by
 extending them with an object parameter, for example
 int cpu_register_io_memory(int io_index,
CPUReadMemoryFunc **mem_read,
CPUWriteMemoryFunc **mem_write,
void *opaque)
 would become
 int bus_register_io_memory(void *opaque,
int io_index,
CPUReadMemoryFunc **mem_read,
CPUWriteMemoryFunc **mem_write,
void *io_opaque)

 The opaque object would be struct PCIBus for PCI devices, something
 other for ISA. The ISA DMA DREQ signal is still a problem, or could we
 use qemu_irq for that too?

On second thought, there is a huge difference between a write access
originating from CPU destined for the device and the device writing to
main memory. The CPU address could be 0xf000 1000, which may translate
to a bus address of 0x1000, as an example. The device could write to
main memory using the same bus address 0x1000, but this time the IOMMU
would map this to for example 0x1234 5000, or without an IOMMU it
would be just 0x1000.

So for the CPU physical address to bus address translation, my
proposal would work. But the system does not work in the direction of
device to memory, which was the original problem.

The devices in the same bus could access each other as well. How would
these accesses be identified, so that they are not device to memory
accesses?

Maybe I have missed something important?




Re: [Qemu-devel] Re: PATCH, RFC: Generic DMA framework

2007-08-28 Thread Paul Brook
 On second thought, there is a huge difference between a write access
 originating from CPU destined for the device and the device writing to
 main memory. The CPU address could be 0xf000 1000, which may translate
 to a bus address of 0x1000, as an example. The device could write to
 main memory using the same bus address 0x1000, but this time the IOMMU
 would map this to for example 0x1234 5000, or without an IOMMU it
 would be just 0x1000.

While your concern is valid, your example is not.

You can't have the same bus address mapping onto both a device and main 
memory. Your example works if e.g. IO bus address 0x2000 1000 (or worse still 
0xf000 1000) maps onto system memory 0x1234 5000.

Conceptually you can have a separate IOMMU on every bus->bus or bus/host 
bridge, with asymmetric mappings depending where the transaction originates.

I believe some of the newer POWER machines can do this (x86 hardware with this 
capability is not generally available). The ARM PCI host bridge allows 
asymmetric mappings, though this uses simple regions rather than a full IOMMU, 
and is currently not implemented.

Paul




Re: [Qemu-devel] Re: PATCH, RFC: Generic DMA framework

2007-08-26 Thread Fabrice Bellard

Paul Brook wrote:

pci_gdma.diff: Convert PCI devices and targets

Any comments? The patches are a bit intrusive and I can't test the
targets except that they compile.

Shouldn't the PCI DMA object be a property of the PCI bus?
ie. we don't want/need to pass it round as a separate parameter. It can
be inferred from the device/bus.

I agree. Moreover the DMA is bus specific so I don't see a need to add a
generic DMA layer.


I can see use for a generic DMA interface. It has some nice possibilities for 
devices which can connect via a variety of busses and maybe for layering 
different busses within a system.


However I don't know how well this will work in practice for the machines qemu 
currently emulates.


I can see more uses for a simple bus interface which could be used at 
least for ISA devices. The API should include bus read/write functions 
(which can be used to implement DMA) and functions to allocate/free a 
memory region as we have for the CPU bus.


Of course the same must be added for PCI buses so that the PCI memory 
area can be mapped at any position in the CPU address space.


Fabrice.





Re: [Qemu-devel] Re: PATCH, RFC: Generic DMA framework

2007-08-26 Thread Blue Swirl
On 8/26/07, Fabrice Bellard [EMAIL PROTECTED] wrote:
 Paul Brook wrote:
  pci_gdma.diff: Convert PCI devices and targets
 
  Any comments? The patches are a bit intrusive and I can't test the
  targets except that they compile.
  Shouldn't the PCI DMA object be a property of the PCI bus?
  ie. we don't want/need to pass it round as a separate parameter. It can
  be inferred from the device/bus.
  I agree. Moreover the DMA is bus specific so I don't see a need to add a
  generic DMA layer.
 
  I can see use for a generic DMA interface. It has some nice possibilities 
  for
  devices which can connect via a variety of busses and maybe for layering
  different busses within a system.
 
  However I don't know how well this will work in practice for the machines 
  qemu
  currently emulates.

 I can see more uses for a simple bus interface which could be used at
 least for ISA devices. The API should include bus read/write functions
 (which can be used to implement DMA) and functions to allocate/free a
 memory region as we have for the CPU bus.

 Of course the same must be added for PCI buses so that the PCI memory
 area can be mapped at any position in the CPU address space.

Nice idea. The functions in exec.c could be made more generic by
extending them with an object parameter, for example
int cpu_register_io_memory(int io_index,
                           CPUReadMemoryFunc **mem_read,
                           CPUWriteMemoryFunc **mem_write,
                           void *opaque)
would become
int bus_register_io_memory(void *opaque,
                           int io_index,
                           CPUReadMemoryFunc **mem_read,
                           CPUWriteMemoryFunc **mem_write,
                           void *io_opaque)

The opaque object would be struct PCIBus for PCI devices, something
other for ISA. The ISA DMA DREQ signal is still a problem, or could we
use qemu_irq for that too?

Ideally the devices would not know too much about the bus they are in.
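
On the DREQ question above, a hedged sketch of how qemu_irq could model the
request lines follows; the handler and wiring function are invented, only
qemu_allocate_irqs() and qemu_irq_raise() are existing QEMU calls:

/* Invented example: model the ISA DREQ lines as qemu_irqs owned by the
 * DMA controller; the controller starts or stops servicing a channel
 * when the device raises or lowers its line. */
static void dma_dreq_handler(void *opaque, int channel, int level)
{
    /* opaque would be the DMA controller state; start or stop the
     * transfer for 'channel' depending on 'level'. */
}

static qemu_irq *wire_dreq_lines(void *dma_controller_state)
{
    /* One line per channel; a device such as the FDC model would be
     * handed its channel's qemu_irq and call qemu_irq_raise() on it
     * instead of poking the controller directly. */
    return qemu_allocate_irqs(dma_dreq_handler, dma_controller_state, 4);
}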




Re: [Qemu-devel] Re: PATCH, RFC: Generic DMA framework

2007-08-24 Thread Paul Brook
On Friday 24 August 2007, Blue Swirl wrote:
 I have now converted the ISA DMA devices (SB16, FDC), most PCI devices
 and targets.

 gdma.diff: Generic DMA
 pc_ppc_dma_to_gdma.diff: Convert x86 and PPC to GDMA
 pc_sb16_to_gdma.diff: Convert SB16 to GDMA
 pc_fdc_to_gdma.diff: FDC
 pc_dma_cleanup.diff: Remove unused functions
 sparc_gdma.diff: Convert Sparc32 to GDMA
 sparc32_dma_esp_le_to_gdma.diff: Convert ESP and Lance
 sun4c.diff: Preliminary Sun4c (Sparcstation-1) support
 pci_gdma.diff: Convert PCI devices and targets

 Any comments? The patches are a bit intrusive and I can't test the
 targets except that they compile.

Shouldn't the PCI DMA object be a property of the PCI bus?
ie. we don't want/need to pass it round as a separate parameter. It can be 
inferred from the device/bus.

Paul




Re: [Qemu-devel] Re: PATCH, RFC: Generic DMA framework

2007-08-24 Thread Fabrice Bellard

Paul Brook wrote:

On Friday 24 August 2007, Blue Swirl wrote:

I have now converted the ISA DMA devices (SB16, FDC), most PCI devices
and targets.

gdma.diff: Generic DMA
pc_ppc_dma_to_gdma.diff: Convert x86 and PPC to GDMA
pc_sb16_to_gdma.diff: Convert SB16 to GDMA
pc_fdc_to_gdma.diff: FDC
pc_dma_cleanup.diff: Remove unused functions
sparc_gdma.diff: Convert Sparc32 to GDMA
sparc32_dma_esp_le_to_gdma.diff: Convert ESP and Lance
sun4c.diff: Preliminary Sun4c (Sparcstation-1) support
pci_gdma.diff: Convert PCI devices and targets

Any comments? The patches are a bit intrusive and I can't test the
targets except that they compile.


Shouldn't the PCI DMA object be a property of the PCI bus?
ie. we don't want/need to pass it round as a separate parameter. It can be 
inferred from the device/bus.


I agree. Moreover the DMA is bus specific so I don't see a need to add a 
generic DMA layer.


Regards,

Fabrice.





Re: [Qemu-devel] Re: PATCH, RFC: Generic DMA framework

2007-08-24 Thread Paul Brook
  pci_gdma.diff: Convert PCI devices and targets
 
  Any comments? The patches are a bit intrusive and I can't test the
  targets except that they compile.
 
  Shouldn't the PCI DMA object be a property of the PCI bus?
  ie. we don't want/need to pass it round as a separate parameter. It can
  be inferred from the device/bus.

 I agree. Moreover the DMA is bus specific so I don't see a need to add a
 generic DMA layer.

I can see use for a generic DMA interface. It has some nice possibilities for 
devices which can connect via a variety of busses and maybe for layering 
different busses within a system.

However I don't know how well this will work in practice for the machines qemu 
currently emulates.

Paul




Re: [Qemu-devel] Re: PATCH, RFC: Generic DMA framework

2007-08-19 Thread Blue Swirl
On 8/16/07, malc [EMAIL PROTECTED] wrote:
 Very long time ago i changed the ISA DMA API to address some of the
 critique that Fabrice expressed, i can't remember offhand if that
 included removal of explicit position passing or not (the patch is on
 some off-line HDD so it's not easy to check whether it's in fact so)

 http://www.mail-archive.com/qemu-devel@nongnu.org/msg06594.html

 If needed i can try to locate the patch but the FDC problem still needs
 to be addressed by someone.

That could be interesting, please try to find it. I guess the patch
would be 7_aqemu?




[Qemu-devel] Re: PATCH, RFC: Generic DMA framework

2007-08-16 Thread Blue Swirl
On 8/14/07, Blue Swirl [EMAIL PROTECTED] wrote:
 Would the framework need any changes to support other targets? Comments 
 welcome.

Replying to myself: Yes, changes may be needed. Some of the DMA
controllers move the data outside CPU loop, but that does not make
much difference.

Background: I want to use the framework for at least devices that
Sparc32/64 use. For Sparc32 the reason is that on Sun4c (Sparcstation
1, 2, IPX etc.) there is no IOMMU, but instead the CPU MMU is used for
address translation. The DMA framework makes it possible to remove the
IOMMU without changing the devices.

On Sparc64 an IOMMU needs to be inserted between PCI devices and RAM
without disturbing other targets.

About the devices: Users of PC ISA DMA controller (SB16, FDC) pass the
DMA position parameter to controller. I'm not sure this can be removed
easily. Of course a real DMA controller does not get any position data
from target. For Sparc32/64 I would not need to touch the PC ISA DMA
devices, except maybe for FDC. On Sparc32, the FDC DMA is not even
used. I have to think about this part.

PCI DMA-like devices (eepro100, pcnet, rtl8139, ide) as well as PXA
use cpu_physical_memory_rw to transfer data (eepro100 also uses
ldl_phys, which looks very suspicious). These could be converted to
generic DMA easily.
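
As a hedged before/after sketch of what such a conversion could look like
(device names and fields are simplified placeholders; bus_write_north comes
from the bus patch that appears earlier in this archive):

/* Today: the device model writes straight into guest physical memory. */
static void nic_receive_today(uint32_t rx_buf_addr, const uint8_t *buf, int len)
{
    cpu_physical_memory_rw(rx_buf_addr, (uint8_t *)buf, len, 1);
}

/* Converted: the device holds a bus handle and all DMA goes through it,
 * so an IOMMU (e.g. on Sparc64 PCI) can translate the access. */
static void nic_receive_generic(qemu_bus *bus, uint32_t rx_buf_addr,
                                const uint8_t *buf, int len)
{
    bus_write_north(bus, rx_buf_addr, buf, len);
}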

OMAP DMA is strange, but fortunately I'm not interested in those devices.




Re: [Qemu-devel] Re: PATCH, RFC: Generic DMA framework

2007-08-16 Thread malc

On Thu, 16 Aug 2007, Blue Swirl wrote:


On 8/14/07, Blue Swirl [EMAIL PROTECTED] wrote:

Would the framework need any changes to support other targets? Comments welcome.


Replying to myself: Yes, changes may be needed. Some of the DMA
controllers move the data outside CPU loop, but that does not make
much difference.

Background: I want to use the framework for at least devices that
Sparc32/64 use. For Sparc32 the reason is that on Sun4c (Sparcstation
1, 2, IPX etc.) there is no IOMMU, but instead the CPU MMU is used for
address translation. The DMA framework makes it possible to remove the
IOMMU without changing the devices.

On Sparc64 an IOMMU needs to be inserted between PCI devices and RAM
without disturbing other targets.

About the devices: Users of PC ISA DMA controller (SB16, FDC) pass the
DMA position parameter to controller. I'm not sure this can be removed
easily. Of course a real DMA controller does not get any position data
from target. For Sparc32/64 I would not need to touch the PC ISA DMA
devices, except maybe for FDC. On Sparc32, the FDC DMA is not even
used. I have to think about this part.


Very long time ago i changed the ISA DMA API to address some of the
critique that Fabrice expressed, i can't remember offhand if that
included removal of explicit position passing or not (the patch is on
some off-line HDD so it's not easy to check whether it's in fact so)

http://www.mail-archive.com/qemu-devel@nongnu.org/msg06594.html

If needed i can try to locate the patch but the FDC problem still needs
to be addressed by someone.


PCI DMA-like devices (eepro100, pcnet, rtl8139, ide) as well as PXA
use cpu_physical_memory_rw to transfer data (eepro100 also uses
ldl_phys, which looks very suspicious). These could be converted to
generic DMA easily.

OMAP DMA is strange, but fortunately I'm not interested in those devices.




--
vale