[Qemu-devel] [PATCH] Fix subtle integer overflow bug in memory API
It is quite common to have a MemoryRegion with size of INT64_MAX.
When processing alias regions in render_memory_region() it's quite
easy to find a case where it will construct a temporary AddrRange with
a non-zero start, and size still of INT64_MAX. This means that
attempting to compute the end of such a range as start + size will
result in signed integer overflow.

This integer overflow means that addrrange_intersects() can
incorrectly report regions as not intersecting when they do. For
example consider the case of address ranges {0x100, 0x7fff} and
{0x1001000, 0x1000} where the second is in fact included completely in
the first.

This patch rearranges addrrange_intersects() to avoid the integer
overflow, correcting this behaviour.

Signed-off-by: David Gibson da...@gibson.dropbear.id.au
---
 memory.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/memory.c b/memory.c
index 57f0fa4..101b67c 100644
--- a/memory.c
+++ b/memory.c
@@ -55,8 +55,8 @@ static AddrRange addrrange_shift(AddrRange range, int64_t delta)
 
 static bool addrrange_intersects(AddrRange r1, AddrRange r2)
 {
-    return (r1.start >= r2.start && r1.start < r2.start + r2.size)
-        || (r2.start >= r1.start && r2.start < r1.start + r1.size);
+    return (r1.start >= r2.start && (r1.start - r2.start) < r2.size)
+        || (r2.start >= r1.start && (r2.start - r1.start) < r1.size);
 }
 
 static AddrRange addrrange_intersection(AddrRange r1, AddrRange r2)
-- 
1.7.5.4
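For readers who want to try the rearranged comparison outside of QEMU,
here is a minimal standalone sketch. The Range type and the
range_intersects() name are stand-ins invented for this illustration,
not the memory.c definitions, and the sketch assumes that starts and
sizes are non-negative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for AddrRange; not the QEMU definition. */
typedef struct {
    int64_t start;
    int64_t size;
} Range;

/*
 * Overlap test that never forms start + size, so a size of INT64_MAX
 * combined with a non-zero start cannot overflow.
 * Assumption for this sketch: starts and sizes are non-negative.
 */
static bool range_intersects(Range r1, Range r2)
{
    assert(r1.start >= 0 && r1.size >= 0);
    assert(r2.start >= 0 && r2.size >= 0);

    /* Each subtraction is guarded by the >= test before it, so the
     * difference always lies in [0, INT64_MAX]. */
    return (r1.start >= r2.start && (r1.start - r2.start) < r2.size)
        || (r2.start >= r1.start && (r2.start - r1.start) < r1.size);
}
```

With a range of size INT64_MAX starting at 0x1000 and a small range at
0x2000, the check reports an intersection in either argument order,
which is the case the start + size form could get wrong.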
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
On 09/13/2011 10:39 PM, Blue Swirl wrote:
>> Here is the problem: Both the vram and the ISA range get mapped into
>> system address space, but the former eclipses the latter as it shows
>> up earlier in the list and has the same priority. This picture
>> changes with the chain-4 alias which has prio 2, thus maps over the
>> vram.
>>
>> It looks to me like the ISA address space is either misplaced at
>> 0x8000 or is not supposed to be mapped at all on PPC. Comments?
>
> Since there is no PCI-ISA bridge, ISA address space shouldn't exist.

Where does the vga device sit then?

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: [Qemu-devel] [PATCH] hid: vmstat fix
On 09/14/2011 05:03 AM, TeLeMan wrote:
> The commit "usb/hid: add hid_pointer_activate, use it" used
> HIDMouseState.mouse_grabbed in hid_pointer_activate(), so
> mouse_grabbed should be added into vmstat.

Does this fix a bug? qemu_activate_mouse_event_handler is meant to be
called once per execution of the VM, it is not guest state.

Paolo
[Qemu-devel] [PATCH 0/3] virtio-serial: Bug fix, add stats for bytes transferred
Hello,

These patches fix one bug (patch 2), and add some stats for bytes
sent, received and discarded, mainly for debugging purposes. These
stats are shown in the 'info qtree' output.

More details in the commit logs.

Please apply,

Amit Shah (3):
  virtio-serial-bus: add port arg to discard_vq_data()
  virtio-serial-bus: discard data in already popped-out elem
  virtio-serial-bus: Add per-port stats for received, sent, discarded bytes

 hw/virtio-serial-bus.c |   37 +++--
 hw/virtio-serial.h     |   11 +++
 2 files changed, 42 insertions(+), 6 deletions(-)

-- 
1.7.6
[Qemu-devel] [PATCH 1/3] virtio-serial-bus: add port arg to discard_vq_data()
To discard throttled data as well as maintain statistics of bytes
received and discarded, discard_vq_data() will need the port
associated with the vq.

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/virtio-serial-bus.c |    9 +
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index a4825b9..6838d73 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -114,7 +114,8 @@ static size_t write_to_port(VirtIOSerialPort *port,
     return offset;
 }
 
-static void discard_vq_data(VirtQueue *vq, VirtIODevice *vdev)
+static void discard_vq_data(VirtIOSerialPort *port, VirtQueue *vq,
+                            VirtIODevice *vdev)
 {
     VirtQueueElement elem;
 
@@ -248,7 +249,7 @@ int virtio_serial_close(VirtIOSerialPort *port)
      * consume, reset the throttling flag and discard the data.
      */
     port->throttled = false;
-    discard_vq_data(port->ovq, &port->vser->vdev);
+    discard_vq_data(port, port->ovq, &port->vser->vdev);
 
     send_control_event(port, VIRTIO_CONSOLE_PORT_OPEN, 0);
 
@@ -473,7 +474,7 @@ static void handle_output(VirtIODevice *vdev, VirtQueue *vq)
     info = port ? DO_UPCAST(VirtIOSerialPortInfo, qdev, port->dev.info) : NULL;
 
     if (!port || !port->host_connected || !info->have_data) {
-        discard_vq_data(vq, vdev);
+        discard_vq_data(port, vq, vdev);
         return;
     }
 
@@ -730,7 +731,7 @@ static void remove_port(VirtIOSerial *vser, uint32_t port_id)
     port = find_port_by_id(vser, port_id);
 
     /* Flush out any unconsumed buffers first */
-    discard_vq_data(port->ovq, &port->vser->vdev);
+    discard_vq_data(port, port->ovq, &port->vser->vdev);
 
     send_control_event(port, VIRTIO_CONSOLE_PORT_REMOVE, 1);
 }
-- 
1.7.6
[Qemu-devel] [PATCH 2/3] virtio-serial-bus: discard data in already popped-out elem
Previously, when discarding data, an elem that had already been popped
from the vq but not yet pushed back to the guest (because the backend
was throttled) was never pushed back to the guest. Fix that by
checking whether there is an in-progress elem, and pushing it out to
the guest first before emptying the vq.

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/virtio-serial-bus.c |    4
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 6838d73..2c84398 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -122,6 +122,10 @@ static void discard_vq_data(VirtIOSerialPort *port, VirtQueue *vq,
     if (!virtio_queue_ready(vq)) {
         return;
     }
+    if (port && port->elem.out_num) {
+        virtqueue_push(vq, &port->elem, 0);
+        port->elem.out_num = 0;
+    }
     while (virtqueue_pop(vq, &elem)) {
         virtqueue_push(vq, &elem, 0);
     }
-- 
1.7.6
[Qemu-devel] [PATCH 3/3] virtio-serial-bus: Add per-port stats for received, sent, discarded bytes
This commit adds port-specific stats for the number of bytes received,
sent and discarded. They can be seen in the 'info qtree' monitor
output for the specific port.

This data can be used to check for data loss bugs (or disprove such
claims). It can also be used for accounting, if there's such a need.

The stats remain valid throughout the lifetime of the port. Unplugging
a port will reset the stats. The numbers are not reset across port
opens/closes.

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/virtio-serial-bus.c |   24 ++--
 hw/virtio-serial.h     |   11 +++
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 2c84398..deefda4 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -108,6 +108,7 @@ static size_t write_to_port(VirtIOSerialPort *port,
         offset += len;
 
         virtqueue_push(vq, &elem, len);
+        port->stats.bytes_sent += len;
     }
 
     virtio_notify(&port->vser->vdev, vq);
@@ -123,10 +124,24 @@ static void discard_vq_data(VirtIOSerialPort *port, VirtQueue *vq,
         return;
     }
     if (port && port->elem.out_num) {
+        port->stats.bytes_discarded += (iov_size(port->elem.out_sg,
+                                                 elem.out_num)
+                                        - iov_size(port->elem.out_sg,
+                                                   port->iov_idx)
+                                        - port->iov_offset);
         virtqueue_push(vq, &port->elem, 0);
         port->elem.out_num = 0;
     }
     while (virtqueue_pop(vq, &elem)) {
+        if (port) {
+            unsigned long size;
+
+            size = iov_size(elem.out_sg, elem.out_num);
+
+            /* We haven't counted these bytes in the received stats yet. */
+            port->stats.bytes_received += size;
+            port->stats.bytes_discarded += size;
+        }
         virtqueue_push(vq, &elem, 0);
     }
     virtio_notify(vdev, vq);
@@ -152,6 +167,8 @@ static void do_flush_queued_data(VirtIOSerialPort *port, VirtQueue *vq,
             }
             port->iov_idx = 0;
             port->iov_offset = 0;
+            port->stats.bytes_received += iov_size(port->elem.out_sg,
+                                                   port->elem.out_num);
         }
 
         for (i = port->iov_idx; i < port->elem.out_num; i++) {
@@ -684,11 +701,14 @@ static void virtser_bus_dev_print(Monitor *mon, DeviceState *qdev, int indent)
 {
     VirtIOSerialPort *port = DO_UPCAST(VirtIOSerialPort, dev, qdev);
 
-    monitor_printf(mon, "%*sport %d, guest %s, host %s, throttle %s\n",
+    monitor_printf(mon, "%*sport %d, guest %s, host %s, throttle %s, bytes_sent %lu, bytes_received %lu, bytes_discarded: %lu\n",
                    indent, "", port->id,
                    port->guest_connected ? "on" : "off",
                    port->host_connected ? "on" : "off",
-                   port->throttled ? "on" : "off");
+                   port->throttled ? "on" : "off",
+                   port->stats.bytes_sent,
+                   port->stats.bytes_received,
+                   port->stats.bytes_discarded);
 }
 
 /* This function is only used if a port id is not provided by the user */
diff --git a/hw/virtio-serial.h b/hw/virtio-serial.h
index ab13803..34d36d7 100644
--- a/hw/virtio-serial.h
+++ b/hw/virtio-serial.h
@@ -67,6 +67,10 @@ typedef struct VirtIOSerialBus VirtIOSerialBus;
 typedef struct VirtIOSerialPort VirtIOSerialPort;
 typedef struct VirtIOSerialPortInfo VirtIOSerialPortInfo;
 
+typedef struct {
+    unsigned long bytes_sent, bytes_received, bytes_discarded;
+} PortStats;
+
 /*
  * This is the state that's shared between all the ports.  Some of the
  * state is configurable via command-line options. Some of it can be
@@ -87,6 +91,13 @@ struct VirtIOSerialPort {
     VirtQueue *ivq, *ovq;
 
     /*
+     * Keep count of the bytes sent, received and discarded for
+     * this port for accounting and debugging purposes.  These
+     * counts are not reset across port open / close events.
+     */
+    PortStats stats;
+
+    /*
      * This name is sent to the guest and exported via sysfs.
      * The guest could create symlinks based on this information.
      * The name is in the reverse fqdn format, like org.qemu.console.0
-- 
1.7.6
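The discarded-bytes arithmetic for a partially consumed element in the
patch above (total size, minus the fully consumed segments, minus the
offset into the current segment) can be tried in isolation. The sketch
below is only an illustration: Seg, seg_total() and bytes_remaining()
are hypothetical names made up here, standing in for the iovec array
and iov_size() that the patch uses.

```c
#include <stddef.h>

/* Hypothetical segment descriptor; only the length matters here. */
typedef struct {
    size_t len;
} Seg;

/* Sum of the lengths of the first n segments. */
static size_t seg_total(const Seg *sg, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += sg[i].len;
    }
    return total;
}

/*
 * Bytes still unconsumed in an element of n segments when the first
 * idx segments are fully consumed and offset bytes of segment idx
 * have been consumed as well.
 * Assumption: idx <= n and offset does not exceed the segment length.
 */
static size_t bytes_remaining(const Seg *sg, size_t n,
                              size_t idx, size_t offset)
{
    return seg_total(sg, n) - seg_total(sg, idx) - offset;
}
```

For segments of 10, 20 and 30 bytes with the first segment consumed
and 5 bytes taken from the second, 45 bytes remain, and that is the
amount that would be added to the discarded count.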
Re: [Qemu-devel] [PATCH] pseries: Update SLOF firmware image
On 09/01/2011 07:13 AM, David Gibson wrote:
> The current SLOF firmware for the pseries machine has a bug in SCSI
> condition handling that was exposed by recent updates to qemu's SCSI
> emulation. This patch updates the SLOF image to one with the bug
> fixed.

Ping for this and
http://permalink.gmane.org/gmane.comp.emulators.qemu/114461

Paolo
Re: [Qemu-devel] [PATCH] hid: vmstat fix
On Wed, Sep 14, 2011 at 15:15, Paolo Bonzini pbonz...@redhat.com wrote:
> On 09/14/2011 05:03 AM, TeLeMan wrote:
>> The commit "usb/hid: add hid_pointer_activate, use it" used
>> HIDMouseState.mouse_grabbed in hid_pointer_activate(), so
>> mouse_grabbed should be added into vmstat.
>
> Does this fix a bug? qemu_activate_mouse_event_handler is meant to be
> called once per execution of the VM, it is not guest state.

Yes, this patch fixes the USB mouse not working after loadvm in a
Windows guest.

> Paolo
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
On 14.09.2011, at 09:11, Avi Kivity wrote:
> On 09/13/2011 10:39 PM, Blue Swirl wrote:
>>> Here is the problem: Both the vram and the ISA range get mapped
>>> into system address space, but the former eclipses the latter as it
>>> shows up earlier in the list and has the same priority. This
>>> picture changes with the chain-4 alias which has prio 2, thus maps
>>> over the vram.
>>>
>>> It looks to me like the ISA address space is either misplaced at
>>> 0x8000 or is not supposed to be mapped at all on PPC. Comments?
>>
>> Since there is no PCI-ISA bridge, ISA address space shouldn't exist.
>
> Where does the vga device sit then?

On the PCI bus? :)

Alex
Re: [Qemu-devel] [Qemu-ppc] [PATCH] pseries: Update SLOF firmware image
On 14.09.2011, at 09:38, Paolo Bonzini wrote:
> On 09/01/2011 07:13 AM, David Gibson wrote:
>> The current SLOF firmware for the pseries machine has a bug in SCSI
>> condition handling that was exposed by recent updates to qemu's SCSI
>> emulation. This patch updates the SLOF image to one with the bug
>> fixed.
>
> Ping for this and
> http://permalink.gmane.org/gmane.comp.emulators.qemu/114461

Yeah, sorry, I introduced a regression with the KVM ABI in my HIOR
patches and still need to rework that before I can push out the tree
(otherwise it's a hell of a lot of work to untangle the changes).

My hope is that we have VGA fixed by then too, so all ppc targets will
work again ;)

Alex
Re: [Qemu-devel] [PATCH] pc_init: Fail on bad kernel
Ping?

On Sat, 2011-09-03 at 22:35 +0300, Sasha Levin wrote:
> When providing QEMU with a bad '-kernel' parameter, such as a file
> which is not really a kernel, QEMU will attempt to allocate a huge
> amount of memory and fail either with "Failed to allocate memory:
> Cannot allocate memory" or a GLib error:
>
>   GLib-ERROR **: gmem.c:170: failed to allocate 18446744073709529965 bytes
>
> This patch handles the case where the magic sig wasn't located in the
> provided kernel, and loading it as multiboot failed as well.
>
> Cc: Anthony Liguori aligu...@us.ibm.com
> Signed-off-by: Sasha Levin levinsasha...@gmail.com
> ---
>  hw/pc.c |    8 +++-
>  1 files changed, 7 insertions(+), 1 deletions(-)
>
> diff --git a/hw/pc.c b/hw/pc.c
> index 6b3662e..428440b 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -691,8 +691,14 @@ static void load_linux(void *fw_cfg,
>          /* This looks like a multiboot kernel. If it is, let's stop
>             treating it like a Linux kernel. */
>          if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
> -                           kernel_cmdline, kernel_size, header))
> +                           kernel_cmdline, kernel_size, header)) {
>              return;
> +        } else {
> +            fprintf(stderr, "qemu: could not load kernel '%s': %s\n",
> +                    kernel_filename, strerror(errno));
> +            exit(1);
> +        }
> +
>          protocol = 0;
>      }

-- 
Sasha.
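As an aside on the "magic sig" mentioned in the quoted patch: the Linux
x86 boot protocol places the four bytes "HdrS" at offset 0x202 of the
kernel image header. A minimal standalone check might look like the
sketch below. The function name and the buffer-based interface are
assumptions made for illustration and are not how hw/pc.c is
structured.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offset of the "HdrS" signature in the Linux x86 boot header. */
#define LINUX_HDR_MAGIC_OFFSET 0x202

/*
 * Hypothetical helper: returns true if the buffer is long enough and
 * carries the Linux boot header signature. A caller could refuse to
 * continue (instead of computing sizes from garbage fields) when this
 * returns false and no other loader accepts the image.
 */
static bool has_linux_boot_magic(const uint8_t *header, size_t len)
{
    if (header == NULL || len < LINUX_HDR_MAGIC_OFFSET + 4) {
        return false;
    }
    return memcmp(header + LINUX_HDR_MAGIC_OFFSET, "HdrS", 4) == 0;
}
```

The length test comes first so that a short or empty file is rejected
before any byte of the header is read.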
Re: [Qemu-devel] [PATCH] hid: vmstat fix
On 09/14/2011 09:40 AM, TeLeMan wrote:
>>> The commit "usb/hid: add hid_pointer_activate, use it" used
>>> HIDMouseState.mouse_grabbed in hid_pointer_activate(), so
>>> mouse_grabbed should be added into vmstat.
>>
>> Does this fix a bug? qemu_activate_mouse_event_handler is meant to
>> be called once per execution of the VM, it is not guest state.
>
> Yes, this patch fixes the USB mouse not working after loadvm in a
> Windows guest.

I'm wondering if, with your patch, Windows is actually using the PS/2
mouse after loadvm... If that is the case, perhaps instead you can
move

    if (hs->kind == HID_MOUSE || hs->kind == HID_TABLET) {
        hid_pointer_activate(hs);
    }

from hw/usb-hid.c to hid_set_next_idle, which is called at post-load
time.

Paolo
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
On 2011-09-14 09:42, Alexander Graf wrote:
> On 14.09.2011, at 09:11, Avi Kivity wrote:
>> On 09/13/2011 10:39 PM, Blue Swirl wrote:
>>>> Here is the problem: Both the vram and the ISA range get mapped
>>>> into system address space, but the former eclipses the latter as
>>>> it shows up earlier in the list and has the same priority. This
>>>> picture changes with the chain-4 alias which has prio 2, thus maps
>>>> over the vram.
>>>>
>>>> It looks to me like the ISA address space is either misplaced at
>>>> 0x8000 or is not supposed to be mapped at all on PPC. Comments?
>>>
>>> Since there is no PCI-ISA bridge, ISA address space shouldn't
>>> exist.
>>
>> Where does the vga device sit then?
>
> On the PCI bus? :)

Then make sure that the container for ISA resources is a dummy region
- or even NULL so that VGA will know that it's supposed to skip ISA
registrations.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
On 09/14/2011 10:42 AM, Alexander Graf wrote:
> On 14.09.2011, at 09:11, Avi Kivity wrote:
>> On 09/13/2011 10:39 PM, Blue Swirl wrote:
>>>> Here is the problem: Both the vram and the ISA range get mapped
>>>> into system address space, but the former eclipses the latter as
>>>> it shows up earlier in the list and has the same priority. This
>>>> picture changes with the chain-4 alias which has prio 2, thus maps
>>>> over the vram.
>>>>
>>>> It looks to me like the ISA address space is either misplaced at
>>>> 0x8000 or is not supposed to be mapped at all on PPC. Comments?
>>>
>>> Since there is no PCI-ISA bridge, ISA address space shouldn't
>>> exist.
>>
>> Where does the vga device sit then?
>
> On the PCI bus? :)

I thought it was std vga, which is an ISA device.

Anyway PCI supports the vga region at 0xa-0xc. Where is it supposed to
be mapped?

-- 
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
On 2011-09-14 10:17, Avi Kivity wrote:
> On 09/14/2011 10:42 AM, Alexander Graf wrote:
>> On 14.09.2011, at 09:11, Avi Kivity wrote:
>>> On 09/13/2011 10:39 PM, Blue Swirl wrote:
>>>>> Here is the problem: Both the vram and the ISA range get mapped
>>>>> into system address space, but the former eclipses the latter as
>>>>> it shows up earlier in the list and has the same priority. This
>>>>> picture changes with the chain-4 alias which has prio 2, thus
>>>>> maps over the vram.
>>>>>
>>>>> It looks to me like the ISA address space is either misplaced at
>>>>> 0x8000 or is not supposed to be mapped at all on PPC. Comments?
>>>>
>>>> Since there is no PCI-ISA bridge, ISA address space shouldn't
>>>> exist.
>>>
>>> Where does the vga device sit then?
>>
>> On the PCI bus? :)
>
> I thought it was std vga, which is an ISA device.

There are both types (ISA-only and PCI).

> Anyway PCI supports the vga region at 0xa-0xc. Where is it supposed
> to be mapped?

...but not all PCI bridges make use of this feature / forward legacy
requests.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
On 09/14/2011 11:20 AM, Jan Kiszka wrote:
>> Anyway PCI supports the vga region at 0xa-0xc. Where is it supposed
>> to be mapped?
>
> ...but not all PCI bridges make use of this feature / forward legacy
> requests.

Then this should be fixed in the bridge?

-- 
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] Fix subtle integer overflow bug in memory API
On 09/14/2011 10:02 AM, David Gibson wrote:
> It is quite common to have a MemoryRegion with size of INT64_MAX.
> When processing alias regions in render_memory_region() it's quite
> easy to find a case where it will construct a temporary AddrRange
> with a non-zero start, and size still of INT64_MAX. This means that
> attempting to compute the end of such a range as start + size will
> result in signed integer overflow.
>
> This integer overflow means that addrrange_intersects() can
> incorrectly report regions as not intersecting when they do. For
> example consider the case of address ranges {0x100, 0x7fff} and
> {0x1001000, 0x1000} where the second is in fact included completely
> in the first.

Good catch, thanks for digging this out.

> This patch rearranges addrrange_intersects() to avoid the integer
> overflow, correcting this behaviour.

I expect that the bad behaviour can still be triggered, for example by
pointing aliases towards the end of very large regions. Not that I
expect this to occur in practice.

I think we should move towards using __int128 internally. Is there any
relevant host which does not support __int128?

Meanwhile, applied to memory/core, and will request a pull shortly.

-- 
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
On 2011-09-14 10:22, Avi Kivity wrote:
> On 09/14/2011 11:20 AM, Jan Kiszka wrote:
>>> Anyway PCI supports the vga region at 0xa-0xc. Where is it
>>> supposed to be mapped?
>>
>> ...but not all PCI bridges make use of this feature / forward
>> legacy requests.
>
> Then this should be fixed in the bridge?

Yes, it's a PPC bug.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
On 14.09.2011, at 10:24, Jan Kiszka wrote:
> On 2011-09-14 10:22, Avi Kivity wrote:
>> On 09/14/2011 11:20 AM, Jan Kiszka wrote:
>>>> Anyway PCI supports the vga region at 0xa-0xc. Where is it
>>>> supposed to be mapped?
>>>
>>> ...but not all PCI bridges make use of this feature / forward
>>> legacy requests.
>>
>> Then this should be fixed in the bridge?
>
> Yes, it's a PPC bug.

So how does the bridge not forward it then?

Alex
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
On 2011-09-14 10:27, Alexander Graf wrote:
> On 14.09.2011, at 10:24, Jan Kiszka wrote:
>> On 2011-09-14 10:22, Avi Kivity wrote:
>>> On 09/14/2011 11:20 AM, Jan Kiszka wrote:
>>>>> Anyway PCI supports the vga region at 0xa-0xc. Where is it
>>>>> supposed to be mapped?
>>>>
>>>> ...but not all PCI bridges make use of this feature / forward
>>>> legacy requests.
>>>
>>> Then this should be fixed in the bridge?
>>
>> Yes, it's a PPC bug.
>
> So how does the bridge not forward it then?

On real HW, by keeping the VGA Enable bit off. Or just not issuing
requests to the a..b range.

Under QEMU, I would simply provide the VGA model a memory region for
legacy stuff that remains unregistered.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
On 09/14/2011 11:27 AM, Alexander Graf wrote:
> On 14.09.2011, at 10:24, Jan Kiszka wrote:
>> On 2011-09-14 10:22, Avi Kivity wrote:
>>> On 09/14/2011 11:20 AM, Jan Kiszka wrote:
>>>>> Anyway PCI supports the vga region at 0xa-0xc. Where is it
>>>>> supposed to be mapped?
>>>>
>>>> ...but not all PCI bridges make use of this feature / forward
>>>> legacy requests.
>>>
>>> Then this should be fixed in the bridge?
>>
>> Yes, it's a PPC bug.
>
> So how does the bridge not forward it then?

I expect that currently vga adds the region to pci_address_space(). We
need to create a pci_address_space_vga() function that returns a
region for vga to use. Then add or remove the region to
pci_address_space(), within the bridge code, depending on whether the
bridge forwards vga accesses or not.

(assuming I understood the problem correctly - not sure)

-- 
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] Fix subtle integer overflow bug in memory API
On 09/14/2011 11:23 AM, Avi Kivity wrote:
> I think we should move towards using __int128 internally. Is there
> any relevant host which does not support __int128?

Crap, it's not even supported on i386.

-- 
error compiling committee.c: too many arguments to function
[Qemu-devel] [PULL 00/58] ppc patch queue 2011-09-14
Hi Aurelien / Blue,

This is my current patch queue for ppc. Please pull.

Alex


The following changes since commit 44520db10b1b92f272348ab7028e7afc68ac3edf:
  Fabien Chouteau (1):
        Gdbstub: Fix back-trace on SPARC32

are available in the git repository at:

  git://repo.or.cz/qemu/agraf.git ppc-next

Alexander Graf (38):
      PPC: Move openpic to target specific code compilation
      PPC: Add CPU local MMIO regions to MPIC
      PPC: Extend MPIC MMIO range
      PPC: Fix IPI support in MPIC
      PPC: Set MPIC IDE for IPI to 0
      PPC: MPIC: Remove read functionality for WO registers
      PPC: MPIC: Fix CI bit definitions
      PPC: Bump MPIC up to 32 supported CPUs
      PPC: E500: create multiple envs
      PPC: E500: Generate IRQ lines for many CPUs
      device tree: add nop_node
      PPC: bamboo: Move host fdt copy to target
      PPC: KVM: Add generic function to read host clockfreq
      PPC: E500: Use generic kvm function for freq
      PPC: E500: Remove mpc8544_copy_soc_cell
      PPC: bamboo: Use kvm api for freq and clock frequencies
      PPC: KVM: Remove kvmppc_read_host_property
      PPC: KVM: Add stubs for kvm helper functions
      PPC: E500: Update freqs for all CPUs
      PPC: E500: Remove unneeded CPU nodes
      PPC: E500: Add PV spinning code
      PPC: E500: Update cpu-release-addr property in cpu nodes
      device tree: add add_subnode command
      device tree: dont fail operations
      device tree: give dt more size
      MPC8544DS: Remove CPU nodes
      MPC8544DS: Generate CPU nodes on init
      PPC: E500: Bump CPU count to 15
      PPC: Add new target config for pseries
      KVM: update kernel headers
      PPC: Enable to use PAPR with PR style KVM
      PPC: SPAPR: Use KVM function for time info
      KVM: Update kernel headers
      openpic: Unfold read_IRQreg
      openpic: Unfold write_IRQreg
      PPC: Fix via-cuda memory registration
      PPC: Fix heathrow PIC to use little endian MMIO
      KVM: Update kernel headers

David Gibson (8):
      pseries: Bugfixes for interrupt numbering in XICS code
      pseries: Add a phandle to the xicp interrupt controller device tree node
      pseries: interrupt controller should not have a 'reg' property
      pseries: More complete WIMG validation in H_ENTER code
      pseries: Add real mode debugging hcalls
      Implement POWER7's CFAR in TCG
      pseries: Implement hcall-bulk hypervisor interface
      pseries: Update SLOF firmware image

Elie Richa (1):
      PPC: Fix sync instructions problem in SMP

Fabien Chouteau (1):
      Gdbstub: handle read of fpscr

Laurent Vivier (1):
      ppc: move ADB stuff from ppc_mac.h to adb.h

Nishanth Aravamudan (1):
      pseries: use macro for firmware filename

Paolo Bonzini (4):
      spapr: proper qdevification
      spapr: prepare for qdevification of irq
      spapr: make irq customizable via qdev
      vscsi: send the CHECK_CONDITION status down together with autosense data

Scott Wood (3):
      kvm: ppc: booke206: use MMU API
      ppc: booke206: add info tlb support
      ppc: booke206: use MAV=2.0 TSIZE definition, fix 4G pages

Stefan Hajnoczi (1):
      ppc405: use RAM_ADDR_FMT instead of %08lx

 Makefile.objs                    |    1 -
 Makefile.target                  |   10 +-
 configure                        |    3 +
 cpu-exec.c                       |    1 +
 device_tree.c                    |   92 ++--
 device_tree.h                    |    2 +
 gdbstub.c                        |    2 +-
 hmp-commands.hx                  |    2 +-
 hw/adb.c                         |    2 +-
 hw/adb.h                         |   67 +
 hw/cuda.c                        |   29 +++--
 hw/heathrow_pic.c                |    2 +-
 hw/openpic.c                     |  289 +-
 hw/ppc405_boards.c               |    5 +-
 hw/ppc440_bamboo.c               |   16 ++-
 hw/ppc_mac.h                     |   42 --
 hw/ppc_newworld.c                |    1 +
 hw/ppc_oldworld.c                |    1 +
 hw/ppce500_mpc8544ds.c           |  195 +++--
 hw/ppce500_spin.c                |  186
 hw/spapr.c                       |   52 ---
 hw/spapr.h                       |    9 ++
 hw/spapr_hcall.c                 |  220 +++--
 hw/spapr_llan.c                  |   11 +--
 hw/spapr_vio.c                   |   11 ++
 hw/spapr_vio.h                   |   18 ++--
 hw/spapr_vscsi.c                 |   13 +--
 hw/spapr_vty.c                   |   10 +-
 hw/xics.c                        |   17 +--
 linux-headers/asm-powerpc/kvm.h  |   59 -
 linux-headers/asm-x86/kvm_para.h |   14 ++
 linux-headers/linux/kvm.h        |   42 +-
 linux-headers/linux/kvm_para.h   |    1 +
 monitor.c                        |    5 +-
 pc-bios/README                   |    2 +-
 pc-bios/mpc8544ds.dtb            |  Bin 2277 -> 2028 bytes
 pc-bios/mpc8544ds.dts            |   12 --
[Qemu-devel] [PATCH 26/58] device tree: add add_subnode command
We want to be able to create subnodes in our device tree, so export it
through the qemu device tree abstraction framework.

Signed-off-by: Alexander Graf ag...@suse.de
---
 device_tree.c |   24
 device_tree.h |    1 +
 2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/device_tree.c b/device_tree.c
index 23e89e3..f4a78c8 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -118,3 +118,27 @@ int qemu_devtree_nop_node(void *fdt, const char *node_path)
 
     return fdt_nop_node(fdt, offset);
 }
+
+int qemu_devtree_add_subnode(void *fdt, const char *name)
+{
+    int offset;
+    char *dupname = g_strdup(name);
+    char *basename = strrchr(dupname, '/');
+    int retval;
+
+    if (!basename) {
+        return -1;
+    }
+
+    basename[0] = '\0';
+    basename++;
+
+    offset = fdt_path_offset(fdt, dupname);
+    if (offset < 0) {
+        return offset;
+    }
+
+    retval = fdt_add_subnode(fdt, offset, basename);
+    g_free(dupname);
+    return retval;
+}
diff --git a/device_tree.h b/device_tree.h
index 76fce5f..4378685 100644
--- a/device_tree.h
+++ b/device_tree.h
@@ -23,5 +23,6 @@ int qemu_devtree_setprop_cell(void *fdt, const char *node_path,
 int qemu_devtree_setprop_string(void *fdt, const char *node_path,
                                 const char *property, const char *string);
 int qemu_devtree_nop_node(void *fdt, const char *node_path);
+int qemu_devtree_add_subnode(void *fdt, const char *name);
 
 #endif /* __DEVICE_TREE_H__ */
-- 
1.6.0.2
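The core of qemu_devtree_add_subnode() is splitting a full node path
into its parent path and the new node's name. The sketch below shows
that step alone, using plain libc rather than GLib or libfdt. The
split_node_path() name and its out-parameter interface are invented
for this illustration. Unlike the early-return paths in the patch as
posted, which return without releasing dupname, the sketch frees
whatever it allocated on every failure path.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: split "/a/b/c" into parent "/a/b" and leaf "c".
 * On success returns 0 and stores two malloc'd strings that the caller
 * must free. Returns -1 if the path contains no '/' or if allocation
 * fails; in that case nothing is leaked and the outputs are untouched.
 * Note: for a top-level path such as "/cpus" the parent is "".
 */
static int split_node_path(const char *path, char **parent, char **leaf)
{
    const char *slash = strrchr(path, '/');
    size_t parent_len;
    char *p, *l;

    if (!slash) {
        return -1;
    }

    parent_len = (size_t)(slash - path);
    p = malloc(parent_len + 1);
    l = malloc(strlen(slash + 1) + 1);
    if (!p || !l) {
        free(p);
        free(l);
        return -1;
    }

    memcpy(p, path, parent_len);
    p[parent_len] = '\0';
    strcpy(l, slash + 1);

    *parent = p;
    *leaf = l;
    return 0;
}
```

A caller would look up the parent path first and only then create the
leaf under it, which mirrors the fdt_path_offset() followed by
fdt_add_subnode() sequence in the patch.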
[Qemu-devel] [PATCH 53/58] openpic: Unfold read_IRQreg
The helper function read_IRQreg was always called with a specific
argument on the type of register to access. Inside the function we
were simply doing a switch on that constant argument again. It's a lot
easier to just unfold this into two separate functions and call each
individually.

Reported-by: Blue Swirl blauwir...@gmail.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 hw/openpic.c |   56 +---
 1 files changed, 25 insertions(+), 31 deletions(-)

diff --git a/hw/openpic.c b/hw/openpic.c
index 03e442b..fbd8837 100644
--- a/hw/openpic.c
+++ b/hw/openpic.c
@@ -472,20 +472,14 @@ static void openpic_reset (void *opaque)
     opp->glbc = 0x;
 }
 
-static inline uint32_t read_IRQreg (openpic_t *opp, int n_IRQ, uint32_t reg)
+static inline uint32_t read_IRQreg_ide(openpic_t *opp, int n_IRQ)
 {
-    uint32_t retval;
-
-    switch (reg) {
-    case IRQ_IPVP:
-        retval = opp->src[n_IRQ].ipvp;
-        break;
-    case IRQ_IDE:
-        retval = opp->src[n_IRQ].ide;
-        break;
-    }
+    return opp->src[n_IRQ].ide;
+}
 
-    return retval;
+static inline uint32_t read_IRQreg_ipvp(openpic_t *opp, int n_IRQ)
+{
+    return opp->src[n_IRQ].ipvp;
 }
 
 static inline void write_IRQreg (openpic_t *opp, int n_IRQ,
@@ -523,10 +517,10 @@ static uint32_t read_doorbell_register (openpic_t *opp,
 
     switch (offset) {
     case DBL_IPVP_OFFSET:
-        retval = read_IRQreg(opp, IRQ_DBL0 + n_dbl, IRQ_IPVP);
+        retval = read_IRQreg_ipvp(opp, IRQ_DBL0 + n_dbl);
         break;
     case DBL_IDE_OFFSET:
-        retval = read_IRQreg(opp, IRQ_DBL0 + n_dbl, IRQ_IDE);
+        retval = read_IRQreg_ide(opp, IRQ_DBL0 + n_dbl);
         break;
     case DBL_DMR_OFFSET:
         retval = opp->doorbells[n_dbl].dmr;
@@ -564,10 +558,10 @@ static uint32_t read_mailbox_register (openpic_t *opp,
         retval = opp->mailboxes[n_mbx].mbr;
         break;
     case MBX_IVPR_OFFSET:
-        retval = read_IRQreg(opp, IRQ_MBX0 + n_mbx, IRQ_IPVP);
+        retval = read_IRQreg_ipvp(opp, IRQ_MBX0 + n_mbx);
         break;
     case MBX_DMR_OFFSET:
-        retval = read_IRQreg(opp, IRQ_MBX0 + n_mbx, IRQ_IDE);
+        retval = read_IRQreg_ide(opp, IRQ_MBX0 + n_mbx);
         break;
     }
 
@@ -695,7 +689,7 @@ static uint32_t openpic_gbl_read (void *opaque, target_phys_addr_t addr)
         {
             int idx;
             idx = (addr - 0x10A0) >> 4;
-            retval = read_IRQreg(opp, opp->irq_ipi0 + idx, IRQ_IPVP);
+            retval = read_IRQreg_ipvp(opp, opp->irq_ipi0 + idx);
         }
         break;
     case 0x10E0: /* SPVE */
@@ -765,10 +759,10 @@ static uint32_t openpic_timer_read (void *opaque, uint32_t addr)
         retval = opp->timers[idx].tibc;
         break;
     case 0x20: /* TIPV */
-        retval = read_IRQreg(opp, opp->irq_tim0 + idx, IRQ_IPVP);
+        retval = read_IRQreg_ipvp(opp, opp->irq_tim0 + idx);
         break;
     case 0x30: /* TIDE */
-        retval = read_IRQreg(opp, opp->irq_tim0 + idx, IRQ_IDE);
+        retval = read_IRQreg_ide(opp, opp->irq_tim0 + idx);
         break;
     }
     DPRINTF("%s: = %08x\n", __func__, retval);
@@ -809,10 +803,10 @@ static uint32_t openpic_src_read (void *opaque, uint32_t addr)
     idx = addr >> 5;
     if (addr & 0x10) {
         /* EXDE / IFEDE / IEEDE */
-        retval = read_IRQreg(opp, idx, IRQ_IDE);
+        retval = read_IRQreg_ide(opp, idx);
     } else {
         /* EXVP / IFEVP / IEEVP */
-        retval = read_IRQreg(opp, idx, IRQ_IPVP);
+        retval = read_IRQreg_ipvp(opp, idx);
     }
     DPRINTF("%s: = %08x\n", __func__, retval);
 
@@ -1368,13 +1362,13 @@ static uint32_t mpic_timer_read (void *opaque, target_phys_addr_t addr)
         retval = mpp->timers[idx].tibc;
         break;
     case 0x20: /* TIPV */
-        retval = read_IRQreg(mpp, MPIC_TMR_IRQ + idx, IRQ_IPVP);
+        retval = read_IRQreg_ipvp(mpp, MPIC_TMR_IRQ + idx);
         break;
     case 0x30: /* TIDR */
         if ((addr & 0xF0) == 0XF0)
             retval = mpp->dst[cpu].tfrr;
         else
-            retval = read_IRQreg(mpp, MPIC_TMR_IRQ + idx, IRQ_IDE);
+            retval = read_IRQreg_ide(mpp, MPIC_TMR_IRQ + idx);
         break;
     }
     DPRINTF("%s: = %08x\n", __func__, retval);
@@ -1421,10 +1415,10 @@ static uint32_t mpic_src_ext_read (void *opaque, target_phys_addr_t addr)
         idx += (addr & 0xFFF0) >> 5;
         if (addr & 0x10) {
             /* EXDE / IFEDE / IEEDE */
-            retval = read_IRQreg(mpp, idx, IRQ_IDE);
+            retval = read_IRQreg_ide(mpp, idx);
         } else {
             /* EXVP / IFEVP / IEEVP */
-            retval = read_IRQreg(mpp, idx, IRQ_IPVP);
+            retval = read_IRQreg_ipvp(mpp, idx);
         }
         DPRINTF("%s: = %08x\n", __func__, retval);
     }
@@ -1471,10 +1465,10 @@ static uint32_t mpic_src_int_read (void *opaque, target_phys_addr_t addr)
         idx += (addr & 0xFFF0) >> 5;
         if (addr & 0x10) {
             /* EXDE / IFEDE / IEEDE */
Re: [Qemu-devel] [PATCH][RFC][0/2] REF+/REF- optimization
Am 13.09.2011 15:36, schrieb Frediano Ziglio: 2011/9/13 Kevin Wolf kw...@redhat.com: Am 13.09.2011 09:53, schrieb Frediano Ziglio: These patches try to trade-off between leaks and speed for clusters refcounts. Refcount increments (REF+ or refp) are handled in a different way from decrements (REF- or refm). The reason it that posting or not flushing a REF- cause just a leak while posting a REF+ cause a corruption. To optimize REF- I just used an array to store offsets then when a flush is requested or array reach a limit (currently 1022) the array is sorted and written to disk. I use an array with offset instead of ranges to support compression (an offset could appear multiple times in the array). I consider this patch quite ready. Ok, first of all let's clarify what this optimises. I don't think it changes anything at all for the writeback cache modes, because these already do most operations in memory only. So this must be about optimising some operations with cache=writethrough. REF- isn't about normal cluster allocation, it is about COW with internal snapshots or bdrv_discard. Do you have benchmarks for any of them? I strongly disagree with your approach for REF-. We already have a cache, and introducing a second one sounds like a bad idea. I think we could get a very similar effect if we introduced a qcow2_cache_entry_mark_dirty_wb() that marks a given refcount block as dirty, but at the same time tells the cache that even in write-through mode it can still treat this block as write-back. This should require much less code changes. Yes, mainly optimize for writethrough. I did not test with writeback but should improve even this (I think here you have some flush to keep consistency). I'll try to write a qcow2_cache_entry_mark_dirty_wb patch and test it. Great, thanks! But let's measure the effects first, I suspect that for cluster allocation it doesn't help much because every REF- comes with a REF+. 
That's 50% of effort if REF- clusters are far from REF+ :) I would expect that the next REF+ allocates exactly the REF- cluster. But you still have a point, we save the write on REF- and combine it with the REF+ write. To optimize REF+ I mark a range as allocated and use this range to get new ones (avoiding writing refcount to disk). When a flush is requested or in some situations (like snapshot) this cache is disabled and flushed (written as REF-). I do not consider this patch ready, it works and pass all io-tests but for instance I would avoid allocating new clusters for refcount during preallocation. The only question here is if improving cache=writethrough cluster allocation performance is worth the additional complexity in the already complex refcounting code. I didn't see this optimization as a second level cache, but yes, for REF- is a second cache. The alternative that was discussed before is the dirty bit approach that is used in QED and would allow us to use writeback for all refcount blocks, regardless of REF- or REF+. It would be an easier approach requiring less code changes, but it comes with the cost of requiring an fsck after a qemu crash. I was thinking about changing the header magic first time we change refcount in order to mark image as dirty so newer Qemu recognize the flag while former one does not recognize image. Obviously reverting magic on image close. We've discussed this idea before and I think it wasn't considered a great idea to automagically change the header in an incompatible way. But we can always say that for improved performance you need to upgrade your image to qcow2 v3. End speed up is quite visible allocating clusters (more then 20%). What benchmark do you use for testing this? Kevin Currently I'm using bonnie++ but I noted similar improves with iozone. The test script format an image then launch a Linux machine which run a script and save result to a file. The test image is seems by this virtual machine as a separate disk. 
The file on the host resides in a separate LV. I got quite consistent results (of course I was not working on the machine while testing; it is not actually dedicated to this job). Actually I'm running the test now (I added a test working in a snapshot image).

Okay. Let me guess the remaining variables: The image is on an ext4 host filesystem, you use cache=writethrough and virtio-blk. You don't use backing files, compression and encryption. For your tests with internal snapshots you have exactly one internal snapshot that is taken immediately before the benchmark. Oh, and not to forget, KVM is enabled. Are these assumptions correct?

Kevin
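The REF- batching described in this thread (collect offsets in an array, then sort and write them out on flush or when the array reaches 1022 entries) can be sketched roughly as follows. This is only an illustration, not the code from the RFC patches: the names refm_batch, refm_add and refm_flush are hypothetical, and the "write to disk" step is replaced by a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define REFM_LIMIT 1022   /* array limit mentioned in the thread */

/* Hypothetical batch of pending refcount decrements (REF-). */
struct refm_batch {
    uint64_t offsets[REFM_LIMIT]; /* an offset may appear more than once */
    size_t n;                     /* number of pending entries */
    unsigned flushes;             /* how many times the batch was written */
    uint64_t applied;             /* total decrements "written" so far */
};

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sort the pending offsets and apply them in one pass. */
static void refm_flush(struct refm_batch *b)
{
    if (b->n == 0) {
        return;
    }
    qsort(b->offsets, b->n, sizeof(b->offsets[0]), cmp_u64);
    /* Assumption: real code would walk the sorted offsets here and
     * decrement the on-disk refcounts; this sketch only counts them. */
    b->applied += b->n;
    b->flushes++;
    b->n = 0;
}

/* Queue one decrement; flush automatically when the array is full. */
static void refm_add(struct refm_batch *b, uint64_t offset)
{
    b->offsets[b->n++] = offset;
    if (b->n == REFM_LIMIT) {
        refm_flush(b);
    }
}
```

Losing a queued entry in a crash would only leak a cluster, which is the property the thread relies on for delaying REF- but not REF+.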
[Qemu-devel] [PATCH 42/58] pseries: use macro for firmware filename
From: Nishanth Aravamudan n...@us.ibm.com

For some time we've had a nicely defined macro with the filename for our firmware image. However we didn't actually use it in the place we're supposed to. This patch fixes it.

Signed-off-by: Nishanth Aravamudan n...@us.ibm.com
Signed-off-by: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Alexander Graf ag...@suse.de
---
 hw/spapr.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index 00aed62..91953cf 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -442,7 +442,7 @@ static void ppc_spapr_init(ram_addr_t ram_size,
                 "%ldM guest RAM\n", MIN_RAM_SLOF);
         exit(1);
     }
-    filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "slof.bin");
+    filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, FW_FILE_NAME);
     fw_size = load_image_targphys(filename, 0, FW_MAX_SIZE);
     if (fw_size < 0) {
         hw_error("qemu: could not load LPAR rtas '%s'\n", filename);
--
1.6.0.2
[Qemu-devel] [PATCH 09/58] PPC: MPIC: Remove read functionality for WO registers
The IPI dispatch registers are write only according to every MPIC spec I have found. So instead of pretending you could read back something from them, better not handle them at all.

Reported-by: Elie Richa ri...@adacore.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 hw/openpic.c |    7 -------
 1 files changed, 0 insertions(+), 7 deletions(-)

diff --git a/hw/openpic.c b/hw/openpic.c
index 31ad175..dfec52e 100644
--- a/hw/openpic.c
+++ b/hw/openpic.c
@@ -952,13 +952,6 @@ static uint32_t openpic_cpu_read_internal(void *opaque, target_phys_addr_t addr,
     case 0xB0: /* PEOI */
         retval = 0;
         break;
-#if MAX_IPI > 0
-    case 0x40: /* IDE */
-    case 0x50:
-        idx = (addr - 0x40) >> 4;
-        retval = read_IRQreg(opp, opp->irq_ipi0 + idx, IRQ_IDE);
-        break;
-#endif
     default:
         break;
     }
--
1.6.0.2
[Qemu-devel] [PATCH] raw-posix: Fix bdrv_flush error return values
bdrv_flush is supposed to use 0/-errno return values

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block/raw-posix.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index a624f56..305998d 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -839,7 +839,14 @@ static int raw_create(const char *filename, QEMUOptionParameter *options)
 static int raw_flush(BlockDriverState *bs)
 {
     BDRVRawState *s = bs->opaque;
-    return qemu_fdatasync(s->fd);
+    int ret;
+
+    ret = qemu_fdatasync(s->fd);
+    if (ret < 0) {
+        return -errno;
+    }
+
+    return 0;
 }

 #ifdef CONFIG_XFS
--
1.7.6
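For readers unfamiliar with the 0/-errno convention this patch enforces, here is a small standalone sketch. It is not QEMU code: flush_fd is a hypothetical helper, and it calls POSIX fsync() rather than qemu_fdatasync() so that it builds outside the QEMU tree.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of QEMU): flush a file descriptor and
 * report the result as 0 on success or a negative errno value on
 * failure, instead of the raw -1 that fsync() returns.
 */
static int flush_fd(int fd)
{
    if (fsync(fd) < 0) {
        return -errno;   /* e.g. -EBADF for an invalid descriptor */
    }
    return 0;
}
```

A caller can then propagate the return value unchanged and still know which error occurred, which a bare -1 would not allow.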
[Qemu-devel] [PATCH 3/3] memory: optimize empty transactions due to mutators
The mutating memory APIs can easily cause empty transactions, where the mutators don't actually change anything, or perhaps only modify disabled regions. Detect these conditions and avoid regenerating the memory topology.

Signed-off-by: Avi Kivity a...@redhat.com
---
 memory.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/memory.c b/memory.c
index 3b0cc25..1370fac 100644
--- a/memory.c
+++ b/memory.c
@@ -19,6 +19,7 @@
 #include <assert.h>

 unsigned memory_region_transaction_depth = 0;
+static bool memory_region_update_pending = false;

 typedef struct AddrRange AddrRange;

@@ -717,6 +718,7 @@ static void address_space_update_topology(AddressSpace *as)
 static void memory_region_update_topology(MemoryRegion *mr)
 {
     if (memory_region_transaction_depth) {
+        memory_region_update_pending |= !mr || mr->enabled;
         return;
     }

@@ -730,6 +732,8 @@ static void memory_region_update_topology(MemoryRegion *mr)
     if (address_space_io.root) {
         address_space_update_topology(&address_space_io);
     }
+
+    memory_region_update_pending = false;
 }

 void memory_region_transaction_begin(void)
@@ -741,7 +745,9 @@ void memory_region_transaction_commit(void)
 {
     assert(memory_region_transaction_depth);
     --memory_region_transaction_depth;
-    memory_region_update_topology(NULL);
+    if (!memory_region_transaction_depth && memory_region_update_pending) {
+        memory_region_update_topology(NULL);
+    }
 }

 static void memory_region_destructor_none(MemoryRegion *mr)
--
1.7.6.3
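The effect of the pending flag can be seen in a reduced model. The sketch below is an assumption-laden toy, not the QEMU memory core: struct region, update_topology and rebuild_count are hypothetical stand-ins, and "regenerating the topology" is reduced to incrementing a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a memory region; only the enabled bit matters here. */
struct region {
    bool enabled;
};

static unsigned transaction_depth;
static bool update_pending;
static unsigned rebuild_count;   /* counts simulated topology rebuilds */

/* r == NULL means "something changed, region unknown". */
static void update_topology(const struct region *r)
{
    if (transaction_depth) {
        /* Inside a transaction: only remember whether a visible
         * (enabled or unknown) region was touched. */
        update_pending |= !r || r->enabled;
        return;
    }
    if (r && !r->enabled) {
        return;               /* changes to disabled regions are invisible */
    }
    rebuild_count++;          /* stand-in for the expensive regeneration */
    update_pending = false;
}

static void transaction_begin(void)
{
    ++transaction_depth;
}

static void transaction_commit(void)
{
    assert(transaction_depth);
    --transaction_depth;
    /* Rebuild only at the outermost commit, and only if needed. */
    if (!transaction_depth && update_pending) {
        update_topology(NULL);
    }
}
```

A transaction that only touches a disabled region commits without any rebuild, and nested transactions trigger at most one rebuild at the outermost commit.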
[Qemu-devel] [PATCH 2/3] memory: introduce memory_region_set_address()
Allow changing the address of a memory region while it is in the memory hierarchy.

Signed-off-by: Avi Kivity a...@redhat.com
---
 memory.c |   20 ++++++++++++++++++++
 memory.h |   11 +++++++++++
 2 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/memory.c b/memory.c
index ce0f3fd..3b0cc25 100644
--- a/memory.c
+++ b/memory.c
@@ -1260,6 +1260,26 @@ void memory_region_set_enabled(MemoryRegion *mr, bool enabled)
     memory_region_update_topology(NULL);
 }

+void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t addr)
+{
+    MemoryRegion *parent = mr->parent;
+    unsigned priority = mr->priority;
+    bool may_overlap = mr->may_overlap;
+
+    if (addr == mr->addr || !parent) {
+        return;
+    }
+
+    memory_region_transaction_begin();
+    memory_region_del_subregion(parent, mr);
+    if (may_overlap) {
+        memory_region_add_subregion_overlap(parent, addr, mr, priority);
+    } else {
+        memory_region_add_subregion(parent, addr, mr);
+    }
+    memory_region_transaction_commit();
+}
+
 void set_system_memory_map(MemoryRegion *mr)
 {
     address_space_memory.root = mr;
diff --git a/memory.h b/memory.h
index 60b1449..468970b 100644
--- a/memory.h
+++ b/memory.h
@@ -509,6 +509,17 @@ void memory_region_del_subregion(MemoryRegion *mr,
  */
 void memory_region_set_enabled(MemoryRegion *mr, bool enabled);

+/*
+ * memory_region_set_address: dynamically update the address of a region
+ *
+ * Dynamically updates the address of a region, relative to its parent.
+ * May be used on regions are currently part of a memory hierarchy.
+ *
+ * @mr: the region to be updated
+ * @addr: new address, relative to parent region
+ */
+void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t addr);
+
 /* Start a transaction; changes will be accumulated and made visible only
  * when the transaction ends.
  */
--
1.7.6.3
[Qemu-devel] [PATCH 0/3] Memory API mutators
This patchset introduces memory_region_set_enabled() and memory_region_set_address() to avoid the requirement on memory routers to track the internal state of the memory API (so they know whether they need to add or remove a region). Instead, they can simply copy the state of the region from the guest-exposed register to the memory core, via the new mutator functions.

Please review. Do we need a memory_region_set_size() as well? Do we want

    memory_region_set_attributes(mr,
                                 MR_ATTR_ENABLED | MR_ATTR_SIZE,
                                 &(MemoryRegionAttributes) {
                                     .enabled = s->enabled,
                                     .address = s->addr,
                                 });

?

Avi Kivity (3):
  memory: introduce memory_region_set_enabled()
  memory: introduce memory_region_set_address()
  memory: optimize empty transactions due to mutators

 memory.c |   64 -
 memory.h |   28 +++
 2 files changed, 82 insertions(+), 10 deletions(-)

--
1.7.6.3
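To illustrate what "simply copy the state from the guest-exposed register" could look like for a memory router, here is a toy model. Everything in it is hypothetical: toy_region, the toy_set_* mutators and the register layout (bit 0 as enable, bits 12 and up as base address) are assumptions made for the example and are not the QEMU API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy region: 'updates' counts how often the visible state changed. */
struct toy_region {
    bool enabled;
    uint64_t addr;
    unsigned updates;
};

/* Mutators are idempotent: setting the current value is a no-op. */
static void toy_set_enabled(struct toy_region *r, bool enabled)
{
    if (r->enabled == enabled) {
        return;
    }
    r->enabled = enabled;
    r->updates++;
}

static void toy_set_address(struct toy_region *r, uint64_t addr)
{
    if (r->addr == addr) {
        return;
    }
    r->addr = addr;
    r->updates++;
}

/*
 * Hypothetical guest-visible control register write. The router keeps
 * no shadow state of its own; it just forwards the register contents.
 */
static void router_write_ctrl(struct toy_region *r, uint64_t val)
{
    toy_set_enabled(r, (val & 1) != 0);
    toy_set_address(r, val & ~(uint64_t)0xfff);
}
```

Because the mutators compare against the current state themselves, the router does not need to remember whether the region was previously mapped.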
[Qemu-devel] [PATCH 1/3] memory: introduce memory_region_set_enabled()
This allows users to disable a memory region without removing it from the hierarchy, simplifying the implementation of memory routers. Signed-off-by: Avi Kivity a...@redhat.com --- memory.c | 38 -- memory.h | 17 + 2 files changed, 45 insertions(+), 10 deletions(-) diff --git a/memory.c b/memory.c index 101b67c..ce0f3fd 100644 --- a/memory.c +++ b/memory.c @@ -494,6 +494,10 @@ static void render_memory_region(FlatView *view, FlatRange fr; AddrRange tmp; +if (!mr-enabled) { +return; +} + base += mr-addr; tmp = addrrange_make(base, mr-size); @@ -710,12 +714,16 @@ static void address_space_update_topology(AddressSpace *as) address_space_update_ioeventfds(as); } -static void memory_region_update_topology(void) +static void memory_region_update_topology(MemoryRegion *mr) { if (memory_region_transaction_depth) { return; } +if (mr !mr-enabled) { +return; +} + if (address_space_memory.root) { address_space_update_topology(address_space_memory); } @@ -733,7 +741,7 @@ void memory_region_transaction_commit(void) { assert(memory_region_transaction_depth); --memory_region_transaction_depth; -memory_region_update_topology(); +memory_region_update_topology(NULL); } static void memory_region_destructor_none(MemoryRegion *mr) @@ -770,6 +778,7 @@ void memory_region_init(MemoryRegion *mr, mr-size = size; mr-addr = 0; mr-offset = 0; +mr-enabled = true; mr-terminates = false; mr-readable = true; mr-destructor = memory_region_destructor_none; @@ -1005,7 +1014,7 @@ void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client) uint8_t mask = 1 client; mr-dirty_log_mask = (mr-dirty_log_mask ~mask) | (log * mask); -memory_region_update_topology(); +memory_region_update_topology(mr); } bool memory_region_get_dirty(MemoryRegion *mr, target_phys_addr_t addr, @@ -1042,7 +1051,7 @@ void memory_region_rom_device_set_readable(MemoryRegion *mr, bool readable) { if (mr-readable != readable) { mr-readable = readable; -memory_region_update_topology(); +memory_region_update_topology(mr); } } 
@@ -1144,7 +1153,7 @@ void memory_region_add_eventfd(MemoryRegion *mr, memmove(mr-ioeventfds[i+1], mr-ioeventfds[i], sizeof(*mr-ioeventfds) * (mr-ioeventfd_nb-1 - i)); mr-ioeventfds[i] = mrfd; -memory_region_update_topology(); +memory_region_update_topology(mr); } void memory_region_del_eventfd(MemoryRegion *mr, @@ -1174,7 +1183,7 @@ void memory_region_del_eventfd(MemoryRegion *mr, --mr-ioeventfd_nb; mr-ioeventfds = g_realloc(mr-ioeventfds, sizeof(*mr-ioeventfds)*mr-ioeventfd_nb + 1); -memory_region_update_topology(); +memory_region_update_topology(mr); } static void memory_region_add_subregion_common(MemoryRegion *mr, @@ -1210,7 +1219,7 @@ static void memory_region_add_subregion_common(MemoryRegion *mr, } QTAILQ_INSERT_TAIL(mr-subregions, subregion, subregions_link); done: -memory_region_update_topology(); +memory_region_update_topology(mr); } @@ -1239,17 +1248,26 @@ void memory_region_del_subregion(MemoryRegion *mr, assert(subregion-parent == mr); subregion-parent = NULL; QTAILQ_REMOVE(mr-subregions, subregion, subregions_link); -memory_region_update_topology(); +memory_region_update_topology(mr); +} + +void memory_region_set_enabled(MemoryRegion *mr, bool enabled) +{ +if (enabled == mr-enabled) { +return; +} +mr-enabled = enabled; +memory_region_update_topology(NULL); } void set_system_memory_map(MemoryRegion *mr) { address_space_memory.root = mr; -memory_region_update_topology(); +memory_region_update_topology(NULL); } void set_system_io_map(MemoryRegion *mr) { address_space_io.root = mr; -memory_region_update_topology(); +memory_region_update_topology(NULL); } diff --git a/memory.h b/memory.h index 06b83ae..60b1449 100644 --- a/memory.h +++ b/memory.h @@ -114,6 +114,7 @@ struct MemoryRegion { IORange iorange; bool terminates; bool readable; +bool enabled; MemoryRegion *alias; target_phys_addr_t alias_offset; unsigned priority; @@ -492,6 +493,22 @@ void memory_region_add_subregion_overlap(MemoryRegion *mr, void memory_region_del_subregion(MemoryRegion *mr, 
MemoryRegion *subregion); + +/* + * memory_region_set_enabled: dynamically enable or disable a region + * + * Enables or disables a memory region. A disabled memory region + * ignores all accesses to itself and its subregions. It does not + * obscure sibling subregions with lower priority - it simply behaves as + * if it was removed from the hierarchy. + * + * Regions default to being enabled. + * + * @mr: the region to be updated + *
[Qemu-devel] [PATCH 51/58] Gdbstub: handle read of fpscr
From: Fabien Chouteau chout...@adacore.com

Signed-off-by: Fabien Chouteau chout...@adacore.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 gdbstub.c                   |    2 +-
 target-ppc/translate_init.c |    3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/gdbstub.c b/gdbstub.c
index 90683a4..efe7b5f 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -733,7 +733,7 @@ static int cpu_gdb_read_register(CPUState *env, uint8_t *mem_buf, int n)
         {
             if (gdb_has_xml)
                 return 0;
-            GET_REG32(0); /* fpscr */
+            GET_REG32(env->fpscr);
         }
         }
     }
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 211f3bd..d09c7ca 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -9700,8 +9700,7 @@ static int gdb_get_float_reg(CPUState *env, uint8_t *mem_buf, int n)
         return 8;
     }
     if (n == 32) {
-        /* FPSCR not implemented */
-        memset(mem_buf, 0, 4);
+        stl_p(mem_buf, env->fpscr);
         return 4;
     }
     return 0;
--
1.6.0.2
[Qemu-devel] [PATCH 22/58] PPC: E500: Update freqs for all CPUs
Now that we can so nicely find out the host's frequencies, we should also make sure that we get them into all virtual CPUs' device tree nodes.

Signed-off-by: Alexander Graf ag...@suse.de
---
 hw/ppce500_mpc8544ds.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c
index 2c7c677..0791e27 100644
--- a/hw/ppce500_mpc8544ds.c
+++ b/hw/ppce500_mpc8544ds.c
@@ -70,9 +70,9 @@ static int mpc8544_load_device_tree(CPUState *env,
     int fdt_size;
     void *fdt;
     uint8_t hypercall[16];
-    char cpu_name[128] = "/cpus/PowerPC,8544@0";
     uint32_t clock_freq = 4;
     uint32_t tb_freq = 4;
+    int i;

     filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, BINARY_DEVICE_TREE_FILE);
     if (!filename) {
@@ -122,8 +122,12 @@ static int mpc8544_load_device_tree(CPUState *env,
                              hypercall, sizeof(hypercall));
     }

-    qemu_devtree_setprop_cell(fdt, cpu_name, "clock-frequency", clock_freq);
-    qemu_devtree_setprop_cell(fdt, cpu_name, "timebase-frequency", tb_freq);
+    for (i = 0; i < smp_cpus; i++) {
+        char cpu_name[128];
+        snprintf(cpu_name, sizeof(cpu_name), "/cpus/PowerPC,8544@%x", i);
+        qemu_devtree_setprop_cell(fdt, cpu_name, "clock-frequency", clock_freq);
+        qemu_devtree_setprop_cell(fdt, cpu_name, "timebase-frequency", tb_freq);
+    }

     ret = rom_add_blob_fixed(BINARY_DEVICE_TREE_FILE, fdt, fdt_size, addr);
     g_free(fdt);
--
1.6.0.2
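One detail of the loop above that is easy to miss is that the unit address after the @ is formatted with %x, so CPU 10 lives at ...@a, not ...@10. The helper below is a hypothetical standalone sketch (cpu_node_path is not a QEMU function) that isolates just the path formatting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the device tree path of CPU node 'idx'
 * using the same format string as the patch. Returns what snprintf()
 * returns, i.e. the length the full path needs (excluding the NUL).
 */
static int cpu_node_path(char *buf, size_t len, unsigned idx)
{
    return snprintf(buf, len, "/cpus/PowerPC,8544@%x", idx);
}
```

Comparing the return value against the buffer size is a cheap way to detect truncation, although with a 128-byte buffer and at most 32 CPUs it cannot happen here.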
[Qemu-devel] [PATCH 58/58] KVM: Update kernel headers
Removes ABI-breaking HIOR parts - KVM patch to follow.

Signed-off-by: Alexander Graf ag...@suse.de
---
 linux-headers/asm-powerpc/kvm.h |   12 ++----------
 linux-headers/linux/kvm.h       |    1 -
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/linux-headers/asm-powerpc/kvm.h b/linux-headers/asm-powerpc/kvm.h
index 28eecf0..a635e22 100644
--- a/linux-headers/asm-powerpc/kvm.h
+++ b/linux-headers/asm-powerpc/kvm.h
@@ -149,12 +149,6 @@ struct kvm_regs {
 #define KVM_SREGS_E_UPDATE_DBSR (1 << 3)

 /*
- * Book3S special bits to indicate contents in the struct by maintaining
- * backwards compatibility with older structs. If adding a new field,
- * please make sure to add a flag for that new field */
-#define KVM_SREGS_S_HIOR (1 << 0)
-
-/*
  * In KVM_SET_SREGS, reserved/pad fields must be left untouched from a
  * previous KVM_GET_REGS.
  *
@@ -176,11 +170,9 @@ struct kvm_sregs {
                 } ppc64;
                 struct {
                     __u32 sr[16];
-                    __u64 ibat[8];
-                    __u64 dbat[8];
+                    __u64 ibat[8];
+                    __u64 dbat[8];
                 } ppc32;
-                __u64 flags; /* KVM_SREGS_S_ */
-                __u64 hior;
             } s;
             struct {
                 union {
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 8bb6cde..6f5095c 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -554,7 +554,6 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_SMT 64
 #define KVM_CAP_PPC_RMA 65
 #define KVM_CAP_MAX_VCPUS 66 /* returns max vcpus per vm */
-#define KVM_CAP_PPC_HIOR 67
 #define KVM_CAP_PPC_PAPR 68
 #define KVM_CAP_SW_TLB 69
--
1.6.0.2
[Qemu-devel] [PATCH 50/58] pseries: Update SLOF firmware image
From: David Gibson da...@gibson.dropbear.id.au

The current SLOF firmware for the pseries machine has a bug in SCSI condition handling that was exposed by recent updates to qemu's SCSI emulation. This patch updates the SLOF image to one with the bug fixed.

Signed-off-by: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Alexander Graf ag...@suse.de
---
 pc-bios/README   |    2 +-
 pc-bios/slof.bin |  Bin 579072 - 57 bytes
 2 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/pc-bios/README b/pc-bios/README
index f74b246..8912211 100644
--- a/pc-bios/README
+++ b/pc-bios/README
@@ -17,7 +17,7 @@
 - SLOF (Slimline Open Firmware) is a free IEEE 1275 Open Firmware
   implementation for certain IBM POWER hardware. The sources are at
   https://github.com/dgibson/SLOF, and the image currently in qemu is
-  built from git tag qemu-slof-20110323.
+  built from git tag qemu-slof-20110830.

 - The PXE roms come from the iPXE project. Built with BANNER_TIME 0.
   Sources available at http://ipxe.org. Vendor:Device ID - ROM mapping:
diff --git a/pc-bios/slof.bin b/pc-bios/slof.bin
index 22c4c7f5c448e3002aefecf3438f5d080586d666..66163031c6eb5539b54b73214bf18b8cb6aa8743 100644
GIT binary patch
[base85-encoded binary payload (delta 2674, delta 2908) omitted]
[Qemu-devel] [PATCH 47/58] Implement POWER7's CFAR in TCG
From: David Gibson da...@gibson.dropbear.id.au This patch implements support for the CFAR SPR on POWER7 (Come From Address Register), which snapshots the PC value at the time of a branch or an rfid. The latest powerpc-next kernel also catches it and can show it in xmon or in the signal frames. This works well enough to let recent kernels boot (which otherwise oops on the CFAR access). It hasn't been tested enough to be confident that the CFAR values are actually accurate, but one thing at a time. Signed-off-by: Ben Herrenschmidt b...@kernel.crashing.org Signed-off-by: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Alexander Graf ag...@suse.de --- target-ppc/cpu.h|8 target-ppc/translate.c | 28 target-ppc/translate_init.c | 23 ++- 3 files changed, 58 insertions(+), 1 deletions(-) diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h index 32706df..3f4af22 100644 --- a/target-ppc/cpu.h +++ b/target-ppc/cpu.h @@ -555,6 +555,8 @@ enum { /* Decrementer clock: RTC clock (POWER, 601) or bus clock*/ POWERPC_FLAG_RTC_CLK = 0x0001, POWERPC_FLAG_BUS_CLK = 0x0002, +/* Has CFAR */ +POWERPC_FLAG_CFAR = 0x0004, }; /*/ @@ -872,6 +874,10 @@ struct CPUPPCState { target_ulong ctr; /* condition register */ uint32_t crf[8]; +#if defined(TARGET_PPC64) +/* CFAR */ +target_ulong cfar; +#endif /* XER */ target_ulong xer; /* Reservation address */ @@ -1204,6 +1210,7 @@ static inline void cpu_clone_regs(CPUState *env, target_ulong newsp) #define SPR_601_UDECR (0x006) #define SPR_LR(0x008) #define SPR_CTR (0x009) +#define SPR_DSCR (0x011) #define SPR_DSISR (0x012) #define SPR_DAR (0x013) /* DAE for PowerPC 601 */ #define SPR_601_RTCU (0x014) @@ -1212,6 +1219,7 @@ static inline void cpu_clone_regs(CPUState *env, target_ulong newsp) #define SPR_SDR1 (0x019) #define SPR_SRR0 (0x01A) #define SPR_SRR1 (0x01B) +#define SPR_CFAR (0x01C) #define SPR_AMR (0x01D) #define SPR_BOOKE_PID (0x030) #define SPR_BOOKE_DECAR (0x036) diff --git a/target-ppc/translate.c b/target-ppc/translate.c index 
4277460..1e362fc 100644 --- a/target-ppc/translate.c +++ b/target-ppc/translate.c @@ -69,6 +69,9 @@ static TCGv cpu_nip; static TCGv cpu_msr; static TCGv cpu_ctr; static TCGv cpu_lr; +#if defined(TARGET_PPC64) +static TCGv cpu_cfar; +#endif static TCGv cpu_xer; static TCGv cpu_reserve; static TCGv_i32 cpu_fpscr; @@ -154,6 +157,11 @@ void ppc_translate_init(void) cpu_lr = tcg_global_mem_new(TCG_AREG0, offsetof(CPUState, lr), lr); +#if defined(TARGET_PPC64) +cpu_cfar = tcg_global_mem_new(TCG_AREG0, + offsetof(CPUState, cfar), cfar); +#endif + cpu_xer = tcg_global_mem_new(TCG_AREG0, offsetof(CPUState, xer), xer); @@ -187,6 +195,7 @@ typedef struct DisasContext { int le_mode; #if defined(TARGET_PPC64) int sf_mode; +int has_cfar; #endif int fpu_enabled; int altivec_enabled; @@ -3345,6 +3354,14 @@ static inline void gen_qemu_st32fiw(DisasContext *ctx, TCGv_i64 arg1, TCGv arg2) /* stfiwx */ GEN_STXF(stfiw, st32fiw, 0x17, 0x1E, PPC_FLOAT_STFIWX); +static inline void gen_update_cfar(DisasContext *ctx, target_ulong nip) +{ +#if defined(TARGET_PPC64) +if (ctx-has_cfar) +tcg_gen_movi_tl(cpu_cfar, nip); +#endif +} + /***Branch ***/ static inline void gen_goto_tb(DisasContext *ctx, int n, target_ulong dest) { @@ -3407,6 +3424,7 @@ static void gen_b(DisasContext *ctx) target = li; if (LK(ctx-opcode)) gen_setlr(ctx, ctx-nip); +gen_update_cfar(ctx, ctx-nip); gen_goto_tb(ctx, 0, target); } @@ -3469,6 +3487,7 @@ static inline void gen_bcond(DisasContext *ctx, int type) } tcg_temp_free_i32(temp); } +gen_update_cfar(ctx, ctx-nip); if (type == BCOND_IM) { target_ulong li = (target_long)((int16_t)(BD(ctx-opcode))); if (likely(AA(ctx-opcode) == 0)) { @@ -3580,6 +3599,7 @@ static void gen_rfi(DisasContext *ctx) gen_inval_exception(ctx, POWERPC_EXCP_PRIV_OPC); return; } +gen_update_cfar(ctx, ctx-nip); gen_helper_rfi(); gen_sync_exception(ctx); #endif @@ -3596,6 +3616,7 @@ static void gen_rfid(DisasContext *ctx) gen_inval_exception(ctx, POWERPC_EXCP_PRIV_OPC); return; } +gen_update_cfar(ctx, 
ctx-nip); gen_helper_rfid(); gen_sync_exception(ctx); #endif @@ -9263,6 +9284,12 @@ void cpu_dump_state
[Qemu-devel] [PATCH 14/58] device tree: add nop_node
We have a qemu internal abstraction layer on FDT. While I'm not fully convinced we need it at all, it's missing the nop_node functionality that we now need on e500. So let's add it and think about the general future of that API later.

Signed-off-by: Alexander Graf ag...@suse.de
---
 device_tree.c |   11 +++++++++++
 device_tree.h |    1 +
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/device_tree.c b/device_tree.c
index 3a224d1..23e89e3 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -107,3 +107,14 @@ int qemu_devtree_setprop_string(void *fdt, const char *node_path,

     return fdt_setprop_string(fdt, offset, property, string);
 }
+
+int qemu_devtree_nop_node(void *fdt, const char *node_path)
+{
+    int offset;
+
+    offset = fdt_path_offset(fdt, node_path);
+    if (offset < 0)
+        return offset;
+
+    return fdt_nop_node(fdt, offset);
+}
diff --git a/device_tree.h b/device_tree.h
index cecd98f..76fce5f 100644
--- a/device_tree.h
+++ b/device_tree.h
@@ -22,5 +22,6 @@ int qemu_devtree_setprop_cell(void *fdt, const char *node_path,
                               const char *property, uint32_t val);
 int qemu_devtree_setprop_string(void *fdt, const char *node_path,
                                 const char *property, const char *string);
+int qemu_devtree_nop_node(void *fdt, const char *node_path);

 #endif /* __DEVICE_TREE_H__ */
--
1.6.0.2
Re: [Qemu-devel] [PATCH 0/3] Memory API mutators
Jan, too, was interested in this.

On 09/14/2011 12:23 PM, Avi Kivity wrote:
> This patchset introduces memory_region_set_enabled() and
> memory_region_set_address() to avoid the requirement on memory routers to
> track the internal state of the memory API (so they know whether they need
> to add or remove a region). Instead, they can simply copy the state of the
> region from the guest-exposed register to the memory core, via the new
> mutator functions.
>
> Please review. Do we need a memory_region_set_size() as well? Do we want
>
>     memory_region_set_attributes(mr,
>                                  MR_ATTR_ENABLED | MR_ATTR_SIZE,
>                                  &(MemoryRegionAttributes) {
>                                      .enabled = s->enabled,
>                                      .address = s->addr,
>                                  });
>
> ?
>
> Avi Kivity (3):
>   memory: introduce memory_region_set_enabled()
>   memory: introduce memory_region_set_address()
>   memory: optimize empty transactions due to mutators
>
>  memory.c |   64 -
>  memory.h |   28 +++
>  2 files changed, 82 insertions(+), 10 deletions(-)

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH][RFC][0/2] REF+/REF- optimization
2011/9/14 Kevin Wolf kw...@redhat.com:
> On 13.09.2011 15:36, Frediano Ziglio wrote:
>> 2011/9/13 Kevin Wolf kw...@redhat.com:
>>> On 13.09.2011 09:53, Frediano Ziglio wrote:
>>>> These patches try to trade off between leaks and speed for cluster refcounts.
>>>>
>>>> Refcount increments (REF+ or refp) are handled in a different way from decrements (REF- or refm). The reason is that postponing or not flushing a REF- causes just a leak, while postponing a REF+ causes corruption.
>>>>
>>>> To optimize REF- I just used an array to store offsets; when a flush is requested or the array reaches a limit (currently 1022), the array is sorted and written to disk. I use an array of offsets instead of ranges to support compression (an offset could appear multiple times in the array). I consider this patch quite ready.
>>>
>>> Ok, first of all let's clarify what this optimises. I don't think it changes anything at all for the writeback cache modes, because these already do most operations in memory only. So this must be about optimising some operations with cache=writethrough. REF- isn't about normal cluster allocation, it is about COW with internal snapshots or bdrv_discard. Do you have benchmarks for any of them?
>>>
>>> I strongly disagree with your approach for REF-. We already have a cache, and introducing a second one sounds like a bad idea. I think we could get a very similar effect if we introduced a qcow2_cache_entry_mark_dirty_wb() that marks a given refcount block as dirty, but at the same time tells the cache that even in write-through mode it can still treat this block as write-back. This should require far fewer code changes.
>>
>> Yes, mainly optimize for writethrough. I did not test with writeback, but it should improve even this (I think here you have some flushes to keep consistency). I'll try to write a qcow2_cache_entry_mark_dirty_wb patch and test it.
>
> Great, thanks!

Don't expect the patch too soon, however; I'm quite busy these days.

>>> But let's measure the effects first, I suspect that for cluster allocation it doesn't help much because every REF- comes with a REF+.
>>
>> That's 50% of the effort if REF- clusters are far from REF+ :)
>
> I would expect that the next REF+ allocates exactly the REF- cluster. But you still have a point, we save the write on REF- and combine it with the REF+ write.

This is still a TODO for the REF+ patch.

Oh... some time ago, looking at the refcount code, I realized that a single deallocation could in some cases be reused only after a Qemu restart. For instance:
- we get a single-cluster REF- which takes the refcount to 0
- free_cluster_index gets decreased to this index
- we get a new cluster request for 2 clusters
- free_cluster_index gets increased

We skip the freed cluster, and if we don't get a new deallocation for a cluster with an index lower than our freed cluster, this cluster is not reused. (I didn't test this behavior; no leak, no corruption, just that the image could be larger than expected.)

>>>> To optimize REF+ I mark a range as allocated and use this range to get new ones (avoiding writing refcounts to disk). When a flush is requested or in some situations (like snapshot) this cache is disabled and flushed (written as REF-). I do not consider this patch ready; it works and passes all io-tests, but for instance I would avoid allocating new clusters for refcounts during preallocation.
>>>
>>> The only question here is if improving cache=writethrough cluster allocation performance is worth the additional complexity in the already complex refcounting code.
>>
>> I didn't see this optimization as a second-level cache, but yes, for REF- it is a second cache.
>>
>>> The alternative that was discussed before is the dirty bit approach that is used in QED and would allow us to use writeback for all refcount blocks, regardless of REF- or REF+. It would be an easier approach requiring fewer code changes, but it comes with the cost of requiring an fsck after a qemu crash.
>>
>> I was thinking about changing the header magic the first time we change a refcount, in order to mark the image as dirty, so that a newer Qemu recognizes the flag while an older one does not recognize the image. Obviously reverting the magic on image close.
>
> We've discussed this idea before and I think it wasn't considered a great idea to automagically change the header in an incompatible way. But we can always say that for improved performance you need to upgrade your image to qcow2 v3.

I don't understand why there is no wiki page detailing the qcow3 changes. I saw your post from May. I have followed this ML since August, so I think I missed a lot of discussion on qcow improvements.

>>>> In the end the speed-up is quite visible when allocating clusters (more than 20%).
>>>
>>> What benchmark do you use for testing this?
>>>
>>> Kevin
>>
>> Currently I'm using bonnie++, but I noted similar improvements with iozone. The test script formats an image, then launches a Linux machine which runs a script and saves the result to a file. The test image is seen by this virtual machine as a separate disk. The file on the host resides in a separate LV. I got quite
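The REF- batching described at the top of this thread (collect cluster offsets, then sort and write them out once a flush is requested or the array is full) could be sketched roughly as follows. This is only an illustration under assumptions, not code from the patch: `RefmBatch`, `refm_add` and `refm_drain` are hypothetical names, and only the limit of 1022 entries comes from the description.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define REFM_MAX 1022   /* array limit quoted in the patch description */

/* Hypothetical container for pending refcount decrements (REF-). */
typedef struct RefmBatch {
    uint64_t offsets[REFM_MAX];
    size_t count;
} RefmBatch;

static int refm_cmp(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/*
 * Queue one decrement. The same offset may be queued several times.
 * Returns 1 when the batch is full and must be drained before the
 * next call, 0 otherwise.
 */
static int refm_add(RefmBatch *b, uint64_t offset)
{
    b->offsets[b->count++] = offset;
    return b->count == REFM_MAX;
}

/*
 * Sort the pending offsets so that neighbouring refcount entries end
 * up next to each other, copy them to 'out' and empty the batch.
 * Returns the number of entries copied.
 */
static size_t refm_drain(RefmBatch *b, uint64_t *out)
{
    size_t n = b->count;

    qsort(b->offsets, n, sizeof(b->offsets[0]), refm_cmp);
    memcpy(out, b->offsets, n * sizeof(b->offsets[0]));
    b->count = 0;
    return n;
}
```

A caller would invoke `refm_drain()` on an explicit flush or whenever `refm_add()` returns 1, and then apply the sorted decrements to the refcount blocks in one pass.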
[Qemu-devel] [PATCH 23/58] PPC: E500: Remove unneeded CPU nodes
We should only keep CPU nodes in the device tree around that we really have virtual CPUs for. So remove all superfluous entries that we just keep there in case someone wants to create a lot of vCPUs.

Signed-off-by: Alexander Graf ag...@suse.de
---
 hw/ppce500_mpc8544ds.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c
index 0791e27..9379624 100644
--- a/hw/ppce500_mpc8544ds.c
+++ b/hw/ppce500_mpc8544ds.c
@@ -129,6 +129,12 @@ static int mpc8544_load_device_tree(CPUState *env,
         qemu_devtree_setprop_cell(fdt, cpu_name, "timebase-frequency", tb_freq);
     }
 
+    for (i = smp_cpus; i < 32; i++) {
+        char cpu_name[128];
+        snprintf(cpu_name, sizeof(cpu_name), "/cpus/PowerPC,8544@%x", i);
+        qemu_devtree_nop_node(fdt, cpu_name);
+    }
+
     ret = rom_add_blob_fixed(BINARY_DEVICE_TREE_FILE, fdt, fdt_size, addr);
     g_free(fdt);
 
-- 
1.6.0.2
Re: [Qemu-devel] [PATCH 0/3] Memory API mutators
On 14 September 2011 10:23, Avi Kivity a...@redhat.com wrote:
> This patchset introduces memory_region_set_enabled() and memory_region_set_address() to avoid the requirement on memory routers to track the internal state of the memory API (so they know whether they need to add or remove a region). Instead, they can simply copy the state of the region from the guest-exposed register to the memory core, via the new mutator functions.
>
> Please review. Do we need a memory_region_set_size() as well?

Would set_size() allow things like omap_gpmc() to avoid the need to create an intermediate container subregion to enforce size clipping on the child region it's trying to map?

(Strictly speaking what omap_gpmc() wants is not merely clipping to a guest-specified size but also wrapping, so you can take a 16MB child region and map the bottom 4MB of it repeating into a 32MB chunk of address space, say. But that would require a lot of playing games with aliases to implement a bizarre corner case that nobody uses in practice.)

-- PMM
[Qemu-devel] [PATCH 57/58] PPC: Fix heathrow PIC to use little endian MMIO
During the memory API conversion, the indication of little endianness of MMIO for the heathrow PIC got dropped. This patch adds it back again.

Signed-off-by: Alexander Graf ag...@suse.de
---
 hw/heathrow_pic.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/heathrow_pic.c b/hw/heathrow_pic.c
index 51996ab..16f48d1 100644
--- a/hw/heathrow_pic.c
+++ b/hw/heathrow_pic.c
@@ -126,7 +126,7 @@ static uint64_t pic_read(void *opaque, target_phys_addr_t addr,
 static const MemoryRegionOps heathrow_pic_ops = {
     .read = pic_read,
     .write = pic_write,
-    .endianness = DEVICE_NATIVE_ENDIAN,
+    .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
 static void heathrow_pic_set_irq(void *opaque, int num, int level)
-- 
1.6.0.2
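To illustrate why the endianness field matters for a big-endian guest such as PPC, here is a simplified model of the byte swap a memory core has to apply when a device's declared byte order differs from the guest's. It is a sketch under assumptions, not QEMU's actual implementation: `toy_endian`, `toy_bswap32` and `toy_mmio_fixup` are made-up names.

```c
#include <stdint.h>

/* Hypothetical equivalents of the three device endianness settings. */
enum toy_endian { TOY_NATIVE_ENDIAN, TOY_LITTLE_ENDIAN, TOY_BIG_ENDIAN };

/* Reverse the byte order of a 32-bit value. */
static uint32_t toy_bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Convert a 32-bit register value between device and guest byte order.
 * A "native" device follows the guest, so it is never swapped; a device
 * with a fixed byte order is swapped only when it differs from the guest.
 */
static uint32_t toy_mmio_fixup(uint32_t val, enum toy_endian dev,
                               int guest_is_big_endian)
{
    int guest_big = !!guest_is_big_endian;
    int dev_big = (dev == TOY_NATIVE_ENDIAN) ? guest_big
                                             : (dev == TOY_BIG_ENDIAN);

    return dev_big == guest_big ? val : toy_bswap32(val);
}
```

In this model, marking a little-endian device as native on a big-endian guest silently skips the swap, which is the kind of regression the patch above corrects.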
[Qemu-devel] [PATCH 11/58] PPC: Bump MPIC up to 32 supported CPUs
The MPIC emulation is now capable of handling up to 32 CPUs. Reflect that in the code exporting the numbers out and fix an integer overflow while at it. Signed-off-by: Alexander Graf ag...@suse.de --- v1 - v2: - Max cpus is 15 due to cINT routing - Report nb_cpus not MAX_CPUS in MPIC capabilities --- hw/openpic.c | 10 +++--- 1 files changed, 3 insertions(+), 7 deletions(-) diff --git a/hw/openpic.c b/hw/openpic.c index 109c1bc..03e442b 100644 --- a/hw/openpic.c +++ b/hw/openpic.c @@ -63,7 +63,7 @@ #elif defined(USE_MPCxxx) -#define MAX_CPU 2 +#define MAX_CPU15 #define MAX_IRQ 128 #define MAX_DBL 0 #define MAX_MBX 0 @@ -507,7 +507,7 @@ static inline void write_IRQreg (openpic_t *opp, int n_IRQ, break; case IRQ_IDE: tmp = val 0xC000; -tmp |= val ((1 MAX_CPU) - 1); +tmp |= val ((1ULL MAX_CPU) - 1); opp-src[n_IRQ].ide = tmp; DPRINTF(Set IDE %d to 0x%08x\n, n_IRQ, opp-src[n_IRQ].ide); break; @@ -1283,7 +1283,7 @@ static void mpic_reset (void *opaque) mpp-glbc = 0x8000; /* Initialise controller registers */ -mpp-frep = 0x004f0002; +mpp-frep = 0x004f0002 | ((mpp-nb_cpus - 1) 8); mpp-veni = VENI; mpp-pint = 0x; mpp-spve = 0x; @@ -1684,10 +1684,6 @@ qemu_irq *mpic_init (target_phys_addr_t base, int nb_cpus, {mpic_cpu_read, mpic_cpu_write, MPIC_CPU_REG_START, MPIC_CPU_REG_SIZE}, }; -/* XXX: for now, only one CPU is supported */ -if (nb_cpus != 1) -return NULL; - mpp = g_malloc0(sizeof(openpic_t)); for (i = 0; i sizeof(list)/sizeof(list[0]); i++) { -- 1.6.0.2
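The integer overflow mentioned in the commit message comes from building a per-CPU bit mask with a plain `int` shift. A minimal sketch of the safe form, assuming a 32-bit `int` and a hypothetical helper name `cpu_mask`:

```c
#include <stdint.h>

/*
 * Return a mask with the low 'ncpus' bits set, one bit per CPU.
 * With a 32-bit int, "1 << n" is undefined for n >= 31, so the shift
 * is done on a 64-bit unsigned constant and truncated afterwards.
 * The caller must pass ncpus <= 32.
 */
static uint32_t cpu_mask(unsigned ncpus)
{
    return (uint32_t)((1ULL << ncpus) - 1);
}
```

With this form, `cpu_mask(15)` gives `0x7fff` and `cpu_mask(32)` gives `0xffffffff` without relying on undefined behaviour.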
Re: [Qemu-devel] [PATCH 0/3] Memory API mutators
On 09/14/2011 12:56 PM, Peter Maydell wrote:
> On 14 September 2011 10:23, Avi Kivity a...@redhat.com wrote:
>> This patchset introduces memory_region_set_enabled() and memory_region_set_address() to avoid the requirement on memory routers to track the internal state of the memory API (so they know whether they need to add or remove a region). Instead, they can simply copy the state of the region from the guest-exposed register to the memory core, via the new mutator functions.
>>
>> Please review. Do we need a memory_region_set_size() as well?
>
> Would set_size() allow things like omap_gpmc() to avoid the need to create an intermediate container subregion to enforce size clipping on the child region it's trying to map?

I'd recommend not calling _set_size() on somebody else's region - this quickly leads to confusion. Only call _set_size() if you also called _init() and will call _destroy().

Can you point me at the code in question?

_set_size() may be useful for dynamic bridge windows and the like.

> (Strictly speaking what omap_gpmc() wants is not merely clipping to a guest-specified size but also wrapping, so you can take a 16MB child region and map the bottom 4MB of it repeating into a 32MB chunk of address space, say. But that would require a lot of playing games with aliases to implement a bizarre corner case that nobody uses in practice.)

That's best done in the memory core; the rendering loop can be adjusted to do this replication.

-- 
error compiling committee.c: too many arguments to function
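The idea behind the mutators, that a router copies guest register state straight into the memory core instead of remembering what it programmed before, can be shown with a toy model. Everything here (`ToyRegion`, the `toy_*` functions, the register layout) is hypothetical and only mimics the shape of memory_region_set_enabled() and memory_region_set_address(); it is not the QEMU implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a memory region; NOT QEMU's MemoryRegion. */
typedef struct ToyRegion {
    uint64_t addr;
    bool enabled;
    unsigned updates;   /* how often the "core" had to re-render */
} ToyRegion;

/*
 * The mutators compare against the state held by the core, so callers
 * may call them unconditionally on every register write.
 */
static void toy_region_set_enabled(ToyRegion *r, bool enabled)
{
    if (r->enabled != enabled) {
        r->enabled = enabled;
        r->updates++;
    }
}

static void toy_region_set_address(ToyRegion *r, uint64_t addr)
{
    if (r->addr != addr) {
        r->addr = addr;
        r->updates++;
    }
}

/*
 * Hypothetical guest-visible window register:
 * bit 0 = enable, bits 31..12 = base address.
 */
static void toy_window_write(ToyRegion *r, uint32_t reg)
{
    toy_region_set_enabled(r, reg & 1);
    toy_region_set_address(r, reg & ~0xfffu);
}
```

The register handler carries no add/remove bookkeeping of its own; writing the same value twice simply results in no update.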
[Qemu-devel] [PATCH 17/58] PPC: E500: Use generic kvm function for freq
Now that we have generic KVM functions to read out the host tb and clock frequencies, let's use them in the e500 code! Signed-off-by: Alexander Graf ag...@suse.de --- hw/ppce500_mpc8544ds.c | 44 +--- 1 files changed, 9 insertions(+), 35 deletions(-) diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c index 9cb01f3..8748531 100644 --- a/hw/ppce500_mpc8544ds.c +++ b/hw/ppce500_mpc8544ds.c @@ -14,8 +14,6 @@ * (at your option) any later version. */ -#include dirent.h - #include config.h #include qemu-common.h #include net.h @@ -96,6 +94,9 @@ static int mpc8544_load_device_tree(CPUState *env, int fdt_size; void *fdt; uint8_t hypercall[16]; +char cpu_name[128] = /cpus/PowerPC,8544@0; +uint32_t clock_freq = 4; +uint32_t tb_freq = 4; filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, BINARY_DEVICE_TREE_FILE); if (!filename) { @@ -133,32 +134,9 @@ static int mpc8544_load_device_tree(CPUState *env, fprintf(stderr, couldn't set /chosen/bootargs\n); if (kvm_enabled()) { -struct dirent *dirp; -DIR *dp; -char buf[128]; - -if ((dp = opendir(/proc/device-tree/cpus/)) == NULL) { -printf(Can't open directory /proc/device-tree/cpus/\n); -ret = -1; -goto out; -} - -buf[0] = '\0'; -while ((dirp = readdir(dp)) != NULL) { -if (strncmp(dirp-d_name, PowerPC, 7) == 0) { -snprintf(buf, 128, /cpus/%s, dirp-d_name); -break; -} -} -closedir(dp); -if (buf[0] == '\0') { -printf(Unknow host!\n); -ret = -1; -goto out; -} - -mpc8544_copy_soc_cell(fdt, buf, clock-frequency); -mpc8544_copy_soc_cell(fdt, buf, timebase-frequency); +/* Read out host's frequencies */ +clock_freq = kvmppc_get_clockfreq(); +tb_freq = kvmppc_get_tbfreq(); /* indicate KVM hypercall interface */ qemu_devtree_setprop_string(fdt, /hypervisor, compatible, @@ -166,15 +144,11 @@ static int mpc8544_load_device_tree(CPUState *env, kvmppc_get_hypercall(env, hypercall, sizeof(hypercall)); qemu_devtree_setprop(fdt, /hypervisor, hcall-instructions, hypercall, sizeof(hypercall)); -} else { -const uint32_t freq = 4; - 
-qemu_devtree_setprop_cell(fdt, /cpus/PowerPC,8544@0, - clock-frequency, freq); -qemu_devtree_setprop_cell(fdt, /cpus/PowerPC,8544@0, - timebase-frequency, freq); } +qemu_devtree_setprop_cell(fdt, cpu_name, clock-frequency, clock_freq); +qemu_devtree_setprop_cell(fdt, cpu_name, timebase-frequency, tb_freq); + ret = rom_add_blob_fixed(BINARY_DEVICE_TREE_FILE, fdt, fdt_size, addr); g_free(fdt); -- 1.6.0.2
[Qemu-devel] [PATCH 04/58] PPC: Move openpic to target specific code compilation
The MPIC has a funny feature where it maps different registers to an MMIO region depending on which CPU accesses them. To be able to reflect that, we need to compile OpenPIC in the target code, so it can access cpu_single_env.

Signed-off-by: Alexander Graf ag...@suse.de
---
 Makefile.objs   |    1 -
 Makefile.target |    2 ++
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index 62020d7..60c63af 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -221,7 +221,6 @@ hw-obj-$(CONFIG_SMARTCARD_NSS) += ccid-card-emulated.o
 hw-obj-$(CONFIG_USB_REDIR) += usb-redir.o
 
 # PPC devices
-hw-obj-$(CONFIG_OPENPIC) += openpic.o
 hw-obj-$(CONFIG_PREP_PCI) += prep_pci.o
 # Mac shared devices
 hw-obj-$(CONFIG_MACIO) += macio.o
diff --git a/Makefile.target b/Makefile.target
index f708453..2ed9099 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -252,6 +252,8 @@ obj-ppc-y += ppce500_mpc8544ds.o mpc8544_guts.o
 obj-ppc-y += virtex_ml507.o
 obj-ppc-$(CONFIG_KVM) += kvm_ppc.o
 obj-ppc-$(CONFIG_FDT) += device_tree.o
+# PowerPC OpenPIC
+obj-ppc-y += openpic.o
 
 # Xilinx PPC peripherals
 obj-ppc-y += xilinx_intc.o
-- 
1.6.0.2
[Qemu-devel] [PATCH 45/58] ppc: booke206: add info tlb support
From: Scott Wood scottw...@freescale.com Signed-off-by: Scott Wood scottw...@freescale.com Signed-off-by: Alexander Graf ag...@suse.de --- hmp-commands.hx |2 +- monitor.c |5 ++- target-ppc/cpu.h|2 + target-ppc/helper.c | 88 +++ 4 files changed, 94 insertions(+), 3 deletions(-) diff --git a/hmp-commands.hx b/hmp-commands.hx index 9e1cca8..506014c 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1306,7 +1306,7 @@ show i8259 (PIC) state @item info pci show emulated PCI device info @item info tlb -show virtual to physical memory mappings (i386, SH4 and SPARC only) +show virtual to physical memory mappings (i386, SH4, SPARC, and PPC only) @item info mem show the active virtual memory mappings (i386 only) @item info jit diff --git a/monitor.c b/monitor.c index 03ae997..46bfeec 100644 --- a/monitor.c +++ b/monitor.c @@ -2456,7 +2456,7 @@ static void tlb_info(Monitor *mon) #endif -#if defined(TARGET_SPARC) +#if defined(TARGET_SPARC) || defined(TARGET_PPC) static void tlb_info(Monitor *mon) { CPUState *env1 = mon_get_cpu(); @@ -2949,7 +2949,8 @@ static const mon_cmd_t info_cmds[] = { .user_print = do_pci_info_print, .mhandler.info_new = do_pci_info, }, -#if defined(TARGET_I386) || defined(TARGET_SH4) || defined(TARGET_SPARC) +#if defined(TARGET_I386) || defined(TARGET_SH4) || defined(TARGET_SPARC) || \ +defined(TARGET_PPC) { .name = tlb, .args_type = , diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h index 3e7f797..5200e6e 100644 --- a/target-ppc/cpu.h +++ b/target-ppc/cpu.h @@ -2045,4 +2045,6 @@ static inline void cpu_pc_from_tb(CPUState *env, TranslationBlock *tb) env-nip = tb-pc; } +void dump_mmu(FILE *f, fprintf_function cpu_fprintf, CPUState *env); + #endif /* !defined (__CPU_PPC_H__) */ diff --git a/target-ppc/helper.c b/target-ppc/helper.c index 5ec83f2..d1bc574 100644 --- a/target-ppc/helper.c +++ b/target-ppc/helper.c @@ -1465,6 +1465,94 @@ found_tlb: return ret; } +static const char *book3e_tsize_to_str[32] = { +1K, 2K, 4K, 8K, 16K, 32K, 64K, 128K, 256K, 
512K, +1M, 2M, 4M, 8M, 16M, 32M, 64M, 128M, 256M, 512M, +1G, 2G, 4G, 8G, 16G, 32G, 64G, 128G, 256G, 512G, +1T, 2T +}; + +static void mmubooke206_dump_one_tlb(FILE *f, fprintf_function cpu_fprintf, + CPUState *env, int tlbn, int offset, + int tlbsize) +{ +ppcmas_tlb_t *entry; +int i; + +cpu_fprintf(f, \nTLB%d:\n, tlbn); +cpu_fprintf(f, Effective Physical Size TID TS SRWX URWX WIMGE U0123\n); + +entry = env-tlb.tlbm[offset]; +for (i = 0; i tlbsize; i++, entry++) { +target_phys_addr_t ea, pa, size; +int tsize; + +if (!(entry-mas1 MAS1_VALID)) { +continue; +} + +tsize = (entry-mas1 MAS1_TSIZE_MASK) MAS1_TSIZE_SHIFT; +size = 1024ULL tsize; +ea = entry-mas2 ~(size - 1); +pa = entry-mas7_3 ~(size - 1); + +cpu_fprintf(f, 0x%016 PRIx64 0x%016 PRIx64 %4s %-5u %1u S%c%c%c U%c%c%c %c%c%c%c%c U%c%c%c%c\n, +(uint64_t)ea, (uint64_t)pa, +book3e_tsize_to_str[tsize], +(entry-mas1 MAS1_TID_MASK) MAS1_TID_SHIFT, +(entry-mas1 MAS1_TS) MAS1_TS_SHIFT, +entry-mas7_3 MAS3_SR ? 'R' : '-', +entry-mas7_3 MAS3_SW ? 'W' : '-', +entry-mas7_3 MAS3_SX ? 'X' : '-', +entry-mas7_3 MAS3_UR ? 'R' : '-', +entry-mas7_3 MAS3_UW ? 'W' : '-', +entry-mas7_3 MAS3_UX ? 'X' : '-', +entry-mas2 MAS2_W ? 'W' : '-', +entry-mas2 MAS2_I ? 'I' : '-', +entry-mas2 MAS2_M ? 'M' : '-', +entry-mas2 MAS2_G ? 'G' : '-', +entry-mas2 MAS2_E ? 'E' : '-', +entry-mas7_3 MAS3_U0 ? '0' : '-', +entry-mas7_3 MAS3_U1 ? '1' : '-', +entry-mas7_3 MAS3_U2 ? '2' : '-', +entry-mas7_3 MAS3_U3 ? 
'3' : '-'); +} +} + +static void mmubooke206_dump_mmu(FILE *f, fprintf_function cpu_fprintf, + CPUState *env) +{ +int offset = 0; +int i; + +if (kvm_enabled() !env-kvm_sw_tlb) { +cpu_fprintf(f, Cannot access KVM TLB\n); +return; +} + +for (i = 0; i BOOKE206_MAX_TLBN; i++) { +int size = booke206_tlb_size(env, i); + +if (size == 0) { +continue; +} + +mmubooke206_dump_one_tlb(f, cpu_fprintf, env, i, offset, size); +offset += size; +} +} + +void dump_mmu(FILE *f, fprintf_function cpu_fprintf, CPUState *env) +{ +switch (env-mmu_model) { +case POWERPC_MMU_BOOKE206: +mmubooke206_dump_mmu(f, cpu_fprintf,
[Qemu-devel] [PATCH 28/58] device tree: give dt more size
We currently load a device tree blob and then just take its size x2 to account for modifications we do inside. While this is nice and great, it fails when we have a small device tree as blob and lots of nodes added in machine init code.

So for now, just make it 20k bigger than it was before. We maybe want to be more clever about this later.

Signed-off-by: Alexander Graf ag...@suse.de
---
 device_tree.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/device_tree.c b/device_tree.c
index 751538e..dc69232 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -41,6 +41,7 @@ void *load_device_tree(const char *filename_path, int *sizep)
     }
 
     /* Expand to 2x size to give enough room for manipulation.  */
+    dt_size += 10000;
     dt_size *= 2;
     /* First allocate space in qemu for device tree */
     fdt = g_malloc0(dt_size);
-- 
1.6.0.2
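The sizing arithmetic can be checked with a small helper. This is only a sketch: `dt_alloc_size` and `DT_SLACK` are made-up names, and the padding of 10000 bytes is an assumption chosen because, once doubled, it gives the "20k bigger" from the commit message.

```c
#include <stddef.h>

/* Assumed padding added before doubling; doubled, it yields 20000 bytes. */
#define DT_SLACK 10000

/* Size of the buffer allocated for a device tree blob of 'blob_size' bytes. */
static size_t dt_alloc_size(size_t blob_size)
{
    return (blob_size + DT_SLACK) * 2;
}
```

For a 1000-byte blob this allocates 22000 bytes instead of the previous 2000, so a small blob still leaves room for nodes added at machine init.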
Re: [Qemu-devel] [PATCH 05/58] PPC: Add CPU local MMIO regions to MPIC
On 14 September 2011 09:42, Alexander Graf ag...@suse.de wrote:
> The MPIC exports a register set for each CPU connected to it. They can all be accessed through specific registers or using a shadow page that is mapped differently depending on which CPU accesses it.
>
> This patch implements the shadow map, making it possible for guests to access the CPU local registers using the same address on each CPU.
>
> +static int get_current_cpu(void)
> +{
> +    return cpu_single_env->cpu_index;
> +}

This is the standard way of doing this (we use it on ARM as well), but it's pretty clearly a hack. "Which master sent this memory transaction" is an attribute that ought to be passed down to the MMIO read/write functions, really (along with other interesting things like "priv or not?" and probably architecture-specific attributes like ARM's "secure/non-secure"); this matches how hardware does it, where the attributes are passed along as extra signals in the bus fabric. (Sometimes hardware also does this by having buses from the different cores be totally separate paths at the point where this kind of device is connected, before merging together later; we don't really support modelling that either :-))

Not a nak, just an observation while I'm thinking about it.

-- PMM
[Qemu-devel] [PATCH 05/58] PPC: Add CPU local MMIO regions to MPIC
The MPIC exports a register set for each CPU connected to it. They can all be accessed through specific registers or using a shadow page that is mapped differently depending on which CPU accesses it. This patch implements the shadow map, making it possible for guests to access the CPU local registers using the same address on each CPU. Signed-off-by: Alexander Graf ag...@suse.de --- hw/openpic.c | 110 ++ 1 files changed, 72 insertions(+), 38 deletions(-) diff --git a/hw/openpic.c b/hw/openpic.c index 26c96e2..cf89f23 100644 --- a/hw/openpic.c +++ b/hw/openpic.c @@ -2,6 +2,7 @@ * OpenPIC emulation * * Copyright (c) 2004 Jocelyn Mayer + * 2011 Alexander Graf * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the Software), to deal @@ -161,6 +162,16 @@ static inline int test_bit (uint32_t *field, int bit) return (field[bit 5] 1 (bit 0x1F)) != 0; } +static int get_current_cpu(void) +{ + return cpu_single_env-cpu_index; +} + +static uint32_t openpic_cpu_read_internal(void *opaque, target_phys_addr_t addr, + int idx); +static void openpic_cpu_write_internal(void *opaque, target_phys_addr_t addr, + uint32_t val, int idx); + enum { IRQ_EXTERNAL = 0x01, IRQ_INTERNAL = 0x02, @@ -590,18 +601,27 @@ static void openpic_gbl_write (void *opaque, target_phys_addr_t addr, uint32_t v DPRINTF(%s: addr TARGET_FMT_plx = %08x\n, __func__, addr, val); if (addr 0xF) return; -addr = 0xFF; switch (addr) { -case 0x00: /* FREP */ +case 0x40: +case 0x50: +case 0x60: +case 0x70: +case 0x80: +case 0x90: +case 0xA0: +case 0xB0: +openpic_cpu_write_internal(opp, addr, val, get_current_cpu()); +break; +case 0x1000: /* FREP */ break; -case 0x20: /* GLBC */ +case 0x1020: /* GLBC */ if (val 0x8000 opp-reset) opp-reset(opp); opp-glbc = val ~0x8000; break; -case 0x80: /* VENI */ +case 0x1080: /* VENI */ break; -case 0x90: /* PINT */ +case 0x1090: /* PINT */ for (idx = 0; idx opp-nb_cpus; idx++) { if ((val (1 idx)) 
!(opp-pint (1 idx))) { DPRINTF(Raise OpenPIC RESET output for CPU %d\n, idx); @@ -615,22 +635,20 @@ static void openpic_gbl_write (void *opaque, target_phys_addr_t addr, uint32_t v } opp-pint = val; break; -#if MAX_IPI 0 -case 0xA0: /* IPI_IPVP */ -case 0xB0: -case 0xC0: -case 0xD0: +case 0x10A0: /* IPI_IPVP */ +case 0x10B0: +case 0x10C0: +case 0x10D0: { int idx; -idx = (addr - 0xA0) 4; +idx = (addr - 0x10A0) 4; write_IRQreg(opp, opp-irq_ipi0 + idx, IRQ_IPVP, val); } break; -#endif -case 0xE0: /* SPVE */ +case 0x10E0: /* SPVE */ opp-spve = val 0x00FF; break; -case 0xF0: /* TIFR */ +case 0x10F0: /* TIFR */ opp-tifr = val; break; default: @@ -647,36 +665,43 @@ static uint32_t openpic_gbl_read (void *opaque, target_phys_addr_t addr) retval = 0x; if (addr 0xF) return retval; -addr = 0xFF; switch (addr) { -case 0x00: /* FREP */ +case 0x1000: /* FREP */ retval = opp-frep; break; -case 0x20: /* GLBC */ +case 0x1020: /* GLBC */ retval = opp-glbc; break; -case 0x80: /* VENI */ +case 0x1080: /* VENI */ retval = opp-veni; break; -case 0x90: /* PINT */ +case 0x1090: /* PINT */ retval = 0x; break; -#if MAX_IPI 0 -case 0xA0: /* IPI_IPVP */ +case 0x40: +case 0x50: +case 0x60: +case 0x70: +case 0x80: +case 0x90: +case 0xA0: case 0xB0: -case 0xC0: -case 0xD0: +retval = openpic_cpu_read_internal(opp, addr, get_current_cpu()); +break; +case 0x10A0: /* IPI_IPVP */ +case 0x10B0: +case 0x10C0: +case 0x10D0: { int idx; -idx = (addr - 0xA0) 4; +idx = (addr - 0x10A0) 4; retval = read_IRQreg(opp, opp-irq_ipi0 + idx, IRQ_IPVP); } break; -#endif -case 0xE0: /* SPVE */ +case 0x10E0: /* SPVE */ retval = opp-spve; break; -case 0xF0: /* TIFR */ +case 0x10F0: /* TIFR */ retval = opp-tifr; break; default: @@ -794,23 +819,23 @@ static uint32_t openpic_src_read (void *opaque, uint32_t addr) return retval; } -static void openpic_cpu_write (void *opaque, target_phys_addr_t addr, uint32_t val) +static void openpic_cpu_write_internal(void *opaque, target_phys_addr_t addr, +
Re: [Qemu-devel] [PATCH 05/58] PPC: Add CPU local MMIO regions to MPIC
On 14.09.2011 at 12:07, Peter Maydell peter.mayd...@linaro.org wrote:
> On 14 September 2011 09:42, Alexander Graf ag...@suse.de wrote:
>> The MPIC exports a register set for each CPU connected to it. They can all be accessed through specific registers or using a shadow page that is mapped differently depending on which CPU accesses it.
>>
>> This patch implements the shadow map, making it possible for guests to access the CPU local registers using the same address on each CPU.
>>
>> +static int get_current_cpu(void)
>> +{
>> +    return cpu_single_env->cpu_index;
>> +}
>
> This is the standard way of doing this (we use it on ARM as well), but it's pretty clearly a hack. "Which master sent this memory transaction" is an attribute that ought to be passed down to the MMIO read/write functions, really (along with other interesting things like "priv or not?" and probably architecture-specific attributes like ARM's "secure/non-secure"); this matches how hardware does it, where the attributes are passed along as extra signals in the bus fabric. (Sometimes hardware also does this by having buses from the different cores be totally separate paths at the point where this kind of device is connected, before merging together later; we don't really support modelling that either :-))
>
> Not a nak, just an observation while I'm thinking about it.

Yeah, I tend to agree in general. I'm not 100% sure in this case, as it's almost an in-cpu device. But it would be nice to pass this information on to the mmio callbacks. However, right now this is the only way to do it, as we don't have the pretty flexible one implemented yet ;).

Alex
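For illustration, the alternative discussed above (handing the requester's identity to the MMIO callback rather than reading a global such as cpu_single_env) might look like the sketch below. None of these names exist in the memory API as discussed in this thread; `ToyAccessAttrs`, `ToyPic` and `toy_pic_read` are assumptions made purely for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS 4

/* Hypothetical per-access attributes, modelled on extra bus signals. */
typedef struct ToyAccessAttrs {
    unsigned master_id;   /* which CPU issued the access */
    bool privileged;      /* "priv or not?" */
} ToyAccessAttrs;

/* Toy interrupt controller with one banked register per CPU. */
typedef struct ToyPic {
    uint32_t cpu_reg[TOY_NR_CPUS];
} ToyPic;

/*
 * Read from the shadow page: the same address selects a different
 * register depending on the requester carried in 'attrs'.
 */
static uint32_t toy_pic_read(const ToyPic *pic, uint64_t addr,
                             ToyAccessAttrs attrs)
{
    (void)addr;   /* a single banked register is enough for the sketch */
    return pic->cpu_reg[attrs.master_id % TOY_NR_CPUS];
}
```

The device model then needs no global "current CPU" lookup; the requester travels with the access itself.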
[Qemu-devel] [PATCH 48/58] pseries: Implement hcall-bulk hypervisor interface
From: David Gibson da...@gibson.dropbear.id.au This patch adds support for the H_REMOVE_BULK hypercall on the pseries machine. Strictly speaking this isn't necessarym since the kernel will only attempt to use this if hcall-bulk is advertised in the device tree, which previously it was not. Adding this support may give a marginal performance increase, but more importantly it reduces the differences between the emulated machine and an existing PowerVM or kvm system, both of which already implement hcall-bulk. Signed-off-by: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Alexander Graf ag...@suse.de --- hw/spapr.c |2 +- hw/spapr_hcall.c | 125 - 2 files changed, 114 insertions(+), 13 deletions(-) diff --git a/hw/spapr.c b/hw/spapr.c index 91953cf..deb4ae5 100644 --- a/hw/spapr.c +++ b/hw/spapr.c @@ -75,7 +75,7 @@ static void *spapr_create_fdt_skel(const char *cpu_model, uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size); uint32_t pft_size_prop[] = {0, cpu_to_be32(hash_shift)}; char hypertas_prop[] = hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt -\0hcall-tce\0hcall-vio\0hcall-splpar; +\0hcall-tce\0hcall-vio\0hcall-splpar\0hcall-bulk; uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)}; int i; char *modelname; diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c index 0c61c10..84281be 100644 --- a/hw/spapr_hcall.c +++ b/hw/spapr_hcall.c @@ -174,20 +174,26 @@ static target_ulong h_enter(CPUState *env, sPAPREnvironment *spapr, return H_SUCCESS; } -static target_ulong h_remove(CPUState *env, sPAPREnvironment *spapr, - target_ulong opcode, target_ulong *args) +enum { +REMOVE_SUCCESS = 0, +REMOVE_NOT_FOUND = 1, +REMOVE_PARM = 2, +REMOVE_HW = 3, +}; + +static target_ulong remove_hpte(CPUState *env, target_ulong ptex, +target_ulong avpn, +target_ulong flags, +target_ulong *vp, target_ulong *rp) { -target_ulong flags = args[0]; -target_ulong pte_index = args[1]; -target_ulong avpn = args[2]; uint8_t *hpte; target_ulong v, r, rb; -if ((pte_index 
* HASH_PTE_SIZE_64) ~env-htab_mask) { -return H_PARAMETER; +if ((ptex * HASH_PTE_SIZE_64) ~env-htab_mask) { +return REMOVE_PARM; } -hpte = env-external_htab + (pte_index * HASH_PTE_SIZE_64); +hpte = env-external_htab + (ptex * HASH_PTE_SIZE_64); while (!lock_hpte(hpte, HPTE_V_HVLOCK)) { /* We have no real concurrency in qemu soft-emulation, so we * will never actually have a contested lock */ @@ -202,14 +208,106 @@ static target_ulong h_remove(CPUState *env, sPAPREnvironment *spapr, ((flags H_ANDCOND) (v avpn) != 0)) { stq_p(hpte, v ~HPTE_V_HVLOCK); assert(!(ldq_p(hpte) HPTE_V_HVLOCK)); -return H_NOT_FOUND; +return REMOVE_NOT_FOUND; } -args[0] = v ~HPTE_V_HVLOCK; -args[1] = r; +*vp = v ~HPTE_V_HVLOCK; +*rp = r; stq_p(hpte, 0); -rb = compute_tlbie_rb(v, r, pte_index); +rb = compute_tlbie_rb(v, r, ptex); ppc_tlb_invalidate_one(env, rb); assert(!(ldq_p(hpte) HPTE_V_HVLOCK)); +return REMOVE_SUCCESS; +} + +static target_ulong h_remove(CPUState *env, sPAPREnvironment *spapr, + target_ulong opcode, target_ulong *args) +{ +target_ulong flags = args[0]; +target_ulong pte_index = args[1]; +target_ulong avpn = args[2]; +int ret; + +ret = remove_hpte(env, pte_index, avpn, flags, + args[0], args[1]); + +switch (ret) { +case REMOVE_SUCCESS: +return H_SUCCESS; + +case REMOVE_NOT_FOUND: +return H_NOT_FOUND; + +case REMOVE_PARM: +return H_PARAMETER; + +case REMOVE_HW: +return H_HARDWARE; +} + +assert(0); +} + +#define H_BULK_REMOVE_TYPE 0xc000ULL +#define H_BULK_REMOVE_REQUEST0x4000ULL +#define H_BULK_REMOVE_RESPONSE 0x8000ULL +#define H_BULK_REMOVE_END0xc000ULL +#define H_BULK_REMOVE_CODE 0x3000ULL +#define H_BULK_REMOVE_SUCCESS0xULL +#define H_BULK_REMOVE_NOT_FOUND 0x1000ULL +#define H_BULK_REMOVE_PARM 0x2000ULL +#define H_BULK_REMOVE_HW 0x3000ULL +#define H_BULK_REMOVE_RC 0x0c00ULL +#define H_BULK_REMOVE_FLAGS0x0300ULL +#define H_BULK_REMOVE_ABSOLUTE 0xULL +#define H_BULK_REMOVE_ANDCOND0x0100ULL +#define H_BULK_REMOVE_AVPN 0x0200ULL +#define H_BULK_REMOVE_PTEX 0x00ffULL + 
+#define H_BULK_REMOVE_MAX_BATCH4 + +static target_ulong
[Qemu-devel] [PATCH 36/58] pseries: Bugfixes for interrupt numbering in XICS code
From: David Gibson da...@gibson.dropbear.id.au

The implementation of the XICS interrupt controller contains several (difficult to trigger) bugs due to the fact that we were not 100% consistent about which irq numbering we used. In most places, global numbers were used as handled by the presentation layer; however, a few functions took local numberings, that is, the source number within the interrupt source controller, which is offset from the global number. In most cases the function and its caller agreed on this, but in a few cases they didn't.

This patch cleans this up by always using global numbering. Translation to the local number is now always and only done when we look up the individual interrupt source state structure. This should remove the existing bugs and with luck reduce the chances of re-introducing such bugs.

Signed-off-by: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Alexander Graf ag...@suse.de
---
 hw/xics.c |   17 ++++++++---------
 1 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/hw/xics.c b/hw/xics.c
index 9bf82aa..bd8d4cd 100644
--- a/hw/xics.c
+++ b/hw/xics.c
@@ -187,17 +187,17 @@ static int ics_valid_irq(struct ics_state *ics, uint32_t nr)
         && (nr < (ics->offset + ics->nr_irqs));
 }
 
-static void ics_set_irq_msi(void *opaque, int nr, int val)
+static void ics_set_irq_msi(void *opaque, int srcno, int val)
 {
     struct ics_state *ics = (struct ics_state *)opaque;
-    struct ics_irq_state *irq = ics->irqs + nr;
+    struct ics_irq_state *irq = ics->irqs + srcno;
 
     if (val) {
         if (irq->priority == 0xff) {
             irq->masked_pending = 1;
             /* masked pending */ ;
         } else {
-            icp_irq(ics->icp, irq->server, nr + ics->offset, irq->priority);
+            icp_irq(ics->icp, irq->server, srcno + ics->offset, irq->priority);
         }
     }
 }
@@ -229,7 +229,7 @@ static void ics_resend_msi(struct ics_state *ics)
 static void ics_write_xive_msi(struct ics_state *ics, int nr, int server,
                                uint8_t priority)
 {
-    struct ics_irq_state *irq = ics->irqs + nr;
+    struct ics_irq_state *irq = ics->irqs + nr - ics->offset;
 
     irq->server = server;
     irq->priority = priority;
@@ -239,7 +239,7 @@ static void ics_write_xive_msi(struct ics_state *ics, int nr, int server,
     }
 
     irq->masked_pending = 0;
-    icp_irq(ics->icp, server, nr + ics->offset, priority);
+    icp_irq(ics->icp, server, nr, priority);
 }
 
 static void ics_reject(struct ics_state *ics, int nr)
@@ -334,7 +334,7 @@ static void rtas_set_xive(sPAPREnvironment *spapr, uint32_t token,
         return;
     }
 
-    ics_write_xive_msi(ics, nr - ics->offset, server, priority);
+    ics_write_xive_msi(ics, nr, server, priority);
 
     rtas_st(rets, 0, 0); /* Success */
 }
@@ -388,7 +388,7 @@ static void rtas_int_off(sPAPREnvironment *spapr, uint32_t token,
     struct ics_irq_state *irq = xics->irqs + (nr - xics->offset);
 
     irq->saved_priority = irq->priority;
-    ics_write_xive_msi(xics, nr - xics->offset, irq->server, 0xff);
+    ics_write_xive_msi(xics, nr, irq->server, 0xff);
 #endif
 
     rtas_st(rets, 0, 0); /* Success */
@@ -418,8 +418,7 @@ static void rtas_int_on(sPAPREnvironment *spapr, uint32_t token,
 #if 0
     struct ics_irq_state *irq = xics->irqs + (nr - xics->offset);
 
-    ics_write_xive_msi(xics, nr - xics->offset,
-                       irq->server, irq->saved_priority);
+    ics_write_xive_msi(xics, nr, irq->server, irq->saved_priority);
 #endif
 
     rtas_st(rets, 0, 0); /* Success */
-- 
1.6.0.2
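The convention adopted by this patch (global numbers everywhere, with the translation to a local source number done only at the point of looking up the per-source state) can be summarised in a small standalone sketch. `ToyIcs`, `ToyIrqState` and the `toy_ics_*` helpers are hypothetical and only loosely modelled on the structures in hw/xics.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy per-source state; the real structure carries more fields. */
typedef struct ToyIrqState {
    uint8_t priority;
} ToyIrqState;

/* Toy interrupt source controller covering a window of global numbers. */
typedef struct ToyIcs {
    uint32_t offset;     /* global number of the first source */
    uint32_t nr_irqs;    /* number of sources handled here */
    ToyIrqState *irqs;   /* indexed by LOCAL source number */
} ToyIcs;

/* Check whether a GLOBAL irq number belongs to this controller. */
static int toy_ics_valid_irq(const ToyIcs *ics, uint32_t nr)
{
    return nr >= ics->offset && nr < ics->offset + ics->nr_irqs;
}

/*
 * The single place where a GLOBAL number becomes a LOCAL index.
 * Returns NULL when the number is outside this controller's range.
 */
static ToyIrqState *toy_ics_lookup(ToyIcs *ics, uint32_t nr)
{
    if (!toy_ics_valid_irq(ics, nr)) {
        return NULL;
    }
    return &ics->irqs[nr - ics->offset];
}
```

Keeping the subtraction inside one lookup helper means callers cannot accidentally pass an already-translated number, which is the class of bug the commit message describes.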
[Qemu-devel] [PATCH 01/58] spapr: proper qdevification
From: Paolo Bonzini pbonz...@redhat.com Right now the spapr devices cannot be instantiated with -device, because the IRQs need to be passed to the spapr_*_create functions. Do this instead in the bus's init wrapper. This is particularly important with the conversion from scsi-disk to scsi-{cd,hd} that Markus made. After his patches, if you specify a scsi-cd device attached to an if=none drive, the default VSCSI controller will not be created and, without qdevification, you will not be able to add yours. NOTE from agraf: added small compile fix Signed-off-by: Paolo Bonzini pbonz...@redhat.com Cc: Alexander Graf ag...@suse.de Cc: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Alexander Graf ag...@suse.de --- hw/spapr.c | 15 +-- hw/spapr.h |8 hw/spapr_llan.c |7 +-- hw/spapr_vio.c |5 + hw/spapr_vio.h | 13 - hw/spapr_vscsi.c |8 +--- hw/spapr_vty.c |8 +--- 7 files changed, 25 insertions(+), 39 deletions(-) diff --git a/hw/spapr.c b/hw/spapr.c index 1265cee..8cf93fe 100644 --- a/hw/spapr.c +++ b/hw/spapr.c @@ -298,7 +298,6 @@ static void ppc_spapr_init(ram_addr_t ram_size, long kernel_size, initrd_size, fw_size; long pteg_shift = 17; char *filename; -int irq = 16; spapr = g_malloc(sizeof(*spapr)); cpu_ppc_hypercall = emulate_spapr_hypercall; @@ -360,15 +359,14 @@ static void ppc_spapr_init(ram_addr_t ram_size, /* Set up VIO bus */ spapr-vio_bus = spapr_vio_bus_init(); -for (i = 0; i MAX_SERIAL_PORTS; i++, irq++) { +for (i = 0; i MAX_SERIAL_PORTS; i++) { if (serial_hds[i]) { spapr_vty_create(spapr-vio_bus, SPAPR_VTY_BASE_ADDRESS + i, - serial_hds[i], xics_find_qirq(spapr-icp, irq), - irq); + serial_hds[i]); } } -for (i = 0; i nb_nics; i++, irq++) { +for (i = 0; i nb_nics; i++) { NICInfo *nd = nd_table[i]; if (!nd-model) { @@ -376,8 +374,7 @@ static void ppc_spapr_init(ram_addr_t ram_size, } if (strcmp(nd-model, ibmveth) == 0) { -spapr_vlan_create(spapr-vio_bus, 0x1000 + i, nd, - xics_find_qirq(spapr-icp, irq), irq); +spapr_vlan_create(spapr-vio_bus, 0x1000 + i, 
nd); } else { fprintf(stderr, pSeries (sPAPR) platform does not support NIC model '%s' (only ibmveth is supported)\n, @@ -387,9 +384,7 @@ static void ppc_spapr_init(ram_addr_t ram_size, } for (i = 0; i = drive_get_max_bus(IF_SCSI); i++) { -spapr_vscsi_create(spapr-vio_bus, 0x2000 + i, - xics_find_qirq(spapr-icp, irq), irq); -irq++; +spapr_vscsi_create(spapr-vio_bus, 0x2000 + i); } if (kernel_filename) { diff --git a/hw/spapr.h b/hw/spapr.h index 263691b..009c459 100644 --- a/hw/spapr.h +++ b/hw/spapr.h @@ -1,6 +1,8 @@ #if !defined(__HW_SPAPR_H__) #define __HW_SPAPR_H__ +#include hw/xics.h + struct VIOsPAPRBus; struct icp_state; @@ -278,6 +280,12 @@ void spapr_register_hypercall(target_ulong opcode, spapr_hcall_fn fn); target_ulong spapr_hypercall(CPUState *env, target_ulong opcode, target_ulong *args); +static inline qemu_irq spapr_find_qirq(sPAPREnvironment *spapr, +int irq_num) +{ +return xics_find_qirq(spapr-icp, irq_num); +} + static inline uint32_t rtas_ld(target_ulong phys, int n) { return ldl_be_phys(phys + 4*n); diff --git a/hw/spapr_llan.c b/hw/spapr_llan.c index c18efc7..2597748 100644 --- a/hw/spapr_llan.c +++ b/hw/spapr_llan.c @@ -195,11 +195,9 @@ static int spapr_vlan_init(VIOsPAPRDevice *sdev) return 0; } -void spapr_vlan_create(VIOsPAPRBus *bus, uint32_t reg, NICInfo *nd, - qemu_irq qirq, uint32_t vio_irq_num) +void spapr_vlan_create(VIOsPAPRBus *bus, uint32_t reg, NICInfo *nd) { DeviceState *dev; -VIOsPAPRDevice *sdev; dev = qdev_create(bus-bus, spapr-vlan); qdev_prop_set_uint32(dev, reg, reg); @@ -207,9 +205,6 @@ void spapr_vlan_create(VIOsPAPRBus *bus, uint32_t reg, NICInfo *nd, qdev_set_nic_properties(dev, nd); qdev_init_nofail(dev); -sdev = (VIOsPAPRDevice *)dev; -sdev-qirq = qirq; -sdev-vio_irq_num = vio_irq_num; } static int spapr_vlan_devnode(VIOsPAPRDevice *dev, void *fdt, int node_off) diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c index ce6558b..ba2e1c1 100644 --- a/hw/spapr_vio.c +++ b/hw/spapr_vio.c @@ -32,6 +32,7 @@ #include hw/spapr.h 
#include hw/spapr_vio.h +#include hw/xics.h #ifdef CONFIG_FDT #include libfdt.h @@ -595,6 +596,7 @@ static int spapr_vio_busdev_init(DeviceState *qdev, DeviceInfo *qinfo) { VIOsPAPRDeviceInfo *info = (VIOsPAPRDeviceInfo *)qinfo; VIOsPAPRDevice *dev = (VIOsPAPRDevice *)qdev; +VIOsPAPRBus *bus =
[Qemu-devel] [PATCH 13/58] PPC: E500: Generate IRQ lines for many CPUs
Now that we can generate multiple envs for all our virtual CPUs, we also need to tell the MPIC that we have multiple CPUs connected and connect them all to the respective virtual interrupt lines.

Signed-off-by: Alexander Graf ag...@suse.de
---
 hw/ppce500_mpc8544ds.c |   17 ++++++++++++-----
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c
index 8d05587..9cb01f3 100644
--- a/hw/ppce500_mpc8544ds.c
+++ b/hw/ppce500_mpc8544ds.c
@@ -237,7 +237,7 @@ static void mpc8544ds_init(ram_addr_t ram_size,
     target_long initrd_size=0;
     int i=0;
     unsigned int pci_irq_nrs[4] = {1, 2, 3, 4};
-    qemu_irq *irqs, *mpic;
+    qemu_irq **irqs, *mpic;
     DeviceState *dev;
     struct boot_info *boot_info;
     CPUState *firstenv = NULL;
@@ -247,6 +247,8 @@ static void mpc8544ds_init(ram_addr_t ram_size,
         cpu_model = "e500v2_v30";
     }

+    irqs = g_malloc0(smp_cpus * sizeof(qemu_irq *));
+    irqs[0] = g_malloc0(smp_cpus * sizeof(qemu_irq) * OPENPIC_OUTPUT_NB);
     for (i = 0; i < smp_cpus; i++) {
         qemu_irq *input;
         env = cpu_ppc_init(cpu_model);
@@ -259,6 +261,10 @@ static void mpc8544ds_init(ram_addr_t ram_size,
             firstenv = env;
         }

+        irqs[i] = irqs[0] + (i * OPENPIC_OUTPUT_NB);
+        input = (qemu_irq *)env->irq_inputs;
+        irqs[i][OPENPIC_OUTPUT_INT] = input[PPCE500_INPUT_INT];
+        irqs[i][OPENPIC_OUTPUT_CINT] = input[PPCE500_INPUT_CINT];
         env->spr[SPR_BOOKE_PIR] = env->cpu_index = i;

         /* XXX register timer? */
@@ -283,10 +289,11 @@ static void mpc8544ds_init(ram_addr_t ram_size,
                                  "mpc8544ds.ram", ram_size));

     /* MPIC */
-    irqs = g_malloc0(sizeof(qemu_irq) * OPENPIC_OUTPUT_NB);
-    irqs[OPENPIC_OUTPUT_INT] = ((qemu_irq *)env->irq_inputs)[PPCE500_INPUT_INT];
-    irqs[OPENPIC_OUTPUT_CINT] = ((qemu_irq *)env->irq_inputs)[PPCE500_INPUT_CINT];
-    mpic = mpic_init(MPC8544_MPIC_REGS_BASE, 1, irqs, NULL);
+    mpic = mpic_init(MPC8544_MPIC_REGS_BASE, smp_cpus, irqs, NULL);
+
+    if (!mpic) {
+        cpu_abort(env, "MPIC failed to initialize\n");
+    }

     /* Serial */
     if (serial_hds[0]) {
--
1.6.0.2
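The allocation scheme in this patch (one table of row pointers plus a single contiguous block that every row points into) can be sketched in isolation. This is only an illustration in plain C: irq_line, OUTPUTS_PER_CPU and both function names are hypothetical stand-ins for qemu_irq, OPENPIC_OUTPUT_NB and the inline code in mpc8544ds_init(), and calloc() stands in for g_malloc0().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for qemu_irq and OPENPIC_OUTPUT_NB. */
typedef int irq_line;
enum { OUTPUTS_PER_CPU = 3 };

/*
 * Allocate a table of per-CPU row pointers and one zeroed, contiguous
 * block holding every output line, then point each row at its slice.
 * Returns NULL if either allocation fails.
 */
irq_line **alloc_irq_rows(size_t ncpus)
{
    irq_line **rows;
    size_t i;

    assert(ncpus > 0);

    rows = calloc(ncpus, sizeof(*rows));
    if (!rows) {
        return NULL;
    }

    rows[0] = calloc(ncpus * OUTPUTS_PER_CPU, sizeof(**rows));
    if (!rows[0]) {
        free(rows);
        return NULL;
    }

    for (i = 1; i < ncpus; i++) {
        rows[i] = rows[0] + i * OUTPUTS_PER_CPU;
    }
    return rows;
}

/* Only two allocations were made, so only two frees are needed. */
void free_irq_rows(irq_line **rows)
{
    if (rows) {
        free(rows[0]);
        free(rows);
    }
}
```

A consequence of this layout is that rows[i][j] and rows[0][i * OUTPUTS_PER_CPU + j] name the same element, so a consumer can treat the lines either per CPU or as one flat array.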
[Qemu-devel] [PATCH 34/58] PPC: Enable to use PAPR with PR style KVM
When running PR style KVM, we need to tell the kernel that we want to run in PAPR mode now. This means that we need to pass some more register information down and enable papr mode. We also need to align the HTAB to htab_size boundary. Using this patch, -M pseries works with kvm even on non-hv kvm implementations, as long as the preceding kernel patches are in. Signed-off-by: Alexander Graf ag...@suse.de --- v1 - v2: - match on CONFIG_PSERIES v2 - v3: - remove HIOR pieces from PAPR patch (ABI breakage) --- hw/spapr.c | 14 +- target-ppc/kvm.c | 40 target-ppc/kvm_ppc.h |5 + 3 files changed, 58 insertions(+), 1 deletions(-) diff --git a/hw/spapr.c b/hw/spapr.c index 8cf93fe..c5c9a95 100644 --- a/hw/spapr.c +++ b/hw/spapr.c @@ -38,6 +38,9 @@ #include hw/spapr_vio.h #include hw/xics.h +#include kvm.h +#include kvm_ppc.h + #include libfdt.h #define KERNEL_LOAD_ADDR0x @@ -336,12 +339,21 @@ static void ppc_spapr_init(ram_addr_t ram_size, * later we should probably make it scale to the size of guest * RAM */ spapr-htab_size = 1ULL (pteg_shift + 7); -spapr-htab = g_malloc(spapr-htab_size); +spapr-htab = qemu_memalign(spapr-htab_size, spapr-htab_size); for (env = first_cpu; env != NULL; env = env-next_cpu) { env-external_htab = spapr-htab; env-htab_base = -1; env-htab_mask = spapr-htab_size - 1; + +/* Tell KVM that we're in PAPR mode */ +env-spr[SPR_SDR1] = (unsigned long)spapr-htab | + ((pteg_shift + 7) - 18); +env-spr[SPR_HIOR] = 0; + +if (kvm_enabled()) { +kvmppc_set_papr(env); +} } filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, spapr-rtas.bin); diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index 77b98c4..f65b6e1 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -29,6 +29,10 @@ #include cpu.h #include device_tree.h +#include hw/sysbus.h +#include hw/spapr.h +#include hw/spapr_vio.h + //#define DEBUG_KVM #ifdef DEBUG_KVM @@ -455,6 +459,14 @@ int kvm_arch_handle_exit(CPUState *env, struct kvm_run *run) dprintf(handle halt\n); ret = kvmppc_handle_halt(env); break; 
+#ifdef CONFIG_PSERIES +case KVM_EXIT_PAPR_HCALL: +dprintf(handle PAPR hypercall\n); +run-papr_hcall.ret = spapr_hypercall(env, run-papr_hcall.nr, + run-papr_hcall.args); +ret = 1; +break; +#endif default: fprintf(stderr, KVM: unknown exit reason %d\n, run-exit_reason); ret = -1; @@ -606,6 +618,34 @@ int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len) return 0; } +void kvmppc_set_papr(CPUState *env) +{ +struct kvm_enable_cap cap; +int ret; + +memset(cap, 0, sizeof(cap)); +cap.cap = KVM_CAP_PPC_PAPR; +ret = kvm_vcpu_ioctl(env, KVM_ENABLE_CAP, cap); + +if (ret) { +goto fail; +} + +/* + * XXX We set HIOR here. It really should be a qdev property of + * the CPU node, but we don't have CPUs converted to qdev yet. + * + * Once we have qdev CPUs, move HIOR to a qdev property and + * remove this chunk. + */ +/* XXX Set HIOR using new ioctl */ + +return; + +fail: +cpu_abort(env, This KVM version does not support PAPR\n); +} + bool kvm_arch_stop_on_emulation_error(CPUState *env) { return true; diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h index 76f98d9..c484e60 100644 --- a/target-ppc/kvm_ppc.h +++ b/target-ppc/kvm_ppc.h @@ -17,6 +17,7 @@ uint32_t kvmppc_get_tbfreq(void); uint64_t kvmppc_get_clockfreq(void); int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len); int kvmppc_set_interrupt(CPUState *env, int irq, int level); +void kvmppc_set_papr(CPUState *env); #else @@ -40,6 +41,10 @@ static inline int kvmppc_set_interrupt(CPUState *env, int irq, int level) return -1; } +static inline void kvmppc_set_papr(CPUState *env) +{ +} + #endif #ifndef CONFIG_KVM -- 1.6.0.2
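The HTAB sizing and SDR1 arithmetic in this patch can be checked with a small standalone sketch. The formulas (size = 1 << (pteg_shift + 7), size field = (pteg_shift + 7) - 18) are taken from the diff; the function names are hypothetical, and the reading that the base must be aligned because the low SDR1 bits are reused for the size field is an assumption based on the commit message, not something the patch states.

```c
#include <assert.h>
#include <stdint.h>

/* Hash table size in bytes, as computed in ppc_spapr_init(). */
uint64_t htab_size_bytes(unsigned pteg_shift)
{
    return 1ULL << (pteg_shift + 7);
}

/* Round addr up to the next multiple of a power-of-two alignment. */
uint64_t align_up(uint64_t addr, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}

/*
 * Combine an aligned table base with the size field, mirroring
 * "htab | ((pteg_shift + 7) - 18)" from the patch.
 */
uint64_t sdr1_value(uint64_t htab_base, unsigned pteg_shift)
{
    uint64_t size = htab_size_bytes(pteg_shift);

    assert(pteg_shift + 7 >= 18);
    /* Assumed requirement: the low bits of the base must be free. */
    assert((htab_base & (size - 1)) == 0);
    return htab_base | (uint64_t)((pteg_shift + 7) - 18);
}
```

With the patch's pteg_shift of 17 this gives a 16 MB table and a size field of 6.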
Re: [Qemu-devel] [PATCH][RFC][0/2] REF+/REF- optimization
On 14.09.2011 11:52, Frediano Ziglio wrote: 2011/9/14 Kevin Wolf kw...@redhat.com: On 13.09.2011 15:36, Frediano Ziglio wrote: 2011/9/13 Kevin Wolf kw...@redhat.com: On 13.09.2011 09:53, Frediano Ziglio wrote:

These patches try to trade off between leaks and speed for cluster refcounts. Refcount increments (REF+ or refp) are handled in a different way from decrements (REF- or refm). The reason is that postponing or not flushing a REF- just causes a leak, while postponing a REF+ causes corruption. To optimize REF- I just used an array to store offsets; when a flush is requested or the array reaches a limit (currently 1022), the array is sorted and written to disk. I use an array of offsets instead of ranges to support compression (an offset could appear multiple times in the array). I consider this patch quite ready.

Ok, first of all let's clarify what this optimises. I don't think it changes anything at all for the writeback cache modes, because these already do most operations in memory only. So this must be about optimising some operations with cache=writethrough. REF- isn't about normal cluster allocation, it is about COW with internal snapshots or bdrv_discard. Do you have benchmarks for any of them? I strongly disagree with your approach for REF-. We already have a cache, and introducing a second one sounds like a bad idea. I think we could get a very similar effect if we introduced a qcow2_cache_entry_mark_dirty_wb() that marks a given refcount block as dirty, but at the same time tells the cache that even in write-through mode it can still treat this block as write-back. This should require far fewer code changes.

Yes, it mainly optimizes for writethrough. I did not test with writeback, but it should improve even that (I think you have some flushes there to keep consistency). I'll try to write a qcow2_cache_entry_mark_dirty_wb patch and test it.

Great, thanks!

However, don't expect the patch too soon, I'm quite busy these days.

Ok, no problem.
It's not really urgent either. But let's measure the effects first, I suspect that for cluster allocation it doesn't help much because every REF- comes with a REF+. That's 50% of effort if REF- clusters are far from REF+ :) I would expect that the next REF+ allocates exactly the REF- cluster. But you still have a point, we save the write on REF- and combine it with the REF+ write. This is still a TODO for REF+ patch. Actually, I was talking about the qcow2_cache_entry_mark_dirty_wb() case without any other change. You get it automatically then. Oh... time ago looking at refcount code I realize that a single deallocation could be reused in some cases only after Qemu restart. For instance - got a single cluster REF- which take refcount to 0 - free_cluster_index get decreased to this index - we get a new cluster request for 2 clusters - free_cluster_index get increased we skip freed deallocation and if we don't get a new deallocation for a cluster with index minor to our freed cluster this cluster is not reused. (I didn't test this behavior, no leak, no corruption, just image could be larger then expected) Yes, I'm aware of that. I'm not sure if it matters in practice. To optimize REF+ I mark a range as allocated and use this range to get new ones (avoiding writing refcount to disk). When a flush is requested or in some situations (like snapshot) this cache is disabled and flushed (written as REF-). I do not consider this patch ready, it works and pass all io-tests but for instance I would avoid allocating new clusters for refcount during preallocation. The only question here is if improving cache=writethrough cluster allocation performance is worth the additional complexity in the already complex refcounting code. I didn't see this optimization as a second level cache, but yes, for REF- is a second cache. 
The alternative that was discussed before is the dirty bit approach that is used in QED and would allow us to use writeback for all refcount blocks, regardless of REF- or REF+. It would be an easier approach requiring less code changes, but it comes with the cost of requiring an fsck after a qemu crash. I was thinking about changing the header magic first time we change refcount in order to mark image as dirty so newer Qemu recognize the flag while former one does not recognize image. Obviously reverting magic on image close. We've discussed this idea before and I think it wasn't considered a great idea to automagically change the header in an incompatible way. But we can always say that for improved performance you need to upgrade your image to qcow2 v3. I don't understand why there is not a wiki page for detailed qcow3 changes. I saw your post on May. I follow this ML since August so I think I missed a lot of discussion on qcow improves. Unfortunately there have been almost no comments, so you can consider RFC v2 as the current proposal. End speed up is
[Qemu-devel] [PATCH 27/58] device tree: dont fail operations
When we screw up and issue an FDT command that doesn't work, we really need to know immediately and usually can't continue to create the machine. To make sure we don't need to add error checking in all device tree modification code users, we can just add the fail checks to the qemu abstract functions. Signed-off-by: Alexander Graf ag...@suse.de --- device_tree.c | 76 ++-- 1 files changed, 51 insertions(+), 25 deletions(-) diff --git a/device_tree.c b/device_tree.c index f4a78c8..751538e 100644 --- a/device_tree.c +++ b/device_tree.c @@ -72,56 +72,81 @@ fail: return NULL; } -int qemu_devtree_setprop(void *fdt, const char *node_path, - const char *property, void *val_array, int size) +static int findnode_nofail(void *fdt, const char *node_path) { int offset; offset = fdt_path_offset(fdt, node_path); -if (offset 0) -return offset; +if (offset 0) { +fprintf(stderr, %s Couldn't find node %s: %s\n, __func__, node_path, +fdt_strerror(offset)); +exit(1); +} + +return offset; +} + +int qemu_devtree_setprop(void *fdt, const char *node_path, + const char *property, void *val_array, int size) +{ +int r; + +r = fdt_setprop(fdt, findnode_nofail(fdt, node_path), property, val_array, size); +if (r 0) { +fprintf(stderr, %s: Couldn't set %s/%s: %s\n, __func__, node_path, +property, fdt_strerror(r)); +exit(1); +} -return fdt_setprop(fdt, offset, property, val_array, size); +return r; } int qemu_devtree_setprop_cell(void *fdt, const char *node_path, const char *property, uint32_t val) { -int offset; +int r; -offset = fdt_path_offset(fdt, node_path); -if (offset 0) -return offset; +r = fdt_setprop_cell(fdt, findnode_nofail(fdt, node_path), property, val); +if (r 0) { +fprintf(stderr, %s: Couldn't set %s/%s = %#08x: %s\n, __func__, +node_path, property, val, fdt_strerror(r)); +exit(1); +} -return fdt_setprop_cell(fdt, offset, property, val); +return r; } int qemu_devtree_setprop_string(void *fdt, const char *node_path, const char *property, const char *string) { -int offset; +int r; 
-offset = fdt_path_offset(fdt, node_path); -if (offset 0) -return offset; +r = fdt_setprop_string(fdt, findnode_nofail(fdt, node_path), property, string); +if (r 0) { +fprintf(stderr, %s: Couldn't set %s/%s = %s: %s\n, __func__, +node_path, property, string, fdt_strerror(r)); +exit(1); +} -return fdt_setprop_string(fdt, offset, property, string); +return r; } int qemu_devtree_nop_node(void *fdt, const char *node_path) { -int offset; +int r; -offset = fdt_path_offset(fdt, node_path); -if (offset 0) -return offset; +r = fdt_nop_node(fdt, findnode_nofail(fdt, node_path)); +if (r 0) { +fprintf(stderr, %s: Couldn't nop node %s: %s\n, __func__, node_path, +fdt_strerror(r)); +exit(1); +} -return fdt_nop_node(fdt, offset); +return r; } int qemu_devtree_add_subnode(void *fdt, const char *name) { -int offset; char *dupname = g_strdup(name); char *basename = strrchr(dupname, '/'); int retval; @@ -133,12 +158,13 @@ int qemu_devtree_add_subnode(void *fdt, const char *name) basename[0] = '\0'; basename++; -offset = fdt_path_offset(fdt, dupname); -if (offset 0) { -return offset; +retval = fdt_add_subnode(fdt, findnode_nofail(fdt, dupname), basename); +if (retval 0) { +fprintf(stderr, FDT: Failed to create subnode %s: %s\n, name, +fdt_strerror(retval)); +exit(1); } -retval = fdt_add_subnode(fdt, offset, basename); g_free(dupname); return retval; } -- 1.6.0.2
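The "nofail" shape this patch introduces (look the node up, report and exit on failure, otherwise hand back a value the caller can use unchecked) can be sketched without libfdt. Everything below is a hypothetical stand-in: find_node() plays the role of fdt_path_offset() over a plain array of paths, and find_node_nofail() mirrors findnode_nofail() from the diff.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical lookup standing in for fdt_path_offset(): returns the
 * index of path in paths[], or -1 if it is not present.
 */
int find_node(const char *const *paths, int n, const char *path)
{
    int i;

    assert(path != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(paths[i], path) == 0) {
            return i;
        }
    }
    return -1;
}

/*
 * Wrapper in the style of findnode_nofail(): callers never see an
 * error, because a failed lookup terminates the program with a message.
 */
int find_node_nofail(const char *const *paths, int n, const char *path)
{
    int offset = find_node(paths, n, path);

    if (offset < 0) {
        fprintf(stderr, "%s: couldn't find node %s\n", __func__, path);
        exit(1);
    }
    return offset;
}
```

The trade-off is the one the commit message describes: call sites get simpler because they no longer need error checks, at the cost of any lookup failure being fatal.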
[Qemu-devel] [PATCH 44/58] kvm: ppc: booke206: use MMU API
From: Scott Wood scottw...@freescale.com Share the TLB array with KVM. This allows us to set the initial TLB both on initial boot and reset, is useful for debugging, and could eventually be used to support migration. Signed-off-by: Scott Wood scottw...@freescale.com Signed-off-by: Alexander Graf ag...@suse.de --- hw/ppce500_mpc8544ds.c |2 + target-ppc/cpu.h |2 + target-ppc/kvm.c | 85 3 files changed, 89 insertions(+), 0 deletions(-) diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c index b86a008..61151d8 100644 --- a/hw/ppce500_mpc8544ds.c +++ b/hw/ppce500_mpc8544ds.c @@ -189,6 +189,8 @@ static void mmubooke_create_initial_mapping(CPUState *env, tlb-mas2 = va TARGET_PAGE_MASK; tlb-mas7_3 = pa TARGET_PAGE_MASK; tlb-mas7_3 |= MAS3_UR | MAS3_UW | MAS3_UX | MAS3_SR | MAS3_SW | MAS3_SX; + +env-tlb_dirty = true; } static void mpc8544ds_cpu_reset_sec(void *opaque) diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h index b8d42e0..3e7f797 100644 --- a/target-ppc/cpu.h +++ b/target-ppc/cpu.h @@ -934,6 +934,8 @@ struct CPUPPCState { ppc_tlb_t tlb; /* TLB is optional. 
Allocate them only if needed*/ /* 403 dedicated access protection registers */ target_ulong pb[4]; +bool tlb_dirty; /* Set to non-zero when modifying TLB */ +bool kvm_sw_tlb; /* non-zero if KVM SW TLB API is active*/ #endif /* Other registers */ diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index f65b6e1..35a6f10 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -112,6 +112,52 @@ static int kvm_arch_sync_sregs(CPUState *cenv) return kvm_vcpu_ioctl(cenv, KVM_SET_SREGS, sregs); } +/* Set up a shared TLB array with KVM */ +static int kvm_booke206_tlb_init(CPUState *env) +{ +struct kvm_book3e_206_tlb_params params = {}; +struct kvm_config_tlb cfg = {}; +struct kvm_enable_cap encap = {}; +unsigned int entries = 0; +int ret, i; + +if (!kvm_enabled() || +!kvm_check_extension(env-kvm_state, KVM_CAP_SW_TLB)) { +return 0; +} + +assert(ARRAY_SIZE(params.tlb_sizes) == BOOKE206_MAX_TLBN); + +for (i = 0; i BOOKE206_MAX_TLBN; i++) { +params.tlb_sizes[i] = booke206_tlb_size(env, i); +params.tlb_ways[i] = booke206_tlb_ways(env, i); +entries += params.tlb_sizes[i]; +} + +assert(entries == env-nb_tlb); +assert(sizeof(struct kvm_book3e_206_tlb_entry) == sizeof(ppcmas_tlb_t)); + +env-tlb_dirty = true; + +cfg.array = (uintptr_t)env-tlb.tlbm; +cfg.array_len = sizeof(ppcmas_tlb_t) * entries; +cfg.params = (uintptr_t)params; +cfg.mmu_type = KVM_MMU_FSL_BOOKE_NOHV; + +encap.cap = KVM_CAP_SW_TLB; +encap.args[0] = (uintptr_t)cfg; + +ret = kvm_vcpu_ioctl(env, KVM_ENABLE_CAP, encap); +if (ret 0) { +fprintf(stderr, %s: couldn't enable KVM_CAP_SW_TLB: %s\n, +__func__, strerror(-ret)); +return ret; +} + +env-kvm_sw_tlb = true; +return 0; +} + int kvm_arch_init_vcpu(CPUState *cenv) { int ret; @@ -123,6 +169,15 @@ int kvm_arch_init_vcpu(CPUState *cenv) idle_timer = qemu_new_timer_ns(vm_clock, kvm_kick_env, cenv); +/* Some targets support access to KVM's guest TLB. 
*/ +switch (cenv-mmu_model) { +case POWERPC_MMU_BOOKE206: +ret = kvm_booke206_tlb_init(cenv); +break; +default: +break; +} + return ret; } @@ -130,6 +185,31 @@ void kvm_arch_reset_vcpu(CPUState *env) { } +static void kvm_sw_tlb_put(CPUState *env) +{ +struct kvm_dirty_tlb dirty_tlb; +unsigned char *bitmap; +int ret; + +if (!env-kvm_sw_tlb) { +return; +} + +bitmap = g_malloc((env-nb_tlb + 7) / 8); +memset(bitmap, 0xFF, (env-nb_tlb + 7) / 8); + +dirty_tlb.bitmap = (uintptr_t)bitmap; +dirty_tlb.num_dirty = env-nb_tlb; + +ret = kvm_vcpu_ioctl(env, KVM_DIRTY_TLB, dirty_tlb); +if (ret) { +fprintf(stderr, %s: KVM_DIRTY_TLB: %s\n, +__func__, strerror(-ret)); +} + +g_free(bitmap); +} + int kvm_arch_put_registers(CPUState *env, int level) { struct kvm_regs regs; @@ -167,6 +247,11 @@ int kvm_arch_put_registers(CPUState *env, int level) if (ret 0) return ret; +if (env-tlb_dirty) { +kvm_sw_tlb_put(env); +env-tlb_dirty = false; +} + return ret; } -- 1.6.0.2
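kvm_sw_tlb_put() in this patch marks every TLB entry dirty by filling a bitmap of (nb_tlb + 7) / 8 bytes with 0xFF. The sizing and fill can be sketched on their own; the helper names are hypothetical, malloc() stands in for g_malloc(), and the bit order used by bitmap_test() is an assumption for illustration only (with every bit set it makes no difference).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Bytes needed to hold nbits bits, rounding up to a whole byte. */
size_t bitmap_bytes(size_t nbits)
{
    return (nbits + 7) / 8;
}

/* Allocate a bitmap with every bit set; returns NULL on failure. */
unsigned char *alloc_all_dirty_bitmap(size_t nbits)
{
    size_t len = bitmap_bytes(nbits);
    unsigned char *bitmap;

    assert(nbits > 0);
    bitmap = malloc(len);
    if (bitmap) {
        memset(bitmap, 0xFF, len);
    }
    return bitmap;
}

/* Test one bit, assuming bit 0 is the least significant bit of byte 0. */
int bitmap_test(const unsigned char *bitmap, size_t bit)
{
    return (bitmap[bit / 8] >> (bit % 8)) & 1;
}
```

The rounding matters when the entry count is not a multiple of eight: nine entries need two bytes, not one.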
Re: [Qemu-devel] [PATCH 00/12] nbd improvements
On 08.09.2011 17:24, Paolo Bonzini wrote:

I find nbd quite useful to test migration, but it is limited: it can only do synchronous operation, it is not safe because it does not support flush, and it has no discard either. qemu-nbd is also limited to 1MB requests, and the nbd block driver does not take this into account. Luckily, flush/FUA support is being worked out by upstream, and discard can also be added with the same framework (patches 1 to 6). Asynchronous support is also very similar to what sheepdog is already doing (patches 7 to 12).

Paolo Bonzini (12):
  nbd: support feature negotiation
  nbd: sync API definitions with upstream
  nbd: support NBD_SET_FLAGS ioctl
  nbd: add support for NBD_CMD_FLUSH
  nbd: add support for NBD_CMD_FLAG_FUA
  nbd: support NBD_CMD_TRIM in the server
  sheepdog: add coroutine_fn markers
  add socket_set_block
  sheepdog: move coroutine send/recv function to generic code
  block: add bdrv_co_flush support
  nbd: switch to asynchronous operation
  nbd: split requests

Okay, completed the review for this series now. I think if you consider the comments posted so far for v2 we should be good.

Kevin
Re: [Qemu-devel] [PATCH 0/3] Memory API mutators
On 14 September 2011 11:02, Avi Kivity a...@redhat.com wrote: On 09/14/2011 12:56 PM, Peter Maydell wrote: On 14 September 2011 10:23, Avi Kivity a...@redhat.com wrote:

This patchset introduces memory_region_set_enabled() and memory_region_set_address() to avoid the requirement on memory routers to track the internal state of the memory API (so they know whether they need to add or remove a region). Instead, they can simply copy the state of the region from the guest-exposed register to the memory core, via the new mutator functions. Please review. Do we need a memory_region_set_size() as well?

Would set_size() allow things like omap_gpmc() to avoid the need to create an intermediate container subregion to enforce size clipping on the child region it's trying to map?

I'd recommend not calling _set_size() on somebody else's region - this quickly leads to confusion. Only call set_size() if you also called _init() and will call _destroy(). Can you point me at the code in question?

hw/omap_gpmc.c:omap_gpmc_cs_map(). For each of the 8 children you can connect to it, the GPMC has a base and mask register. The hardware logic is effectively

    if ((address & mask) == base) { send transaction to this child }

(complicated only slightly by the register for base only having bits [29:24] with the others implied-zero, and the register for mask only having bits [27:24].) The effect is that you can use the mask value to set the size of the area the child is mapped in. (Silly mask settings with holes are discouraged by the TRM, and the current code doesn't handle them.) The repeated-in-the-space effect happens if the child is smaller than the space it's in: the child hardware just ignores the higher bits of the address, so it appears multiple times.

(Strictly speaking what omap_gpmc() wants is not merely clipping to a guest-specified size but also wrapping, so you can take a 16MB child region and map the bottom 4MB of it repeating into a 32MB chunk of address space, say. But that would require a lot of playing games with aliases to implement a bizarre corner case that nobody uses in practice.)

That's best done in the memory core, the rendering loop can be adjusted to do this replication.

That would be nice, although as I say nobody is actually relying on it so probably not worth the effort unless there's another user for it.

-- PMM
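The base/mask decode and the "child repeats inside its window" effect described in this thread can be written down as a small sketch. The function names and the example addresses used alongside it are hypothetical; only the comparison (address & mask) == base comes from the mail, and the register packing (base in bits [29:24], mask in bits [27:24]) is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Chip-select decode as described in the thread: the child is selected
 * when the address bits picked out by mask equal base.
 */
bool cs_matches(uint32_t address, uint32_t base, uint32_t mask)
{
    return (address & mask) == base;
}

/*
 * A child smaller than its window ignores the higher address bits, so
 * its contents repeat. child_size must be a power of two.
 */
uint32_t child_offset(uint32_t address, uint32_t child_size)
{
    assert(child_size != 0 && (child_size & (child_size - 1)) == 0);
    return address & (child_size - 1);
}
```

For example, with base 0x08000000 and mask 0xFE000000 the window is 32MB wide; a 16MB child placed in it answers at the same offset for 0x08000010 and 0x09000010, which is the repetition mentioned above.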
[Qemu-devel] [PATCH 55/58] ppc: move ADB stuff from ppc_mac.h to adb.h
From: Laurent Vivier laur...@vivier.eu Allow to use ADB in non-ppc macintosh Signed-off-by: Laurent Vivier laur...@vivier.eu Signed-off-by: Alexander Graf ag...@suse.de --- hw/adb.c |2 +- hw/adb.h | 67 + hw/cuda.c |1 + hw/ppc_mac.h | 42 - hw/ppc_newworld.c |1 + hw/ppc_oldworld.c |1 + 6 files changed, 71 insertions(+), 43 deletions(-) create mode 100644 hw/adb.h diff --git a/hw/adb.c b/hw/adb.c index 8dedbf8..aa15f55 100644 --- a/hw/adb.c +++ b/hw/adb.c @@ -22,7 +22,7 @@ * THE SOFTWARE. */ #include hw.h -#include ppc_mac.h +#include adb.h #include console.h /* debug ADB */ diff --git a/hw/adb.h b/hw/adb.h new file mode 100644 index 000..b2a591c --- /dev/null +++ b/hw/adb.h @@ -0,0 +1,67 @@ +/* + * QEMU ADB emulation shared definitions and prototypes + * + * Copyright (c) 2004-2007 Fabrice Bellard + * Copyright (c) 2007 Jocelyn Mayer + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. 
+ */ + +#if !defined(__ADB_H__) +#define __ADB_H__ + +#define MAX_ADB_DEVICES 16 + +#define ADB_MAX_OUT_LEN 16 + +typedef struct ADBDevice ADBDevice; + +/* buf = NULL means polling */ +typedef int ADBDeviceRequest(ADBDevice *d, uint8_t *buf_out, + const uint8_t *buf, int len); +typedef int ADBDeviceReset(ADBDevice *d); + +struct ADBDevice { +struct ADBBusState *bus; +int devaddr; +int handler; +ADBDeviceRequest *devreq; +ADBDeviceReset *devreset; +void *opaque; +}; + +typedef struct ADBBusState { +ADBDevice devices[MAX_ADB_DEVICES]; +int nb_devices; +int poll_index; +} ADBBusState; + +int adb_request(ADBBusState *s, uint8_t *buf_out, +const uint8_t *buf, int len); +int adb_poll(ADBBusState *s, uint8_t *buf_out); + +ADBDevice *adb_register_device(ADBBusState *s, int devaddr, + ADBDeviceRequest *devreq, + ADBDeviceReset *devreset, + void *opaque); +void adb_kbd_init(ADBBusState *bus); +void adb_mouse_init(ADBBusState *bus); + +extern ADBBusState adb_bus; +#endif /* !defined(__ADB_H__) */ diff --git a/hw/cuda.c b/hw/cuda.c index 5c92d81..6f05975 100644 --- a/hw/cuda.c +++ b/hw/cuda.c @@ -24,6 +24,7 @@ */ #include hw.h #include ppc_mac.h +#include adb.h #include qemu-timer.h #include sysemu.h diff --git a/hw/ppc_mac.h b/hw/ppc_mac.h index 7351bb6..af75e45 100644 --- a/hw/ppc_mac.h +++ b/hw/ppc_mac.h @@ -77,46 +77,4 @@ void macio_nvram_setup_bar(MacIONVRAMState *s, MemoryRegion *bar, void pmac_format_nvram_partition (MacIONVRAMState *nvr, int len); uint32_t macio_nvram_read (void *opaque, uint32_t addr); void macio_nvram_write (void *opaque, uint32_t addr, uint32_t val); - -/* adb.c */ - -#define MAX_ADB_DEVICES 16 - -#define ADB_MAX_OUT_LEN 16 - -typedef struct ADBDevice ADBDevice; - -/* buf = NULL means polling */ -typedef int ADBDeviceRequest(ADBDevice *d, uint8_t *buf_out, - const uint8_t *buf, int len); -typedef int ADBDeviceReset(ADBDevice *d); - -struct ADBDevice { -struct ADBBusState *bus; -int devaddr; -int handler; -ADBDeviceRequest *devreq; -ADBDeviceReset 
*devreset; -void *opaque; -}; - -typedef struct ADBBusState { -ADBDevice devices[MAX_ADB_DEVICES]; -int nb_devices; -int poll_index; -} ADBBusState; - -int adb_request(ADBBusState *s, uint8_t *buf_out, -const uint8_t *buf, int len); -int adb_poll(ADBBusState *s, uint8_t *buf_out); - -ADBDevice *adb_register_device(ADBBusState *s, int devaddr, - ADBDeviceRequest *devreq, - ADBDeviceReset *devreset, - void *opaque); -void adb_kbd_init(ADBBusState *bus); -void adb_mouse_init(ADBBusState *bus); - -extern ADBBusState adb_bus; - #endif /*
Re: [Qemu-devel] [PATCH 05/58] PPC: Add CPU local MMIO regions to MPIC
On 2011-09-14 12:07, Peter Maydell wrote: On 14 September 2011 09:42, Alexander Graf ag...@suse.de wrote:

The MPIC exports a register set for each CPU connected to it. They can all be accessed through specific registers or using a shadow page that is mapped differently depending on which CPU accesses it. This patch implements the shadow map, making it possible for guests to access the CPU local registers using the same address on each CPU.

+static int get_current_cpu(void)
+{
+    return cpu_single_env->cpu_index;
+}

This is the standard way of doing this (we use it on ARM as well), but it's pretty clearly a hack. "Which master sent this memory transaction" is an attribute that ought to be passed down to the MMIO read/write functions, really (along with other interesting things like "priv or not?" and probably architecture specific attributes like ARM's secure/non-secure); this matches how hardware does it where the attributes are passed along as extra signals in the bus fabric. (Sometimes hardware also does this by having buses from the different cores be totally separate paths at the point where this kind of device is connected, before merging together later; we don't really support modelling that either :-)) Not a nak, just an observation while I'm thinking about it.

Same problem has to be solved on x86. The way the local APIC is hooked up right now is totally broken, just works by chance because normal guests don't seriously stress the architecture. If we start dispatching CPU memory accesses via per-CPU memory roots, the problem can be solved without passing additional source information to the callbacks.

Jan

-- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
[Qemu-devel] [PATCH 56/58] PPC: Fix via-cuda memory registration
Commit 23c5e4ca (convert to memory API) broke the VIA Cuda emulation layer by not registering the IO structs. This patch registers them properly and thus makes -M g3beige and -M mac99 work again.

Tested-by: Andreas Färber andreas.faer...@web.de
Signed-off-by: Alexander Graf ag...@suse.de
---
 hw/cuda.c |   28 ++++++++++++++++------------
 1 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/hw/cuda.c b/hw/cuda.c
index 6f05975..4077436 100644
--- a/hw/cuda.c
+++ b/hw/cuda.c
@@ -634,16 +634,20 @@ static uint32_t cuda_readl (void *opaque, target_phys_addr_t addr)
     return 0;
 }

-static CPUWriteMemoryFunc * const cuda_write[] = {
-    cuda_writeb,
-    cuda_writew,
-    cuda_writel,
-};
-
-static CPUReadMemoryFunc * const cuda_read[] = {
-    cuda_readb,
-    cuda_readw,
-    cuda_readl,
+static MemoryRegionOps cuda_ops = {
+    .old_mmio = {
+        .write = {
+            cuda_writeb,
+            cuda_writew,
+            cuda_writel,
+        },
+        .read = {
+            cuda_readb,
+            cuda_readw,
+            cuda_readl,
+        },
+    },
+    .endianness = DEVICE_NATIVE_ENDIAN,
 };

 static bool cuda_timer_exist(void *opaque, int version_id)
@@ -740,8 +744,8 @@ void cuda_init (MemoryRegion **cuda_mem, qemu_irq irq)
     s->tick_offset = (uint32_t)mktimegm(&tm) + RTC_OFFSET;
     s->adb_poll_timer = qemu_new_timer_ns(vm_clock, cuda_adb_poll, s);
-    cpu_register_io_memory(cuda_read, cuda_write, s,
-                           DEVICE_NATIVE_ENDIAN);
+    memory_region_init_io(&s->mem, &cuda_ops, s, "cuda", 0x2000);
+    *cuda_mem = &s->mem;
     vmstate_register(NULL, -1, &vmstate_cuda, s);
     qemu_register_reset(cuda_reset, s);
--
1.6.0.2
Re: [Qemu-devel] [PATCH 0/3] Memory API mutators
On 2011-09-14 11:49, Avi Kivity wrote:

Jan, too, was interested in this.

On 09/14/2011 12:23 PM, Avi Kivity wrote:

This patchset introduces memory_region_set_enabled() and memory_region_set_address() to avoid the requirement on memory routers to track the internal state of the memory API (so they know whether they need to add or remove a region). Instead, they can simply copy the state of the region from the guest-exposed register to the memory core, via the new mutator functions. Please review. Do we need a memory_region_set_size() as well? Do we want

    memory_region_set_attributes(mr, MR_ATTR_ENABLED | MR_ATTR_SIZE,
                                 (MemoryRegionAttributes) {
                                     .enabled = s->enabled,
                                     .address = s->addr,
                                 });

?

Avi Kivity (3):
  memory: introduce memory_region_set_enabled()
  memory: introduce memory_region_set_address()
  memory: optimize empty transactions due to mutators

memory.c | 64 - memory.h | 28 +++ 2 files changed, 82 insertions(+), 10 deletions(-)

Whatever the outcome is (tons of memory_region_set/get_X functions or huge attribute structures + set/get_attributes), it should be consistent for all attributes of a memory region. And there should be only one way of doing this. I think the decision between multiple set/get functions and an attribute struct depends on some (estimated) usage stats: how many call sites will access multiple attributes in one run, and how many will only manipulate a single one?

Jan

-- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
[Qemu-devel] [PATCH 39/58] pseries: More complete WIMG validation in H_ENTER code
From: David Gibson da...@gibson.dropbear.id.au Currently our implementation of the H_ENTER hypercall, which inserts a mapping in the hash page table assumes that only ordinary memory is ever mapped, and only permits mapping attribute bits accordingly (WIMG==0010). However, we intend to start adding emulated IO to the pseries platform (and real IO with PCI passthrough on kvm) which means this simple test will no longer suffice. This patch extends the h_enter validation code to check if the given address is a RAM address. If it is it enforces WIMG==0010, otherwise it assumes that it is an IO mapping and instead enforces WIMG=010x. Signed-off-by: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Alexander Graf ag...@suse.de --- hw/spapr.c |3 ++- hw/spapr.h |1 + hw/spapr_hcall.c | 22 ++ 3 files changed, 21 insertions(+), 5 deletions(-) diff --git a/hw/spapr.c b/hw/spapr.c index 9eefef9..00aed62 100644 --- a/hw/spapr.c +++ b/hw/spapr.c @@ -336,7 +336,8 @@ static void ppc_spapr_init(ram_addr_t ram_size, } /* allocate RAM */ -ram_offset = qemu_ram_alloc(NULL, ppc_spapr.ram, ram_size); +spapr-ram_limit = ram_size; +ram_offset = qemu_ram_alloc(NULL, ppc_spapr.ram, spapr-ram_limit); cpu_register_physical_memory(0, ram_size, ram_offset); /* allocate hash page table. 
For now we always make this 16mb, diff --git a/hw/spapr.h b/hw/spapr.h index 009c459..3d21b7a 100644 --- a/hw/spapr.h +++ b/hw/spapr.h @@ -10,6 +10,7 @@ typedef struct sPAPREnvironment { struct VIOsPAPRBus *vio_bus; struct icp_state *icp; +target_phys_addr_t ram_limit; void *htab; long htab_size; target_phys_addr_t fdt_addr, rtas_addr; diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c index f7ead04..70f853c 100644 --- a/hw/spapr_hcall.c +++ b/hw/spapr_hcall.c @@ -99,6 +99,8 @@ static target_ulong h_enter(CPUState *env, sPAPREnvironment *spapr, target_ulong pte_index = args[1]; target_ulong pteh = args[2]; target_ulong ptel = args[3]; +target_ulong page_shift = 12; +target_ulong raddr; target_ulong i; uint8_t *hpte; @@ -111,6 +113,7 @@ static target_ulong h_enter(CPUState *env, sPAPREnvironment *spapr, #endif if ((ptel 0xff000) == 0) { /* 16M page */ +page_shift = 24; /* lowest AVA bit must be 0 for 16M pages */ if (pteh 0x80) { return H_PARAMETER; @@ -120,12 +123,23 @@ static target_ulong h_enter(CPUState *env, sPAPREnvironment *spapr, } } -/* FIXME: bounds check the pa? */ +raddr = (ptel HPTE_R_RPN) ~((1ULL page_shift) - 1); -/* Check WIMG */ -if ((ptel HPTE_R_WIMG) != HPTE_R_M) { -return H_PARAMETER; +if (raddr spapr-ram_limit) { +/* Regular RAM - should have WIMG=0010 */ +if ((ptel HPTE_R_WIMG) != HPTE_R_M) { +return H_PARAMETER; +} +} else { +/* Looks like an IO address */ +/* FIXME: What WIMG combinations could be sensible for IO? + * For now we allow WIMG=010x, but are there others? */ +/* FIXME: Should we check against registered IO addresses? */ +if ((ptel (HPTE_R_W | HPTE_R_I | HPTE_R_M)) != HPTE_R_I) { +return H_PARAMETER; +} } + pteh = ~0x60ULL; if ((pte_index * HASH_PTE_SIZE_64) ~env-htab_mask) { -- 1.6.0.2
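The check described in the commit message can be sketched in isolation as follows. This is a minimal stand-alone version, not the h_enter() code itself: the DEMO_R_* bit values are assumptions chosen to resemble the usual HPTE second-doubleword layout, and demo_wimg_ok() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout for this sketch only. */
#define DEMO_R_RPN  0x3ffffffffffff000ULL
#define DEMO_R_W    0x40ULL
#define DEMO_R_I    0x20ULL
#define DEMO_R_M    0x10ULL
#define DEMO_R_G    0x08ULL
#define DEMO_R_WIMG (DEMO_R_W | DEMO_R_I | DEMO_R_M | DEMO_R_G)

/* Returns true if the WIMG bits in ptel are acceptable for the real
 * address the entry points at. */
static bool demo_wimg_ok(uint64_t ptel, unsigned page_shift,
                         uint64_t ram_limit)
{
    uint64_t raddr;

    assert(page_shift == 12 || page_shift == 24);
    /* Real page number, aligned down to the page size. */
    raddr = (ptel & DEMO_R_RPN) & ~((1ULL << page_shift) - 1);

    if (raddr < ram_limit) {
        /* Regular RAM: require WIMG == 0010 */
        return (ptel & DEMO_R_WIMG) == DEMO_R_M;
    }
    /* Anything else is treated as IO: require WIMG == 010x */
    return (ptel & (DEMO_R_W | DEMO_R_I | DEMO_R_M)) == DEMO_R_I;
}
```

For example, with a 256MB ram_limit, an entry for address 0x1000 passes only with M set alone, while an entry for address 0x20000000 passes with I set and G either way.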
[Qemu-devel] [PATCH 35/58] PPC: SPAPR: Use KVM function for time info
One of the things we can't fake on PPC is the timer speed. So we need to extract the frequency information from the host and put it back into the guest device tree. Luckily, we already have functions for that from the non-pseries targets, so all we need to do is to connect the dots and the guest suddenly gets to know its real timer speeds. Signed-off-by: Alexander Graf ag...@suse.de --- hw/spapr.c |8 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/hw/spapr.c b/hw/spapr.c index c5c9a95..760e323 100644 --- a/hw/spapr.c +++ b/hw/spapr.c @@ -140,6 +140,8 @@ static void *spapr_create_fdt_skel(const char *cpu_model, char *nodename; uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40), 0x, 0x}; +uint32_t tbfreq = kvm_enabled() ? kvmppc_get_tbfreq() : TIMEBASE_FREQ; +uint32_t cpufreq = kvm_enabled() ? kvmppc_get_clockfreq() : 10; if (asprintf(nodename, %s@%x, modelname, index) 0) { fprintf(stderr, Allocation failure\n); @@ -158,10 +160,8 @@ static void *spapr_create_fdt_skel(const char *cpu_model, env-dcache_line_size))); _FDT((fdt_property_cell(fdt, icache-block-size, env-icache_line_size))); -_FDT((fdt_property_cell(fdt, timebase-frequency, TIMEBASE_FREQ))); -/* Hardcode CPU frequency for now. It's kind of arbitrary on - * full emu, for kvm we should copy it from the host */ -_FDT((fdt_property_cell(fdt, clock-frequency, 10))); +_FDT((fdt_property_cell(fdt, timebase-frequency, tbfreq))); +_FDT((fdt_property_cell(fdt, clock-frequency, cpufreq))); _FDT((fdt_property_cell(fdt, ibm,slb-size, env-slb_nr))); _FDT((fdt_property(fdt, ibm,pft-size, pft_size_prop, sizeof(pft_size_prop; -- 1.6.0.2
[Qemu-devel] [PATCH 15/58] PPC: bamboo: Move host fdt copy to target
We have some code in generic kvm_ppc.c that is only used by 440. Move to the 440 specific device code. Signed-off-by: Alexander Graf ag...@suse.de --- hw/ppc440_bamboo.c | 37 +++-- target-ppc/kvm_ppc.c | 30 -- target-ppc/kvm_ppc.h |1 - 3 files changed, 35 insertions(+), 33 deletions(-) diff --git a/hw/ppc440_bamboo.c b/hw/ppc440_bamboo.c index 1addb68..65d4f0f 100644 --- a/hw/ppc440_bamboo.c +++ b/hw/ppc440_bamboo.c @@ -31,6 +31,38 @@ #define FDT_ADDR 0x180 #define RAMDISK_ADDR 0x190 +#ifdef CONFIG_FDT +static int bamboo_copy_host_cell(void *fdt, const char *node, const char *prop) +{ +uint32_t cell; +int ret; + +ret = kvmppc_read_host_property(node, prop, cell, sizeof(cell)); +if (ret 0) { +fprintf(stderr, couldn't read host %s/%s\n, node, prop); +goto out; +} + +ret = qemu_devtree_setprop_cell(fdt, node, prop, cell); +if (ret 0) { +fprintf(stderr, couldn't set guest %s/%s\n, node, prop); +goto out; +} + +out: +return ret; +} + +static void bamboo_fdt_update(void *fdt) +{ +/* Copy data from the host device tree into the guest. Since the guest can + * directly access the timebase without host involvement, we must expose + * the correct frequencies. 
*/ +bamboo_copy_host_cell(fdt, /cpus/cpu@0, clock-frequency); +bamboo_copy_host_cell(fdt, /cpus/cpu@0, timebase-frequency); +} +#endif + static int bamboo_load_device_tree(target_phys_addr_t addr, uint32_t ramsize, target_phys_addr_t initrd_base, @@ -76,8 +108,9 @@ static int bamboo_load_device_tree(target_phys_addr_t addr, if (ret 0) fprintf(stderr, couldn't set /chosen/bootargs\n); -if (kvm_enabled()) -kvmppc_fdt_update(fdt); +if (kvm_enabled()) { +bamboo_fdt_update(fdt); +} ret = rom_add_blob_fixed(BINARY_DEVICE_TREE_FILE, fdt, fdt_size, addr); g_free(fdt); diff --git a/target-ppc/kvm_ppc.c b/target-ppc/kvm_ppc.c index 867dc1d..233115e 100644 --- a/target-ppc/kvm_ppc.c +++ b/target-ppc/kvm_ppc.c @@ -54,36 +54,6 @@ free: free(path); return ret; } - -static int kvmppc_copy_host_cell(void *fdt, const char *node, const char *prop) -{ -uint32_t cell; -int ret; - -ret = kvmppc_read_host_property(node, prop, cell, sizeof(cell)); -if (ret 0) { -fprintf(stderr, couldn't read host %s/%s\n, node, prop); -goto out; -} - -ret = qemu_devtree_setprop_cell(fdt, node, prop, cell); -if (ret 0) { -fprintf(stderr, couldn't set guest %s/%s\n, node, prop); -goto out; -} - -out: -return ret; -} - -void kvmppc_fdt_update(void *fdt) -{ -/* Copy data from the host device tree into the guest. Since the guest can - * directly access the timebase without host involvement, we must expose - * the correct frequencies. */ -kvmppc_copy_host_cell(fdt, /cpus/cpu@0, clock-frequency); -kvmppc_copy_host_cell(fdt, /cpus/cpu@0, timebase-frequency); -} #endif static void kvmppc_timer_hack(void *opaque) diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h index 45a1373..2f32249 100644 --- a/target-ppc/kvm_ppc.h +++ b/target-ppc/kvm_ppc.h @@ -10,7 +10,6 @@ #define __KVM_PPC_H__ void kvmppc_init(void); -void kvmppc_fdt_update(void *fdt); #ifndef CONFIG_KVM static inline int kvmppc_read_host_property(const char *node_path, const char *prop, void *val, size_t len) -- 1.6.0.2
[Qemu-devel] [PATCH 02/58] spapr: prepare for qdevification of irq
From: Paolo Bonzini pbonz...@redhat.com Restructure common properties for sPAPR devices so that IRQ definitions can be added in one place. Signed-off-by: Paolo Bonzini pbonz...@redhat.com Cc: Alexander Graf ag...@suse.de Cc: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Alexander Graf ag...@suse.de --- hw/spapr_llan.c |4 +--- hw/spapr_vio.h |5 + hw/spapr_vscsi.c |4 +--- hw/spapr_vty.c |2 +- 4 files changed, 8 insertions(+), 7 deletions(-) diff --git a/hw/spapr_llan.c b/hw/spapr_llan.c index 2597748..abe1297 100644 --- a/hw/spapr_llan.c +++ b/hw/spapr_llan.c @@ -495,9 +495,7 @@ static VIOsPAPRDeviceInfo spapr_vlan = { .qdev.name = spapr-vlan, .qdev.size = sizeof(VIOsPAPRVLANDevice), .qdev.props = (Property[]) { -DEFINE_PROP_UINT32(reg, VIOsPAPRDevice, reg, 0x1000), -DEFINE_PROP_UINT32(dma-window, VIOsPAPRDevice, rtce_window_size, - 0x1000), +DEFINE_SPAPR_PROPERTIES(VIOsPAPRVLANDevice, sdev, 0x1000, 0x1000), DEFINE_NIC_PROPERTIES(VIOsPAPRVLANDevice, nicconf), DEFINE_PROP_END_OF_LIST(), }, diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h index faa5d94..7eb5367 100644 --- a/hw/spapr_vio.h +++ b/hw/spapr_vio.h @@ -60,6 +60,11 @@ typedef struct VIOsPAPRDevice { VIOsPAPR_CRQ crq; } VIOsPAPRDevice; +#define DEFINE_SPAPR_PROPERTIES(type, field, default_reg, default_dma_window) \ +DEFINE_PROP_UINT32(reg, type, field.reg, default_reg), \ +DEFINE_PROP_UINT32(dma-window, type, field.rtce_window_size, \ + default_dma_window) + typedef struct VIOsPAPRBus { BusState bus; int irq; diff --git a/hw/spapr_vscsi.c b/hw/spapr_vscsi.c index d2d0415..6fc82f6 100644 --- a/hw/spapr_vscsi.c +++ b/hw/spapr_vscsi.c @@ -930,9 +930,7 @@ static VIOsPAPRDeviceInfo spapr_vscsi = { .qdev.name = spapr-vscsi, .qdev.size = sizeof(VSCSIState), .qdev.props = (Property[]) { -DEFINE_PROP_UINT32(reg, VIOsPAPRDevice, reg, 0x2000), -DEFINE_PROP_UINT32(dma-window, VIOsPAPRDevice, - rtce_window_size, 0x1000), +DEFINE_SPAPR_PROPERTIES(VSCSIState, vdev, 0x2000, 0x1000), DEFINE_PROP_END_OF_LIST(), }, }; diff 
--git a/hw/spapr_vty.c b/hw/spapr_vty.c index 607b81b..a9d4b03 100644 --- a/hw/spapr_vty.c +++ b/hw/spapr_vty.c @@ -140,7 +140,7 @@ static VIOsPAPRDeviceInfo spapr_vty = { .qdev.name = spapr-vty, .qdev.size = sizeof(VIOsPAPRVTYDevice), .qdev.props = (Property[]) { -DEFINE_PROP_UINT32(reg, VIOsPAPRDevice, reg, 0), +DEFINE_SPAPR_PROPERTIES(VIOsPAPRVTYDevice, sdev, 0, 0), DEFINE_PROP_CHR(chardev, VIOsPAPRVTYDevice, chardev), DEFINE_PROP_END_OF_LIST(), }, -- 1.6.0.2
[Qemu-devel] [PATCH 33/58] KVM: update kernel headers
This patch updates the kvm kernel headers to the latest version. Signed-off-by: Alexander Graf ag...@suse.de --- linux-headers/asm-powerpc/kvm.h | 23 +++ linux-headers/asm-x86/kvm_para.h | 14 ++ linux-headers/linux/kvm.h| 25 + linux-headers/linux/kvm_para.h |1 + 4 files changed, 55 insertions(+), 8 deletions(-) diff --git a/linux-headers/asm-powerpc/kvm.h b/linux-headers/asm-powerpc/kvm.h index 777d307..579e219 100644 --- a/linux-headers/asm-powerpc/kvm.h +++ b/linux-headers/asm-powerpc/kvm.h @@ -22,6 +22,10 @@ #include linux/types.h +/* Select powerpc specific features in linux/kvm.h */ +#define __KVM_HAVE_SPAPR_TCE +#define __KVM_HAVE_PPC_SMT + struct kvm_regs { __u64 pc; __u64 cr; @@ -145,6 +149,12 @@ struct kvm_regs { #define KVM_SREGS_E_UPDATE_DBSR(1 3) /* + * Book3S special bits to indicate contents in the struct by maintaining + * backwards compatibility with older structs. If adding a new field, + * please make sure to add a flag for that new field */ +#define KVM_SREGS_S_HIOR (1 0) + +/* * In KVM_SET_SREGS, reserved/pad fields must be left untouched from a * previous KVM_GET_REGS. * @@ -169,6 +179,8 @@ struct kvm_sregs { __u64 ibat[8]; __u64 dbat[8]; } ppc32; + __u64 flags; /* KVM_SREGS_S_ */ + __u64 hior; } s; struct { union { @@ -272,4 +284,15 @@ struct kvm_guest_debug_arch { #define KVM_INTERRUPT_UNSET-2U #define KVM_INTERRUPT_SET_LEVEL-3U +/* for KVM_CAP_SPAPR_TCE */ +struct kvm_create_spapr_tce { + __u64 liobn; + __u32 window_size; +}; + +/* for KVM_ALLOCATE_RMA */ +struct kvm_allocate_rma { + __u64 rma_size; +}; + #endif /* __LINUX_KVM_POWERPC_H */ diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h index 834d71e..f2ac46a 100644 --- a/linux-headers/asm-x86/kvm_para.h +++ b/linux-headers/asm-x86/kvm_para.h @@ -21,6 +21,7 @@ */ #define KVM_FEATURE_CLOCKSOURCE23 #define KVM_FEATURE_ASYNC_PF 4 +#define KVM_FEATURE_STEAL_TIME 5 /* The last 8 bits are used to indicate how to interpret the flags field * in pvclock structure. 
If no bits are set, all flags are ignored. @@ -30,10 +31,23 @@ #define MSR_KVM_WALL_CLOCK 0x11 #define MSR_KVM_SYSTEM_TIME 0x12 +#define KVM_MSR_ENABLED 1 /* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */ #define MSR_KVM_WALL_CLOCK_NEW 0x4b564d00 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02 +#define MSR_KVM_STEAL_TIME 0x4b564d03 + +struct kvm_steal_time { + __u64 steal; + __u32 version; + __u32 flags; + __u32 pad[12]; +}; + +#define KVM_STEAL_ALIGNMENT_BITS 5 +#define KVM_STEAL_VALID_BITS ((-1ULL (KVM_STEAL_ALIGNMENT_BITS + 1))) +#define KVM_STEAL_RESERVED_MASK (((1 KVM_STEAL_ALIGNMENT_BITS) - 1 ) 1) #define KVM_MAX_MMU_OP_BATCH 32 diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index fc63b73..2062375 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -161,6 +161,7 @@ struct kvm_pit_config { #define KVM_EXIT_NMI 16 #define KVM_EXIT_INTERNAL_ERROR 17 #define KVM_EXIT_OSI 18 +#define KVM_EXIT_PAPR_HCALL 19 /* For KVM_EXIT_INTERNAL_ERROR */ #define KVM_INTERNAL_ERROR_EMULATION 1 @@ -264,6 +265,11 @@ struct kvm_run { struct { __u64 gprs[32]; } osi; + struct { + __u64 nr; + __u64 ret; + __u64 args[9]; + } papr_hcall; /* Fix the size of the union. 
*/ char padding[256]; }; @@ -457,7 +463,7 @@ struct kvm_ppc_pvinfo { #define KVM_CAP_VAPIC 6 #define KVM_CAP_EXT_CPUID 7 #define KVM_CAP_CLOCKSOURCE 8 -#define KVM_CAP_NR_VCPUS 9 /* returns max vcpus per vm */ +#define KVM_CAP_NR_VCPUS 9 /* returns recommended max vcpus per vm */ #define KVM_CAP_NR_MEMSLOTS 10 /* returns max memory slots per vm */ #define KVM_CAP_PIT 11 #define KVM_CAP_NOP_IO_DELAY 12 @@ -544,6 +550,12 @@ struct kvm_ppc_pvinfo { #define KVM_CAP_TSC_CONTROL 60 #define KVM_CAP_GET_TSC_KHZ 61 #define KVM_CAP_PPC_BOOKE_SREGS 62 +#define KVM_CAP_SPAPR_TCE 63 +#define KVM_CAP_PPC_SMT 64 +#define KVM_CAP_PPC_RMA65 +#define KVM_CAP_MAX_VCPUS 66 /* returns max vcpus per vm */ +#define KVM_CAP_PPC_HIOR 67 +#define KVM_CAP_PPC_PAPR 68 #ifdef KVM_CAP_IRQ_ROUTING @@ -746,6 +758,9 @@ struct kvm_clock_data { /* Available with KVM_CAP_XCRS */ #define KVM_GET_XCRS _IOR(KVMIO, 0xa6, struct kvm_xcrs) #define KVM_SET_XCRS _IOW(KVMIO, 0xa7, struct kvm_xcrs) +#define
[Qemu-devel] [PATCH 32/58] PPC: Add new target config for pseries
We only support -M pseries when certain prerequisites are met, such as a PPC64 guest and libfdt. To gather these requirements in a single place, this patch introduces a new CONFIG_PSERIES variable that gets set when all prerequisites are met.

Signed-off-by: Alexander Graf ag...@suse.de
---
 Makefile.target |    6 ++----
 configure       |    3 +++
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index 3f689ce..7160b35 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -239,10 +239,8 @@ obj-ppc-y += ppc_oldworld.o
 # NewWorld PowerMac
 obj-ppc-y += ppc_newworld.o
 # IBM pSeries (sPAPR)
-ifeq ($(CONFIG_FDT)$(TARGET_PPC64),yy)
-obj-ppc-y += spapr.o spapr_hcall.o spapr_rtas.o spapr_vio.o
-obj-ppc-y += xics.o spapr_vty.o spapr_llan.o spapr_vscsi.o
-endif
+obj-ppc-$(CONFIG_PSERIES) += spapr.o spapr_hcall.o spapr_rtas.o spapr_vio.o
+obj-ppc-$(CONFIG_PSERIES) += xics.o spapr_vty.o spapr_llan.o spapr_vscsi.o
 # PowerPC 4xx boards
 obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
 obj-ppc-y += ppc440.o ppc440_bamboo.o
diff --git a/configure b/configure
index 0875f95..d59fbd5 100755
--- a/configure
+++ b/configure
@@ -3402,6 +3402,9 @@ case "$target_arch2" in
     fi
   fi
 esac
+if test "$target_arch2" = "ppc64" -a "$fdt" = "yes"; then
+  echo "CONFIG_PSERIES=y" >> $config_target_mak
+fi
 if test "$target_bigendian" = "yes" ; then
   echo "TARGET_WORDS_BIGENDIAN=y" >> $config_target_mak
 fi
--
1.6.0.2
[Qemu-devel] [PATCH 54/58] openpic: Unfold write_IRQreg
The helper function write_IRQreg was always called with a specific argument on the type of register to access. Inside the function we were simply doing a switch on that constant argument again. It's a lot easier to just unfold this into two separate functions and call each individually. Reported-by: Blue Swirl blauwir...@gmail.com Signed-off-by: Alexander Graf ag...@suse.de --- hw/openpic.c | 79 +++-- 1 files changed, 37 insertions(+), 42 deletions(-) diff --git a/hw/openpic.c b/hw/openpic.c index fbd8837..43b8f27 100644 --- a/hw/openpic.c +++ b/hw/openpic.c @@ -482,30 +482,25 @@ static inline uint32_t read_IRQreg_ipvp(openpic_t *opp, int n_IRQ) return opp-src[n_IRQ].ipvp; } -static inline void write_IRQreg (openpic_t *opp, int n_IRQ, - uint32_t reg, uint32_t val) +static inline void write_IRQreg_ide(openpic_t *opp, int n_IRQ, uint32_t val) { uint32_t tmp; -switch (reg) { -case IRQ_IPVP: -/* NOTE: not fully accurate for special IRQs, but simple and - sufficient */ -/* ACTIVITY bit is read-only */ -opp-src[n_IRQ].ipvp = -(opp-src[n_IRQ].ipvp 0x4000) | -(val 0x800F00FF); -openpic_update_irq(opp, n_IRQ); -DPRINTF(Set IPVP %d to 0x%08x - 0x%08x\n, -n_IRQ, val, opp-src[n_IRQ].ipvp); -break; -case IRQ_IDE: -tmp = val 0xC000; -tmp |= val ((1ULL MAX_CPU) - 1); -opp-src[n_IRQ].ide = tmp; -DPRINTF(Set IDE %d to 0x%08x\n, n_IRQ, opp-src[n_IRQ].ide); -break; -} +tmp = val 0xC000; +tmp |= val ((1ULL MAX_CPU) - 1); +opp-src[n_IRQ].ide = tmp; +DPRINTF(Set IDE %d to 0x%08x\n, n_IRQ, opp-src[n_IRQ].ide); +} + +static inline void write_IRQreg_ipvp(openpic_t *opp, int n_IRQ, uint32_t val) +{ +/* NOTE: not fully accurate for special IRQs, but simple and sufficient */ +/* ACTIVITY bit is read-only */ +opp-src[n_IRQ].ipvp = (opp-src[n_IRQ].ipvp 0x4000) + | (val 0x800F00FF); +openpic_update_irq(opp, n_IRQ); +DPRINTF(Set IPVP %d to 0x%08x - 0x%08x\n, n_IRQ, val, +opp-src[n_IRQ].ipvp); } #if 0 // Code provision for Intel model @@ -535,10 +530,10 @@ static void write_doorbell_register 
(penpic_t *opp, int n_dbl, { switch (offset) { case DBL_IVPR_OFFSET: -write_IRQreg(opp, IRQ_DBL0 + n_dbl, IRQ_IPVP, value); +write_IRQreg_ipvp(opp, IRQ_DBL0 + n_dbl, value); break; case DBL_IDE_OFFSET: -write_IRQreg(opp, IRQ_DBL0 + n_dbl, IRQ_IDE, value); +write_IRQreg_ide(opp, IRQ_DBL0 + n_dbl, value); break; case DBL_DMR_OFFSET: opp-doorbells[n_dbl].dmr = value; @@ -576,10 +571,10 @@ static void write_mailbox_register (openpic_t *opp, int n_mbx, opp-mailboxes[n_mbx].mbr = value; break; case MBX_IVPR_OFFSET: -write_IRQreg(opp, IRQ_MBX0 + n_mbx, IRQ_IPVP, value); +write_IRQreg_ipvp(opp, IRQ_MBX0 + n_mbx, value); break; case MBX_DMR_OFFSET: -write_IRQreg(opp, IRQ_MBX0 + n_mbx, IRQ_IDE, value); +write_IRQreg_ide(opp, IRQ_MBX0 + n_mbx, value); break; } } @@ -636,7 +631,7 @@ static void openpic_gbl_write (void *opaque, target_phys_addr_t addr, uint32_t v { int idx; idx = (addr - 0x10A0) 4; -write_IRQreg(opp, opp-irq_ipi0 + idx, IRQ_IPVP, val); +write_IRQreg_ipvp(opp, opp-irq_ipi0 + idx, val); } break; case 0x10E0: /* SPVE */ @@ -729,10 +724,10 @@ static void openpic_timer_write (void *opaque, uint32_t addr, uint32_t val) opp-timers[idx].tibc = val; break; case 0x20: /* TIVP */ -write_IRQreg(opp, opp-irq_tim0 + idx, IRQ_IPVP, val); +write_IRQreg_ipvp(opp, opp-irq_tim0 + idx, val); break; case 0x30: /* TIDE */ -write_IRQreg(opp, opp-irq_tim0 + idx, IRQ_IDE, val); +write_IRQreg_ide(opp, opp-irq_tim0 + idx, val); break; } } @@ -782,10 +777,10 @@ static void openpic_src_write (void *opaque, uint32_t addr, uint32_t val) idx = addr 5; if (addr 0x10) { /* EXDE / IFEDE / IEEDE */ -write_IRQreg(opp, idx, IRQ_IDE, val); +write_IRQreg_ide(opp, idx, val); } else { /* EXVP / IFEVP / IEEVP */ -write_IRQreg(opp, idx, IRQ_IPVP, val); +write_IRQreg_ipvp(opp, idx, val); } } @@ -835,8 +830,8 @@ static void openpic_cpu_write_internal(void *opaque, target_phys_addr_t addr, case 0x70: idx = (addr - 0x40) 4; /* we use IDE as mask which CPUs to deliver the IPI to still. 
*/ -write_IRQreg(opp, opp-irq_ipi0 + idx, IRQ_IDE, - opp-src[opp-irq_ipi0 + idx].ide | val); +write_IRQreg_ide(opp, opp-irq_ipi0 + idx, + opp-src[opp-irq_ipi0 + idx].ide
[Qemu-devel] [PATCH 12/58] PPC: E500: create multiple envs
When creating a VM, we should go through smp_cpus and create a virtual CPU for every CPU the user requested. This patch adds support for that and moves some code around to make that more convenient. Signed-off-by: Alexander Graf ag...@suse.de --- hw/ppce500_mpc8544ds.c | 44 +--- 1 files changed, 29 insertions(+), 15 deletions(-) diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c index 1274a3e..8d05587 100644 --- a/hw/ppce500_mpc8544ds.c +++ b/hw/ppce500_mpc8544ds.c @@ -226,7 +226,7 @@ static void mpc8544ds_init(ram_addr_t ram_size, const char *cpu_model) { PCIBus *pci_bus; -CPUState *env; +CPUState *env = NULL; uint64_t elf_entry; uint64_t elf_lowaddr; target_phys_addr_t entry=0; @@ -240,24 +240,40 @@ static void mpc8544ds_init(ram_addr_t ram_size, qemu_irq *irqs, *mpic; DeviceState *dev; struct boot_info *boot_info; +CPUState *firstenv = NULL; -/* Setup CPU */ +/* Setup CPUs */ if (cpu_model == NULL) { cpu_model = e500v2_v30; } -env = cpu_ppc_init(cpu_model); -if (!env) { -fprintf(stderr, Unable to initialize CPU!\n); -exit(1); -} +for (i = 0; i smp_cpus; i++) { +qemu_irq *input; +env = cpu_ppc_init(cpu_model); +if (!env) { +fprintf(stderr, Unable to initialize CPU!\n); +exit(1); +} + +if (!firstenv) { +firstenv = env; +} -/* XXX register timer? */ -ppc_emb_timers_init(env, 4, PPC_INTERRUPT_DECR); -ppc_dcr_init(env, NULL, NULL); +env-spr[SPR_BOOKE_PIR] = env-cpu_index = i; -/* Register reset handler */ -qemu_register_reset(mpc8544ds_cpu_reset, env); +/* XXX register timer? 
*/ +ppc_emb_timers_init(env, 4, PPC_INTERRUPT_DECR); +ppc_dcr_init(env, NULL, NULL); +/* XXX Enable DEC interrupts - probably wrong in the backend */ +env-spr[SPR_40x_TCR] = 1 26; + +/* Register reset handler */ +boot_info = g_malloc0(sizeof(struct boot_info)); +qemu_register_reset(mpc8544ds_cpu_reset, env); +env-load_info = boot_info; +} + +env = firstenv; /* Fixup Memory size on a alignment boundary */ ram_size = ~(RAM_SIZES_ALIGN - 1); @@ -336,8 +352,6 @@ static void mpc8544ds_init(ram_addr_t ram_size, } } -boot_info = g_malloc0(sizeof(struct boot_info)); - /* If we're loading a kernel directly, we must load the device tree too. */ if (kernel_filename) { #ifndef CONFIG_FDT @@ -350,10 +364,10 @@ static void mpc8544ds_init(ram_addr_t ram_size, exit(1); } +boot_info = env-load_info; boot_info-entry = entry; boot_info-dt_base = dt_base; } -env-load_info = boot_info; if (kvm_enabled()) { kvmppc_init(); -- 1.6.0.2
[Qemu-devel] [PATCH 03/58] spapr: make irq customizable via qdev
From: Paolo Bonzini pbonz...@redhat.com

This also lets the user see the irq in "info qtree".

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Cc: Alexander Graf ag...@suse.de
Cc: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Alexander Graf ag...@suse.de
---
 hw/spapr_vio.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
index ba2e1c1..0546ccb 100644
--- a/hw/spapr_vio.c
+++ b/hw/spapr_vio.c
@@ -52,6 +52,10 @@
 static struct BusInfo spapr_vio_bus_info = {
     .name       = "spapr-vio",
     .size       = sizeof(VIOsPAPRBus),
+    .props = (Property[]) {
+        DEFINE_PROP_UINT32("irq", VIOsPAPRDevice, vio_irq_num, 0), \
+        DEFINE_PROP_END_OF_LIST(),
+    },
 };

 VIOsPAPRDevice *spapr_vio_find_by_reg(VIOsPAPRBus *bus, uint32_t reg)
@@ -604,7 +608,9 @@ static int spapr_vio_busdev_init(DeviceState *qdev, DeviceInfo *qinfo)
     }

     dev->qdev.id = id;
-    dev->vio_irq_num = bus->irq++;
+    if (!dev->vio_irq_num) {
+        dev->vio_irq_num = bus->irq++;
+    }
     dev->qirq = spapr_find_qirq(spapr, dev->vio_irq_num);

     rtce_init(dev);
--
1.6.0.2
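The allocation rule this patch introduces (use the irq the user asked for, otherwise take the next one from the bus) can be shown with a small stand-alone sketch. DemoBus and demo_assign_irq() are hypothetical names used only here; the real code works on VIOsPAPRBus and VIOsPAPRDevice.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the bus-wide irq counter. */
typedef struct DemoBus {
    uint32_t next_irq;
} DemoBus;

/* A requested value of 0 means "not set by the user": fall back to the
 * next free number on the bus. Any other value is used as given. */
static uint32_t demo_assign_irq(DemoBus *bus, uint32_t requested)
{
    assert(bus);
    if (requested) {
        return requested;
    }
    return bus->next_irq++;
}
```

One consequence of this scheme, at least in the sketch, is that nothing stops an explicitly requested number from matching one the counter hands out later, so avoiding such clashes is up to whoever sets the property.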
[Qemu-devel] [PATCH 25/58] PPC: E500: Update cpu-release-addr property in cpu nodes
The guest OS wants to know where the secondary CPUs spin, so let's tell it while we are updating the CPU nodes with the frequencies anyway.

Signed-off-by: Alexander Graf ag...@suse.de
---
v1 -> v2:
  - use new spin table address
---
 hw/ppce500_mpc8544ds.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c
index 3b8b449..a3e1ce4 100644
--- a/hw/ppce500_mpc8544ds.c
+++ b/hw/ppce500_mpc8544ds.c
@@ -125,9 +125,15 @@ static int mpc8544_load_device_tree(CPUState *env,

     for (i = 0; i < smp_cpus; i++) {
         char cpu_name[128];
+        uint64_t cpu_release_addr[] = {
+            cpu_to_be64(MPC8544_SPIN_BASE + (i * 0x20))
+        };
+
         snprintf(cpu_name, sizeof(cpu_name), "/cpus/PowerPC,8544@%x", i);
         qemu_devtree_setprop_cell(fdt, cpu_name, "clock-frequency", clock_freq);
         qemu_devtree_setprop_cell(fdt, cpu_name, "timebase-frequency", tb_freq);
+        qemu_devtree_setprop(fdt, cpu_name, "cpu-release-addr",
+                             cpu_release_addr, sizeof(cpu_release_addr));
     }

     for (i = smp_cpus; i < 32; i++) {
--
1.6.0.2
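As a worked illustration of the value written into cpu-release-addr: each CPU gets its own 0x20-byte slot after the spin table base, and the 64-bit address is stored big-endian, as device tree properties are. The sketch below assumes nothing about the actual MPC8544_SPIN_BASE value; the base is a parameter, and demo_release_addr() and demo_put_be64() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Per-CPU slot size, taken from the (i * 0x20) term in the patch. */
#define DEMO_SPIN_ENTRY_SIZE 0x20u

/* Address of the spin table entry for a given CPU index. */
static uint64_t demo_release_addr(uint64_t spin_base, unsigned cpu)
{
    return spin_base + (uint64_t)cpu * DEMO_SPIN_ENTRY_SIZE;
}

/* Store a 64-bit value big-endian, most significant byte first. */
static void demo_put_be64(uint8_t out[8], uint64_t value)
{
    int i;

    assert(out);
    for (i = 0; i < 8; i++) {
        out[i] = (uint8_t)(value >> (56 - 8 * i));
    }
}
```

With a made-up base of 0x1000, CPU 2 would get 0x1040, and that number would appear in the property as the bytes 00 00 00 00 00 00 10 40.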
[Qemu-devel] [PATCH 07/58] PPC: Fix IPI support in MPIC
The current IPI support in the MPIC code is incomplete and doesn't work. This code adds proper support for IPIs in MPIC by using the IDE register to remember which CPUs IPIs are still outstanding to. New triggers through the IPI trigger register only add to the list of CPUs we want to IPI. Signed-off-by: Alexander Graf ag...@suse.de --- v1 - v2: - Use MAX_IPI instead of hardcoded 4 Signed-off-by: Alexander Graf ag...@suse.de --- hw/openpic.c | 17 +++-- 1 files changed, 15 insertions(+), 2 deletions(-) diff --git a/hw/openpic.c b/hw/openpic.c index f7d5583..9710ac0 100644 --- a/hw/openpic.c +++ b/hw/openpic.c @@ -57,7 +57,7 @@ #define MAX_MBX 4 #define MAX_TMR 4 #define VECTOR_BITS 8 -#define MAX_IPI 0 +#define MAX_IPI 4 #define VID (0x) @@ -840,7 +840,9 @@ static void openpic_cpu_write_internal(void *opaque, target_phys_addr_t addr, case 0x60: case 0x70: idx = (addr - 0x40) 4; -write_IRQreg(opp, opp-irq_ipi0 + idx, IRQ_IDE, val); +/* we use IDE as mask which CPUs to deliver the IPI to still. */ +write_IRQreg(opp, opp-irq_ipi0 + idx, IRQ_IDE, + opp-src[opp-irq_ipi0 + idx].ide | val); openpic_set_irq(opp, opp-irq_ipi0 + idx, 1); openpic_set_irq(opp, opp-irq_ipi0 + idx, 0); break; @@ -934,6 +936,17 @@ static uint32_t openpic_cpu_read_internal(void *opaque, target_phys_addr_t addr, reset_bit(src-ipvp, IPVP_ACTIVITY); src-pending = 0; } + +if ((n_IRQ = opp-irq_ipi0) (n_IRQ (opp-irq_ipi0 + MAX_IPI))) { +src-ide = ~(1 idx); +if (src-ide !test_bit(src-ipvp, IPVP_SENSE)) { +/* trigger on CPUs that didn't know about it yet */ +openpic_set_irq(opp, n_IRQ, 1); +openpic_set_irq(opp, n_IRQ, 0); +/* if all CPUs knew about it, set active bit again */ +set_bit(src-ipvp, IPVP_ACTIVITY); +} +} } break; case 0xB0: /* PEOI */ -- 1.6.0.2
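The bookkeeping described above, with IDE used as a mask of CPUs that still have to see the IPI, reduces to two operations: a trigger ORs new targets into the mask, and an acknowledge clears the acking CPU's bit and reports whether others are still waiting. The following is a simplified sketch with hypothetical names (DemoIpi, demo_ipi_trigger, demo_ipi_ack); it leaves out the IPVP sense/activity handling that the real code also performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one IPI source; only the IDE mask is kept. */
typedef struct DemoIpi {
    uint32_t ide; /* bit n set: CPU n has not taken the IPI yet */
} DemoIpi;

/* A write to the trigger register only adds CPUs to the pending set. */
static void demo_ipi_trigger(DemoIpi *ipi, uint32_t cpu_mask)
{
    assert(ipi);
    ipi->ide |= cpu_mask;
}

/* A CPU acknowledges the interrupt. Returns true if the interrupt has
 * to be raised again because other CPUs are still outstanding. */
static bool demo_ipi_ack(DemoIpi *ipi, unsigned cpu)
{
    assert(ipi && cpu < 32);
    ipi->ide &= ~(1u << cpu);
    return ipi->ide != 0;
}
```

So an IPI sent to CPUs 0 and 1 stays pending after CPU 0 acknowledges it and is only done once CPU 1 has acknowledged as well.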
[Qemu-devel] [PATCH 43/58] KVM: Update kernel headers
Another round of KVM features, another round of kernel header updates :) Signed-off-by: Alexander Graf ag...@suse.de --- linux-headers/asm-powerpc/kvm.h | 40 +++ linux-headers/linux/kvm.h | 18 + 2 files changed, 58 insertions(+), 0 deletions(-) diff --git a/linux-headers/asm-powerpc/kvm.h b/linux-headers/asm-powerpc/kvm.h index 579e219..28eecf0 100644 --- a/linux-headers/asm-powerpc/kvm.h +++ b/linux-headers/asm-powerpc/kvm.h @@ -284,6 +284,11 @@ struct kvm_guest_debug_arch { #define KVM_INTERRUPT_UNSET-2U #define KVM_INTERRUPT_SET_LEVEL-3U +#define KVM_CPU_4401 +#define KVM_CPU_E500V2 2 +#define KVM_CPU_3S_32 3 +#define KVM_CPU_3S_64 4 + /* for KVM_CAP_SPAPR_TCE */ struct kvm_create_spapr_tce { __u64 liobn; @@ -295,4 +300,39 @@ struct kvm_allocate_rma { __u64 rma_size; }; +struct kvm_book3e_206_tlb_entry { + __u32 mas8; + __u32 mas1; + __u64 mas2; + __u64 mas7_3; +}; + +struct kvm_book3e_206_tlb_params { + /* +* For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV: +* +* - The number of ways of TLB0 must be a power of two between 2 and +* 16. +* - TLB1 must be fully associative. +* - The size of TLB0 must be a multiple of the number of ways, and +* the number of sets must be a power of two. +* - The size of TLB1 may not exceed 64 entries. +* - TLB0 supports 4 KiB pages. +* - The page sizes supported by TLB1 are as indicated by +* TLB1CFG (if MMUCFG[MAVN] = 0) or TLB1PS (if MMUCFG[MAVN] = 1) +* as returned by KVM_GET_SREGS. +* - TLB2 and TLB3 are reserved, and their entries in tlb_sizes[] +* and tlb_ways[] must be zero. +* +* tlb_ways[n] = tlb_sizes[n] means the array is fully associative. +* +* KVM will adjust TLBnCFG based on the sizes configured here, +* though arrays greater than 2048 entries will have TLBnCFG[NENTRY] +* set to zero. 
+*/ + __u32 tlb_sizes[4]; + __u32 tlb_ways[4]; + __u32 reserved[8]; +}; + #endif /* __LINUX_KVM_POWERPC_H */ diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index 2062375..8bb6cde 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -556,6 +556,7 @@ struct kvm_ppc_pvinfo { #define KVM_CAP_MAX_VCPUS 66 /* returns max vcpus per vm */ #define KVM_CAP_PPC_HIOR 67 #define KVM_CAP_PPC_PAPR 68 +#define KVM_CAP_SW_TLB 69 #ifdef KVM_CAP_IRQ_ROUTING @@ -635,6 +636,21 @@ struct kvm_clock_data { __u32 pad[9]; }; +#define KVM_MMU_FSL_BOOKE_NOHV 0 +#define KVM_MMU_FSL_BOOKE_HV 1 + +struct kvm_config_tlb { + __u64 params; + __u64 array; + __u32 mmu_type; + __u32 array_len; +}; + +struct kvm_dirty_tlb { + __u64 bitmap; + __u32 num_dirty; +}; + /* * ioctls for VM fds */ @@ -761,6 +777,8 @@ struct kvm_clock_data { #define KVM_CREATE_SPAPR_TCE _IOW(KVMIO, 0xa8, struct kvm_create_spapr_tce) /* Available with KVM_CAP_RMA */ #define KVM_ALLOCATE_RMA _IOR(KVMIO, 0xa9, struct kvm_allocate_rma) +/* Available with KVM_CAP_SW_TLB */ +#define KVM_DIRTY_TLB_IOW(KVMIO, 0xaa, struct kvm_dirty_tlb) #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1 0) -- 1.6.0.2
[Qemu-devel] [PATCH 16/58] PPC: KVM: Add generic function to read host clockfreq
We need to find out the host's clock-frequency when running on KVM, so let's export a respective function. Signed-off-by: Alexander Graf ag...@suse.de --- v1 - v2: - enable 64bit values --- target-ppc/kvm.c | 67 ++ target-ppc/kvm_ppc.h |1 + 2 files changed, 68 insertions(+), 0 deletions(-) diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index 21f35af..77b98c4 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -14,6 +14,7 @@ * */ +#include dirent.h #include sys/types.h #include sys/ioctl.h #include sys/mman.h @@ -38,6 +39,8 @@ do { } while (0) #endif +#define PROC_DEVTREE_CPU /proc/device-tree/cpus/ + const KVMCapabilityInfo kvm_arch_required_capabilities[] = { KVM_CAP_LAST_INFO }; @@ -509,6 +512,70 @@ uint32_t kvmppc_get_tbfreq(void) return retval; } +/* Try to find a device tree node for a CPU with clock-frequency property */ +static int kvmppc_find_cpu_dt(char *buf, int buf_len) +{ +struct dirent *dirp; +DIR *dp; + +if ((dp = opendir(PROC_DEVTREE_CPU)) == NULL) { +printf(Can't open directory PROC_DEVTREE_CPU \n); +return -1; +} + +buf[0] = '\0'; +while ((dirp = readdir(dp)) != NULL) { +FILE *f; +snprintf(buf, buf_len, %s%s/clock-frequency, PROC_DEVTREE_CPU, + dirp-d_name); +f = fopen(buf, r); +if (f) { +snprintf(buf, buf_len, %s%s, PROC_DEVTREE_CPU, dirp-d_name); +fclose(f); +break; +} +buf[0] = '\0'; +} +closedir(dp); +if (buf[0] == '\0') { +printf(Unknown host!\n); +return -1; +} + +return 0; +} + +uint64_t kvmppc_get_clockfreq(void) +{ +char buf[512]; +uint32_t tb[2]; +FILE *f; +int len; + +if (kvmppc_find_cpu_dt(buf, sizeof(buf))) { +return 0; +} + +strncat(buf, /clock-frequency, sizeof(buf) - strlen(buf)); + +f = fopen(buf, rb); +if (!f) { +return -1; +} + +len = fread(tb, sizeof(tb[0]), 2, f); +fclose(f); +switch (len) { +case 1: +/* freq is only a single cell */ +return tb[0]; +case 2: +return *(uint64_t*)tb; +} + +return 0; +} + int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len) { uint32_t *hc = (uint32_t*)buf; diff --git 
a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h index 2f32249..7c08c0f 100644 --- a/target-ppc/kvm_ppc.h +++ b/target-ppc/kvm_ppc.h @@ -23,6 +23,7 @@ int kvmppc_read_host_property(const char *node_path, const char *prop, #endif uint32_t kvmppc_get_tbfreq(void); +uint64_t kvmppc_get_clockfreq(void); int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len); int kvmppc_set_interrupt(CPUState *env, int irq, int level); -- 1.6.0.2
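Not part of the patch, just an illustration for readers of the archive: device tree properties are stored big-endian, and clock-frequency can be one or two 32-bit cells. The cast in the two-cell case above relies on the host being big-endian, which holds for the PPC KVM hosts this code targets. A minimal sketch of a byte-order-independent decode, assuming a hypothetical helper named dt_cells_to_u64() that exists neither in the patch nor in QEMU:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: decode a device tree
 * property made of one or two big-endian 32-bit cells into a host
 * integer. Returns 0 when the cell count is not 1 or 2. */
static inline uint64_t dt_cells_to_u64(const uint8_t *buf, size_t ncells)
{
    uint64_t val = 0;
    size_t i;

    if (ncells != 1 && ncells != 2) {
        return 0;
    }
    /* Accumulate byte by byte, most significant byte first. */
    for (i = 0; i < ncells * 4; i++) {
        val = (val << 8) | buf[i];
    }
    return val;
}
```

A caller would fread() the raw property bytes into a uint8_t buffer and pass the number of complete cells read.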
Re: [Qemu-devel] [PATCH] PPC: Fix for the gdb single step problem on an rfi instruction
Hi!

On Fri, 12 Aug 2011 15:29:58 +0200, Elie Richa wrote:
> I've had this problem recently and your patch does fix the issue, thanks!

I'd like to bump this, as it was not in the latest ppc patch queue. Is
there anything wrong with the patch?

TIA

Best,
Sebastian

> On 08/10/2011 01:41 PM, Sebastian Bauer wrote:
>> When using gdb to single-step a ppc interrupt routine, the execution flow
>> passes the rfi instruction without actually returning from the interrupt.
>> The patch fixes this by not updating the nip when the debug exception is
>> raised and a previous POWERPC_EXCP_SYNC was set. The latter is the case
>> only if code for rfi or a related instruction was generated.
>>
>> Signed-off-by: Sebastian Bauer m...@sebastianbauer.info
>> ---
>> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
>> index fd7c208..42b91fd 100644
>> --- a/target-ppc/translate.c
>> +++ b/target-ppc/translate.c
>> @@ -287,7 +287,7 @@ static inline void gen_debug_exception(DisasContext *ctx)
>>  {
>>      TCGv_i32 t0;
>>
>> -    if (ctx->exception != POWERPC_EXCP_BRANCH)
>> +    if (ctx->exception != POWERPC_EXCP_BRANCH && ctx->exception != POWERPC_EXCP_SYNC)
>>          gen_update_nip(ctx, ctx->nip);
>>      t0 = tcg_const_i32(EXCP_DEBUG);
>>      gen_helper_raise_exception(t0);
[Qemu-devel] [PATCH 46/58] ppc: booke206: use MAV=2.0 TSIZE definition, fix 4G pages
From: Scott Wood scottw...@freescale.com This definition is backward compatible with MAV=1.0 as long as the guest does not set reserved bits in MAS1/MAS4. Also, fix the shift in booke206_tlb_to_page_size -- it's the base that should be able to hold a 4G page size, not the shift count. Signed-off-by: Scott Wood scottw...@freescale.com Signed-off-by: Alexander Graf ag...@suse.de --- hw/ppce500_mpc8544ds.c |2 +- target-ppc/cpu.h |4 ++-- target-ppc/helper.c|5 +++-- 3 files changed, 6 insertions(+), 5 deletions(-) diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c index 61151d8..8095516 100644 --- a/hw/ppce500_mpc8544ds.c +++ b/hw/ppce500_mpc8544ds.c @@ -174,7 +174,7 @@ out: /* Create -kernel TLB entries for BookE, linearly spanning 256MB. */ static inline target_phys_addr_t booke206_page_size_to_tlb(uint64_t size) { -return (ffs(size 10) - 1) 1; +return ffs(size 10) - 1; } static void mmubooke_create_initial_mapping(CPUState *env, diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h index 5200e6e..32706df 100644 --- a/target-ppc/cpu.h +++ b/target-ppc/cpu.h @@ -667,8 +667,8 @@ enum { #define MAS0_ATSEL_TLB 0 #define MAS0_ATSEL_LRATMAS0_ATSEL -#define MAS1_TSIZE_SHIFT 8 -#define MAS1_TSIZE_MASK(0xf MAS1_TSIZE_SHIFT) +#define MAS1_TSIZE_SHIFT 7 +#define MAS1_TSIZE_MASK(0x1f MAS1_TSIZE_SHIFT) #define MAS1_TS_SHIFT 12 #define MAS1_TS(1 MAS1_TS_SHIFT) diff --git a/target-ppc/helper.c b/target-ppc/helper.c index d1bc574..73796c8 100644 --- a/target-ppc/helper.c +++ b/target-ppc/helper.c @@ -1293,7 +1293,7 @@ target_phys_addr_t booke206_tlb_to_page_size(CPUState *env, ppcmas_tlb_t *tlb) { uint32_t tlbncfg; int tlbn = booke206_tlbm_to_tlbn(env, tlb); -target_phys_addr_t tlbm_size; +int tlbm_size; tlbncfg = env-spr[SPR_BOOKE_TLB0CFG + tlbn]; @@ -1301,9 +1301,10 @@ target_phys_addr_t booke206_tlb_to_page_size(CPUState *env, ppcmas_tlb_t *tlb) tlbm_size = (tlb-mas1 MAS1_TSIZE_MASK) MAS1_TSIZE_SHIFT; } else { tlbm_size = (tlbncfg TLBnCFG_MINSIZE) TLBnCFG_MINSIZE_SHIFT; 
+tlbm_size = 1; } -return (1 (tlbm_size 1)) 10; +return 1024ULL tlbm_size; } /* TLB check function for MAS based SoftTLBs */ -- 1.6.0.2
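For illustration only (this is not code from the patch): the point about the base versus the shift count can be seen in a small sketch. With the MAV 2.0 encoding the page size is 2^TSIZE KiB, so a TSIZE of 22 gives 4 GiB, and the value being shifted has to be 64 bits wide. The helper names below are hypothetical.

```c
#include <stdint.h>

/* MAV 2.0 encoding as described in the patch: page size is 2^TSIZE KiB.
 * The 64-bit constant is the value being shifted, so a TSIZE of 22
 * (4 GiB) does not overflow a 32-bit int. */
static inline uint64_t tsize_to_page_size(unsigned int tsize)
{
    return 1024ULL << tsize;
}

/* Inverse mapping (hypothetical helper). For power-of-two sizes of at
 * least 1 KiB this gives the same result as ffs(size >> 10) - 1. */
static inline unsigned int page_size_to_tsize(uint64_t size)
{
    unsigned int tsize = 0;

    size >>= 10;
    while (size > 1) {
        size >>= 1;
        tsize++;
    }
    return tsize;
}
```

With these, a 256 MiB mapping corresponds to TSIZE 18 and a 4 KiB page to TSIZE 2.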
[Qemu-devel] [PATCH 30/58] MPC8544DS: Generate CPU nodes on init
With this patch, we generate CPU nodes in the machine initialization, giving us the freedom to generate as many nodes as we want and as the machine supports, but only those. This is a first step towards a much cleaner device tree generation infrastructure, where we would not require precompiled dtb blobs anymore. Signed-off-by: Alexander Graf ag...@suse.de --- hw/ppce500_mpc8544ds.c | 46 +- 1 files changed, 33 insertions(+), 13 deletions(-) diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c index a3e1ce4..dfa8034 100644 --- a/hw/ppce500_mpc8544ds.c +++ b/hw/ppce500_mpc8544ds.c @@ -123,23 +123,43 @@ static int mpc8544_load_device_tree(CPUState *env, hypercall, sizeof(hypercall)); } -for (i = 0; i smp_cpus; i++) { +/* We need to generate the cpu nodes in reverse order, so Linux can pick + the first node as boot node and be happy */ +for (i = smp_cpus - 1; i = 0; i--) { char cpu_name[128]; -uint64_t cpu_release_addr[] = { -cpu_to_be64(MPC8544_SPIN_BASE + (i * 0x20)) -}; +uint64_t cpu_release_addr = cpu_to_be64(MPC8544_SPIN_BASE + (i * 0x20)); + +for (env = first_cpu; env != NULL; env = env-next_cpu) { +if (env-cpu_index == i) { +break; +} +} + +if (!env) { +continue; +} -snprintf(cpu_name, sizeof(cpu_name), /cpus/PowerPC,8544@%x, i); +snprintf(cpu_name, sizeof(cpu_name), /cpus/PowerPC,8544@%x, env-cpu_index); +qemu_devtree_add_subnode(fdt, cpu_name); qemu_devtree_setprop_cell(fdt, cpu_name, clock-frequency, clock_freq); qemu_devtree_setprop_cell(fdt, cpu_name, timebase-frequency, tb_freq); -qemu_devtree_setprop(fdt, cpu_name, cpu-release-addr, - cpu_release_addr, sizeof(cpu_release_addr)); -} - -for (i = smp_cpus; i 32; i++) { -char cpu_name[128]; -snprintf(cpu_name, sizeof(cpu_name), /cpus/PowerPC,8544@%x, i); -qemu_devtree_nop_node(fdt, cpu_name); +qemu_devtree_setprop_string(fdt, cpu_name, device_type, cpu); +qemu_devtree_setprop_cell(fdt, cpu_name, reg, env-cpu_index); +qemu_devtree_setprop_cell(fdt, cpu_name, d-cache-line-size, + env-dcache_line_size); 
+qemu_devtree_setprop_cell(fdt, cpu_name, i-cache-line-size, + env-icache_line_size); +qemu_devtree_setprop_cell(fdt, cpu_name, d-cache-size, 0x8000); +qemu_devtree_setprop_cell(fdt, cpu_name, i-cache-size, 0x8000); +qemu_devtree_setprop_cell(fdt, cpu_name, bus-frequency, 0); +if (env-cpu_index) { +qemu_devtree_setprop_string(fdt, cpu_name, status, disabled); +qemu_devtree_setprop_string(fdt, cpu_name, enable-method, spin-table); +qemu_devtree_setprop(fdt, cpu_name, cpu-release-addr, + cpu_release_addr, sizeof(cpu_release_addr)); +} else { +qemu_devtree_setprop_string(fdt, cpu_name, status, okay); +} } ret = rom_add_blob_fixed(BINARY_DEVICE_TREE_FILE, fdt, fdt_size, addr); -- 1.6.0.2
[Qemu-devel] [PATCH 10/58] PPC: MPIC: Fix CI bit definitions
The bit definitions for critical interrupt routing are in PowerPC order
(most significant bit is 0), while we end up shifting it with normal bit
order. Turn the numbers around so we actually end up fetching the
right ones.

Signed-off-by: Alexander Graf ag...@suse.de
---
 hw/openpic.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/openpic.c b/hw/openpic.c
index dfec52e..109c1bc 100644
--- a/hw/openpic.c
+++ b/hw/openpic.c
@@ -131,11 +131,11 @@ enum {
 #define MPIC_CPU_REG_SIZE 0x100 + ((MAX_CPU - 1) * 0x1000)

 enum mpic_ide_bits {
-    IDR_EP = 0,
-    IDR_CI0 = 1,
-    IDR_CI1 = 2,
-    IDR_P1 = 30,
-    IDR_P0 = 31,
+    IDR_EP = 31,
+    IDR_CI0 = 30,
+    IDR_CI1 = 29,
+    IDR_P1 = 1,
+    IDR_P0 = 0,
 };

 #else
--
1.6.0.2
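As an aside for archive readers, the renumbering above is the usual MSB-0 to LSB-0 conversion for a 32-bit register. A small sketch, with hypothetical helper names that are not in hw/openpic.c:

```c
#include <stdint.h>

/* PowerPC documentation numbers bits MSB-first: bit 0 of a 32-bit
 * register is the most significant one. Shifts in C count from the
 * least significant bit, so the index has to be mirrored. */
static inline unsigned int msb0_to_lsb0(unsigned int ppc_bit)
{
    return 31 - ppc_bit;
}

/* Mask for a register bit given in PowerPC (MSB-0) numbering. */
static inline uint32_t ppc_bit_mask(unsigned int ppc_bit)
{
    return UINT32_C(1) << msb0_to_lsb0(ppc_bit);
}
```

Applied to the old values 0, 1, 2, 30 and 31, the conversion yields 31, 30, 29, 1 and 0, which are the new enum values in the patch.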
Re: [Qemu-devel] [PATCH 50/58] pseries: Update SLOF firmware image
On 14 September 2011 09:43, Alexander Graf ag...@suse.de wrote:
> From: David Gibson da...@gibson.dropbear.id.au
>
> The current SLOF firmware for the pseries machine has a bug in SCSI
> condition handling that was exposed by recent updates to qemu's SCSI
> emulation. This patch updates the SLOF image to one with the bug fixed.
>
> Signed-off-by: David Gibson da...@gibson.dropbear.id.au
> Signed-off-by: Alexander Graf ag...@suse.de
> ---
>  pc-bios/README   |    2 +-
>  pc-bios/slof.bin |  Bin 579072 - 57 bytes
>  2 files changed, 1 insertions(+), 1 deletions(-)

I confess to not really understanding how we keep the git submodules and
the binary blobs in sync, but shouldn't there be a reference in the commit
message to the git commit hash for the slof sources corresponding to this
blob, and maybe also an update to roms/SLOF here? (cf commit d67c3f2c for
example)

-- PMM
[Qemu-devel] [PATCH 31/58] PPC: E500: Bump CPU count to 15
Now that we have everything in place, make the machine description aware of
the fact that we can now handle 15 virtual CPUs!

Signed-off-by: Alexander Graf ag...@suse.de

---

v1 -> v2:

  - Max cpus is 15 because of MPIC
---
 hw/ppce500_mpc8544ds.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c
index dfa8034..b86a008 100644
--- a/hw/ppce500_mpc8544ds.c
+++ b/hw/ppce500_mpc8544ds.c
@@ -396,6 +396,7 @@ static QEMUMachine mpc8544ds_machine = {
     .name = "mpc8544ds",
     .desc = "mpc8544ds",
     .init = mpc8544ds_init,
+    .max_cpus = 15,
 };

 static void mpc8544ds_machine_init(void)
--
1.6.0.2
[Qemu-devel] [PATCH 21/58] PPC: KVM: Add stubs for kvm helper functions
We have a bunch of helper functions that don't have any stubs for them in case we don't have CONFIG_KVM enabled. That didn't bite us so far, because gcc can optimize them out pretty well, but we should really provide them. Signed-off-by: Alexander Graf ag...@suse.de --- v1 - v2: - use uint64_t for clockfreq --- target-ppc/kvm_ppc.h | 26 ++ 1 files changed, 26 insertions(+), 0 deletions(-) diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h index 0c659c8..76f98d9 100644 --- a/target-ppc/kvm_ppc.h +++ b/target-ppc/kvm_ppc.h @@ -11,11 +11,37 @@ void kvmppc_init(void); +#ifdef CONFIG_KVM + uint32_t kvmppc_get_tbfreq(void); uint64_t kvmppc_get_clockfreq(void); int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len); int kvmppc_set_interrupt(CPUState *env, int irq, int level); +#else + +static inline uint32_t kvmppc_get_tbfreq(void) +{ +return 0; +} + +static inline uint64_t kvmppc_get_clockfreq(void) +{ +return 0; +} + +static inline int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len) +{ +return -1; +} + +static inline int kvmppc_set_interrupt(CPUState *env, int irq, int level) +{ +return -1; +} + +#endif + #ifndef CONFIG_KVM #define kvmppc_eieio() do { } while (0) #else -- 1.6.0.2
[Qemu-devel] [PATCH 38/58] pseries: interrupt controller should not have a 'reg' property
From: David Gibson da...@gibson.dropbear.id.au The interrupt controller presented in the device tree for the pseries machine is manipulated by the guest only through hypervisor calls. It has no real or emulated registers for the guest to access. However, it currently has a bogus 'reg' property advertising a register window. Moreover, this property has an invalid format, being a 32-bit zero, when the #address-cells property on the root bus indicates that it needs a 64-bit address. Since the guest never attempts to manipulate the node directly, it works, but it is ugly and can cause warnings when manipulating the device tree in other tools (such as future firmware versions). This patch, therefore, corrects the problem by entirely removing the interrupt-controller node's 'reg' property. Signed-off-by: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Alexander Graf ag...@suse.de --- hw/spapr.c |3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/hw/spapr.c b/hw/spapr.c index bb00ae6..9eefef9 100644 --- a/hw/spapr.c +++ b/hw/spapr.c @@ -194,12 +194,11 @@ static void *spapr_create_fdt_skel(const char *cpu_model, _FDT((fdt_end_node(fdt))); /* interrupt controller */ -_FDT((fdt_begin_node(fdt, interrupt-controller@0))); +_FDT((fdt_begin_node(fdt, interrupt-controller))); _FDT((fdt_property_string(fdt, device_type, PowerPC-External-Interrupt-Presentation))); _FDT((fdt_property_string(fdt, compatible, IBM,ppc-xicp))); -_FDT((fdt_property_cell(fdt, reg, 0))); _FDT((fdt_property(fdt, interrupt-controller, NULL, 0))); _FDT((fdt_property(fdt, ibm,interrupt-server-ranges, interrupt_server_ranges_prop, -- 1.6.0.2
[Qemu-devel] [PATCH 18/58] PPC: E500: Remove mpc8544_copy_soc_cell
We don't need mpc8544_copy_soc_cell anymore, since we're explicitly reading host values and writing guest values respectively. Signed-off-by: Alexander Graf ag...@suse.de --- hw/ppce500_mpc8544ds.c | 24 1 files changed, 0 insertions(+), 24 deletions(-) diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c index 8748531..2c7c677 100644 --- a/hw/ppce500_mpc8544ds.c +++ b/hw/ppce500_mpc8544ds.c @@ -56,30 +56,6 @@ struct boot_info uint32_t entry; }; -#ifdef CONFIG_FDT -static int mpc8544_copy_soc_cell(void *fdt, const char *node, const char *prop) -{ -uint32_t cell; -int ret; - -ret = kvmppc_read_host_property(node, prop, cell, sizeof(cell)); -if (ret 0) { -fprintf(stderr, couldn't read host %s/%s\n, node, prop); -goto out; -} - -ret = qemu_devtree_setprop_cell(fdt, /cpus/PowerPC,8544@0, -prop, cell); -if (ret 0) { -fprintf(stderr, couldn't set guest /cpus/PowerPC,8544@0/%s\n, prop); -goto out; -} - -out: -return ret; -} -#endif - static int mpc8544_load_device_tree(CPUState *env, target_phys_addr_t addr, uint32_t ramsize, -- 1.6.0.2
[Qemu-devel] [PATCH 06/58] PPC: Extend MPIC MMIO range
The MPIC exports a page for each CPU that it controls. To support more than
one CPU, we also need to reserve MMIO space according to the number of CPUs
we want to support.

Signed-off-by: Alexander Graf ag...@suse.de
---
 hw/openpic.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/openpic.c b/hw/openpic.c
index cf89f23..f7d5583 100644
--- a/hw/openpic.c
+++ b/hw/openpic.c
@@ -128,7 +128,7 @@ enum {
 #define MPIC_MSI_REG_START 0x11C00
 #define MPIC_MSI_REG_SIZE 0x100
 #define MPIC_CPU_REG_START 0x2
-#define MPIC_CPU_REG_SIZE 0x100
+#define MPIC_CPU_REG_SIZE 0x100 + ((MAX_CPU - 1) * 0x1000)

 enum mpic_ide_bits {
     IDR_EP = 0,
--
1.6.0.2
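A side note, not part of the patch: the size computation can be sketched as below. The names are hypothetical. The 0x1000 stride and 0x100 register window are taken from the diff. The sketch wraps the whole expression in parentheses, which would matter if the macro were ever used inside a larger expression.

```c
#include <stdint.h>

#define CPU_PAGE_STRIDE 0x1000u   /* one page per CPU, as in the patch */
#define CPU_REG_WINDOW  0x100u    /* registers decoded within each page */

/* Fully parenthesised so the expansion stays intact inside a larger
 * expression such as `2 * CPU_REG_SIZE(n)`. */
#define CPU_REG_SIZE(ncpus) \
    (CPU_REG_WINDOW + (((ncpus) - 1u) * CPU_PAGE_STRIDE))

/* Hypothetical helper: which CPU does an offset into the per-CPU
 * region belong to? */
static inline unsigned int cpu_reg_offset_to_cpu(uint32_t offset)
{
    return offset / CPU_PAGE_STRIDE;
}
```

For a single CPU the region stays at 0x100 bytes, and each additional CPU adds one 4 KiB page.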
[Qemu-devel] [PATCH 49/58] vscsi: send the CHECK_CONDITION status down together with autosense data
From: Paolo Bonzini pbonz...@redhat.com

I introduced this bug in commit 05751d3 (vscsi: always use get_sense,
2011-08-03) because at the time there was no way to expose a sense
condition to SLOF and Linux manages to work around the bug. However,
the bug becomes evident now that SCSI devices also report unit
attention on reset.

SLOF also has problems dealing with unit attention conditions, so
it still will not boot even with this fix (just like OpenBIOS).
IBM folks are aware of their part of the bug. :-)

Reported-by: Thomas Huth th...@linux.vnet.ibm.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Acked-by: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Alexander Graf ag...@suse.de
---
 hw/spapr_vscsi.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/hw/spapr_vscsi.c b/hw/spapr_vscsi.c
index 6fc82f6..e8426d7 100644
--- a/hw/spapr_vscsi.c
+++ b/hw/spapr_vscsi.c
@@ -483,7 +483,6 @@ static void vscsi_command_complete(SCSIRequest *sreq, uint32_t status)
     if (status == CHECK_CONDITION) {
         req->senselen = scsi_req_get_sense(req->sreq, req->sense,
                                            sizeof(req->sense));
-        status = 0;
         dprintf("VSCSI: Sense data, %d bytes:\n", len);
         dprintf(" %02x %02x %02x %02x %02x %02x %02x %02x\n",
                 req->sense[0], req->sense[1], req->sense[2], req->sense[3],
--
1.6.0.2
[Qemu-devel] [PATCH 41/58] pseries: Add real mode debugging hcalls
From: David Gibson da...@gibson.dropbear.id.au PAPR systems support several hypercalls intended for use in real mode debugging tools. These implement reads and writes to arbitrary guest physical addresses. This is useful for real mode software because it allows access to IO addresses and memory outside the RMA without going through the somewhat involved process of setting up the hash page table and enabling translation. We want these so that when we add real IO devices, the SLOF firmware can boot from them without having to enter virtual mode. Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org Signed-off-by: David Gibson d...@au1.ibm.com Signed-off-by: Alexander Graf ag...@suse.de --- hw/spapr_hcall.c | 73 ++ 1 files changed, 73 insertions(+), 0 deletions(-) diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c index 70f853c..0c61c10 100644 --- a/hw/spapr_hcall.c +++ b/hw/spapr_hcall.c @@ -463,6 +463,67 @@ static target_ulong h_rtas(CPUState *env, sPAPREnvironment *spapr, nret, rtas_r3 + 12 + 4*nargs); } +static target_ulong h_logical_load(CPUState *env, sPAPREnvironment *spapr, + target_ulong opcode, target_ulong *args) +{ +target_ulong size = args[0]; +target_ulong addr = args[1]; + +switch (size) { +case 1: +args[0] = ldub_phys(addr); +return H_SUCCESS; +case 2: +args[0] = lduw_phys(addr); +return H_SUCCESS; +case 4: +args[0] = ldl_phys(addr); +return H_SUCCESS; +case 8: +args[0] = ldq_phys(addr); +return H_SUCCESS; +} +return H_PARAMETER; +} + +static target_ulong h_logical_store(CPUState *env, sPAPREnvironment *spapr, +target_ulong opcode, target_ulong *args) +{ +target_ulong size = args[0]; +target_ulong addr = args[1]; +target_ulong val = args[2]; + +switch (size) { +case 1: +stb_phys(addr, val); +return H_SUCCESS; +case 2: +stw_phys(addr, val); +return H_SUCCESS; +case 4: +stl_phys(addr, val); +return H_SUCCESS; +case 8: +stq_phys(addr, val); +return H_SUCCESS; +} +return H_PARAMETER; +} + +static target_ulong h_logical_icbi(CPUState *env, 
sPAPREnvironment *spapr, + target_ulong opcode, target_ulong *args) +{ +/* Nothing to do on emulation, KVM will trap this in the kernel */ +return H_SUCCESS; +} + +static target_ulong h_logical_dcbf(CPUState *env, sPAPREnvironment *spapr, + target_ulong opcode, target_ulong *args) +{ +/* Nothing to do on emulation, KVM will trap this in the kernel */ +return H_SUCCESS; +} + static spapr_hcall_fn papr_hypercall_table[(MAX_HCALL_OPCODE / 4) + 1]; static spapr_hcall_fn kvmppc_hypercall_table[KVMPPC_HCALL_MAX - KVMPPC_HCALL_BASE + 1]; @@ -527,6 +588,18 @@ static void hypercall_init(void) spapr_register_hypercall(H_REGISTER_VPA, h_register_vpa); spapr_register_hypercall(H_CEDE, h_cede); +/* debugger hcalls (also used by SLOF). Note: We do -not- differenciate + * here between the CI and the CACHE variants, they will use whatever + * mapping attributes qemu is using. When using KVM, the kernel will + * enforce the attributes more strongly + */ +spapr_register_hypercall(H_LOGICAL_CI_LOAD, h_logical_load); +spapr_register_hypercall(H_LOGICAL_CI_STORE, h_logical_store); +spapr_register_hypercall(H_LOGICAL_CACHE_LOAD, h_logical_load); +spapr_register_hypercall(H_LOGICAL_CACHE_STORE, h_logical_store); +spapr_register_hypercall(H_LOGICAL_ICBI, h_logical_icbi); +spapr_register_hypercall(H_LOGICAL_DCBF, h_logical_dcbf); + /* qemu/KVM-PPC specific hcalls */ spapr_register_hypercall(KVMPPC_H_RTAS, h_rtas); } -- 1.6.0.2
[Qemu-devel] [PATCH 08/58] PPC: Set MPIC IDE for IPI to 0
We use the IDE register with IPIs as a mask to keep track of which
processors have already acknowledged the respective interrupt. So we need
to initialize it to 0 to make sure that it doesn't accidentally fire an IPI
on CPU0 when the first IPI is triggered.

Reported-by: Elie Richa ri...@adacore.com
Signed-off-by: Alexander Graf ag...@suse.de

---

v2 -> v3:

  - fix IDE IPI reset
---
 hw/openpic.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/hw/openpic.c b/hw/openpic.c
index 9710ac0..31ad175 100644
--- a/hw/openpic.c
+++ b/hw/openpic.c
@@ -1299,6 +1299,10 @@ static void mpic_reset (void *opaque)
         mpp->src[i].ipvp = 0x8080;
         mpp->src[i].ide = 0x0001;
     }
+    /* Set IDE for IPIs to 0 so we don't get spurious interrupts */
+    for (i = mpp->irq_ipi0; i < (mpp->irq_ipi0 + MAX_IPI); i++) {
+        mpp->src[i].ide = 0;
+    }
     /* Initialise IRQ destinations */
     for (i = 0; i < MAX_CPU; i++) {
         mpp->dst[i].pctp = 0x000F;
--
1.6.0.2
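To illustrate the reasoning in the commit message (this is a simplified model, not QEMU's openpic code, and all names in it are hypothetical): if the per-IPI mask carries one bit per CPU that still has to acknowledge, any bit left set at reset looks like a pending IPI for that CPU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not QEMU's data structure: one IPI source whose
 * `ide` field holds a bit per CPU that still has to acknowledge. */
typedef struct {
    uint32_t ide;
} IpiSource;

static inline void ipi_reset(IpiSource *s)
{
    s->ide = 0;     /* nobody is pending after reset */
}

/* Mark every CPU in cpu_mask as having this IPI pending. */
static inline void ipi_send(IpiSource *s, uint32_t cpu_mask)
{
    s->ide |= cpu_mask;
}

/* Clear the CPU's bit; returns true if the IPI really was pending. */
static inline bool ipi_ack(IpiSource *s, unsigned int cpu)
{
    uint32_t bit = UINT32_C(1) << cpu;
    bool was_pending = (s->ide & bit) != 0;

    s->ide &= ~bit;
    return was_pending;
}
```

In this model, a reset value with bit 0 set would make CPU0 appear pending as soon as any other CPU is targeted, which is the kind of spurious IPI the patch avoids.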
[Qemu-devel] [PATCH 29/58] MPC8544DS: Remove CPU nodes
We want to generate the CPU nodes in machine init code, so remove them from the device tree definition that we precompile. Signed-off-by: Alexander Graf ag...@suse.de --- pc-bios/mpc8544ds.dtb | Bin 2277 - 2028 bytes pc-bios/mpc8544ds.dts | 12 2 files changed, 0 insertions(+), 12 deletions(-) diff --git a/pc-bios/mpc8544ds.dtb b/pc-bios/mpc8544ds.dtb index ae318b1fe83846cc2e133951a3666fcfcdf87f79..c6d302153c7407d5d0127be29b0c35f80e47f8fb 100644 GIT binary patch delta 424 zcmaDV_=aEO0`I@K3=HgV7#J8V7#P?t0BH%76f7eAO-?P8KC%#jT*{~lRq;qVGNu+ zgGpO80wTx2Se#mvnV92XVrpOj5@H5o79dUoaVFO=n@yHu7E~+@qhp%%K^lVK%DC zOh63N(K9)OS(!0yas{(Dk?LOn)z6*G!y?7RuxYXeOPCPDVW4@8NM@d#Jb@*NiQ(ep zFDXnqh(mFQVQVF#G^2fdh6R3*?3eKTTP3F~Xi9v};53e2?KrxtWfnp$OF!E7 zLApVn2ZjurwO}FhPlr`dQBX*1n*4+n2~GZ2Ia}o?0Nb{iFxU%#SBTM#ky%lsfDGf edC8Rw$*DOxx|w+?sTB;#Ir+)i2u`Q9CHADZ%Xk1 delta 636 zcmaFE|5Q-p0`I@K3=AAk85kHW7#P?qfV2h3j(nK5CZ{YEPTIqlPkLJ!3$Ad1_IB zvyO$SiHU;Seh9~vH-DTazQCb0LJ$Paex5E4+OFmkod`He4yqApb%Vr6B@stfk6! z4_B}V%tP=uK19QJs6iW9+=rQJZnmWEm!T#^aN1n7mbC@*oFs0P!Ut)gQCAci^e z?LL0%0TrOh*s~wtStKu$plcqfdJG*M`*4%wa-|B0wQVBw?w^FPM{7?mdbu4v= zD`BxVle%fkldm(RuP2me-bdkuuD*+UPxfUqK2ntdV_z%P`#=!_^f#-u;0ETO z4yM|z{ml*!iFuFF?#X@wu$vAy2**j8L7HCnR%(Y#hF#944D`rFf}OBU`|P9Zfa6u zajI@wQEFjnYF=_BLsDrm5-L?KRFwTUzC`ao?6V1oSKuPo0*rA%2+WufPD@C{)cX6 diff --git a/pc-bios/mpc8544ds.dts b/pc-bios/mpc8544ds.dts index a88b47c..7eb3160 100644 --- a/pc-bios/mpc8544ds.dts +++ b/pc-bios/mpc8544ds.dts @@ -25,18 +25,6 @@ cpus { #address-cells = 1; #size-cells = 0; - - PowerPC,8544@0 { - device_type = cpu; - reg = 0x0; - d-cache-line-size = 32; // 32 bytes - i-cache-line-size = 32; // 32 bytes - d-cache-size = 0x8000;// L1, 32K - i-cache-size = 0x8000;// L1, 32K - timebase-frequency = 0; - bus-frequency = 0; - clock-frequency = 0; - }; }; memory { -- 1.6.0.2
Re: [Qemu-devel] [PATCH v8 3/4] block: add block timer and throttling algorithm
On Tue, Sep 13, 2011 at 11:09:46AM +0800, Zhi Yong Wu wrote:
> On Fri, Sep 9, 2011 at 10:44 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
>> On Thu, Sep 08, 2011 at 06:11:07PM +0800, Zhi Yong Wu wrote:
>>> Note:
>>> 1.) When bps/iops limits are specified to a small value such as
>>> 511 bytes/s, this VM will hang up. We are considering how to handle
>>> this scenario.
>>
>> You can increase the length of the slice, if the request is larger than
>> slice_time * bps_limit.
>
> Yeah, but it is a challenge for how to increase it. Do you have some nice idea?

If the queue is empty, and the request being processed does not fit the
queue, increase the slice so that the request fits.

That is, make BLOCK_IO_SLICE_TIME dynamic and adjust it as described
above (if the bps or io limits change, reset it to the default
BLOCK_IO_SLICE_TIME).

>>> 2.) When dd command is issued in guest, if its option bs is set to
>>> a large value such as bs=1024K, the result speed will be slightly
>>> bigger than the limits.
>>
>> Why?
>
> This issue has not existed. I will remove it.
> When drive bps=100, I did some testing on the guest VM.
> 1.) bs=1024K
> 18+0 records in
> 18+0 records out
> 18874368 bytes (19 MB) copied, 26.6268 s, 709 kB/s
> 2.) bs=2048K
> 18+0 records in
> 18+0 records out
> 37748736 bytes (38 MB) copied, 46.5336 s, 811 kB/s
>
>> There are lots of debugging leftovers in the patch.
>
> sorry, i forgot to remove them.
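The suggestion above (grow the slice until the request fits, fall back to the default otherwise) could be sketched roughly as follows. This is only an illustration: BLOCK_IO_SLICE_TIME comes from the patch under review, while the helper name, the nanosecond units and the way the default is passed in are assumptions.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper: return a slice length (ns) that is at least
 * default_slice_ns and long enough for `bytes` to fit under bps_limit.
 * Assumes bps_limit > 0 and below roughly 1e10 bytes/s, so that the
 * intermediate products stay within 64 bits. */
static inline uint64_t slice_for_request(uint64_t bytes, uint64_t bps_limit,
                                         uint64_t default_slice_ns)
{
    uint64_t whole_secs = bytes / bps_limit;
    uint64_t rem_bytes = bytes % bps_limit;
    /* Round the fractional part up so the slice never falls short. */
    uint64_t needed_ns = whole_secs * NSEC_PER_SEC +
        (rem_bytes * NSEC_PER_SEC + bps_limit - 1) / bps_limit;

    return needed_ns > default_slice_ns ? needed_ns : default_slice_ns;
}
```

With a 511 bytes/s limit, a single 512-byte request needs a slice of just over one second, whereas small requests under a generous limit keep the default. When the bps or iops limits change, the caller would simply go back to the default slice.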
[Qemu-devel] [PATCH 37/58] pseries: Add a phandle to the xicp interrupt controller device tree node
From: David Gibson da...@gibson.dropbear.id.au Future devices we will be adding to the pseries machine (e.g. PCI) will need nodes in the device tree which explicitly reference the top-level interrupt controller via interrupt-parent or interrupt-map properties. In order to do this, the interrupt controller node needs an assigned phandle. This patch adds the appropriate property, in preparation. Signed-off-by: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Alexander Graf ag...@suse.de --- hw/spapr.c |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/hw/spapr.c b/hw/spapr.c index 760e323..bb00ae6 100644 --- a/hw/spapr.c +++ b/hw/spapr.c @@ -57,6 +57,8 @@ #define MAX_CPUS256 #define XICS_IRQS 1024 +#define PHANDLE_XICP0x + sPAPREnvironment *spapr; static void *spapr_create_fdt_skel(const char *cpu_model, @@ -202,6 +204,9 @@ static void *spapr_create_fdt_skel(const char *cpu_model, _FDT((fdt_property(fdt, ibm,interrupt-server-ranges, interrupt_server_ranges_prop, sizeof(interrupt_server_ranges_prop; +_FDT((fdt_property_cell(fdt, #interrupt-cells, 2))); +_FDT((fdt_property_cell(fdt, linux,phandle, PHANDLE_XICP))); +_FDT((fdt_property_cell(fdt, phandle, PHANDLE_XICP))); _FDT((fdt_end_node(fdt))); -- 1.6.0.2
[Qemu-devel] [PATCH 52/58] ppc405: use RAM_ADDR_FMT instead of %08lx
From: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

The RAM_ADDR_FMT macro hides the type of ram_addr_t so that format
strings can be safely used. Make sure to use RAM_ADDR_FMT so that the
build works on 32-bit hosts with Xen enabled.

Whether Xen should affect ppc TCG targets is questionable but a separate
issue.

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 hw/ppc405_boards.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/ppc405_boards.c b/hw/ppc405_boards.c
index e6c8ac6..712a6be 100644
--- a/hw/ppc405_boards.c
+++ b/hw/ppc405_boards.c
@@ -213,7 +213,8 @@ static void ref405ep_init (ram_addr_t ram_size,
     sram_size = 512 * 1024;
     sram_offset = qemu_ram_alloc(NULL, "ef405ep.sram", sram_size);
 #ifdef DEBUG_BOARD_INIT
-    printf("%s: register SRAM at offset %08lx\n", __func__, sram_offset);
+    printf("%s: register SRAM at offset " RAM_ADDR_FMT "\n",
+           __func__, sram_offset);
 #endif
     cpu_register_physical_memory(0xFFF0, sram_size,
                                  sram_offset | IO_MEM_RAM);
@@ -357,7 +358,7 @@ static void ref405ep_init (ram_addr_t ram_size,
 #ifdef DEBUG_BOARD_INIT
     printf("%s: Done\n", __func__);
 #endif
-    printf("bdloc %016lx\n", (unsigned long)bdloc);
+    printf("bdloc " RAM_ADDR_FMT "\n", bdloc);
 }

 static QEMUMachine ref405ep_machine = {
--
1.6.0.2
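For readers unfamiliar with the idiom, here is a stand-alone sketch of a typedef paired with its own format macro. The names demo_addr_t and DEMO_ADDR_FMT are hypothetical stand-ins, and the 64-bit width is an assumption made for the example. QEMU's actual definition of RAM_ADDR_FMT is not reproduced here.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for ram_addr_t / RAM_ADDR_FMT: the typedef and
 * its format macro are defined side by side, so changing one without
 * the other is hard to miss. */
typedef uint64_t demo_addr_t;
#define DEMO_ADDR_FMT "%" PRIx64

/* Adjacent string literals are concatenated by the compiler, which is
 * what lets the macro sit in the middle of a format string. */
static inline int format_offset(char *buf, size_t len, demo_addr_t off)
{
    return snprintf(buf, len, "offset " DEMO_ADDR_FMT, off);
}
```

If the typedef later changes width, only the macro next to it needs updating, not every printf in the tree.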
[Qemu-devel] [PATCH 24/58] PPC: E500: Add PV spinning code
CPUs that are not the boot CPU need to run in spinning code to check if they should run off to execute and if so where to jump to. This usually happens by leaving secondary CPUs looping and checking if some variable in memory changed. In an environment like Qemu however we can be more clever. We can just export the spin table the primary CPU modifies as MMIO region that would event based wake up the respective secondary CPUs. That saves us quite some cycles while the secondary CPUs are not up yet. So this patch adds a PV device that simply exports the spinning table into the guest and thus allows the primary CPU to wake up secondary ones. Signed-off-by: Alexander Graf ag...@suse.de --- v1 - v2: - change into MMIO scheme - map the secondary NIP instead of 0 1:1 - only map 64MB for TLB, same as u-boot - prepare code for 64-bit spinnings v2 - v3: - remove r6 - set MAS2_M - map EA 0 - use second TLB1 entry v3 - v4: - change to memoryops v4 - v5: - fix endianness bugs --- Makefile.target|2 +- hw/ppce500_mpc8544ds.c | 33 - hw/ppce500_spin.c | 186 3 files changed, 216 insertions(+), 5 deletions(-) create mode 100644 hw/ppce500_spin.c diff --git a/Makefile.target b/Makefile.target index 2ed9099..3f689ce 100644 --- a/Makefile.target +++ b/Makefile.target @@ -247,7 +247,7 @@ endif obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o obj-ppc-y += ppc440.o ppc440_bamboo.o # PowerPC E500 boards -obj-ppc-y += ppce500_mpc8544ds.o mpc8544_guts.o +obj-ppc-y += ppce500_mpc8544ds.o mpc8544_guts.o ppce500_spin.o # PowerPC 440 Xilinx ML507 reference board. 
obj-ppc-y += virtex_ml507.o obj-ppc-$(CONFIG_KVM) += kvm_ppc.o diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c index 9379624..3b8b449 100644 --- a/hw/ppce500_mpc8544ds.c +++ b/hw/ppce500_mpc8544ds.c @@ -49,6 +49,7 @@ #define MPC8544_PCI_IO 0xE100 #define MPC8544_PCI_IOLEN 0x1 #define MPC8544_UTIL_BASE (MPC8544_CCSRBAR_BASE + 0xe) +#define MPC8544_SPIN_BASE 0xEF00 struct boot_info { @@ -164,6 +165,18 @@ static void mmubooke_create_initial_mapping(CPUState *env, tlb-mas7_3 |= MAS3_UR | MAS3_UW | MAS3_UX | MAS3_SR | MAS3_SW | MAS3_SX; } +static void mpc8544ds_cpu_reset_sec(void *opaque) +{ +CPUState *env = opaque; + +cpu_reset(env); + +/* Secondary CPU starts in halted state for now. Needs to change when + implementing non-kernel boot. */ +env-halted = 1; +env-exception_index = EXCP_HLT; +} + static void mpc8544ds_cpu_reset(void *opaque) { CPUState *env = opaque; @@ -172,6 +185,7 @@ static void mpc8544ds_cpu_reset(void *opaque) cpu_reset(env); /* Set initial guest state. */ +env-halted = 0; env-gpr[1] = (1620) - 8; env-gpr[3] = bi-dt_base; env-nip = bi-entry; @@ -199,7 +213,6 @@ static void mpc8544ds_init(ram_addr_t ram_size, unsigned int pci_irq_nrs[4] = {1, 2, 3, 4}; qemu_irq **irqs, *mpic; DeviceState *dev; -struct boot_info *boot_info; CPUState *firstenv = NULL; /* Setup CPUs */ @@ -234,9 +247,16 @@ static void mpc8544ds_init(ram_addr_t ram_size, env-spr[SPR_40x_TCR] = 1 26; /* Register reset handler */ -boot_info = g_malloc0(sizeof(struct boot_info)); -qemu_register_reset(mpc8544ds_cpu_reset, env); -env-load_info = boot_info; +if (!i) { +/* Primary CPU */ +struct boot_info *boot_info; +boot_info = g_malloc0(sizeof(struct boot_info)); +qemu_register_reset(mpc8544ds_cpu_reset, env); +env-load_info = boot_info; +} else { +/* Secondary CPUs */ +qemu_register_reset(mpc8544ds_cpu_reset_sec, env); +} } env = firstenv; @@ -289,6 +309,9 @@ static void mpc8544ds_init(ram_addr_t ram_size, } } +/* Register spinning region */ +sysbus_create_simple(e500-spin, 
MPC8544_SPIN_BASE, NULL); + /* Load kernel. */ if (kernel_filename) { kernel_size = load_uimage(kernel_filename, entry, loadaddr, NULL); @@ -321,6 +344,8 @@ static void mpc8544ds_init(ram_addr_t ram_size, /* If we're loading a kernel directly, we must load the device tree too. */ if (kernel_filename) { +struct boot_info *boot_info; + #ifndef CONFIG_FDT cpu_abort(env, Compiled without FDT support - can't load kernel\n); #endif diff --git a/hw/ppce500_spin.c b/hw/ppce500_spin.c new file mode 100644 index 000..38451ac --- /dev/null +++ b/hw/ppce500_spin.c @@ -0,0 +1,186 @@ +#include hw.h +#include sysemu.h +#include sysbus.h +#include kvm.h + +#define MAX_CPUS 32 + +typedef struct spin_info { +uint64_t addr; +uint64_t r3; +uint32_t resv; +uint32_t pir; +uint64_t reserved; +} __attribute__ ((packed)) SpinInfo; + +typedef struct spin_state { +SysBusDevice busdev; +
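As a quick sanity check on the layout quoted above (an illustration, not code from the patch): the SpinInfo fields add up to 0x20 bytes, which matches the per-CPU stride of 0x20 used for cpu-release-addr in patch 30/58 of this series. The spin_entry_addr() helper is hypothetical and takes the table base as a parameter.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field layout as the SpinInfo struct in the patch. */
typedef struct spin_info {
    uint64_t addr;
    uint64_t r3;
    uint32_t resv;
    uint32_t pir;
    uint64_t reserved;
} __attribute__ ((packed)) SpinInfo;

/* 8 + 8 + 4 + 4 + 8 bytes per entry. */
_Static_assert(sizeof(SpinInfo) == 0x20, "spin table entry must be 32 bytes");

/* Hypothetical helper: address of one CPU's entry in the spin table. */
static inline uint64_t spin_entry_addr(uint64_t spin_base, unsigned int cpu)
{
    return spin_base + (uint64_t)cpu * sizeof(SpinInfo);
}
```

The packed attribute is a GCC/Clang extension, as in the original. The static assertion makes a layout change fail at build time rather than at guest boot.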
[Qemu-devel] [PATCH 19/58] PPC: bamboo: Use kvm api for freq and clock frequencies
Now that we have nice and shiny APIs to read out the host's clock and timebase frequencies, let's use them in the bamboo code as well! Signed-off-by: Alexander Graf ag...@suse.de --- hw/ppc440_bamboo.c | 45 - 1 files changed, 12 insertions(+), 33 deletions(-) diff --git a/hw/ppc440_bamboo.c b/hw/ppc440_bamboo.c index 65d4f0f..1523764 100644 --- a/hw/ppc440_bamboo.c +++ b/hw/ppc440_bamboo.c @@ -31,38 +31,6 @@ #define FDT_ADDR 0x180 #define RAMDISK_ADDR 0x190 -#ifdef CONFIG_FDT -static int bamboo_copy_host_cell(void *fdt, const char *node, const char *prop) -{ -uint32_t cell; -int ret; - -ret = kvmppc_read_host_property(node, prop, cell, sizeof(cell)); -if (ret 0) { -fprintf(stderr, couldn't read host %s/%s\n, node, prop); -goto out; -} - -ret = qemu_devtree_setprop_cell(fdt, node, prop, cell); -if (ret 0) { -fprintf(stderr, couldn't set guest %s/%s\n, node, prop); -goto out; -} - -out: -return ret; -} - -static void bamboo_fdt_update(void *fdt) -{ -/* Copy data from the host device tree into the guest. Since the guest can - * directly access the timebase without host involvement, we must expose - * the correct frequencies. */ -bamboo_copy_host_cell(fdt, /cpus/cpu@0, clock-frequency); -bamboo_copy_host_cell(fdt, /cpus/cpu@0, timebase-frequency); -} -#endif - static int bamboo_load_device_tree(target_phys_addr_t addr, uint32_t ramsize, target_phys_addr_t initrd_base, @@ -75,6 +43,8 @@ static int bamboo_load_device_tree(target_phys_addr_t addr, char *filename; int fdt_size; void *fdt; +uint32_t tb_freq = 4; +uint32_t clock_freq = 4; filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, BINARY_DEVICE_TREE_FILE); if (!filename) { @@ -108,10 +78,19 @@ static int bamboo_load_device_tree(target_phys_addr_t addr, if (ret 0) fprintf(stderr, couldn't set /chosen/bootargs\n); +/* Copy data from the host device tree into the guest. Since the guest can + * directly access the timebase without host involvement, we must expose + * the correct frequencies. 
*/ if (kvm_enabled()) { -bamboo_fdt_update(fdt); +tb_freq = kvmppc_get_tbfreq(); +clock_freq = kvmppc_get_clockfreq(); } +qemu_devtree_setprop_cell(fdt, /cpus/cpu@0, clock-frequency, + clock_freq); +qemu_devtree_setprop_cell(fdt, /cpus/cpu@0, timebase-frequency, + tb_freq); + ret = rom_add_blob_fixed(BINARY_DEVICE_TREE_FILE, fdt, fdt_size, addr); g_free(fdt); -- 1.6.0.2