Re: [Qemu-devel] [PATCH 4/4] hw/ppc/spapr: Implement the h_page_init hypercall

2016-02-10 Thread Thomas Huth
On 11.02.2016 00:47, David Gibson wrote:
> On Wed, Feb 10, 2016 at 07:09:12PM +0100, Thomas Huth wrote:
>> This hypercall either initializes a page with zeros, or copies
>> another page.
>> According to LoPAPR, the i-cache of the page should also be
>> flushed when using H_ICACHE_INVALIDATE or H_ICACHE_SYNCHRONIZE,
>> but this is currently only done when running with TCG - assuming
>> the cache will be flushed with KVM anyway when switching back to
>> kernel / VM context.
> 
> I don't think that's true.  dcache and icache aren't usually flushed
> by kernel/user or even process context changes in Linux.  Cache
> control instructions aren't priveleged so, I think to get this right
> you'd need a helper which does dcbst and icbi across the page.

Oh, ok, didn't know/expect that ... I'll try to come up with a solution...

> I'm pretty sure libc needs to do this at several points, but alas I
> don't think there's an exported function to do it.

There seems to be at least a __builtin___clear_cache() function for
flushing the instruction cache (see
).

I haven't found anything similar for the data cache yet ... in the worst
case, I guess I need to write a wrapper with some inline assembly,
similar to kvmppc_eieio() in kvm_ppc.h ... would that be acceptable?

>> The code currently also does not explicitely flush the data cache
>> with H_ICACHE_SYNCHRONIZE, since this either also should be done
>> when switching back to kernel / VM context (with KVM), or not matter
>> anyway (for TCG).
>>
>> Signed-off-by: Thomas Huth 
>> ---
>>  hw/ppc/spapr_hcall.c | 42 ++
>>  1 file changed, 42 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>> index f14f849..91e703d 100644
>> --- a/hw/ppc/spapr_hcall.c
>> +++ b/hw/ppc/spapr_hcall.c
>> @@ -387,6 +387,47 @@ static target_ulong h_set_xdabr(PowerPCCPU *cpu, 
>> sPAPRMachineState *spapr,
>>  return H_SUCCESS;
>>  }
>>  
>> +static target_ulong h_page_init(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>> +target_ulong opcode, target_ulong *args)
>> +{
>> +target_ulong flags = args[0];
>> +target_ulong dest = args[1];
>> +target_ulong src = args[2];
>> +uint8_t buf[TARGET_PAGE_SIZE];
>> +
>> +if (flags & ~(H_ICACHE_SYNCHRONIZE | H_ICACHE_INVALIDATE
>> +  | H_COPY_PAGE | H_ZERO_PAGE)) {
>> +qemu_log_mask(LOG_UNIMP, "h_page_init: Bad flags (" TARGET_FMT_lx 
>> "\n",
>> +  flags);
> 
> This should return H_PARAMETER as well as logging, surely?

Yes, that's likely better.

>> +}
>> +
>> +if (!is_ram_address(spapr, dest) || (dest & ~TARGET_PAGE_MASK) != 0) {
>> +return H_PARAMETER;
>> +}
>> +
>> +if (flags & H_COPY_PAGE) {
>> +if (!is_ram_address(spapr, src) || (src & ~TARGET_PAGE_MASK) != 0) {
>> +return H_PARAMETER;
>> +}
>> +cpu_physical_memory_read(src, buf, TARGET_PAGE_SIZE);
>> +} else if (flags & H_ZERO_PAGE) {
>> +memset(buf, 0, TARGET_PAGE_SIZE);
>> +}
>> +
>> +if (flags & (H_COPY_PAGE | H_ZERO_PAGE)) {
>> +cpu_physical_memory_write(dest, buf, TARGET_PAGE_SIZE);
>> +}
> 
> Hmm, so this does 2 copies for an H_COPY_PAGE and a zero and a copy
> for H_ZERO_PAGE, which is going to be substantially slower than the
> caller might expect.

I guess I'd better use cpu_physical_memory_map and memset/memcpy.
I likely need cpu_physical_memory_map now anyway to be able to flush the
caches related to that page...

>> +if (flags & (H_ICACHE_SYNCHRONIZE | H_ICACHE_INVALIDATE)) {
>> +if (tcg_enabled()) {
>> +tb_flush(CPU(cpu));
>> +}
>> +/* XXX: Flush data cache for H_ICACHE_SYNCHRONIZE? */
>> +}
>> +
>> +return H_SUCCESS;
>> +}

Thanks for the review!

 Thomas




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 1/4] hw/ppc/spapr: Add h_set_sprg0 hypercall

2016-02-10 Thread Thomas Huth
On 11.02.2016 00:30, David Gibson wrote:
> On Wed, Feb 10, 2016 at 07:09:09PM +0100, Thomas Huth wrote:
>> This is a very simple hypercall that only sets up the SPRG0
>> register for the guest (since writing to SPRG0 was only permitted
>> to the hypervisor in older versions of the PowerISA).
>>
>> Signed-off-by: Thomas Huth 
>> ---
>>  hw/ppc/spapr_hcall.c | 15 +--
>>  1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>> index 12f8c33..58103ef 100644
>> --- a/hw/ppc/spapr_hcall.c
>> +++ b/hw/ppc/spapr_hcall.c
>> @@ -332,6 +332,15 @@ static target_ulong h_read(PowerPCCPU *cpu, 
>> sPAPRMachineState *spapr,
>>  return H_SUCCESS;
>>  }
>>  
>> +static target_ulong h_set_sprg0(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>> +target_ulong opcode, target_ulong *args)
>> +{
>> +CPUState *cs = CPU(cpu);
>> +
>> +set_spr(cs, SPR_SPRG0, args[0], -1L);
> 
> This looks correct, but I think set_spr() is serious overkill here.
> It does some fancy synchronization designed for setting one cpu's SPR
> from an hcall executed on a different CPU.  In this case the calling
> CPU is just setting its own SPRG0, so just
>   cpu_synchronize_state()
>   env->spr[SPR_SPRG0] = XXX
> 
> Should be sufficient.

AFAIK the synchronization stuff is skipped when set_spr() runs already
on the destination CPU, but ok, since h-calls should be fast, I can
change this anyway to save some precious cycles.

 Thomas




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [v10][PATCH 03/10] piix: create host bridge to passthrough

2016-02-10 Thread Alex Williamson
On Thu, 11 Feb 2016 02:25:51 +
"Kay, Allen M"  wrote:

> > -Original Message-
> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Tuesday, February 09, 2016 10:01 PM
> > To: Kay, Allen M 
> > Cc: ehabk...@redhat.com; m...@redhat.com;
> > stefano.stabell...@eu.citrix.com; qemu-devel@nongnu.org;
> > pbonz...@redhat.com; r...@twiddle.net; Ruan, Shuai
> > 
> > Subject: Re: [Qemu-devel] [v10][PATCH 03/10] piix: create host bridge to
> > passthrough
> > 
> > I can run that system as either primary or secondary.  On both systems I'm
> > using pci-stub to keep i915 from interfering on the host, on the SNB laptop 
> > I
> > also use video=efifb:off since it's configured for UEFI boot, I'm not sure 
> > if
> > that play any role in why it's not working.
> >   
> 
> Hi Alex,
> 
> My understand if that your IVB system is a desktop running legacy mode BIOS 
> and your SNB is a laptop running EFI mode BIOS, correct?  I'm curious how 
> does your SNB system getting hold of VBT table, which is needed for lighting 
> up local display.
> 
> There are two ways the driver can get VBT table:  1) VGA 0xc memory,  2) 
> OpRegion
> 
> On your IVB system, the guest driver would try to get VBT from 0xc memory 
> if IGD is configured as primary controller in the guest, assuming KVM/QEMU 
> copies VGA 0xc memory from host to guest 0xc area.   If IGD is 
> configured as secondary controller, the driver would get VBT from OpRegion.  
> Given IGD works in both configurations on IVB, this mean guest driver can 
> successfully read VBT from both 0xc area and OpRegion.
> 
> On your SNB system that is running in EFI mode on the host, 0xc memory 
> does not contain VBIOS/VBT.  This means if you configure IGD as primary 
> controller in the guest, the driver cannot get to VBT at 0xc memory even 
> if KVM/QEMU copies 0xc memory from host to guest.  If you configure IGD 
> as secondary controller in the guest, and guest driver should be able to read 
> VBT from the OpRegion.  You might want to trace i915 to see how does the  
> guest driver get to VBT via OpRegion in this configuration.

Hi Allen,

Success!  We don't directly pass 0xc from the host, but we can use
PCI option ROMs loaded via files into QEMU.  For the IVB system I
haven't been bothering with this, even in primary mode I've just been
letting the display initialize later in the guest boot.  I also haven't
been enabling VGA mode access in that case.  With the SNB laptop, if I
extract the ROM from sysfs, modify the device ID and fix the checksum,
the panel lights up, with or without VGA mode access (though of course
I only see a flash of SeaBIOS when VGA mode is enabled).  So it seems
that either the user is going to need to hack their own ROM file or I'm
going to need to make vfio do this automatically and pretend that the
ROM is actually implemented as a BAR on IGD.  The latter seems far more
accessible.  Lacking an actual ROM BAR, we'll also of course only be
able to do that when IGD is primary graphics on the host.

I'm not sure how we can do secondary mode in the VM with this config
though since only the primary graphics gets the coveted 0xc
location.  FWIW, to make this work I added 'romfile=igd.rom' to the
vfio-pci device with igd.rom being the modified copy retrieved from
sysfs.  Then I used '-vga none' to disable the QEMU primary VGA device
(-nodefaults is probably an option too).  I then added 'addr=2.0' to
the vfio-pci device to make it be the primary graphics from SeaBIOS'
perspective and added '-device secondary-vga,addr=3.0 -vnc :1', which
predominantly gives me a VNC connection where I can interact with the
VM via mouse and keyboard.

I'll need to do some further investigation to see what we're really
getting with the OpRegion.  Prior to adding the ROM, the Xorg log file
sure seems like it knew the LVDS panel was 1920x1080, but it might be
seeing more modes now that it has a video BIOS.  I'll also try to prune
the registers copied into the virtual host bridge and ISA bridge config
space to the minimum we need.

> Another potential problem to watch out for with laptop vs. desktop has to do 
> FLR timeout.  If you are working with a laptop with LCD panel attached (the 
> usual case), FLR will take much longer than 100ms to finish.  The reason is 
> the devices needs to power down the LCD panel before it can finish FLR. I 
> have seen it takes more than 500ms for FLR to finish.  As a work around, I 
> have increase FLR time out in Linux PCI driver to 1000ms when working with 
> LCD panels.   Given that you have seen evidences that IGD HW is working, this 
> might not be your issue.   I would focus tracing how does i915 get VBT from 
> the OpRegion when IGD is configured as secondary controller in the guest.

Ok, I did see evidence of this.  In my case the VM would always fail to
start on the first try with errors in dmesg indicating that vfio was
reading back -1 from config space of the device.  Presu

Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay

2016-02-10 Thread Pavel Dovgalyuk
> From: Kevin Wolf [mailto:kw...@redhat.com]
> Am 10.02.2016 um 13:51 hat Pavel Dovgalyuk geschrieben:
> > > From: Kevin Wolf [mailto:kw...@redhat.com]
> > > Am 10.02.2016 um 13:05 hat Pavel Dovgalyuk geschrieben:
> > > > > Am 09.02.2016 um 12:52 hat Pavel Dovgalyuk geschrieben:
> > > > > > > From: Kevin Wolf [mailto:kw...@redhat.com]
> > > > > > > But even this doesn't feel completely right, because block 
> > > > > > > drivers are
> > > > > > > already layered and there is no need to hardcode something 
> > > > > > > optional (and
> > > > > > > rarely used) in the hot code path that could just be another 
> > > > > > > layer.
> > > > > > >
> > > > > > > I assume that you know beforehand if you want to replay 
> > > > > > > something, so
> > > > > > > requiring you to configure your block devices with a replay 
> > > > > > > driver on
> > > > > > > top of the stack seems reasonable enough.
> > > > > >
> > > > > > I cannot use block drivers for this. When driver functions are 
> > > > > > used, QEMU
> > > > > > is already used coroutines (and probably started bottom halves).
> > > > > > Coroutines make execution non-deterministic.
> > > > > > That's why we have to intercept blk_aio_ functions, that are called
> > > > > > deterministically.
> > > > >
> > > > > What does "deterministic" mean in this context, i.e. what are your 
> > > > > exact
> > > > > requirements?
> > > >
> > > > "Deterministic" means that the replayed execution should run exactly
> > > > the same guest instructions in the same sequence, as in recording 
> > > > session.
> > >
> > > Okay. I think with this we can do better than what you have now.
> > >
> > > > > I don't think that coroutines introduce anything non-deterministic per
> > > > > se. Depending on what you mean by it, the block layer code paths in
> > > > > block.c may contain problematic code.
> > > >
> > > > They are non-deterministic if we need instruction-level accuracy.
> > > > Thread switching (and therefore callbacks and BH execution) is 
> > > > non-deterministic.
> > >
> > > Thread switching depends on an external event (the kernel scheduler
> > > deciding to switch), so agreed, if a thread switch ever influences what
> > > the guest sees, that would be a problem.
> > >
> > > Generally, however, callbacks and BHs don't involve a thread switch at
> > > all (BHs can be invoked from a different thread in theory, but we have
> > > very few of those cases and they shouldn't be visible for the guest).
> > > The same is true for coroutines, which are semantically equivalent to
> > > callbacks.
> > >
> > > > In two different executions these callbacks may happen at different 
> > > > moments of
> > > > time (counting in number of executed instructions).
> > > > All operations with virtual devices (including memory, interrupt 
> > > > controller,
> > > > and disk drive controller) should happen at deterministic moments of 
> > > > time
> > > > to be replayable.
> > >
> > > Right, so let's talk about what this external non-deterministic event
> > > really is.
> > >
> > > I think the only thing whose timing is unknown in the block layer is the
> > > completion of I/O requests. This non-determinism comes from the time the
> > > I/O syscalls made by the lowest layer (usually raw-posix) take.
> >
> > Right.
> >
> > > This means that we can add logic to remove the non-determinism at the
> > > point of our choice between raw-posix and the guest device emulation. A
> > > block driver on top is as good as anything else.
> > >
> > > While recording, this block driver would just pass the request to next
> > > lower layer (starting a request is deterministic, so it doesn't need to
> > > be logged) and once the request completes it logs it. While replaying,
> > > the completion of requests is delayed until we read it in the log; if we
> > > read it in the log and the request hasn't completed yet, we do a busy
> > > wait for it (while(!completed) aio_poll();).
> >
> > I tried serializing all bottom halves and worker thread callbacks in
> > previous version of the patches. That code was much more complicated
> > and error-prone than the current version. We had to classify all bottom
> > halves to recorded and non-recorded (because sometimes they are used
> > for qemu's purposes, not the guest ones).
> >
> > However, I don't understand yet which layer do you offer as the candidate
> > for record/replay? What functions should be changed?
> > I would like to investigate this way, but I don't got it yet.
> 
> At the core, I wouldn't change any existing function, but introduce a
> new block driver. You could copy raw_bsd.c for a start and then tweak
> it. Leave out functions that you don't want to support, and add the
> necessary magic to .bdrv_co_readv/writev.
> 
> Something like this (can probably be generalised for more than just
> reads as the part after the bdrv_co_reads() call should be the same for
> reads, writes and any other request types):
> 
> int blkreplay_co_readv()
> {
> BlockRe

Re: [Qemu-devel] [PATCH] w32: include winsock2.h before windows.h

2016-02-10 Thread Michael Tokarev
09.02.2016 17:16, Paolo Bonzini wrote:
> Recent Fedora complains while compiling ui/sdl.c:
> 
> /usr/x86_64-w64-mingw32/sys-root/mingw/include/winsock2.h:15:2: warning: 
> #warning Please include winsock2.h before windows.h [-Wcpp]
> 
> And with this patch we dutifully obey.

Applied to -trivial, adding Stefan's comments to the commit
message too.

Thanks,

/mjt



Re: [Qemu-devel] [PATCH] char: fix repeated registration of tcp chardev I/O handlers

2016-02-10 Thread Michael Tokarev
Applied to -trivial, thanks!

/mjt



Re: [Qemu-devel] [PATCH] Adds keycode 86 to the hid_usage_keys translation table.

2016-02-10 Thread Michael Tokarev
04.02.2016 22:03, Daniel Serpell wrote:
> This key is present in international keyboards, between left shift and
> the 'Z' key, ant is described in the HID usage tables as "Keyboard
> Non-US \ and |": http://www.usb.org/developers/hidpage/Hut1_12v2.pdf
> 
> This patch fixes the usb-kbd devices.

Applied to -trivial, thanks!

/mjt



Re: [Qemu-devel] [v10][PATCH 03/10] piix: create host bridge to passthrough

2016-02-10 Thread Kay, Allen M

> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Tuesday, February 09, 2016 10:01 PM
> To: Kay, Allen M 
> Cc: ehabk...@redhat.com; m...@redhat.com;
> stefano.stabell...@eu.citrix.com; qemu-devel@nongnu.org;
> pbonz...@redhat.com; r...@twiddle.net; Ruan, Shuai
> 
> Subject: Re: [Qemu-devel] [v10][PATCH 03/10] piix: create host bridge to
> passthrough
> 
> I can run that system as either primary or secondary.  On both systems I'm
> using pci-stub to keep i915 from interfering on the host, on the SNB laptop I
> also use video=efifb:off since it's configured for UEFI boot, I'm not sure if
> that play any role in why it's not working.
> 

Hi Alex,

My understand if that your IVB system is a desktop running legacy mode BIOS and 
your SNB is a laptop running EFI mode BIOS, correct?  I'm curious how does your 
SNB system getting hold of VBT table, which is needed for lighting up local 
display.

There are two ways the driver can get VBT table:  1) VGA 0xc memory,  2) 
OpRegion

On your IVB system, the guest driver would try to get VBT from 0xc memory 
if IGD is configured as primary controller in the guest, assuming KVM/QEMU 
copies VGA 0xc memory from host to guest 0xc area.   If IGD is 
configured as secondary controller, the driver would get VBT from OpRegion.  
Given IGD works in both configurations on IVB, this mean guest driver can 
successfully read VBT from both 0xc area and OpRegion.

On your SNB system that is running in EFI mode on the host, 0xc memory does 
not contain VBIOS/VBT.  This means if you configure IGD as primary controller 
in the guest, the driver cannot get to VBT at 0xc memory even if KVM/QEMU 
copies 0xc memory from host to guest.  If you configure IGD as secondary 
controller in the guest, and guest driver should be able to read VBT from the 
OpRegion.  You might want to trace i915 to see how does the  guest driver get 
to VBT via OpRegion in this configuration.

Another potential problem to watch out for with laptop vs. desktop has to do 
FLR timeout.  If you are working with a laptop with LCD panel attached (the 
usual case), FLR will take much longer than 100ms to finish.  The reason is the 
devices needs to power down the LCD panel before it can finish FLR. I have seen 
it takes more than 500ms for FLR to finish.  As a work around, I have increase 
FLR time out in Linux PCI driver to 1000ms when working with LCD panels.   
Given that you have seen evidences that IGD HW is working, this might not be 
your issue.   I would focus tracing how does i915 get VBT from the OpRegion 
when IGD is configured as secondary controller in the guest.

Allen


> I've tried adding a bunch more registers on the SNB system to see if it helps,
> for the host bridge:
> 
> {0x50,2},
> {0x52,2},
> {0xa4,4},
> {0xa8,4},
> 
> (ie. the registers Tiejun added with this patch)
> 
> And on the ISA bridge:
> 
> {0x40,4},
> {0x44,4},
> {0x48,4},
> {0x4c,4},
> {0xf0,4},
> 
> These were registers I saw accessed with tracing enabled, but nothing
> seemed to change by adding them. I don't see any accesses to the BDSM
> register on the host bridge, but I do see GMCH accesses.  Sort of interesting
> is that the guest reads 201h from that register and tries to write 203h. In 
> the
> read case we have a 2MB GGMS and GGCLCK set, the guest tries to set IVD
> (IGD VGA Disable). I'm not sure why it bothers to try to do this with GGCLCK
> indicated since the register is locked, but it is slightly troubling that the 
> spec
> indicates that the BIOS must
> *not* set IVD to zero (VGA enabled) if GMS pre-allocates no memory, which
> is exactly what we're doing to try to indicate no stolen memory.
> If we could actually disable VGA on IGD, that might be a nice option, but I
> know there are issues with that (iirc, there's no disable for one of either 
> the
> VGA MMIO or I/O port space, so the only option is to disable that address
> space at the PCI command register, which has obvious implications - I think it
> was probably I/O port space).
> 
> I'm currently trying a Linux live CD on the guest for the SNB, mostly because 
> I
> can see driver error messages and look at the driver source, neither of which
> are available to me for the Windows driver.  Both of the driver warnings I get
> are from the driver thinking a certain feature should be disabled but finds it
> enabled.  They're also sort of on a strange drm release path, maybe due to
> being in secondary mode and Xorg not being configured to use the IGD on
> this image.  In case it looks familiar:
> 
> [   12.397908] [ cut here ]
> [   12.399175] WARNING: CPU: 0 PID: 1135 at
> drivers/gpu/drm/i915/intel_display.c:1259 assert_fdi_rx_pll+0x78/0xb0
> [i915]()
> [   12.401720] FDI RX PLL assertion failure (expected off

Re: [Qemu-devel] [Qemu-ppc] [PATCH] spapr: skip configuration section during migration of older machines

2016-02-10 Thread David Gibson
On Wed, Feb 10, 2016 at 09:26:12PM +0100, Greg Kurz wrote:
> On Mon, 08 Feb 2016 16:59:47 +0100
> Greg Kurz  wrote:
> > Since QEMU 2.4, we have a configuration section in the migration stream.
> > This must be skipped for older machines, like it is already done for x86.
> > 
> 
> Ouch ! It is more complex than I thought... the migration of pseries-2.3
> machine is already broken between QEMU-2.3 and QEMU-2.4. So this patch
> fixes indeed migration of a pseries-2.3 machine from QEMU-2.3, but it
> breaks migration of the same machine from QEMU-2.4 and up.
> 
> Not sure how to deal with that... is it reasonable to assume that
> pseries-2.3 running with QEMU-2.3 is the common case ? If so, this
> patch would bring more help than harm.

Ouch.  I really have no idea what would be the more common case.  I
doubt there are many people using old upstream versions in anger, but
I can't be sure of course.

> 
> > Fixes: 61964c23e5ddd5a33f15699e45ce126f879e3e33
> > Cc: qemu-sta...@nongnu.org
> > Signed-off-by: Greg Kurz 
> > ---
> >  hw/ppc/spapr.c |1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 5bd8fd3ef842..bca7cb8a5d27 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -2446,6 +2446,7 @@ static void 
> > spapr_machine_2_3_instance_options(MachineState *machine)
> >  spapr_machine_2_4_instance_options(machine);
> >  savevm_skip_section_footers();
> >  global_state_set_optional();
> > +savevm_skip_configuration();
> >  }
> > 
> >  static void spapr_machine_2_3_class_options(MachineClass *mc)
> > 
> > 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH] migration: ensure htab_save_first completes after timeout

2016-02-10 Thread David Gibson
On Wed, Feb 10, 2016 at 10:20:40AM -0800, Jianjun Duan wrote:
> htab_save_first_pass could return without finishing its work due to
> timeout. The patch checks if another invocation of it is necessary and
> will call it in htab_save_complete if necessary.
> 
> Signed-off-by: Jianjun Duan 

Looks good, applied to ppc-for-2.6 (with a small fix to remove the
overly long line).

> ---
>  hw/ppc/spapr.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 6bfb908..f6b6749 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1317,6 +1317,7 @@ static int htab_save_setup(QEMUFile *f, void *opaque)
>  static void htab_save_first_pass(QEMUFile *f, sPAPRMachineState *spapr,
>   int64_t max_ns)
>  {
> +bool has_timeout = max_ns != -1;
>  int htabslots = HTAB_SIZE(spapr) / HASH_PTE_SIZE_64;
>  int index = spapr->htab_save_index;
>  int64_t starttime = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> @@ -1350,7 +1351,7 @@ static void htab_save_first_pass(QEMUFile *f, 
> sPAPRMachineState *spapr,
>  qemu_put_buffer(f, HPTE(spapr->htab, chunkstart),
>  HASH_PTE_SIZE_64 * n_valid);
>  
> -if ((qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - starttime) > 
> max_ns) {
> +if (has_timeout && (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - 
> starttime) > max_ns) {
>  break;
>  }
>  }
> @@ -1503,6 +1504,9 @@ static int htab_save_complete(QEMUFile *f, void *opaque)
>  close(spapr->htab_fd);
>  spapr->htab_fd = -1;
>  } else {
> +if (spapr->htab_first_pass) {
> +htab_save_first_pass(f, spapr, -1);
> +}
>  htab_save_later_pass(f, spapr, -1);
>  }
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCHv2 0/7] Cleanups to Hash Page Table handling

2016-02-10 Thread David Gibson
On Tue, Feb 09, 2016 at 12:12:19PM +1000, David Gibson wrote:
> This contains some assorted cleanups and small improvements to the
> management of the Hash Page Table for 64-bit ppc systems, and the
> "pseries" machine type in particular.
> 
> These were devised in the context of getting hash page table resizing
> working, but can stand on their own.
> 
> I think this is ready to go, if I get an R-b for 4 & 5 / 7 and no
> objections to the rest I'll merge into ppc-for-2.6
> 
> Changes in v2:
>  * Split patch 4/6 into two (4,5/7) with helper for HPT size separate
>from moving the HPT initialization.

I've merged 1-6 into ppc-for-2.6.  I discovered a bug in 7/7, so I'll
post a fixed version later.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH] migration: reorder code to make it symmetric

2016-02-10 Thread Wei Yang
Hello everyone,

Is this one correct?

On Thu, Feb 04, 2016 at 10:50:30PM +, Wei Yang wrote:
>In qemu_savevm_state_complete_precopy(), it iterates on each device to add
>a json object and transfer related status to destination, while the order
>of the last two steps could be refined.
>
>Current order:
>
>json_start_object()
>   save_section_header()
>   vmstate_save()
>json_end_object()
>   save_section_footer()
>
>After the change:
>
>json_start_object()
>   save_section_header()
>   vmstate_save()
>   save_section_footer()
>json_end_object()
>
>This patch reorder the code to to make it symmetric. No functional change.
>
>Signed-off-by: Wei Yang 
>---
> migration/savevm.c |5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
>diff --git a/migration/savevm.c b/migration/savevm.c
>index b9caf59..42bfab4 100644
>--- a/migration/savevm.c
>+++ b/migration/savevm.c
>@@ -1088,12 +1088,11 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, 
>bool iterable_only)
> json_prop_int(vmdesc, "instance_id", se->instance_id);
> 
> save_section_header(f, se, QEMU_VM_SECTION_FULL);
>-
> vmstate_save(f, se, vmdesc);
>-
>-json_end_object(vmdesc);
> trace_savevm_section_end(se->idstr, se->section_id, 0);
> save_section_footer(f, se);
>+
>+json_end_object(vmdesc);
> }
> 
> if (!in_postcopy) {
>-- 
>1.7.9.5

-- 
Wei Yang
Help you, Help me



[Qemu-devel] [PATCH] vfio/pci: replace 1 with PCI_CAP_LIST_NEXT to make code self-explain

2016-02-10 Thread Wei Yang
Use the macro PCI_CAP_LIST_NEXT instead of 1, so that the code would be
more self-explain.

This patch makes this change and also fixs one typo in comment.

Signed-off-by: Wei Yang 
---
 hw/vfio/pci.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 1fb868c..17d1462 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1472,7 +1472,7 @@ static uint8_t vfio_std_cap_max_size(PCIDevice *pdev, 
uint8_t pos)
 uint8_t tmp, next = 0xff;
 
 for (tmp = pdev->config[PCI_CAPABILITY_LIST]; tmp;
- tmp = pdev->config[tmp + 1]) {
+ tmp = pdev->config[tmp + PCI_CAP_LIST_NEXT]) {
 if (tmp > pos && tmp < next) {
 next = tmp;
 }
@@ -1661,7 +1661,7 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t 
pos)
 int ret;
 
 cap_id = pdev->config[pos];
-next = pdev->config[pos + 1];
+next = pdev->config[pos + PCI_CAP_LIST_NEXT];
 
 /*
  * If it becomes important to configure capabilities to their actual
@@ -1675,7 +1675,7 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t 
pos)
  * pci_add_capability always inserts the new capability at the head
  * of the chain.  Therefore to end up with a chain that matches the
  * physical device, we insert from the end by making this recursive.
- * This is also why we pre-caclulate size above as cached config space
+ * This is also why we pre-calculate size above as cached config space
  * will be changed as we unwind the stack.
  */
 if (next) {
@@ -1691,7 +1691,7 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t 
pos)
 }
 
 /* Use emulated next pointer to allow dropping caps */
-pci_set_byte(vdev->emulated_config_bits + pos + 1, 0xff);
+pci_set_byte(vdev->emulated_config_bits + pos + PCI_CAP_LIST_NEXT, 0xff);
 
 switch (cap_id) {
 case PCI_CAP_ID_MSI:
-- 
2.5.0




Re: [Qemu-devel] [PATCH 1/4] hw/ppc/spapr: Add h_set_sprg0 hypercall

2016-02-10 Thread David Gibson
On Wed, Feb 10, 2016 at 07:09:09PM +0100, Thomas Huth wrote:
> This is a very simple hypercall that only sets up the SPRG0
> register for the guest (since writing to SPRG0 was only permitted
> to the hypervisor in older versions of the PowerISA).
> 
> Signed-off-by: Thomas Huth 
> ---
>  hw/ppc/spapr_hcall.c | 15 +--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 12f8c33..58103ef 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -332,6 +332,15 @@ static target_ulong h_read(PowerPCCPU *cpu, 
> sPAPRMachineState *spapr,
>  return H_SUCCESS;
>  }
>  
> +static target_ulong h_set_sprg0(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> +target_ulong opcode, target_ulong *args)
> +{
> +CPUState *cs = CPU(cpu);
> +
> +set_spr(cs, SPR_SPRG0, args[0], -1L);

This looks correct, but I think set_spr() is serious overkill here.
It does some fancy synchronization designed for setting one cpu's SPR
from an hcall executed on a different CPU.  In this case the calling
CPU is just setting its own SPRG0, so just
cpu_synchronize_state()
env->spr[SPR_SPRG0] = XXX

Should be sufficient.

> +return H_SUCCESS;
> +}
> +
>  static target_ulong h_set_dabr(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> target_ulong opcode, target_ulong *args)
>  {
> @@ -997,6 +1006,10 @@ static void hypercall_register_types(void)
>  spapr_register_hypercall(H_REGISTER_VPA, h_register_vpa);
>  spapr_register_hypercall(H_CEDE, h_cede);
>  
> +/* processor register resource access h-calls */
> +spapr_register_hypercall(H_SET_SPRG0, h_set_sprg0);
> +spapr_register_hypercall(H_SET_MODE, h_set_mode);
> +
>  /* "debugger" hcalls (also used by SLOF). Note: We do -not- differenciate
>   * here between the "CI" and the "CACHE" variants, they will use whatever
>   * mapping attributes qemu is using. When using KVM, the kernel will
> @@ -1013,8 +1026,6 @@ static void hypercall_register_types(void)
>  /* qemu/KVM-PPC specific hcalls */
>  spapr_register_hypercall(KVMPPC_H_RTAS, h_rtas);
>  
> -spapr_register_hypercall(H_SET_MODE, h_set_mode);
> -
>  /* ibm,client-architecture-support support */
>  spapr_register_hypercall(KVMPPC_H_CAS, h_client_architecture_support);
>  }

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH 4/4] hw/ppc/spapr: Implement the h_page_init hypercall

2016-02-10 Thread David Gibson
On Wed, Feb 10, 2016 at 07:09:12PM +0100, Thomas Huth wrote:
> This hypercall either initializes a page with zeros, or copies
> another page.
> According to LoPAPR, the i-cache of the page should also be
> flushed when using H_ICACHE_INVALIDATE or H_ICACHE_SYNCHRONIZE,
> but this is currently only done when running with TCG - assuming
> the cache will be flushed with KVM anyway when switching back to
> kernel / VM context.

I don't think that's true.  dcache and icache aren't usually flushed
by kernel/user or even process context changes in Linux.  Cache
control instructions aren't priveleged so, I think to get this right
you'd need a helper which does dcbst and icbi across the page.

I'm pretty sure libc needs to do this at several points, but alas I
don't think there's an exported function to do it.

> The code currently also does not explicitely flush the data cache
> with H_ICACHE_SYNCHRONIZE, since this either also should be done
> when switching back to kernel / VM context (with KVM), or not matter
> anyway (for TCG).
> 
> Signed-off-by: Thomas Huth 
> ---
>  hw/ppc/spapr_hcall.c | 42 ++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index f14f849..91e703d 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -387,6 +387,47 @@ static target_ulong h_set_xdabr(PowerPCCPU *cpu, 
> sPAPRMachineState *spapr,
>  return H_SUCCESS;
>  }
>  
> +static target_ulong h_page_init(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> +target_ulong opcode, target_ulong *args)
> +{
> +target_ulong flags = args[0];
> +target_ulong dest = args[1];
> +target_ulong src = args[2];
> +uint8_t buf[TARGET_PAGE_SIZE];
> +
> +if (flags & ~(H_ICACHE_SYNCHRONIZE | H_ICACHE_INVALIDATE
> +  | H_COPY_PAGE | H_ZERO_PAGE)) {
> +qemu_log_mask(LOG_UNIMP, "h_page_init: Bad flags (" TARGET_FMT_lx 
> "\n",
> +  flags);

This should return H_PARAMETER as well as logging, surely?

> +}
> +
> +if (!is_ram_address(spapr, dest) || (dest & ~TARGET_PAGE_MASK) != 0) {
> +return H_PARAMETER;
> +}
> +
> +if (flags & H_COPY_PAGE) {
> +if (!is_ram_address(spapr, src) || (src & ~TARGET_PAGE_MASK) != 0) {
> +return H_PARAMETER;
> +}
> +cpu_physical_memory_read(src, buf, TARGET_PAGE_SIZE);
> +} else if (flags & H_ZERO_PAGE) {
> +memset(buf, 0, TARGET_PAGE_SIZE);
> +}
> +
> +if (flags & (H_COPY_PAGE | H_ZERO_PAGE)) {
> +cpu_physical_memory_write(dest, buf, TARGET_PAGE_SIZE);
> +}

Hmm, so this does 2 copies for an H_COPY_PAGE and a zero and a copy
for H_ZERO_PAGE, which is going to be substantially slower than the
caller might expect.

> +if (flags & (H_ICACHE_SYNCHRONIZE | H_ICACHE_INVALIDATE)) {
> +if (tcg_enabled()) {
> +tb_flush(CPU(cpu));
> +}
> +/* XXX: Flush data cache for H_ICACHE_SYNCHRONIZE? */
> +}
> +
> +return H_SUCCESS;
> +}
> +
>  #define FLAGS_REGISTER_VPA 0x2000ULL
>  #define FLAGS_REGISTER_DTL 0x4000ULL
>  #define FLAGS_REGISTER_SLBSHADOW   0x6000ULL
> @@ -1046,6 +1087,7 @@ static void hypercall_register_types(void)
>  spapr_register_hypercall(H_SET_SPRG0, h_set_sprg0);
>  spapr_register_hypercall(H_SET_DABR, h_set_dabr);
>  spapr_register_hypercall(H_SET_XDABR, h_set_xdabr);
> +spapr_register_hypercall(H_PAGE_INIT, h_page_init);
>  spapr_register_hypercall(H_SET_MODE, h_set_mode);
>  
>  /* "debugger" hcalls (also used by SLOF). Note: We do -not- differenciate

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH 2/4] hw/ppc/spapr: Implement h_set_dabr

2016-02-10 Thread David Gibson
On Wed, Feb 10, 2016 at 07:09:10PM +0100, Thomas Huth wrote:
> According to LoPAPR, h_set_dabr should simply set DABRX to 3
> (if the register is available), and load the parameter into DABR.
> If DABRX is not available, the hypervisor has to check the
> "Breakpoint Translation" bit of the DABR register first.
> 
> Signed-off-by: Thomas Huth 
> ---
>  hw/ppc/spapr_hcall.c | 26 +-
>  1 file changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 58103ef..6b9d512 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -38,6 +38,12 @@ static void set_spr(CPUState *cs, int spr, target_ulong 
> value,
>  run_on_cpu(cs, do_spr_sync, &s);
>  }
>  
> +static bool has_spr(PowerPCCPU *cpu, int spr)
> +{
> +/* We can test whether the SPR is defined by checking for a valid name */
> +return cpu->env.spr_cb[spr].name != NULL;
> +}
> +
>  static inline bool valid_pte_index(CPUPPCState *env, target_ulong pte_index)
>  {
>  /*
> @@ -344,8 +350,20 @@ static target_ulong h_set_sprg0(PowerPCCPU *cpu, 
> sPAPRMachineState *spapr,
>  static target_ulong h_set_dabr(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> target_ulong opcode, target_ulong *args)
>  {
> -/* FIXME: actually implement this */
> -return H_HARDWARE;
> +CPUState *cs = CPU(cpu);
> +
> +if (!has_spr(cpu, SPR_DABR)) {
> +return H_HARDWARE;  /* DABR register not available */
> +}
> +
> +if (has_spr(cpu, SPR_DABRX)) {
> +set_spr(cs, SPR_DABRX, 0x3, -1L);

As in the previous patch, we shouldn't need the fancy run_on_cpu stuff
that set_spr() uses, since this is just a cpu local register update.

> +} else if (!(args[0] & 0x4)) {  /* Breakpoint Translation set? */
> +return H_RESERVED_DABR;
> +}
> +
> +set_spr(cs, SPR_DABR, args[0], -1L);
> +return H_SUCCESS;
>  }
>  
>  #define FLAGS_REGISTER_VPA 0x2000ULL
> @@ -999,15 +1017,13 @@ static void hypercall_register_types(void)
>  /* hcall-bulk */
>  spapr_register_hypercall(H_BULK_REMOVE, h_bulk_remove);
>  
> -/* hcall-dabr */
> -spapr_register_hypercall(H_SET_DABR, h_set_dabr);
> -
>  /* hcall-splpar */
>  spapr_register_hypercall(H_REGISTER_VPA, h_register_vpa);
>  spapr_register_hypercall(H_CEDE, h_cede);
>  
>  /* processor register resource access h-calls */
>  spapr_register_hypercall(H_SET_SPRG0, h_set_sprg0);
> +spapr_register_hypercall(H_SET_DABR, h_set_dabr);
>  spapr_register_hypercall(H_SET_MODE, h_set_mode);
>  
>  /* "debugger" hcalls (also used by SLOF). Note: We do -not- differenciate

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH 3/4] hw/ppc/spapr: Implement the h_set_xdabr hypercall

2016-02-10 Thread David Gibson
On Wed, Feb 10, 2016 at 07:09:11PM +0100, Thomas Huth wrote:
> The H_SET_XDABR hypercall is similar to H_SET_DABR, but also sets
> the extended DABR (DABRX) register.
> 
> Signed-off-by: Thomas Huth 
> ---
>  hw/ppc/spapr_hcall.c | 22 ++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 6b9d512..f14f849 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -366,6 +366,27 @@ static target_ulong h_set_dabr(PowerPCCPU *cpu, 
> sPAPRMachineState *spapr,
>  return H_SUCCESS;
>  }
>  
> +static target_ulong h_set_xdabr(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> +target_ulong opcode, target_ulong *args)
> +{
> +CPUState *cs = CPU(cpu);
> +target_ulong dabrx = args[1];
> +
> +if (!has_spr(cpu, SPR_DABR) || !has_spr(cpu, SPR_DABRX)) {
> +return H_HARDWARE;
> +}
> +
> +if ((dabrx & ~0xfULL) != 0 || (dabrx & H_DABRX_HYPERVISOR) != 0
> +|| (dabrx & (H_DABRX_KERNEL | H_DABRX_USER)) == 0) {
> +return H_PARAMETER;
> +}
> +
> +set_spr(cs, SPR_DABRX, dabrx, -1L);
> +set_spr(cs, SPR_DABR, args[0], -1L);

Same comments for set_spr().

> +
> +return H_SUCCESS;
> +}
> +
>  #define FLAGS_REGISTER_VPA 0x2000ULL
>  #define FLAGS_REGISTER_DTL 0x4000ULL
>  #define FLAGS_REGISTER_SLBSHADOW   0x6000ULL
> @@ -1024,6 +1045,7 @@ static void hypercall_register_types(void)
>  /* processor register resource access h-calls */
>  spapr_register_hypercall(H_SET_SPRG0, h_set_sprg0);
>  spapr_register_hypercall(H_SET_DABR, h_set_dabr);
> +spapr_register_hypercall(H_SET_XDABR, h_set_xdabr);
>  spapr_register_hypercall(H_SET_MODE, h_set_mode);
>  
>  /* "debugger" hcalls (also used by SLOF). Note: We do -not- differenciate

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH] memory: fix usage of find_next_bit and find_next_zero_bit

2016-02-10 Thread Peter Maydell
On 10 February 2016 at 14:11, Paolo Bonzini  wrote:
> The last two arguments to these functions are the last and first bit to
> check relative to the base.  The code was using incorrectly the first
> bit and the number of bits.  Fix this in cpu_physical_memory_get_dirty
> and cpu_physical_memory_all_dirty.  This requires a few changes in the
> iteration; change the code in cpu_physical_memory_set_dirty_range to
> match.
>
> Fixes: 5b82b70
> Cc: Stefan Hajnoczi 
> Signed-off-by: Paolo Bonzini 

I've set the pre-merge tests running so I should be able
to commit this to master first thing tomorrow. Was this
the only patch that needs applying to fix your pullreq?

thanks
-- PMM



Re: [Qemu-devel] [PATCH 1/2] qemu-iotests: add _filter_qmp_events() for filtering out QMP events

2016-02-10 Thread Eric Blake
On 02/10/2016 11:52 AM, Sascha Silbe wrote:
> Dear Max,
> 
> Max Reitz  writes:
> 
>>> +# remove QMP events from output
>>> +_filter_qmp_events()
>>> +{
>>> +sed -e '/^{\(.*, \)"event": ".*}$/ d'
>>> +}
>>
>> There is a pretty good reason test 067 uses -qmp-pretty (as you yourself
>> say, the lines get pretty long otherwise, and if we have any change
>> within, the whole line needs to be changed).
> 
> Additionally, it's a lot easier to read when indented properly,
> especially with the block info containing nested dicts.
> 
>> Using the following ugly
>> piece of code here instead, we would still be able to use it:
>>
>> tr '\n' '\t' \
>>   | sed -e
>> 's/{\s*"timestamp":\s*{[^}]*},\s*"event":[^,}]*\(,\s*"data":\s*{[^}]*}\)\?\s*}\s*//g'
>> \
>>   | tr '\t' '\n'
> 
> Nice trick. Why didn't I come up with it? ;)

Mishandles any event whose data includes nested dicts.  But a little bit
more creativity can do it in two passes (the first modifies the stream
to mark the start of an event, the second then does multiline matching
to nuke the entire event):

tr '\n' '\t' |
  sed 's/^{\(\s*"timestamp":\s*{[^}]*},\s*"event":\)/{MARK\1/g' |
  tr '\t' '\n' |
  sed '/^{MARK/,/^}/d'


> It definitely is a bit ugly, though. We can't just drop the entire line
> (using "d") as the entire stream now is a single line. Matching
> parenthesis pairs is context sensitive, so we can't just use regular
> expressions to aggregate results into lines. And before I start
> implementing a JSON indenter in awk, I'd rather rewrite the whole test
> in Python. So if we stay with the shell test for now, we need something
> like your incantation above. It's not perfect, but good enough for now
> and I can't think of anything significantly simpler right now either.
> 
> Will test your version and send a v2. Thanks for the suggestion!
> 
> Sascha
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PULL 48/49] i386: populate floppy drive information in DSDT

2016-02-10 Thread John Snow


On 02/10/2016 12:33 PM, Roman Kagan wrote:
> On Wed, Feb 10, 2016 at 12:16:32PM -0500, John Snow wrote:
>> On 02/10/2016 12:10 PM, Roman Kagan wrote:
>>> Well, as I wrote in another mail, SeaBIOS, which is supposed to provide
>>> the same information to int 0x13/0x08, populates it with static data
>>> based only on the drive type as encoded in CMOS at initialization time,
>>> and everyone seem to have been fine with that so far.  I'll need to
>>> re-test it with real Windows guests, though.
>>>
>>
>> OK, but what we appear to be doing currently is polling the current
>> geometry values for a drive instead of some pre-chosen ones based on the
>> drive type.
>>
>> What values does SeaBIOS use? (Can you point me to the table it uses?)
> 
> src/hw/floppy.c:
> 
> struct floppyinfo_s FloppyInfo[] VARFSEG = {
> // Unknown
> { {0, 0, 0}, 0x00, 0x00},
> // 1 - 360KB, 5.25" - 2 heads, 40 tracks, 9 sectors
> { {2, 40, 9}, FLOPPY_SIZE_525, FLOPPY_RATE_300K},
> // 2 - 1.2MB, 5.25" - 2 heads, 80 tracks, 15 sectors
> { {2, 80, 15}, FLOPPY_SIZE_525, FLOPPY_RATE_500K},
> // 3 - 720KB, 3.5"  - 2 heads, 80 tracks, 9 sectors
> { {2, 80, 9}, FLOPPY_SIZE_350, FLOPPY_RATE_250K},
> // 4 - 1.44MB, 3.5" - 2 heads, 80 tracks, 18 sectors
> { {2, 80, 18}, FLOPPY_SIZE_350, FLOPPY_RATE_500K},
> // 5 - 2.88MB, 3.5" - 2 heads, 80 tracks, 36 sectors
> { {2, 80, 36}, FLOPPY_SIZE_350, FLOPPY_RATE_1M},
> // 6 - 160k, 5.25"  - 1 heads, 40 tracks, 8 sectors
> { {1, 40, 8}, FLOPPY_SIZE_525, FLOPPY_RATE_250K},
> // 7 - 180k, 5.25"  - 1 heads, 40 tracks, 9 sectors
> { {1, 40, 9}, FLOPPY_SIZE_525, FLOPPY_RATE_300K},
> // 8 - 320k, 5.25"  - 2 heads, 40 tracks, 8 sectors
> { {2, 40, 8}, FLOPPY_SIZE_525, FLOPPY_RATE_250K},
> };
> 
> The array is indexed by the floppy drive type from CMOS
> 
> Roman.
> 

Hm, thanks. These seem like sane geometries, but looking at the QEMU
geometry table ... I'm not sure what these even mean anymore, or what it
might mean when they don't align with the SeaBIOS values. We probably
don't even try very often. Maybe nobody ever has.

{ FLOPPY_DRIVE_TYPE_288, 36, 80, 1, FDRIVE_RATE_1M, }, // 5760
{ FLOPPY_DRIVE_TYPE_288, 39, 80, 1, FDRIVE_RATE_1M, }, // 6240
{ FLOPPY_DRIVE_TYPE_288, 40, 80, 1, FDRIVE_RATE_1M, }, // 6400
{ FLOPPY_DRIVE_TYPE_288, 44, 80, 1, FDRIVE_RATE_1M, }, // 7040
{ FLOPPY_DRIVE_TYPE_288, 48, 80, 1, FDRIVE_RATE_1M, }, // 7680

If sectors are 512 bytes, the byte size of these diskettes range from
2.813 MiB (or 2.99 Mb; I just today discovered the only way to get an
even 2.88 is to count this as ((36 * 80 * 2) * 512 / 1000 / 1024), which
mixes x^2 and x^10 designations) up through a whopping 3.74MiB/"3.84MB"

QEMU seems to try to support some real funky formats. Not even really
sure where Fabrice got these numbers from.

Sigh, as long as things sort of seem to work. I'm not going to invest
any further effort into figuring out exactly what we should be doing. If
the UEFI guests are working I'm happy if you are.

--js



Re: [Qemu-devel] commit 5b82b70 boot problems

2016-02-10 Thread Paolo Bonzini


On 10/02/2016 20:26, Gabriel L. Somlo wrote:
> Hi Stefan,
> 
> After pulling from upstream this morning, my guest VMs were no longer
> able to boot. For instance, running something like
> 
> bin/qemu-system-x86_64 -machine q35,accel=kvm -m 2048 -monitor stdio \
>   -device ide-drive,bus=ide.2,drive=CD \
>   -drive 
> id=CD,if=none,snapshot=on,file=Fedora-Live-Workstation-x86_64-22-3.iso
> 
> Would just hang with a black screen, or sometimes partially display
> the GRUB boot menu, and never make it any further than that.
> 
> After a bisect, I narrowed it down to 5b82b70 ("memory: RCU
> ram_list.dirty_memory[] for safe RAM hotplug"); reverting that
> commit got me back to where my VMs were working fine, but the
> actual content of the changes isn't immediately obvious.
> 
> Apologies for the noise in case you're already aware of the problem.

No apologies needed, but indeed a fix is pending on the list (search for
"memory" in today's subjects).

Paolo

> Otherwise, I'd be happy to test and send you debug output.
> 
> Please advise.
> 
> Thanks much,
> --Gabriel
> 
> 



Re: [Qemu-devel] [PULL v3 00/33] Misc patches for 2016-02-08

2016-02-10 Thread Paolo Bonzini


On 10/02/2016 20:16, John Snow wrote:
> 
> jsnow@scv ((977a82a...)) ~/s/q/b/git> ../../configure --cxx=clang++
> --cc=clang --host-cc=clang --extra-cflags=-Werror
> --extra-cflags=-fsanitize=undefined
> --extra-cflags=-Wno-deprecated-declarations
> --extra-cflags=-fno-sanitize=float-divide-by-zero
> --target-list="x86_64-softmmu"
> 
> ERROR: sizeof(size_t) doesn't match GLIB_SIZEOF_SIZE_T.
>You probably need to set PKG_CONFIG_LIBDIR
>to point to the right pkg-config files for your
>build target
> 
> 
> 
> Problem appears to be this:
> 
> /usr/include/glib-2.0/glib/gmem.h:76:58: error: unknown attribute
> '__alloc_size__' ignored [-Werror,-Wunknown-attributes]
> gpointer g_malloc (gsize n_bytes) G_GNUC_MALLOC
> G_GNUC_ALLOC_SIZE(1);
> ^
> /usr/include/glib-2.0/glib/gmacros.h:67:45: note: expanded from macro
> 'G_GNUC_ALLOC_SIZE'
> #define G_GNUC_ALLOC_SIZE(x) __attribute__((__alloc_size__(x)))

F23 wraps it like this:

/* Clang feature detection: http://clang.llvm.org/docs/LanguageExtensions.html 
*/
#ifndef __has_feature
#define __has_feature(x) 0
#endif

#if (!defined(__clang__) && ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINO
R__ >= 3))) || \
(defined(__clang__) && __has_feature(__alloc_size__))
#define G_GNUC_ALLOC_SIZE(x) __attribute__((__alloc_size__(x)))
#define G_GNUC_ALLOC_SIZE2(x,y) __attribute__((__alloc_size__(x,y)))
#else
#define G_GNUC_ALLOC_SIZE(x)
#define G_GNUC_ALLOC_SIZE2(x,y)
#endif

Please open a bug on the distro to backport this fix.

Paolo



Re: [Qemu-devel] [PATCH] migration: ensure htab_save_first completes after timeout

2016-02-10 Thread Michael Roth
Quoting Jianjun Duan (2016-02-10 12:20:40)
> htab_save_first_pass could return without finishing its work due to
> timeout. The patch checks if another invocation of it is necessary and
> will call it in htab_save_complete if necessary.
> 
> Signed-off-by: Jianjun Duan 

Reviewed-by: Michael Roth 

> ---
>  hw/ppc/spapr.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 6bfb908..f6b6749 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1317,6 +1317,7 @@ static int htab_save_setup(QEMUFile *f, void *opaque)
>  static void htab_save_first_pass(QEMUFile *f, sPAPRMachineState *spapr,
>   int64_t max_ns)
>  {
> +bool has_timeout = max_ns != -1;
>  int htabslots = HTAB_SIZE(spapr) / HASH_PTE_SIZE_64;
>  int index = spapr->htab_save_index;
>  int64_t starttime = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> @@ -1350,7 +1351,7 @@ static void htab_save_first_pass(QEMUFile *f, 
> sPAPRMachineState *spapr,
>  qemu_put_buffer(f, HPTE(spapr->htab, chunkstart),
>  HASH_PTE_SIZE_64 * n_valid);
> 
> -if ((qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - starttime) > 
> max_ns) {
> +if (has_timeout && (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - 
> starttime) > max_ns) {
>  break;
>  }
>  }
> @@ -1503,6 +1504,9 @@ static int htab_save_complete(QEMUFile *f, void *opaque)
>  close(spapr->htab_fd);
>  spapr->htab_fd = -1;
>  } else {
> +if (spapr->htab_first_pass) {
> +htab_save_first_pass(f, spapr, -1);
> +}
>  htab_save_later_pass(f, spapr, -1);
>  }
> 
> -- 
> 1.9.1
> 




[Qemu-devel] [PATCH] migration: ensure htab_save_first completes after timeout

2016-02-10 Thread Jianjun Duan
htab_save_first_pass could return without finishing its work due to
timeout. The patch checks if another invocation of it is necessary and
will call it in htab_save_complete if necessary.

Signed-off-by: Jianjun Duan 
---
 hw/ppc/spapr.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 6bfb908..f6b6749 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1317,6 +1317,7 @@ static int htab_save_setup(QEMUFile *f, void *opaque)
 static void htab_save_first_pass(QEMUFile *f, sPAPRMachineState *spapr,
  int64_t max_ns)
 {
+bool has_timeout = max_ns != -1;
 int htabslots = HTAB_SIZE(spapr) / HASH_PTE_SIZE_64;
 int index = spapr->htab_save_index;
 int64_t starttime = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -1350,7 +1351,7 @@ static void htab_save_first_pass(QEMUFile *f, 
sPAPRMachineState *spapr,
 qemu_put_buffer(f, HPTE(spapr->htab, chunkstart),
 HASH_PTE_SIZE_64 * n_valid);
 
-if ((qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - starttime) > max_ns) 
{
+if (has_timeout && (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - 
starttime) > max_ns) {
 break;
 }
 }
@@ -1503,6 +1504,9 @@ static int htab_save_complete(QEMUFile *f, void *opaque)
 close(spapr->htab_fd);
 spapr->htab_fd = -1;
 } else {
+if (spapr->htab_first_pass) {
+htab_save_first_pass(f, spapr, -1);
+}
 htab_save_later_pass(f, spapr, -1);
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH v7 1/5] fw_cfg: expose control register size in fw_cfg.h

2016-02-10 Thread Gabriel L. Somlo
Expose the size of the control register (FW_CFG_CTL_SIZE) in fw_cfg.h.
Add comment to fw_cfg_io_realize() pointing out that since the
8-bit data register is always subsumed by the 16-bit control
register in the port I/O case, we use the control register width
as the *total* width of the (classic, non-DMA) port I/O region reserved
for the device.

Cc: Marc Marí 
Signed-off-by: Gabriel Somlo 
Reviewed-by: Laszlo Ersek 
Reviewed-by: Marc Marí 
---
 hw/nvram/fw_cfg.c | 4 +++-
 include/hw/nvram/fw_cfg.h | 3 +++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index 79c5742..ef2a219 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -32,7 +32,6 @@
 #include "qemu/error-report.h"
 #include "qemu/config-file.h"
 
-#define FW_CFG_CTL_SIZE 2
 #define FW_CFG_NAME "fw_cfg"
 #define FW_CFG_PATH "/machine/" FW_CFG_NAME
 
@@ -882,6 +881,9 @@ static void fw_cfg_io_realize(DeviceState *dev, Error 
**errp)
 FWCfgIoState *s = FW_CFG_IO(dev);
 SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
 
+/* when using port i/o, the 8-bit data register ALWAYS overlaps
+ * with half of the 16-bit control register. Hence, the total size
+ * of the i/o region used is FW_CFG_CTL_SIZE */
 memory_region_init_io(&s->comb_iomem, OBJECT(s), &fw_cfg_comb_mem_ops,
   FW_CFG(s), "fwcfg", FW_CFG_CTL_SIZE);
 sysbus_add_io(sbd, s->iobase, &s->comb_iomem);
diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
index 664eaf6..2667ca9 100644
--- a/include/hw/nvram/fw_cfg.h
+++ b/include/hw/nvram/fw_cfg.h
@@ -46,6 +46,9 @@
 
 #define FW_CFG_INVALID  0x
 
+/* width in bytes of fw_cfg control register */
+#define FW_CFG_CTL_SIZE 0x02
+
 #define FW_CFG_MAX_FILE_PATH56
 
 #ifndef NO_QEMU_PROTOS
-- 
2.4.3




[Qemu-devel] [PATCH v7 4/5] acpi: arm: add fw_cfg device node to dsdt

2016-02-10 Thread Gabriel L. Somlo
Add a fw_cfg device node to the ACPI DSDT. This is mostly
informational, as the authoritative fw_cfg MMIO region(s)
are listed in the Device Tree. However, since we are building
ACPI tables, we might as well be thorough while at it...

Signed-off-by: Gabriel Somlo 
Reviewed-by: Laszlo Ersek 
Tested-by: Laszlo Ersek 
Reviewed-by: Marc Marí 
---
 hw/arm/virt-acpi-build.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 8cf9a21..7d7998b 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -81,6 +81,20 @@ static void acpi_dsdt_add_uart(Aml *scope, const MemMapEntry 
*uart_memmap,
 aml_append(scope, dev);
 }
 
+static void acpi_dsdt_add_fw_cfg(Aml *scope, const MemMapEntry *fw_cfg_memmap)
+{
+Aml *dev = aml_device("FWCF");
+aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0002")));
+/* device present, functioning, decoding, not shown in UI */
+aml_append(dev, aml_name_decl("_STA", aml_int(0xB)));
+
+Aml *crs = aml_resource_template();
+aml_append(crs, aml_memory32_fixed(fw_cfg_memmap->base,
+   fw_cfg_memmap->size, AML_READ_WRITE));
+aml_append(dev, aml_name_decl("_CRS", crs));
+aml_append(scope, dev);
+}
+
 static void acpi_dsdt_add_flash(Aml *scope, const MemMapEntry *flash_memmap)
 {
 Aml *dev, *crs;
@@ -548,6 +562,7 @@ build_dsdt(GArray *table_data, GArray *linker, 
VirtGuestInfo *guest_info)
 acpi_dsdt_add_uart(scope, &memmap[VIRT_UART],
(irqmap[VIRT_UART] + ARM_SPI_BASE));
 acpi_dsdt_add_flash(scope, &memmap[VIRT_FLASH]);
+acpi_dsdt_add_fw_cfg(scope, &memmap[VIRT_FW_CFG]);
 acpi_dsdt_add_virtio(scope, &memmap[VIRT_MMIO],
 (irqmap[VIRT_MMIO] + ARM_SPI_BASE), NUM_VIRTIO_TRANSPORTS);
 acpi_dsdt_add_pci(scope, memmap, (irqmap[VIRT_PCIE] + ARM_SPI_BASE),
-- 
2.4.3




[Qemu-devel] [PATCH v7 3/5] acpi: pc: add fw_cfg device node to ssdt

2016-02-10 Thread Gabriel L. Somlo
Add a fw_cfg device node to the ACPI SSDT. While the guest-side
firmware can't utilize this information (since it has to access
the hard-coded fw_cfg device to extract ACPI tables to begin with),
having fw_cfg listed in ACPI will help the guest kernel keep a more
accurate inventory of in-use IO port regions.

Signed-off-by: Gabriel Somlo 
Reviewed-by: Laszlo Ersek 
Reviewed-by: Marc Marí 
---
 hw/i386/acpi-build.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 4554eb8..4762fd2 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2190,6 +2190,35 @@ build_dsdt(GArray *table_data, GArray *linker,
 aml_append(scope, aml_name_decl("_S5", pkg));
 aml_append(dsdt, scope);
 
+/* create fw_cfg node, unconditionally */
+{
+/* when using port i/o, the 8-bit data register *always* overlaps
+ * with half of the 16-bit control register. Hence, the total size
+ * of the i/o region used is FW_CFG_CTL_SIZE; when using DMA, the
+ * DMA control register is located at FW_CFG_DMA_IO_BASE + 4 */
+uint8_t io_size = object_property_get_bool(OBJECT(pcms->fw_cfg),
+   "dma_enabled", NULL) ?
+  ROUND_UP(FW_CFG_CTL_SIZE, 4) + sizeof(dma_addr_t) :
+  FW_CFG_CTL_SIZE;
+
+scope = aml_scope("\\_SB");
+dev = aml_device("FWCF");
+
+aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0002")));
+
+/* device present, functioning, decoding, not shown in UI */
+aml_append(dev, aml_name_decl("_STA", aml_int(0xB)));
+
+crs = aml_resource_template();
+aml_append(crs,
+aml_io(AML_DECODE16, FW_CFG_IO_BASE, FW_CFG_IO_BASE, 0x01, io_size)
+);
+aml_append(dev, aml_name_decl("_CRS", crs));
+
+aml_append(scope, dev);
+aml_append(dsdt, scope);
+}
+
 if (misc->applesmc_io_base) {
 scope = aml_scope("\\_SB.PCI0.ISA");
 dev = aml_device("SMC");
-- 
2.4.3




[Qemu-devel] [PATCH v7 2/5] pc: fw_cfg: move ioport base constant to pc.h

2016-02-10 Thread Gabriel L. Somlo
Move BIOS_CFG_IOPORT define from pc.c to pc.h, and rename
it to FW_CFG_IO_BASE.

Cc: Marc Marí 
Signed-off-by: Gabriel Somlo 
Reviewed-by: Laszlo Ersek 
Reviewed-by: Marc Marí 
---
 hw/i386/pc.c | 5 ++---
 include/hw/i386/pc.h | 2 ++
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 0aeefd2..56ec6cd 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -78,7 +78,6 @@
 #define DPRINTF(fmt, ...)
 #endif
 
-#define BIOS_CFG_IOPORT 0x510
 #define FW_CFG_ACPI_TABLES (FW_CFG_ARCH_LOCAL + 0)
 #define FW_CFG_SMBIOS_ENTRIES (FW_CFG_ARCH_LOCAL + 1)
 #define FW_CFG_IRQ0_OVERRIDE (FW_CFG_ARCH_LOCAL + 2)
@@ -756,7 +755,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as)
 int i, j;
 unsigned int apic_id_limit = pc_apic_id_limit(max_cpus);
 
-fw_cfg = fw_cfg_init_io_dma(BIOS_CFG_IOPORT, BIOS_CFG_IOPORT + 4, as);
+fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4, as);
 
 /* FW_CFG_MAX_CPUS is a bit confusing/problematic on x86:
  *
@@ -1258,7 +1257,7 @@ void xen_load_linux(PCMachineState *pcms)
 
 assert(MACHINE(pcms)->kernel_filename != NULL);
 
-fw_cfg = fw_cfg_init_io(BIOS_CFG_IOPORT);
+fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
 rom_set_fw(fw_cfg);
 
 load_linux(pcms, fw_cfg);
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 8b3546e..79ffe5b 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -266,6 +266,8 @@ void ioapic_init_gsi(GSIState *gsi_state, const char 
*parent_name);
 
 ISADevice *pc_find_fdc0(void);
 
+#define FW_CFG_IO_BASE 0x510
+
 /* acpi_piix.c */
 
 I2CBus *piix4_pm_init(PCIBus *bus, int devfn, uint32_t smb_io_base,
-- 
2.4.3




[Qemu-devel] [PATCH v7 0/5] add ACPI node for fw_cfg on pc and arm

2016-02-10 Thread Gabriel L. Somlo
Generate an ACPI DSDT node for fw_cfg on pc and arm guests.

New since v6:
- rebased to fit on top of fb306ff and f264d36, which moved things
  around in pc's acpi-build.c (only patch 3/5 affected);
- kernel-side fw_cfg sysfs driver accepted into upstream linux

Thanks,
 --Gabriel

>New since v5:
>
>   - rebased on top of latest QEMU git master
>
>>New since v4:
>>
>>  - rebased on top of Marc's DMA series
>>  - drop machine compat dependency for insertion into x86/ssdt
>>(patch 3/5), following agreement between Igor and Eduardo
>>  - [mm]io register range now covers DMA register as well, if
>>available.
>>  - s/bios/firmware in doc file updates
>>
>>>New since v3:
>>>
>>> - rebased to work on top of 87e896ab (introducing pc-*-25 classes),
>>>   inserting fw_cfg acpi node only for machines >= 2.5.
>>>
>>> - reintroduce _STA with value 0x0B (bit 2 for u/i visibility turned
>>>   off to avoid Windows complaining -- thanks Igor for catching that!)
>>>
>>>If there's any other feedback besides questions regarding the
>>>appropriateness of "QEMU0002" as the value of _HID, please don't hesitate!
>>>
New since v2:

- pc/i386 node in ssdt only on machine types *newer* than 2.4
  (as suggested by Eduardo)

I appreciate any further comments and reviews. Hopefully we can make
this palatable for upstream, modulo the lingering concerns about whether
"QEMU0002" is ok to use as the value of _HID, which I'll hopefully get
sorted out with the kernel crew...

>New since v1:
>
>   - expose control register size (suggested by Marc Marí)
>
>   - leaving out _UID and _STA fields (thanks Shannon & Igor)
>
>   - using "QEMU0002" as the value of _HID (thanks Michael)
>
>   - added documentation blurb to docs/specs/fw_cfg.txt
> (mainly to record usage of the "QEMU0002" string with fw_cfg).
>
>> This series adds a fw_cfg device node to the SSDT (on pc), or to the
>> DSDT (on arm).
>>
>>  - Patch 1/3 moves (and renames) the BIOS_CFG_IOPORT (0x510)
>>define from pc.c to pc.h, so that it could be used from
>>acpi-build.c in patch 2/3.
>> 
>>  - Patch 2/3 adds a fw_cfg node to the pc SSDT.
>> 
>>  - Patch 3/3 adds a fw_cfg node to the arm DSDT.
>>
>> I made up some names - "FWCF" for the node name, and "FWCF0001"
>> for _HID; no idea whether that's appropriate, or how else I should
>> figure out what to use instead...
>>
>> Also, using scope "\\_SB", based on where fw_cfg shows up in the
>> output of "info qtree". Again, if that's wrong, please point me in
>> the right direction.
>>
>> Re. 3/3 (also mentioned after the commit blurb in the patch itself),
>> I noticed none of the other DSDT entries contain a _STA field, wondering
>> why it would (not) make sense to include that, same as on the PC.

Gabriel L. Somlo (5):
  fw_cfg: expose control register size in fw_cfg.h
  pc: fw_cfg: move ioport base constant to pc.h
  acpi: pc: add fw_cfg device node to ssdt
  acpi: arm: add fw_cfg device node to dsdt
  fw_cfg: document ACPI device node information

 docs/specs/fw_cfg.txt |  9 +
 hw/arm/virt-acpi-build.c  | 15 +++
 hw/i386/acpi-build.c  | 29 +
 hw/i386/pc.c  |  5 ++---
 hw/nvram/fw_cfg.c |  4 +++-
 include/hw/i386/pc.h  |  2 ++
 include/hw/nvram/fw_cfg.h |  3 +++
 7 files changed, 63 insertions(+), 4 deletions(-)

-- 
2.4.3




[Qemu-devel] [PATCH v7 5/5] fw_cfg: document ACPI device node information

2016-02-10 Thread Gabriel L. Somlo
Signed-off-by: Gabriel Somlo 
Reviewed-by: Laszlo Ersek 
Reviewed-by: Marc Marí 
---
 docs/specs/fw_cfg.txt | 9 +
 1 file changed, 9 insertions(+)

diff --git a/docs/specs/fw_cfg.txt b/docs/specs/fw_cfg.txt
index 2099ad9..5414140 100644
--- a/docs/specs/fw_cfg.txt
+++ b/docs/specs/fw_cfg.txt
@@ -84,6 +84,15 @@ Selector Register address: Base + 8 (2 bytes)
 Data Register address: Base + 0 (8 bytes)
 DMA Address address:   Base + 16 (8 bytes)
 
+== ACPI Interface ==
+
+The fw_cfg device is defined with ACPI ID "QEMU0002". Since we expect
+ACPI tables to be passed into the guest through the fw_cfg device itself,
+the guest-side firmware can not use ACPI to find fw_cfg. However, once the
+firmware is finished setting up ACPI tables and hands control over to the
+guest kernel, the latter can use the fw_cfg ACPI node for a more accurate
+inventory of in-use IOport or MMIO regions.
+
 == Firmware Configuration Items ==
 
 === Signature (Key 0x, FW_CFG_SIGNATURE) ===
-- 
2.4.3




Re: [Qemu-devel] [Qemu-ppc] [PATCH] spapr: skip configuration section during migration of older machines

2016-02-10 Thread Greg Kurz
On Mon, 08 Feb 2016 16:59:47 +0100
Greg Kurz  wrote:
> Since QEMU 2.4, we have a configuration section in the migration stream.
> This must be skipped for older machines, like it is already done for x86.
> 

Ouch ! It is more complex than I thought... the migration of pseries-2.3
machine is already broken between QEMU-2.3 and QEMU-2.4. So this patch
fixes indeed migration of a pseries-2.3 machine from QEMU-2.3, but it
breaks migration of the same machine from QEMU-2.4 and up.

Not sure how to deal with that... is it reasonable to assume that
pseries-2.3 running with QEMU-2.3 is the common case ? If so, this
patch would bring more help than harm.

> Fixes: 61964c23e5ddd5a33f15699e45ce126f879e3e33
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Greg Kurz 
> ---
>  hw/ppc/spapr.c |1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 5bd8fd3ef842..bca7cb8a5d27 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2446,6 +2446,7 @@ static void 
> spapr_machine_2_3_instance_options(MachineState *machine)
>  spapr_machine_2_4_instance_options(machine);
>  savevm_skip_section_footers();
>  global_state_set_optional();
> +savevm_skip_configuration();
>  }
> 
>  static void spapr_machine_2_3_class_options(MachineClass *mc)
> 
> 




Re: [Qemu-devel] [PATCH] linux-user: Don't assert if guest tries shmdt(0)

2016-02-10 Thread Peter Maydell
On 10 February 2016 at 18:39, Laurent Vivier  wrote:
>
>
> Le 09/02/2016 16:57, Peter Maydell a écrit :
>> Our implementation of shmat() and shmdt() for linux-user was
>> using "zero guest address" as its marker for "entry in the
>> shm_regions[] array is not in use". This meant that if the
>> guest did a shmdt(0) we would match on an unused array entry
>
> Is shmdt(0) valid ?

It's valid in the sense of "should detach" if shmat() ever
returned 0 (which I suspect it will never do but have not
attempted to determine). It's valid in the sense of "should
not cause an assert" anyway.

> I mean, if shmat() is called with shmaddr equal to 0:
> "the system chooses a suitable (unused) address at which
> to attach the segment."
>
> and
>
> "The to-be-detached segment must be currently attached with shmaddr
> equal to the value returned by the attaching shmat() call."
>
> Did you check shmat() can return 0 ?
> (I think our mmap_find_vma() cannot return 0)

Not wanting to try to figure this out is why I switched to
having an extra in_use flag in the shm_regions[] array.
0 is now not any kind of special value as far as addresses
go -- if shmat() returned 0 as a valid address then we'll
record it in the array, and shmdt() will work. If it
never did, then shmdt() won't find any valid entries,
we'll call the host with shmdt() on something that wasn't
an attached segment and the host kernel will fail the
syscall as it should.

> Why don't you fail on shmdt(0) (EINVAL) ?

We let the host kernel do the error checking and return
the errno for us, at which point it will indeed fail EINVAL.

thanks
-- PMM



[Qemu-devel] [PULL 11/11] ahci: prohibit "restarting" the FIS or CLB engines

2016-02-10 Thread John Snow
If the FIS or DMA engines are already started, do not allow them to be
"restarted." As a side-effect of this change, the migration post-load
routine must be modified to cope. If the engines are listed as "on"
in the migrated registers, they must be cleared to allow the startup
routine to see the transition from "off" to "on".

As a second side-effect, the extra argument to ahci_cond_engine_start
is removed in favor of consistent behavior.

Signed-off-by: John Snow 
Message-id: 1454103689-13042-5-git-send-email-js...@redhat.com
---
 hw/ide/ahci.c | 39 ---
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 9f4a672..f244bc0 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -199,38 +199,38 @@ static void map_page(AddressSpace *as, uint8_t **ptr, 
uint64_t addr,
  * Check the cmd register to see if we should start or stop
  * the DMA or FIS RX engines.
  *
- * @ad: Device to engage.
- * @allow_stop: Allow device to transition from started to stopped?
- *   'no' is useful for migration post_load, which does not expect a 
transition.
+ * @ad: Device to dis/engage.
  *
  * @return 0 on success, -1 on error.
  */
-static int ahci_cond_start_engines(AHCIDevice *ad, bool allow_stop)
+static int ahci_cond_start_engines(AHCIDevice *ad)
 {
 AHCIPortRegs *pr = &ad->port_regs;
+bool cmd_start = pr->cmd & PORT_CMD_START;
+bool cmd_on= pr->cmd & PORT_CMD_LIST_ON;
+bool fis_start = pr->cmd & PORT_CMD_FIS_RX;
+bool fis_on= pr->cmd & PORT_CMD_FIS_ON;
 
-if (pr->cmd & PORT_CMD_START) {
+if (cmd_start && !cmd_on) {
 if (!ahci_map_clb_address(ad)) {
+pr->cmd &= ~PORT_CMD_START;
 error_report("AHCI: Failed to start DMA engine: "
  "bad command list buffer address");
 return -1;
 }
-} else if (pr->cmd & PORT_CMD_LIST_ON) {
-if (allow_stop) {
-ahci_unmap_clb_address(ad);
-}
+} else if (!cmd_start && cmd_on) {
+ahci_unmap_clb_address(ad);
 }
 
-if (pr->cmd & PORT_CMD_FIS_RX) {
+if (fis_start && !fis_on) {
 if (!ahci_map_fis_address(ad)) {
+pr->cmd &= ~PORT_CMD_FIS_RX;
 error_report("AHCI: Failed to start FIS receive engine: "
  "bad FIS receive buffer address");
 return -1;
 }
-} else if (pr->cmd & PORT_CMD_FIS_ON) {
-if (allow_stop) {
-ahci_unmap_fis_address(ad);
-}
+} else if (!fis_start && fis_on) {
+ahci_unmap_fis_address(ad);
 }
 
 return 0;
@@ -272,8 +272,8 @@ static void  ahci_port_write(AHCIState *s, int port, int 
offset, uint32_t val)
 pr->cmd = (pr->cmd & PORT_CMD_RO_MASK) |
   (val & ~(PORT_CMD_RO_MASK|PORT_CMD_ICC_MASK));
 
-/* Check FIS RX and CLB engines, allow transition to false: */
-ahci_cond_start_engines(&s->dev[port], true);
+/* Check FIS RX and CLB engines */
+ahci_cond_start_engines(&s->dev[port]);
 
 /* XXX usually the FIS would be pending on the bus here and
issuing deferred until the OS enables FIS receival.
@@ -1578,9 +1578,10 @@ static int ahci_state_post_load(void *opaque, int 
version_id)
 return -1;
 }
 
-/* Only remap the CLB address if appropriate, disallowing a state
- * transition from 'on' to 'off' it should be consistent here. */
-if (ahci_cond_start_engines(ad, false) != 0) {
+/* After a migrate, the DMA/FIS engines are "off" and
+ * need to be conditionally restarted */
+pr->cmd &= ~(PORT_CMD_LIST_ON | PORT_CMD_FIS_ON);
+if (ahci_cond_start_engines(ad) != 0) {
 return -1;
 }
 
-- 
2.4.3




[Qemu-devel] [PULL 08/11] ahci: Do not unmap NULL addresses

2016-02-10 Thread John Snow
Definitely don't try to unmap a garbage address.

Reported-by: Zuozhi fzz 
Signed-off-by: John Snow 
Message-id: 1454103689-13042-2-git-send-email-js...@redhat.com
---
 hw/ide/ahci.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 7e87b18..3a95dad 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -662,6 +662,10 @@ static bool ahci_map_fis_address(AHCIDevice *ad)
 
 static void ahci_unmap_fis_address(AHCIDevice *ad)
 {
+if (ad->res_fis == NULL) {
+DPRINTF(ad->port_no, "Attempt to unmap NULL FIS address\n");
+return;
+}
 dma_memory_unmap(ad->hba->as, ad->res_fis, 256,
  DMA_DIRECTION_FROM_DEVICE, 256);
 ad->res_fis = NULL;
@@ -678,6 +682,10 @@ static bool ahci_map_clb_address(AHCIDevice *ad)
 
 static void ahci_unmap_clb_address(AHCIDevice *ad)
 {
+if (ad->lst == NULL) {
+DPRINTF(ad->port_no, "Attempt to unmap NULL CLB address\n");
+return;
+}
 dma_memory_unmap(ad->hba->as, ad->lst, 1024,
  DMA_DIRECTION_FROM_DEVICE, 1024);
 ad->lst = NULL;
-- 
2.4.3




[Qemu-devel] [PULL 07/11] fdc: always compile-check debug prints

2016-02-10 Thread John Snow
Coverity noticed that some variables are only used by debug prints, and
called them unused. Always compile the print statements. While we're
here, print to stderr as well.

Bonus: Fix a debug printf I broke in f31937aa8

Signed-off-by: John Snow 
Reviewed-by: Eric Blake 
[Touched up commit message. --js]
Message-id: 1454971529-14830-1-git-send-email-js...@redhat.com
---
 hw/block/fdc.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index a6f22ef..9838d21 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -41,14 +41,15 @@
 
 //
 /* debug Floppy devices */
-//#define DEBUG_FLOPPY
 
-#ifdef DEBUG_FLOPPY
+#define DEBUG_FLOPPY 0
+
 #define FLOPPY_DPRINTF(fmt, ...)\
-do { printf("FLOPPY: " fmt , ## __VA_ARGS__); } while (0)
-#else
-#define FLOPPY_DPRINTF(fmt, ...)
-#endif
+do {\
+if (DEBUG_FLOPPY) { \
+fprintf(stderr, "FLOPPY: " fmt , ## __VA_ARGS__);   \
+}   \
+} while (0)
 
 //
 /* Floppy drive emulation   */
@@ -353,7 +354,7 @@ static int pick_geometry(FDrive *drv)
 parse = &fd_formats[size_match];
 FLOPPY_DPRINTF("User requested floppy drive type '%s', "
"but inserted medium appears to be a "
-   "%d sector '%s' type\n",
+   "%"PRId64" sector '%s' type\n",
FloppyDriveType_lookup[drv->drive],
nb_sectors,
FloppyDriveType_lookup[parse->drive]);
-- 
2.4.3




[Qemu-devel] [PULL 00/11] Ide patches

2016-02-10 Thread John Snow
The following changes since commit c9f19dff101e2c2cf3fa3967eceec2833e845e40:

  Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging 
(2016-02-09 19:34:46 +)

are available in the git repository at:

  https://github.com/jnsnow/qemu.git tags/ide-pull-request

for you to fetch changes up to d590474922d37372c56075adb229c86d3aeae967:

  ahci: prohibit "restarting" the FIS or CLB engines (2016-02-10 13:29:40 -0500)



Untested with Clang.



John Snow (11):
  ide: Prohibit RESET on IDE drives
  ide: code motion
  ide: move buffered DMA cancel to core
  ide: replace blk_drain_all by blk_drain
  ide: Add silent DRQ cancellation
  ide: fix device_reset to not ignore pending AIO
  fdc: always compile-check debug prints
  ahci: Do not unmap NULL addresses
  ahci: handle LIST_ON and FIS_ON in map helpers
  ahci: explicitly reject bad engine states on post_load
  ahci: prohibit "restarting" the FIS or CLB engines

 hw/block/fdc.c|  15 ++--
 hw/ide/ahci.c |  96 ++--
 hw/ide/core.c | 217 --
 hw/ide/internal.h |   1 +
 hw/ide/pci.c  |  36 +
 5 files changed, 213 insertions(+), 152 deletions(-)

-- 
2.4.3




[Qemu-devel] [PULL 09/11] ahci: handle LIST_ON and FIS_ON in map helpers

2016-02-10 Thread John Snow
Instead of relying on ahci_cond_start_engines to maintain the
engine status indicators itself, have the lower-layer CLB and FIS mapper
helpers do it themselves.

This makes the cond_start routine slightly nicer to read, and makes sure
that the status indicators will always be correct.

Signed-off-by: John Snow 
Message-id: 1454103689-13042-3-git-send-email-js...@redhat.com
---
 hw/ide/ahci.c | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 3a95dad..ff338fe 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -210,9 +210,7 @@ static int ahci_cond_start_engines(AHCIDevice *ad, bool 
allow_stop)
 AHCIPortRegs *pr = &ad->port_regs;
 
 if (pr->cmd & PORT_CMD_START) {
-if (ahci_map_clb_address(ad)) {
-pr->cmd |= PORT_CMD_LIST_ON;
-} else {
+if (!ahci_map_clb_address(ad)) {
 error_report("AHCI: Failed to start DMA engine: "
  "bad command list buffer address");
 return -1;
@@ -220,7 +218,6 @@ static int ahci_cond_start_engines(AHCIDevice *ad, bool 
allow_stop)
 } else if (pr->cmd & PORT_CMD_LIST_ON) {
 if (allow_stop) {
 ahci_unmap_clb_address(ad);
-pr->cmd = pr->cmd & ~(PORT_CMD_LIST_ON);
 } else {
 error_report("AHCI: DMA engine should be off, "
  "but appears to still be running");
@@ -229,9 +226,7 @@ static int ahci_cond_start_engines(AHCIDevice *ad, bool 
allow_stop)
 }
 
 if (pr->cmd & PORT_CMD_FIS_RX) {
-if (ahci_map_fis_address(ad)) {
-pr->cmd |= PORT_CMD_FIS_ON;
-} else {
+if (!ahci_map_fis_address(ad)) {
 error_report("AHCI: Failed to start FIS receive engine: "
  "bad FIS receive buffer address");
 return -1;
@@ -239,7 +234,6 @@ static int ahci_cond_start_engines(AHCIDevice *ad, bool 
allow_stop)
 } else if (pr->cmd & PORT_CMD_FIS_ON) {
 if (allow_stop) {
 ahci_unmap_fis_address(ad);
-pr->cmd = pr->cmd & ~(PORT_CMD_FIS_ON);
 } else {
 error_report("AHCI: FIS receive engine should be off, "
  "but appears to still be running");
@@ -657,7 +651,13 @@ static bool ahci_map_fis_address(AHCIDevice *ad)
 AHCIPortRegs *pr = &ad->port_regs;
 map_page(ad->hba->as, &ad->res_fis,
  ((uint64_t)pr->fis_addr_hi << 32) | pr->fis_addr, 256);
-return ad->res_fis != NULL;
+if (ad->res_fis != NULL) {
+pr->cmd |= PORT_CMD_FIS_ON;
+return true;
+}
+
+pr->cmd &= ~PORT_CMD_FIS_ON;
+return false;
 }
 
 static void ahci_unmap_fis_address(AHCIDevice *ad)
@@ -666,6 +666,7 @@ static void ahci_unmap_fis_address(AHCIDevice *ad)
 DPRINTF(ad->port_no, "Attempt to unmap NULL FIS address\n");
 return;
 }
+ad->port_regs.cmd &= ~PORT_CMD_FIS_ON;
 dma_memory_unmap(ad->hba->as, ad->res_fis, 256,
  DMA_DIRECTION_FROM_DEVICE, 256);
 ad->res_fis = NULL;
@@ -677,7 +678,13 @@ static bool ahci_map_clb_address(AHCIDevice *ad)
 ad->cur_cmd = NULL;
 map_page(ad->hba->as, &ad->lst,
  ((uint64_t)pr->lst_addr_hi << 32) | pr->lst_addr, 1024);
-return ad->lst != NULL;
+if (ad->lst != NULL) {
+pr->cmd |= PORT_CMD_LIST_ON;
+return true;
+}
+
+pr->cmd &= ~PORT_CMD_LIST_ON;
+return false;
 }
 
 static void ahci_unmap_clb_address(AHCIDevice *ad)
@@ -686,6 +693,7 @@ static void ahci_unmap_clb_address(AHCIDevice *ad)
 DPRINTF(ad->port_no, "Attempt to unmap NULL CLB address\n");
 return;
 }
+ad->port_regs.cmd &= ~PORT_CMD_LIST_ON;
 dma_memory_unmap(ad->hba->as, ad->lst, 1024,
  DMA_DIRECTION_FROM_DEVICE, 1024);
 ad->lst = NULL;
-- 
2.4.3




[Qemu-devel] [PULL 06/11] ide: fix device_reset to not ignore pending AIO

2016-02-10 Thread John Snow
Signed-off-by: John Snow 
Reported-by: Kevin Wolf 
Reviewed-by: Stefan Hajnoczi 
Message-id: 1453225191-11871-7-git-send-email-js...@redhat.com
---
 hw/ide/core.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 3c32b39..241e840 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -505,7 +505,6 @@ void ide_transfer_stop(IDEState *s)
 ide_transfer_halt(s, ide_transfer_stop, true);
 }
 
-__attribute__((__unused__))
 static void ide_transfer_cancel(IDEState *s)
 {
 ide_transfer_halt(s, ide_transfer_cancel, false);
@@ -1298,6 +1297,23 @@ static bool cmd_nop(IDEState *s, uint8_t cmd)
 return true;
 }
 
+static bool cmd_device_reset(IDEState *s, uint8_t cmd)
+{
+/* Halt PIO (in the DRQ phase), then DMA */
+ide_transfer_cancel(s);
+ide_cancel_dma_sync(s);
+
+/* Reset any PIO commands, reset signature, etc */
+ide_reset(s);
+
+/* RESET: ATA8-ACS3 7.10.4 "Normal Outputs";
+ * ATA8-ACS3 Table 184 "Device Signatures for Normal Output" */
+s->status = 0x00;
+
+/* Do not overwrite status register */
+return false;
+}
+
 static bool cmd_data_set_management(IDEState *s, uint8_t cmd)
 {
 switch (s->feature) {
@@ -1614,15 +1630,6 @@ static bool cmd_exec_dev_diagnostic(IDEState *s, uint8_t 
cmd)
 return false;
 }
 
-static bool cmd_device_reset(IDEState *s, uint8_t cmd)
-{
-ide_set_signature(s);
-s->status = 0x00; /* NOTE: READY is _not_ set */
-s->error = 0x01;
-
-return false;
-}
-
 static bool cmd_packet(IDEState *s, uint8_t cmd)
 {
 /* overlapping commands not supported */
-- 
2.4.3




[Qemu-devel] [PULL 10/11] ahci: explicitly reject bad engine states on post_load

2016-02-10 Thread John Snow
Currently, we let ahci_cond_start_engines reject weird configurations
where either the DMA (CLB) or FIS engines are said to be started, but
their matching on/off control bit is toggled off.

There should be no way to achieve this, since any time you toggle the
control bit off, the status bit should always follow synchronously.

Preparing for a refactor in cond_start_engines, move the rejection logic
straight up into post_load.

Signed-off-by: John Snow 
Message-id: 1454103689-13042-4-git-send-email-js...@redhat.com
---
 hw/ide/ahci.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index ff338fe..9f4a672 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -218,10 +218,6 @@ static int ahci_cond_start_engines(AHCIDevice *ad, bool 
allow_stop)
 } else if (pr->cmd & PORT_CMD_LIST_ON) {
 if (allow_stop) {
 ahci_unmap_clb_address(ad);
-} else {
-error_report("AHCI: DMA engine should be off, "
- "but appears to still be running");
-return -1;
 }
 }
 
@@ -234,10 +230,6 @@ static int ahci_cond_start_engines(AHCIDevice *ad, bool 
allow_stop)
 } else if (pr->cmd & PORT_CMD_FIS_ON) {
 if (allow_stop) {
 ahci_unmap_fis_address(ad);
-} else {
-error_report("AHCI: FIS receive engine should be off, "
- "but appears to still be running");
-return -1;
 }
 }
 
@@ -1568,10 +1560,23 @@ static int ahci_state_post_load(void *opaque, int 
version_id)
 int i, j;
 struct AHCIDevice *ad;
 NCQTransferState *ncq_tfs;
+AHCIPortRegs *pr;
 AHCIState *s = opaque;
 
 for (i = 0; i < s->ports; i++) {
 ad = &s->dev[i];
+pr = &ad->port_regs;
+
+if (!(pr->cmd & PORT_CMD_START) && (pr->cmd & PORT_CMD_LIST_ON)) {
+error_report("AHCI: DMA engine should be off, but status bit "
+ "indicates it is still running.");
+return -1;
+}
+if (!(pr->cmd & PORT_CMD_FIS_RX) && (pr->cmd & PORT_CMD_FIS_ON)) {
+error_report("AHCI: FIS RX engine should be off, but status bit "
+ "indicates it is still running.");
+return -1;
+}
 
 /* Only remap the CLB address if appropriate, disallowing a state
  * transition from 'on' to 'off' it should be consistent here. */
-- 
2.4.3




[Qemu-devel] [PULL 05/11] ide: Add silent DRQ cancellation

2016-02-10 Thread John Snow
Split apart the ide_transfer_stop function into two versions: one that
interrupts and one that doesn't. The one that doesn't can be used to
halt any PIO transfers that are in the DRQ phase. It will not halt
any PIO transfers that are currently in the process of buffering data
for the guest to read.

Signed-off-by: John Snow 
Reported-by: Kevin Wolf 
Reviewed-by: Stefan Hajnoczi 
[Renamed 'etf' to 'end_transfer_func' --js]
Message-id: 1453225191-11871-6-git-send-email-js...@redhat.com
---
 hw/ide/core.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 40b6cc8..3c32b39 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -487,13 +487,28 @@ static void ide_cmd_done(IDEState *s)
 }
 }
 
-void ide_transfer_stop(IDEState *s)
+static void ide_transfer_halt(IDEState *s,
+  void(*end_transfer_func)(IDEState *),
+  bool notify)
 {
-s->end_transfer_func = ide_transfer_stop;
+s->end_transfer_func = end_transfer_func;
 s->data_ptr = s->io_buffer;
 s->data_end = s->io_buffer;
 s->status &= ~DRQ_STAT;
-ide_cmd_done(s);
+if (notify) {
+ide_cmd_done(s);
+}
+}
+
+void ide_transfer_stop(IDEState *s)
+{
+ide_transfer_halt(s, ide_transfer_stop, true);
+}
+
+__attribute__((__unused__))
+static void ide_transfer_cancel(IDEState *s)
+{
+ide_transfer_halt(s, ide_transfer_cancel, false);
 }
 
 int64_t ide_get_sector(IDEState *s)
-- 
2.4.3




[Qemu-devel] [PULL 00/11] Ide patches

2016-02-10 Thread John Snow
The following changes since commit c9f19dff101e2c2cf3fa3967eceec2833e845e40:

  Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging 
(2016-02-09 19:34:46 +)

are available in the git repository at:

  https://github.com/jnsnow/qemu.git tags/ide-pull-request

for you to fetch changes up to d590474922d37372c56075adb229c86d3aeae967:

  ahci: prohibit "restarting" the FIS or CLB engines (2016-02-10 13:29:40 -0500)





John Snow (11):
  ide: Prohibit RESET on IDE drives
  ide: code motion
  ide: move buffered DMA cancel to core
  ide: replace blk_drain_all by blk_drain
  ide: Add silent DRQ cancellation
  ide: fix device_reset to not ignore pending AIO
  fdc: always compile-check debug prints
  ahci: Do not unmap NULL addresses
  ahci: handle LIST_ON and FIS_ON in map helpers
  ahci: explicitly reject bad engine states on post_load
  ahci: prohibit "restarting" the FIS or CLB engines

 hw/block/fdc.c|  15 ++--
 hw/ide/ahci.c |  96 ++--
 hw/ide/core.c | 217 --
 hw/ide/internal.h |   1 +
 hw/ide/pci.c  |  36 +
 5 files changed, 213 insertions(+), 152 deletions(-)

-- 
2.4.3




[Qemu-devel] [PULL 01/11] ide: Prohibit RESET on IDE drives

2016-02-10 Thread John Snow
This command is meant for ATAPI devices only, prohibit acknowledging it with
a command aborted response when an IDE device is busy.

Signed-off-by: John Snow 
Reported-by: Kevin Wolf 
Reviewed-by: Stefan Hajnoczi 
Message-id: 1453225191-11871-2-git-send-email-js...@redhat.com
---
 hw/ide/core.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 4c46453..88d5fab 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -1877,9 +1877,13 @@ void ide_exec_cmd(IDEBus *bus, uint32_t val)
 return;
 }
 
-/* Only DEVICE RESET is allowed while BSY or/and DRQ are set */
-if ((s->status & (BUSY_STAT|DRQ_STAT)) && val != WIN_DEVICE_RESET)
-return;
+/* Only RESET is allowed while BSY and/or DRQ are set,
+ * and only to ATAPI devices. */
+if (s->status & (BUSY_STAT|DRQ_STAT)) {
+if (val != WIN_DEVICE_RESET || s->drive_kind != IDE_CD) {
+return;
+}
+}
 
 if (!ide_cmd_permitted(s, val)) {
 ide_abort_command(s);
-- 
2.4.3




[Qemu-devel] [PULL 03/11] ide: move buffered DMA cancel to core

2016-02-10 Thread John Snow
Buffered DMA cancellation was added to ATAPI devices and implemented
for the BMDMA HBA. Move the code over to common IDE code and allow
it to be used for any HBA.

Signed-off-by: John Snow 
Reported-by: Kevin Wolf 
Reviewed-by: Stefan Hajnoczi 
Message-id: 1453225191-11871-4-git-send-email-js...@redhat.com
---
 hw/ide/core.c | 45 +
 hw/ide/internal.h |  1 +
 hw/ide/pci.c  | 36 +---
 3 files changed, 47 insertions(+), 35 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 37d1058..5cafcf5 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -609,6 +609,51 @@ BlockAIOCB *ide_buffered_readv(IDEState *s, int64_t 
sector_num,
 return aioreq;
 }
 
+/**
+ * Cancel all pending DMA requests.
+ * Any buffered DMA requests are instantly canceled,
+ * but any pending unbuffered DMA requests must be waited on.
+ */
+void ide_cancel_dma_sync(IDEState *s)
+{
+IDEBufferedRequest *req;
+
+/* First invoke the callbacks of all buffered requests
+ * and flag those requests as orphaned. Ideally there
+ * are no unbuffered (Scatter Gather DMA Requests or
+ * write requests) pending and we can avoid to drain. */
+QLIST_FOREACH(req, &s->buffered_requests, list) {
+if (!req->orphaned) {
+#ifdef DEBUG_IDE
+printf("%s: invoking cb %p of buffered request %p with"
+   " -ECANCELED\n", __func__, req->original_cb, req);
+#endif
+req->original_cb(req->original_opaque, -ECANCELED);
+}
+req->orphaned = true;
+}
+
+/*
+ * We can't cancel Scatter Gather DMA in the middle of the
+ * operation or a partial (not full) DMA transfer would reach
+ * the storage so we wait for completion instead (we beahve
+ * like if the DMA was completed by the time the guest trying
+ * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
+ * set).
+ *
+ * In the future we'll be able to safely cancel the I/O if the
+ * whole DMA operation will be submitted to disk with a single
+ * aio operation with preadv/pwritev.
+ */
+if (s->bus->dma->aiocb) {
+#ifdef DEBUG_IDE
+printf("%s: draining all remaining requests", __func__);
+#endif
+blk_drain_all();
+assert(s->bus->dma->aiocb == NULL);
+}
+}
+
 static void ide_sector_read(IDEState *s);
 
 static void ide_sector_read_cb(void *opaque, int ret)
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index 2d1e2d2..86bde26 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -586,6 +586,7 @@ BlockAIOCB *ide_issue_trim(BlockBackend *blk,
 BlockAIOCB *ide_buffered_readv(IDEState *s, int64_t sector_num,
QEMUIOVector *iov, int nb_sectors,
BlockCompletionFunc *cb, void *opaque);
+void ide_cancel_dma_sync(IDEState *s);
 
 /* hw/ide/atapi.c */
 void ide_atapi_cmd(IDEState *s);
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index fa5b63b..92ffee7 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -234,41 +234,7 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
 /* Ignore writes to SSBM if it keeps the old value */
 if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
 if (!(val & BM_CMD_START)) {
-/* First invoke the callbacks of all buffered requests
- * and flag those requests as orphaned. Ideally there
- * are no unbuffered (Scatter Gather DMA Requests or
- * write requests) pending and we can avoid to drain. */
-IDEBufferedRequest *req;
-IDEState *s = idebus_active_if(bm->bus);
-QLIST_FOREACH(req, &s->buffered_requests, list) {
-if (!req->orphaned) {
-#ifdef DEBUG_IDE
-printf("%s: invoking cb %p of buffered request %p with"
-   " -ECANCELED\n", __func__, req->original_cb, req);
-#endif
-req->original_cb(req->original_opaque, -ECANCELED);
-}
-req->orphaned = true;
-}
-/*
- * We can't cancel Scatter Gather DMA in the middle of the
- * operation or a partial (not full) DMA transfer would reach
- * the storage so we wait for completion instead (we beahve
- * like if the DMA was completed by the time the guest trying
- * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
- * set).
- *
- * In the future we'll be able to safely cancel the I/O if the
- * whole DMA operation will be submitted to disk with a single
- * aio operation with preadv/pwritev.
- */
-if (bm->bus->dma->aiocb) {
-#ifdef DEBUG_IDE
-printf("%s: draining all remaining requests", __func__);
-#endif
-blk_drain_all();
-assert(bm->bus->dma->aiocb == NULL);
-}
+ide_cancel_dma_sync(idebus

[Qemu-devel] [PULL 02/11] ide: code motion

2016-02-10 Thread John Snow
Shuffle the reset function upwards.

Signed-off-by: John Snow 
Reported-by: Kevin Wolf 
Reviewed-by: Stefan Hajnoczi 
Message-id: 1453225191-11871-3-git-send-email-js...@redhat.com
---
 hw/ide/core.c | 116 +-
 1 file changed, 58 insertions(+), 58 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 88d5fab..37d1058 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -1175,6 +1175,64 @@ void ide_ioport_write(void *opaque, uint32_t addr, 
uint32_t val)
 }
 }
 
+static void ide_reset(IDEState *s)
+{
+#ifdef DEBUG_IDE
+printf("ide: reset\n");
+#endif
+
+if (s->pio_aiocb) {
+blk_aio_cancel(s->pio_aiocb);
+s->pio_aiocb = NULL;
+}
+
+if (s->drive_kind == IDE_CFATA)
+s->mult_sectors = 0;
+else
+s->mult_sectors = MAX_MULT_SECTORS;
+/* ide regs */
+s->feature = 0;
+s->error = 0;
+s->nsector = 0;
+s->sector = 0;
+s->lcyl = 0;
+s->hcyl = 0;
+
+/* lba48 */
+s->hob_feature = 0;
+s->hob_sector = 0;
+s->hob_nsector = 0;
+s->hob_lcyl = 0;
+s->hob_hcyl = 0;
+
+s->select = 0xa0;
+s->status = READY_STAT | SEEK_STAT;
+
+s->lba48 = 0;
+
+/* ATAPI specific */
+s->sense_key = 0;
+s->asc = 0;
+s->cdrom_changed = 0;
+s->packet_transfer_size = 0;
+s->elementary_transfer_size = 0;
+s->io_buffer_index = 0;
+s->cd_sector_size = 0;
+s->atapi_dma = 0;
+s->tray_locked = 0;
+s->tray_open = 0;
+/* ATA DMA state */
+s->io_buffer_size = 0;
+s->req_nb_sectors = 0;
+
+ide_set_signature(s);
+/* init the transfer handler so that 0x is returned on data
+   accesses */
+s->end_transfer_func = ide_dummy_transfer_stop;
+ide_dummy_transfer_stop(s);
+s->media_changed = 0;
+}
+
 static bool cmd_nop(IDEState *s, uint8_t cmd)
 {
 return true;
@@ -2183,64 +2241,6 @@ static void ide_dummy_transfer_stop(IDEState *s)
 s->io_buffer[3] = 0xff;
 }
 
-static void ide_reset(IDEState *s)
-{
-#ifdef DEBUG_IDE
-printf("ide: reset\n");
-#endif
-
-if (s->pio_aiocb) {
-blk_aio_cancel(s->pio_aiocb);
-s->pio_aiocb = NULL;
-}
-
-if (s->drive_kind == IDE_CFATA)
-s->mult_sectors = 0;
-else
-s->mult_sectors = MAX_MULT_SECTORS;
-/* ide regs */
-s->feature = 0;
-s->error = 0;
-s->nsector = 0;
-s->sector = 0;
-s->lcyl = 0;
-s->hcyl = 0;
-
-/* lba48 */
-s->hob_feature = 0;
-s->hob_sector = 0;
-s->hob_nsector = 0;
-s->hob_lcyl = 0;
-s->hob_hcyl = 0;
-
-s->select = 0xa0;
-s->status = READY_STAT | SEEK_STAT;
-
-s->lba48 = 0;
-
-/* ATAPI specific */
-s->sense_key = 0;
-s->asc = 0;
-s->cdrom_changed = 0;
-s->packet_transfer_size = 0;
-s->elementary_transfer_size = 0;
-s->io_buffer_index = 0;
-s->cd_sector_size = 0;
-s->atapi_dma = 0;
-s->tray_locked = 0;
-s->tray_open = 0;
-/* ATA DMA state */
-s->io_buffer_size = 0;
-s->req_nb_sectors = 0;
-
-ide_set_signature(s);
-/* init the transfer handler so that 0x is returned on data
-   accesses */
-s->end_transfer_func = ide_dummy_transfer_stop;
-ide_dummy_transfer_stop(s);
-s->media_changed = 0;
-}
-
 void ide_bus_reset(IDEBus *bus)
 {
 bus->unit = 0;
-- 
2.4.3




[Qemu-devel] [PULL 04/11] ide: replace blk_drain_all by blk_drain

2016-02-10 Thread John Snow
Target the drain for just one device.

Signed-off-by: John Snow 
Reported-by: Kevin Wolf 
Reviewed-by: Stefan Hajnoczi 
Message-id: 1453225191-11871-5-git-send-email-js...@redhat.com
---
 hw/ide/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 5cafcf5..40b6cc8 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -649,7 +649,7 @@ void ide_cancel_dma_sync(IDEState *s)
 #ifdef DEBUG_IDE
 printf("%s: draining all remaining requests", __func__);
 #endif
-blk_drain_all();
+blk_drain(s->blk);
 assert(s->bus->dma->aiocb == NULL);
 }
 }
-- 
2.4.3




[Qemu-devel] make distclean can fail do to a configuration check

2016-02-10 Thread John Snow
Stuff like this:

> ~/s/q/b/git> make distclean
> config-host.mak is out-of-date, running configure
>
> ERROR: sizeof(size_t) doesn't match GLIB_SIZEOF_SIZE_T.
>You probably need to set PKG_CONFIG_LIBDIR
>to point to the right pkg-config files for your
>build target
>
> Makefile:35: recipe for target 'config-host.mak' failed
> make: *** [config-host.mak] Error 1`

is obnoxious. We had patches from Fam to allow some targets to bypass
the configuration check, did those die? Did we not want them for some
reason?

--js



Re: [Qemu-devel] [PATCH 3/3] qemu-iotests: 140: use virtio-scsi instead of IDE

2016-02-10 Thread Sascha Silbe
Dear Max,

Max Reitz  writes:

[tests/qemu-iotests/140]
>> -_launch_qemu -drive 
>> if=ide,media=cdrom,id=drv,file="$TEST_IMG",format=$IMGFMT \
>> -2> >(_filter_nbd)
>> +_launch_qemu -drive 
>> if=none,media=cdrom,id=drv,file="$TEST_IMG",format=$IMGFMT \
>> +-device virtio-scsi -device scsi-cd,drive=drv 2> >(_filter_nbd)
>
> Why not just omit the device (and the media=cdrom along with it, keeping
> if=none)? This will change the reference output because there is no
> longer any tray to be moved, but this will still test what it's supposed to.
>
> (This may sound hypocritical coming from me, because I wrote this test
> so I could have just done so in the first place; I guess I just didn't
> realize that 'eject' works on device-less drives, too.)

Is this supposed to work? I.e. can we rely on it? If so, that would
certainly be the easier route for this particular test. Test coverage
should be unaffected as 139 already tests ejection (using virtio, unlike
118 which is PC-only).

The aliases patch has a value of its own, but that's a separate
matter.

Sascha
-- 
Softwareentwicklung Sascha Silbe, Niederhofenstraße 5/1, 71229 Leonberg
https://se-silbe.de/
USt-IdNr. DE281696641




[Qemu-devel] commit 5b82b70 boot problems

2016-02-10 Thread Gabriel L. Somlo
Hi Stefan,

After pulling from upstream this morning, my guest VMs were no longer
able to boot. For instance, running something like

bin/qemu-system-x86_64 -machine q35,accel=kvm -m 2048 -monitor stdio \
  -device ide-drive,bus=ide.2,drive=CD \
  -drive id=CD,if=none,snapshot=on,file=Fedora-Live-Workstation-x86_64-22-3.iso

Would just hang with a black screen, or sometimes partially display
the GRUB boot menu, and never make it any further than that.

After a bisect, I narrowed it down to 5b82b70 ("memory: RCU
ram_list.dirty_memory[] for safe RAM hotplug"); reverting that
commit got me back to where my VMs were working fine, but the
actual content of the changes isn't immediately obvious.

Apologies for the noise in case you're already aware of the problem.
Otherwise, I'd be happy to test and send you debug output.

Please advise.

Thanks much,
--Gabriel



Re: [Qemu-devel] [PULL v3 00/33] Misc patches for 2016-02-08

2016-02-10 Thread John Snow


On 02/10/2016 07:48 AM, Paolo Bonzini wrote:
> 
> 
> On 09/02/2016 17:13, Paolo Bonzini wrote:
>> The following changes since commit ac1be2ae6b2995b99430c48329eb971b0281acf1:
>>
>>   Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2016-02-09' 
>> into staging (2016-02-09 11:42:43 +)
>>
>> are available in the git repository at:
>>
>>   git://github.com/bonzini/qemu.git tags/for-upstream
>>
>> for you to fetch changes up to 150dcd1aed6f9ebcf370dbb9b666e7d7c6d908e2:
>>
>>   qemu-char, io: fix ordering of arguments for UDP socket creation 
>> (2016-02-09 17:09:15 +0100)
>>
>> 
>> * switch to C11 atomics (Alex)
>> * Coverity fixes for IPMI (Corey), i386 (Paolo), qemu-char (Paolo)
>> * at long last, fail on wrong .pc files if -m32 is in use (Daniel)
>> * qemu-char regression fix (Daniel)
>> * SAS1068 device (Paolo)
>> * memory region docs improvements (Peter)
>> * target-i386 cleanups (Richard)
>> * qemu-nbd docs improvements (Sitsofe)
>> * thread-safe memory hotplug (Stefan)
>>
>> 
>>
>> v2->v3: rebased due to semantic conflict, split MAINTAINERS patch
>>
>> Alex Bennée (1):
>>   include/qemu/atomic.h: default to __atomic functions
>>
>> Andrew Jones (1):
>>   kvm-all: trace: strerror fixup
>>
>> Corey Minyard (2):
>>   ipmi_bmc_sim: Fix off by one in check.
>>   ipmi_bmc_sim: Add break to correct watchdog NMI check
>>
>> Daniel P. Berrange (2):
>>   configure: sanity check the glib library that pkg-config finds

Appears to break clang 3.5.0 on F22;

jsnow@scv ((977a82a...)) ~/s/q/b/git> ../../configure --cxx=clang++
--cc=clang --host-cc=clang --extra-cflags=-Werror
--extra-cflags=-fsanitize=undefined
--extra-cflags=-Wno-deprecated-declarations
--extra-cflags=-fno-sanitize=float-divide-by-zero
--target-list="x86_64-softmmu"

ERROR: sizeof(size_t) doesn't match GLIB_SIZEOF_SIZE_T.
   You probably need to set PKG_CONFIG_LIBDIR
   to point to the right pkg-config files for your
   build target



Problem appears to be this:

/usr/include/glib-2.0/glib/gmem.h:76:58: error: unknown attribute
'__alloc_size__' ignored [-Werror,-Wunknown-attributes]
gpointer g_malloc (gsize n_bytes) G_GNUC_MALLOC
G_GNUC_ALLOC_SIZE(1);
^
/usr/include/glib-2.0/glib/gmacros.h:67:45: note: expanded from macro
'G_GNUC_ALLOC_SIZE'
#define G_GNUC_ALLOC_SIZE(x) __attribute__((__alloc_size__(x)))


>>   char: fix repeated registration of tcp chardev I/O handlers
>>
>> Janosch Frank (1):
>>   scripts/kvm/kvm_stat: Fix tracefs access checking
>>
>> John Snow (1):
>>   nbd: avoid unaligned uint64_t store
>>
>> Paolo Bonzini (9):
>>   memory: add early bail out from cpu_physical_memory_set_dirty_range
>>   qemu-char: Keep pty slave file descriptor open until the master is 
>> closed
>>   scsi: push WWN fields up to SCSIDevice
>>   scsi-generic: grab device and port SAS addresses from backend
>>   hw: Add support for LSI SAS1068 (mptsas) device
>>   ipmi: do not take/drop iothread lock
>>   target-i386: fix PSE36 mode
>>   get_maintainer.pl: fall back to git if only lists are found
>>   qemu-char, io: fix ordering of arguments for UDP socket creation
>>
>> Peter Maydell (1):
>>   docs/memory.txt: Improve list of different memory regions
>>
>> Richard Henderson (10):
>>   target-i386: Create gen_lea_v_seg
>>   target-i386: Introduce mo_stacksize
>>   target-i386: Use gen_lea_v_seg in gen_lea_modrm
>>   target-i386: Use gen_lea_v_seg in stack subroutines
>>   target-i386: Access segs via TCG registers
>>   target-i386: Use gen_lea_v_seg in pusha/popa
>>   target-i386: Rewrite gen_enter inline
>>   target-i386: Rewrite leave
>>   target-i386: Tidy gen_add_A0_im
>>   target-i386: Deconstruct the cpu_T array
>>
>> Sitsofe Wheeler (3):
>>   qemu-nbd: Fix unintended texi verbatim formatting
>>   qemu-nbd: Minor texi updates
>>   qemu-nbd: Fix texi sentence capitalisation
>>
>> Stefan Hajnoczi (1):
>>   memory: RCU ram_list.dirty_memory[] for safe RAM hotplug
>>
>> Stephen Warren (1):
>>   MAINTAINERS: add all-match entry for qemu-devel@
>>
>>  MAINTAINERS   |5 +
>>  configure |   24 +
>>  default-configs/pci.mak   |1 +
>>  docs/memory.txt   |   26 +-
>>  exec.c|   75 +-
>>  hw/ipmi/ipmi.c|2 -
>>  hw/ipmi/ipmi_bmc_sim.c|4 +-
>>  hw/scsi/Makefile.objs |1 +
>>  hw/scsi/mpi.h | 1153 ++
>>  hw/scsi/mptconfig.c   |  904 
>>  hw/scsi/mptendian.c   |  204 ++
>>  hw/scsi/mptsas.c  | 1441 +
>>  hw/scsi/mptsas.h  |  100 +++
>>  hw/scsi/scsi-disk.c   |   23 +-
>>  hw/scsi/scsi-gen

Re: [Qemu-devel] [PATCH v2] linux-user: add option to intercept execve() syscalls

2016-02-10 Thread Laurent Vivier


Le 27/01/2016 09:49, Petros Angelatos a écrit :
> From: Petros Angelatos 
> 
> In order for one to use QEMU user mode emulation under a chroot, it is
> required to use binfmt_misc. This can be avoided by QEMU never doing a
> raw execve() to the host system.
> 
> Introduce a new option, -execve, that uses the current QEMU interpreter
> to intercept execve().
> 
> qemu_execve() will prepend the interpreter path , similar to what
> binfmt_misc would do, and then pass the modified execve() to the host.
> 
> It is necessary to parse hashbang scripts in that function otherwise
> the kernel will try to run the interpreter of a script without QEMU and
> get an invalid exec format error.
> 
> Signed-off-by: Petros Angelatos 
> ---

You can put here the difference between v1 and v2.

>  linux-user/main.c|  36 
>  linux-user/qemu.h|   1 +
>  linux-user/syscall.c | 117 
> ++-
>  3 files changed, 153 insertions(+), 1 deletion(-)
> 
> diff --git a/linux-user/main.c b/linux-user/main.c
> index ee12035..751eafa 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

This line needs to be rebased.

Otherwise:

Tested-by: Laurent Vivier 
Reviewed-by: Laurent Vivier 

>  #include 
>  #include 
>  #include 
> @@ -79,6 +80,7 @@ static void usage(int exitcode);
>  
>  static const char *interp_prefix = CONFIG_QEMU_INTERP_PREFIX;
>  const char *qemu_uname_release;
> +const char *qemu_execve_path;
>  
>  /* XXX: on x86 MAP_GROWSDOWN only works if ESP <= address + 32, so
> we allocate a bigger stack. Need a better solution, for example
> @@ -3828,6 +3830,38 @@ static void handle_arg_guest_base(const char *arg)
>  have_guest_base = 1;
>  }
>  
> +static void handle_arg_execve(const char *arg)
> +{
> +const char *execfn;
> +char buf[PATH_MAX];
> +char *ret;
> +int len;
> +
> +/* try getauxval() */
> +execfn = (const char *) getauxval(AT_EXECFN);
> +
> +if (execfn != 0) {
> +ret = realpath(execfn, buf);
> +
> +if (ret != NULL) {
> +qemu_execve_path = strdup(buf);
> +return;
> +}
> +}
> +
> +/* try /proc/self/exe */
> +len = readlink("/proc/self/exe", buf, sizeof(buf) - 1);
> +
> +if (len != -1) {
> +buf[len] = '\0';
> +qemu_execve_path = strdup(buf);
> +return;
> +}
> +
> +fprintf(stderr, "qemu_execve: unable to determine intepreter's path\n");
> +exit(EXIT_FAILURE);
> +}
> +
>  static void handle_arg_reserved_va(const char *arg)
>  {
>  char *p;
> @@ -3913,6 +3947,8 @@ static const struct qemu_argument arg_table[] = {
>   "uname",  "set qemu uname release string to 'uname'"},
>  {"B",  "QEMU_GUEST_BASE",  true,  handle_arg_guest_base,
>   "address","set guest_base address to 'address'"},
> +{"execve", "QEMU_EXECVE",  false, handle_arg_execve,
> + "",   "use this interpreter when a process calls execve()"},
>  {"R",  "QEMU_RESERVED_VA", true,  handle_arg_reserved_va,
>   "size",   "reserve 'size' bytes for guest virtual address space"},
>  {"d",  "QEMU_LOG", true,  handle_arg_log,
> diff --git a/linux-user/qemu.h b/linux-user/qemu.h
> index bd90cc3..0d9b058 100644
> --- a/linux-user/qemu.h
> +++ b/linux-user/qemu.h
> @@ -140,6 +140,7 @@ void init_task_state(TaskState *ts);
>  void task_settid(TaskState *);
>  void stop_all_tasks(void);
>  extern const char *qemu_uname_release;
> +extern const char *qemu_execve_path;
>  extern unsigned long mmap_min_addr;
>  
>  /* ??? See if we can avoid exposing so much of the loader internals.  */
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 0cbace4..4755978 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -111,6 +111,7 @@ int __clone2(int (*fn)(void *), void *child_stack_base,
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "linux_loop.h"
>  #include "uname.h"
>  
> @@ -5854,6 +5855,118 @@ static target_timer_t get_timer_id(abi_long arg)
>  return timerid;
>  }
>  
> +/* qemu_execve() Must return target values and target errnos. */
> +static abi_long qemu_execve(char *filename, char *argv[],
> +  char *envp[])
> +{
> +char *i_arg = NULL, *i_name = NULL;
> +char **new_argp;
> +int argc, fd, ret, i, offset = 3;
> +char *cp;
> +char buf[BINPRM_BUF_SIZE];
> +
> +/* normal execve case */
> +if (qemu_execve_path == NULL || *qemu_execve_path == 0) {
> +return get_errno(execve(filename, argv, envp));
> +}
> +
> +for (argc = 0; argv[argc] != NULL; argc++) {
> +/* nothing */ ;
> +}
> +
> +fd = open(filename, O_RDONLY);
> +if (fd == -1) {
> +return get_errno(fd);
> +}
> +
> +ret = read(fd, buf, BINPRM_BUF_SIZE);
> +if (ret == -1) {
> +close(fd);
> +return ge

Re: [Qemu-devel] [PATCH 1/2] qemu-iotests: add _filter_qmp_events() for filtering out QMP events

2016-02-10 Thread Sascha Silbe
Dear Max,

Max Reitz  writes:

>> +# remove QMP events from output
>> +_filter_qmp_events()
>> +{
>> +sed -e '/^{\(.*, \)"event": ".*}$/ d'
>> +}
>
> There is a pretty good reason test 067 uses -qmp-pretty (as you yourself
> say, the lines get pretty long otherwise, and if we have any change
> within, the whole line needs to be changed).

Additionally, it's a lot easier to read when indented properly,
especially with the block info containing nested dicts.

> Using the following ugly
> piece of code here instead, we would still be able to use it:
>
> tr '\n' '\t' \
>   | sed -e
> 's/{\s*"timestamp":\s*{[^}]*},\s*"event":[^,}]*\(,\s*"data":\s*{[^}]*}\)\?\s*}\s*//g'
> \
>   | tr '\t' '\n'

Nice trick. Why didn't I come up with it? ;)

It definitely is a bit ugly, though. We can't just drop the entire line
(using "d") as the entire stream now is a single line. Matching
parenthesis pairs is context sensitive, so we can't just use regular
expressions to aggregate results into lines. And before I start
implementing a JSON indenter in awk, I'd rather rewrite the whole test
in Python. So if we stay with the shell test for now, we need something
like your incantation above. It's not perfect, but good enough for now
and I can't think of anything significantly simpler right now either.

Will test your version and send a v2. Thanks for the suggestion!

Sascha
-- 
Softwareentwicklung Sascha Silbe, Niederhofenstraße 5/1, 71229 Leonberg
https://se-silbe.de/
USt-IdNr. DE281696641




[Qemu-devel] [PATCH v6 16/16] nbd: enable use of TLS with nbd-server-start command

2016-02-10 Thread Daniel P. Berrange
This modifies the nbd-server-start QMP command so that it
is possible to request use of TLS. This is done by adding
a new optional parameter "tls-creds" which provides the ID
of a previously created QCryptoTLSCreds object instance.

TLS is only supported when using an IPv4/IPv6 socket listener.

Signed-off-by: Daniel P. Berrange 
---
 blockdev-nbd.c  | 122 ++--
 hmp.c   |   2 +-
 qapi/block.json |   4 +-
 qmp-commands.hx |   2 +-
 4 files changed, 105 insertions(+), 25 deletions(-)

diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index f181840..12cae0e 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -20,42 +20,126 @@
 #include "block/nbd.h"
 #include "io/channel-socket.h"
 
-static QIOChannelSocket *server_ioc;
-static int server_watch = -1;
+typedef struct NBDServerData {
+QIOChannelSocket *listen_ioc;
+int watch;
+QCryptoTLSCreds *tlscreds;
+} NBDServerData;
+
+static NBDServerData *nbd_server;
+
 
 static gboolean nbd_accept(QIOChannel *ioc, GIOCondition condition,
gpointer opaque)
 {
 QIOChannelSocket *cioc;
 
+if (!nbd_server) {
+return FALSE;
+}
+
 cioc = qio_channel_socket_accept(QIO_CHANNEL_SOCKET(ioc),
  NULL);
 if (!cioc) {
 return TRUE;
 }
 
-nbd_client_new(NULL, cioc, NULL, NULL, nbd_client_put);
+nbd_client_new(NULL, cioc,
+   nbd_server->tlscreds, NULL,
+   nbd_client_put);
 object_unref(OBJECT(cioc));
 return TRUE;
 }
 
-void qmp_nbd_server_start(SocketAddress *addr, Error **errp)
+
+static void nbd_server_free(NBDServerData *server)
 {
-if (server_ioc) {
-error_setg(errp, "NBD server already running");
+if (!server) {
 return;
 }
 
-server_ioc = qio_channel_socket_new();
-if (qio_channel_socket_listen_sync(server_ioc, addr, errp) < 0) {
+if (server->watch != -1) {
+g_source_remove(server->watch);
+}
+object_unref(OBJECT(server->listen_ioc));
+if (server->tlscreds) {
+object_unref(OBJECT(server->tlscreds));
+}
+
+g_free(server);
+}
+
+static QCryptoTLSCreds *nbd_get_tls_creds(const char *id, Error **errp)
+{
+Object *obj;
+QCryptoTLSCreds *creds;
+
+obj = object_resolve_path_component(
+object_get_objects_root(), id);
+if (!obj) {
+error_setg(errp, "No TLS credentials with id '%s'",
+   id);
+return NULL;
+}
+creds = (QCryptoTLSCreds *)
+object_dynamic_cast(obj, TYPE_QCRYPTO_TLS_CREDS);
+if (!creds) {
+error_setg(errp, "Object with id '%s' is not TLS credentials",
+   id);
+return NULL;
+}
+
+if (creds->endpoint != QCRYPTO_TLS_CREDS_ENDPOINT_SERVER) {
+error_setg(errp,
+   "Expecting TLS credentials with a server endpoint");
+return NULL;
+}
+object_ref(obj);
+return creds;
+}
+
+
+void qmp_nbd_server_start(SocketAddress *addr,
+  bool has_tls_creds, const char *tls_creds,
+  Error **errp)
+{
+if (nbd_server) {
+error_setg(errp, "NBD server already running");
 return;
 }
 
-server_watch = qio_channel_add_watch(QIO_CHANNEL(server_ioc),
- G_IO_IN,
- nbd_accept,
- NULL,
- NULL);
+nbd_server = g_new0(NBDServerData, 1);
+nbd_server->watch = -1;
+nbd_server->listen_ioc = qio_channel_socket_new();
+if (qio_channel_socket_listen_sync(
+nbd_server->listen_ioc, addr, errp) < 0) {
+goto error;
+}
+
+if (has_tls_creds) {
+nbd_server->tlscreds = nbd_get_tls_creds(tls_creds, errp);
+if (!nbd_server->tlscreds) {
+goto error;
+}
+
+if (addr->type != SOCKET_ADDRESS_KIND_INET) {
+error_setg(errp, "TLS is only supported with IPv4/IPv6");
+goto error;
+}
+}
+
+nbd_server->watch = qio_channel_add_watch(
+QIO_CHANNEL(nbd_server->listen_ioc),
+G_IO_IN,
+nbd_accept,
+NULL,
+NULL);
+
+return;
+
+ error:
+nbd_server_free(nbd_server);
+nbd_server = NULL;
 }
 
 void qmp_nbd_server_add(const char *device, bool has_writable, bool writable,
@@ -64,7 +148,7 @@ void qmp_nbd_server_add(const char *device, bool 
has_writable, bool writable,
 BlockBackend *blk;
 NBDExport *exp;
 
-if (!server_ioc) {
+if (!nbd_server) {
 error_setg(errp, "NBD server not running");
 return;
 }
@@ -110,12 +194,6 @@ void qmp_nbd_server_stop(Error **errp)
 {
 nbd_export_close_all();
 
-if (server_watch != -1) {
-g_source_remove(server_watch);
-server_watch = -1;
-}
-if (server_ioc) {
-object_unref(OBJECT(server_io

[Qemu-devel] [PATCH v6 15/16] nbd: enable use of TLS with qemu-nbd server

2016-02-10 Thread Daniel P. Berrange
This modifies the qemu-nbd program so that it is possible to
request the use of TLS with the server. It simply adds a new
command line option --tls-creds which is used to provide the
ID of a QCryptoTLSCreds object previously created via the
--object command line option.

For example

  qemu-nbd --object tls-creds-x509,id=tls0,endpoint=server,\
dir=/home/berrange/security/qemutls \
   --tls-creds tls0 \
   --exportname default

TLS requires the new style NBD protocol, so if no export name
is set (via --export-name), then we use the default NBD protocol
export name ""

TLS is only supported when using an IPv4/IPv6 socket listener.
It is not possible to use with UNIX sockets, which includes
when connecting the NBD server to a host device.

Signed-off-by: Daniel P. Berrange 
---
 qemu-nbd.c| 62 ++-
 qemu-nbd.texi |  4 
 2 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 8acd515..933ca4a 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -42,6 +42,7 @@
 #define QEMU_NBD_OPT_DISCARD   3
 #define QEMU_NBD_OPT_DETECT_ZEROES 4
 #define QEMU_NBD_OPT_OBJECT5
+#define QEMU_NBD_OPT_TLSCREDS  6
 
 static NBDExport *exp;
 static bool newproto;
@@ -54,6 +55,7 @@ static int shared = 1;
 static int nb_fds;
 static QIOChannelSocket *server_ioc;
 static int server_watch = -1;
+static QCryptoTLSCreds *tlscreds;
 
 static void usage(const char *name)
 {
@@ -342,7 +344,7 @@ static gboolean nbd_accept(QIOChannel *ioc, GIOCondition 
cond, gpointer opaque)
 nb_fds++;
 nbd_update_server_watch();
 nbd_client_new(newproto ? NULL : exp, cioc,
-   NULL, NULL, nbd_client_closed);
+   tlscreds, NULL, nbd_client_closed);
 object_unref(OBJECT(cioc));
 
 return TRUE;
@@ -402,6 +404,37 @@ static QemuOptsList qemu_object_opts = {
 };
 
 
+
+static QCryptoTLSCreds *nbd_get_tls_creds(const char *id, Error **errp)
+{
+Object *obj;
+QCryptoTLSCreds *creds;
+
+obj = object_resolve_path_component(
+object_get_objects_root(), id);
+if (!obj) {
+error_setg(errp, "No TLS credentials with id '%s'",
+   id);
+return NULL;
+}
+creds = (QCryptoTLSCreds *)
+object_dynamic_cast(obj, TYPE_QCRYPTO_TLS_CREDS);
+if (!creds) {
+error_setg(errp, "Object with id '%s' is not TLS credentials",
+   id);
+return NULL;
+}
+
+if (creds->endpoint != QCRYPTO_TLS_CREDS_ENDPOINT_SERVER) {
+error_setg(errp,
+   "Expecting TLS credentials with a server endpoint");
+return NULL;
+}
+object_ref(obj);
+return creds;
+}
+
+
 int main(int argc, char **argv)
 {
 BlockBackend *blk;
@@ -441,6 +474,7 @@ int main(int argc, char **argv)
 { "verbose", 0, NULL, 'v' },
 { "object", 1, NULL, QEMU_NBD_OPT_OBJECT },
 { "export-name", 1, NULL, 'x' },
+{ "tls-creds", 1, NULL, QEMU_NBD_OPT_TLSCREDS },
 { NULL, 0, NULL, 0 }
 };
 int ch;
@@ -458,6 +492,7 @@ int main(int argc, char **argv)
 BlockdevDetectZeroesOptions detect_zeroes = 
BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF;
 QDict *options = NULL;
 const char *export_name = NULL;
+const char *tlscredsid = NULL;
 
 /* The client thread uses SIGTERM to interrupt the server.  A signal
  * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
@@ -634,6 +669,9 @@ int main(int argc, char **argv)
 exit(EXIT_FAILURE);
 }
 }   break;
+case QEMU_NBD_OPT_TLSCREDS:
+tlscredsid = optarg;
+break;
 }
 }
 
@@ -650,6 +688,28 @@ int main(int argc, char **argv)
 exit(EXIT_FAILURE);
 }
 
+if (tlscredsid) {
+if (sockpath) {
+error_report("TLS is only supported with IPv4/IPv6");
+exit(EXIT_FAILURE);
+}
+if (device) {
+error_report("TLS is not supported with a host device");
+exit(EXIT_FAILURE);
+}
+if (!export_name) {
+/* Set the default NBD protocol export name, since
+ * we *must* use new style protocol for TLS */
+export_name = "";
+}
+tlscreds = nbd_get_tls_creds(tlscredsid, &local_err);
+if (local_err) {
+error_report("Failed to get TLS creds %s",
+ error_get_pretty(local_err));
+exit(EXIT_FAILURE);
+}
+}
+
 if (disconnect) {
 int nbdfd = open(argv[optind], O_RDWR);
 if (nbdfd < 0) {
diff --git a/qemu-nbd.texi b/qemu-nbd.texi
index 2516963..417f9c6 100644
--- a/qemu-nbd.texi
+++ b/qemu-nbd.texi
@@ -76,6 +76,10 @@ Don't exit on the last connection
 @item -x NAME, --export-name=NAME
 Set the NDB volume export name. This switches the server to use
 the new style NBD protocol negotiation
+@item --tls-creds=ID
+

[Qemu-devel] [PATCH v6 14/16] nbd: enable use of TLS with NBD block driver

2016-02-10 Thread Daniel P. Berrange
This modifies the NBD driver so that it is possible to request
use of TLS. This is done by providing the 'tls-creds' parameter
with the ID of a previously created QCryptoTLSCreds object.

For example

  $QEMU -object tls-creds-x509,id=tls0,endpoint=client,\
dir=/home/berrange/security/qemutls \
-drive driver=nbd,host=localhost,port=9000,tls-creds=tls0

The client will drop the connection if the NBD server does not
provide TLS.

Signed-off-by: Daniel P. Berrange 
---
 block/nbd-client.c | 10 ---
 block/nbd-client.h |  2 ++
 block/nbd.c| 78 +++---
 3 files changed, 78 insertions(+), 12 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 1c79e4b..6a9b4c7 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -394,8 +394,12 @@ void nbd_client_close(BlockDriverState *bs)
 nbd_teardown_connection(bs);
 }
 
-int nbd_client_init(BlockDriverState *bs, QIOChannelSocket *sioc,
-const char *export, Error **errp)
+int nbd_client_init(BlockDriverState *bs,
+QIOChannelSocket *sioc,
+const char *export,
+QCryptoTLSCreds *tlscreds,
+const char *hostname,
+Error **errp)
 {
 NbdClientSession *client = nbd_get_client_session(bs);
 int ret;
@@ -406,7 +410,7 @@ int nbd_client_init(BlockDriverState *bs, QIOChannelSocket 
*sioc,
 
 ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), export,
 &client->nbdflags,
-NULL, NULL,
+tlscreds, hostname,
 &client->ioc,
 &client->size, errp);
 if (ret < 0) {
diff --git a/block/nbd-client.h b/block/nbd-client.h
index e8b3283..53f116d 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -39,6 +39,8 @@ NbdClientSession *nbd_get_client_session(BlockDriverState 
*bs);
 int nbd_client_init(BlockDriverState *bs,
 QIOChannelSocket *sock,
 const char *export_name,
+QCryptoTLSCreds *tlscreds,
+const char *hostname,
 Error **errp);
 void nbd_client_close(BlockDriverState *bs);
 
diff --git a/block/nbd.c b/block/nbd.c
index d7116e2..db57b49 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -258,36 +258,92 @@ static QIOChannelSocket 
*nbd_establish_connection(SocketAddress *saddr,
 return sioc;
 }
 
+
+static QCryptoTLSCreds *nbd_get_tls_creds(const char *id, Error **errp)
+{
+Object *obj;
+QCryptoTLSCreds *creds;
+
+obj = object_resolve_path_component(
+object_get_objects_root(), id);
+if (!obj) {
+error_setg(errp, "No TLS credentials with id '%s'",
+   id);
+return NULL;
+}
+creds = (QCryptoTLSCreds *)
+object_dynamic_cast(obj, TYPE_QCRYPTO_TLS_CREDS);
+if (!creds) {
+error_setg(errp, "Object with id '%s' is not TLS credentials",
+   id);
+return NULL;
+}
+
+if (creds->endpoint != QCRYPTO_TLS_CREDS_ENDPOINT_CLIENT) {
+error_setg(errp,
+   "Expecting TLS credentials with a client endpoint");
+return NULL;
+}
+object_ref(obj);
+return creds;
+}
+
+
 static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
 Error **errp)
 {
 BDRVNBDState *s = bs->opaque;
 char *export = NULL;
-int result;
-QIOChannelSocket *sioc;
+QIOChannelSocket *sioc = NULL;
 SocketAddress *saddr;
+const char *tlscredsid;
+QCryptoTLSCreds *tlscreds = NULL;
+const char *hostname = NULL;
+int ret = -EINVAL;
 
 /* Pop the config into our state object. Exit if invalid. */
 saddr = nbd_config(s, options, &export, errp);
 if (!saddr) {
-return -EINVAL;
+goto error;
+}
+
+tlscredsid = g_strdup(qdict_get_try_str(options, "tls-creds"));
+if (tlscredsid) {
+qdict_del(options, "tls-creds");
+tlscreds = nbd_get_tls_creds(tlscredsid, errp);
+if (!tlscreds) {
+goto error;
+}
+
+if (saddr->type != SOCKET_ADDRESS_KIND_INET) {
+error_setg(errp, "TLS only supported over IP sockets");
+goto error;
+}
+hostname = saddr->u.inet->host;
 }
 
 /* establish TCP connection, return error if it fails
  * TODO: Configurable retry-until-timeout behaviour.
  */
 sioc = nbd_establish_connection(saddr, errp);
-qapi_free_SocketAddress(saddr);
 if (!sioc) {
-g_free(export);
-return -ECONNREFUSED;
+ret = -ECONNREFUSED;
+goto error;
 }
 
 /* NBD handshake */
-result = nbd_client_init(bs, sioc, export, errp);
-object_unref(OBJECT(sioc));
+ret = nbd_client_init(bs, sioc, export,
+  tlscreds, hostname, errp);
+ er

[Qemu-devel] [PATCH v6 11/16] nbd: always query export list in fixed new style protocol

2016-02-10 Thread Daniel P. Berrange
With the new style protocol, the NBD client will currenetly
send NBD_OPT_EXPORT_NAME as the first (and indeed only)
option it wants. The problem is that the NBD protocol spec
does not allow for returning an error message with the
NBD_OPT_EXPORT_NAME option. So if the server mandates use
of TLS, the client will simply see an immediate connection
close after issuing NBD_OPT_EXPORT_NAME which is not user
friendly.

To improve this situation, if we have the fixed new style
protocol, we can sent NBD_OPT_LIST as the first option
to query the list of server exports. We can check for our
named export in this list and raise an error if it is not
found, instead of going ahead and sending NBD_OPT_EXPORT_NAME
with a name that we know will be rejected.

This improves the error reporting both in the case that the
server required TLS, and in the case that the client requested
export name does not exist on the server.

If the server does not support NBD_OPT_LIST, we just ignore
that and carry on with NBD_OPT_EXPORT_NAME as before.

Signed-off-by: Daniel P. Berrange 
---
 nbd/client.c   | 195 -
 nbd/server.c   |   2 +
 tests/qemu-iotests/140.out |   2 +-
 tests/qemu-iotests/143.out |   2 +-
 4 files changed, 196 insertions(+), 5 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 88f2ada..be5f08d 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -71,6 +71,177 @@ static QTAILQ_HEAD(, NBDExport) exports = 
QTAILQ_HEAD_INITIALIZER(exports);
 
 */
 
+
+static int nbd_handle_reply_err(uint32_t opt, uint32_t type, Error **errp)
+{
+if (!(type & (1 << 31))) {
+return 0;
+}
+
+switch (type) {
+case NBD_REP_ERR_UNSUP:
+error_setg(errp, "Unsupported option type %x", opt);
+break;
+
+case NBD_REP_ERR_INVALID:
+error_setg(errp, "Invalid data length for option %x", opt);
+break;
+
+default:
+error_setg(errp, "Unknown error code when asking for option %x", opt);
+break;
+}
+
+return -1;
+}
+
+static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
+{
+uint64_t magic;
+uint32_t opt;
+uint32_t type;
+uint32_t len;
+uint32_t namelen;
+
+*name = NULL;
+if (read_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
+error_setg(errp, "failed to read list option magic");
+return -1;
+}
+magic = be64_to_cpu(magic);
+if (magic != NBD_REP_MAGIC) {
+error_setg(errp, "Unexpected option list magic");
+return -1;
+}
+if (read_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
+error_setg(errp, "failed to read list option");
+return -1;
+}
+opt = be32_to_cpu(opt);
+if (opt != NBD_OPT_LIST) {
+error_setg(errp, "Unexpected option type %x expected %x",
+   opt, NBD_OPT_LIST);
+return -1;
+}
+
+if (read_sync(ioc, &type, sizeof(type)) != sizeof(type)) {
+error_setg(errp, "failed to read list option type");
+return -1;
+}
+type = be32_to_cpu(type);
+if (type == NBD_REP_ERR_UNSUP) {
+return 0;
+}
+if (nbd_handle_reply_err(opt, type, errp) < 0) {
+return -1;
+}
+
+if (read_sync(ioc, &len, sizeof(len)) != sizeof(len)) {
+error_setg(errp, "failed to read option length");
+return -1;
+}
+len = be32_to_cpu(len);
+
+if (type == NBD_REP_ACK) {
+if (len != 0) {
+error_setg(errp, "length too long for option end");
+return -1;
+}
+} else if (type == NBD_REP_SERVER) {
+if (read_sync(ioc, &namelen, sizeof(namelen)) != sizeof(namelen)) {
+error_setg(errp, "failed to read option name length");
+return -1;
+}
+namelen = be32_to_cpu(namelen);
+if (len != (namelen + sizeof(namelen))) {
+error_setg(errp, "incorrect option mame length");
+return -1;
+}
+if (namelen > 255) {
+error_setg(errp, "export name length too long %d", namelen);
+return -1;
+}
+
+*name = g_new0(char, namelen + 1);
+if (read_sync(ioc, *name, namelen) != namelen) {
+error_setg(errp, "failed to read export name");
+g_free(*name);
+*name = NULL;
+return -1;
+}
+(*name)[namelen] = '\0';
+} else {
+error_setg(errp, "Unexpected reply type %x expected %x",
+   type, NBD_REP_SERVER);
+return -1;
+}
+return 1;
+}
+
+
+static int nbd_receive_query_exports(QIOChannel *ioc,
+ const char *wantname,
+ Error **errp)
+{
+uint64_t magic = cpu_to_be64(NBD_OPTS_MAGIC);
+uint32_t opt = cpu_to_be32(NBD_OPT_LIST);
+uint32_t length = 0;
+bool foundExport = false;
+
+TRACE("Querying export list");
+if (write_sync(ioc, &magic, sizeof(magic)) !

[Qemu-devel] [PATCH v6 13/16] nbd: implement TLS support in the protocol negotiation

2016-02-10 Thread Daniel P. Berrange
This extends the NBD protocol handling code so that it is capable
of negotiating TLS support during the connection setup. This involves
requesting the STARTTLS protocol option before any other NBD options.

Signed-off-by: Daniel P. Berrange 
---
 block/nbd-client.c  |  12 +++--
 blockdev-nbd.c  |   2 +-
 include/block/nbd.h |   8 
 nbd/client.c| 136 +++-
 nbd/common.c|  15 ++
 nbd/nbd-internal.h  |  14 ++
 nbd/server.c| 118 ++---
 qemu-nbd.c  |   4 +-
 8 files changed, 296 insertions(+), 13 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 75bb2d9..1c79e4b 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -405,7 +405,10 @@ int nbd_client_init(BlockDriverState *bs, QIOChannelSocket 
*sioc,
 qio_channel_set_blocking(QIO_CHANNEL(sioc), true, NULL);
 
 ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), export,
-&client->nbdflags, &client->size, errp);
+&client->nbdflags,
+NULL, NULL,
+&client->ioc,
+&client->size, errp);
 if (ret < 0) {
 logout("Failed to negotiate with the NBD server\n");
 return ret;
@@ -415,8 +418,11 @@ int nbd_client_init(BlockDriverState *bs, QIOChannelSocket 
*sioc,
 qemu_co_mutex_init(&client->free_sema);
 client->sioc = sioc;
 object_ref(OBJECT(client->sioc));
-client->ioc = QIO_CHANNEL(sioc);
-object_ref(OBJECT(client->ioc));
+
+if (!client->ioc) {
+client->ioc = QIO_CHANNEL(sioc);
+object_ref(OBJECT(client->ioc));
+}
 
 /* Now that we're connected, set the socket to be non-blocking and
  * kick the reply mechanism.  */
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 2148753..f181840 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -34,7 +34,7 @@ static gboolean nbd_accept(QIOChannel *ioc, GIOCondition 
condition,
 return TRUE;
 }
 
-nbd_client_new(NULL, cioc, nbd_client_put);
+nbd_client_new(NULL, cioc, NULL, NULL, nbd_client_put);
 object_unref(OBJECT(cioc));
 return TRUE;
 }
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 1080ef8..b197adc 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -24,6 +24,7 @@
 #include "qemu-common.h"
 #include "qemu/option.h"
 #include "io/channel-socket.h"
+#include "crypto/tlscreds.h"
 
 struct nbd_request {
 uint32_t magic;
@@ -56,7 +57,10 @@ struct nbd_reply {
 #define NBD_REP_ACK (1) /* Data sending finished. */
 #define NBD_REP_SERVER  (2) /* Export description. */
 #define NBD_REP_ERR_UNSUP   ((UINT32_C(1) << 31) | 1) /* Unknown option. */
+#define NBD_REP_ERR_POLICY  ((UINT32_C(1) << 31) | 2) /* Server denied */
 #define NBD_REP_ERR_INVALID ((UINT32_C(1) << 31) | 3) /* Invalid length. */
+#define NBD_REP_ERR_TLS_REQD((UINT32_C(1) << 31) | 5) /* TLS required */
+
 
 #define NBD_CMD_MASK_COMMAND   0x
 #define NBD_CMD_FLAG_FUA   (1 << 16)
@@ -81,6 +85,8 @@ ssize_t nbd_wr_syncv(QIOChannel *ioc,
  size_t length,
  bool do_read);
 int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
+  QCryptoTLSCreds *tlscreds, const char *hostname,
+  QIOChannel **outioc,
   off_t *size, Error **errp);
 int nbd_init(int fd, QIOChannelSocket *sioc, uint32_t flags, off_t size);
 ssize_t nbd_send_request(QIOChannel *ioc, struct nbd_request *request);
@@ -106,6 +112,8 @@ void nbd_export_close_all(void);
 
 void nbd_client_new(NBDExport *exp,
 QIOChannelSocket *sioc,
+QCryptoTLSCreds *tlscreds,
+const char *tlsaclname,
 void (*close)(NBDClient *));
 void nbd_client_get(NBDClient *client);
 void nbd_client_put(NBDClient *client);
diff --git a/nbd/client.c b/nbd/client.c
index 5e47ac7..9e5b651 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -83,10 +83,18 @@ static int nbd_handle_reply_err(uint32_t opt, uint32_t 
type, Error **errp)
 error_setg(errp, "Unsupported option type %x", opt);
 break;
 
+case NBD_REP_ERR_POLICY:
+error_setg(errp, "Denied by server for option %x", opt);
+break;
+
 case NBD_REP_ERR_INVALID:
 error_setg(errp, "Invalid data length for option %x", opt);
 break;
 
+case NBD_REP_ERR_TLS_REQD:
+error_setg(errp, "TLS negotiation required before option %x", opt);
+break;
+
 default:
 error_setg(errp, "Unknown error code when asking for option %x", opt);
 break;
@@ -242,17 +250,127 @@ static int nbd_receive_query_exports(QIOChannel *ioc,
 return 0;
 }
 
+static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
+

[Qemu-devel] [PATCH v6 08/16] nbd: make server compliant with fixed newstyle spec

2016-02-10 Thread Daniel P. Berrange
If the client does not request the fixed new style protocol,
then we should only accept NBD_OPT_EXPORT_NAME. All other
options are only valid when fixed new style has been activated.

The qemu-nbd client doesn't currently request fixed new style
protocol, but this change won't break qemu-nbd, because it
fortunately only ever uses NBD_OPT_EXPORT_NAME, so was never
triggering the non-compliant server behaviour.

Signed-off-by: Daniel P. Berrange 
---
 nbd/server.c | 69 
 1 file changed, 46 insertions(+), 23 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 15aa03d..074a1e6 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -310,6 +310,7 @@ fail:
 static int nbd_negotiate_options(NBDClient *client)
 {
 uint32_t flags;
+bool fixedNewstyle = false;
 
 /* Client sends:
 [ 0 ..   3]   client flags
@@ -332,14 +333,19 @@ static int nbd_negotiate_options(NBDClient *client)
 }
 TRACE("Checking client flags");
 be32_to_cpus(&flags);
-if (flags != 0 && flags != NBD_FLAG_C_FIXED_NEWSTYLE) {
-LOG("Bad client flags received");
+if (flags & NBD_FLAG_C_FIXED_NEWSTYLE) {
+TRACE("Support supports fixed newstyle handshake");
+fixedNewstyle = true;
+flags &= ~NBD_FLAG_C_FIXED_NEWSTYLE;
+}
+if (flags != 0) {
+TRACE("Unknown client flags 0x%x received", flags);
 return -EIO;
 }
 
 while (1) {
 int ret;
-uint32_t tmp, length;
+uint32_t clientflags, length;
 uint64_t magic;
 
 if (nbd_negotiate_read(client->ioc, &magic, sizeof(magic)) !=
@@ -353,10 +359,12 @@ static int nbd_negotiate_options(NBDClient *client)
 return -EINVAL;
 }
 
-if (nbd_negotiate_read(client->ioc, &tmp, sizeof(tmp)) != sizeof(tmp)) 
{
+if (nbd_negotiate_read(client->ioc, &clientflags,
+   sizeof(clientflags)) != sizeof(clientflags)) {
 LOG("read failed");
 return -EINVAL;
 }
+clientflags = be32_to_cpu(clientflags);
 
 if (nbd_negotiate_read(client->ioc, &length, sizeof(length)) !=
 sizeof(length)) {
@@ -365,26 +373,41 @@ static int nbd_negotiate_options(NBDClient *client)
 }
 length = be32_to_cpu(length);
 
-TRACE("Checking option");
-switch (be32_to_cpu(tmp)) {
-case NBD_OPT_LIST:
-ret = nbd_negotiate_handle_list(client, length);
-if (ret < 0) {
-return ret;
+TRACE("Checking option 0x%x", clientflags);
+if (fixedNewstyle) {
+switch (clientflags) {
+case NBD_OPT_LIST:
+ret = nbd_negotiate_handle_list(client, length);
+if (ret < 0) {
+return ret;
+}
+break;
+
+case NBD_OPT_ABORT:
+return -EINVAL;
+
+case NBD_OPT_EXPORT_NAME:
+return nbd_negotiate_handle_export_name(client, length);
+
+default:
+TRACE("Unsupported option 0x%x", clientflags);
+nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_UNSUP,
+   clientflags);
+return -EINVAL;
+}
+} else {
+/*
+ * If broken new-style we should drop the connection
+ * for anything except NBD_OPT_EXPORT_NAME
+ */
+switch (clientflags) {
+case NBD_OPT_EXPORT_NAME:
+return nbd_negotiate_handle_export_name(client, length);
+
+default:
+TRACE("Unsupported option 0x%x", clientflags);
+return -EINVAL;
 }
-break;
-
-case NBD_OPT_ABORT:
-return -EINVAL;
-
-case NBD_OPT_EXPORT_NAME:
-return nbd_negotiate_handle_export_name(client, length);
-
-default:
-tmp = be32_to_cpu(tmp);
-LOG("Unsupported option 0x%x", tmp);
-nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_UNSUP, tmp);
-return -EINVAL;
 }
 }
 }
-- 
2.5.0




[Qemu-devel] [PATCH v6 07/16] nbd: invert client logic for negotiating protocol version

2016-02-10 Thread Daniel P. Berrange
The nbd_receive_negotiate() method takes different code
paths based on whether 'name == NULL', and then checks
the expected protocol version in each branch.

This patch inverts the logic, so that it takes different
code paths based on what protocol version it receives and
then checks if name is NULL or not as needed.

This facilitates later code which allows the client to
be capable of using the new style protocol regardless
of whether an export name is listed or not.

Signed-off-by: Daniel P. Berrange 
---
 nbd/client.c | 60 +---
 1 file changed, 29 insertions(+), 31 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index f6dd0a3..5064788 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -116,20 +116,11 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char 
*name, uint32_t *flags,
 magic = be64_to_cpu(magic);
 TRACE("Magic is 0x%" PRIx64, magic);
 
-if (name) {
+if (magic == NBD_OPTS_MAGIC) {
 uint32_t reserved = 0;
 uint32_t opt;
 uint32_t namesize;
 
-TRACE("Checking magic (opts_magic)");
-if (magic != NBD_OPTS_MAGIC) {
-if (magic == NBD_CLIENT_MAGIC) {
-error_setg(errp, "Server does not support export names");
-} else {
-error_setg(errp, "Bad magic received");
-}
-goto fail;
-}
 if (read_sync(ioc, &tmp, sizeof(tmp)) != sizeof(tmp)) {
 error_setg(errp, "Failed to read server flags");
 goto fail;
@@ -142,6 +133,10 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char 
*name, uint32_t *flags,
 goto fail;
 }
 /* write the export name */
+if (!name) {
+error_setg(errp, "Server requires an export name");
+goto fail;
+}
 magic = cpu_to_be64(magic);
 if (write_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
 error_setg(errp, "Failed to send export name magic");
@@ -162,39 +157,42 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char 
*name, uint32_t *flags,
 error_setg(errp, "Failed to send export name");
 goto fail;
 }
-} else {
-TRACE("Checking magic (cli_magic)");
 
-if (magic != NBD_CLIENT_MAGIC) {
-if (magic == NBD_OPTS_MAGIC) {
-error_setg(errp, "Server requires an export name");
-} else {
-error_setg(errp, "Bad magic received");
-}
+if (read_sync(ioc, &s, sizeof(s)) != sizeof(s)) {
+error_setg(errp, "Failed to read export length");
 goto fail;
 }
-}
+*size = be64_to_cpu(s);
+TRACE("Size is %" PRIu64, *size);
 
-if (read_sync(ioc, &s, sizeof(s)) != sizeof(s)) {
-error_setg(errp, "Failed to read export length");
-goto fail;
-}
-*size = be64_to_cpu(s);
-TRACE("Size is %" PRIu64, *size);
+if (read_sync(ioc, &tmp, sizeof(tmp)) != sizeof(tmp)) {
+error_setg(errp, "Failed to read export flags");
+goto fail;
+}
+*flags |= be16_to_cpu(tmp);
+} else if (magic == NBD_CLIENT_MAGIC) {
+if (name) {
+error_setg(errp, "Server does not support export names");
+goto fail;
+}
+
+if (read_sync(ioc, &s, sizeof(s)) != sizeof(s)) {
+error_setg(errp, "Failed to read export length");
+goto fail;
+}
+*size = be64_to_cpu(s);
+TRACE("Size is %" PRIu64, *size);
 
-if (!name) {
 if (read_sync(ioc, flags, sizeof(*flags)) != sizeof(*flags)) {
 error_setg(errp, "Failed to read export flags");
 goto fail;
 }
 *flags = be32_to_cpup(flags);
 } else {
-if (read_sync(ioc, &tmp, sizeof(tmp)) != sizeof(tmp)) {
-error_setg(errp, "Failed to read export flags");
-goto fail;
-}
-*flags |= be16_to_cpu(tmp);
+error_setg(errp, "Bad magic received");
+goto fail;
 }
+
 if (read_sync(ioc, &buf, 124) != 124) {
 error_setg(errp, "Failed to read reserved block");
 goto fail;
-- 
2.5.0




[Qemu-devel] [PATCH v6 12/16] nbd: use "" as a default export name if none provided

2016-02-10 Thread Daniel P. Berrange
If the user does not provide an export name and the server
is running the new style protocol, where export names are
mandatory, use "" as the default export name if the user
has not specified any. "" is defined in the NBD protocol
as the default name to use in such scenarios.

Signed-off-by: Daniel P. Berrange 
---
 nbd/client.c | 4 ++--
 nbd/server.c | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index be5f08d..5e47ac7 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -315,8 +315,8 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char 
*name, uint32_t *flags,
 goto fail;
 }
 if (!name) {
-error_setg(errp, "Server requires an export name");
-goto fail;
+TRACE("Using default NBD export name \"\"");
+name = "";
 }
 if (fixedNewStyle) {
 /* Check our desired export is present in the
diff --git a/nbd/server.c b/nbd/server.c
index 3d2fb10..9fee1d4 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -220,6 +220,7 @@ static int nbd_negotiate_send_rep_list(QIOChannel *ioc, 
NBDExport *exp)
 uint64_t magic, name_len;
 uint32_t opt, type, len;
 
+TRACE("Advertizing export name '%s'", exp->name ? exp->name : "");
 name_len = strlen(exp->name);
 magic = cpu_to_be64(NBD_REP_MAGIC);
 if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
-- 
2.5.0




[Qemu-devel] [PATCH v6 06/16] nbd: convert to using I/O channels for actual socket I/O

2016-02-10 Thread Daniel P. Berrange
Now that all callers are converted to use I/O channels for
initial connection setup, it is possible to switch the core
NBD protocol handling core over to use QIOChannel APIs for
actual sockets I/O.

Signed-off-by: Daniel P. Berrange 
---
 block/nbd-client.c  |  19 +++
 blockdev-nbd.c  |   6 +--
 include/block/nbd.h |  20 ---
 nbd/client.c|  40 +++---
 nbd/common.c|  68 ++--
 nbd/nbd-internal.h  |  18 +++
 nbd/server.c| 150 +---
 qemu-nbd.c  |  10 ++--
 8 files changed, 180 insertions(+), 151 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index c4379a9..75bb2d9 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -80,7 +80,7 @@ static void nbd_reply_ready(void *opaque)
  * that another thread has done the same thing in parallel, so
  * the socket is not readable anymore.
  */
-ret = nbd_receive_reply(s->sioc->fd, &s->reply);
+ret = nbd_receive_reply(s->ioc, &s->reply);
 if (ret == -EAGAIN) {
 return;
 }
@@ -131,6 +131,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
 }
 }
 
+g_assert(qemu_in_coroutine());
 assert(i < MAX_NBD_REQUESTS);
 request->handle = INDEX_TO_HANDLE(s, i);
 
@@ -146,17 +147,17 @@ static int nbd_co_send_request(BlockDriverState *bs,
nbd_reply_ready, nbd_restart_write, bs);
 if (qiov) {
 qio_channel_set_cork(s->ioc, true);
-rc = nbd_send_request(s->sioc->fd, request);
+rc = nbd_send_request(s->ioc, request);
 if (rc >= 0) {
-ret = qemu_co_sendv(s->sioc->fd, qiov->iov, qiov->niov,
-offset, request->len);
+ret = nbd_wr_syncv(s->ioc, qiov->iov, qiov->niov,
+   offset, request->len, 0);
 if (ret != request->len) {
 rc = -EIO;
 }
 }
 qio_channel_set_cork(s->ioc, false);
 } else {
-rc = nbd_send_request(s->sioc->fd, request);
+rc = nbd_send_request(s->ioc, request);
 }
 aio_set_fd_handler(aio_context, s->sioc->fd, false,
nbd_reply_ready, NULL, bs);
@@ -180,8 +181,8 @@ static void nbd_co_receive_reply(NbdClientSession *s,
 reply->error = EIO;
 } else {
 if (qiov && reply->error == 0) {
-ret = qemu_co_recvv(s->sioc->fd, qiov->iov, qiov->niov,
-offset, request->len);
+ret = nbd_wr_syncv(s->ioc, qiov->iov, qiov->niov,
+   offset, request->len, 1);
 if (ret != request->len) {
 reply->error = EIO;
 }
@@ -388,7 +389,7 @@ void nbd_client_close(BlockDriverState *bs)
 return;
 }
 
-nbd_send_request(client->sioc->fd, &request);
+nbd_send_request(client->ioc, &request);
 
 nbd_teardown_connection(bs);
 }
@@ -403,7 +404,7 @@ int nbd_client_init(BlockDriverState *bs, QIOChannelSocket 
*sioc,
 logout("session init %s\n", export);
 qio_channel_set_blocking(QIO_CHANNEL(sioc), true, NULL);
 
-ret = nbd_receive_negotiate(sioc->fd, export,
+ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), export,
 &client->nbdflags, &client->size, errp);
 if (ret < 0) {
 logout("Failed to negotiate with the NBD server\n");
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 9baf883..2148753 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -27,7 +27,6 @@ static gboolean nbd_accept(QIOChannel *ioc, GIOCondition 
condition,
gpointer opaque)
 {
 QIOChannelSocket *cioc;
-int fd;
 
 cioc = qio_channel_socket_accept(QIO_CHANNEL_SOCKET(ioc),
  NULL);
@@ -35,10 +34,7 @@ static gboolean nbd_accept(QIOChannel *ioc, GIOCondition 
condition,
 return TRUE;
 }
 
-fd = dup(cioc->fd);
-if (fd >= 0) {
-nbd_client_new(NULL, fd, nbd_client_put);
-}
+nbd_client_new(NULL, cioc, nbd_client_put);
 object_unref(OBJECT(cioc));
 return TRUE;
 }
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 7eccb41..1080ef8 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -23,6 +23,7 @@
 
 #include "qemu-common.h"
 #include "qemu/option.h"
+#include "io/channel-socket.h"
 
 struct nbd_request {
 uint32_t magic;
@@ -73,12 +74,17 @@ enum {
 /* Maximum size of a single READ/WRITE data buffer */
 #define NBD_MAX_BUFFER_SIZE (32 * 1024 * 1024)
 
-ssize_t nbd_wr_sync(int fd, void *buffer, size_t size, bool do_read);
-int nbd_receive_negotiate(int csock, const char *name, uint32_t *flags,
+ssize_t nbd_wr_syncv(QIOChannel *ioc,
+ struct iovec *iov,
+ size_t niov,
+ size_t offset,
+ size_t length,
+ bool do_read);
+int

[Qemu-devel] [PATCH v6 10/16] nbd: allow setting of an export name for qemu-nbd server

2016-02-10 Thread Daniel P. Berrange
The qemu-nbd server currently always uses the old style protocol
since it never sets any export name. This is a problem because
future TLS support will require use of the new style protocol
negotiation.

This adds "--exportname NAME" / "-x NAME" arguments to qemu-nbd
which allow the user to set an explicit export name. When an
export name is set the server will always use the new style
NBD protocol.

Signed-off-by: Daniel P. Berrange 
---
 qemu-nbd.c| 14 --
 qemu-nbd.texi |  3 +++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 5e54290..9710a26 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -44,6 +44,7 @@
 #define QEMU_NBD_OPT_OBJECT5
 
 static NBDExport *exp;
+static bool newproto;
 static int verbose;
 static char *srcpath;
 static SocketAddress *saddr;
@@ -339,7 +340,7 @@ static gboolean nbd_accept(QIOChannel *ioc, GIOCondition 
cond, gpointer opaque)
 
 nb_fds++;
 nbd_update_server_watch();
-nbd_client_new(exp, cioc, nbd_client_closed);
+nbd_client_new(newproto ? NULL : exp, cioc, nbd_client_closed);
 object_unref(OBJECT(cioc));
 
 return TRUE;
@@ -413,7 +414,7 @@ int main(int argc, char **argv)
 off_t fd_size;
 QemuOpts *sn_opts = NULL;
 const char *sn_id_or_name = NULL;
-const char *sopt = "hVb:o:p:rsnP:c:dvk:e:f:tl:";
+const char *sopt = "hVb:o:p:rsnP:c:dvk:e:f:tl:x:";
 struct option lopt[] = {
 { "help", 0, NULL, 'h' },
 { "version", 0, NULL, 'V' },
@@ -437,6 +438,7 @@ int main(int argc, char **argv)
 { "persistent", 0, NULL, 't' },
 { "verbose", 0, NULL, 'v' },
 { "object", 1, NULL, QEMU_NBD_OPT_OBJECT },
+{ "export-name", 1, NULL, 'x' },
 { NULL, 0, NULL, 0 }
 };
 int ch;
@@ -453,6 +455,7 @@ int main(int argc, char **argv)
 Error *local_err = NULL;
 BlockdevDetectZeroesOptions detect_zeroes = 
BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF;
 QDict *options = NULL;
+const char *export_name = NULL;
 
 /* The client thread uses SIGTERM to interrupt the server.  A signal
  * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
@@ -604,6 +607,9 @@ int main(int argc, char **argv)
 case 't':
 persistent = 1;
 break;
+case 'x':
+export_name = optarg;
+break;
 case 'v':
 verbose = 1;
 break;
@@ -783,6 +789,10 @@ int main(int argc, char **argv)
 error_report_err(local_err);
 exit(EXIT_FAILURE);
 }
+if (export_name) {
+nbd_export_set_name(exp, export_name);
+newproto = true;
+}
 
 server_ioc = qio_channel_socket_new();
 if (qio_channel_socket_listen_sync(server_ioc, saddr, &local_err) < 0) {
diff --git a/qemu-nbd.texi b/qemu-nbd.texi
index a56ebc3..2516963 100644
--- a/qemu-nbd.texi
+++ b/qemu-nbd.texi
@@ -73,6 +73,9 @@ Disconnect the device @var{dev}
 Allow up to @var{num} clients to share the device (default @samp{1})
 @item -t, --persistent
 Don't exit on the last connection
+@item -x NAME, --export-name=NAME
+Set the NDB volume export name. This switches the server to use
+the new style NBD protocol negotiation
 @item -v, --verbose
 Display extra debugging information
 @item -h, --help
-- 
2.5.0




[Qemu-devel] [PATCH v6 09/16] nbd: make client request fixed new style if advertized

2016-02-10 Thread Daniel P. Berrange
If the server advertizes support for the fixed new style
negotiation, the client should in turn enable new style.
This will allow the client to negotiate further NBD
options besides the export name.

Signed-off-by: Daniel P. Berrange 
---
 nbd/client.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 5064788..88f2ada 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -76,7 +76,6 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, 
uint32_t *flags,
 {
 char buf[256];
 uint64_t magic, s;
-uint16_t tmp;
 int rc;
 
 TRACE("Receiving negotiation.");
@@ -117,19 +116,26 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char 
*name, uint32_t *flags,
 TRACE("Magic is 0x%" PRIx64, magic);
 
 if (magic == NBD_OPTS_MAGIC) {
-uint32_t reserved = 0;
+uint32_t clientflags = 0;
 uint32_t opt;
 uint32_t namesize;
+uint16_t globalflags;
+uint16_t exportflags;
 
-if (read_sync(ioc, &tmp, sizeof(tmp)) != sizeof(tmp)) {
+if (read_sync(ioc, &globalflags, sizeof(globalflags)) !=
+sizeof(globalflags)) {
 error_setg(errp, "Failed to read server flags");
 goto fail;
 }
-*flags = be16_to_cpu(tmp) << 16;
-/* reserved for future use */
-if (write_sync(ioc, &reserved, sizeof(reserved)) !=
-sizeof(reserved)) {
-error_setg(errp, "Failed to read reserved field");
+*flags = be16_to_cpu(globalflags) << 16;
+if (globalflags & NBD_FLAG_FIXED_NEWSTYLE) {
+TRACE("Server supports fixed new style");
+clientflags |= NBD_FLAG_C_FIXED_NEWSTYLE;
+}
+/* client requested flags */
+if (write_sync(ioc, &clientflags, sizeof(clientflags)) !=
+sizeof(clientflags)) {
+error_setg(errp, "Failed to send clientflags field");
 goto fail;
 }
 /* write the export name */
@@ -165,11 +171,12 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char 
*name, uint32_t *flags,
 *size = be64_to_cpu(s);
 TRACE("Size is %" PRIu64, *size);
 
-if (read_sync(ioc, &tmp, sizeof(tmp)) != sizeof(tmp)) {
+if (read_sync(ioc, &exportflags, sizeof(exportflags)) !=
+sizeof(exportflags)) {
 error_setg(errp, "Failed to read export flags");
 goto fail;
 }
-*flags |= be16_to_cpu(tmp);
+*flags |= be16_to_cpu(exportflags);
 } else if (magic == NBD_CLIENT_MAGIC) {
 if (name) {
 error_setg(errp, "Server does not support export names");
-- 
2.5.0




[Qemu-devel] [PATCH v6 01/16] qom: add helpers for UserCreatable object types

2016-02-10 Thread Daniel P. Berrange
The QMP monitor code has two helper methods object_add
and qmp_object_del that are called from several places
in the code (QMP, HMP and main emulator startup).

The HMP and main emulator startup code also share
further logic that extracts the qom-type & id
values from a qdict.

We soon need to use this logic from qemu-img, qemu-io
and qemu-nbd too, but don't want those to depend on
the monitor, nor do we want to duplicate the code.

To avoid this, move some code out of qmp.c and hmp.c
adding new methods to qom/object_interfaces.c

 - user_creatable_add - takes a QDict holding a full
   object definition & instantiates it
 - user_creatable_add_type - takes an ID, type name,
   and QDict holding object properties & instantiates
   it
 - user_creatable_add_opts - takes a QemuOpts holding
   a full object definition & instantiates it
 - user_creatable_add_opts_foreach - variant on
   user_creatable_add_opts which can be directly used
   in conjunction with qemu_opts_foreach.
 - user_creatable_del - takes an ID and deletes the
   corresponding object

The existing code is updated to use these new methods.

Signed-off-by: Daniel P. Berrange 
---
 hmp.c   |  52 +++-
 include/monitor/monitor.h   |   3 -
 include/qom/object_interfaces.h |  92 +
 qmp.c   |  76 ++
 qom/object_interfaces.c | 174 
 vl.c|  68 ++--
 6 files changed, 290 insertions(+), 175 deletions(-)

diff --git a/hmp.c b/hmp.c
index c6419da..41fb9ca 100644
--- a/hmp.c
+++ b/hmp.c
@@ -30,6 +30,7 @@
 #include "qapi/string-output-visitor.h"
 #include "qapi/util.h"
 #include "qapi-visit.h"
+#include "qom/object_interfaces.h"
 #include "ui/console.h"
 #include "block/qapi.h"
 #include "qemu-io.h"
@@ -1655,58 +1656,27 @@ void hmp_netdev_del(Monitor *mon, const QDict *qdict)
 void hmp_object_add(Monitor *mon, const QDict *qdict)
 {
 Error *err = NULL;
-Error *err_end = NULL;
 QemuOpts *opts;
-char *type = NULL;
-char *id = NULL;
 OptsVisitor *ov;
-QDict *pdict;
-Visitor *v;
+Object *obj = NULL;
 
 opts = qemu_opts_from_qdict(qemu_find_opts("object"), qdict, &err);
 if (err) {
-goto out;
+hmp_handle_error(mon, &err);
+return;
 }
 
 ov = opts_visitor_new(opts);
-pdict = qdict_clone_shallow(qdict);
-v = opts_get_visitor(ov);
-
-visit_start_struct(v, NULL, NULL, 0, &err);
-if (err) {
-goto out_clean;
-}
-
-qdict_del(pdict, "qom-type");
-visit_type_str(v, "qom-type", &type, &err);
-if (err) {
-goto out_end;
-}
+obj = user_creatable_add(qdict, opts_get_visitor(ov), &err);
+opts_visitor_cleanup(ov);
+qemu_opts_del(opts);
 
-qdict_del(pdict, "id");
-visit_type_str(v, "id", &id, &err);
 if (err) {
-goto out_end;
+hmp_handle_error(mon, &err);
 }
-
-object_add(type, id, pdict, v, &err);
-
-out_end:
-visit_end_struct(v, &err_end);
-if (!err && err_end) {
-qmp_object_del(id, NULL);
+if (obj) {
+object_unref(obj);
 }
-error_propagate(&err, err_end);
-out_clean:
-opts_visitor_cleanup(ov);
-
-QDECREF(pdict);
-qemu_opts_del(opts);
-g_free(id);
-g_free(type);
-
-out:
-hmp_handle_error(mon, &err);
 }
 
 void hmp_getfd(Monitor *mon, const QDict *qdict)
@@ -1934,7 +1904,7 @@ void hmp_object_del(Monitor *mon, const QDict *qdict)
 const char *id = qdict_get_str(qdict, "id");
 Error *err = NULL;
 
-qmp_object_del(id, &err);
+user_creatable_del(id, &err);
 hmp_handle_error(mon, &err);
 }
 
diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 91b95ae..aa0f373 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -43,9 +43,6 @@ void monitor_read_command(Monitor *mon, int show_prompt);
 int monitor_read_password(Monitor *mon, ReadLineFunc *readline_func,
   void *opaque);
 
-void object_add(const char *type, const char *id, const QDict *qdict,
-Visitor *v, Error **errp);
-
 AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, int64_t fdset_id,
 bool has_opaque, const char *opaque,
 Error **errp);
diff --git a/include/qom/object_interfaces.h b/include/qom/object_interfaces.h
index 283ae0d..d579746 100644
--- a/include/qom/object_interfaces.h
+++ b/include/qom/object_interfaces.h
@@ -2,6 +2,8 @@
 #define OBJECT_INTERFACES_H
 
 #include "qom/object.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/visitor.h"
 
 #define TYPE_USER_CREATABLE "user-creatable"
 
@@ -72,4 +74,94 @@ void user_creatable_complete(Object *obj, Error **errp);
  * from implements USER_CREATABLE interface.
  */
 bool user_creatable_can_be_deleted(UserCreatable *uc, Error **errp);
+
+/**
+ * user_creatable_add:
+ * @qdict: the object definitio

[Qemu-devel] [PATCH v6 04/16] nbd: convert qemu-nbd server to use I/O channels for connection setup

2016-02-10 Thread Daniel P. Berrange
This converts the qemu-nbd server to use the QIOChannelSocket
class for initial listener socket setup and accepting of client
connections. Actual I/O is still being performed against the
socket file descriptor using the POSIX socket APIs.

In this initial conversion though, all I/O is still actually done
using the raw POSIX sockets APIs.

Signed-off-by: Daniel P. Berrange 
---
 qemu-nbd.c | 91 --
 1 file changed, 53 insertions(+), 38 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 130c306..bc309e0 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -22,19 +22,17 @@
 #include "block/block_int.h"
 #include "block/nbd.h"
 #include "qemu/main-loop.h"
-#include "qemu/sockets.h"
 #include "qemu/error-report.h"
 #include "qemu/config-file.h"
 #include "block/snapshot.h"
 #include "qapi/util.h"
 #include "qapi/qmp/qstring.h"
 #include "qom/object_interfaces.h"
+#include "io/channel-socket.h"
 
 #include 
-#include 
-#include 
-#include 
-#include 
+#include 
+#include 
 #include 
 #include 
 
@@ -53,7 +51,8 @@ static int persistent = 0;
 static enum { RUNNING, TERMINATE, TERMINATING, TERMINATED } state;
 static int shared = 1;
 static int nb_fds;
-static int server_fd;
+static QIOChannelSocket *server_ioc;
+static int server_watch = -1;
 
 static void usage(const char *name)
 {
@@ -236,19 +235,21 @@ static void *nbd_client_thread(void *arg)
 char *device = arg;
 off_t size;
 uint32_t nbdflags;
-int fd, sock;
+QIOChannelSocket *sioc;
+int fd;
 int ret;
 pthread_t show_parts_thread;
 Error *local_error = NULL;
 
-
-sock = socket_connect(saddr, &local_error, NULL, NULL);
-if (sock < 0) {
+sioc = qio_channel_socket_new();
+if (qio_channel_socket_connect_sync(sioc,
+saddr,
+&local_error) < 0) {
 error_report_err(local_error);
 goto out;
 }
 
-ret = nbd_receive_negotiate(sock, NULL, &nbdflags,
+ret = nbd_receive_negotiate(sioc->fd, NULL, &nbdflags,
 &size, &local_error);
 if (ret < 0) {
 if (local_error) {
@@ -264,7 +265,7 @@ static void *nbd_client_thread(void *arg)
 goto out_socket;
 }
 
-ret = nbd_init(fd, sock, nbdflags, size);
+ret = nbd_init(fd, sioc->fd, nbdflags, size);
 if (ret < 0) {
 goto out_fd;
 }
@@ -285,13 +286,14 @@ static void *nbd_client_thread(void *arg)
 goto out_fd;
 }
 close(fd);
+object_unref(OBJECT(sioc));
 kill(getpid(), SIGTERM);
 return (void *) EXIT_SUCCESS;
 
 out_fd:
 close(fd);
 out_socket:
-closesocket(sock);
+object_unref(OBJECT(sioc));
 out:
 kill(getpid(), SIGTERM);
 return (void *) EXIT_FAILURE;
@@ -308,7 +310,7 @@ static void nbd_export_closed(NBDExport *exp)
 state = TERMINATED;
 }
 
-static void nbd_update_server_fd_handler(int fd);
+static void nbd_update_server_watch(void);
 
 static void nbd_client_closed(NBDClient *client)
 {
@@ -316,37 +318,51 @@ static void nbd_client_closed(NBDClient *client)
 if (nb_fds == 0 && !persistent && state == RUNNING) {
 state = TERMINATE;
 }
-nbd_update_server_fd_handler(server_fd);
+nbd_update_server_watch();
 nbd_client_put(client);
 }
 
-static void nbd_accept(void *opaque)
+static gboolean nbd_accept(QIOChannel *ioc, GIOCondition cond, gpointer opaque)
 {
-struct sockaddr_in addr;
-socklen_t addr_len = sizeof(addr);
+QIOChannelSocket *cioc;
+int fd;
 
-int fd = accept(server_fd, (struct sockaddr *)&addr, &addr_len);
-if (fd < 0) {
-perror("accept");
-return;
+cioc = qio_channel_socket_accept(QIO_CHANNEL_SOCKET(ioc),
+ NULL);
+if (!cioc) {
+return TRUE;
 }
 
 if (state >= TERMINATE) {
-close(fd);
-return;
+object_unref(OBJECT(cioc));
+return TRUE;
 }
 
 nb_fds++;
-nbd_update_server_fd_handler(server_fd);
-nbd_client_new(exp, fd, nbd_client_closed);
+nbd_update_server_watch();
+fd = dup(cioc->fd);
+if (fd >= 0) {
+nbd_client_new(exp, fd, nbd_client_closed);
+}
+object_unref(OBJECT(cioc));
+
+return TRUE;
 }
 
-static void nbd_update_server_fd_handler(int fd)
+static void nbd_update_server_watch(void)
 {
 if (nbd_can_accept()) {
-qemu_set_fd_handler(fd, nbd_accept, NULL, (void *)(uintptr_t)fd);
+if (server_watch == -1) {
+server_watch = qio_channel_add_watch(QIO_CHANNEL(server_ioc),
+ G_IO_IN,
+ nbd_accept,
+ NULL, NULL);
+}
 } else {
-qemu_set_fd_handler(fd, NULL, NULL, NULL);
+if (server_watch != -1) {
+g_source_remove(server_watch);
+server_watch = -1;
+}
 }

[Qemu-devel] [PATCH v6 05/16] nbd: convert blockdev NBD server to use I/O channels for connection setup

2016-02-10 Thread Daniel P. Berrange
This converts the blockdev NBD server to use the QIOChannelSocket
class for initial listener socket setup and accepting of client
connections. Actual I/O is still being performed against the
socket file descriptor using the POSIX socket APIs.

Signed-off-by: Daniel P. Berrange 
---
 blockdev-nbd.c | 49 ++---
 1 file changed, 34 insertions(+), 15 deletions(-)

diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index efc31a4..9baf883 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -18,32 +18,48 @@
 #include "qmp-commands.h"
 #include "trace.h"
 #include "block/nbd.h"
-#include "qemu/sockets.h"
+#include "io/channel-socket.h"
 
-static int server_fd = -1;
+static QIOChannelSocket *server_ioc;
+static int server_watch = -1;
 
-static void nbd_accept(void *opaque)
+static gboolean nbd_accept(QIOChannel *ioc, GIOCondition condition,
+   gpointer opaque)
 {
-struct sockaddr_in addr;
-socklen_t addr_len = sizeof(addr);
+QIOChannelSocket *cioc;
+int fd;
 
-int fd = accept(server_fd, (struct sockaddr *)&addr, &addr_len);
+cioc = qio_channel_socket_accept(QIO_CHANNEL_SOCKET(ioc),
+ NULL);
+if (!cioc) {
+return TRUE;
+}
+
+fd = dup(cioc->fd);
 if (fd >= 0) {
 nbd_client_new(NULL, fd, nbd_client_put);
 }
+object_unref(OBJECT(cioc));
+return TRUE;
 }
 
 void qmp_nbd_server_start(SocketAddress *addr, Error **errp)
 {
-if (server_fd != -1) {
+if (server_ioc) {
 error_setg(errp, "NBD server already running");
 return;
 }
 
-server_fd = socket_listen(addr, errp);
-if (server_fd != -1) {
-qemu_set_fd_handler(server_fd, nbd_accept, NULL, NULL);
+server_ioc = qio_channel_socket_new();
+if (qio_channel_socket_listen_sync(server_ioc, addr, errp) < 0) {
+return;
 }
+
+server_watch = qio_channel_add_watch(QIO_CHANNEL(server_ioc),
+ G_IO_IN,
+ nbd_accept,
+ NULL,
+ NULL);
 }
 
 void qmp_nbd_server_add(const char *device, bool has_writable, bool writable,
@@ -52,7 +68,7 @@ void qmp_nbd_server_add(const char *device, bool 
has_writable, bool writable,
 BlockBackend *blk;
 NBDExport *exp;
 
-if (server_fd == -1) {
+if (!server_ioc) {
 error_setg(errp, "NBD server not running");
 return;
 }
@@ -98,9 +114,12 @@ void qmp_nbd_server_stop(Error **errp)
 {
 nbd_export_close_all();
 
-if (server_fd != -1) {
-qemu_set_fd_handler(server_fd, NULL, NULL, NULL);
-close(server_fd);
-server_fd = -1;
+if (server_watch != -1) {
+g_source_remove(server_watch);
+server_watch = -1;
+}
+if (server_ioc) {
+object_unref(OBJECT(server_ioc));
+server_ioc = NULL;
 }
 }
-- 
2.5.0




[Qemu-devel] [PATCH v6 03/16] nbd: convert block client to use I/O channels for connection setup

2016-02-10 Thread Daniel P. Berrange
This converts the NBD block driver client to use the QIOChannelSocket
class for initial connection setup. The NbdClientSession struct has
two pointers, one to the master QIOChannelSocket providing the raw
data channel, and one to a QIOChannel which is the current channel
used for I/O. Initially the two point to the same object, but when
TLS support is added, they will point to different objects.

The qemu-img & qemu-io tools now need to use MODULE_INIT_QOM to
ensure the QIOChannel object classes are registered. The qemu-nbd
tool already did this.

In this initial conversion though, all I/O is still actually done
using the raw POSIX sockets APIs.

Signed-off-by: Daniel P. Berrange 
---
 Makefile   |  6 ++---
 block/nbd-client.c | 76 +-
 block/nbd-client.h |  8 --
 block/nbd.c| 39 ++--
 qemu-img.c |  1 +
 qemu-io.c  |  1 +
 tests/Makefile |  2 +-
 7 files changed, 79 insertions(+), 54 deletions(-)

diff --git a/Makefile b/Makefile
index 30b1b2d..ce3d64d 100644
--- a/Makefile
+++ b/Makefile
@@ -234,9 +234,9 @@ util/module.o-cflags = 
-D'CONFIG_BLOCK_MODULES=$(block-modules)'
 
 qemu-img.o: qemu-img-cmds.h
 
-qemu-img$(EXESUF): qemu-img.o $(block-obj-y) $(crypto-obj-y) $(qom-obj-y) 
libqemuutil.a libqemustub.a
-qemu-nbd$(EXESUF): qemu-nbd.o $(block-obj-y) $(crypto-obj-y) $(qom-obj-y) 
libqemuutil.a libqemustub.a
-qemu-io$(EXESUF): qemu-io.o $(block-obj-y) $(crypto-obj-y) $(qom-obj-y) 
libqemuutil.a libqemustub.a
+qemu-img$(EXESUF): qemu-img.o $(block-obj-y) $(crypto-obj-y) $(io-obj-y) 
$(qom-obj-y) libqemuutil.a libqemustub.a
+qemu-nbd$(EXESUF): qemu-nbd.o $(block-obj-y) $(crypto-obj-y) $(io-obj-y) 
$(qom-obj-y) libqemuutil.a libqemustub.a
+qemu-io$(EXESUF): qemu-io.o $(block-obj-y) $(crypto-obj-y) $(io-obj-y) 
$(qom-obj-y) libqemuutil.a libqemustub.a
 
 qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o
 
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 568c56c..c4379a9 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -28,7 +28,6 @@
 
 #include "qemu/osdep.h"
 #include "nbd-client.h"
-#include "qemu/sockets.h"
 
 #define HANDLE_TO_INDEX(bs, handle) ((handle) ^ ((uint64_t)(intptr_t)bs))
 #define INDEX_TO_HANDLE(bs, index)  ((index)  ^ ((uint64_t)(intptr_t)bs))
@@ -48,13 +47,21 @@ static void nbd_teardown_connection(BlockDriverState *bs)
 {
 NbdClientSession *client = nbd_get_client_session(bs);
 
+if (!client->ioc) { /* Already closed */
+return;
+}
+
 /* finish any pending coroutines */
-shutdown(client->sock, 2);
+qio_channel_shutdown(client->ioc,
+ QIO_CHANNEL_SHUTDOWN_BOTH,
+ NULL);
 nbd_recv_coroutines_enter_all(client);
 
 nbd_client_detach_aio_context(bs);
-closesocket(client->sock);
-client->sock = -1;
+object_unref(OBJECT(client->sioc));
+client->sioc = NULL;
+object_unref(OBJECT(client->ioc));
+client->ioc = NULL;
 }
 
 static void nbd_reply_ready(void *opaque)
@@ -64,12 +71,16 @@ static void nbd_reply_ready(void *opaque)
 uint64_t i;
 int ret;
 
+if (!s->ioc) { /* Already closed */
+return;
+}
+
 if (s->reply.handle == 0) {
 /* No reply already in flight.  Fetch a header.  It is possible
  * that another thread has done the same thing in parallel, so
  * the socket is not readable anymore.
  */
-ret = nbd_receive_reply(s->sock, &s->reply);
+ret = nbd_receive_reply(s->sioc->fd, &s->reply);
 if (ret == -EAGAIN) {
 return;
 }
@@ -122,30 +133,32 @@ static int nbd_co_send_request(BlockDriverState *bs,
 
 assert(i < MAX_NBD_REQUESTS);
 request->handle = INDEX_TO_HANDLE(s, i);
+
+if (!s->ioc) {
+qemu_co_mutex_unlock(&s->send_mutex);
+return -EPIPE;
+}
+
 s->send_coroutine = qemu_coroutine_self();
 aio_context = bdrv_get_aio_context(bs);
 
-aio_set_fd_handler(aio_context, s->sock, false,
+aio_set_fd_handler(aio_context, s->sioc->fd, false,
nbd_reply_ready, nbd_restart_write, bs);
 if (qiov) {
-if (!s->is_unix) {
-socket_set_cork(s->sock, 1);
-}
-rc = nbd_send_request(s->sock, request);
+qio_channel_set_cork(s->ioc, true);
+rc = nbd_send_request(s->sioc->fd, request);
 if (rc >= 0) {
-ret = qemu_co_sendv(s->sock, qiov->iov, qiov->niov,
+ret = qemu_co_sendv(s->sioc->fd, qiov->iov, qiov->niov,
 offset, request->len);
 if (ret != request->len) {
 rc = -EIO;
 }
 }
-if (!s->is_unix) {
-socket_set_cork(s->sock, 0);
-}
+qio_channel_set_cork(s->ioc, false);
 } else {
-rc = nbd_send_request(s->sock, request);
+rc = nbd_send_request(s->sioc->fd, request);
 }
-aio_set_fd_hand

[Qemu-devel] [PATCH v6 02/16] qemu-nbd: add support for --object command line arg

2016-02-10 Thread Daniel P. Berrange
Allow creation of user creatable object types with qemu-nbd
via a new --object command line arg. This will be used to supply
passwords and/or encryption keys to the various block driver
backends via the recently added 'secret' object type.

 # printf letmein > mypasswd.txt
 # qemu-nbd --object secret,id=sec0,file=mypasswd.txt \
  ...other nbd args...

Reviewed-by: Eric Blake 
Signed-off-by: Daniel P. Berrange 
---
 qemu-nbd.c| 34 ++
 qemu-nbd.texi |  6 ++
 2 files changed, 40 insertions(+)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index d374015..130c306 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -24,9 +24,11 @@
 #include "qemu/main-loop.h"
 #include "qemu/sockets.h"
 #include "qemu/error-report.h"
+#include "qemu/config-file.h"
 #include "block/snapshot.h"
 #include "qapi/util.h"
 #include "qapi/qmp/qstring.h"
+#include "qom/object_interfaces.h"
 
 #include 
 #include 
@@ -41,6 +43,7 @@
 #define QEMU_NBD_OPT_AIO   2
 #define QEMU_NBD_OPT_DISCARD   3
 #define QEMU_NBD_OPT_DETECT_ZEROES 4
+#define QEMU_NBD_OPT_OBJECT5
 
 static NBDExport *exp;
 static int verbose;
@@ -74,6 +77,9 @@ static void usage(const char *name)
 "  -o, --offset=OFFSET   offset into the image\n"
 "  -P, --partition=NUM   only expose partition NUM\n"
 "\n"
+"General purpose options:\n"
+"  --object type,id=ID,...   define an object such as 'secret' for providing\n"
+"passwords and/or encryption keys\n"
 #ifdef __linux__
 "Kernel NBD client support:\n"
 "  -c, --connect=DEV connect FILE to the local NBD device DEV\n"
@@ -371,6 +377,16 @@ static SocketAddress *nbd_build_socket_address(const char 
*sockpath,
 }
 
 
+static QemuOptsList qemu_object_opts = {
+.name = "object",
+.implied_opt_name = "qom-type",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_object_opts.head),
+.desc = {
+{ }
+},
+};
+
+
 int main(int argc, char **argv)
 {
 BlockBackend *blk;
@@ -408,6 +424,7 @@ int main(int argc, char **argv)
 { "format", 1, NULL, 'f' },
 { "persistent", 0, NULL, 't' },
 { "verbose", 0, NULL, 'v' },
+{ "object", 1, NULL, QEMU_NBD_OPT_OBJECT },
 { NULL, 0, NULL, 0 }
 };
 int ch;
@@ -433,6 +450,8 @@ int main(int argc, char **argv)
 memset(&sa_sigterm, 0, sizeof(sa_sigterm));
 sa_sigterm.sa_handler = termsig_handler;
 sigaction(SIGTERM, &sa_sigterm, NULL);
+module_call_init(MODULE_INIT_QOM);
+qemu_add_opts(&qemu_object_opts);
 qemu_init_exec_dir(argv[0]);
 
 while ((ch = getopt_long(argc, argv, sopt, lopt, &opt_ind)) != -1) {
@@ -588,6 +607,14 @@ int main(int argc, char **argv)
 case '?':
 error_report("Try `%s --help' for more information.", argv[0]);
 exit(EXIT_FAILURE);
+case QEMU_NBD_OPT_OBJECT: {
+QemuOpts *opts;
+opts = qemu_opts_parse_noisily(&qemu_object_opts,
+   optarg, true);
+if (!opts) {
+exit(EXIT_FAILURE);
+}
+}   break;
 }
 }
 
@@ -597,6 +624,13 @@ int main(int argc, char **argv)
 exit(EXIT_FAILURE);
 }
 
+if (qemu_opts_foreach(&qemu_object_opts,
+  user_creatable_add_opts_foreach,
+  NULL, &local_err)) {
+error_report_err(local_err);
+exit(EXIT_FAILURE);
+}
+
 if (disconnect) {
 fd = open(argv[optind], O_RDWR);
 if (fd < 0) {
diff --git a/qemu-nbd.texi b/qemu-nbd.texi
index 0027841..a56ebc3 100644
--- a/qemu-nbd.texi
+++ b/qemu-nbd.texi
@@ -18,6 +18,12 @@ Export a QEMU disk image using the NBD protocol.
 @var{dev} is an NBD device.
 
 @table @option
+@item --object type,id=@var{id},...props...
+Define a new instance of the @var{type} object class identified by @var{id}.
+See the @code{qemu(1)} manual page for full details of the properties
+supported. The common object type that it makes sense to define is the
+@code{secret} object, which is used to supply passwords and/or encryption
+keys.
 @item -p, --port=@var{port}
 The TCP port to listen on (default @samp{10809})
 @item -o, --offset=@var{offset}
-- 
2.5.0




[Qemu-devel] [PATCH v6 00/16] Implement TLS support to QEMU NBD server & client

2016-02-10 Thread Daniel P. Berrange
This is an update of the series previously posted:

  v1: https://lists.gnu.org/archive/html/qemu-devel/2015-11/msg06126.html
v2: https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg01580.html
  v3: https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg03440.html
v4: https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04160.html

This series of patches implements support for TLS in the QEMU NBD
server and client code.

It is implementing the NBD_OPT_STARTTLS option that was previously
discussed here:

  https://www.redhat.com/archives/libvir-list/2014-October/msg00506.html

And is also described in the NBD spec here:

  https://github.com/yoe/nbd/blob/master/doc/proto.md

To ensure that clients always get a suitable error message from the
NBD server when it is configured with TLS, a client speaking the
new style protocol will always send NBD_OPT_LIST as the first thing
it does, so that we can see the NBD_REP_ERR_TLS_REQD response. This
should all be backwards & forwards compatible with previous QEMU
impls of NBD

Usage of TLS is described in the commit messages for each patch,
but for sake of people who don't want to explore the series, here's
the summary

Starting QEMU system emulator with a disk backed by an TLS encrypted
NBD export

  $ qemu-system-x86_64 \
 -object 
tls-creds-x509,id=tls0,endpoint=client,dir=/home/berrange/security/qemutls \
 -drive driver=nbd,host=localhost,port=9000,tls-creds=tls0

Starting a standalone NBD server providing a TLS encrypted NBD export

  $ qemu-nbd \
 --object 
tls-creds-x509,id=tls0,endpoint=server,dir=/home/berrange/security/qemutls
 --tls-creds tls0 \
 --export-name default \
 $IMAGEFILE

The --export-name is optional, if omitted, the default "" will
be used.

Starting a QEMU system emulator built-in NBD server

  $ qemu-system-x86_64 \
 -qmp unix:/tmp/qmp,server \
 -hda /home/berrange/Fedora-Server-netinst-x86_64-23.iso \
 -object 
tls-creds-x509,id=tls0,dir=/home/berrange/security/qemutls,endpoint=server

  $ qmp-shell /tmp/qmp
 (qmp) nbd-server-start addr={"host":"localhost","port":"9000"}
 tls-creds=tls0
 (qmp) nbd-server-add device=ide0-hd0

The first 2 patches are taken from this other pending patch
series in order to facilitate merge:

  https://lists.gnu.org/archive/html/qemu-devel/2016-02/msg00296.html

The first 4 patches are the conversion to the I/O channels
framework.

The next 6 patches are general tweaks to QEMU's impl of the
NBD protocol for better compliance and/or future proofing.

The next patch provides the NBD protocol TLS implementation.

The final 3 patches allow TLS to be enabled in the QEMU NBD
client and servers.

Changed in v6:

 - Rebase to resolve conflicts with recent qapi changes (Eric)
   and recent nbd changes (Paolo) which merged
 - Add MODULE_INIT_QOM to qemu-io & qemu-img to ensure they
   can load the QIOChannel object types

Changed in v5:

 - Pulled in https://lists.gnu.org/archive/html/qemu-devel/2016-02/msg00297.html
   and applied fixes for issues Eric mentioned in that review
 - Pulled in https://lists.gnu.org/archive/html/qemu-devel/2016-02/msg00302.html
 - Rebased to latest git master

Changed in v4:

 - Don't pick the first export name in the list if no export
   name is provided (Paolo)
 - Set client requested export name to "" if none is provided
   by the user (Paolo)
 - Set server advertized export name to "" if TLS is enabled
   and none is provided by the user (Paolo)
 - Rename qemu-nbd --exportname to --export-name (Paolo)
 - Use iov_discard_front() to simplify iov handling (Paolo)

Changed in v3:

 - Rebase to resolve conflicts with recently merged NBD patches

Changed in v2:

 - Fix error codes used during NBD TLS option negotiate
 - Update patch with helpers for UserCreatable object types

Daniel P. Berrange (16):
  qom: add helpers for UserCreatable object types
  qemu-nbd: add support for --object command line arg
  nbd: convert block client to use I/O channels for connection setup
  nbd: convert qemu-nbd server to use I/O channels for connection setup
  nbd: convert blockdev NBD server to use I/O channels for connection
setup
  nbd: convert to using I/O channels for actual socket I/O
  nbd: invert client logic for negotiating protocol version
  nbd: make server compliant with fixed newstyle spec
  nbd: make client request fixed new style if advertized
  nbd: allow setting of an export name for qemu-nbd server
  nbd: always query export list in fixed new style protocol
  nbd: use "" as a default export name if none provided
  nbd: implement TLS support in the protocol negotiation
  nbd: enable use of TLS with NBD block driver
  nbd: enable use of TLS with qemu-nbd server
  nbd: enable use of TLS with nbd-server-start command

 Makefile|   6 +-
 block/nbd-client.c  |  91 ++---
 block/nbd-client.h  |  10 +-
 block/nbd.c | 105 --
 blockdev-nbd.c

Re: [Qemu-devel] [PATCH] linux-user: Don't assert if guest tries shmdt(0)

2016-02-10 Thread Laurent Vivier


Le 09/02/2016 16:57, Peter Maydell a écrit :
> Our implementation of shmat() and shmdt() for linux-user was
> using "zero guest address" as its marker for "entry in the
> shm_regions[] array is not in use". This meant that if the
> guest did a shmdt(0) we would match on an unused array entry

Is shmdt(0) valid ?

I mean, if shmat() is called with shmaddr equal to 0:
"the system chooses a suitable (unused) address at which
to attach the segment."

and

"The to-be-detached segment must be currently attached with shmaddr
equal to the value returned by the attaching shmat() call."

Did you check shmat() can return 0 ?
(I think our mmap_find_vma() cannot return 0)

Why don't you fail on shmdt(0) (EINVAL) ?

> and call page_set_flags() with both start and end addresses zero,
> which causes an assertion failure.
> 
> Use an explicit in_use flag to manage the shm_regions[] array,
> so that we avoid this problem.
> 
> Signed-off-by: Peter Maydell 
> Reported-by: Pavel Shamis 
> ---
>  linux-user/syscall.c | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 54ce14a..f46abf7 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -2598,8 +2598,9 @@ static abi_long do_socketcall(int num, abi_ulong vptr)
>  #define N_SHM_REGIONS32
>  
>  static struct shm_region {
> -abi_ulongstart;
> -abi_ulongsize;
> +abi_ulong start;
> +abi_ulong size;
> +bool in_use;
>  } shm_regions[N_SHM_REGIONS];
>  
>  struct target_semid_ds
> @@ -3291,7 +3292,8 @@ static inline abi_ulong do_shmat(int shmid, abi_ulong 
> shmaddr, int shmflg)
> ((shmflg & SHM_RDONLY)? 0 : PAGE_WRITE));
>  
>  for (i = 0; i < N_SHM_REGIONS; i++) {
> -if (shm_regions[i].start == 0) {
> +if (!shm_regions[i].in_use) {
> +shm_regions[i].in_use = true;
>  shm_regions[i].start = raddr;
>  shm_regions[i].size = shm_info.shm_segsz;
>  break;
> @@ -3308,8 +3310,8 @@ static inline abi_long do_shmdt(abi_ulong shmaddr)
>  int i;
>  
>  for (i = 0; i < N_SHM_REGIONS; ++i) {
> -if (shm_regions[i].start == shmaddr) {
> -shm_regions[i].start = 0;
> +if (shm_regions[i].in_use && shm_regions[i].start == shmaddr) {
> +shm_regions[i].in_use = false;
>  page_set_flags(shmaddr, shmaddr + shm_regions[i].size, 0);
>  break;
>  }
> 



Re: [Qemu-devel] [PATCH v5 6/9] qemu-log: new option -dfilter to limit output

2016-02-10 Thread Alex Bennée

Richard Henderson  writes:

> On 02/11/2016 04:40 AM, Alex Bennée wrote:
>> OK I think this version is a lot cleaner:
>>
>>void qemu_set_dfilter_ranges(const char *filter_spec)
>>{
>>gchar **ranges = g_strsplit(filter_spec, ",", 0);
>>if (ranges) {
>>gchar **next = ranges;
>>gchar *r = *next++;
>>debug_regions = g_array_sized_new(FALSE, FALSE,
>>  sizeof(Range), 
>> g_strv_length(ranges));
>>while (r) {
>>gchar *range_op = g_strstr_len(r, -1, "-");
>>gchar *r2 = range_op ? range_op + 1 : NULL;
>>if (!range_op) {
>>range_op = g_strstr_len(r, -1, "+");
>>r2 = range_op ? range_op + 1 : NULL;
>>}
>>if (!range_op) {
>>range_op = g_strstr_len(r, -1, "..");
>>r2 = range_op ? range_op + 2 : NULL;
>>}
>
> I guess I'll quit quibbling about silly glib functions.  But really, with the
> -1 argument, you gain nothing except obfuscation over using the
> standard C library.

No you are quite right to quibble. It's a hard habit to break because
I've gotten used to glib's arguably more predictable behaviour when
string munging.

>
>>if (range_op) {
>>struct Range range;
>>int err;
>>const char *e = NULL;
>>
>>err = qemu_strtoull(r, &e, 0, &range.begin);
>>
>>g_assert(e == range_op);
>>
>>switch (*range_op) {
>>case '+':
>>{
>>unsigned long len;
>>err |= qemu_strtoull(r2, NULL, 0, &len);
>
> You can't or errno's together and then...
>
>>g_error("Failed to parse range in: %s, %d", r, err);
>
> ... expect to get anything meaningful out of them.

True, I'll drop the %d, I was just trying to avoid having multiple error
handling legs.

>
 +case '+':
 +{
 +unsigned long len;
 +err |= qemu_strtoul(range_val, NULL, 0, &len);
 +range.end = range.begin + len;
 +break;
 +}
 +case '-':
 +{
 +unsigned long len;
 +err |= qemu_strtoul(range_val, NULL, 0, &len);
 +range.end = range.begin;
 +range.begin = range.end - len;
 +break;
 +}
>>>
>>> Both of these have off-by-one bugs, since end is inclusive.
>>
>> Sorry I don't quite follow, do you mean the position of range_val (now
>> r2) or the final value of range.end?
>
> Final value of range.end.  In that
>
> 0x1000..0x1000
> and
> 0x1000+1
>
> should both produce a range that covers a single byte at 0x1000.

Ahh OK. I suppose if I'm being good about this I should add some tests
to defend the ranges. I wonder how easy command line parsing unit tests
are in qtest?

>
>
> r~


--
Alex Bennée



[Qemu-devel] [PATCH 0/4] hw/ppc/spapr: Add "Processor Register Hypervisor Resource Access" H-calls

2016-02-10 Thread Thomas Huth
While we were recently debugging a problem with the H_SET_DABR
call [1], I noticed that some hypercalls from the chapter 14.5.4.3
("Processor Register Hypervisor Resource Access") from the LoPAPR
spec [2] are still missing in QEMU.
So here's are some patches that implement these hypercalls. Linux
apparently does not depend on these hypercalls yet (otherwise somebody
would have noticed this earlier), but the hypercalls are rather simple,
so I think the implementations are quite straight-forward and easy to
read.

[1] 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=760a7364f27d974d

[2] https://members.openpowerfoundation.org/document/dl/469

Thomas Huth (4):
  hw/ppc/spapr: Add h_set_sprg0 hypercall
  hw/ppc/spapr: Implement h_set_dabr
  hw/ppc/spapr: Implement the h_set_xdabr hypercall
  hw/ppc/spapr: Implement the h_page_init hypercall

 hw/ppc/spapr_hcall.c | 105 +++
 1 file changed, 98 insertions(+), 7 deletions(-)

-- 
1.8.3.1




[Qemu-devel] [PATCH 1/4] hw/ppc/spapr: Add h_set_sprg0 hypercall

2016-02-10 Thread Thomas Huth
This is a very simple hypercall that only sets up the SPRG0
register for the guest (since writing to SPRG0 was only permitted
to the hypervisor in older versions of the PowerISA).

Signed-off-by: Thomas Huth 
---
 hw/ppc/spapr_hcall.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 12f8c33..58103ef 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -332,6 +332,15 @@ static target_ulong h_read(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 return H_SUCCESS;
 }
 
+static target_ulong h_set_sprg0(PowerPCCPU *cpu, sPAPRMachineState *spapr,
+target_ulong opcode, target_ulong *args)
+{
+CPUState *cs = CPU(cpu);
+
+set_spr(cs, SPR_SPRG0, args[0], -1L);
+return H_SUCCESS;
+}
+
 static target_ulong h_set_dabr(PowerPCCPU *cpu, sPAPRMachineState *spapr,
target_ulong opcode, target_ulong *args)
 {
@@ -997,6 +1006,10 @@ static void hypercall_register_types(void)
 spapr_register_hypercall(H_REGISTER_VPA, h_register_vpa);
 spapr_register_hypercall(H_CEDE, h_cede);
 
+/* processor register resource access h-calls */
+spapr_register_hypercall(H_SET_SPRG0, h_set_sprg0);
+spapr_register_hypercall(H_SET_MODE, h_set_mode);
+
 /* "debugger" hcalls (also used by SLOF). Note: We do -not- differenciate
  * here between the "CI" and the "CACHE" variants, they will use whatever
  * mapping attributes qemu is using. When using KVM, the kernel will
@@ -1013,8 +1026,6 @@ static void hypercall_register_types(void)
 /* qemu/KVM-PPC specific hcalls */
 spapr_register_hypercall(KVMPPC_H_RTAS, h_rtas);
 
-spapr_register_hypercall(H_SET_MODE, h_set_mode);
-
 /* ibm,client-architecture-support support */
 spapr_register_hypercall(KVMPPC_H_CAS, h_client_architecture_support);
 }
-- 
1.8.3.1




[Qemu-devel] [PATCH 3/4] hw/ppc/spapr: Implement the h_set_xdabr hypercall

2016-02-10 Thread Thomas Huth
The H_SET_XDABR hypercall is similar to H_SET_DABR, but also sets
the extended DABR (DABRX) register.

Signed-off-by: Thomas Huth 
---
 hw/ppc/spapr_hcall.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 6b9d512..f14f849 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -366,6 +366,27 @@ static target_ulong h_set_dabr(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 return H_SUCCESS;
 }
 
+static target_ulong h_set_xdabr(PowerPCCPU *cpu, sPAPRMachineState *spapr,
+target_ulong opcode, target_ulong *args)
+{
+CPUState *cs = CPU(cpu);
+target_ulong dabrx = args[1];
+
+if (!has_spr(cpu, SPR_DABR) || !has_spr(cpu, SPR_DABRX)) {
+return H_HARDWARE;
+}
+
+if ((dabrx & ~0xfULL) != 0 || (dabrx & H_DABRX_HYPERVISOR) != 0
+|| (dabrx & (H_DABRX_KERNEL | H_DABRX_USER)) == 0) {
+return H_PARAMETER;
+}
+
+set_spr(cs, SPR_DABRX, dabrx, -1L);
+set_spr(cs, SPR_DABR, args[0], -1L);
+
+return H_SUCCESS;
+}
+
 #define FLAGS_REGISTER_VPA 0x2000ULL
 #define FLAGS_REGISTER_DTL 0x4000ULL
 #define FLAGS_REGISTER_SLBSHADOW   0x6000ULL
@@ -1024,6 +1045,7 @@ static void hypercall_register_types(void)
 /* processor register resource access h-calls */
 spapr_register_hypercall(H_SET_SPRG0, h_set_sprg0);
 spapr_register_hypercall(H_SET_DABR, h_set_dabr);
+spapr_register_hypercall(H_SET_XDABR, h_set_xdabr);
 spapr_register_hypercall(H_SET_MODE, h_set_mode);
 
 /* "debugger" hcalls (also used by SLOF). Note: We do -not- differenciate
-- 
1.8.3.1




[Qemu-devel] [PATCH 4/4] hw/ppc/spapr: Implement the h_page_init hypercall

2016-02-10 Thread Thomas Huth
This hypercall either initializes a page with zeros, or copies
another page.
According to LoPAPR, the i-cache of the page should also be
flushed when using H_ICACHE_INVALIDATE or H_ICACHE_SYNCHRONIZE,
but this is currently only done when running with TCG - assuming
the cache will be flushed with KVM anyway when switching back to
kernel / VM context.
The code currently also does not explicitely flush the data cache
with H_ICACHE_SYNCHRONIZE, since this either also should be done
when switching back to kernel / VM context (with KVM), or not matter
anyway (for TCG).

Signed-off-by: Thomas Huth 
---
 hw/ppc/spapr_hcall.c | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index f14f849..91e703d 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -387,6 +387,47 @@ static target_ulong h_set_xdabr(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 return H_SUCCESS;
 }
 
+static target_ulong h_page_init(PowerPCCPU *cpu, sPAPRMachineState *spapr,
+target_ulong opcode, target_ulong *args)
+{
+target_ulong flags = args[0];
+target_ulong dest = args[1];
+target_ulong src = args[2];
+uint8_t buf[TARGET_PAGE_SIZE];
+
+if (flags & ~(H_ICACHE_SYNCHRONIZE | H_ICACHE_INVALIDATE
+  | H_COPY_PAGE | H_ZERO_PAGE)) {
+qemu_log_mask(LOG_UNIMP, "h_page_init: Bad flags (" TARGET_FMT_lx "\n",
+  flags);
+}
+
+if (!is_ram_address(spapr, dest) || (dest & ~TARGET_PAGE_MASK) != 0) {
+return H_PARAMETER;
+}
+
+if (flags & H_COPY_PAGE) {
+if (!is_ram_address(spapr, src) || (src & ~TARGET_PAGE_MASK) != 0) {
+return H_PARAMETER;
+}
+cpu_physical_memory_read(src, buf, TARGET_PAGE_SIZE);
+} else if (flags & H_ZERO_PAGE) {
+memset(buf, 0, TARGET_PAGE_SIZE);
+}
+
+if (flags & (H_COPY_PAGE | H_ZERO_PAGE)) {
+cpu_physical_memory_write(dest, buf, TARGET_PAGE_SIZE);
+}
+
+if (flags & (H_ICACHE_SYNCHRONIZE | H_ICACHE_INVALIDATE)) {
+if (tcg_enabled()) {
+tb_flush(CPU(cpu));
+}
+/* XXX: Flush data cache for H_ICACHE_SYNCHRONIZE? */
+}
+
+return H_SUCCESS;
+}
+
 #define FLAGS_REGISTER_VPA 0x2000ULL
 #define FLAGS_REGISTER_DTL 0x4000ULL
 #define FLAGS_REGISTER_SLBSHADOW   0x6000ULL
@@ -1046,6 +1087,7 @@ static void hypercall_register_types(void)
 spapr_register_hypercall(H_SET_SPRG0, h_set_sprg0);
 spapr_register_hypercall(H_SET_DABR, h_set_dabr);
 spapr_register_hypercall(H_SET_XDABR, h_set_xdabr);
+spapr_register_hypercall(H_PAGE_INIT, h_page_init);
 spapr_register_hypercall(H_SET_MODE, h_set_mode);
 
 /* "debugger" hcalls (also used by SLOF). Note: We do -not- differenciate
-- 
1.8.3.1




[Qemu-devel] [PATCH 2/4] hw/ppc/spapr: Implement h_set_dabr

2016-02-10 Thread Thomas Huth
According to LoPAPR, h_set_dabr should simply set DABRX to 3
(if the register is available), and load the parameter into DABR.
If DABRX is not available, the hypervisor has to check the
"Breakpoint Translation" bit of the DABR register first.

Signed-off-by: Thomas Huth 
---
 hw/ppc/spapr_hcall.c | 26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 58103ef..6b9d512 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -38,6 +38,12 @@ static void set_spr(CPUState *cs, int spr, target_ulong 
value,
 run_on_cpu(cs, do_spr_sync, &s);
 }
 
+static bool has_spr(PowerPCCPU *cpu, int spr)
+{
+/* We can test whether the SPR is defined by checking for a valid name */
+return cpu->env.spr_cb[spr].name != NULL;
+}
+
 static inline bool valid_pte_index(CPUPPCState *env, target_ulong pte_index)
 {
 /*
@@ -344,8 +350,20 @@ static target_ulong h_set_sprg0(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 static target_ulong h_set_dabr(PowerPCCPU *cpu, sPAPRMachineState *spapr,
target_ulong opcode, target_ulong *args)
 {
-/* FIXME: actually implement this */
-return H_HARDWARE;
+CPUState *cs = CPU(cpu);
+
+if (!has_spr(cpu, SPR_DABR)) {
+return H_HARDWARE;  /* DABR register not available */
+}
+
+if (has_spr(cpu, SPR_DABRX)) {
+set_spr(cs, SPR_DABRX, 0x3, -1L);
+} else if (!(args[0] & 0x4)) {  /* Breakpoint Translation set? */
+return H_RESERVED_DABR;
+}
+
+set_spr(cs, SPR_DABR, args[0], -1L);
+return H_SUCCESS;
 }
 
 #define FLAGS_REGISTER_VPA 0x2000ULL
@@ -999,15 +1017,13 @@ static void hypercall_register_types(void)
 /* hcall-bulk */
 spapr_register_hypercall(H_BULK_REMOVE, h_bulk_remove);
 
-/* hcall-dabr */
-spapr_register_hypercall(H_SET_DABR, h_set_dabr);
-
 /* hcall-splpar */
 spapr_register_hypercall(H_REGISTER_VPA, h_register_vpa);
 spapr_register_hypercall(H_CEDE, h_cede);
 
 /* processor register resource access h-calls */
 spapr_register_hypercall(H_SET_SPRG0, h_set_sprg0);
+spapr_register_hypercall(H_SET_DABR, h_set_dabr);
 spapr_register_hypercall(H_SET_MODE, h_set_mode);
 
 /* "debugger" hcalls (also used by SLOF). Note: We do -not- differenciate
-- 
1.8.3.1




Re: [Qemu-devel] [Qemu-block] [PATCH v9 3/3] qmp: add monitor command to add/remove a child

2016-02-10 Thread Max Reitz
On 25.12.2015 10:22, Changlong Xie wrote:
> From: Wen Congyang 
> 
> The new QMP command name is x-blockdev-change. It's just for adding/removing
> quorum's child now, and doesn't support all kinds of children, all kinds of
> operations, nor all block drivers. So it is experimental now.
> 
> Signed-off-by: Wen Congyang 
> Signed-off-by: zhanghailiang 
> Signed-off-by: Gonglei 
> Signed-off-by: Changlong Xie 
> ---
>  blockdev.c   | 54 
> 
>  qapi/block-core.json | 23 ++
>  qmp-commands.hx  | 47 +
>  3 files changed, 124 insertions(+)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 64dbfeb..4e62fdf 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3836,6 +3836,60 @@ out:
>  aio_context_release(aio_context);
>  }
>  
> +static BlockDriverState *bdrv_find_child(BlockDriverState *parent_bs,
> + const char *child_name)
> +{
> +BdrvChild *child;
> +
> +QLIST_FOREACH(child, &parent_bs->children, next) {
> +if (strcmp(child->name, child_name) == 0) {
> +return child->bs;
> +}
> +}
> +
> +return NULL;
> +}
> +
> +void qmp_x_blockdev_change(const char *parent, bool has_child,
> +   const char *child, bool has_node,
> +   const char *node, Error **errp)
> +{
> +BlockDriverState *parent_bs, *child_bs = NULL, *new_bs = NULL;
> +
> +parent_bs = bdrv_lookup_bs(parent, parent, errp);
> +if (!parent_bs) {
> +return;
> +}
> +
> +if (has_child == has_node) {
> +if (has_child) {
> +error_setg(errp, "The paramter child and node is conflict");

"The parameters child and node are in conflict"

Or, more naturally:

"child and node may not be specified at the same time"

> +} else {
> +error_setg(errp, "Either child or node should be specified");

s/should/must/

> +}
> +return;
> +}
> +
> +if (has_child) {
> +child_bs = bdrv_find_child(parent_bs, child);
> +if (!child_bs) {
> +error_setg(errp, "Node '%s' doesn't have child %s",

s/doesn't/does not/

(This is a personal opinion, but a pretty strong one.)

Also, if you put quotes around the node name, maybe you should do the
same around the child name.

> +   parent, child);
> +return;
> +}
> +bdrv_del_child(parent_bs, child_bs, errp);
> +}
> +
> +if (has_node) {
> +new_bs = bdrv_find_node(node);
> +if (!new_bs) {
> +error_setg(errp, "Node '%s' not found", node);
> +return;
> +}
> +bdrv_add_child(parent_bs, new_bs, errp);
> +}
> +}
> +
>  BlockJobInfoList *qmp_query_block_jobs(Error **errp)
>  {
>  BlockJobInfoList *head = NULL, **p_next = &head;
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 1a5d9ce..fe63c6d 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2408,3 +2408,26 @@
>  ##
>  { 'command': 'block-set-write-threshold',
>'data': { 'node-name': 'str', 'write-threshold': 'uint64' } }
> +
> +##
> +# @x-blockdev-change
> +#
> +# Dynamically reconfigure the block driver state graph. It can be used
> +# to add, remove, insert or replace a block driver state. Currently only

I'd prefer "graph node" over BDS in this line.

> +# the Quorum driver implements this feature to add or remove its child.
> +# This is useful to fix a broken quorum child.
> +#

I'd like a list here what this command does depending on the parameters
given. For instance:

If @(new-)node is specified, it will be inserted under @parent. @child
may not be specified in this case.

If both @parent and @child are specified but @(new-)node is not, @child
will be detached from @parent.

> +# @parent: the id or name of the node that will be changed.

I don't know. The parent will actually not be changed, it's an edge that
will be changed; and the parent is the parent node of that edge. Just put

@parent: the id or name of the parent node

here.

> +#
> +# @child: #optional the name of the child that will be deleted.

For now. But maybe that will change in the future. Generally, it is just
the child node of the edge that will be changed. So just putting

@child: #optional the name of a child under the given parent node

(Let's hope this is clear enough that people know that this is not a
node name.)

> +#
> +# @node: #optional the name of the node will be added.

Maybe this should be named new-node instead.

Also: "...of the node that will be added."

> +#
> +# Note: this command is experimental, and its API is not stable.
> +#
> +# Since: 2.6
> +##
> +{ 'command': 'x-blockdev-change',
> +  'data' : { 'parent': 'str',
> + '*child': 'str',
> + '*node': 'str' } }
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 7b235ee..efee0ca 100644
> --- a/qmp-com

Re: [Qemu-devel] [PATCH v5 6/9] qemu-log: new option -dfilter to limit output

2016-02-10 Thread Richard Henderson

On 02/11/2016 04:40 AM, Alex Bennée wrote:

OK I think this version is a lot cleaner:

   void qemu_set_dfilter_ranges(const char *filter_spec)
   {
   gchar **ranges = g_strsplit(filter_spec, ",", 0);
   if (ranges) {
   gchar **next = ranges;
   gchar *r = *next++;
   debug_regions = g_array_sized_new(FALSE, FALSE,
 sizeof(Range), 
g_strv_length(ranges));
   while (r) {
   gchar *range_op = g_strstr_len(r, -1, "-");
   gchar *r2 = range_op ? range_op + 1 : NULL;
   if (!range_op) {
   range_op = g_strstr_len(r, -1, "+");
   r2 = range_op ? range_op + 1 : NULL;
   }
   if (!range_op) {
   range_op = g_strstr_len(r, -1, "..");
   r2 = range_op ? range_op + 2 : NULL;
   }


I guess I'll quit quibbling about silly glib functions.  But really, with the 
-1 argument, you gain nothing except obfuscation over using the standard C library.



   if (range_op) {
   struct Range range;
   int err;
   const char *e = NULL;

   err = qemu_strtoull(r, &e, 0, &range.begin);

   g_assert(e == range_op);

   switch (*range_op) {
   case '+':
   {
   unsigned long len;
   err |= qemu_strtoull(r2, NULL, 0, &len);


You can't or errno's together and then...


   g_error("Failed to parse range in: %s, %d", r, err);


... expect to get anything meaningful out of them.


+case '+':
+{
+unsigned long len;
+err |= qemu_strtoul(range_val, NULL, 0, &len);
+range.end = range.begin + len;
+break;
+}
+case '-':
+{
+unsigned long len;
+err |= qemu_strtoul(range_val, NULL, 0, &len);
+range.end = range.begin;
+range.begin = range.end - len;
+break;
+}


Both of these have off-by-one bugs, since end is inclusive.


Sorry I don't quite follow, do you mean the position of range_val (now
r2) or the final value of range.end?


Final value of range.end.  In that

   0x1000..0x1000
and
   0x1000+1

should both produce a range that covers a single byte at 0x1000.


r~



Re: [Qemu-devel] [PATCH v6 0/5] add ACPI node for fw_cfg on pc and arm

2016-02-10 Thread Gabriel L. Somlo
On Wed, Feb 10, 2016 at 11:16:57AM -0500, Gabriel L. Somlo wrote:
> Ping.
> 
> Now that the kernel side seems to have been accepted (Thanks again
> Laszlo and Matt for all the help and advice!!!), is there anything
> left to clean up before this series could be applied to QEMU ?

Actually, I take that back. I need to rebase the i386 patch (3/5) to
handle PcGuestInfo changes (commits fb306ff and f264d36); but right
now I'm bisecting, trying to figure out why after a giant git pull my
video suddenly stopped working ...

I'll be back soon with v7, stay tuned.

Thanks,
--Gabriel

> 
> gmane.org quick links:
> 1/5: http://article.gmane.org/gmane.comp.emulators.qemu/389896/raw
> 2/5: http://article.gmane.org/gmane.comp.emulators.qemu/389894/raw
> 3/5: http://article.gmane.org/gmane.comp.emulators.qemu/389895/raw
> 4/5: http://article.gmane.org/gmane.comp.emulators.qemu/389900/raw
> 5/5: http://article.gmane.org/gmane.comp.emulators.qemu/389898/raw
> 
> On Wed, Jan 27, 2016 at 10:00:52PM -0500, Gabriel L. Somlo wrote:
> > New since v5:
> > 
> > - rebased on top of latest QEMU git master
> > 
> > Thanks,
> >   --Gabriel
> > 
> > >New since v4:
> > >
> > >   - rebased on top of Marc's DMA series
> > >   - drop machine compat dependency for insertion into x86/ssdt
> > > (patch 3/5), following agreement between Igor and Eduardo
> > >   - [mm]io register range now covers DMA register as well, if
> > > available.
> > >   - s/bios/firmware in doc file updates
> > >
> > >>New since v3:
> > >>
> > >>  - rebased to work on top of 87e896ab (introducing pc-*-25 classes),
> > >>inserting fw_cfg acpi node only for machines >= 2.5.
> > >>
> > >>  - reintroduce _STA with value 0x0B (bit 2 for u/i visibility turned
> > >>off to avoid Windows complaining -- thanks Igor for catching that!)
> > >>
> > >>If there's any other feedback besides questions regarding the
> > >>appropriateness of "QEMU0002" as the value of _HID, please don't hesitate!
> > >>
> > >>>New since v2:
> > >>>
> > >>> - pc/i386 node in ssdt only on machine types *newer* than 2.4
> > >>>   (as suggested by Eduardo)
> > >>>
> > >>>I appreciate any further comments and reviews. Hopefully we can make
> > >>>this palatable for upstream, modulo the lingering concerns about whether
> > >>>"QEMU0002" is ok to use as the value of _HID, which I'll hopefully get
> > >>>sorted out with the kernel crew...
> > >>>
> > New since v1:
> > 
> > - expose control register size (suggested by Marc Marí)
> > 
> > - leaving out _UID and _STA fields (thanks Shannon & Igor)
> > 
> > - using "QEMU0002" as the value of _HID (thanks Michael)
> > 
> > - added documentation blurb to docs/specs/fw_cfg.txt
> >   (mainly to record usage of the "QEMU0002" string with fw_cfg).
> > 
> > > This series adds a fw_cfg device node to the SSDT (on pc), or to the
> > > DSDT (on arm).
> > >
> > >   - Patch 1/3 moves (and renames) the BIOS_CFG_IOPORT (0x510)
> > > define from pc.c to pc.h, so that it could be used from
> > > acpi-build.c in patch 2/3.
> > > 
> > >   - Patch 2/3 adds a fw_cfg node to the pc SSDT.
> > > 
> > >   - Patch 3/3 adds a fw_cfg node to the arm DSDT.
> > >
> > > I made up some names - "FWCF" for the node name, and "FWCF0001"
> > > for _HID; no idea whether that's appropriate, or how else I should
> > > figure out what to use instead...
> > >
> > > Also, using scope "\\_SB", based on where fw_cfg shows up in the
> > > output of "info qtree". Again, if that's wrong, please point me in
> > > the right direction.
> > >
> > > Re. 3/3 (also mentioned after the commit blurb in the patch itself),
> > > I noticed none of the other DSDT entries contain a _STA field, 
> > > wondering
> > > why it would (not) make sense to include that, same as on the PC.
> > 
> > Gabriel L. Somlo (5):
> >   fw_cfg: expose control register size in fw_cfg.h
> >   pc: fw_cfg: move ioport base constant to pc.h
> >   acpi: pc: add fw_cfg device node to ssdt
> >   acpi: arm: add fw_cfg device node to dsdt
> >   fw_cfg: document ACPI device node information
> > 
> >  docs/specs/fw_cfg.txt |  9 +
> >  hw/arm/virt-acpi-build.c  | 15 +++
> >  hw/i386/acpi-build.c  | 29 +
> >  hw/i386/pc.c  |  5 ++---
> >  hw/nvram/fw_cfg.c |  4 +++-
> >  include/hw/i386/pc.h  |  2 ++
> >  include/hw/nvram/fw_cfg.h |  3 +++
> >  7 files changed, 63 insertions(+), 4 deletions(-)
> > 
> > -- 
> > 2.4.3
> > 



Re: [Qemu-devel] [PATCH] memory: fix usage of find_next_bit and find_next_zero_bit

2016-02-10 Thread Thomas Huth
On 10.02.2016 15:11, Paolo Bonzini wrote:
> The last two arguments to these functions are the last and first bit to
> check relative to the base.  The code was using incorrectly the first
> bit and the number of bits.  Fix this in cpu_physical_memory_get_dirty
> and cpu_physical_memory_all_dirty.  This requires a few changes in the
> iteration; change the code in cpu_physical_memory_set_dirty_range to
> match.
> 
> Fixes: 5b82b70
> Cc: Stefan Hajnoczi 
> Signed-off-by: Paolo Bonzini 

That commit 5b82b70 also broke the pseries machine on qemu-ppc64:

- 8< --
$ ppc64-softmmu/qemu-system-ppc64 -net none -nographic -vga none 


SLOF **
QEMU Starting
 Build Date = Nov  5 2015 15:23:31
 FW Version = git-b4c93802a5b2c72f
 Press "s" to enter Open Firmware.



SLOF **
QEMU Starting
 Build Date = Nov  5 2015 15:23:31
 FW Version = git-b4c93802a5b2c72f
ERROR: Flatten device tree not available!
 Press "s" to enter Open Firmware.

  !!! roomfs lookup(bootinfo) = 1
Cannot find romfs file xvect
  !!! roomfs lookup(bootinfo) = 1
ERROR: Not enough memory for Open Firmware
qemu: fatal: Trying to execute code outside RAM or ROM at 0xffbf
- 8< --

With your patch here applied, SLOF boots fine again, so:

Tested-by: Thomas Huth 




Re: [Qemu-devel] [PATCH] qemu-ga: Fixed minor version switch issue

2016-02-10 Thread Michael Roth
Quoting Leonid Bloch (2016-01-11 03:12:41)
> With automatically generated GUID, on minor version changes, an error
> occurred, stating that there is a problem with the installer.
> Now, a notification is shown, warning the user that another version of
> this product is already installed, and that configuration or removal of
> the existing version is possible through Add/Remove Programs on the
> Control Panel (expected behavior).
> 
> Signed-off-by: Leonid Bloch 

Reviewed-by: Michael Roth 

> ---
>  qga/installer/qemu-ga.wxs | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/qga/installer/qemu-ga.wxs b/qga/installer/qemu-ga.wxs
> index 9473875..7f92891 100644
> --- a/qga/installer/qemu-ga.wxs
> +++ b/qga/installer/qemu-ga.wxs
> @@ -41,7 +41,7 @@
> 
>  Name="QEMU guest agent"
> -Id="*"
> +Id="{DF9974AD-E41A-4304-81AD-69AA8F299766}"
>  UpgradeCode="{EB6B8302-C06E-4BEC-ADAC-932C68A3A98D}"
>  Manufacturer="$(env.QEMU_GA_MANUFACTURER)"
>  Version="$(env.QEMU_GA_VERSION)"
> -- 
> 2.4.3
> 




Re: [Qemu-devel] [PATCH v5 6/9] qemu-log: new option -dfilter to limit output

2016-02-10 Thread Alex Bennée

Richard Henderson  writes:

> On 02/05/2016 01:56 AM, Alex Bennée wrote:
>> +gchar *range_op = g_strstr_len(r, -1, "-");
>
> This is strchr.
>
>> +range_op = g_strstr_len(r, -1, ".");
>
> Or at least if you're going to make use of strstr, search for "..".
>
>> +g_strdelimit(range_val, ".", ' ');
>
> Cause this is just weird.  It accepts "1+.2" just the same as "1..2".
>
>> +err = qemu_strtoul(r, NULL, 0, &range.begin);
>
> This should be strtoull.  You probably broke 32-bit compilation here.

OK I think this version is a lot cleaner:

  void qemu_set_dfilter_ranges(const char *filter_spec)
  {
  gchar **ranges = g_strsplit(filter_spec, ",", 0);
  if (ranges) {
  gchar **next = ranges;
  gchar *r = *next++;
  debug_regions = g_array_sized_new(FALSE, FALSE,
sizeof(Range), 
g_strv_length(ranges));
  while (r) {
  gchar *range_op = g_strstr_len(r, -1, "-");
  gchar *r2 = range_op ? range_op + 1 : NULL;
  if (!range_op) {
  range_op = g_strstr_len(r, -1, "+");
  r2 = range_op ? range_op + 1 : NULL;
  }
  if (!range_op) {
  range_op = g_strstr_len(r, -1, "..");
  r2 = range_op ? range_op + 2 : NULL;
  }
  if (range_op) {
  struct Range range;
  int err;
  const char *e = NULL;

  err = qemu_strtoull(r, &e, 0, &range.begin);

  g_assert(e == range_op);

  switch (*range_op) {
  case '+':
  {
  unsigned long len;
  err |= qemu_strtoull(r2, NULL, 0, &len);
  range.end = range.begin + len;
  break;
  }
  case '-':
  {
  unsigned long len;
  err |= qemu_strtoull(r2, NULL, 0, &len);
  range.end = range.begin;
  range.begin = range.end - len;
  break;
  }
  case '.':
  err |= qemu_strtoull(r2, NULL, 0, &range.end);
  break;
  default:
  g_assert_not_reached();
  }
  if (err) {
  g_error("Failed to parse range in: %s, %d", r, err);
  } else {
  g_array_append_val(debug_regions, range);
  }
  } else {
  g_error("Bad range specifier in: %s", r);
  }
  r = *next++;
  }
  g_strfreev(ranges);
  }
  }


>
>> +case '+':
>> +{
>> +unsigned long len;
>> +err |= qemu_strtoul(range_val, NULL, 0, &len);
>> +range.end = range.begin + len;
>> +break;
>> +}
>> +case '-':
>> +{
>> +unsigned long len;
>> +err |= qemu_strtoul(range_val, NULL, 0, &len);
>> +range.end = range.begin;
>> +range.begin = range.end - len;
>> +break;
>> +}
>
> Both of these have off-by-one bugs, since end is inclusive.

Sorry I don't quite follow, do you mean the position of range_val (now
r2) or the final value of range.end?

>
>> +case '.':
>> +err |= qemu_strtoul(range_val, NULL, 0, &range.end);
>> +break;
>
> I'd think multiple dot detection belongs here, and you need not smash them to 
> '
> ' but merely notice that there are two of them and then strtoull
> range_val+1.

I think this is covered with the new code now.

>
>
> r~


--
Alex Bennée



Re: [Qemu-devel] [PATCH 07/15] tcg-mips: Adjust qemu_ld/st for mips64

2016-02-10 Thread Richard Henderson

On 02/11/2016 03:34 AM, James Hogan wrote:

Hi Richard,

On Tue, Feb 09, 2016 at 09:39:55PM +1100, Richard Henderson wrote:

@@ -1212,11 +1237,24 @@ static void tcg_out_tlb_load(TCGContext *s, TCGReg 
base, TCGReg addrl,
 : offsetof(CPUArchState, tlb_table[mem_index][0].addr_write));
  int add_off = offsetof(CPUArchState, tlb_table[mem_index][0].addend);

-tcg_out_opc_sa(s, OPC_SRL, TCG_REG_A0, addrl,
-   TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
-tcg_out_opc_imm(s, OPC_ANDI, TCG_REG_A0, TCG_REG_A0,
-(CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
-tcg_out_opc_reg(s, OPC_ADDU, TCG_REG_A0, TCG_REG_A0, TCG_AREG0);
+if (use_mips32r2_instructions) {
+if (TCG_TARGET_REG_BITS == 32 || TARGET_LONG_BITS == 32) {
+tcg_out_opc_bf(s, OPC_EXT, TCG_REG_A0, addrl,
+   TARGET_PAGE_BITS + CPU_TLB_ENTRY_BITS - 1,
+   CPU_TLB_ENTRY_BITS);
+} else {
+tcg_out_opc_bf64(s, OPC_DEXT, OPC_DEXTM, OPC_DEXTU,
+ TCG_REG_A0, addrl,
+ TARGET_PAGE_BITS + CPU_TLB_ENTRY_BITS - 1,
+ CPU_TLB_ENTRY_BITS);
+}


The ext/dext here will end up with bits below bit CPU_TLB_ENTRY_BITS
set, which will result in load of addend from slightly offset address,
so things go badly wrong. You still need to either ANDI off the low bits
or trim them off with the ext/dext and shift it left again.

So I don't think there's any benefit to the use of these instructions
unless CPU_TLB_SIZE + CPU_TLB_ENTRY_BITS exceeds the 16-bits available
in the ANDI immediate field for the non r2 case.


Hmm.  I thought I'd deleted this code back out.  I must have messed up copying 
trees between machines and overwritten this.



r~



Re: [Qemu-devel] [PULL 48/49] i386: populate floppy drive information in DSDT

2016-02-10 Thread Roman Kagan
On Wed, Feb 10, 2016 at 12:16:32PM -0500, John Snow wrote:
> On 02/10/2016 12:10 PM, Roman Kagan wrote:
> > Well, as I wrote in another mail, SeaBIOS, which is supposed to provide
> > the same information to int 0x13/0x08, populates it with static data
> > based only on the drive type as encoded in CMOS at initialization time,
> > and everyone seem to have been fine with that so far.  I'll need to
> > re-test it with real Windows guests, though.
> > 
> 
> OK, but what we appear to be doing currently is polling the current
> geometry values for a drive instead of some pre-chosen ones based on the
> drive type.
> 
> What values does SeaBIOS use? (Can you point me to the table it uses?)

src/hw/floppy.c:

struct floppyinfo_s FloppyInfo[] VARFSEG = {
// Unknown
{ {0, 0, 0}, 0x00, 0x00},
// 1 - 360KB, 5.25" - 2 heads, 40 tracks, 9 sectors
{ {2, 40, 9}, FLOPPY_SIZE_525, FLOPPY_RATE_300K},
// 2 - 1.2MB, 5.25" - 2 heads, 80 tracks, 15 sectors
{ {2, 80, 15}, FLOPPY_SIZE_525, FLOPPY_RATE_500K},
// 3 - 720KB, 3.5"  - 2 heads, 80 tracks, 9 sectors
{ {2, 80, 9}, FLOPPY_SIZE_350, FLOPPY_RATE_250K},
// 4 - 1.44MB, 3.5" - 2 heads, 80 tracks, 18 sectors
{ {2, 80, 18}, FLOPPY_SIZE_350, FLOPPY_RATE_500K},
// 5 - 2.88MB, 3.5" - 2 heads, 80 tracks, 36 sectors
{ {2, 80, 36}, FLOPPY_SIZE_350, FLOPPY_RATE_1M},
// 6 - 160k, 5.25"  - 1 heads, 40 tracks, 8 sectors
{ {1, 40, 8}, FLOPPY_SIZE_525, FLOPPY_RATE_250K},
// 7 - 180k, 5.25"  - 1 heads, 40 tracks, 9 sectors
{ {1, 40, 9}, FLOPPY_SIZE_525, FLOPPY_RATE_300K},
// 8 - 320k, 5.25"  - 2 heads, 40 tracks, 8 sectors
{ {2, 40, 8}, FLOPPY_SIZE_525, FLOPPY_RATE_250K},
};

The array is indexed by the floppy drive type from CMOS

Roman.



Re: [Qemu-devel] [PATCH] memory: fix usage of find_next_bit and find_next_zero_bit

2016-02-10 Thread Leon Alrae
On 10/02/16 14:11, Paolo Bonzini wrote:
> The last two arguments to these functions are the last and first bit to
> check relative to the base.  The code was using incorrectly the first
> bit and the number of bits.  Fix this in cpu_physical_memory_get_dirty
> and cpu_physical_memory_all_dirty.  This requires a few changes in the
> iteration; change the code in cpu_physical_memory_set_dirty_range to
> match.
> 
> Fixes: 5b82b70
> Cc: Stefan Hajnoczi 
> Signed-off-by: Paolo Bonzini 
> ---
>  include/exec/ram_addr.h | 55 
> -
>  1 file changed, 36 insertions(+), 19 deletions(-)

It fixes the performance problem I was seeing:

Tested-by: Leon Alrae 

Thanks,
Leon




Re: [Qemu-devel] [PULL 48/49] i386: populate floppy drive information in DSDT

2016-02-10 Thread Roman Kagan
On Wed, Feb 10, 2016 at 11:14:30AM -0500, John Snow wrote:
> On 02/09/2016 01:48 PM, Michael S. Tsirkin wrote:
> > On Tue, Feb 09, 2016 at 07:36:12PM +0100, Laszlo Ersek wrote:
> >> Implementing this in QEMU would require:
> >> - inventing virt-only registers for the FDC that provide the current
> >> disk geometry,
> 
> We do secretly have these registers, but of course there's no spec to
> interpreting what they mean. For instance, what do they read when the
> drive is empty? I am not sure that information is codified in the ACPI spec.

There are various possibilities, like returning False for the
corresponding drive in _FDE (Floppy Disk Enumerate) object, or returning
0 aka unknown as drive type in _FDI.  Anyway I'd hate needing to expose
all of that in an ACPI-accessible form, this is going to become too fat.

> Could be wrong, you seemed to indicate that the ACPI spec said that the
> info matches what you get from a certain legacy bios routine, but I
> don't know the specifics of that routine, either.

Indeed, it's supposed to do the same as
https://en.wikipedia.org/wiki/INT_13H#INT_13h_AH.3D08h:_Read_Drive_Parameters

and, as I wrote in another mail, the SeaBIOS implementation here is
rather simplistic.

> Roman, does the 0xFF "empty disk geometry" hack appear to work for
> Windows 10?
> 
> Maybe it's fine enough as-is, as per Laszlo's good synopsis here.

I'll test and let you know.

Roman.



Re: [Qemu-devel] [PULL 48/49] i386: populate floppy drive information in DSDT

2016-02-10 Thread John Snow


On 02/10/2016 12:10 PM, Roman Kagan wrote:
> On Tue, Feb 09, 2016 at 07:36:12PM +0100, Laszlo Ersek wrote:
>> In my opinion, the real mess in this case is in the ACPI spec itself. If
>> you re-read the _FDI control method's description, the Package that it
>> returns contains *dynamic* geometry data, about the *disk* (not *drive*):
>>
>> - Maximum Cylinder Number // Integer (WORD)
>> - Maximum Sector Number   // Integer (WORD)
>> - Maximum Head Number // Integer (WORD)
>>
>> What this seems to require is: the firmware developer should write ACPI
>> code that
>> - dynamically accesses the floppy drive / controller (using MMIO or IO
>>   port registers),
>> - retrieves the geometry of the disk actually inserted,
>> - and returns the data nicely packaged.
>>
>> In effect, an ACPI-level driver for the disk.
>>
>> Now, John explained to me (and please keep in mind that this is my
>> personal account of his explanation, not a factual rendering thereof),
>> that there used to be no *standard* way to interrogate the current
>> disk's geometry, other than trial and error with seeking.
>>
>> Apparently in UEFI Windows, Microsoft didn't want to implement this
>> trial and error seeking, so -- given there was also no portable
>> *hardware spec* to retrieve the same data, directly from the disk drive
>> or controller -- they punted the entire question to ACPI. That is, to
>> the firmware implementor.
> 
> Well, as I wrote in another mail, SeaBIOS, which is supposed to provide
> the same information to int 0x13/0x08, populates it with static data
> based only on the drive type as encoded in CMOS at initialization time,
> and everyone seem to have been fine with that so far.  I'll need to
> re-test it with real Windows guests, though.
> 

OK, but what we appear to be doing currently is polling the current
geometry values for a drive instead of some pre-chosen ones based on the
drive type.

What values does SeaBIOS use? (Can you point me to the table it uses?)

We could just duplicate those, probably.

>> IMHO, do the *absolute minimum* to adapt this AML generation patch to
>> John's FDC rework, and ignore all dynamic aspects (like media change).
> 
> Hopefully that'll suffice.
> 
> Roman.
> 



Re: [Qemu-devel] [PULL 48/49] i386: populate floppy drive information in DSDT

2016-02-10 Thread Roman Kagan
On Tue, Feb 09, 2016 at 07:36:12PM +0100, Laszlo Ersek wrote:
> In my opinion, the real mess in this case is in the ACPI spec itself. If
> you re-read the _FDI control method's description, the Package that it
> returns contains *dynamic* geometry data, about the *disk* (not *drive*):
> 
> - Maximum Cylinder Number // Integer (WORD)
> - Maximum Sector Number   // Integer (WORD)
> - Maximum Head Number // Integer (WORD)
> 
> What this seems to require is: the firmware developer should write ACPI
> code that
> - dynamically accesses the floppy drive / controller (using MMIO or IO
>   port registers),
> - retrieves the geometry of the disk actually inserted,
> - and returns the data nicely packaged.
> 
> In effect, an ACPI-level driver for the disk.
> 
> Now, John explained to me (and please keep in mind that this is my
> personal account of his explanation, not a factual rendering thereof),
> that there used to be no *standard* way to interrogate the current
> disk's geometry, other than trial and error with seeking.
> 
> Apparently in UEFI Windows, Microsoft didn't want to implement this
> trial and error seeking, so -- given there was also no portable
> *hardware spec* to retrieve the same data, directly from the disk drive
> or controller -- they punted the entire question to ACPI. That is, to
> the firmware implementor.

Well, as I wrote in another mail, SeaBIOS, which is supposed to provide
the same information to int 0x13/0x08, populates it with static data
based only on the drive type as encoded in CMOS at initialization time,
and everyone seem to have been fine with that so far.  I'll need to
re-test it with real Windows guests, though.

> IMHO, do the *absolute minimum* to adapt this AML generation patch to
> John's FDC rework, and ignore all dynamic aspects (like media change).

Hopefully that'll suffice.

Roman.



Re: [Qemu-devel] [PULL 02/32] memory: RCU ram_list.dirty_memory[] for safe RAM hotplug

2016-02-10 Thread Paolo Bonzini


On 10/02/2016 17:56, Leon Alrae wrote:
> Hi,
> 
> I just noticed significant performance hit with this change. Booting
> small system (I tried on system mips only) was usually taking around 20
> seconds, now reaches 3 minutes with this change.

You're lucky that it booted at all. :)  Unfortunately I noticed a slow
boot yesterday (6 minutes for Fedora 21 on TCG), but I blamed that I was
running on top of TCG _and_ NFS.  In retrospect that was not too smart.

Today I tried FreeDOS and it hangs completely, but my self-nak and
Peter's push unfortunately crossed.

I've posted a fix and Stefan has already reviewed it.  If you can test
it, that would be great.  The above Fedora 21 image takes about 50
seconds to boot with the patch, so that's in line with the x9 slowdown
you reported.

http://article.gmane.org/gmane.comp.emulators.qemu/393105/raw

Paolo

> Leon
> 
> On 09/02/16 12:13, Paolo Bonzini wrote:
>> From: Stefan Hajnoczi 
>>
>> Although accesses to ram_list.dirty_memory[] use atomics so multiple
>> threads can safely dirty the bitmap, the data structure is not fully
>> thread-safe yet.
>>
>> This patch handles the RAM hotplug case where ram_list.dirty_memory[] is
>> grown.  ram_list.dirty_memory[] is change from a regular bitmap to an
>> RCU array of pointers to fixed-size bitmap blocks.  Threads can continue
>> accessing bitmap blocks while the array is being extended.  See the
>> comments in the code for an in-depth explanation of struct
>> DirtyMemoryBlocks.
>>
>> I have tested that live migration with virtio-blk dataplane works.
>>
>> Signed-off-by: Stefan Hajnoczi 
>> Message-Id: <1453728801-5398-2-git-send-email-stefa...@redhat.com>
>> Signed-off-by: Paolo Bonzini 
>> ---
>>  exec.c  |  75 +++
>>  include/exec/ram_addr.h | 189 
>> ++--
>>  migration/ram.c |   4 -
>>  3 files changed, 225 insertions(+), 43 deletions(-)
>>
>> diff --git a/exec.c b/exec.c
>> index ab37360..7d67c11 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -980,8 +980,9 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t 
>> start,
>>ram_addr_t length,
>>unsigned client)
>>  {
>> +DirtyMemoryBlocks *blocks;
>>  unsigned long end, page;
>> -bool dirty;
>> +bool dirty = false;
>>  
>>  if (length == 0) {
>>  return false;
>> @@ -989,8 +990,22 @@ bool 
>> cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
>>  
>>  end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
>>  page = start >> TARGET_PAGE_BITS;
>> -dirty = bitmap_test_and_clear_atomic(ram_list.dirty_memory[client],
>> - page, end - page);
>> +
>> +rcu_read_lock();
>> +
>> +blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
>> +
>> +while (page < end) {
>> +unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
>> +unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
>> +unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - 
>> offset);
>> +
>> +dirty |= bitmap_test_and_clear_atomic(blocks->blocks[idx],
>> +  offset, num);
>> +page += num;
>> +}
>> +
>> +rcu_read_unlock();
>>  
>>  if (dirty && tcg_enabled()) {
>>  tlb_reset_dirty_range_all(start, length);
>> @@ -1504,6 +1519,47 @@ int qemu_ram_resize(ram_addr_t base, ram_addr_t 
>> newsize, Error **errp)
>>  return 0;
>>  }
>>  
>> +/* Called with ram_list.mutex held */
>> +static void dirty_memory_extend(ram_addr_t old_ram_size,
>> +ram_addr_t new_ram_size)
>> +{
>> +ram_addr_t old_num_blocks = DIV_ROUND_UP(old_ram_size,
>> + DIRTY_MEMORY_BLOCK_SIZE);
>> +ram_addr_t new_num_blocks = DIV_ROUND_UP(new_ram_size,
>> + DIRTY_MEMORY_BLOCK_SIZE);
>> +int i;
>> +
>> +/* Only need to extend if block count increased */
>> +if (new_num_blocks <= old_num_blocks) {
>> +return;
>> +}
>> +
>> +for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
>> +DirtyMemoryBlocks *old_blocks;
>> +DirtyMemoryBlocks *new_blocks;
>> +int j;
>> +
>> +old_blocks = atomic_rcu_read(&ram_list.dirty_memory[i]);
>> +new_blocks = g_malloc(sizeof(*new_blocks) +
>> +  sizeof(new_blocks->blocks[0]) * 
>> new_num_blocks);
>> +
>> +if (old_num_blocks) {
>> +memcpy(new_blocks->blocks, old_blocks->blocks,
>> +   old_num_blocks * sizeof(old_blocks->blocks[0]));
>> +}
>> +
>> +for (j = old_num_blocks; j < new_num_blocks; j++) {
>> +new_blocks->blocks[j] = bitmap_new(DIRTY_MEMORY_BLOCK_SIZE);
>> +}
>> +
>> +atomic_rcu_set(&ram_list.dirty_memory[i], new_blocks);
>> +
>> + 

Re: [Qemu-devel] [PULL 48/49] i386: populate floppy drive information in DSDT

2016-02-10 Thread Roman Kagan
On Tue, Feb 09, 2016 at 11:22:01AM -0500, John Snow wrote:
> > I don't.  At the time the patch was developed there basically were no
> > mechanisms to update the geometry at all (and this was what you patchset
> > addressed, in particular, wasn't it?) so I didn't care.
> 
> That's not true.
> 
> You could swap different 1.44MB-class diskettes for other geometries,
> check this out:
> 
> static const FDFormat fd_formats[] = {
> /* First entry is default format */
> /* 1.44 MB 3"1/2 floppy disks */
> { FDRIVE_DRV_144, 18, 80, 1, FDRIVE_RATE_500K, },
> { FDRIVE_DRV_144, 20, 80, 1, FDRIVE_RATE_500K, },
> { FDRIVE_DRV_144, 21, 80, 1, FDRIVE_RATE_500K, },
> { FDRIVE_DRV_144, 21, 82, 1, FDRIVE_RATE_500K, },
> { FDRIVE_DRV_144, 21, 83, 1, FDRIVE_RATE_500K, },
> { FDRIVE_DRV_144, 22, 80, 1, FDRIVE_RATE_500K, },
> { FDRIVE_DRV_144, 23, 80, 1, FDRIVE_RATE_500K, },
> { FDRIVE_DRV_144, 24, 80, 1, FDRIVE_RATE_500K, },
> ...
> 
> You absolutely could get different sector and track counts before my
> patchset.

Indeed (sorry the patch was developed a couple of months ago so I had to
look at the code to refresh my memory).

However, I tried to implement the part of ACPI spec that read

> 9.9.2 _FDI (Floppy Disk Information)
> 
> This object returns information about a floppy disk drive. This
> information is the same as that returned by the INT 13 Function 08H on
> IA-PCs.

so I went ahead and looked into what SeaBIOS did for int 0x13/0x08.  And
what it did was read the CMOS at 0x10 and obtain the drive type, and
then return a hardcoded set of parameters (including geometry)
associated to that drive type.  So this was what I basically did here,
too.  (As a matter of fact the first patch I submitted was just pure ASL
which mimicked exactly the SeaBIOS behavior: read the CMOS and return
the corresponding Package with parameters.)

And IIRC the drive type couldn't change at runtime so I thought I wasn't
doing worse than it was.

As for what to do now, I'll try to check how tolerant the guests are of
changing the floppy geometry under them without updating _FDI, and then
decide.

Roman.



Re: [Qemu-devel] [PULL 02/32] memory: RCU ram_list.dirty_memory[] for safe RAM hotplug

2016-02-10 Thread Leon Alrae
Hi,

I just noticed significant performance hit with this change. Booting
small system (I tried on system mips only) was usually taking around 20
seconds, now reaches 3 minutes with this change.

Leon

On 09/02/16 12:13, Paolo Bonzini wrote:
> From: Stefan Hajnoczi 
> 
> Although accesses to ram_list.dirty_memory[] use atomics so multiple
> threads can safely dirty the bitmap, the data structure is not fully
> thread-safe yet.
> 
> This patch handles the RAM hotplug case where ram_list.dirty_memory[] is
> grown.  ram_list.dirty_memory[] is change from a regular bitmap to an
> RCU array of pointers to fixed-size bitmap blocks.  Threads can continue
> accessing bitmap blocks while the array is being extended.  See the
> comments in the code for an in-depth explanation of struct
> DirtyMemoryBlocks.
> 
> I have tested that live migration with virtio-blk dataplane works.
> 
> Signed-off-by: Stefan Hajnoczi 
> Message-Id: <1453728801-5398-2-git-send-email-stefa...@redhat.com>
> Signed-off-by: Paolo Bonzini 
> ---
>  exec.c  |  75 +++
>  include/exec/ram_addr.h | 189 
> ++--
>  migration/ram.c |   4 -
>  3 files changed, 225 insertions(+), 43 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index ab37360..7d67c11 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -980,8 +980,9 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t 
> start,
>ram_addr_t length,
>unsigned client)
>  {
> +DirtyMemoryBlocks *blocks;
>  unsigned long end, page;
> -bool dirty;
> +bool dirty = false;
>  
>  if (length == 0) {
>  return false;
> @@ -989,8 +990,22 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t 
> start,
>  
>  end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
>  page = start >> TARGET_PAGE_BITS;
> -dirty = bitmap_test_and_clear_atomic(ram_list.dirty_memory[client],
> - page, end - page);
> +
> +rcu_read_lock();
> +
> +blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
> +
> +while (page < end) {
> +unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
> +unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
> +unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - 
> offset);
> +
> +dirty |= bitmap_test_and_clear_atomic(blocks->blocks[idx],
> +  offset, num);
> +page += num;
> +}
> +
> +rcu_read_unlock();
>  
>  if (dirty && tcg_enabled()) {
>  tlb_reset_dirty_range_all(start, length);
> @@ -1504,6 +1519,47 @@ int qemu_ram_resize(ram_addr_t base, ram_addr_t 
> newsize, Error **errp)
>  return 0;
>  }
>  
> +/* Called with ram_list.mutex held */
> +static void dirty_memory_extend(ram_addr_t old_ram_size,
> +ram_addr_t new_ram_size)
> +{
> +ram_addr_t old_num_blocks = DIV_ROUND_UP(old_ram_size,
> + DIRTY_MEMORY_BLOCK_SIZE);
> +ram_addr_t new_num_blocks = DIV_ROUND_UP(new_ram_size,
> + DIRTY_MEMORY_BLOCK_SIZE);
> +int i;
> +
> +/* Only need to extend if block count increased */
> +if (new_num_blocks <= old_num_blocks) {
> +return;
> +}
> +
> +for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
> +DirtyMemoryBlocks *old_blocks;
> +DirtyMemoryBlocks *new_blocks;
> +int j;
> +
> +old_blocks = atomic_rcu_read(&ram_list.dirty_memory[i]);
> +new_blocks = g_malloc(sizeof(*new_blocks) +
> +  sizeof(new_blocks->blocks[0]) * 
> new_num_blocks);
> +
> +if (old_num_blocks) {
> +memcpy(new_blocks->blocks, old_blocks->blocks,
> +   old_num_blocks * sizeof(old_blocks->blocks[0]));
> +}
> +
> +for (j = old_num_blocks; j < new_num_blocks; j++) {
> +new_blocks->blocks[j] = bitmap_new(DIRTY_MEMORY_BLOCK_SIZE);
> +}
> +
> +atomic_rcu_set(&ram_list.dirty_memory[i], new_blocks);
> +
> +if (old_blocks) {
> +g_free_rcu(old_blocks, rcu);
> +}
> +}
> +}
> +
>  static ram_addr_t ram_block_add(RAMBlock *new_block, Error **errp)
>  {
>  RAMBlock *block;
> @@ -1543,6 +1599,7 @@ static ram_addr_t ram_block_add(RAMBlock *new_block, 
> Error **errp)
>(new_block->offset + new_block->max_length) >> 
> TARGET_PAGE_BITS);
>  if (new_ram_size > old_ram_size) {
>  migration_bitmap_extend(old_ram_size, new_ram_size);
> +dirty_memory_extend(old_ram_size, new_ram_size);
>  }
>  /* Keep the list sorted from biggest to smallest block.  Unlike QTAILQ,
>   * QLIST (which has an RCU-friendly variant) does not have insertion at
> @@ -1568,18 +1625,6 @@ static ram_addr_t ram_block_add(RAMBlock *ne

Re: [Qemu-devel] [PATCH v2] cirrus_vga: fix off-by-one in blit_region_is_unsafe

2016-02-10 Thread Laszlo Ersek
On 02/10/16 17:17, Paolo Bonzini wrote:
> The "max" value is being compared with >=, but addr + width points to
> the first byte that will _not_ be copied.  Laszlo suggested using a
> "greater than" comparison, instead of subtracting one like it is
> already done above for the height, so that max remains always positive.
> 
> The mistake is "safe"---it will reject some blits, but will never cause
> out-of-bounds writes.
> 
> Cc: Gerd Hoffmann 
> Signed-off-by: Paolo Bonzini 
> ---
>  hw/display/cirrus_vga.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/display/cirrus_vga.c b/hw/display/cirrus_vga.c
> index b6ce1c8..57b91a7 100644
> --- a/hw/display/cirrus_vga.c
> +++ b/hw/display/cirrus_vga.c
> @@ -276,14 +276,14 @@ static bool blit_region_is_unsafe(struct CirrusVGAState 
> *s,
>  + ((int64_t)s->cirrus_blt_height-1) * pitch;
>  int32_t max = addr
>  + s->cirrus_blt_width;
> -if (min < 0 || max >= s->vga.vram_size) {
> +if (min < 0 || max > s->vga.vram_size) {
>  return true;
>  }
>  } else {
>  int64_t max = addr
>  + ((int64_t)s->cirrus_blt_height-1) * pitch
>  + s->cirrus_blt_width;
> -if (max >= s->vga.vram_size) {
> +if (max > s->vga.vram_size) {
>  return true;
>  }
>  }
> 

Thank you for your patience. :)

Reviewed-by: Laszlo Ersek 



Re: [Qemu-devel] [PULL 48/49] i386: populate floppy drive information in DSDT

2016-02-10 Thread Michael S. Tsirkin
On Wed, Feb 10, 2016 at 11:14:30AM -0500, John Snow wrote:
> 
> 
> On 02/09/2016 01:48 PM, Michael S. Tsirkin wrote:
> > On Tue, Feb 09, 2016 at 07:36:12PM +0100, Laszlo Ersek wrote:
> >> On 02/09/16 17:22, John Snow wrote:
> >>>
> >>>
> >>> On 02/09/2016 10:52 AM, Roman Kagan wrote:
>  On Mon, Feb 08, 2016 at 03:20:47PM -0500, John Snow wrote:
> > On 02/08/2016 08:14 AM, Roman Kagan wrote:
> >> On Fri, Feb 05, 2016 at 07:25:07PM +0100, Igor Mammedov wrote:
>  +aml_append(fdi,
>  +aml_int(cylinders - 1));  /* Maximum Cylinder Number */
> >>> this puts uint64_t(-1) in AML i.e. cylinders == 0 and overflow 
> >>> happens here
> >>>
> >>> CCing Jon
> >>
> >> I guess this is the effect of John's fdc rework.  I used to think zero
> >> geometry was impossible at the time this patch was developed.
> >>
> >> I wonder if it hasn't been fixed already by
> >>
> >>   commit fd9bdbd3459e5b9d51534f0747049bc5b6145e07
> >>   Author: John Snow 
> >>   Date:   Wed Feb 3 11:28:55 2016 -0500
> >>
> >>   fdc: fix detection under Linux
> >
> > Yes, hopefully solved on my end. The geometry values for an empty disk
> > are not well defined (they certainly don't have any *meaning*) so if you
> > are populating tables based on an empty drive, I just hope you also have
> > the mechanisms needed to update said tables when the media changes.
> 
>  I don't.  At the time the patch was developed there basically were no
>  mechanisms to update the geometry at all (and this was what you patchset
>  addressed, in particular, wasn't it?) so I didn't care.
> 
> >>>
> >>> That's not true.
> >>>
> >>> You could swap different 1.44MB-class diskettes for other geometries,
> >>> check this out:
> >>>
> >>> static const FDFormat fd_formats[] = {
> >>> /* First entry is default format */
> >>> /* 1.44 MB 3"1/2 floppy disks */
> >>> { FDRIVE_DRV_144, 18, 80, 1, FDRIVE_RATE_500K, },
> >>> { FDRIVE_DRV_144, 20, 80, 1, FDRIVE_RATE_500K, },
> >>> { FDRIVE_DRV_144, 21, 80, 1, FDRIVE_RATE_500K, },
> >>> { FDRIVE_DRV_144, 21, 82, 1, FDRIVE_RATE_500K, },
> >>> { FDRIVE_DRV_144, 21, 83, 1, FDRIVE_RATE_500K, },
> >>> { FDRIVE_DRV_144, 22, 80, 1, FDRIVE_RATE_500K, },
> >>> { FDRIVE_DRV_144, 23, 80, 1, FDRIVE_RATE_500K, },
> >>> { FDRIVE_DRV_144, 24, 80, 1, FDRIVE_RATE_500K, },
> >>> ...
> >>>
> >>> You absolutely could get different sector and track counts before my
> >>> patchset.
> >>>
> >>>
>  Now if it actually has to be fully dynamic it's gonna be more
>  involved...
> 
> > What do the guests use these values for? Are they fixed at boot?
> 
>  Only Windows guests use it so it's hard to tell.  I can only claim that
>  if I stick bogus values into that ACPI object the guest fails to read
>  the floppy.
> >>
> >> We discussed this with John a bit on IRC.
> >>
> >> In my opinion, the real mess in this case is in the ACPI spec itself. If
> >> you re-read the _FDI control method's description, the Package that it
> >> returns contains *dynamic* geometry data, about the *disk* (not *drive*):
> >>
> >> - Maximum Cylinder Number // Integer (WORD)
> >> - Maximum Sector Number   // Integer (WORD)
> >> - Maximum Head Number // Integer (WORD)
> >>
> >> What this seems to require is: the firmware developer should write ACPI
> >> code that
> >> - dynamically accesses the floppy drive / controller (using MMIO or IO
> >>   port registers),
> >> - retrieves the geometry of the disk actually inserted,
> >> - and returns the data nicely packaged.
> >>
> >> In effect, an ACPI-level driver for the disk.
> >>
> >> Now, John explained to me (and please keep in mind that this is my
> >> personal account of his explanation, not a factual rendering thereof),
> >> that there used to be no *standard* way to interrogate the current
> >> disk's geometry, other than trial and error with seeking.
> >>
> 
> At the very least, the Intel 82078 does not appear to be capable of
> replying to any query with a CHS triplet. It *can* report back the total
> number of sectors and the size of each sector, but that still requires
> geometry guesswork outside of the drive.
> 
> >> Apparently in UEFI Windows, Microsoft didn't want to implement this
> >> trial and error seeking, so -- given there was also no portable
> >> *hardware spec* to retrieve the same data, directly from the disk drive
> >> or controller -- they punted the entire question to ACPI. That is, to
> >> the firmware implementor.
> >>
> >> This is entirely bogus. For one, it ties the platform firmware (the UEFI
> >> binary in the flash chip on your motherboard) to your specific floppy
> >> drive / controller. Say good-bye to any independently bought & installed
> >> floppy drives.
> >>
> >> In theory, a floppy controller that comes with a separate PCI card could
> >> offer an option ROM with a UEFI driver in it, and that 

Re: [Qemu-devel] [PATCH v2 5/5] kvm/x86: Hyper-V VMBus hypercall userspace exit

2016-02-10 Thread Paolo Bonzini


On 21/01/2016 15:01, Andrey Smetanin wrote:
> The patch implements KVM_EXIT_HYPERV userspace exit
> functionality for Hyper-V VMBus hypercalls:
> HV_X64_HCALL_POST_MESSAGE, HV_X64_HCALL_SIGNAL_EVENT.
> 
> Changes v2:
> * use KVM_EXIT_HYPERV for hypercalls
> 
> Signed-off-by: Andrey Smetanin 
> Reviewed-by: Roman Kagan 
> CC: Gleb Natapov 
> CC: Paolo Bonzini 
> CC: Joerg Roedel 
> CC: "K. Y. Srinivasan" 
> CC: Haiyang Zhang 
> CC: Roman Kagan 
> CC: Denis V. Lunev 
> CC: qemu-devel@nongnu.org
> ---
>  Documentation/virtual/kvm/api.txt |  6 ++
>  arch/x86/kvm/hyperv.c | 29 ++---
>  arch/x86/kvm/hyperv.h |  1 +
>  arch/x86/kvm/x86.c|  5 +
>  include/uapi/linux/kvm.h  |  6 ++
>  5 files changed, 40 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index 053f613..1bf1a07 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -3339,6 +3339,7 @@ EOI was received.
>  
>   struct kvm_hyperv_exit {
>  #define KVM_EXIT_HYPERV_SYNIC  1
> +#define KVM_EXIT_HYPERV_HCALL  2
>   __u32 type;
>   union {
>   struct {
> @@ -3347,6 +3348,11 @@ EOI was received.
>   __u64 evt_page;
>   __u64 msg_page;
>   } synic;
> + struct {
> + __u64 input;
> + __u64 result;
> + __u64 params[2];
> + } hcall;
>   } u;
>   };
>   /* KVM_EXIT_HYPERV */
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index e1daa8b..26ae973 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1093,6 +1093,14 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
>   case HV_X64_HCALL_NOTIFY_LONG_SPIN_WAIT:
>   kvm_vcpu_on_spin(vcpu);
>   break;
> + case HV_X64_HCALL_POST_MESSAGE:
> + case HV_X64_HCALL_SIGNAL_EVENT:
> + vcpu->run->exit_reason = KVM_EXIT_HYPERV;
> + vcpu->run->hyperv.type = KVM_EXIT_HYPERV_HCALL;
> + vcpu->run->hyperv.u.hcall.input = param;
> + vcpu->run->hyperv.u.hcall.params[0] = ingpa;
> + vcpu->run->hyperv.u.hcall.params[1] = outgpa;
> + return 0;
>   default:
>   res = HV_STATUS_INVALID_HYPERCALL_CODE;
>   break;
> @@ -1100,12 +1108,19 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
>  
>  set_result:
>   ret = res | (((u64)rep_done & 0xfff) << 32);
> - if (longmode) {
> - kvm_register_write(vcpu, VCPU_REGS_RAX, ret);
> - } else {
> - kvm_register_write(vcpu, VCPU_REGS_RDX, ret >> 32);
> - kvm_register_write(vcpu, VCPU_REGS_RAX, ret & 0x);
> - }
> -
> + kvm_hv_hypercall_set_result(vcpu, ret);
>   return 1;
>  }
> +
> +void kvm_hv_hypercall_set_result(struct kvm_vcpu *vcpu, u64 result)
> +{
> + bool longmode;
> +
> + longmode = is_64_bit_mode(vcpu);
> + if (longmode)
> + kvm_register_write(vcpu, VCPU_REGS_RAX, result);
> + else {
> + kvm_register_write(vcpu, VCPU_REGS_RDX, result >> 32);
> + kvm_register_write(vcpu, VCPU_REGS_RAX, result & 0x);
> + }
> +}
> diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
> index 60eccd4..64a4a3b 100644
> --- a/arch/x86/kvm/hyperv.h
> +++ b/arch/x86/kvm/hyperv.h
> @@ -52,6 +52,7 @@ int kvm_hv_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, 
> u64 *pdata);
>  
>  bool kvm_hv_hypercall_enabled(struct kvm *kvm);
>  int kvm_hv_hypercall(struct kvm_vcpu *vcpu);
> +void kvm_hv_hypercall_set_result(struct kvm_vcpu *vcpu, u64 result);
>  
>  void kvm_hv_irq_routing_update(struct kvm *kvm);
>  int kvm_hv_synic_set_irq(struct kvm *kvm, u32 vcpu_id, u32 sint);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f53f5b1..e5c842b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6891,6 +6891,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
> struct kvm_run *kvm_run)
>   } else
>   WARN_ON(vcpu->arch.pio.count || vcpu->mmio_needed);
>  
> + if (unlikely(kvm_run->exit_reason == KVM_EXIT_HYPERV) &&
> + kvm_run->hyperv.type == KVM_EXIT_HYPERV_HCALL)
> + kvm_hv_hypercall_set_result(vcpu,
> + kvm_run->hyperv.u.hcall.result);

Can you use vcpu->arch.complete_userspace_io here instead?

Otherwise looks great, thanks.

Paolo

> +
>   r = vcpu_run(vcpu);
>  
>  out:
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 9da9051..c5519a9 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/k

Re: [Qemu-devel] [PATCH 1/2] qemu-iotests: add _filter_qmp_events() for filtering out QMP events

2016-02-10 Thread Max Reitz
On 09.02.2016 14:23, Sascha Silbe wrote:
> The order of some QMP events may depend on the architecture being
> tested. Add support for filtering out QMP events so we can use a
> single reference output for all architecture when the test doesn't
> care about the events.
> 
> Signed-off-by: Sascha Silbe 
> ---
>  tests/qemu-iotests/common.filter | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/tests/qemu-iotests/common.filter 
> b/tests/qemu-iotests/common.filter
> index 84b7434..b908aa2 100644
> --- a/tests/qemu-iotests/common.filter
> +++ b/tests/qemu-iotests/common.filter
> @@ -178,6 +178,12 @@ _filter_qmp()
>  -e 'QMP_VERSION'
>  }
>  
> +# remove QMP events from output
> +_filter_qmp_events()
> +{
> +sed -e '/^{\(.*, \)"event": ".*}$/ d'
> +}

There is a pretty good reason test 067 uses -qmp-pretty (as you yourself
say, the lines get pretty long otherwise, and if we have any change
within, the whole line needs to be changed). Using the following ugly
piece of code here instead, we would still be able to use it:

tr '\n' '\t' \
  | sed -e
's/{\s*"timestamp":\s*{[^}]*},\s*"event":[^,}]*\(,\s*"data":\s*{[^}]*}\)\?\s*}\s*//g'
\
  | tr '\t' '\n'

(I'm too lazy for multi-line sed, obviously; and this will break if the
data contains any JSON objects in turn.)

The correct way would be to actually parse the JSON (using perl, python
or whatever) and remove all the top-level objects containing an "event"
key, obviously... But that's probably too much.

I'm not strictly against just dropping -qmp-pretty in 067, but there is
a good reason it's there.

Max

> +
>  # replace driver-specific options in the "Formatting..." line
>  _filter_img_create()
>  {
> 




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v2 4/4] rng: add request queue support to rng-random

2016-02-10 Thread Paolo Bonzini


On 10/02/2016 17:40, Ladi Prosek wrote:
>>
>>
>> On 10/02/2016 16:53, Ladi Prosek wrote:
>>> +req->size = size;
>>> +req->receive_entropy = receive_entropy;
>>> +req->opaque = opaque;
>>> +req->data = g_malloc(req->size);
>>> +
>>> +k->request_entropy(s, req);
>>> +
>>> +s->requests = g_slist_append(s->requests, req);
>>>  }
>>
>> g_slist_append has to traverse the entire list to find the place to add
>> the node.  You probably are better off using QSIMPLEQ (which is an
>> intrusive list unlike GSList).
> 
> This is what rng-egd does today and I would argue that since the expected
> length of the list is very small - it's going to be longer than 1 only
> very rarely - a simple lightweight data structure is a better choice than
> trying to be O(1) in the worst case.
> 
> I'll be happy to switch to QSIMPLEQ if you want though. Your call.

Ok, it can be done on top I guess.  I'll let others review the patches
more closely!

Paolo



Re: [Qemu-devel] [PATCH v2 4/4] rng: add request queue support to rng-random

2016-02-10 Thread Ladi Prosek
> 
> 
> On 10/02/2016 16:53, Ladi Prosek wrote:
> > +req->size = size;
> > +req->receive_entropy = receive_entropy;
> > +req->opaque = opaque;
> > +req->data = g_malloc(req->size);
> > +
> > +k->request_entropy(s, req);
> > +
> > +s->requests = g_slist_append(s->requests, req);
> >  }
> 
> g_slist_append has to traverse the entire list to find the place to add
> the node.  You probably are better off using QSIMPLEQ (which is an
> intrusive list unlike GSList).

This is what rng-egd does today and I would argue that since the expected
length of the list is very small - it's going to be longer than 1 only
very rarely - a simple lightweight data structure is a better choice than
trying to be O(1) in the worst case.

I'll be happy to switch to QSIMPLEQ if you want though. Your call.

Thanks!
Ladi
 
> Thanks,
> 
> Paolo
> 
> 



Re: [Qemu-devel] [PATCH 07/15] tcg-mips: Adjust qemu_ld/st for mips64

2016-02-10 Thread James Hogan
Hi Richard,

On Tue, Feb 09, 2016 at 09:39:55PM +1100, Richard Henderson wrote:
> @@ -1212,11 +1237,24 @@ static void tcg_out_tlb_load(TCGContext *s, TCGReg 
> base, TCGReg addrl,
> : offsetof(CPUArchState, tlb_table[mem_index][0].addr_write));
>  int add_off = offsetof(CPUArchState, tlb_table[mem_index][0].addend);
>  
> -tcg_out_opc_sa(s, OPC_SRL, TCG_REG_A0, addrl,
> -   TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
> -tcg_out_opc_imm(s, OPC_ANDI, TCG_REG_A0, TCG_REG_A0,
> -(CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
> -tcg_out_opc_reg(s, OPC_ADDU, TCG_REG_A0, TCG_REG_A0, TCG_AREG0);
> +if (use_mips32r2_instructions) {
> +if (TCG_TARGET_REG_BITS == 32 || TARGET_LONG_BITS == 32) {
> +tcg_out_opc_bf(s, OPC_EXT, TCG_REG_A0, addrl,
> +   TARGET_PAGE_BITS + CPU_TLB_ENTRY_BITS - 1,
> +   CPU_TLB_ENTRY_BITS);
> +} else {
> +tcg_out_opc_bf64(s, OPC_DEXT, OPC_DEXTM, OPC_DEXTU,
> + TCG_REG_A0, addrl,
> + TARGET_PAGE_BITS + CPU_TLB_ENTRY_BITS - 1,
> + CPU_TLB_ENTRY_BITS);
> +}

The ext/dext here will end up with bits below bit CPU_TLB_ENTRY_BITS
set, which will result in load of addend from slightly offset address,
so things go badly wrong. You still need to either ANDI off the low bits
or trim them off with the ext/dext and shift it left again.

So I don't think there's any benefit to the use of these instructions
unless CPU_TLB_SIZE + CPU_TLB_ENTRY_BITS exceeds the 16-bits available
in the ANDI immediate field for the non r2 case.

Cheers
James

> +} else {
> +tcg_out_opc_sa(s, ALIAS_TSRL, TCG_REG_A0, addrl,
> +   TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
> +tcg_out_opc_imm(s, OPC_ANDI, TCG_REG_A0, TCG_REG_A0,
> +(CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
> +}
> +tcg_out_opc_reg(s, ALIAS_PADD, TCG_REG_A0, TCG_REG_A0, TCG_AREG0);
>  
>  /* Compensate for very large offsets.  */
>  if (add_off >= 0x8000) {


signature.asc
Description: Digital signature


Re: [Qemu-devel] [PATCH v3 00/10] Allow hotplug of s390 CPUs

2016-02-10 Thread Andreas Färber
Am 10.02.2016 um 16:28 schrieb David Hildenbrand:
> For x86, cpu models are realized by making x86_64-cpu an abstract class and
> creating loads of new classes, e.g. host-x86_64-cpu or haswell-x86_64-cpu.
> 
> How does 'device_add ' play together with the x86 cpu model
> approach? And with cpu models specified via "-cpu" in general?

device_add needs to use an instantiatable type, like the ones you sketch
above.

> Or does that in return mean, that "making models own classes" is outdated? Or
> will some internal conversion happen that I am missing?
> 
> What is the plan for cpu models and cpu hotplug? How are cpu models to be
> defined in the future?

Someone at IBM was working on defining models for s390x - not sure what
the status is?

Regards,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton; HRB 21284 (AG Nürnberg)



Re: [Qemu-devel] [PATCH v2 4/4] rng: add request queue support to rng-random

2016-02-10 Thread Paolo Bonzini


On 10/02/2016 16:53, Ladi Prosek wrote:
> +req->size = size;
> +req->receive_entropy = receive_entropy;
> +req->opaque = opaque;
> +req->data = g_malloc(req->size);
> +
> +k->request_entropy(s, req);
> +
> +s->requests = g_slist_append(s->requests, req);
>  }

g_slist_append has to traverse the entire list to find the place to add
the node.  You probably are better off using QSIMPLEQ (which is an
intrusive list unlike GSList).

Thanks,

Paolo



Re: [Qemu-devel] [PATCH v6 0/5] add ACPI node for fw_cfg on pc and arm

2016-02-10 Thread Gabriel L. Somlo
Ping.

Now that the kernel side seems to have been accepted (Thanks again
Laszlo and Matt for all the help and advice!!!), is there anything
left to clean up before this series could be applied to QEMU ?

gmane.org quick links:
1/5: http://article.gmane.org/gmane.comp.emulators.qemu/389896/raw
2/5: http://article.gmane.org/gmane.comp.emulators.qemu/389894/raw
3/5: http://article.gmane.org/gmane.comp.emulators.qemu/389895/raw
4/5: http://article.gmane.org/gmane.comp.emulators.qemu/389900/raw
5/5: http://article.gmane.org/gmane.comp.emulators.qemu/389898/raw

Thanks much,
--Gabriel


On Wed, Jan 27, 2016 at 10:00:52PM -0500, Gabriel L. Somlo wrote:
> New since v5:
> 
>   - rebased on top of latest QEMU git master
> 
> Thanks,
>   --Gabriel
> 
> >New since v4:
> >
> > - rebased on top of Marc's DMA series
> > - drop machine compat dependency for insertion into x86/ssdt
> >   (patch 3/5), following agreement between Igor and Eduardo
> > - [mm]io register range now covers DMA register as well, if
> >   available.
> > - s/bios/firmware in doc file updates
> >
> >>New since v3:
> >>
> >>- rebased to work on top of 87e896ab (introducing pc-*-25 classes),
> >>  inserting fw_cfg acpi node only for machines >= 2.5.
> >>
> >>- reintroduce _STA with value 0x0B (bit 2 for u/i visibility turned
> >>  off to avoid Windows complaining -- thanks Igor for catching that!)
> >>
> >>If there's any other feedback besides questions regarding the
> >>appropriateness of "QEMU0002" as the value of _HID, please don't hesitate!
> >>
> >>>New since v2:
> >>>
> >>>   - pc/i386 node in ssdt only on machine types *newer* than 2.4
> >>> (as suggested by Eduardo)
> >>>
> >>>I appreciate any further comments and reviews. Hopefully we can make
> >>>this palatable for upstream, modulo the lingering concerns about whether
> >>>"QEMU0002" is ok to use as the value of _HID, which I'll hopefully get
> >>>sorted out with the kernel crew...
> >>>
> New since v1:
> 
>   - expose control register size (suggested by Marc Marí)
> 
>   - leaving out _UID and _STA fields (thanks Shannon & Igor)
> 
>   - using "QEMU0002" as the value of _HID (thanks Michael)
> 
>   - added documentation blurb to docs/specs/fw_cfg.txt
> (mainly to record usage of the "QEMU0002" string with fw_cfg).
> 
> > This series adds a fw_cfg device node to the SSDT (on pc), or to the
> > DSDT (on arm).
> >
> > - Patch 1/3 moves (and renames) the BIOS_CFG_IOPORT (0x510)
> >   define from pc.c to pc.h, so that it could be used from
> >   acpi-build.c in patch 2/3.
> > 
> > - Patch 2/3 adds a fw_cfg node to the pc SSDT.
> > 
> > - Patch 3/3 adds a fw_cfg node to the arm DSDT.
> >
> > I made up some names - "FWCF" for the node name, and "FWCF0001"
> > for _HID; no idea whether that's appropriate, or how else I should
> > figure out what to use instead...
> >
> > Also, using scope "\\_SB", based on where fw_cfg shows up in the
> > output of "info qtree". Again, if that's wrong, please point me in
> > the right direction.
> >
> > Re. 3/3 (also mentioned after the commit blurb in the patch itself),
> > I noticed none of the other DSDT entries contain a _STA field, wondering
> > why it would (not) make sense to include that, same as on the PC.
> 
> Gabriel L. Somlo (5):
>   fw_cfg: expose control register size in fw_cfg.h
>   pc: fw_cfg: move ioport base constant to pc.h
>   acpi: pc: add fw_cfg device node to ssdt
>   acpi: arm: add fw_cfg device node to dsdt
>   fw_cfg: document ACPI device node information
> 
>  docs/specs/fw_cfg.txt |  9 +
>  hw/arm/virt-acpi-build.c  | 15 +++
>  hw/i386/acpi-build.c  | 29 +
>  hw/i386/pc.c  |  5 ++---
>  hw/nvram/fw_cfg.c |  4 +++-
>  include/hw/i386/pc.h  |  2 ++
>  include/hw/nvram/fw_cfg.h |  3 +++
>  7 files changed, 63 insertions(+), 4 deletions(-)
> 
> -- 
> 2.4.3
> 



  1   2   3   >