Re: [Qemu-devel] [RFC qemu PATCH] only writing out the last byte of MAC makes it have effect

2013-03-21 Thread Rusty Russell
"Michael S. Tsirkin"  writes:
> On Thu, Mar 21, 2013 at 02:44:50PM +0800, Amos Kong wrote:
>> The lengcy guests don't have mac programming command, we don't know when
>> it's safe to use MAC. This patch changed qemu to makes MAC change effect
>> when the last byte of MAC is written to config space.
>> 
>> MAC address takes first 6 bytes of config space of virtio-net, the addr
>> is 5 when the last byte is written in virtio_config_writeb().
>> 
>> MAC change will effect when n->mac is updated in virtio_net_set_config().
>> 
>> Signed-off-by: Amos Kong 
>
> Let's see what Rusty says about the spec change.

Implementation notes like this belong as a footnote, eg:

For older systems, it is recommended and typical that the device
write byte 5 of the mac address last, so devices can use that as
a trigger to commit the mac address change.

Now, is this a real, or theoretical issue?  Have we seen this problem in
practice, or should we continue to ignore it?

Cheers,
Rusty.




Re: [Qemu-devel] [PATCH 12/12] target-i386: implement CPU hot-add

2013-03-21 Thread Eric Blake
On 03/21/2013 08:28 AM, Igor Mammedov wrote:
> ... via do_cpu_hot_add() hook called by cpu_set QMP command,
> for x86 target.
> 
> * add extra check that APIC ID is in allowed range

> +if (x86_cpu_is_cpu_exist(qdev_get_machine(), &apic_id)) {
> +error_setg(errp, "Unable to add CPU with ID: 0x%" PRIx64
> +   ", it's already exists", apic_id);

s/it's/it/

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 11/12] qmp: add cpu_set qmp command

2013-03-21 Thread Eric Blake
On 03/21/2013 08:28 AM, Igor Mammedov wrote:
> Signed-off-by: Igor Mammedov 
> ---
> +++ b/qapi-schema.json
> @@ -1385,6 +1385,15 @@
>  { 'command': 'cpu', 'data': {'index': 'int'} }
>  
>  ##
> +# @cpu_set
> +#

I already mentioned naming this cpu-set on your cover letter.  Additionally:

> +# Sets specified cpu to online/ofline mode

s/ofline/offline/

> +#
> +# Notes: semantics is : cpu_set id=x status=online|offline
> +##
> +{ 'command': 'cpu_set', 'data': {'id': 'int', 'status': 'str'} }

Don't open-code a 'str'.  My preference would be a bool, with slightly
different naming.

Also, this is not the typical documentation style used in this file.
Rather, it would be more like:

# Sets specified cpu to online/offline mode
#
# @id: cpu id to be updated
#
# @online: true to put the cpu online, false to take it offline
#
##
{ 'command': 'cpu-set', 'data': {'id': 'int', 'online': 'bool'} }


> +void qmp_cpu_set(int64_t id, const char *status, Error **errp)
> +{
> +if (!strcmp(status, "online")) {
> +do_cpu_hot_add(id, errp);
> +} else if (!strcmp(status, "offline")) {
> +error_setg(errp, "Unplug is not implemented");
> +} else {
> +error_setg(errp, "Invalid parameter '%s'", status);
> +return;

This return could be omitted, with no change in behavior.  But again, I
think a bool rather than an open-coded string parser would make this
simpler:

void qmp_cpu_set(int64_t id, bool online, Error **errp)
{
if (online) {
do_cpu_hot_add(id, errp);
} else {
error_setg(errp, "Unplug is not implemented");
}
}

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v15 4/6] pvpanic: add document of pvpanic

2013-03-21 Thread Eric Blake
On 03/21/2013 02:35 AM, Hu Tao wrote:
> Signed-off-by: Hu Tao 
> ---
>  docs/specs/pvpanic.txt | 25 +
>  1 file changed, 25 insertions(+)
>  create mode 100644 docs/specs/pvpanic.txt

Grammar nits:

> 
> diff --git a/docs/specs/pvpanic.txt b/docs/specs/pvpanic.txt
> new file mode 100644
> index 000..761d20c
> --- /dev/null
> +++ b/docs/specs/pvpanic.txt
> @@ -0,0 +1,25 @@
> +PVPANIC DEVICE
> +==
> +
> +pvpanic device is a simulated ISA device, through which guest panic

s/which/which a/

> +event is sent to qemu, and a QMP event is generated. This allows
> +management apps(e.g. libvirt) to be notified and respond to the

space before ( in English text.

> +event.
> +
> +pvpanic uses port 0x505 by default to receive panic event from guest.

s/receive/receive a/
s/from guest/from the guest/

> +The port is configurable by specifying ioport property.
> +
> +pvpanic device is defined with ACPI ID "QEMU0001". To send panic

s/send/send a/

> +event, guest evaluates method WRPT, specifying a byte, bit 0 set,
s/guest/the guest/
/byte, bit 0/byte with bit 0/

> +as argument. Other bits are reserved.
> +
> +To use it, one will have to:
> +
> +1. add the device by specifying `-device pvpanic' in the qemu command
> +   line.
> +
> +2. load pvpanic device driver in guest OS.

s/in/in the/

> +
> +The management app has the options of waiting for GUEST_PANICKED events,

s/options/option/

> +and/or polling for guest-panicked RunState, to learn when the pvpanic
> +device has fired a panic event.
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] TCG broken in system mode (was TCG assertion with qemu-system-mipsel)

2013-03-21 Thread Yeongkyoon Lee

On 03/22/2013 07:11 AM, Aurélien Jarno wrote:

On Thu, Mar 21, 2013 at 04:04:44PM +0900, Yeongkyoon Lee wrote:

On 03/18/2013 07:27 AM, Aurélien Jarno wrote:

On Wed, Mar 06, 2013 at 07:10:17AM +0100, Aurélien Jarno wrote:

On Wed, Mar 06, 2013 at 11:05:15AM +0900, Yeongkyoon Lee wrote:

On 03/05/2013 11:18 PM, Aurélien Jarno wrote:

On Mon, Mar 04, 2013 at 05:37:31PM +0100, Aurélien Jarno wrote:

Hi,

On Sat, Feb 23, 2013 at 11:10:18PM +0100, Stefan Weil wrote:

This assertion occured with latest git master:

qemu-system-mipsel: /src/qemu/tcg/tcg-op.h:2589:
  tcg_gen_goto_tb: Assertion `(tcg_ctx.goto_tb_issue_mask & (1 << idx))
== 0' failed.
Aborted

QEMU was built with --enable-debug and running a Debian MIPS Lenny (NFS
root).
The assertion happened when running "apt-get update" in the guest.


Is it something reproductible or more or less random? Have you Cc:ed
Richard because it's related to the latest patches?

On my side I am experiencing random segfaults in various guests (at
least PowerPC, MIPS, SH4 and ARM). I have found a way to bisect it, even
if it is quite long (building Perl + the testsuite). Currently I know
that 1.3 is affected, while 1.2 is not.


I have found that the issue comes from the following commits, which
unfortunately are not bisectable one by one (though it won't change the
results a lot):

 commit b76f0d8c2e3eac94bc7fd90a510cb7426b2a2699
 Author: Yeongkyoon Lee 
 Date:   Wed Oct 31 16:04:25 2012 +0900
 tcg: Optimize qemu_ld/st by generating slow paths at the end of a block
 Add optimized TCG qemu_ld/st generation which locates the code of TLB 
miss
 cases at the end of a block after generating the other IRs.
 Currently, this optimization supports only i386 and x86_64 hosts.
 Signed-off-by: Yeongkyoon Lee 
 Signed-off-by: Blue Swirl 
 commit fdbb84d1332ae0827d60f1a2ca03c7d5678c6edd
 Author: Yeongkyoon Lee 
 Date:   Wed Oct 31 16:04:24 2012 +0900
 tcg: Add extended GETPC mechanism for MMU helpers with ldst 
optimization
 Add GETPC_EXT which is used by MMU helpers to selectively calculate 
the code
 address of accessing guest memory when called from a qemu_ld/st 
optimized code
 or a C function. Currently, it supports only i386 and x86-64 hosts.
 Signed-off-by: Yeongkyoon Lee 
 Signed-off-by: Blue Swirl 
 commit 32761257c0b9fa7ee04d2871a6e48a41f119c469
 Author: Yeongkyoon Lee 
 Date:   Wed Oct 31 16:04:23 2012 +0900
 configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st 
optimization
 Enable CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization 
only when
 a host is i386 or x86_64.
 Signed-off-by: Yeongkyoon Lee 
 Signed-off-by: Blue Swirl 

I will try to understand why.



Hi Aurélien,
Do you mean that those random segfaults occurred only when
configured with "--enable-debug"?
Although I cannot see how my commits affect debug built image at a
glance, I'll do double-check.
Thanks.

The problem is there even without configuring QEMU with --enable-debug.
It justs doesn't happens very often, and very randomly. The only way to
reproduce it each time is to launch a big task in the guest (for me
building Perl) and see if it completes or now. It can take up to one
hour until it happens.

I should precise that the segfault is on the guest side.

I have tried to look at your patches, and so far I haven't found the
issue. It seems the two first patches are fine, ie I have verified the
return address is always correctly computed.


I still haven't found the issue, but on the other hand I can't find any
problem in your code, after reading it dozen of times. I also tried to
modify it as less as possible while issuing the slow path back inside
the TB and it fixes the problem. So it really looks like to be due to
the slow path being at the end of the TB, and not to a bug in the code
generating it. After adding various checks, I am also convinced the
address computed in GETPC_EXT() is always correct. I have to say I am
running out of ideas.

One way to reproduce the issue more easily is to reduce the size of the
generated code buffer, for example by setting it to 512kB for both
MIN_CODE_GEN_BUFFER_SIZE and MAX_CODE_GEN_BUFFER_SIZE in
translate-all.c. That way booting an ARM guest triggers plenty of
segmentation faults or other strange issues with your patch but not
without.

OTOH increasing this size make the issue to almost disappear even when
building perl including the testsuite (for that it has to be at least
512MB).


Although I've not succeeded to reproduce the problem, I've found a
suspicious code stub about boundary-checking of generated code
(is_tcg_gen_code() in translate-all.c).

The code is supposed to be changed as follows.case
Before:
 return (tc_ptr >= (uintptr_t)tcg_ctx.code_gen_buffer &&
 tc_ptr < (uintptr_t)(tcg_ctx.code_gen_buffer +
 tcg_ctx.code_gen_buff

Re: [Qemu-devel] [PATCH][RFC 0/14] implement power chip

2013-03-21 Thread li guang
在 2013-03-21四的 10:41 +,Peter Maydell写道:
> On 21 March 2013 00:36, li guang  wrote:
> > 在 2013-03-20三的 10:50 +,Peter Maydell写道:
> >> The devices should just implement appropriate signals/connections
> >> if they have a means of talking to a power controller, and the
> >> board model should wire them up. That's all.
> >
> > Hmm, can you give some demo of signals implementation?
> > (I'm hesitating to say do you mean signal() or sigaction()? :) )
> 
> No, I don't mean signals in the POSIX signals sense. For a
> simple yes/no kind of connection, qemu_irq is what we currently
> use. If you need something more complicated QOM should be able
> to do that but I don't have an example to hand.
> 

OK, thanks!




Re: [Qemu-devel] [PATCH] rdma: don't make pages writeable if not requiested

2013-03-21 Thread Jason Gunthorpe
On Thu, Mar 21, 2013 at 07:42:37PM +0200, Michael S. Tsirkin wrote:

> It doesn't actually, and our app would sometimes write to these pages.
> It simply does not care which version does the remote get in this case
> since we track writes and resend later.

Heh, somehow I thought you might say that :)

A new flag seems like the only way then - maybe:
  IBV_ACCESS_NON_COHERENT - The adaptor and the CPU do not share
 a coherent view of registered memory. Memory writes from the CPU
 after ibv_reg_mr completes may not be reflected in the memory
 viewed by the adaptor.

 Can only be combined with read only access permissions.

Jason



Re: [Qemu-devel] [PATCH] rdma: don't make pages writeable if not requiested

2013-03-21 Thread Jason Gunthorpe
On Thu, Mar 21, 2013 at 08:16:33PM +0200, Michael S. Tsirkin wrote:

> This is the one I find redundant. Since the write will be done by
> the adaptor under direct control by the application, why does it
> make sense to declare this beforehand?  If you don't want to allow
> local write access to memory, just do not post any receive WRs with
> this address.  If you posted and regret it, reset the QP to cancel.

This is to support your COW scenario - the app declares before hand to
the kernel that it will write to the memory and the kernel ensures
pages are dedicated to the app at registration time. Or the app says
it will only read and the kernel could leave them shared.

The adaptor enforces the access control to prevent a naughty app from
writing to shared memory - think about mmap'ing libc.so and then using
RDMA to write to the shared pages. It is necessary to ensure that is
impossible.

Jason



Re: [Qemu-devel] [PATCH] rdma: don't make pages writeable if not requiested

2013-03-21 Thread Jason Gunthorpe
On Thu, Mar 21, 2013 at 09:15:41PM +0200, Michael S. Tsirkin wrote:
> On Thu, Mar 21, 2013 at 12:41:35PM -0600, Jason Gunthorpe wrote:
> > On Thu, Mar 21, 2013 at 08:16:33PM +0200, Michael S. Tsirkin wrote:
> > 
> > > This is the one I find redundant. Since the write will be done by
> > > the adaptor under direct control by the application, why does it
> > > make sense to declare this beforehand?  If you don't want to allow
> > > local write access to memory, just do not post any receive WRs with
> > > this address.  If you posted and regret it, reset the QP to cancel.
> > 
> > This is to support your COW scenario - the app declares before hand to
> > the kernel that it will write to the memory and the kernel ensures
> > pages are dedicated to the app at registration time. Or the app says
> > it will only read and the kernel could leave them shared.
> 
> Someone here is confused. LOCAL_WRITE/absence of it does not address
> COW, it breaks COW anyway.  Are you now saying we should change rdma so
> without LOCAL_WRITE it will not break COW?

I am talking about 'from a spec' perspective - not what Linux does
today. The absence of LOCAL_WRITE is part of the specification to
support shared pages.

Pages can only be kept shared if all the ACCESS WRITE bits are clear -
today Linux always breaks the COW, but if you patch in the ability to
keep things shared then it must only happen when *all* the ACCESS
WRITE bits are clear.

> > The adaptor enforces the access control to prevent a naughty app from
> > writing to shared memory - think about mmap'ing libc.so and then using
> > RDMA to write to the shared pages. It is necessary to ensure that is
> > impossible.

> That's why it's redundant: we can't trust an application to tell us
> 'this page is writeable', we must get this info from kernel.  And so
> there's apparently no need for application to tell adaptor about
> LOCAL_WRITE.

The API design gives user space maximum flexibility, if it wants to
create an enforced no-write MR in otherwise writable pages by skipping
LOCAL_WRITE then it can do so.

The kernel's role in this should be to deny ibv_reg_mr with WRITE bits
set if the pages are not writable by the app - I don't know if it does
this today, it isn't critically important as long as the pages are
unshared.

Jason



Re: [Qemu-devel] [PATCH] rdma: don't make pages writeable if not requiested

2013-03-21 Thread Jason Gunthorpe
On Thu, Mar 21, 2013 at 11:39:47AM +0200, Michael S. Tsirkin wrote:
> On Thu, Mar 21, 2013 at 02:13:38AM -0700, Roland Dreier wrote:
> > On Thu, Mar 21, 2013 at 1:51 AM, Michael S. Tsirkin  wrote:
> > >> In that case, no, I don't see any reason for LOCAL_WRITE, since the
> > >> only RDMA operations that will access this memory are remote reads.
> > >
> > > What is the meaning of LOCAL_WRITE then? There are no local
> > > RDMA writes as far as I can see.
> > 
> > Umm, it means you're giving the local adapter permission to write to
> > that memory.  So you can use it as a receive buffer or as the target
> > for remote data from an RDMA read operation.
> 
> Well RDMA read has it's own flag, IB_ACCESS_REMOTE_READ.
> I don't see why do you need to give adapter permission

The access flags have to do with what independent access remote nodes
get. There are four major cases:

access = IBV_ACCESS_REMOTE_READ says the adaptor will let remote nodes
read the memory.

access = 0 (ie IBV_ACCESS_LOCAL_READ) says that only the adaptor, under
the direct control of the application, can read this memory. Remote
nodes are barred.

access = IBV_ACCESS_REMOTE_WRITE|IBV_ACCESS_LOCAL_WRITE says the adaptor
will let remote nodes write the memory

access = IBV_ACCESS_LOCAL_WRITE bars remote nodes from writing to that
memory. Only the adaptor, under the direct control of the application,
can write the memory.

The fact LOCAL_READ/REMOTE_READ exists makes it possible to do what
you want - it guarentees the adaptor will never write to this memory
under any circumstances, so you can leave the page COW'd. If
LOCAL_WRITE was implied then you'd have to COW everything..

Would it be better to drive the COW break decision off the region's MM
flags? Ie if the memory is mapped read only into the process then you
can keep the COW at the RDMA layer, otherwise you should break
it. That seems more natural than a new flag?

Jason



Re: [Qemu-devel] [PATCH] rdma: don't make pages writeable if not requiested

2013-03-21 Thread Jason Gunthorpe
On Thu, Mar 21, 2013 at 07:15:25PM +0200, Michael S. Tsirkin wrote:

> No because application does this:
> init page
> 
> ...
> 
> after a lot of time
> 
> ..
> 
> register
> send
> unregister
> 
> so it can not be read only.

mprotect(READONLY)
register
send
unregister
mprotect(WRITABLE)

?

With something like GIFT the app already has to give up writing to the
pages while they are moving, so changing the protection seems in line
with that?

Jason



Re: [Qemu-devel] [PATCH 0/5] target-ppc: fix add-with-carry in narrow mode

2013-03-21 Thread Alexander Graf

On 21.03.2013, at 21:01, Richard Henderson wrote:

> The first patch fixes the problem reported by agraf here:
> 
>  http://lists.gnu.org/archive/html/qemu-devel/2013-03/msg03747.html
> 
> The subsequent patches use the macro added in the first patch
> to remove a bunch of conditional compilation.
> 
> With this (or even just the first patch), I can boot Alex's
> test kernel all the way to the no rootfs panic.

Thanks a lot, applied all to ppc-next.

And yes, this does indeed fix the breakage I reported earlier :).


Alex




Re: [Qemu-devel] [RFC ppc-next PATCH 6/6] kvm/openpic: in-kernel mpic support

2013-03-21 Thread Alexander Graf

On 21.03.2013, at 22:59, Scott Wood wrote:

> On 03/21/2013 04:29:02 PM, Alexander Graf wrote:
>> Am 21.03.2013 um 21:50 schrieb Scott Wood :
>> > On 03/21/2013 03:41:19 AM, Alexander Graf wrote:
>> >> Can't all the stuff above here just simply go into the qdev init function?
>> >
>> > Not if you want platform code to be able to fall back to a QEMU mpic if an 
>> > in-kernel mpic is unavailable.
>> Do we want that? We used to have a default like that in qemu-kvm back in the 
>> day. That was very confusing, as people started to report that their VMs 
>> turned out to be really slow.
>> I think we should not have fallback code. It makes things easier and more 
>> obvious. The default should just depend on the host's capabilities.
> 
> I don't follow.  What is the difference between "falling back" and "depending 
> on the host's capabilities"?  Either we can create an in-kernel MPIC or we 
> can't.  We could use KVM_CREATE_DEVICE_TEST to see if the device type is 
> supported separately from actually creating it, but I don't see what that 
> would accomplish other than adding more code.

We usually have settled on a tri-state way to change QEMU behavior for most 
machine options:

  -machine  is not specified -> best possible behavior in the current 
system
  -machine =on -> turn the option on, fail if that doesn't work
  -machine =off -> turn the option off always

So for the in-kernel irqchip, we should follow the same model. If the -machine 
option is not passed in, we should try to allocate an in-kernel irqchip if 
possible. If the kernel advertises that it can do an in-kernel irqchip, but in 
fact it can't, I would consider that simply a bug we shouldn't worry about.

However, when the user says -machine kernel_irqchip=on, then we should create a 
kvm based irqchip. And if we can't do that, machine creation should just fail.

> 
>> >> > /* MPIC */
>> >> > mpic = g_new(qemu_irq, 256);
>> >> > -dev = qdev_create(NULL, "openpic");
>> >> > -qdev_prop_set_uint32(dev, "nb_cpus", smp_cpus);
>> >> > -qdev_prop_set_uint32(dev, "model", params->mpic_version);
>> >> > +
>> >> > +if (kvm_irqchip_wanted()) {
>> >> > +dev = kvm_openpic_create(NULL, params->mpic_version);
>> >> This really should be just a
>> >>dev = qdev_create(NULL, kvm_irqchip_wanted() ? "kvm-openpic" : 
>> >> "openpic");
>> >> The logic whether an in-kernel irqchip is available belongs into the 
>> >> default setting of kvm_irqchip_wanted.
>> >
>> > That is exactly what I was trying to avoid by introducing 
>> > kvm_irqchip_wanted.  We're no longer testing some vague generic irqchip 
>> > capability, but the presence of a specific type of device (and version 
>> > thereof).  How would the code that sets kvm_irqchip_wanted know what to 
>> > test for?
>> Then move the default code into the board file and check for the in-kernel 
>> mpic cap.
> 
> I'm not quite sure what you mean by "the default code" -- if you mean the 
> part that makes the decision whether to fall back or error out, that's 
> already in board code.

No, currently that lives mostly in kvm-all.c. I'm talking about the code that 
checks qemu_opt_get_bool("kernel_irqchip") and decides what to do based on that.


Alex




Re: [Qemu-devel] [SeaBIOS] [PATCH v15 2/2] patch dsdt to use passed-in pvpanic ioport

2013-03-21 Thread Kevin O'Connor
On Thu, Mar 21, 2013 at 05:08:34PM +0800, Hu Tao wrote:
> Signed-off-by: Hu Tao 

I don't think it is a good idea to dynamically modify the DSDT.  We've
been using the SSDT for that.

In any case, I think this would be a good candidate for merging after
the ACPI stuff is moved into QEMU.

-Kevin



Re: [Qemu-devel] [Qemu-ppc] [RFC ppc-next PATCH 3/6] memory: add memory_region_to_address()

2013-03-21 Thread Scott Wood

On 03/21/2013 06:51:57 AM, Alexander Graf wrote:


On 21.03.2013, at 12:49, Alexander Graf wrote:

>
> On 21.03.2013, at 12:44, Peter Maydell wrote:
>
>> On 21 March 2013 11:38, Alexander Graf  wrote:
>>>
>>> On 21.03.2013, at 12:32, Peter Maydell wrote:
>>>
 On 21 March 2013 11:29, Alexander Graf  wrote:
> On 21.03.2013, at 12:22, Peter Maydell wrote:
>> We already nest the VGIC inside another memory region (the  
a15mpcore
>> container), and it works fine. This function is just iterating  
through

>> "everything any device asked me to tell the kernel about".
>
> So kda is the real physical offset? I'm having a hard time  
reading that code :). According to this function:

>
> static void kvm_arm_devlistener_add(MemoryListener *listener,
>  MemoryRegionSection *section)
> {
>  KVMDevice *kd;
>
>  QSLIST_FOREACH(kd, &kvm_devices_head, entries) {
>  if (section->mr == kd->mr) {
>  kd->kda.addr = section->offset_within_address_space;
>  }
>  }
> }


What if the update is to a parent memory region, not to the one  
directly associated with the device?


Or does add() get called for all child regions (recursively) in such  
cases?



>>> The distinction on whether a region is handled by KVM really needs
>>> to be done by the device model.
>>
>> It is -- the device model is what calls kvm_arm_register_device().
>> It's just the mechanics of "how do we tell the kernel the right
>> address for this region at the point when we know it" that are
>> handled in kvm.c.
>
> I think I'm slowly grasping what you're aiming at :). Ok, that  
works. You do actually do the listener in the device model, just that  
you pass code responsibility over to kvm.c.

>
> That's perfectly valid and sounds like a good model that Scott  
probably wants to follow as well :).


s/follow/evaluate/ :).

The currently proposed device api doesn't have a generic notion of  
device regions. Regions are a per-device property, because a single  
device can have multiple regions.


However, maybe with a bit of brainstorming we could come up with a  
reasonably generic scheme.


In the kernel API?  Or do you mean a generic scheme within QEMU that  
encodes any reasonably expected mechanism for setting the device adress  
(e.g. assume that it is either a 64-bit attribute, or uses the legacy  
ARM API), or perhaps a callback into device code?


The MPIC's memory listener isn't that much code... I'm not sure there's  
a great need for a central KVM registry.


-Scott



Re: [Qemu-devel] TCG broken in system mode (was TCG assertion with qemu-system-mipsel)

2013-03-21 Thread Aurélien Jarno
On Thu, Mar 21, 2013 at 04:04:44PM +0900, Yeongkyoon Lee wrote:
> On 03/18/2013 07:27 AM, Aurélien Jarno wrote:
> >On Wed, Mar 06, 2013 at 07:10:17AM +0100, Aurélien Jarno wrote:
> >>On Wed, Mar 06, 2013 at 11:05:15AM +0900, Yeongkyoon Lee wrote:
> >>>On 03/05/2013 11:18 PM, Aurélien Jarno wrote:
> On Mon, Mar 04, 2013 at 05:37:31PM +0100, Aurélien Jarno wrote:
> >Hi,
> >
> >On Sat, Feb 23, 2013 at 11:10:18PM +0100, Stefan Weil wrote:
> >>This assertion occured with latest git master:
> >>
> >>qemu-system-mipsel: /src/qemu/tcg/tcg-op.h:2589:
> >>  tcg_gen_goto_tb: Assertion `(tcg_ctx.goto_tb_issue_mask & (1 << idx))
> >>== 0' failed.
> >>Aborted
> >>
> >>QEMU was built with --enable-debug and running a Debian MIPS Lenny (NFS
> >>root).
> >>The assertion happened when running "apt-get update" in the guest.
> >>
> >Is it something reproductible or more or less random? Have you Cc:ed
> >Richard because it's related to the latest patches?
> >
> >On my side I am experiencing random segfaults in various guests (at
> >least PowerPC, MIPS, SH4 and ARM). I have found a way to bisect it, even
> >if it is quite long (building Perl + the testsuite). Currently I know
> >that 1.3 is affected, while 1.2 is not.
> >
> I have found that the issue comes from the following commits, which
> unfortunately are not bisectable one by one (though it won't change the
> results a lot):
> 
>  commit b76f0d8c2e3eac94bc7fd90a510cb7426b2a2699
>  Author: Yeongkyoon Lee 
>  Date:   Wed Oct 31 16:04:25 2012 +0900
>  tcg: Optimize qemu_ld/st by generating slow paths at the end of 
>  a block
>  Add optimized TCG qemu_ld/st generation which locates the code 
>  of TLB miss
>  cases at the end of a block after generating the other IRs.
>  Currently, this optimization supports only i386 and x86_64 hosts.
>  Signed-off-by: Yeongkyoon Lee 
>  Signed-off-by: Blue Swirl 
>  commit fdbb84d1332ae0827d60f1a2ca03c7d5678c6edd
>  Author: Yeongkyoon Lee 
>  Date:   Wed Oct 31 16:04:24 2012 +0900
>  tcg: Add extended GETPC mechanism for MMU helpers with ldst 
>  optimization
>  Add GETPC_EXT which is used by MMU helpers to selectively 
>  calculate the code
>  address of accessing guest memory when called from a qemu_ld/st 
>  optimized code
>  or a C function. Currently, it supports only i386 and x86-64 
>  hosts.
>  Signed-off-by: Yeongkyoon Lee 
>  Signed-off-by: Blue Swirl 
>  commit 32761257c0b9fa7ee04d2871a6e48a41f119c469
>  Author: Yeongkyoon Lee 
>  Date:   Wed Oct 31 16:04:23 2012 +0900
>  configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st 
>  optimization
>  Enable CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st 
>  optimization only when
>  a host is i386 or x86_64.
>  Signed-off-by: Yeongkyoon Lee 
>  Signed-off-by: Blue Swirl 
> 
> I will try to understand why.
> 
> 
> >>>Hi Aurélien,
> >>>Do you mean that those random segfaults occurred only when
> >>>configured with "--enable-debug"?
> >>>Although I cannot see how my commits affect debug built image at a
> >>>glance, I'll do double-check.
> >>>Thanks.
> >>The problem is there even without configuring QEMU with --enable-debug.
> >>It justs doesn't happens very often, and very randomly. The only way to
> >>reproduce it each time is to launch a big task in the guest (for me
> >>building Perl) and see if it completes or now. It can take up to one
> >>hour until it happens.
> >>
> >>I should precise that the segfault is on the guest side.
> >>
> >>I have tried to look at your patches, and so far I haven't found the
> >>issue. It seems the two first patches are fine, ie I have verified the
> >>return address is always correctly computed.
> >>
> >I still haven't found the issue, but on the other hand I can't find any
> >problem in your code, after reading it dozen of times. I also tried to
> >modify it as less as possible while issuing the slow path back inside
> >the TB and it fixes the problem. So it really looks like to be due to
> >the slow path being at the end of the TB, and not to a bug in the code
> >generating it. After adding various checks, I am also convinced the
> >address computed in GETPC_EXT() is always correct. I have to say I am
> >running out of ideas.
> >
> >One way to reproduce the issue more easily is to reduce the size of the
> >generated code buffer, for example by setting it to 512kB for both
> >MIN_CODE_GEN_BUFFER_SIZE and MAX_CODE_GEN_BUFFER_SIZE in
> >translate-all.c. That way booting an ARM guest triggers plenty of
> >segmentation faults or other strange issues with your patch but not
> >without.
> >
> >OTOH increasing this size ma

Re: [Qemu-devel] [PATCH v4 7/9] Extend test-visitor-serialization with ASN.1 visitor(s)

2013-03-21 Thread Stefan Berger

On 03/21/2013 05:59 PM, Eric Blake wrote:

On 03/21/2013 12:29 PM, Stefan Berger wrote:

+++ b/tests/test-visitor-serialization.c
@@ -14,6 +14,7 @@
  #include 
  #include 
  #include 
+#include 

Is this addition needed?  I don't see any math functions added.



Indeed. Removed.

Stefan




Re: [Qemu-devel] [PATCH v4 9/9] ASN.1 specific test cases

2013-03-21 Thread Eric Blake
On 03/21/2013 12:29 PM, Stefan Berger wrote:
> BER visitor tests give us some assurance that the BER visitor
> code works, and also end up by extention helping out on our

s/extention/extension/

> code coverage of the filesystem tests.
> After the output visitor invocation the resuling buffer is

s/resuling/resulting/

> compared against a known byte stream -- this will lock the
> implementation into producing specific byte arrays.
> 
> Signed-off-by: Stefan Berger 
> Signed-off-by: Joel Schopp 
> ---
>  tests/Makefile   |   9 +
>  tests/test-ber-visitor.c | 746 
> +++
>  2 files changed, 755 insertions(+)
>  create mode 100644 tests/test-ber-visitor.c
> 

> +tests/test-ber-visitor.o: $(addprefix include/qapi/, ber.h 
> ber-input-visitor.h ber-output-visitor.h) $(addprefix qapi/, ber-common.c 
> ber-input-visitor.c ber-output-visitor.c)
> +tests/test-ber-visitor$(EXESUF): tests/test-ber-visitor.o $(tools-obj-y) 
> qapi/ber-output-visitor.o qapi/ber-input-visitor.o qapi/ber-common.o 
> $(block-obj-y) libqemuutil.a libqemustub.a

Long lines - worth using backslash-newline continuation?

> +++ b/tests/test-ber-visitor.c
> @@ -0,0 +1,746 @@
> +/*
> + * BER Output Visitor unit-tests.
> + *
> + * Copyright (C) 2011 Red Hat Inc.
> + * Copyright (C) 2011 IBM Corporation

It's 2013 (probably applies to other files earlier in the series, as well).

> +static void test_visitor_out_string(TestInputOutputVisitor *data,
> +const void *unused)
> +{
> +char *string_in = (char *) "Q E M U", *string_out = NULL;

Does the fact that you have to cast here...

> +Error *errp = NULL;
> +
> +visit_type_str(data->ov, &string_in, NULL, &errp);

...indicate a lack of const-correctness on visit_type_str()?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 1/2] char: add qemu_chr_be_is_fe_connected

2013-03-21 Thread Alon Levy
> > Alon Levy  writes:
> > 
> > >> Alon Levy  writes:
> > >> 
> > >> > Note that the handler is called chr_is_guest_connected and not
> > >> > chr_is_fe_connected, consistent with other members of
> > >> > CharDriverState.
> > >> 
> > >> Sorry, I don't get it.
> > >> 
> > >> There isn't a notion of "connected" for the front-ends in the
> > >> char
> > >> layer.  The closest thing is whether add_handlers() have been
> > >> called
> > >> or
> > >> not.
> > >
> > > It makes sense for virtio-console - it matches exactly the
> > > internal
> > > guest_connected port state.
> > 
> > I still don't understand why you need to know that detail in the
> > backend.  Hint: you should explain this in future commit
> > messages/cover
> > letters.
> 
> Hint will be taken into future commit messages.
> 
> Actually it already exists in the last commit message: currently,
> when the migration target finishing migration and enters the running
> state, the spice server has never received any indication that the
> guest agent is alive, and so it assumes it isn't. In the source
> machine, this is accomplished by the chr_guest_open callback
> implemented by spice_chr_guest_open. To make sure the target does
> see this event, the second patch adds a post_load for
> spicevmc/spiceport that will check the front end connected state and
> call spice_chr_guest_open if the front end is connected. spicevmc is
> the back end in this case, virtio-console is the front end.
> 
> > 
> > >> I really dislike the idea of introduction a new concept to the
> > >> char
> > >> layer in a half baked way.
> > >
> > > Is the fact there is only one user, virtio-console, the reason
> > > you
> > > call this half baked?
> > 
> > You are introducing a function:
> > 
> > qemu_chr_be_is_virtio_console_connnected()
> > 
> > And calling pretending like it's a generic character device
> > interface.
> > It's not.
> 
> There are guest open and guest closed callbacks in the generic
> character device interface. This is merely adding a convenient state
> that could be inferred by reading the history of those calls. In
> what way is it new or pretend?
> 
> > 
> > If spicevmc only works with virtio-console and has no hope of
> > working
> > with anything else, don't use the character device layer!  It's
> > trying
> > to fit a square peg into a round hole.
> 
> spicevmc is used by usbredir and ccid-card-passthru in addition to
> virtio-console. The bug/problem I am trying to solve is however only
> happening with the vdagent interface that uses virtio-console at the
> moment. It is not strange to assume it will use something else at
> some point, since it only requires a transport. It matches with the
> char device interface very well.
> 
> > 
> > If it's a concept that makes sense for all character devices front
> > ends,
> > then it should be a patch making it be a meaningful to all
> > character
> > device front end implementations.
> 
> It does make sense, since it is just tracking chr_guest_open &
> chr_guest_close history. But in order to work across migration it
> needs to have vmstate. Is vmstate in the chardev layer acceptable,
> something like the patch at the end of this mail?
> 
> >
> > >> Why can't migration notifiers be used for this?  I think Gerd
> > >> objected
> > >> to using a migration *handler* but not necessarily a state
> > >> notifier.
> > >
> > > Because if you have two chardevices, i.e.
> > >  -chardev spicevmc,name=vdagent,id=spice1 -chardev
> > >  spicevmc,name=vdagent,id=spice2
> > >
> > > Then the two on-the-wire vmstates will be identical.
> > 
> > I don't understand why this matters but I don't understand what the
> > problem you're trying to solve is either so that's not surprising.
> 
> Perhaps I'm missing something, or it's a non problem. I'm waiting for
> Gerd to explain it better then me since he raised it.
> 
> I didn't mean identical, that was a mistake - I meant they would have
> identifiers that are only distinguished by the order the
> corresponding "-chardev" appeared on the command line. So if the
> target vm had the two chardevs (that are connected to different
> devices) reversed, it could load the wrong state to both.
> 
> > 
> > Regards,
> > 
> > Anthony Liguori
> 
> Introducing fe_opened vmstate for replay of chr_guest_open for
> spice-qemu-char chardev (the second patch remains the same other
> then the renamed api to qemu_chr_be_is_fe_open): (this comes on top
> of a patch I omitted that renames CharDeviceState->opened to
> be_opened).
> 
> commit 35f3ce9dbaaa5059e8100c779eedbc0bf5bb98d2
> Author: Alon Levy 
> Date:   Thu Mar 21 17:24:00 2013 +0200
> 
> char: add qemu_chr_be_is_fe_open
> 
> This function returns the current open state of the guest, it
> tracks the
> existing fe called functions qemu_chr_fe_open and
> qemu_chr_fe_close,
> including adding vmstate to track this across migration.
> 
> Signed-off-by: Alon Levy 
> 
> diff --git a/include/char/char.h b/include/char/char

Re: [Qemu-devel] [RFC ppc-next PATCH 6/6] kvm/openpic: in-kernel mpic support

2013-03-21 Thread Scott Wood

On 03/21/2013 04:29:02 PM, Alexander Graf wrote:



Am 21.03.2013 um 21:50 schrieb Scott Wood :

> On 03/21/2013 03:41:19 AM, Alexander Graf wrote:
>> Can't all the stuff above here just simply go into the qdev init  
function?

>
> Not if you want platform code to be able to fall back to a QEMU  
mpic if an in-kernel mpic is unavailable.


Do we want that? We used to have a default like that in qemu-kvm back  
in the day. That was very confusing, as people started to report that  
their VMs turned out to be really slow.


I think we should not have fallback code. It makes things easier and  
more obvious. The default should just depend on the host's  
capabilities.


I don't follow.  What is the difference between "falling back" and  
"depending on the host's capabilities"?  Either we can create an  
in-kernel MPIC or we can't.  We could use KVM_CREATE_DEVICE_TEST to see  
if the device type is supported separately from actually creating it,  
but I don't see what that would accomplish other than adding more code.



>> > /* MPIC */
>> > mpic = g_new(qemu_irq, 256);
>> > -dev = qdev_create(NULL, "openpic");
>> > -qdev_prop_set_uint32(dev, "nb_cpus", smp_cpus);
>> > -qdev_prop_set_uint32(dev, "model", params->mpic_version);
>> > +
>> > +if (kvm_irqchip_wanted()) {
>> > +dev = kvm_openpic_create(NULL, params->mpic_version);
>> This really should be just a
>>dev = qdev_create(NULL, kvm_irqchip_wanted() ? "kvm-openpic" :  
"openpic");
>> The logic whether an in-kernel irqchip is available belongs into  
the default setting of kvm_irqchip_wanted.

>
> That is exactly what I was trying to avoid by introducing  
kvm_irqchip_wanted.  We're no longer testing some vague generic  
irqchip capability, but the presence of a specific type of device  
(and version thereof).  How would the code that sets  
kvm_irqchip_wanted know what to test for?


Then move the default code into the board file and check for the  
in-kernel mpic cap.


I'm not quite sure what you mean by "the default code" -- if you mean  
the part that makes the decision whether to fall back or error out,  
that's already in board code.


-Scott



Re: [Qemu-devel] [PATCH v4 7/9] Extend test-visitor-serialization with ASN.1 visitor(s)

2013-03-21 Thread Eric Blake
On 03/21/2013 12:29 PM, Stefan Berger wrote:
> Add BER visitor hooks to test-visitor-serialization
> 
> Cc: Michael Tsirkin 
> Cc: Stefan Berger 
> Signed-off-by: Joel Schopp 
> ---

> +++ b/tests/test-visitor-serialization.c
> @@ -14,6 +14,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

Is this addition needed?  I don't see any math functions added.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 1/2] char: add qemu_chr_be_is_fe_connected

2013-03-21 Thread Alon Levy
> Alon Levy  writes:
> 
> >> Alon Levy  writes:
> >> 
> >> > Note that the handler is called chr_is_guest_connected and not
> >> > chr_is_fe_connected, consistent with other members of
> >> > CharDriverState.
> >> 
> >> Sorry, I don't get it.
> >> 
> >> There isn't a notion of "connected" for the front-ends in the char
> >> layer.  The closest thing is whether add_handlers() have been
> >> called
> >> or
> >> not.
> >
> > It makes sense for virtio-console - it matches exactly the internal
> > guest_connected port state.
> 
> I still don't understand why you need to know that detail in the
> backend.  Hint: you should explain this in future commit
> messages/cover
> letters.

Hint will be taken into future commit messages.

Actually it already exists in the last commit message: currently, when the 
migration target finishing migration and enters the running state, the spice 
server has never received any indication that the guest agent is alive, and so 
it assumes it isn't. In the source machine, this is accomplished by the 
chr_guest_open callback implemented by spice_chr_guest_open. To make sure the 
target does see this event, the second patch adds a post_load for 
spicevmc/spiceport that will check the front end connected state and call 
spice_chr_guest_open if the front end is connected. spicevmc is the back end in 
this case, virtio-console is the front end.

> 
> >> I really dislike the idea of introduction a new concept to the
> >> char
> >> layer in a half baked way.
> >
> > Is the fact there is only one user, virtio-console, the reason you
> > call this half baked?
> 
> You are introducing a function:
> 
> qemu_chr_be_is_virtio_console_connnected()
> 
> And calling pretending like it's a generic character device
> interface.
> It's not.

There are guest open and guest closed callbacks in the generic character device 
interface. This is merely adding a convenient state that could be inferred by 
reading the history of those calls. In what way is it new or pretend?

> 
> If spicevmc only works with virtio-console and has no hope of working
> with anything else, don't use the character device layer!  It's
> trying
> to fit a square peg into a round hole.

spicevmc is used by usbredir and ccid-card-passthru in addition to 
virtio-console. The bug/problem I am trying to solve is however only happening 
with the vdagent interface that uses virtio-console at the moment. It is not 
strange to assume it will use something else at some point, since it only 
requires a transport. It matches with the char device interface very well.

> 
> If it's a concept that makes sense for all character devices front
> ends,
> then it should be a patch making it be a meaningful to all character
> device front end implementations.

It does make sense, since it is just tracking chr_guest_open & chr_guest_close 
history. But in order to work across migration it needs to have vmstate. Is 
vmstate in the chardev layer acceptable, something like the patch at the end of 
this mail?

>
> >> Why can't migration notifiers be used for this?  I think Gerd
> >> objected
> >> to using a migration *handler* but not necessarily a state
> >> notifier.
> >
> > Because if you have two chardevices, i.e.
> >  -chardev spicevmc,name=vdagent,id=spice1 -chardev
> >  spicevmc,name=vdagent,id=spice2
> >
> > Then the two on-the-wire vmstates will be identical.
> 
> I don't understand why this matters but I don't understand what the
> problem you're trying to solve is either so that's not surprising.

Perhaps I'm missing something, or it's a non problem. I'm waiting for Gerd to 
explain it better then me since he raised it.

I didn't mean identical, that was a mistake - I meant they would have 
identifiers that are only distinguished by the order the corresponding 
"-chardev" appeared on the command line. So if the target vm had the two 
chardevs (that are connected to different devices) reversed, it could load the 
wrong state to both.

> 
> Regards,
> 
> Anthony Liguori

Introducing fe_opened vmstate for replay of chr_guest_open for spice-qemu-char 
chardev (the second patch remains the same other then the renamed api to 
qemu_chr_be_is_fe_open): (this comes on top of a patch I omitted that renames 
CharDeviceState->opened to be_opened).

commit 35f3ce9dbaaa5059e8100c779eedbc0bf5bb98d2
Author: Alon Levy 
Date:   Thu Mar 21 17:24:00 2013 +0200

char: add qemu_chr_be_is_fe_open

This function returns the current open state of the guest, it tracks the
existing fe called functions qemu_chr_fe_open and qemu_chr_fe_close,
including adding vmstate to track this across migration.

Signed-off-by: Alon Levy 

diff --git a/include/char/char.h b/include/char/char.h
index 26bc174..fb893a8 100644
--- a/include/char/char.h
+++ b/include/char/char.h
@@ -52,6 +52,7 @@ typedef struct {
 #define CHR_TIOCM_RTS  0x004
 
 typedef void IOEventHandler(void *opaque, int event);
+typedef bool IOIsGuestConnectedHandler(void *opaque);
 
 struct Ch

Re: [Qemu-devel] [RFC ppc-next PATCH 6/6] kvm/openpic: in-kernel mpic support

2013-03-21 Thread Alexander Graf


Am 21.03.2013 um 21:50 schrieb Scott Wood :

> On 03/21/2013 03:41:19 AM, Alexander Graf wrote:
>> On 14.02.2013, at 07:32, Scott Wood wrote:
>> > +DeviceState *kvm_openpic_create(BusState *bus, int model)
>> > +{
>> > +KVMState *s = kvm_state;
>> > +DeviceState *dev;
>> > +struct kvm_create_device cd = {0};
>> > +int ret;
>> > +
>> > +if (!kvm_check_extension(s, KVM_CAP_DEVICE_CTRL)) {
>> > +return NULL;
>> > +}
>> > +
>> > +switch (model) {
>> > +case OPENPIC_MODEL_FSL_MPIC_20:
>> > +cd.type = KVM_DEV_TYPE_FSL_MPIC_20;
>> > +break;
>> > +
>> > +case OPENPIC_MODEL_FSL_MPIC_42:
>> > +cd.type = KVM_DEV_TYPE_FSL_MPIC_42;
>> > +break;
>> > +
>> > +default:
>> > +qemu_log_mask(LOG_UNIMP, "%s: unknown openpic model %d\n",
>> > +  __func__, model);
>> > +return NULL;
>> > +}
>> > +
>> > +ret = kvm_vm_ioctl(s, KVM_CREATE_DEVICE, &cd);
>> > +if (ret < 0) {
>> > +fprintf(stderr, "%s: can't create device %d: %s\n", __func__, 
>> > cd.type,
>> > +strerror(errno));
>> > +return NULL;
>> > +}
>> Can't all the stuff above here just simply go into the qdev init function?
> 
> Not if you want platform code to be able to fall back to a QEMU mpic if an 
> in-kernel mpic is unavailable.

Do we want that? We used to have a default like that in qemu-kvm back in the 
day. That was very confusing, as people started to report that their VMs turned 
out to be really slow.

I think we should not have fallback code. It makes things easier and more 
obvious. The default should just depend on the host's capabilities.

> 
>> > /* MPIC */
>> > mpic = g_new(qemu_irq, 256);
>> > -dev = qdev_create(NULL, "openpic");
>> > -qdev_prop_set_uint32(dev, "nb_cpus", smp_cpus);
>> > -qdev_prop_set_uint32(dev, "model", params->mpic_version);
>> > +
>> > +if (kvm_irqchip_wanted()) {
>> > +dev = kvm_openpic_create(NULL, params->mpic_version);
>> This really should be just a
>>dev = qdev_create(NULL, kvm_irqchip_wanted() ? "kvm-openpic" : "openpic");
>> The logic whether an in-kernel irqchip is available belongs into the default 
>> setting of kvm_irqchip_wanted.
> 
> That is exactly what I was trying to avoid by introducing kvm_irqchip_wanted. 
>  We're no longer testing some vague generic irqchip capability, but the 
> presence of a specific type of device (and version thereof).  How would the 
> code that sets kvm_irqchip_wanted know what to test for?

Then move the default code into the board file and check for the in-kernel mpic 
cap.

> 
>> If the host kvm version can't handle an in-kernel MPIC, it should simply 
>> default to false. If it supports one, it defaults to true.
> 
> OK, I misread the existing code and thought that the in-kernel irqchip would 
> never be used unless explicitly requested.
> 
>> Whenever the user defines something explicitly with -machine, that wins.
> 
> Then we'd need kvm_irqchip_wanted to be a tristate -- on, off, or 
> unspecified.  At that point it might be better to drop it entirely and just 
> open-code the option check.

Yup :)

Alex

> 
> -Scott



Re: [Qemu-devel] [RFC ppc-next PATCH 6/6] kvm/openpic: in-kernel mpic support

2013-03-21 Thread Scott Wood

On 03/21/2013 03:41:19 AM, Alexander Graf wrote:


On 14.02.2013, at 07:32, Scott Wood wrote:

> +DeviceState *kvm_openpic_create(BusState *bus, int model)
> +{
> +KVMState *s = kvm_state;
> +DeviceState *dev;
> +struct kvm_create_device cd = {0};
> +int ret;
> +
> +if (!kvm_check_extension(s, KVM_CAP_DEVICE_CTRL)) {
> +return NULL;
> +}
> +
> +switch (model) {
> +case OPENPIC_MODEL_FSL_MPIC_20:
> +cd.type = KVM_DEV_TYPE_FSL_MPIC_20;
> +break;
> +
> +case OPENPIC_MODEL_FSL_MPIC_42:
> +cd.type = KVM_DEV_TYPE_FSL_MPIC_42;
> +break;
> +
> +default:
> +qemu_log_mask(LOG_UNIMP, "%s: unknown openpic model %d\n",
> +  __func__, model);
> +return NULL;
> +}
> +
> +ret = kvm_vm_ioctl(s, KVM_CREATE_DEVICE, &cd);
> +if (ret < 0) {
> +fprintf(stderr, "%s: can't create device %d: %s\n",  
__func__, cd.type,

> +strerror(errno));
> +return NULL;
> +}

Can't all the stuff above here just simply go into the qdev init  
function?


Not if you want platform code to be able to fall back to a QEMU mpic if  
an in-kernel mpic is unavailable.



> /* MPIC */
> mpic = g_new(qemu_irq, 256);
> -dev = qdev_create(NULL, "openpic");
> -qdev_prop_set_uint32(dev, "nb_cpus", smp_cpus);
> -qdev_prop_set_uint32(dev, "model", params->mpic_version);
> +
> +if (kvm_irqchip_wanted()) {
> +dev = kvm_openpic_create(NULL, params->mpic_version);

This really should be just a

dev = qdev_create(NULL, kvm_irqchip_wanted() ? "kvm-openpic" :  
"openpic");


The logic whether an in-kernel irqchip is available belongs into the  
default setting of kvm_irqchip_wanted.


That is exactly what I was trying to avoid by introducing  
kvm_irqchip_wanted.  We're no longer testing some vague generic irqchip  
capability, but the presence of a specific type of device (and version  
thereof).  How would the code that sets kvm_irqchip_wanted know what to  
test for?


If the host kvm version can't handle an in-kernel MPIC, it should  
simply default to false. If it supports one, it defaults to true.


OK, I misread the existing code and thought that the in-kernel irqchip  
would never be used unless explicitly requested.


Whenever the user defines something explicitly with -machine, that  
wins.


Then we'd need kvm_irqchip_wanted to be a tristate -- on, off, or  
unspecified.  At that point it might be better to drop it entirely and  
just open-code the option check.


-Scott



Re: [Qemu-devel] [PATCH 4/4] qjson: to_json() case QTYPE_QSTRING is buggy, rewrite

2013-03-21 Thread Laszlo Ersek
On 03/14/13 18:49, Markus Armbruster wrote:
> Known bugs in to_json():

> My rewrite fixes them as follows:

I'll try to review this sometime later. Patch review doesn't scale *at
all*. I've spent hours on the first 3 patches. You should just be given
pull req rights. I'd need 36 hour days.

/me throws his hands up

Laszlo



Re: [Qemu-devel] [PATCH] block: Add support for Secure Shell (ssh) block device.

2013-03-21 Thread Richard W.M. Jones
On Thu, Mar 21, 2013 at 08:35:29PM +0100, Stefan Hajnoczi wrote:
> On Thu, Mar 21, 2013 at 2:38 PM, Richard W.M. Jones  wrote:
> > From: "Richard W.M. Jones" 
> >
> >   qemu-system-x86_64 -drive file=ssh://hostname/some/image
> >
> > QEMU will ssh into 'hostname' and open '/some/image' which is made
> > available as a standard block device.
> >
> > You can specify a username (ssh://user@host/...) and/or a port number
> > (ssh://host:port/...).
> >
> > Current limitations:
> >
> > - Authentication must be done without passwords or passphrases, using
> >   ssh-agent.  Other authentication methods are not supported. (*)
> >
> > - Does not check host key. (*)
> >
> > - New remote files cannot be created. (*)
> >
> > - Uses coroutine read/write, instead of true AIO.  (libssh2 supports
> >   non-blocking access, so this could be fixed with some effort).
> >
> > - Blocks during connection and authentication.
> >
> > (*) = potentially easy fix
> >
> > This is implemented using libssh2 on the client side.  The server just
> > requires a regular ssh daemon with sftp-server support.  Most ssh
> > daemons on Unix/Linux systems will work out of the box.
> > ---
> >  block/Makefile.objs |   1 +
> >  block/ssh.c | 514 
> > 
> >  configure   |  47 +
> >  qemu-doc.texi   |  28 +++
> >  4 files changed, 590 insertions(+)
> >  create mode 100644 block/ssh.c
> 
> Just noticed that libcurl supports sftp.
> 
> Did you try enabling sftp support in block/curl.c?  I think you just
> need to add CURLPROTO_SFTP to #define PROTOCOLS.

Interestingly curl's sftp support is implemented using libssh2.  I'll
take a look at how easy this will be.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW



Re: [Qemu-devel] [PATCH 3/4] check-qjson: Test noncharacters other than U+FFFE, U+FFFF in strings

2013-03-21 Thread Laszlo Ersek
On 03/14/13 18:49, Markus Armbruster wrote:
> These are all broken, too.

What are "these"? And how are they broken? And how does the patch fix them?

> 
> A few test cases use noncharacters U+ and U+10.  Risks testing
> noncharacters some more instead of what they're supposed to test.  Use
> U+FFFD and U+10FFFD instead.
> 
> Signed-off-by: Markus Armbruster 
> ---
>  tests/check-qjson.c | 85 
> +
>  1 file changed, 72 insertions(+), 13 deletions(-)

I'm confused about the commit message. There are three paragraphs in it
(the title, the first paragraph, and the 2nd paragraph). This patch
modifies different tests:

> diff --git a/tests/check-qjson.c b/tests/check-qjson.c
> index 852124a..efec1b2 100644
> --- a/tests/check-qjson.c
> +++ b/tests/check-qjson.c
> @@ -158,7 +158,7 @@ static void utf8_string(void)
>   * consider using overlong encoding \xC0\x80 for U+ ("modified
>   * UTF-8").
>   *
> - * Test cases are scraped from Markus Kuhn's UTF-8 decoder
> + * Most test cases are scraped from Markus Kuhn's UTF-8 decoder
>   * capability and stress test at
>   * http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
>   */
> @@ -256,11 +256,11 @@ static void utf8_string(void)
>  "\xDF\xBF",
>  "\"\\u07FF\"",
>  },
> -/* 2.2.3  3 bytes U+ */
> +/* 2.2.3  3 bytes U+FFFD */
>  {
> -"\"\xEF\xBF\xBF\"",
> -"\xEF\xBF\xBF",
> -"\"\\u\"",
> +"\"\xEF\xBF\xBD\"",
> +"\xEF\xBF\xBD",
> +"\"\\uFFFD\"",
>  },

This is under "2.2  Last possible sequence of a certain length". I guess
this is where you say "last possible sequence of a certain length,
encoding a character (= non-noncharacter)". OK, p#2.


>  /* 2.2.4  4 bytes U+1F */
>  {
> @@ -303,10 +303,10 @@ static void utf8_string(void)
>  "\"\\uFFFD\"",
>  },
>  {
> -/* U+10 */
> -"\"\xF4\x8F\xBF\xBF\"",
> -"\xF4\x8F\xBF\xBF",
> -"\"\\u43FF\\u\"", /* bug: want "\"\\uDBFF\\uDFFF\"" */
> +/* U+10FFFD */
> +"\"\xF4\x8F\xBF\xBD\"",
> +"\xF4\x8F\xBF\xBD",
> +"\"\\u43FF\\u\"", /* bug: want "\"\\uDBFF\\uDFFD\"" */
>  },
>  {
>  /* U+11 */

Under "2.3  Other boundary conditions". Not a non-character any longer,
but also not a boundary condition. At least not the original one. Still
covered by the ...FFFD part of the commit message, p#2.


> @@ -584,9 +584,9 @@ static void utf8_string(void)
>  "\"\\u07FF\"",
>  },
>  {
> -/* \U+ */
> -"\"\xF0\x8F\xBF\xBF\"",
> -"\xF0\x8F\xBF\xBF",   /* bug: not corrected */
> +/* \U+FFFD */
> +"\"\xF0\x8F\xBF\xBD\"",
> +"\xF0\x8F\xBF\xBD",   /* bug: not corrected */
>  "\"\\u03FF\\u\"", /* bug: want "\"\\u\"" */
>  },
>  {

Under "4.2  Maximum overlong sequences". What does that even mean? "In
some sense maximum codepoints, all represented as overlong sequences"? P#2.

> @@ -731,6 +731,7 @@ static void utf8_string(void)
>  "\"\\uDBFF\\uDFFF\"", /* bug: want "\"\\u\\u\"" */
>  },
>  /* 5.3  Other illegal code positions */
> +/* BMP noncharacters */
>  {
>  /* \U+FFFE */
>  "\"\xEF\xBF\xBE\"",
> @@ -741,7 +742,65 @@ static void utf8_string(void)
>  /* \U+ */
>  "\"\xEF\xBF\xBF\"",
>  "\xEF\xBF\xBF", /* bug: not corrected */
> -"\"\\u\"",  /* bug: not corrected */
> +"\"\\u\"",
> +},
> +{
> +/* U+FDD0 */
> +"\"\xEF\xB7\x90\"",
> +"\xEF\xB7\x90", /* bug: not corrected */
> +"\"\\uFDD0\"",  /* bug: not corrected */
> +},
> +{
> +/* U+FDEF */
> +"\"\xEF\xB7\xAF\"",
> +"\xEF\xB7\xAF", /* bug: not corrected */
> +"\"\\uFDEF\"",  /* bug: not corrected */
> +},
> +/* Plane 1 .. 16 noncharacters */
> +{
> +/* U+1FFFE U+1 U+2FFFE U+2 ... U+10FFFE U+10 */
> +"\"\xF0\x9F\xBF\xBE\xF0\x9F\xBF\xBF"
> +"\xF0\xAF\xBF\xBE\xF0\xAF\xBF\xBF"
> +"\xF0\xBF\xBF\xBE\xF0\xBF\xBF\xBF"
> +"\xF1\x8F\xBF\xBE\xF1\x8F\xBF\xBF"
> +"\xF1\x9F\xBF\xBE\xF1\x9F\xBF\xBF"
> +"\xF1\xAF\xBF\xBE\xF1\xAF\xBF\xBF"
> +"\xF1\xBF\xBF\xBE\xF1\xBF\xBF\xBF"
> +"\xF2\x8F\xBF\xBE\xF2\x8F\xBF\xBF"
> +"\xF2\x9F\xBF\xBE\xF2\x9F\xBF\xBF"
> +"\xF2\xAF\xBF\xBE\xF2\xAF\xBF\xBF"
> +"\xF2\xBF\xBF\xBE\xF2\xBF\xBF\xBF"
> +"\xF3\x8F\xBF\xBE\xF3\x8F\xBF\xBF"
> +   

[Qemu-devel] [PATCH 4/5] target-ppc: Use NARROW_MODE macro for addresses

2013-03-21 Thread Richard Henderson
Removing conditional compilation in the process.

Signed-off-by: Richard Henderson 
---
 target-ppc/translate.c | 51 ++
 1 file changed, 18 insertions(+), 33 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index e2d657d..2bb28b8 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -2332,45 +2332,37 @@ static inline void gen_addr_imm_index(DisasContext 
*ctx, TCGv EA,
 
 simm &= ~maskl;
 if (rA(ctx->opcode) == 0) {
-#if defined(TARGET_PPC64)
-if (!ctx->sf_mode) {
-tcg_gen_movi_tl(EA, (uint32_t)simm);
-} else
-#endif
+if (NARROW_MODE(ctx)) {
+simm = (uint32_t)simm;
+}
 tcg_gen_movi_tl(EA, simm);
 } else if (likely(simm != 0)) {
 tcg_gen_addi_tl(EA, cpu_gpr[rA(ctx->opcode)], simm);
-#if defined(TARGET_PPC64)
-if (!ctx->sf_mode) {
+if (NARROW_MODE(ctx)) {
 tcg_gen_ext32u_tl(EA, EA);
 }
-#endif
 } else {
-#if defined(TARGET_PPC64)
-if (!ctx->sf_mode) {
+if (NARROW_MODE(ctx)) {
 tcg_gen_ext32u_tl(EA, cpu_gpr[rA(ctx->opcode)]);
-} else
-#endif
-tcg_gen_mov_tl(EA, cpu_gpr[rA(ctx->opcode)]);
+} else {
+tcg_gen_mov_tl(EA, cpu_gpr[rA(ctx->opcode)]);
+}
 }
 }
 
 static inline void gen_addr_reg_index(DisasContext *ctx, TCGv EA)
 {
 if (rA(ctx->opcode) == 0) {
-#if defined(TARGET_PPC64)
-if (!ctx->sf_mode) {
+if (NARROW_MODE(ctx)) {
 tcg_gen_ext32u_tl(EA, cpu_gpr[rB(ctx->opcode)]);
-} else
-#endif
-tcg_gen_mov_tl(EA, cpu_gpr[rB(ctx->opcode)]);
+} else {
+tcg_gen_mov_tl(EA, cpu_gpr[rB(ctx->opcode)]);
+}
 } else {
 tcg_gen_add_tl(EA, cpu_gpr[rA(ctx->opcode)], cpu_gpr[rB(ctx->opcode)]);
-#if defined(TARGET_PPC64)
-if (!ctx->sf_mode) {
+if (NARROW_MODE(ctx)) {
 tcg_gen_ext32u_tl(EA, EA);
 }
-#endif
 }
 }
 
@@ -2378,13 +2370,10 @@ static inline void gen_addr_register(DisasContext *ctx, 
TCGv EA)
 {
 if (rA(ctx->opcode) == 0) {
 tcg_gen_movi_tl(EA, 0);
+} else if (NARROW_MODE(ctx)) {
+tcg_gen_ext32u_tl(EA, cpu_gpr[rA(ctx->opcode)]);
 } else {
-#if defined(TARGET_PPC64)
-if (!ctx->sf_mode) {
-tcg_gen_ext32u_tl(EA, cpu_gpr[rA(ctx->opcode)]);
-} else
-#endif
-tcg_gen_mov_tl(EA, cpu_gpr[rA(ctx->opcode)]);
+tcg_gen_mov_tl(EA, cpu_gpr[rA(ctx->opcode)]);
 }
 }
 
@@ -2392,11 +2381,9 @@ static inline void gen_addr_add(DisasContext *ctx, TCGv 
ret, TCGv arg1,
 target_long val)
 {
 tcg_gen_addi_tl(ret, arg1, val);
-#if defined(TARGET_PPC64)
-if (!ctx->sf_mode) {
+if (NARROW_MODE(ctx)) {
 tcg_gen_ext32u_tl(ret, ret);
 }
-#endif
 }
 
 static inline void gen_check_align(DisasContext *ctx, TCGv EA, int mask)
@@ -7586,11 +7573,9 @@ static inline void gen_addr_spe_imm_index(DisasContext 
*ctx, TCGv EA, int sh)
 tcg_gen_movi_tl(EA, uimm << sh);
 } else {
 tcg_gen_addi_tl(EA, cpu_gpr[rA(ctx->opcode)], uimm << sh);
-#if defined(TARGET_PPC64)
-if (!ctx->sf_mode) {
+if (NARROW_MODE(ctx)) {
 tcg_gen_ext32u_tl(EA, EA);
 }
-#endif
 }
 }
 
-- 
1.8.1.4




Re: [Qemu-devel] [PATCH 2/4] check-qjson: Fix up a few bogus comments

2013-03-21 Thread Laszlo Ersek
I don't understand what's going on here.

On 03/14/13 18:49, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster 
> ---
>  tests/check-qjson.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/check-qjson.c b/tests/check-qjson.c
> index ec85a0c..852124a 100644
> --- a/tests/check-qjson.c
> +++ b/tests/check-qjson.c
> @@ -4,7 +4,7 @@
>   *
>   * Authors:
>   *  Anthony Liguori   
> - *  Markus Armbruster ,
> + *  Markus Armbruster 
>   *
>   * This work is licensed under the terms of the GNU LGPL, version 2.1 or 
> later.
>   * See the COPYING.LIB file in the top-level directory.
> @@ -462,8 +462,7 @@ static void utf8_string(void)
>  },
>  /* 3.3.4  5-byte sequence with last byte missing (U+) */
>  {
> -/* invalid */
> -"\"\xF8\x80\x80\x80\"", /* bug: not corrected */
> +"\"\xF8\x80\x80\x80\"",

In this test case, we use an invalid UTF-8 sequence in a JSON string
literal (json_in). So "/* invalid */" could be justified; perhaps it's
just too laconic.

The "/* bug: not corrected */" comment seems indeed wrong, "json_in" is
*input*, there's nothing to correct on it.

>  NULL,   /* bug: rejected */

When the JSON parser rejects the invalid sequence, it's actually
correct. So why the "bug" comment? Are we expecting (according to the
comment in utf8_string()) U+FFFD?


>  "\"\\u8000\\u\"",   /* bug: want "\"\\u\"" */
>  "\xF8\x80\x80\x80",

In this test we're trying to format a UTF-8 byte sequence (utf8_in) as a
JSON string. The source is invalid. The JSON formatter should either
reject it, or emit an U+FFFD in its place. The actual JSON output is
probably wrong, hence the "bug" part is OK, but I thought what we
expected is not U+ but U+FFFD. Hence that part of the comment is
wrong. ... Hm, OK the leading comment has some notes on this as well.

So what your patch does here is:
- remove the halfway OK comment "/* invalid */" -- I think it wasn't
really wrong, but I won't miss it,
- removes an in fact bogus comment,
- (removes a runaway comma).

My eyes are bleeding.

Reviewed-by: Laszlo Ersek 




[Qemu-devel] [PATCH 5/5] target-ppc: Use NARROW_MODE macro for tlbie

2013-03-21 Thread Richard Henderson
Removing conditional compilation in the process.

Signed-off-by: Richard Henderson 
---
 target-ppc/translate.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 2bb28b8..af936cd 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -4320,15 +4320,14 @@ static void gen_tlbie(DisasContext *ctx)
 gen_inval_exception(ctx, POWERPC_EXCP_PRIV_OPC);
 return;
 }
-#if defined(TARGET_PPC64)
-if (!ctx->sf_mode) {
+if (NARROW_MODE(ctx)) {
 TCGv t0 = tcg_temp_new();
 tcg_gen_ext32u_tl(t0, cpu_gpr[rB(ctx->opcode)]);
 gen_helper_tlbie(cpu_env, t0);
 tcg_temp_free(t0);
-} else
-#endif
+} else {
 gen_helper_tlbie(cpu_env, cpu_gpr[rB(ctx->opcode)]);
+}
 #endif
 }
 
-- 
1.8.1.4




[Qemu-devel] [PATCH 1/5] target-ppc: Fix add and subf carry generation in narrow mode

2013-03-21 Thread Richard Henderson
The set of computations used in b5a73f8d8a57e940f9bbeb399a9e47897522ee9a
are only valid if the current word size == target_long size.  This failed
to take ppc64 in 32-bit (narrow) mode into account.

Add a NARROW_MODE macro to avoid conditional compilation.

Signed-off-by: Richard Henderson 
---
 target-ppc/translate.c | 64 +-
 1 file changed, 48 insertions(+), 16 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 380a884..ed6415c 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -204,6 +204,13 @@ typedef struct DisasContext {
 int singlestep_enabled;
 } DisasContext;
 
+/* True when active word size < size of target_long.  */
+#ifdef TARGET_PPC64
+# define NARROW_MODE(C)  (!(C)->sf_mode)
+#else
+# define NARROW_MODE(C)  0
+#endif
+
 struct opc_handler_t {
 /* invalid bits for instruction 1 (Rc(opcode) == 0) */
 uint32_t inval1;
@@ -778,14 +785,26 @@ static inline void gen_op_arith_add(DisasContext *ctx, 
TCGv ret, TCGv arg1,
 }
 
 if (compute_ca) {
-TCGv zero = tcg_const_tl(0);
-if (add_ca) {
-tcg_gen_add2_tl(t0, cpu_ca, arg1, zero, cpu_ca, zero);
-tcg_gen_add2_tl(t0, cpu_ca, t0, cpu_ca, arg2, zero);
+if (NARROW_MODE(ctx)) {
+TCGv t1 = tcg_temp_new();
+tcg_gen_ext32u_tl(t1, arg2);
+tcg_gen_ext32u_tl(t0, arg1);
+tcg_gen_add_tl(t0, t0, t1);
+tcg_temp_free(t1);
+if (add_ca) {
+tcg_gen_add_tl(t0, t0, cpu_ca);
+}
+tcg_gen_shri_tl(cpu_ca, t0, 32);
 } else {
-tcg_gen_add2_tl(t0, cpu_ca, arg1, zero, arg2, zero);
+TCGv zero = tcg_const_tl(0);
+if (add_ca) {
+tcg_gen_add2_tl(t0, cpu_ca, arg1, zero, cpu_ca, zero);
+tcg_gen_add2_tl(t0, cpu_ca, t0, cpu_ca, arg2, zero);
+} else {
+tcg_gen_add2_tl(t0, cpu_ca, arg1, zero, arg2, zero);
+}
+tcg_temp_free(zero);
 }
-tcg_temp_free(zero);
 } else {
 tcg_gen_add_tl(t0, arg1, arg2);
 if (add_ca) {
@@ -1114,14 +1133,25 @@ static inline void gen_op_arith_subf(DisasContext *ctx, 
TCGv ret, TCGv arg1,
 {
 TCGv t0 = ret;
 
-if (((add_ca && compute_ca) || compute_ov)
-&& (TCGV_EQUAL(ret, arg1) || TCGV_EQUAL(ret, arg2)))  {
+if (compute_ov && (TCGV_EQUAL(ret, arg1) || TCGV_EQUAL(ret, arg2)))  {
 t0 = tcg_temp_new();
 }
 
-if (add_ca) {
-/* dest = ~arg1 + arg2 + ca.  */
-if (compute_ca) {
+if (compute_ca) {
+/* dest = ~arg1 + arg2 [+ ca].  */
+if (NARROW_MODE(ctx)) {
+TCGv inv1 = tcg_temp_new();
+tcg_gen_not_tl(inv1, arg1);
+tcg_gen_ext32u_tl(t0, arg2);
+tcg_gen_ext32u_tl(inv1, inv1);
+if (add_ca) {
+tcg_gen_add_tl(t0, t0, cpu_ca);
+} else {
+tcg_gen_addi_tl(t0, t0, 1);
+}
+tcg_gen_add_tl(t0, t0, inv1);
+tcg_gen_shri_tl(cpu_ca, t0, 32);
+} else if (add_ca) {
 TCGv zero, inv1 = tcg_temp_new();
 tcg_gen_not_tl(inv1, arg1);
 zero = tcg_const_tl(0);
@@ -1130,14 +1160,16 @@ static inline void gen_op_arith_subf(DisasContext *ctx, 
TCGv ret, TCGv arg1,
 tcg_temp_free(zero);
 tcg_temp_free(inv1);
 } else {
+tcg_gen_setcond_tl(TCG_COND_GEU, cpu_ca, arg2, arg1);
 tcg_gen_sub_tl(t0, arg2, arg1);
-tcg_gen_add_tl(t0, t0, cpu_ca);
-tcg_gen_subi_tl(t0, t0, 1);
 }
+} else if (add_ca) {
+/* Since we're ignoring carry-out, we can simplify the
+   standard ~arg1 + arg2 + ca to arg2 - arg1 + ca - 1.  */
+tcg_gen_sub_tl(t0, arg2, arg1);
+tcg_gen_add_tl(t0, t0, cpu_ca);
+tcg_gen_subi_tl(t0, t0, 1);
 } else {
-if (compute_ca) {
-tcg_gen_setcond_tl(TCG_COND_GEU, cpu_ca, arg2, arg1);
-}
 tcg_gen_sub_tl(t0, arg2, arg1);
 }
 
-- 
1.8.1.4




[Qemu-devel] [PATCH 3/5] target-ppc: Use NARROW_MODE macro for comparisons

2013-03-21 Thread Richard Henderson
Removing conditional compilation in the process.

Signed-off-by: Richard Henderson 
---
 target-ppc/translate.c | 41 -
 1 file changed, 16 insertions(+), 25 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 92b4f6c..e2d657d 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -632,7 +632,6 @@ static inline void gen_op_cmpi(TCGv arg0, target_ulong 
arg1, int s, int crf)
 tcg_temp_free(t0);
 }
 
-#if defined(TARGET_PPC64)
 static inline void gen_op_cmp32(TCGv arg0, TCGv arg1, int s, int crf)
 {
 TCGv t0, t1;
@@ -656,68 +655,62 @@ static inline void gen_op_cmpi32(TCGv arg0, target_ulong 
arg1, int s, int crf)
 gen_op_cmp32(arg0, t0, s, crf);
 tcg_temp_free(t0);
 }
-#endif
 
 static inline void gen_set_Rc0(DisasContext *ctx, TCGv reg)
 {
-#if defined(TARGET_PPC64)
-if (!(ctx->sf_mode))
+if (NARROW_MODE(ctx)) {
 gen_op_cmpi32(reg, 0, 1, 0);
-else
-#endif
+} else {
 gen_op_cmpi(reg, 0, 1, 0);
+}
 }
 
 /* cmp */
 static void gen_cmp(DisasContext *ctx)
 {
-#if defined(TARGET_PPC64)
-if (!(ctx->sf_mode && (ctx->opcode & 0x0020)))
+if (NARROW_MODE(ctx) || !(ctx->opcode & 0x0020)) {
 gen_op_cmp32(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rB(ctx->opcode)],
  1, crfD(ctx->opcode));
-else
-#endif
+} else {
 gen_op_cmp(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rB(ctx->opcode)],
1, crfD(ctx->opcode));
+}
 }
 
 /* cmpi */
 static void gen_cmpi(DisasContext *ctx)
 {
-#if defined(TARGET_PPC64)
-if (!(ctx->sf_mode && (ctx->opcode & 0x0020)))
+if (NARROW_MODE(ctx) || !(ctx->opcode & 0x0020)) {
 gen_op_cmpi32(cpu_gpr[rA(ctx->opcode)], SIMM(ctx->opcode),
   1, crfD(ctx->opcode));
-else
-#endif
+} else {
 gen_op_cmpi(cpu_gpr[rA(ctx->opcode)], SIMM(ctx->opcode),
 1, crfD(ctx->opcode));
+}
 }
 
 /* cmpl */
 static void gen_cmpl(DisasContext *ctx)
 {
-#if defined(TARGET_PPC64)
-if (!(ctx->sf_mode && (ctx->opcode & 0x0020)))
+if (NARROW_MODE(ctx) || !(ctx->opcode & 0x0020)) {
 gen_op_cmp32(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rB(ctx->opcode)],
  0, crfD(ctx->opcode));
-else
-#endif
+} else {
 gen_op_cmp(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rB(ctx->opcode)],
0, crfD(ctx->opcode));
+}
 }
 
 /* cmpli */
 static void gen_cmpli(DisasContext *ctx)
 {
-#if defined(TARGET_PPC64)
-if (!(ctx->sf_mode && (ctx->opcode & 0x0020)))
+if (NARROW_MODE(ctx) || !(ctx->opcode & 0x0020)) {
 gen_op_cmpi32(cpu_gpr[rA(ctx->opcode)], UIMM(ctx->opcode),
   0, crfD(ctx->opcode));
-else
-#endif
+} else {
 gen_op_cmpi(cpu_gpr[rA(ctx->opcode)], UIMM(ctx->opcode),
 0, crfD(ctx->opcode));
+}
 }
 
 /* isel (PowerPC 2.03 specification) */
@@ -761,11 +754,9 @@ static inline void gen_op_arith_compute_ov(DisasContext 
*ctx, TCGv arg0,
 tcg_gen_andc_tl(cpu_ov, cpu_ov, t0);
 }
 tcg_temp_free(t0);
-#if defined(TARGET_PPC64)
-if (!ctx->sf_mode) {
+if (NARROW_MODE(ctx)) {
 tcg_gen_ext32s_tl(cpu_ov, cpu_ov);
 }
-#endif
 tcg_gen_shri_tl(cpu_ov, cpu_ov, TARGET_LONG_BITS - 1);
 tcg_gen_or_tl(cpu_so, cpu_so, cpu_ov);
 }
-- 
1.8.1.4




[Qemu-devel] [PATCH 2/5] target-ppc: Use NARROW_MODE macro for branches

2013-03-21 Thread Richard Henderson
Removing conditional compilation in the process.

Signed-off-by: Richard Henderson 
---
 target-ppc/translate.c | 62 +++---
 1 file changed, 24 insertions(+), 38 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index ed6415c..92b4f6c 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -267,12 +267,10 @@ static inline void gen_set_access_type(DisasContext *ctx, 
int access_type)
 
 static inline void gen_update_nip(DisasContext *ctx, target_ulong nip)
 {
-#if defined(TARGET_PPC64)
-if (ctx->sf_mode)
-tcg_gen_movi_tl(cpu_nip, nip);
-else
-#endif
-tcg_gen_movi_tl(cpu_nip, (uint32_t)nip);
+if (NARROW_MODE(ctx)) {
+nip = (uint32_t)nip;
+}
+tcg_gen_movi_tl(cpu_nip, nip);
 }
 
 static inline void gen_exception_err(DisasContext *ctx, uint32_t excp, 
uint32_t error)
@@ -3352,10 +3350,9 @@ static inline void gen_goto_tb(DisasContext *ctx, int n, 
target_ulong dest)
 {
 TranslationBlock *tb;
 tb = ctx->tb;
-#if defined(TARGET_PPC64)
-if (!ctx->sf_mode)
+if (NARROW_MODE(ctx)) {
 dest = (uint32_t) dest;
-#endif
+}
 if ((tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK) &&
 likely(!ctx->singlestep_enabled)) {
 tcg_gen_goto_tb(n);
@@ -3383,12 +3380,10 @@ static inline void gen_goto_tb(DisasContext *ctx, int 
n, target_ulong dest)
 
 static inline void gen_setlr(DisasContext *ctx, target_ulong nip)
 {
-#if defined(TARGET_PPC64)
-if (ctx->sf_mode == 0)
-tcg_gen_movi_tl(cpu_lr, (uint32_t)nip);
-else
-#endif
-tcg_gen_movi_tl(cpu_lr, nip);
+if (NARROW_MODE(ctx)) {
+nip = (uint32_t)nip;
+}
+tcg_gen_movi_tl(cpu_lr, nip);
 }
 
 /* b ba bl bla */
@@ -3398,18 +3393,16 @@ static void gen_b(DisasContext *ctx)
 
 ctx->exception = POWERPC_EXCP_BRANCH;
 /* sign extend LI */
-#if defined(TARGET_PPC64)
-if (ctx->sf_mode)
-li = ((int64_t)LI(ctx->opcode) << 38) >> 38;
-else
-#endif
-li = ((int32_t)LI(ctx->opcode) << 6) >> 6;
-if (likely(AA(ctx->opcode) == 0))
+li = LI(ctx->opcode);
+li = (li ^ 0x0200) - 0x0200;
+if (likely(AA(ctx->opcode) == 0)) {
 target = ctx->nip + li - 4;
-else
+} else {
 target = li;
-if (LK(ctx->opcode))
+}
+if (LK(ctx->opcode)) {
 gen_setlr(ctx, ctx->nip);
+}
 gen_update_cfar(ctx, ctx->nip);
 gen_goto_tb(ctx, 0, target);
 }
@@ -3445,12 +3438,11 @@ static inline void gen_bcond(DisasContext *ctx, int 
type)
 return;
 }
 tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1);
-#if defined(TARGET_PPC64)
-if (!ctx->sf_mode)
+if (NARROW_MODE(ctx)) {
 tcg_gen_ext32u_tl(temp, cpu_ctr);
-else
-#endif
+} else {
 tcg_gen_mov_tl(temp, cpu_ctr);
+}
 if (bo & 0x2) {
 tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1);
 } else {
@@ -3484,20 +3476,14 @@ static inline void gen_bcond(DisasContext *ctx, int 
type)
 gen_set_label(l1);
 gen_goto_tb(ctx, 1, ctx->nip);
 } else {
-#if defined(TARGET_PPC64)
-if (!(ctx->sf_mode))
+if (NARROW_MODE(ctx)) {
 tcg_gen_andi_tl(cpu_nip, target, (uint32_t)~3);
-else
-#endif
+} else {
 tcg_gen_andi_tl(cpu_nip, target, ~3);
+}
 tcg_gen_exit_tb(0);
 gen_set_label(l1);
-#if defined(TARGET_PPC64)
-if (!(ctx->sf_mode))
-tcg_gen_movi_tl(cpu_nip, (uint32_t)ctx->nip);
-else
-#endif
-tcg_gen_movi_tl(cpu_nip, ctx->nip);
+gen_update_nip(ctx, ctx->nip);
 tcg_gen_exit_tb(0);
 }
 }
-- 
1.8.1.4




[Qemu-devel] [PATCH 0/5] target-ppc: fix add-with-carry in narrow mode

2013-03-21 Thread Richard Henderson
The first patch fixes the problem reported by agraf here:

  http://lists.gnu.org/archive/html/qemu-devel/2013-03/msg03747.html

The subsequent patches use the macro added in the first patch
to remove a bunch of conditional compilation.

With this (or even just the first patch), I can boot Alex's
test kernel all the way to the no rootfs panic.


r~


Richard Henderson (5):
  target-ppc: Fix add and subf carry generation in narrow mode
  target-ppc: Use NARROW_MODE macro for branches
  target-ppc: Use NARROW_MODE macro for comparisons
  target-ppc: Use NARROW_MODE macro for addresses
  target-ppc: Use NARROW_MODE macro for tlbie

 target-ppc/translate.c | 225 -
 1 file changed, 109 insertions(+), 116 deletions(-)

-- 
1.8.1.4




[Qemu-devel] Large guest boot hangs the host.

2013-03-21 Thread Chegu Vinod

Hello,

I have been noticing host hangs when trying to boot large guests 
(>=40Vcpus) with the current upstream qemu.


Host is running 3.8.2 kernel.
qemu is the latest one from qemu.git.

Example qemu command line listed below... this used to work with a 
slightly older qemu (about 1.5 weeks ago and on the same host with the 
3.8.2 kernel)  'am trying to determine the cause of the host 
hang...but wanted to check to see if anyone

else has seen it...

Thanks
Vinod





/usr/local/bin/qemu-system-x86_64 \
-enable-kvm \
-cpu host \
-smp sockets=8,cores=10,threads=1 \
-numa node,nodeid=0,cpus=0-9,mem=64g \
-numa node,nodeid=1,cpus=10-19,mem=64g \
-numa node,nodeid=2,cpus=20-29,mem=64g \
-numa node,nodeid=3,cpus=30-39,mem=64g \
-numa node,nodeid=4,cpus=40-49,mem=64g \
-numa node,nodeid=5,cpus=50-59,mem=64g \
-numa node,nodeid=6,cpus=60-69,mem=64g \
-numa node,nodeid=7,cpus=70-79,mem=64g \
-m 524288 \
-mem-path /dev/hugepages \
-name vm1 \
-chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm1.monitor,server,nowait \
-drive 
file=/dev/libvirt_lvm/vm1,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native 
\
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 
\

-monitor stdio \
-net nic,model=virtio,macaddr=52:54:00:71:01:01,netdev=nic-0 \
-netdev tap,id=nic-0,ifname=tap0,script=no,downscript=no,vhost=on \
-vnc :4





Re: [Qemu-devel] [PATCHv3 8/9] migration: do not search dirty pages in bulk stage

2013-03-21 Thread Peter Lieven

Am 21.03.2013 um 20:27 schrieb Eric Blake :

> On 03/21/2013 09:57 AM, Peter Lieven wrote:
>> avoid searching for dirty pages just increment the
>> page offset. all pages are dirty anyway.
>> 
>> Signed-off-by: Peter Lieven 
>> ---
>> arch_init.c |   10 --
>> 1 file changed, 8 insertions(+), 2 deletions(-)
> 
> Prior review still stands.

I changed the logic a little bit.

a) last_offset is initialized to 0 again.
b) for the case last_offset == 0 in bulk ram migration the search
is not skipped resulting in offset == 0 if the page is dirty (first call
to this function with last_offset==0) and offset==1 when its called
the second time (page is no longer dirty).
Afterwards offset is calculated as last_offset + 1.

Peter


> 
> -- 
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 




Re: [Qemu-devel] [PATCH] qemu-ga: use key-value store to avoid recycling fd handles after restart

2013-03-21 Thread Markus Armbruster
Luiz Capitulino  writes:

> On Thu, 21 Mar 2013 11:13:13 -0500
> mdroth  wrote:
>
>> > Looks like you guys have no *practical* problems to solve.  Congrats!
>> > Take a vacation!  Please report back no later than 275 years from now,
>> > to make sure this 64 bit fd counter overflow problem gets taken care of
>> > in time.  ;-P
>> > 
>> 
>> Haha, well, I didn't want to be that one lazy developer who brings about
>> the downfall of future human civilization... but if it's a really big
>> deal they'll probably send someone back from the future to let me know,
>> so maybe I'm jumping the gun a bit :)
>
> I *am* that guy, but I was afraid to tell :)
>
>> I just didn't want to introduce a new interface that relied on
>> interfaces that were planned for deprecation in the *long*-term, but i
>> think you're right, it's too much hassle for current users for too
>> little gain, and there's plenty of time to do it in the future so I'll
>> hold off on it for now.
>
> Let me clarify it: when I read the code I didn't realize fd_counter
> would never wrap. I think this discussion is settled now. However, I
> still think that having an assert there is good practice.
>
> I can post a patch myself.

Asserting the counter doesn't wrap makes sense.



Re: [Qemu-devel] [PATCHv3 7/9] migration: do not sent zero pages in bulk stage

2013-03-21 Thread Peter Lieven

Am 21.03.2013 um 20:26 schrieb Eric Blake :

> On 03/21/2013 09:57 AM, Peter Lieven wrote:
>> during bulk stage of ram migration if a page is a
>> zero page do not send it at all.
>> the memory at the destination reads as zero anyway.
>> 
>> even if there is an madvise with QEMU_MADV_DONTNEED
>> at the target upon receipt of a zero page I have observed
>> that the target starts swapping if the memory is overcommitted.
>> it seems that the pages are dropped asynchronously.
>> 
>> Signed-off-by: Peter Lieven 
>> ---
>> arch_init.c |   10 ++
>> 1 file changed, 6 insertions(+), 4 deletions(-)
> 
>> if (is_zero_page(p)) {
>> acct_info.dup_pages++;
>> -bytes_sent = save_block_hdr(f, block, offset, cont,
>> -RAM_SAVE_FLAG_COMPRESS);
>> -qemu_put_byte(f, *p);
>> -bytes_sent += 1;
>> +if (!ram_bulk_stage) {
>> +bytes_sent = save_block_hdr(f, block, offset, cont,
>> +RAM_SAVE_FLAG_COMPRESS);
>> +qemu_put_byte(f, 0);
>> +}
>> +bytes_sent++;
> 
> Logic is STILL wrong.  I pointed out in v2 that bytes_sent should not be
> incremented if you are not sending the page, so it needs to be inside
> the 'if (!ram_bulk_stage)'.

If its inside then bytes_sent will be -1 at the end if we skip a page. This 
would lead
to the raw page being sent. This way it is 0 what I think is correct.

> 
> Do we want to add a new migration statistic counter of how many zero
> pages we omitted sending during the bulk stage?

You mean sth like skipped zero pages?

I was also thinking of renaming dup_pages into zero_pages in the statistics,
but this could break someone relying on it so I left it as is.

Peter


> 
> -- 
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 




Re: [Qemu-devel] [PATCH v4 8/8] coalesce adjacent iovecs

2013-03-21 Thread Eric Blake
On 03/21/2013 12:34 PM, Orit Wasserman wrote:
> This way we send one big buffer instead of many small ones
> 
> Signed-off-by: Orit Wasserman 
> ---
>  savevm.c | 19 +++
>  1 file changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/savevm.c b/savevm.c
> index b024354..9e8ddee 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -622,6 +622,18 @@ int qemu_fclose(QEMUFile *f)
>  return ret;
>  }
>  
> +static void add_to_iovec(QEMUFile *f, const uint8_t *buf, int size)
> +{
> +/* check for adjoint buffer and colasce them */

s/adjoint/adjacent/
s/colasce/coalesce/

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v4 6/8] Add qemu_put_buffer_async

2013-03-21 Thread Eric Blake
On 03/21/2013 12:34 PM, Orit Wasserman wrote:
> This allow us to add a buffer to the iovec to send without copying it

s/allow/allows/

> into the static buffer, the buffer will be send later when qemu_fflush is 
> called.

s/send/sent/

> 
> Signed-off-by: Orit Wasserman 
> ---
>  include/migration/qemu-file.h |  5 +
>  savevm.c  | 36 +++-
>  2 files changed, 32 insertions(+), 9 deletions(-)
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH] block: Add support for Secure Shell (ssh) block device.

2013-03-21 Thread Stefan Hajnoczi
On Thu, Mar 21, 2013 at 2:38 PM, Richard W.M. Jones  wrote:
> From: "Richard W.M. Jones" 
>
>   qemu-system-x86_64 -drive file=ssh://hostname/some/image
>
> QEMU will ssh into 'hostname' and open '/some/image' which is made
> available as a standard block device.
>
> You can specify a username (ssh://user@host/...) and/or a port number
> (ssh://host:port/...).
>
> Current limitations:
>
> - Authentication must be done without passwords or passphrases, using
>   ssh-agent.  Other authentication methods are not supported. (*)
>
> - Does not check host key. (*)
>
> - New remote files cannot be created. (*)
>
> - Uses coroutine read/write, instead of true AIO.  (libssh2 supports
>   non-blocking access, so this could be fixed with some effort).
>
> - Blocks during connection and authentication.
>
> (*) = potentially easy fix
>
> This is implemented using libssh2 on the client side.  The server just
> requires a regular ssh daemon with sftp-server support.  Most ssh
> daemons on Unix/Linux systems will work out of the box.
> ---
>  block/Makefile.objs |   1 +
>  block/ssh.c | 514 
> 
>  configure   |  47 +
>  qemu-doc.texi   |  28 +++
>  4 files changed, 590 insertions(+)
>  create mode 100644 block/ssh.c

Just noticed that libcurl supports sftp.

Did you try enabling sftp support in block/curl.c?  I think you just
need to add CURLPROTO_SFTP to #define PROTOCOLS.

Stefan



Re: [Qemu-devel] [PATCH 1/4] unicode: New mod_utf8_codepoint()

2013-03-21 Thread Laszlo Ersek
I've (re-)read the UTF-8 article in wikipedia now... comments below

On 03/14/13 18:49, Markus Armbruster wrote:

> diff --git a/util/unicode.c b/util/unicode.c
> new file mode 100644
> index 000..954bcf7

> +/**
> + * mod_utf8_codepoint:
> + * @s: string encoded in modified UTF-8
> + * @n: maximum number of bytes to read from @s, if less than 6
> + * @end: set to end of sequence on return
> + *
> + * Convert the modified UTF-8 sequence at the start of @s.  Modified
> + * UTF-8 is exactly like UTF-8, except U+ is encoded as
> + * "\xC0\x80".
> + *
> + * If @s points to an impossible byte (0xFE or 0xFF) or a continuation
> + * byte, the sequence is invalid, and @end is set to @s + 1
> + *
> + * Else, the first byte determines how many continuation bytes are
> + * expected.  If there are fewer, the sequence is invalid, and @end is
> + * set to @s + 1 + actual number of continuation bytes.  Else, the
> + * sequence is well-formed, and @end is set to @s + 1 + expected
> + * number of continuation bytes.

The wording also covers (number of cont. bytes == 0), OK.

... One point that wasn't clear to me until I read the code too is that
"there are fewer" covers both "out of bytes" and "byte available, but
it's not a cont. byte". Subtle :)

The way "*end" is set ensures progress.

> + *
> + * A well-formed sequence is valid unless it encodes a code point
> + * outside the Unicode range U+..U+10, one of Unicode's 66
> + * noncharacters, a surrogate code point, or is overlong.  Except the
> + * overlong sequence "\xC0\x80" is valid.
> + *
> + * Conversion succeeds if and only if the sequence is valid.
> + *
> + * Returns: the Unicode code point on success, -1 on failure.
> + */
> +int mod_utf8_codepoint(const char *s, size_t n, char **end)

The type of "end" follows the strtol() prototype. I guess I'd prefer
"const char **end", and "unsigned char" all around actually. (The input
is binary garbage.) Anyway this is irrelevant. ("const char **end" would
be probably quite inconvenient for the caller,
.)

> +{
> +static int min_cp[5] = { 0x80, 0x800, 0x1, 0x20, 0x400 };
> +const unsigned char *p;
> +unsigned byte, mask, len, i;
> +int cp;
> +
> +if (n == 0) {
> +*end = (char *)s;
> +return 0;
> +}

This is a special case (we return the code point U+ after looking at
zero bytes); we can probably expect the caller to know about n==0.

> +
> +p = (const unsigned char *)s;
> +byte = *p++;

We have n>=1 bytes here, and pointing one past the last one (in case
n==1) is allowed.

> +if (byte < 0x80) {
> +cp = byte;  /* one byte sequence */

So, since this is modified UTF-8, the U+ code point is represented
with the overlong "\xC0\x80" sequence, and a bare '\0' can never be part
of the Modified UTF-8 byte stream. (According to wp, '\0' is used in C
and similar to zero-terminate the string, but I think that's outside the
scope of the wire format.)

If we find a '\0' here, we report it as code point U+ instead of
rejecting it. One could maybe argue that it's a violation of the
interface contract (namely, due to '\0' being at offset #0, the caller
should have passed in n==0), but I assume from the description that the
caller need not have any idea about the contents, knowing the size
should be enough.

IOW I'm not sure about the intended use of this function, but if the
caller is allowed to throw any binary data at it (with correctly
communicated size of course), then we can misreport U+ here. (*)

> +} else if (byte >= 0xFE) {
> +cp = -1;/* impossible bytes 0xFE, 0xFF */

OK, binary 111x is as first byte misformed.

> +} else if ((byte & 0x40) == 0) {
> +cp = -1;/* unexpected continuation byte */

We know here that byte >= 0x80, and continuation bytes look like
10xx, OK.

> +} else {
> +/* multi-byte sequence */

We have one of:

110x
1110
0xxx
10xx
110x

> +len = 0;
> +for (mask = 0x80; byte & mask; mask >>= 1) {
> +len++;
> +}
> +assert(len > 1 && len < 7);

OK, [2..6].

(Maybe use g_assert()? :))

> +cp = byte & (mask - 1);

OK, the only bit set in "mask" matches the leftmost clear bit in "byte".
(mask-1) selects the  bits in "byte".

> +for (i = 1; i < len; i++) {

Runs at least once, and as many times as we need continuation bytes. OK.

> +byte = i < n ? *p : 0;

"p" is valid to evaluate, "*p" may not be, so we check first. OK.

> +if ((byte & 0xC0) != 0x80) {
> +cp = -1;/* continuation byte missing */
> +goto out;
> +}

Right, if a byte is available, it must look 10xx (binary). If we're
out of bytes, we also take this branch. *end will be left one past the
actual cont. bytes.

> +p++;
> +cp <<= 6;
> +

Re: [Qemu-devel] [PATCH v4 2/8] Add socket_writev_buffer function

2013-03-21 Thread Eric Blake
On 03/21/2013 12:34 PM, Orit Wasserman wrote:
> Signed-off-by: Orit Wasserman 
> ---
>  savevm.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/savevm.c b/savevm.c
> index 35c8d1e..6608b6e 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -39,6 +39,7 @@
>  #include "qmp-commands.h"
>  #include "trace.h"
>  #include "qemu/bitops.h"
> +#include "qemu/iov.h"
>  
>  #define SELF_ANNOUNCE_ROUNDS 5
>  
> @@ -171,6 +172,19 @@ static void coroutine_fn yield_until_fd_readable(int fd)
>  qemu_coroutine_yield();
>  }
>  
> +static int socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt)

Returning int...

> +{
> +QEMUFileSocket *s = opaque;
> +ssize_t len;
> +ssize_t size = iov_size(iov, iovcnt);
> +
> +len = iov_send(s->fd, iov, iovcnt, 0, size);
> +if (len < size) {
> +len = -socket_error();
> +}
> +return len;

...but len is an ssize_t.  If we send an iov with 2 gigabytes of data,
this can wrap around to a negative int even though we send a positive
amount of data.  Why not make the callback be typed to return ssize_t
from the beginning (affects patch 1/8)?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCHv3 9/9] migration: use XBZRLE only after bulk stage

2013-03-21 Thread Eric Blake
On 03/21/2013 09:57 AM, Peter Lieven wrote:
> at the beginning of migration all pages are marked dirty and
> in the first round a bulk migration of all pages is performed.
> 
> currently all these pages are copied to the page cache regardless
> if there are frequently updated or not. this doesn't make sense

s/if there/of whether they/

> since most of these pages are never transferred again.
> 
> this patch changes the XBZRLE transfer to only be used after
> the bulk stage has been completed. that means a page is added
> to the page cache the second time it is transferred and XBZRLE
> can benefit from the third time of transfer.
> 
> since the page cache is likely smaller than the number of pages
> its also likely that in the second round the page is missing in the

s/its/it's/

> cache due to collisions in the bulk phase.
> 
> on the other hand a lot of unnecessary mallocs, memdups and frees
> are saved.
> 
...
> Signed-off-by: Peter Lieven 
> ---
>  arch_init.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Prior review still stands (touching up typos in a commit message
generally does not invalidate carrying forward a reviewed-by comment).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH] block: Add support for Secure Shell (ssh) block device.

2013-03-21 Thread Stefan Hajnoczi
On Thu, Mar 21, 2013 at 4:39 PM, Richard W.M. Jones  wrote:
> On Thu, Mar 21, 2013 at 04:26:23PM +0100, Stefan Hajnoczi wrote:
>> On Thu, Mar 21, 2013 at 01:38:58PM +, Richard W.M. Jones wrote:
>> > From: "Richard W.M. Jones" 
>> >
>> >   qemu-system-x86_64 -drive file=ssh://hostname/some/image
>> >
>> > QEMU will ssh into 'hostname' and open '/some/image' which is made
>> > available as a standard block device.
>> >
>> > You can specify a username (ssh://user@host/...) and/or a port number
>> > (ssh://host:port/...).
>>
>> I can see this being handy for qemu-img since it gives you the ability
>> to work with remote image files.
>>
>> > Current limitations:
>> >
>> > - Authentication must be done without passwords or passphrases, using
>> >   ssh-agent.  Other authentication methods are not supported. (*)
>> >
>> > - Does not check host key. (*)
>> >
>> > - New remote files cannot be created. (*)
>>
>> Would be important to fix these limitations.  Authentication methods to
>> make this more usable.  Host key check for security.  File creation for
>> qemu-img.
>
> I agree.
>
>> > - Uses coroutine read/write, instead of true AIO.  (libssh2 supports
>> >   non-blocking access, so this could be fixed with some effort).
>>
>> This patch does not really use coroutines - the SSH I/O is blocking!
>>
>> Coroutines must submit the SSH I/O and then yield so the QEMU event loop
>> can get on with other work.  When SSH I/O finishes the request's
>> coroutine is re-entered and the request gets completed.
>
> Hmm OK.  Is there any documentation at all on how coroutines are
> supposed to work?  Or AIO for that matter?  For example, do coroutines
> really replace all the read/write syscalls deep inside a library
> (libssh2) so that these calls context switch, or am I missing the
> point of how this works entirely?

Before we go into coroutines, take a look at AIO since it may fit
libssh2's non-blocking mode of operation better.

Doing AIO means implementing .bdrv_aio_readv()/.bdrv_aio_writev().
These functions cannot block so you need to use non-blocking libssh2
interfaces and let QEMU's event loop notify us of
readability/writability (see qemu_aio_set_fd_handler()).

Look at block/iscsi.c:iscsi_aio_readv() for an example of how to
interface with an asynchronous library.

Back to coroutines.  Coroutines are just a primitive, like threads,
that your code can use.  They aren't a framework for how to do block
I/O.

If you want an example, take a look at block/sheepdog.c:

static coroutine_fn void do_co_req(void *opaque)
{
int ret;
Coroutine *co;
[...]
co = qemu_coroutine_self();
qemu_aio_set_fd_handler(sockfd, NULL, restart_co_req, have_co_req, co);

ret = send_co_req(sockfd, hdr, data, wlen);

Using qemu_aio_set_fd_handler() we tell the event loop to call
restart_co_req() when the file descriptor becomes writable.

Then we call send_co_req() which yields if send(2) returns EAGAIN.

This is an example of setting up an event handler, invoking a
non-blocking syscall, and yielding on EAGAIN.  Here is what
restart_co_req() looks like:

static void restart_co_req(void *opaque)
{
Coroutine *co = opaque;

qemu_coroutine_enter(co, NULL);
}

If you want to make the code clean and reusable it's best to put the
event handler, non-blocking I/O, and yield/re-enter into a function
that can be reused.  That way each callers don't duplicate this
low-level code.

Hope this gives you an idea of how to use coroutines for block drivers.

>> > This is implemented using libssh2 on the client side.  The server just
>> > requires a regular ssh daemon with sftp-server support.  Most ssh
>> > daemons on Unix/Linux systems will work out of the box.
>>
>> How much of a win over sshfs is this?
>>
>> sshfs can be mounted by unprivileged users and QEMU accesses it like a
>> regular file.  So the sshfs approach already works today.
>
> Sure, but compared to having to install and set up sshfs and FUSE,
> using this is a lot simpler.  It's also potentially faster since it
> cuts out FUSE and context switching to the sshfs process.
>
> BTW I looked into the implementation of sshfs before starting this,
> and what it does is to run an 'ssh' client subprocess, then implements
> the sftp protocol by hand on top.  So using libssh2 cuts out *two*
> external processes (plus FUSE).

I see.  Benchmarks would be interesting once AIO or coroutines is implemented.

Stefan



Re: [Qemu-devel] [PATCHv3 8/9] migration: do not search dirty pages in bulk stage

2013-03-21 Thread Eric Blake
On 03/21/2013 09:57 AM, Peter Lieven wrote:
> avoid searching for dirty pages just increment the
> page offset. all pages are dirty anyway.
> 
> Signed-off-by: Peter Lieven 
> ---
>  arch_init.c |   10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)

Prior review still stands.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCHv3 6/9] migration: add an indicator for bulk state of ram migration

2013-03-21 Thread Eric Blake
On 03/21/2013 09:57 AM, Peter Lieven wrote:
> the first round of ram transfer is special since all pages
> are dirty and thus all memory pages are transferred to
> the target. this patch adds a boolean variable to track
> this stage.
> 
> Signed-off-by: Peter Lieven 
> ---
>  arch_init.c |3 +++
>  1 file changed, 3 insertions(+)

Prior review still stands.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCHv3 7/9] migration: do not sent zero pages in bulk stage

2013-03-21 Thread Eric Blake
On 03/21/2013 09:57 AM, Peter Lieven wrote:
> during bulk stage of ram migration if a page is a
> zero page do not send it at all.
> the memory at the destination reads as zero anyway.
> 
> even if there is an madvise with QEMU_MADV_DONTNEED
> at the target upon receipt of a zero page I have observed
> that the target starts swapping if the memory is overcommitted.
> it seems that the pages are dropped asynchronously.
> 
> Signed-off-by: Peter Lieven 
> ---
>  arch_init.c |   10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)

>  if (is_zero_page(p)) {
>  acct_info.dup_pages++;
> -bytes_sent = save_block_hdr(f, block, offset, cont,
> -RAM_SAVE_FLAG_COMPRESS);
> -qemu_put_byte(f, *p);
> -bytes_sent += 1;
> +if (!ram_bulk_stage) {
> +bytes_sent = save_block_hdr(f, block, offset, cont,
> +RAM_SAVE_FLAG_COMPRESS);
> +qemu_put_byte(f, 0);
> +}
> +bytes_sent++;

Logic is STILL wrong.  I pointed out in v2 that bytes_sent should not be
incremented if you are not sending the page, so it needs to be inside
the 'if (!ram_bulk_stage)'.

Do we want to add a new migration statistic counter of how many zero
pages we omitted sending during the bulk stage?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 1/2] char: add qemu_chr_be_is_fe_connected

2013-03-21 Thread Anthony Liguori
Alon Levy  writes:

>> Alon Levy  writes:
>> 
>> > Note that the handler is called chr_is_guest_connected and not
>> > chr_is_fe_connected, consistent with other members of
>> > CharDriverState.
>> 
>> Sorry, I don't get it.
>> 
>> There isn't a notion of "connected" for the front-ends in the char
>> layer.  The closest thing is whether add_handlers() have been called
>> or
>> not.
>
> It makes sense for virtio-console - it matches exactly the internal
> guest_connected port state.

I still don't understand why you need to know that detail in the
backend.  Hint: you should explain this in future commit messages/cover
letters.

>> I really dislike the idea of introduction a new concept to the char
>> layer in a half baked way.
>
> Is the fact there is only one user, virtio-console, the reason you
> call this half baked?

You are introducing a function:

qemu_chr_be_is_virtio_console_connnected()

And calling pretending like it's a generic character device interface.
It's not.

If spicevmc only works with virtio-console and has no hope of working
with anything else, don't use the character device layer!  It's trying
to fit a square peg into a round hole.

If it's a concept that makes sense for all character devices front ends,
then it should be a patch making it be a meaningful to all character
device front end implementations.

>> Why can't migration notifiers be used for this?  I think Gerd
>> objected
>> to using a migration *handler* but not necessarily a state notifier.
>
> Because if you have two chardevices, i.e.
>  -chardev spicevmc,name=vdagent,id=spice1 -chardev 
> spicevmc,name=vdagent,id=spice2
>
> Then the two on-the-wire vmstates will be identical.

I don't understand why this matters but I don't understand what the
problem you're trying to solve is either so that's not surprising.

Regards,

Anthony Liguori

>
>> 
>> Regards,
>> 
>> Anthony Liguori
>> 
>> >
>> > Signed-off-by: Alon Levy 
>> > ---
>> >  hw/virtio-console.c |  9 +
>> >  include/char/char.h | 11 +++
>> >  qemu-char.c |  9 +
>> >  3 files changed, 29 insertions(+)
>> >
>> > diff --git a/hw/virtio-console.c b/hw/virtio-console.c
>> > index e2d1c58..643e24e 100644
>> > --- a/hw/virtio-console.c
>> > +++ b/hw/virtio-console.c
>> > @@ -120,6 +120,13 @@ static void chr_event(void *opaque, int event)
>> >  }
>> >  }
>> >  
>> > +static bool chr_is_guest_connected(void *opaque)
>> > +{
>> > +VirtConsole *vcon = opaque;
>> > +
>> > +return vcon->port.guest_connected;
>> > +}
>> > +
>> >  static int virtconsole_initfn(VirtIOSerialPort *port)
>> >  {
>> >  VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
>> > @@ -133,6 +140,8 @@ static int virtconsole_initfn(VirtIOSerialPort
>> > *port)
>> >  if (vcon->chr) {
>> >  qemu_chr_add_handlers(vcon->chr, chr_can_read, chr_read,
>> >  chr_event,
>> >vcon);
>> > +/* only user of chr_is_guest_connected so leave it as
>> > special cased*/
>> > +vcon->chr->chr_is_guest_connected =
>> > chr_is_guest_connected;
>> >  }
>> >  
>> >  return 0;
>> > diff --git a/include/char/char.h b/include/char/char.h
>> > index 0326b2a..b41ddc0 100644
>> > --- a/include/char/char.h
>> > +++ b/include/char/char.h
>> > @@ -52,6 +52,7 @@ typedef struct {
>> >  #define CHR_TIOCM_RTS 0x004
>> >  
>> >  typedef void IOEventHandler(void *opaque, int event);
>> > +typedef bool IOIsGuestConnectedHandler(void *opaque);
>> >  
>> >  struct CharDriverState {
>> >  void (*init)(struct CharDriverState *s);
>> > @@ -64,6 +65,7 @@ struct CharDriverState {
>> >  IOEventHandler *chr_event;
>> >  IOCanReadHandler *chr_can_read;
>> >  IOReadHandler *chr_read;
>> > +IOIsGuestConnectedHandler *chr_is_guest_connected;
>> >  void *handler_opaque;
>> >  void (*chr_close)(struct CharDriverState *chr);
>> >  void (*chr_accept_input)(struct CharDriverState *chr);
>> > @@ -229,6 +231,15 @@ void qemu_chr_be_write(CharDriverState *s,
>> > uint8_t *buf, int len);
>> >   */
>> >  void qemu_chr_be_event(CharDriverState *s, int event);
>> >  
>> > +/**
>> > + * @qemu_chr_be_is_fe_connected:
>> > + *
>> > + * Back end calls this to check if the front end is connected.
>> > + *
>> > + * Returns: true if the guest (front end) is connected, false
>> > otherwise.
>> > + */
>> > +bool qemu_chr_be_is_fe_connected(CharDriverState *s);
>> > +
>> >  void qemu_chr_add_handlers(CharDriverState *s,
>> > IOCanReadHandler *fd_can_read,
>> > IOReadHandler *fd_read,
>> > diff --git a/qemu-char.c b/qemu-char.c
>> > index 4e011df..77a501a 100644
>> > --- a/qemu-char.c
>> > +++ b/qemu-char.c
>> > @@ -120,6 +120,15 @@ void qemu_chr_be_event(CharDriverState *s, int
>> > event)
>> >  s->chr_event(s->handler_opaque, event);
>> >  }
>> >  
>> > +bool qemu_chr_be_is_fe_connected(CharDriverState *s)
>> > +{
>> > +if (s->chr_is_guest_connected) {
>> > +   

Re: [Qemu-devel] [PATCHv3 5/9] migration: search for zero instead of dup pages

2013-03-21 Thread Eric Blake
On 03/21/2013 09:57 AM, Peter Lieven wrote:
> virtually all dup pages are zero pages. remove
> the special is_dup_page() function and use the
> optimized buffer_find_nonzero_offset() function
> instead.
> 
> here buffer_find_nonzero_offset() is used directly
> to avoid the unnecssary additional checks in
> buffer_is_zero().
> 
> Signed-off-by: Peter Lieven 
> ---

> -return 1;
> +return 
> + buffer_find_nonzero_offset(p, TARGET_PAGE_SIZE) == TARGET_PAGE_SIZE;

Unusual layout.  I would have written:

return buffer_find_nonzero_offset(p, TARGET_PAGE_SIZE) ==
TARGET_PAGE_SIZE;

>  /* struct contains XBZRLE cache and a static page
> @@ -443,7 +434,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>  
>  /* In doubt sent page as normal */
>  bytes_sent = -1;
> -if (is_dup_page(p)) {
> +if (is_zero_page(p)) {
>  acct_info.dup_pages++;
>  bytes_sent = save_block_hdr(f, block, offset, cont,
>  RAM_SAVE_FLAG_COMPRESS);

I would move the change for qemu_put_byte(f, 0) out of patch 7 into this
patch, since this is the first point where you know the byte to send is 0.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCHv3 4/9] bitops: use vector algorithm to optimize find_next_bit()

2013-03-21 Thread Eric Blake
On 03/21/2013 09:57 AM, Peter Lieven wrote:
> this patch adds the usage of buffer_find_nonzero_offset()
> to skip large areas of zeroes.
> 
> compared to loop unrolling presented in an earlier
> patch this adds another 50% performance benefit for
> skipping large areas of zeroes. loop unrolling alone
> added close to 100% speedup.
> 
> Signed-off-by: Peter Lieven 
> ---
>  util/bitops.c |   22 +++---
>  1 file changed, 19 insertions(+), 3 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH] rdma: don't make pages writeable if not requiested

2013-03-21 Thread Michael S. Tsirkin
On Thu, Mar 21, 2013 at 12:41:35PM -0600, Jason Gunthorpe wrote:
> On Thu, Mar 21, 2013 at 08:16:33PM +0200, Michael S. Tsirkin wrote:
> 
> > This is the one I find redundant. Since the write will be done by
> > the adaptor under direct control by the application, why does it
> > make sense to declare this beforehand?  If you don't want to allow
> > local write access to memory, just do not post any receive WRs with
> > this address.  If you posted and regret it, reset the QP to cancel.
> 
> This is to support your COW scenario - the app declares before hand to
> the kernel that it will write to the memory and the kernel ensures
> pages are dedicated to the app at registration time. Or the app says
> it will only read and the kernel could leave them shared.

Someone here is confused. LOCAL_WRITE/absence of it does not address
COW, it breaks COW anyway.  Are you now saying we should change rdma so
without LOCAL_WRITE it will not break COW?

> The adaptor enforces the access control to prevent a naughty app from
> writing to shared memory - think about mmap'ing libc.so and then using
> RDMA to write to the shared pages. It is necessary to ensure that is
> impossible.
> 
> Jason

That's why it's redundant: we can't trust an application to tell us
'this page is writeable', we must get this info from kernel.  And so
there's apparently no need for application to tell adaptor about
LOCAL_WRITE.

-- 
MST



[Qemu-devel] [PATCH v4 7/9] Extend test-visitor-serialization with ASN.1 visitor(s)

2013-03-21 Thread Stefan Berger
Add BER visitor hooks to test-visitor-serialization

Cc: Michael Tsirkin 
Cc: Stefan Berger 
Signed-off-by: Joel Schopp 
---
 tests/Makefile |  2 +-
 tests/test-visitor-serialization.c | 73 ++
 2 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/tests/Makefile b/tests/Makefile
index 567e36e..578d732 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -124,7 +124,7 @@ tests/test-qmp-output-visitor$(EXESUF): 
tests/test-qmp-output-visitor.o $(test-q
 tests/test-qmp-input-visitor$(EXESUF): tests/test-qmp-input-visitor.o 
$(test-qapi-obj-y) libqemuutil.a libqemustub.a
 tests/test-qmp-input-strict$(EXESUF): tests/test-qmp-input-strict.o 
$(test-qapi-obj-y) libqemuutil.a libqemustub.a
 tests/test-qmp-commands$(EXESUF): tests/test-qmp-commands.o 
tests/test-qmp-marshal.o $(test-qapi-obj-y) qapi-types.o qapi-visit.o 
libqemuutil.a libqemustub.a
-tests/test-visitor-serialization$(EXESUF): tests/test-visitor-serialization.o 
$(test-qapi-obj-y) libqemuutil.a libqemustub.a
+tests/test-visitor-serialization$(EXESUF): tests/test-visitor-serialization.o 
$(test-qapi-obj-y) $(block-obj-y) libqemuutil.a libqemustub.a
 
 tests/test-mul64$(EXESUF): tests/test-mul64.o libqemuutil.a
 
diff --git a/tests/test-visitor-serialization.c 
b/tests/test-visitor-serialization.c
index 3c6b8df..ba07948 100644
--- a/tests/test-visitor-serialization.c
+++ b/tests/test-visitor-serialization.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "qemu-common.h"
 #include "test-qapi-types.h"
@@ -23,6 +24,9 @@
 #include "qapi/qmp-output-visitor.h"
 #include "qapi/string-input-visitor.h"
 #include "qapi/string-output-visitor.h"
+#include "qapi/ber-input-visitor.h"
+#include "qapi/ber-output-visitor.h"
+#include "migration/qemu-file.h"
 
 typedef struct PrimitiveType {
 union {
@@ -701,6 +705,60 @@ static void string_cleanup(void *datap)
 string_input_visitor_cleanup(d->siv);
 }
 
+
+typedef struct BERSerializeData {
+BEROutputVisitor *sov;
+QEMUFile *qoutfile;
+BERInputVisitor *siv;
+QEMUFile *qinfile;
+} BERSerializeData;
+
+static void ber_serialize(void *native_in, void **datap,
+  VisitorFunc visit, Error **errp,
+  BERTypePC ber_type_pc)
+{
+BERSerializeData *d = g_malloc0(sizeof(*d));
+
+d->qoutfile = qemu_bufopen("w", NULL);
+d->sov = ber_output_visitor_new(d->qoutfile, ber_type_pc);
+visit(ber_output_get_visitor(d->sov), &native_in, errp);
+*datap = d;
+}
+
+static void ber_primitive_serialize(void *native_in, void **datap,
+  VisitorFunc visit, Error **errp)
+{
+ber_serialize(native_in, datap, visit, errp, BER_TYPE_PRIMITIVE);
+}
+
+static void ber_constructed_serialize(void *native_in, void **datap,
+  VisitorFunc visit, Error **errp)
+{
+ber_serialize(native_in, datap, visit, errp, BER_TYPE_CONSTRUCTED);
+}
+
+static void ber_deserialize(void **native_out, void *datap,
+   VisitorFunc visit, Error **errp)
+{
+BERSerializeData *d = datap;
+const QEMUSizedBuffer *qsb = qemu_buf_get(d->qoutfile);
+QEMUSizedBuffer *new_qsb = qsb_clone(qsb);
+g_assert(new_qsb != NULL);
+
+d->qinfile = qemu_bufopen("r", new_qsb);
+
+d->siv = ber_input_visitor_new(d->qinfile, ~0);
+visit(ber_input_get_visitor(d->siv), native_out, errp);
+}
+
+static void ber_cleanup(void *datap)
+{
+BERSerializeData *d = datap;
+ber_output_visitor_cleanup(d->sov);
+ber_input_visitor_cleanup(d->siv);
+}
+
+
 /* visitor registration, test harness */
 
 /* note: to function interchangeably as a serialization mechanism your
@@ -722,6 +780,21 @@ static const SerializeOps visitors[] = {
 .cleanup = string_cleanup,
 .caps = VCAP_PRIMITIVES
 },
+{
+.type = "ASN.1 BER primitives",
+.serialize = ber_primitive_serialize,
+.deserialize = ber_deserialize,
+.cleanup = ber_cleanup,
+.caps = VCAP_PRIMITIVES
+},
+{
+.type = "ASN.1 BER constructed",
+.serialize = ber_constructed_serialize,
+.deserialize = ber_deserialize,
+.cleanup = ber_cleanup,
+.caps = VCAP_PRIMITIVES | VCAP_LISTS
+},
+
 { NULL }
 };
 
-- 
1.7.11.7




Re: [Qemu-devel] [PATCHv3 2/9] cutils: add a function to find non-zero content in a buffer

2013-03-21 Thread Peter Lieven

Am 21.03.2013 um 19:12 schrieb Eric Blake :

> On 03/21/2013 09:57 AM, Peter Lieven wrote:
>> this adds buffer_find_nonzero_offset() which is a SSE2/Altives
> 
> s/Altives/Altivec/
> 
>> optimized function that searches for non-zero content in a
>> buffer.
>> 
>> due to the optimizations used in the function there are restrictions
>> on buffer address and search length. the function
>> can_use_buffer_find_nonzero_content() can be used to check if
>> the function can be used safely.
>> 
>> Signed-off-by: Peter Lieven 
>> ---
>> include/qemu-common.h |3 +++
>> util/cutils.c |   50 
>> +
>> 2 files changed, 53 insertions(+)
> 
>> +inline bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len);
>> +inline size_t buffer_find_nonzero_offset(const void *buf, size_t len);
> 
> Ouch.  It is okay to add a 'static inline' function, but then the
> implementation must live in this header.  Otherwise, the function must
> not be inline, or you risk linker errors.

Sorry, i was not aware. 

> 
>> +++ b/util/cutils.c
>> @@ -143,6 +143,56 @@ int qemu_fdatasync(int fd)
>> }
>> 
>> /*
>> + * Searches for an area with non-zero content in a buffer
>> + *
>> + * Attention! The len must be a multiple of 8 * sizeof(VECTYPE) 
> 
> Should we call out BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR instead of a
> magic number here?  But I'm okay with leaving it as-is.
> 
>> + * and addr must be a multiple of sizeof(VECTYPE) due to 
> 
> Trailing whitespace (here, and on several other lines).  Please run your
> series through scripts/checkpatch.pl before submitting v4.

thanks for the pointer.

> 
>> + * restriction of optimizations in this function.
>> + * 
>> + * can_use_buffer_find_nonzero_offset() can be used to check
>> + * these requirements.
>> + * 
>> + * The return value is the offset of the non-zero area rounded
>> + * down to 8 * sizeof(VECTYPE). If the buffer is all zero 
> 
> Same comment on this use of '8'.
> 
>> + * the return value is equal to len.
>> + */
>> +
>> +inline size_t buffer_find_nonzero_offset(const void *buf, size_t len)
> 
> s/inline// (or move it to a 'static inline' definition in the .h)

I would move at least the can_use_buffer_find_nonzero_offset() function
completely into the header and rely the compiler inlineing
buffer_find_nonzero_offset().

> 
>> +{
>> +VECTYPE *p = (VECTYPE *)buf;
>> +VECTYPE zero = ZERO_SPLAT;
>> +size_t i;
>> +
> 
> You copied the 'Attention! ...' message from buffer_is_zero, which
> currently asserts that its condition is held.  Therefore, consistency
> would argue that you should assert your preconditions here, even if it
> adds more to the code size.  But this is something where a maintainer
> might have a better opinion on whether to keep the code robust with an
> assert(), or whether the faster operation without sanity checking is
> more appropriate (in which case a followup to remove the assert from
> buffer_is_zero would make sense).

I would appreciate feedback on this from the maintainers. My concern
with buffer_find_nonzero_offset() is that it is used for checking millions
of 4KByte pages. buffer_is_zero is used for larger objects in the bdrv
context, afaik.



> 
>>  * Checks if a buffer is all zeroes
>>  *
>>  * Attention! The len must be a multiple of 4 * sizeof(long) due to
>> 
> 
> Cleaning up whitespace is trivial; but the incorrect use of 'inline'
> requires a v4.
> 
> -- 
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 




Re: [Qemu-devel] [PATCH v2] Add option to mlock qemu and guest memory

2013-03-21 Thread Satoru Moriya
Hi Jan,

Thank you for reviewing.

On 03/21/2013 05:08 AM, Jan Kiszka wrote:
> On 2013-02-14 21:21, Satoru Moriya wrote:
>> @@ -31,6 +31,7 @@ void os_set_proc_name(const char *s);  void 
>> os_setup_signal_handling(void);  void os_daemonize(void);  void 
>> os_setup_post(void);
>> +void os_mlock(void);
>>  
>>  typedef struct timeval qemu_timeval;  #define qemu_gettimeofday(tp) 
>> gettimeofday(tp, NULL) diff --git a/include/sysemu/os-win32.h 
>> b/include/sysemu/os-win32.h index bf9edeb..a74ca13 100644
>> --- a/include/sysemu/os-win32.h
>> +++ b/include/sysemu/os-win32.h
>> @@ -80,6 +80,7 @@ static inline void os_daemonize(void) {}  static 
>> inline void os_setup_post(void) {}  void os_set_line_buffering(void);  
>> static inline void os_set_proc_name(const char *dummy) {}
>> +static inline void os_mlock(void) {}
> 
> Let's  return an error code from os_mlock, a permanent failure like 
> -ENOSYS on Win32, the result of mlock() on POSIX. Requesting to lock 
> the memory should not silently fail on Windows.

OK. I'll update it.
 
>> diff --git a/vl.c b/vl.c
>> index 1355f69..c16c8ad 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -491,6 +491,18 @@ static QemuOptsList qemu_object_opts = {
>>  },
>>  };
>>  
>> +static QemuOptsList qemu_realtime_opts = {
>> +.name = "realtime",
>> +.head = QTAILQ_HEAD_INITIALIZER(qemu_realtime_opts.head),
>> +.desc = {
>> +{
>> +.name = "mlock",
>> +.type = QEMU_OPT_BOOL,
>> +},
>> +{ /* end of list */ }
>> +},
>> +};
>> +
>>  const char *qemu_get_vm_name(void)
>>  {
>>  return qemu_name;
>> @@ -1384,6 +1396,17 @@ static void smp_parse(const char *optarg)
>>  max_cpus = smp_cpus;
>>  }
>>  
>> +static void configure_realtime(QemuOpts *opts) {
>> +bool is_mlock;
> 
> "enable_mlock" or just "mlock".

will do.

Regards,
Satoru



Re: [Qemu-devel] [PATCH 1/2] char: add qemu_chr_be_is_fe_connected

2013-03-21 Thread Alon Levy
> Alon Levy  writes:
> 
> > Note that the handler is called chr_is_guest_connected and not
> > chr_is_fe_connected, consistent with other members of
> > CharDriverState.
> 
> Sorry, I don't get it.
> 
> There isn't a notion of "connected" for the front-ends in the char
> layer.  The closest thing is whether add_handlers() have been called
> or
> not.

It makes sense for virtio-console - it matches exactly the internal 
guest_connected port state.

> 
> I really dislike the idea of introduction a new concept to the char
> layer in a half baked way.

Is the fact there is only one user, virtio-console, the reason you call this 
half baked?

> 
> Why can't migration notifiers be used for this?  I think Gerd
> objected
> to using a migration *handler* but not necessarily a state notifier.

Because if you have two chardevices, i.e.
 -chardev spicevmc,name=vdagent,id=spice1 -chardev 
spicevmc,name=vdagent,id=spice2

Then the two on-the-wire vmstates will be identical.

> 
> Regards,
> 
> Anthony Liguori
> 
> >
> > Signed-off-by: Alon Levy 
> > ---
> >  hw/virtio-console.c |  9 +
> >  include/char/char.h | 11 +++
> >  qemu-char.c |  9 +
> >  3 files changed, 29 insertions(+)
> >
> > diff --git a/hw/virtio-console.c b/hw/virtio-console.c
> > index e2d1c58..643e24e 100644
> > --- a/hw/virtio-console.c
> > +++ b/hw/virtio-console.c
> > @@ -120,6 +120,13 @@ static void chr_event(void *opaque, int event)
> >  }
> >  }
> >  
> > +static bool chr_is_guest_connected(void *opaque)
> > +{
> > +VirtConsole *vcon = opaque;
> > +
> > +return vcon->port.guest_connected;
> > +}
> > +
> >  static int virtconsole_initfn(VirtIOSerialPort *port)
> >  {
> >  VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
> > @@ -133,6 +140,8 @@ static int virtconsole_initfn(VirtIOSerialPort
> > *port)
> >  if (vcon->chr) {
> >  qemu_chr_add_handlers(vcon->chr, chr_can_read, chr_read,
> >  chr_event,
> >vcon);
> > +/* only user of chr_is_guest_connected so leave it as
> > special cased*/
> > +vcon->chr->chr_is_guest_connected =
> > chr_is_guest_connected;
> >  }
> >  
> >  return 0;
> > diff --git a/include/char/char.h b/include/char/char.h
> > index 0326b2a..b41ddc0 100644
> > --- a/include/char/char.h
> > +++ b/include/char/char.h
> > @@ -52,6 +52,7 @@ typedef struct {
> >  #define CHR_TIOCM_RTS  0x004
> >  
> >  typedef void IOEventHandler(void *opaque, int event);
> > +typedef bool IOIsGuestConnectedHandler(void *opaque);
> >  
> >  struct CharDriverState {
> >  void (*init)(struct CharDriverState *s);
> > @@ -64,6 +65,7 @@ struct CharDriverState {
> >  IOEventHandler *chr_event;
> >  IOCanReadHandler *chr_can_read;
> >  IOReadHandler *chr_read;
> > +IOIsGuestConnectedHandler *chr_is_guest_connected;
> >  void *handler_opaque;
> >  void (*chr_close)(struct CharDriverState *chr);
> >  void (*chr_accept_input)(struct CharDriverState *chr);
> > @@ -229,6 +231,15 @@ void qemu_chr_be_write(CharDriverState *s,
> > uint8_t *buf, int len);
> >   */
> >  void qemu_chr_be_event(CharDriverState *s, int event);
> >  
> > +/**
> > + * @qemu_chr_be_is_fe_connected:
> > + *
> > + * Back end calls this to check if the front end is connected.
> > + *
> > + * Returns: true if the guest (front end) is connected, false
> > otherwise.
> > + */
> > +bool qemu_chr_be_is_fe_connected(CharDriverState *s);
> > +
> >  void qemu_chr_add_handlers(CharDriverState *s,
> > IOCanReadHandler *fd_can_read,
> > IOReadHandler *fd_read,
> > diff --git a/qemu-char.c b/qemu-char.c
> > index 4e011df..77a501a 100644
> > --- a/qemu-char.c
> > +++ b/qemu-char.c
> > @@ -120,6 +120,15 @@ void qemu_chr_be_event(CharDriverState *s, int
> > event)
> >  s->chr_event(s->handler_opaque, event);
> >  }
> >  
> > +bool qemu_chr_be_is_fe_connected(CharDriverState *s)
> > +{
> > +if (s->chr_is_guest_connected) {
> > +return s->chr_is_guest_connected(s->handler_opaque);
> > +}
> > +/* default to always connected */
> > +return true;
> > +}
> > +
> >  static gboolean qemu_chr_generic_open_bh(gpointer opaque)
> >  {
> >  CharDriverState *s = opaque;
> > --
> > 1.8.1.4
> 
> 
> 



Re: [Qemu-devel] [PATCH] ifname=xxx for -netdev bridge

2013-03-21 Thread Eric Blake
On 03/21/2013 12:05 PM, Alexandre Kandalintsev wrote:
> Hi!
> 
> 
> Here is the patch that allows us to specify the name of tap interface
> when -netdev bridge is used. It's like -netdev tap,ifname=xxx, but for
> bridges.
> 

> +++ b/qapi-schema.json
> @@ -2676,6 +2676,7 @@
>  { 'type': 'NetdevBridgeOptions',
>'data': {
>  '*br': 'str',
> +'*ifname': 'str',
>  '*helper': 'str' } }

You also need to add a line documenting this field:

# @ifname: #optional Set the interface name that will be used (since 1.5).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH v4 6/9] ASN.1 input visitor

2013-03-21 Thread Stefan Berger
Implement an input visitor for ASN.1 BER and CER encoding.

Cc: Michael Tsirkin 
Signed-off-by: Stefan Berger 
Signed-off-by: Joel Schopp 
---
 include/qapi/ber-input-visitor.h |   30 +
 qapi/Makefile.objs   |2 +-
 qapi/ber-input-visitor.c | 1141 ++
 3 files changed, 1172 insertions(+), 1 deletion(-)
 create mode 100644 include/qapi/ber-input-visitor.h
 create mode 100644 qapi/ber-input-visitor.c

diff --git a/include/qapi/ber-input-visitor.h b/include/qapi/ber-input-visitor.h
new file mode 100644
index 000..eaa3d0e
--- /dev/null
+++ b/include/qapi/ber-input-visitor.h
@@ -0,0 +1,30 @@
+/*
+ * BER Input Visitor header
+ *
+ * Copyright IBM, Corp. 2011
+ *
+ * Authors:
+ *  Anthony Liguori   
+ *  Stefan Berger 
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef BER_INPUT_VISITOR_H
+#define BER_INPUT_VISITOR_H
+
+#include "qapi/visitor.h"
+
+typedef struct BERInputVisitor BERInputVisitor;
+
+BERInputVisitor *ber_input_visitor_new(QEMUFile *,
+   uint64_t max_allowd_buffer_size);
+void ber_input_visitor_cleanup(BERInputVisitor *v);
+uint64_t ber_input_get_parser_position(BERInputVisitor *v);
+uint64_t ber_input_get_largest_needed_buffer(BERInputVisitor *v);
+
+Visitor *ber_input_get_visitor(BERInputVisitor *v);
+
+#endif
diff --git a/qapi/Makefile.objs b/qapi/Makefile.objs
index 519e3ee..f7f080a 100644
--- a/qapi/Makefile.objs
+++ b/qapi/Makefile.objs
@@ -3,4 +3,4 @@ util-obj-y += qmp-output-visitor.o qmp-registry.o qmp-dispatch.o
 util-obj-y += string-input-visitor.o string-output-visitor.o
 
 util-obj-y += opts-visitor.o
-util-obj-y += ber-common.o ber-output-visitor.o
+util-obj-y += ber-common.o ber-output-visitor.o ber-input-visitor.o
diff --git a/qapi/ber-input-visitor.c b/qapi/ber-input-visitor.c
new file mode 100644
index 000..bfc32aa
--- /dev/null
+++ b/qapi/ber-input-visitor.c
@@ -0,0 +1,1141 @@
+/*
+ * BER Input Visitor
+ *
+ * Copyright IBM, Corp. 2011, 2013
+ * Copyright Red Hat, Inc. 2011
+ *
+ * Authors:
+ *  Anthony Liguori   
+ *  Stefan Berger 
+ *  Michael Tsirkin   
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include 
+#include 
+
+#include "qemu-common.h"
+#include "qapi/ber-common.h"
+#include "qapi/ber-input-visitor.h"
+#include "qemu/queue.h"
+#include "qemu-common.h"
+#include "hw/hw.h"
+#include "qapi/ber.h"
+#include "include/qapi/qmp/qerror.h"
+#include "migration/qemu-file.h"
+#include "qapi/visitor-impl.h"
+
+#define AIV_STACK_SIZE 1024
+
+/* whether to allow the parsing of primitives that are fragmented */
+#define BER_ALLOW_FRAGMENTED_PRIMITIVES
+
+/* #define BER_DEBUG */
+
+typedef struct StackEntry {
+uint64_t cur_pos;
+} StackEntry;
+
+struct BERInputVisitor {
+Visitor visitor;
+QEMUFile *qfile;
+uint64_t cur_pos;
+StackEntry stack[AIV_STACK_SIZE];
+int nb_stack;
+uint64_t max_allowed_buffer_size;
+uint64_t largest_needed_buffer;
+};
+
+static BERInputVisitor *to_biv(Visitor *v)
+{
+return container_of(v, BERInputVisitor, visitor);
+}
+
+static void ber_input_push(BERInputVisitor *aiv,
+   uint64_t cur_pos, Error **errp)
+{
+aiv->stack[aiv->nb_stack].cur_pos = cur_pos;
+aiv->nb_stack++;
+
+if (aiv->nb_stack >= AIV_STACK_SIZE) {
+error_set(errp, QERR_BUFFER_OVERRUN);
+}
+}
+
+static uint64_t ber_input_pop(BERInputVisitor *aiv, Error **errp)
+{
+aiv->nb_stack--;
+
+if (aiv->nb_stack < 0) {
+error_set(errp, QERR_BUFFER_OVERRUN);
+return 0;
+}
+
+return aiv->stack[aiv->nb_stack].cur_pos;
+}
+
+/*
+ * Read a type tag from the stream. Up-to 32 bit type tags are supported
+ * for reading and otherwise an error is returned. Anything larger than that
+ * would not be reasonable and could only be abused.
+ */
+static uint32_t ber_read_type(BERInputVisitor *aiv, uint8_t *ber_type_flags,
+  Error **errp)
+{
+uint32_t type;
+uint8_t byte;
+uint8_t ctr = 0;
+char buf[128];
+
+if (qemu_read_bytes(aiv->qfile, &byte, 1) != 1) {
+error_setg(errp, "QEMUFile has an error, error was '%s'",
+  "Error while reading type");
+return 0;
+}
+aiv->cur_pos++;
+type = byte;
+
+*ber_type_flags = type & (BER_TYPE_P_C_MASK | BER_TYPE_CLASS_MASK);
+
+if ((type & BER_TYPE_TAG_MASK) == BER_TYPE_LONG_FORM) {
+type = 0;
+while (true) {
+type <<= 7;
+if (qemu_read_bytes(aiv->qfile, &byte, 1) != 1) {
+error_setg(errp, "QEMUFile has an error, error was '%s'",
+  "Error while reading long type");
+return 0;
+}
+aiv->cur_pos++;
+
+type |= (byte & 0x

[Qemu-devel] [PATCH v4 5/9] ASN.1 output visitor

2013-03-21 Thread Stefan Berger
Implement an output visitor for ASN.1 BER and CER encoding.

Cc: Michael Tsirkin 
Signed-off-by: Stefan Berger 
Signed-off-by: Joel Schopp 
---
 configure |   2 +-
 include/qapi/ber-output-visitor.h |  28 ++
 include/qapi/ber.h| 107 ++
 include/qemu-common.h |   2 +
 qapi/Makefile.objs|   1 +
 qapi/ber-common.c |  86 +
 qapi/ber-common.h |  29 ++
 qapi/ber-output-visitor.c | 673 ++
 util/qemu-file.c  |  58 
 9 files changed, 985 insertions(+), 1 deletion(-)
 create mode 100644 include/qapi/ber-output-visitor.h
 create mode 100644 include/qapi/ber.h
 create mode 100644 qapi/ber-common.c
 create mode 100644 qapi/ber-common.h
 create mode 100644 qapi/ber-output-visitor.c

diff --git a/configure b/configure
index 46a7594..5e1d69f 100755
--- a/configure
+++ b/configure
@@ -2844,7 +2844,7 @@ fi
 # Do we need libm
 cat > $TMPC << EOF
 #include 
-int main(void) { return isnan(sin(0.0)); }
+int main(void) { return isnan(nan("NAN")); }
 EOF
 if compile_prog "" "" ; then
   :
diff --git a/include/qapi/ber-output-visitor.h 
b/include/qapi/ber-output-visitor.h
new file mode 100644
index 000..bd7e198
--- /dev/null
+++ b/include/qapi/ber-output-visitor.h
@@ -0,0 +1,28 @@
+/*
+ * BER Output Visitor header
+ *
+ * Copyright IBM, Corp. 2011
+ *
+ * Authors:
+ *  Anthony Liguori   
+ *  Stefan Berger 
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef BER_OUTPUT_VISITOR_H
+#define BER_OUTPUT_VISITOR_H
+
+#include "qapi/visitor.h"
+#include "qapi/ber.h"
+
+typedef struct BEROutputVisitor BEROutputVisitor;
+
+BEROutputVisitor *ber_output_visitor_new(QEMUFile *, BERTypePC mode);
+void ber_output_visitor_cleanup(BEROutputVisitor *v);
+
+Visitor *ber_output_get_visitor(BEROutputVisitor *v);
+
+#endif
diff --git a/include/qapi/ber.h b/include/qapi/ber.h
new file mode 100644
index 000..ebf7bf1
--- /dev/null
+++ b/include/qapi/ber.h
@@ -0,0 +1,107 @@
+/*
+ * ASN.1 Basic Encoding Rules Common functions
+ *
+ * Copyright IBM, Corp. 2011, 2013
+ * Copyright Red Hat, Inc. 2011
+ *
+ * Authors:
+ *  Stefan Berger 
+ *  Michael Tsirkin   
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+#ifndef QAPI_BER_H
+#define QAPI_BER_H
+
+/*
+ * This is a subset of BER for QEMU use.
+ * QEMU will use the DER encoding always with one extension from
+ * CER: SET and SEQUENCE types can have indefinite-length encoding
+ * if the encoding is not all immediately available.
+ *
+ * We assume that SET encodings can be available or not available,
+ * and that SEQUENCE encodings are available unless a SEQUENCE includes
+ * a non-available SET.
+ *
+ * The last is an extension to allow an arbitrarily large SET
+ * to be produced online without knowing the length in advance.
+ *
+ * All types used shall be universal, with explicit tagging, to simplify
+ * use by external tools.
+ */
+
+
+#define BER_TYPE_CLASS_SHIFT  6
+#define BER_TYPE_PC_SHIFT 5
+
+typedef enum ber_type_class {
+BER_TYPE_CLASS_UNIVERSAL = 0x0 << BER_TYPE_CLASS_SHIFT,
+BER_TYPE_CLASS_APPLICATION = 0x1 << BER_TYPE_CLASS_SHIFT,
+BER_TYPE_CLASS_CONTENT_SPECIFIC = 0x2 << BER_TYPE_CLASS_SHIFT,
+BER_TYPE_CLASS_PRIVATE = 0x3 << BER_TYPE_CLASS_SHIFT,
+BER_TYPE_CLASS_MASK = 0x3 << BER_TYPE_CLASS_SHIFT /* Mask to get class */
+} BERTypeClass;
+
+/* P/C bit */
+typedef enum ber_type_p_c {
+BER_TYPE_PRIMITIVE = 0x0 << BER_TYPE_PC_SHIFT,
+BER_TYPE_CONSTRUCTED = 0x1 << BER_TYPE_PC_SHIFT,
+BER_TYPE_P_C_MASK = 0x1 << BER_TYPE_PC_SHIFT /* Mask to get P/C bit */
+} BERTypePC;
+
+typedef enum ber_type_tag {
+BER_TYPE_EOC  /*  P0   0*/,
+BER_TYPE_BOOLEAN  /*  P1   1*/,
+BER_TYPE_INTEGER  /*  P2   2*/,
+BER_TYPE_BIT_STRING   /*  P/C  3   3*/,
+BER_TYPE_OCTET_STRING /*  P/C  4   4*/,
+BER_TYPE_NULL /*  P5   5*/,
+BER_TYPE_OBJECT_ID/*  P6   6*/,
+BER_TYPE_OBJECT_DESC  /*  P7   7*/,
+BER_TYPE_EXTERNAL /*  C8   8*/,
+BER_TYPE_REAL /*  P9   9*/,
+BER_TYPE_ENUMERATED   /*  P10  A*/,
+BER_TYPE_EMBEDDED /*  C11  B*/,
+BER_TYPE_UTF8_STRING  /*  P/C  12  C*/,
+BER_TYPE_RELATIVE_OID /*  P13  D*/,
+BER_TYPE_UNUSED_0xE   /**/,
+BER_TYPE_UNUSED_0xF   /**/,
+BER_TYPE_SEQUENCE /*  C16  10*/,
+BER_TYPE_SET  /*  C17  11*/,
+BER_TYPE_NUMERIC_STRING   /*  P/C  18  12*/,
+BER_T

[Qemu-devel] [PATCH v4 9/9] ASN.1 specific test cases

2013-03-21 Thread Stefan Berger
BER visitor tests give us some assurance that the BER visitor
code works, and also end up by extention helping out on our
code coverage of the filesystem tests.
After the output visitor invocation the resuling buffer is
compared against a known byte stream -- this will lock the
implementation into producing specific byte arrays.

Signed-off-by: Stefan Berger 
Signed-off-by: Joel Schopp 
---
 tests/Makefile   |   9 +
 tests/test-ber-visitor.c | 746 +++
 2 files changed, 755 insertions(+)
 create mode 100644 tests/test-ber-visitor.c

diff --git a/tests/Makefile b/tests/Makefile
index 578d732..b74f90f 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -23,6 +23,10 @@ check-unit-y += tests/test-string-input-visitor$(EXESUF)
 gcov-files-test-string-input-visitor-y = qapi/string-input-visitor.c
 check-unit-y += tests/test-string-output-visitor$(EXESUF)
 gcov-files-test-string-output-visitor-y = qapi/string-output-visitor.c
+check-unit-y += tests/test-ber-visitor$(EXESUF)
+gcov-files-test-vistor-y = qapi/ber-input-visitor.c
+
+
 check-unit-y += tests/test-coroutine$(EXESUF)
 ifeq ($(CONFIG_WIN32),y)
 gcov-files-test-coroutine-y = coroutine-win32.c
@@ -134,6 +138,11 @@ tests/fdc-test$(EXESUF): tests/fdc-test.o
 tests/hd-geo-test$(EXESUF): tests/hd-geo-test.o
 tests/tmp105-test$(EXESUF): tests/tmp105-test.o
 
+tests/test-ber-visitor.o : QEMU_CFLAGS += -I include/qapi
+
+tests/test-ber-visitor.o: $(addprefix include/qapi/, ber.h ber-input-visitor.h 
ber-output-visitor.h) $(addprefix qapi/, ber-common.c ber-input-visitor.c 
ber-output-visitor.c)
+tests/test-ber-visitor$(EXESUF): tests/test-ber-visitor.o $(tools-obj-y) 
qapi/ber-output-visitor.o qapi/ber-input-visitor.o qapi/ber-common.o 
$(block-obj-y) libqemuutil.a libqemustub.a
+
 # QTest rules
 
 TARGETS=$(patsubst %-softmmu,%, $(filter %-softmmu,$(TARGET_DIRS)))
diff --git a/tests/test-ber-visitor.c b/tests/test-ber-visitor.c
new file mode 100644
index 000..f309432
--- /dev/null
+++ b/tests/test-ber-visitor.c
@@ -0,0 +1,746 @@
+/*
+ * BER Output Visitor unit-tests.
+ *
+ * Copyright (C) 2011 Red Hat Inc.
+ * Copyright (C) 2011 IBM Corporation
+ *
+ * Authors:
+ *  Luiz Capitulino 
+ *  Stefan Berger 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include 
+
+#include "qemu-common.h"
+#include "qapi/ber-output-visitor.h"
+#include "qapi/ber-input-visitor.h"
+#include "hw/hw.h"
+#include "include/qapi/visitor.h"
+
+
+typedef struct TestExpResult {
+const uint8_t *exp;
+size_t exp_len;
+} TestExpResult;
+
+typedef struct TestInputOutputVisitorData {
+QEMUFile *qoutfile;
+BEROutputVisitor *bov;
+Visitor *ov;
+
+QEMUFile *qinfile;
+BERInputVisitor *biv;
+Visitor *iv;
+
+const TestExpResult *ter;
+} TestInputOutputVisitor;
+
+static void test_visitor_dump(const uint8_t *exp, size_t exp_len,
+  const uint8_t *act, size_t act_len)
+{
+size_t i;
+
+fprintf(stderr, "\nExpected output:");
+
+for (i = 0; i < exp_len; i++) {
+if ((i % 0x8) == 0) {
+fprintf(stderr, "\n");
+}
+fprintf(stderr, "0x%02x, ", exp[i]);
+}
+
+fprintf(stderr, "\nActual output:");
+
+for (i = 0; i < act_len; i++) {
+if ((i % 0x8) == 0) {
+fprintf(stderr, "\n");
+}
+fprintf(stderr, "0x%02x, ", act[i]);
+}
+fprintf(stderr, "\n");
+}
+
+static void test_visitor_check_result(const uint8_t *exp, size_t exp_len,
+  const uint8_t *act, size_t act_len)
+{
+size_t i;
+
+if (exp_len != act_len) {
+test_visitor_dump(exp, exp_len, act, act_len);
+}
+
+for (i = 0; i < exp_len; i++) {
+if (exp[i] != act[i]) {
+test_visitor_dump(exp, exp_len, act, act_len);
+}
+}
+}
+
+static void visitor_output_setup(TestInputOutputVisitor *data,
+ BERTypePC ber_type_pc,
+ const TestExpResult *ter)
+{
+data->ter = ter;
+
+data->qoutfile = qemu_bufopen("w", NULL);
+
+data->bov = ber_output_visitor_new(data->qoutfile, ber_type_pc);
+g_assert(data->bov != NULL);
+
+data->ov = ber_output_get_visitor(data->bov);
+g_assert(data->ov != NULL);
+}
+
+static void visitor_output_setup_primitive(TestInputOutputVisitor *data,
+   const void *results)
+{
+const TestExpResult *ter = results;
+
+visitor_output_setup(data, BER_TYPE_PRIMITIVE, ter);
+}
+
+static void visitor_output_setup_constructed(TestInputOutputVisitor *data,
+ const void *results)
+{
+const TestExpResult *ter = results;
+
+visitor_output_setup(data, BER_TYPE_CONSTRUCTED, ter);
+}
+
+static void visitor_input_setup(TestInputOutputVisitor *data)
+{
+const QEMUSizedBuffer *qsb = qe

[Qemu-devel] [PATCH v4 3/9] QEMUSizedBuffer

2013-03-21 Thread Stefan Berger
Using the QEMUFile interface, this patch adds support functions for operating
on in-memory sized buffers that can be written to or read from.

Signed-off-by: Stefan Berger 
Signed-off-by: Joel Schopp 
---
 include/migration/qemu-file.h |  13 ++
 include/qemu-common.h |  12 ++
 util/qemu-file.c  | 411 ++
 3 files changed, 436 insertions(+)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 728b6e2..0a297fc 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -24,6 +24,8 @@
 #ifndef QEMU_FILE_H
 #define QEMU_FILE_H 1
 
+#include 
+
 /* This function writes a chunk of data to a file at the given position.
  * The pos argument can be ignored if the file is only being used for
  * streaming.  The handler should try to write all of the data it can.
@@ -58,6 +60,15 @@ typedef struct QEMUFileOps {
 QEMUFileGetFD *get_fd;
 } QEMUFileOps;
 
+struct QEMUSizedBuffer {
+struct iovec *iov;
+size_t n_iov;
+size_t size; /* total allocated size in all iov's */
+size_t used; /* number of used bytes */
+};
+
+typedef struct QEMUSizedBuffer QEMUSizedBuffer;
+
 QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops);
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
@@ -71,6 +82,8 @@ void qemu_put_byte(QEMUFile *f, int v);
 int qemu_read_bytes(QEMUFile *f, uint8_t *buf, int size);
 int qemu_peek_bytes(QEMUFile *f, uint8_t *buf, int size, size_t offset);
 int qemu_write_bytes(QEMUFile *f, const uint8_t *buf, int size);
+QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input);
+const QEMUSizedBuffer *qemu_buf_get(QEMUFile *f);
 
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
diff --git a/include/qemu-common.h b/include/qemu-common.h
index 7754ee2..7794fa7 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -448,4 +448,16 @@ int uleb128_decode_small(const uint8_t *in, uint32_t *n);
 
 void hexdump(const char *buf, FILE *fp, const char *prefix, size_t size);
 
+/* QEMU Sized Buffer */
+#include "include/migration/qemu-file.h"
+QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len);
+QEMUSizedBuffer *qsb_clone(const QEMUSizedBuffer *);
+void qsb_free(QEMUSizedBuffer *);
+size_t qsb_set_length(QEMUSizedBuffer *qsb, size_t length);
+size_t qsb_get_length(const QEMUSizedBuffer *qsb);
+ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t start, size_t count,
+   uint8_t **buf);
+ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf,
+ off_t pos, size_t count);
+
 #endif
diff --git a/util/qemu-file.c b/util/qemu-file.c
index f8a54e7..89b0614 100644
--- a/util/qemu-file.c
+++ b/util/qemu-file.c
@@ -743,3 +743,414 @@ int qemu_write_bytes(QEMUFile *f, const uint8_t *buf, int 
size)
 
 return size;
 }
+
+
+#define QSB_CHUNK_SIZE  (1 << 10)
+#define QSB_MAX_CHUNK_SIZE  (10 * QSB_CHUNK_SIZE)
+
+/**
+ * Create a QEMUSizedBuffer
+ * This type of buffer uses scatter-gather lists internally and
+ * can grow to any size. Any data array in the scatter-gather list
+ * can hold different amount of bytes.
+ *
+ * @buffer: Optional buffer to copy into the QSB
+ * @len: size of initial buffer; if @buffer is given, buffer must
+ *   hold at least len bytes
+ *
+ * Returns a pointer to a QEMUSizedBuffer
+ */
+QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len)
+{
+QEMUSizedBuffer *qsb;
+size_t alloc_len, num_chunks, i, to_copy;
+size_t chunk_size = (len > QSB_MAX_CHUNK_SIZE)
+? QSB_MAX_CHUNK_SIZE
+: QSB_CHUNK_SIZE;
+
+if (len == 0) {
+/* we want to allocate at least one chunk */
+len = QSB_CHUNK_SIZE;
+}
+
+num_chunks = DIV_ROUND_UP(len, chunk_size);
+alloc_len = num_chunks * chunk_size;
+
+qsb = g_new0(QEMUSizedBuffer, 1);
+qsb->iov = g_new0(struct iovec, num_chunks);
+qsb->n_iov = num_chunks;
+
+for (i = 0; i < num_chunks; i++) {
+qsb->iov[i].iov_base = g_malloc0(chunk_size);
+qsb->iov[i].iov_len = chunk_size;
+if (buffer) {
+to_copy = (len - qsb->used) > chunk_size
+  ? chunk_size : (len - qsb->used);
+memcpy(qsb->iov[i].iov_base, &buffer[qsb->used], to_copy);
+qsb->used += to_copy;
+}
+}
+
+qsb->size = alloc_len;
+
+return qsb;
+}
+
+/**
+ * Free the QEMUSizedBuffer
+ *
+ * @qsb: The QEMUSizedBuffer to free
+ */
+void qsb_free(QEMUSizedBuffer *qsb)
+{
+size_t i;
+
+if (!qsb) {
+return;
+}
+
+for (i = 0; i < qsb->n_iov; i++) {
+g_free(qsb->iov[i].iov_base);
+}
+g_free(qsb->iov);
+g_free(qsb);
+}
+
+/**
+ * Get the number of of used bytes in the QEMUSizedBuffer
+ *
+ * @qsb: A QEMUSizedBuffer
+ *
+ * Returns the number of bytes currently used in this buffer
+ */
+size_t qsb_get_l

Re: [Qemu-devel] [PATCH] linux-user: change do_semop to return target errno when unsuccessful

2013-03-21 Thread Peter Maydell
On 21 March 2013 17:57, Petar Jovanovic  wrote:
> From: Petar Jovanovic 
>
> do_semop() is called from two places, and one of these fails to convert
> return error to target errno when semop fails. This patch changes the
> function to always return target errno in case of an unsuccessful call.
>
> Signed-off-by: Petar Jovanovic 

Reviewed-by: Peter Maydell 

-- PMM



Re: [Qemu-devel] [PATCH] qemu-ga: use key-value store to avoid recycling fd handles after restart

2013-03-21 Thread mdroth
On Thu, Mar 21, 2013 at 02:24:56PM -0400, Luiz Capitulino wrote:
> On Thu, 21 Mar 2013 11:13:13 -0500
> mdroth  wrote:
> 
> > > Looks like you guys have no *practical* problems to solve.  Congrats!
> > > Take a vacation!  Please report back no later than 275 years from now,
> > > to make sure this 64 bit fd counter overflow problem gets taken care of
> > > in time.  ;-P
> > > 
> > 
> > Haha, well, I didn't want to be that one lazy developer who brings about
> > the downfall of future human civilization... but if it's a really big
> > deal they'll probably send someone back from the future to let me know,
> > so maybe I'm jumping the gun a bit :)
> 
> I *am* that guy, but I was afraid to tell :)
> 
> > I just didn't want to introduce a new interface that relied on
> > interfaces that were planned for deprecation in the *long*-term, but i
> > think you're right, it's too much hassle for current users for too
> > little gain, and there's plenty of time to do it in the future so I'll
> > hold off on it for now.
> 
> Let me clarify it: when I read the code I didn't realize fd_counter
> would never wrap. I think this discussion is settled now. However, I
> still think that having an assert there is good practice.
> 
> I can post a patch myself.
> 

Sounds good :)



Re: [Qemu-devel] [PATCH v3 8/9] Use qemu_put_buffer_no_copy for guest memory pages

2013-03-21 Thread Michael S. Tsirkin
On Thu, Mar 21, 2013 at 08:08:58PM +0200, Orit Wasserman wrote:
> On 03/21/2013 07:37 PM, Juan Quintela wrote:
> > Orit Wasserman  wrote:
> >> This will remove an unneeded copy of guest memory pages.
> >> For the page header and device state we still copy the data to the
> >> static buffer the other option is to allocate the memory on demand
> >> which is more expensive.
> >>
> >> Signed-off-by: Orit Wasserman 
> >> ---
> >>  arch_init.c | 2 +-
> >>  savevm.c| 2 +-
> >>  2 files changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/arch_init.c b/arch_init.c
> >> index 98e2bc6..27b53eb 100644
> >> --- a/arch_init.c
> >> +++ b/arch_init.c
> >> @@ -481,7 +481,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
> >>  /* XBZRLE overflow or normal page */
> >>  if (bytes_sent == -1) {
> >>  bytes_sent = save_block_hdr(f, block, offset, cont, 
> >> RAM_SAVE_FLAG_PAGE);
> >> -qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
> >> +qemu_put_buffer_no_copy(f, p, TARGET_PAGE_SIZE);
> >>  bytes_sent += TARGET_PAGE_SIZE;
> >>  acct_info.norm_pages++;
> >>  }
> > 
> > Once here, shouldn't we also change:
> > 
> > block-migration.c::blk_send()
> > 
> > qemu_put_buffer(f, blk->buf, BLOCK_SIZE);
> > 
> > to nocopy?
> Sadly no, I looked at the code and I saw the buffer is freed immediately
> after call blk_send :(. look at mig_save_device_dirty.
> 
> Orit

Hmm if _nocopy also implies it is not safe to free the
buffer immediately, then I'd rather put this in a name
e.g. _async instead of _nocopy.

> > 
> > Again, this can be an additional patch.
> > 
> > 
> >> diff --git a/savevm.c b/savevm.c
> >> index fbfb9e3..50e8fb2 100644
> >> --- a/savevm.c
> >> +++ b/savevm.c
> >> @@ -634,7 +634,7 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const 
> >> uint8_t *buf, int size)
> >>  abort();
> >>  }
> >>  
> >> -f->iov[f->iovcnt].iov_base = f->buf + f->buf_index;
> >> +f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
> >>  f->iov[f->iovcnt++].iov_len = size;
> > 
> > This bit should be in the previous patch, the error that I pointed?
> > 
> > Later, Juan.
> > 



[Qemu-devel] [PATCH v4 8/8] coalesce adjacent iovecs

2013-03-21 Thread Orit Wasserman
This way we send one big buffer instead of many small ones

Signed-off-by: Orit Wasserman 
---
 savevm.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/savevm.c b/savevm.c
index b024354..9e8ddee 100644
--- a/savevm.c
+++ b/savevm.c
@@ -622,6 +622,18 @@ int qemu_fclose(QEMUFile *f)
 return ret;
 }
 
+static void add_to_iovec(QEMUFile *f, const uint8_t *buf, int size)
+{
+/* check for adjoint buffer and colasce them */
+if (f->iovcnt > 0 && buf == f->iov[f->iovcnt - 1].iov_base +
+f->iov[f->iovcnt - 1].iov_len) {
+f->iov[f->iovcnt - 1].iov_len += size;
+} else {
+f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
+f->iov[f->iovcnt++].iov_len = size;
+}
+}
+
 void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, int size)
 {
 if (f->last_error) {
@@ -634,8 +646,7 @@ void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, 
int size)
 abort();
 }
 
-f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
-f->iov[f->iovcnt++].iov_len = size;
+add_to_iovec(f, buf, size);
 
 f->is_write = 1;
 f->bytes_xfer += size;
@@ -690,8 +701,8 @@ void qemu_put_byte(QEMUFile *f, int v)
 f->buf[f->buf_index++] = v;
 f->is_write = 1;
 f->bytes_xfer++;
-f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1);
-f->iov[f->iovcnt++].iov_len = 1;
+
+add_to_iovec(f, f->buf + (f->buf_index - 1), 1);
 
 if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) {
 qemu_fflush(f);
-- 
1.7.11.7




[Qemu-devel] [PATCH v4 6/8] Add qemu_put_buffer_async

2013-03-21 Thread Orit Wasserman
This allow us to add a buffer to the iovec to send without copying it
into the static buffer, the buffer will be send later when qemu_fflush is 
called.

Signed-off-by: Orit Wasserman 
---
 include/migration/qemu-file.h |  5 +
 savevm.c  | 36 +++-
 2 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 8d3da9b..402408c 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -75,6 +75,11 @@ int qemu_fclose(QEMUFile *f);
 int64_t qemu_ftell(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
 void qemu_put_byte(QEMUFile *f, int v);
+/*
+ * put_buffer without copying the buffer.
+ * The buffer should be available till it is sent asynchronously.
+ */
+void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, int size);
 
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
diff --git a/savevm.c b/savevm.c
index 6c0cc7b..b024354 100644
--- a/savevm.c
+++ b/savevm.c
@@ -622,6 +622,29 @@ int qemu_fclose(QEMUFile *f)
 return ret;
 }
 
+void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, int size)
+{
+if (f->last_error) {
+return;
+}
+
+if (f->is_write == 0 && f->buf_index > 0) {
+fprintf(stderr,
+"Attempted to write to buffer while read buffer is not 
empty\n");
+abort();
+}
+
+f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
+f->iov[f->iovcnt++].iov_len = size;
+
+f->is_write = 1;
+f->bytes_xfer += size;
+
+if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) {
+qemu_fflush(f);
+}
+}
+
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size)
 {
 int l;
@@ -641,19 +664,14 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int 
size)
 if (l > size)
 l = size;
 memcpy(f->buf + f->buf_index, buf, l);
-f->iov[f->iovcnt].iov_base = f->buf + f->buf_index;
-f->iov[f->iovcnt++].iov_len = l;
 f->is_write = 1;
 f->buf_index += l;
-f->bytes_xfer += l;
+qemu_put_buffer_async(f, f->buf + (f->buf_index - l), l);
+if (qemu_file_get_error(f)) {
+break;
+}
 buf += l;
 size -= l;
-if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) {
-qemu_fflush(f);
-if (qemu_file_get_error(f)) {
-break;
-}
-}
 }
 }
 
-- 
1.7.11.7




[Qemu-devel] [PATCH v4 4/8] Store the data to send also in iovec

2013-03-21 Thread Orit Wasserman
All data is still copied into the static buffer.

Signed-off-by: Orit Wasserman 
Reviewed-by: Juan Quintela 
---
 savevm.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/savevm.c b/savevm.c
index bd60169..fb20f13 100644
--- a/savevm.c
+++ b/savevm.c
@@ -114,6 +114,7 @@ void qemu_announce_self(void)
 /* savevm/loadvm support */
 
 #define IO_BUF_SIZE 32768
+#define MAX_IOV_SIZE MIN(IOV_MAX, 64)
 
 struct QEMUFile {
 const QEMUFileOps *ops;
@@ -129,6 +130,9 @@ struct QEMUFile {
 int buf_size; /* 0 when writing */
 uint8_t buf[IO_BUF_SIZE];
 
+struct iovec iov[MAX_IOV_SIZE];
+unsigned int iovcnt;
+
 int last_error;
 };
 
@@ -528,6 +532,7 @@ static void qemu_fflush(QEMUFile *f)
 f->pos += f->buf_index;
 }
 f->buf_index = 0;
+f->iovcnt = 0;
 }
 if (ret < 0) {
 qemu_file_set_error(f, ret);
@@ -620,12 +625,14 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int 
size)
 if (l > size)
 l = size;
 memcpy(f->buf + f->buf_index, buf, l);
+f->iov[f->iovcnt].iov_base = f->buf + f->buf_index;
+f->iov[f->iovcnt++].iov_len = l;
 f->is_write = 1;
 f->buf_index += l;
 f->bytes_xfer += l;
 buf += l;
 size -= l;
-if (f->buf_index >= IO_BUF_SIZE) {
+if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) {
 qemu_fflush(f);
 if (qemu_file_get_error(f)) {
 break;
@@ -649,8 +656,10 @@ void qemu_put_byte(QEMUFile *f, int v)
 f->buf[f->buf_index++] = v;
 f->is_write = 1;
 f->bytes_xfer++;
+f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1);
+f->iov[f->iovcnt++].iov_len = 1;
 
-if (f->buf_index >= IO_BUF_SIZE) {
+if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) {
 qemu_fflush(f);
 }
 }
-- 
1.7.11.7




[Qemu-devel] [PATCH v4 5/8] Use writev ops if available

2013-03-21 Thread Orit Wasserman
Update qemu_fflush and stdio_close to use writev ops if they are available
Use the buffers stored in the iovec.

Signed-off-by: Orit Wasserman 
Reviewed-by: Juan Quintela 
---
 savevm.c | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/savevm.c b/savevm.c
index fb20f13..6c0cc7b 100644
--- a/savevm.c
+++ b/savevm.c
@@ -293,7 +293,7 @@ static int stdio_fclose(void *opaque)
 QEMUFileStdio *s = opaque;
 int ret = 0;
 
-if (s->file->ops->put_buffer) {
+if (s->file->ops->put_buffer || s->file->ops->writev_buffer) {
 int fd = fileno(s->stdio_file);
 struct stat st;
 
@@ -516,24 +516,40 @@ static void qemu_file_set_error(QEMUFile *f, int ret)
 }
 }
 
-/** Flushes QEMUFile buffer
+/**
+ * Flushes QEMUFile buffer
  *
+ * If there is writev_buffer QEMUFileOps it uses it otherwise uses
+ * put_buffer ops.
  */
 static void qemu_fflush(QEMUFile *f)
 {
 int ret = 0;
+int i = 0;
 
-if (!f->ops->put_buffer) {
+if (!f->ops->writev_buffer && !f->ops->put_buffer) {
 return;
 }
-if (f->is_write && f->buf_index > 0) {
-ret = f->ops->put_buffer(f->opaque, f->buf, f->pos, f->buf_index);
-if (ret >= 0) {
-f->pos += f->buf_index;
+
+if (f->is_write && f->iovcnt > 0) {
+if (f->ops->writev_buffer) {
+ret = f->ops->writev_buffer(f->opaque, f->iov, f->iovcnt);
+if (ret >= 0) {
+f->pos += ret;
+}
+} else {
+for (i = 0; i < f->iovcnt && ret >= 0; i++) {
+ret = f->ops->put_buffer(f->opaque, f->iov[i].iov_base, f->pos,
+ f->iov[i].iov_len);
+if (ret >= 0) {
+f->pos += ret;
+}
+}
 }
 f->buf_index = 0;
 f->iovcnt = 0;
 }
+
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
-- 
1.7.11.7




[Qemu-devel] [PATCH v4 3/8] Update bytes_xfer in qemu_put_byte

2013-03-21 Thread Orit Wasserman
Signed-off-by: Orit Wasserman 
Reviewed-by: Juan Quintela 
---
 savevm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/savevm.c b/savevm.c
index 6608b6e..bd60169 100644
--- a/savevm.c
+++ b/savevm.c
@@ -648,6 +648,8 @@ void qemu_put_byte(QEMUFile *f, int v)
 
 f->buf[f->buf_index++] = v;
 f->is_write = 1;
+f->bytes_xfer++;
+
 if (f->buf_index >= IO_BUF_SIZE) {
 qemu_fflush(f);
 }
-- 
1.7.11.7




[Qemu-devel] [PATCH v4 7/8] Use qemu_put_buffer_async for guest memory pages

2013-03-21 Thread Orit Wasserman
This will remove an unneeded copy of guest memory pages.
For the page header and device state we still copy the data to the
static buffer the other option is to allocate the memory on demand
which is more expensive.

Signed-off-by: Orit Wasserman 
---
 arch_init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch_init.c b/arch_init.c
index 98e2bc6..aa56a20 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -481,7 +481,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 /* XBZRLE overflow or normal page */
 if (bytes_sent == -1) {
 bytes_sent = save_block_hdr(f, block, offset, cont, 
RAM_SAVE_FLAG_PAGE);
-qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
+qemu_put_buffer_async(f, p, TARGET_PAGE_SIZE);
 bytes_sent += TARGET_PAGE_SIZE;
 acct_info.norm_pages++;
 }
-- 
1.7.11.7




[Qemu-devel] [PATCH v4 2/8] Add socket_writev_buffer function

2013-03-21 Thread Orit Wasserman
Signed-off-by: Orit Wasserman 
---
 savevm.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/savevm.c b/savevm.c
index 35c8d1e..6608b6e 100644
--- a/savevm.c
+++ b/savevm.c
@@ -39,6 +39,7 @@
 #include "qmp-commands.h"
 #include "trace.h"
 #include "qemu/bitops.h"
+#include "qemu/iov.h"
 
 #define SELF_ANNOUNCE_ROUNDS 5
 
@@ -171,6 +172,19 @@ static void coroutine_fn yield_until_fd_readable(int fd)
 qemu_coroutine_yield();
 }
 
+static int socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt)
+{
+QEMUFileSocket *s = opaque;
+ssize_t len;
+ssize_t size = iov_size(iov, iovcnt);
+
+len = iov_send(s->fd, iov, iovcnt, 0, size);
+if (len < size) {
+len = -socket_error();
+}
+return len;
+}
+
 static int socket_get_fd(void *opaque)
 {
 QEMUFileSocket *s = opaque;
@@ -387,6 +401,7 @@ static const QEMUFileOps socket_read_ops = {
 static const QEMUFileOps socket_write_ops = {
 .get_fd = socket_get_fd,
 .put_buffer = socket_put_buffer,
+.writev_buffer = socket_writev_buffer,
 .close =  socket_close
 };
 
-- 
1.7.11.7




[Qemu-devel] [PATCH v4 1/8] Add QemuFileWritevBuffer QemuFileOps

2013-03-21 Thread Orit Wasserman
This will allow us to write an iovec

Signed-off-by: Orit Wasserman 
Reviewed-by: Juan Quintela 
---
 include/migration/qemu-file.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index df81261..8d3da9b 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -51,11 +51,18 @@ typedef int (QEMUFileCloseFunc)(void *opaque);
  */
 typedef int (QEMUFileGetFD)(void *opaque);
 
+/*
+ * This function writes an iovec to file.
+ */
+typedef int (QEMUFileWritevBufferFunc)(void *opaque, struct iovec *iov,
+   int iovcnt);
+
 typedef struct QEMUFileOps {
 QEMUFilePutBufferFunc *put_buffer;
 QEMUFileGetBufferFunc *get_buffer;
 QEMUFileCloseFunc *close;
 QEMUFileGetFD *get_fd;
+QEMUFileWritevBufferFunc *writev_buffer;
 } QEMUFileOps;
 
 QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops);
-- 
1.7.11.7




[Qemu-devel] [PATCH v4 0/8] Migration: Remove copying of guest ram pages

2013-03-21 Thread Orit Wasserman
In migration all data is copied to a static buffer in QEMUFile,
this hurts our network bandwidth and CPU usage especially with large guests.
We switched to iovec for storing different buffers to send (even a byte field is
considered as a buffer) and use writev to send the iovec.
writev was chosen (as apposed to sendmsg) because it supprts non socket fds.
  
Guest memory pages are not copied by calling a new function 
qemu_put_buffer_no_copy.
The page header data and device state data are still copied into the static
buffer. This data consists of a lot of bytes and integer fields and the static
buffer is used to store it during batching.
Another improvement is changing qemu_putbe64/32/16 to create a single
buffer instead of several byte sized buffer.

git repository: git://github.com/oritwas/qemu.git sendv_v2

Changes from v3:
Use a variable for iov_size
Change f->bytes_xfer +=1 to f->bytes_xfer++
Remove unneeded "More optimized qemu_put_be64/32/16" patch
Move code to the right patch
Rename qemu_put_buffer_no_copy to qemu_put_buffer_async
Use function for updating the iovec that detect adjacent iovecs and coalesce
them.

Change from v2:
Always send data for the iovec even if writev_buffer is not implemented.
Coalesce adjacent iovecs to create one big buffer from small adjacent buffer.

Changes from v1:
Use iov_send for socket.
Make writev_buffer optional and if it is not implemented use put_buffer

Future work: Make number of iovec changeable

Orit Wasserman (8):
  Add QemuFileWritevBuffer QemuFileOps
  Add socket_writev_buffer function
  Update bytes_xfer in qemu_put_byte
  Store the data to send also in iovec
  Use writev ops if available
  Add qemu_put_buffer_async
  Use qemu_put_buffer_async for guest memory pages
  coalesce adjacent iovecs

 arch_init.c   |   2 +-
 include/migration/qemu-file.h |  12 +
 savevm.c  | 101 +++---
 3 files changed, 99 insertions(+), 16 deletions(-)

-- 
1.7.11.7




Re: [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs

2013-03-21 Thread Michael S. Tsirkin
On Thu, Mar 21, 2013 at 07:22:01PM +0100, Juan Quintela wrote:
> "Michael S. Tsirkin"  wrote:
> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> We can optimize at some pount to write a bigger/different header and
> >> sent a bunch of pages together, but just now we don't have that code.
> >> 
> >> Later, Juan.
> >
> > Sending the page can do vmsplice, can't it?
> > Multipage is likely a good idea anyway, e.g. RDMA wants to
> > do this too.
> 
> RDMA requires de pages lock into memory, no?
> 
> Later, Juan.

That's a separate issue, but the point is it is asynchronous.
It can be made to behave exactly the same implementing this API:

- send async: lock, send
- flush: poll for completion, unlock





[Qemu-devel] [PATCH v4 8/9] Update MAINTAINERS

2013-03-21 Thread Stefan Berger
Since I'm throwing all this code out there I'm also signing up to maintain it.
Send me your bug reports.

Cc: Michael Tsirkin 
Cc: Stefan Berger 
Signed-off-by: Joel Schopp 
---
 MAINTAINERS | 8 
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0ca7e1d..3e32ecb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -790,3 +790,11 @@ Stable 0.10
 L: qemu-sta...@nongnu.org
 T: git git://git.qemu.org/qemu-stable-0.10.git
 S: Orphan
+
+Visitors
+---
+ASN.1 BER
+M: Stefan Berger 
+S: Maintained
+F: /qapi/ber*
+F: include/qapi/ber*
-- 
1.7.11.7




[Qemu-devel] [PATCH v4 1/9] Move some contents of savevm.c to qemu-file.c

2013-03-21 Thread Stefan Berger
This patch reorganizes qemu file operations to be in their own source file
instead of being lumped in savevm.c.  Besides being more logical for maintenance
it also makes it easier for future users of the file functions to add tests.

Cc: pbonz...@redhat.com
Cc: Michael Tsirkin 
Signed-off-by: Stefan Berger 
Signed-off-by: Joel Schopp 
---
 include/migration/qemu-file.h |   6 +
 savevm.c  | 690 
 util/Makefile.objs|   1 +
 util/qemu-file.c  | 715 ++
 4 files changed, 722 insertions(+), 690 deletions(-)
 create mode 100644 util/qemu-file.c

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index df81261..7194b84 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -98,6 +98,12 @@ void qemu_file_reset_rate_limit(QEMUFile *f);
 void qemu_file_set_rate_limit(QEMUFile *f, int64_t new_rate);
 int64_t qemu_file_get_rate_limit(QEMUFile *f);
 int qemu_file_get_error(QEMUFile *f);
+void qemu_file_set_error(QEMUFile *f, int ret);
+QEMUFile *qemu_fopen_bdrv(BlockDriverState *bs, int is_writable);
+int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset);
+int qemu_peek_byte(QEMUFile *f, int offset);
+void qemu_file_skip(QEMUFile *f, int size);
+void qemu_fflush(QEMUFile *f);
 
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
 {
diff --git a/savevm.c b/savevm.c
index 35c8d1e..8d78c3c 100644
--- a/savevm.c
+++ b/savevm.c
@@ -109,696 +109,6 @@ void qemu_announce_self(void)
qemu_announce_self_once(&timer);
 }
 
-/***/
-/* savevm/loadvm support */
-
-#define IO_BUF_SIZE 32768
-
-struct QEMUFile {
-const QEMUFileOps *ops;
-void *opaque;
-int is_write;
-
-int64_t bytes_xfer;
-int64_t xfer_limit;
-
-int64_t pos; /* start of buffer when writing, end of buffer
-when reading */
-int buf_index;
-int buf_size; /* 0 when writing */
-uint8_t buf[IO_BUF_SIZE];
-
-int last_error;
-};
-
-typedef struct QEMUFileStdio
-{
-FILE *stdio_file;
-QEMUFile *file;
-} QEMUFileStdio;
-
-typedef struct QEMUFileSocket
-{
-int fd;
-QEMUFile *file;
-} QEMUFileSocket;
-
-typedef struct {
-Coroutine *co;
-int fd;
-} FDYieldUntilData;
-
-static void fd_coroutine_enter(void *opaque)
-{
-FDYieldUntilData *data = opaque;
-qemu_set_fd_handler(data->fd, NULL, NULL, NULL);
-qemu_coroutine_enter(data->co, NULL);
-}
-
-/**
- * Yield until a file descriptor becomes readable
- *
- * Note that this function clobbers the handlers for the file descriptor.
- */
-static void coroutine_fn yield_until_fd_readable(int fd)
-{
-FDYieldUntilData data;
-
-assert(qemu_in_coroutine());
-data.co = qemu_coroutine_self();
-data.fd = fd;
-qemu_set_fd_handler(fd, fd_coroutine_enter, NULL, &data);
-qemu_coroutine_yield();
-}
-
-static int socket_get_fd(void *opaque)
-{
-QEMUFileSocket *s = opaque;
-
-return s->fd;
-}
-
-static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
-{
-QEMUFileSocket *s = opaque;
-ssize_t len;
-
-for (;;) {
-len = qemu_recv(s->fd, buf, size, 0);
-if (len != -1) {
-break;
-}
-if (socket_error() == EAGAIN) {
-yield_until_fd_readable(s->fd);
-} else if (socket_error() != EINTR) {
-break;
-}
-}
-
-if (len == -1) {
-len = -socket_error();
-}
-return len;
-}
-
-static int socket_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, 
int size)
-{
-QEMUFileSocket *s = opaque;
-ssize_t len;
-
-len = qemu_send_full(s->fd, buf, size, 0);
-if (len < size) {
-len = -socket_error();
-}
-return len;
-}
-
-static int socket_close(void *opaque)
-{
-QEMUFileSocket *s = opaque;
-closesocket(s->fd);
-g_free(s);
-return 0;
-}
-
-static int stdio_get_fd(void *opaque)
-{
-QEMUFileStdio *s = opaque;
-
-return fileno(s->stdio_file);
-}
-
-static int stdio_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, int 
size)
-{
-QEMUFileStdio *s = opaque;
-return fwrite(buf, 1, size, s->stdio_file);
-}
-
-static int stdio_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
-{
-QEMUFileStdio *s = opaque;
-FILE *fp = s->stdio_file;
-int bytes;
-
-for (;;) {
-clearerr(fp);
-bytes = fread(buf, 1, size, fp);
-if (bytes != 0 || !ferror(fp)) {
-break;
-}
-if (errno == EAGAIN) {
-yield_until_fd_readable(fileno(fp));
-} else if (errno != EINTR) {
-break;
-}
-}
-return bytes;
-}
-
-static int stdio_pclose(void *opaque)
-{
-QEMUFileStdio *s = opaque;
-int ret;
-ret = pclose(s->stdio_file);
-if (ret == -1) {
-ret = -errno;
-} else if (!WIFE

[Qemu-devel] [PATCH v4 4/9] QAPI: add type_sized_buffer

2013-03-21 Thread Stefan Berger
Add a sized buffer interface to qapi for serializing and
deserializing of u8[], u16[], u32[] and u64[] with proper
handling of endianess.

Cc: Michael Roth 
Cc: Michael Tsirkin 
Signed-off-by: Stefan Berger 
Signed-off-by: Joel Schopp 
---
 include/qapi/visitor-impl.h | 3 +++
 include/qapi/visitor.h  | 3 +++
 qapi/qapi-visit-core.c  | 8 
 3 files changed, 14 insertions(+)

diff --git a/include/qapi/visitor-impl.h b/include/qapi/visitor-impl.h
index 5159964..be4e5ab 100644
--- a/include/qapi/visitor-impl.h
+++ b/include/qapi/visitor-impl.h
@@ -34,6 +34,9 @@ struct Visitor
 void (*type_str)(Visitor *v, char **obj, const char *name, Error **errp);
 void (*type_number)(Visitor *v, double *obj, const char *name,
 Error **errp);
+void (*type_sized_buffer)(Visitor *v, void **obj, const char *name,
+  size_t elem_count, size_t elem_size,
+  Error **errp);
 
 /* May be NULL */
 void (*start_optional)(Visitor *v, bool *present, const char *name,
diff --git a/include/qapi/visitor.h b/include/qapi/visitor.h
index 1fef18c..66ba4bf 100644
--- a/include/qapi/visitor.h
+++ b/include/qapi/visitor.h
@@ -51,5 +51,8 @@ void visit_type_size(Visitor *v, uint64_t *obj, const char 
*name, Error **errp);
 void visit_type_bool(Visitor *v, bool *obj, const char *name, Error **errp);
 void visit_type_str(Visitor *v, char **obj, const char *name, Error **errp);
 void visit_type_number(Visitor *v, double *obj, const char *name, Error 
**errp);
+void visit_type_sized_buffer(Visitor *v, void **obj, const char *name,
+ size_t elem_counter, size_t elem_size,
+ Error **errp);
 
 #endif
diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c
index 401ee6e..374c0ff 100644
--- a/qapi/qapi-visit-core.c
+++ b/qapi/qapi-visit-core.c
@@ -313,3 +313,11 @@ void input_type_enum(Visitor *v, int *obj, const char 
*strings[],
 g_free(enum_str);
 *obj = value;
 }
+
+void visit_type_sized_buffer(Visitor *v, void **obj, const char *name,
+ size_t elem_count, size_t elem_size, Error **errp)
+{
+if (!error_is_set(errp)) {
+v->type_sized_buffer(v, obj, name, elem_count, elem_size, errp);
+}
+}
-- 
1.7.11.7




[Qemu-devel] [PATCH v4 2/9] 3 new file wrappers

2013-03-21 Thread Stefan Berger
Add 3 very short file wrapper functions to make code that follows more
readable.

Cc: Michael Tsirkin 
Signed-off-by: Stefan Berger 
Signed-off-by: Joel Schopp 
---
 include/migration/qemu-file.h |  3 +++
 util/qemu-file.c  | 30 ++
 2 files changed, 33 insertions(+)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 7194b84..728b6e2 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -68,6 +68,9 @@ int qemu_fclose(QEMUFile *f);
 int64_t qemu_ftell(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
 void qemu_put_byte(QEMUFile *f, int v);
+int qemu_read_bytes(QEMUFile *f, uint8_t *buf, int size);
+int qemu_peek_bytes(QEMUFile *f, uint8_t *buf, int size, size_t offset);
+int qemu_write_bytes(QEMUFile *f, const uint8_t *buf, int size);
 
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
diff --git a/util/qemu-file.c b/util/qemu-file.c
index 4fed6d5..f8a54e7 100644
--- a/util/qemu-file.c
+++ b/util/qemu-file.c
@@ -713,3 +713,33 @@ uint64_t qemu_get_be64(QEMUFile *f)
 return v;
 }
 
+int qemu_read_bytes(QEMUFile *f, uint8_t *buf, int size)
+{
+if (qemu_file_get_error(f)) {
+return -1;
+}
+return qemu_get_buffer(f, buf, size);
+}
+
+int qemu_peek_bytes(QEMUFile *f, uint8_t *buf, int size, size_t offset)
+{
+if (qemu_file_get_error(f)) {
+return -1;
+}
+return qemu_peek_buffer(f, buf, size, offset);
+}
+
+int qemu_write_bytes(QEMUFile *f, const uint8_t *buf, int size)
+{
+if (qemu_file_get_error(f)) {
+return -1;
+}
+
+qemu_put_buffer(f, buf, size);
+
+if (qemu_file_get_error(f)) {
+return -1;
+}
+
+return size;
+}
-- 
1.7.11.7




[Qemu-devel] [PATCH v4 0/9] Subject: Implement and test ASN.1 BER visitors

2013-03-21 Thread Stefan Berger
This patch series implements ASN.1 BER visitors for encoding and decoding
of data into byte streams.

Stefan Berger (9):
  Move some contents of savevm.c to qemu-file.c
  3 new file wrappers
  QEMUSizedBuffer
  QAPI: add type_sized_buffer
  ASN.1 output visitor
  ASN.1 input visitor
  Extend test-visitor-serialization with ASN.1 visitor(s)
  Update MAINTAINERS
  ASN.1 specific test cases

 MAINTAINERS|8 +
 configure  |2 +-
 include/migration/qemu-file.h  |   22 +
 include/qapi/ber-input-visitor.h   |   30 +
 include/qapi/ber-output-visitor.h  |   28 +
 include/qapi/ber.h |  107 
 include/qapi/visitor-impl.h|3 +
 include/qapi/visitor.h |3 +
 include/qemu-common.h  |   14 +
 qapi/Makefile.objs |1 +
 qapi/ber-common.c  |   86 +++
 qapi/ber-common.h  |   29 +
 qapi/ber-input-visitor.c   | 1141 +
 qapi/ber-output-visitor.c  |  673 
 qapi/qapi-visit-core.c |8 +
 savevm.c   |  690 
 tests/Makefile |   11 +-
 tests/test-ber-visitor.c   |  746 ++
 tests/test-visitor-serialization.c |   73 +++
 util/Makefile.objs |1 +
 util/qemu-file.c   | 1214 
 21 files changed, 4198 insertions(+), 692 deletions(-)
 create mode 100644 include/qapi/ber-input-visitor.h
 create mode 100644 include/qapi/ber-output-visitor.h
 create mode 100644 include/qapi/ber.h
 create mode 100644 qapi/ber-common.c
 create mode 100644 qapi/ber-common.h
 create mode 100644 qapi/ber-input-visitor.c
 create mode 100644 qapi/ber-output-visitor.c
 create mode 100644 tests/test-ber-visitor.c
 create mode 100644 util/qemu-file.c

-- 
1.7.11.7




Re: [Qemu-devel] [PATCH v3 8/9] Use qemu_put_buffer_no_copy for guest memory pages

2013-03-21 Thread Orit Wasserman
On 03/21/2013 07:37 PM, Juan Quintela wrote:
> Orit Wasserman  wrote:
>> This will remove an unneeded copy of guest memory pages.
>> For the page header and device state we still copy the data to the
>> static buffer the other option is to allocate the memory on demand
>> which is more expensive.
>>
>> Signed-off-by: Orit Wasserman 
>> ---
>>  arch_init.c | 2 +-
>>  savevm.c| 2 +-
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch_init.c b/arch_init.c
>> index 98e2bc6..27b53eb 100644
>> --- a/arch_init.c
>> +++ b/arch_init.c
>> @@ -481,7 +481,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>>  /* XBZRLE overflow or normal page */
>>  if (bytes_sent == -1) {
>>  bytes_sent = save_block_hdr(f, block, offset, cont, 
>> RAM_SAVE_FLAG_PAGE);
>> -qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
>> +qemu_put_buffer_no_copy(f, p, TARGET_PAGE_SIZE);
>>  bytes_sent += TARGET_PAGE_SIZE;
>>  acct_info.norm_pages++;
>>  }
> 
> Once here, shouldn't we also change:
> 
> block-migration.c::blk_send()
> 
> qemu_put_buffer(f, blk->buf, BLOCK_SIZE);
> 
> to nocopy?
Sadly no, I looked at the code and I saw the buffer is freed immediately
after call blk_send :(. look at mig_save_device_dirty.

Orit
> 
> Again, this can be an additional patch.
> 
> 
>> diff --git a/savevm.c b/savevm.c
>> index fbfb9e3..50e8fb2 100644
>> --- a/savevm.c
>> +++ b/savevm.c
>> @@ -634,7 +634,7 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t 
>> *buf, int size)
>>  abort();
>>  }
>>  
>> -f->iov[f->iovcnt].iov_base = f->buf + f->buf_index;
>> +f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
>>  f->iov[f->iovcnt++].iov_len = size;
> 
> This bit should be in the previous patch, the error that I pointed?
> 
> Later, Juan.
> 




Re: [Qemu-devel] [PATCH 22/23] gtk: show a window for each graphical QemuConsole

2013-03-21 Thread Anthony Liguori
Gerd Hoffmann  writes:

>   Hi,
>
> If so, we're going to need to model what the hardware actually
> does, which is that there's a single connection on the back
> of the box for a monitor, and it's guest software controllable
> which of the two display devices is routed to the connection...

 How does this work in 1.4?
>>>
>>> I guess the second display device was never ever shown anywhere?
>> 
>> Correct. We rely on "last display device wins" plus the fact this
>> happens to match up with the device Linux chooses for display.
>> This is obviously not great but up til now QEMU hasn't actually
>> supported multiple display devices so I haven't worried about it.
>
> Ok.
>
> I think the most sensible way to handle this is to implement the output
> routing device, make it own the (single) QemuConsole, and depending on
> the router state the one or the other display device is allowed to
> render to the QemuConsole.

Where does the switching happen in hardware?  Is this two devices with a
DVI port with a switch on it to have a single output port or it is
something more sophisticated where there are two memory regions and a
register is used to select which one is written out?

Regards,

Anthony Liguori




Re: [Qemu-devel] [PATCH] qemu-ga: use key-value store to avoid recycling fd handles after restart

2013-03-21 Thread Luiz Capitulino
On Thu, 21 Mar 2013 11:13:13 -0500
mdroth  wrote:

> > Looks like you guys have no *practical* problems to solve.  Congrats!
> > Take a vacation!  Please report back no later than 275 years from now,
> > to make sure this 64 bit fd counter overflow problem gets taken care of
> > in time.  ;-P
> > 
> 
> Haha, well, I didn't want to be that one lazy developer who brings about
> the downfall of future human civilization... but if it's a really big
> deal they'll probably send someone back from the future to let me know,
> so maybe I'm jumping the gun a bit :)

I *am* that guy, but I was afraid to tell :)

> I just didn't want to introduce a new interface that relied on
> interfaces that were planned for deprecation in the *long*-term, but i
> think you're right, it's too much hassle for current users for too
> little gain, and there's plenty of time to do it in the future so I'll
> hold off on it for now.

Let me clarify it: when I read the code I didn't realize fd_counter
would never wrap. I think this discussion is settled now. However, I
still think that having an assert there is good practice.

I can post a patch myself.



Re: [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs

2013-03-21 Thread Juan Quintela
"Michael S. Tsirkin"  wrote:

>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> We can optimize at some pount to write a bigger/different header and
>> sent a bunch of pages together, but just now we don't have that code.
>> 
>> Later, Juan.
>
> Sending the page can do vmsplice, can't it?
> Multipage is likely a good idea anyway, e.g. RDMA wants to
> do this too.

RDMA requires de pages lock into memory, no?

Later, Juan.



Re: [Qemu-devel] [PATCH 1/2] char: add qemu_chr_be_is_fe_connected

2013-03-21 Thread Anthony Liguori
Alon Levy  writes:

> Note that the handler is called chr_is_guest_connected and not
> chr_is_fe_connected, consistent with other members of CharDriverState.

Sorry, I don't get it.

There isn't a notion of "connected" for the front-ends in the char
layer.  The closest thing is whether add_handlers() have been called or
not.

I really dislike the idea of introduction a new concept to the char
layer in a half baked way.

Why can't migration notifiers be used for this?  I think Gerd objected
to using a migration *handler* but not necessarily a state notifier.

Regards,

Anthony Liguori

>
> Signed-off-by: Alon Levy 
> ---
>  hw/virtio-console.c |  9 +
>  include/char/char.h | 11 +++
>  qemu-char.c |  9 +
>  3 files changed, 29 insertions(+)
>
> diff --git a/hw/virtio-console.c b/hw/virtio-console.c
> index e2d1c58..643e24e 100644
> --- a/hw/virtio-console.c
> +++ b/hw/virtio-console.c
> @@ -120,6 +120,13 @@ static void chr_event(void *opaque, int event)
>  }
>  }
>  
> +static bool chr_is_guest_connected(void *opaque)
> +{
> +VirtConsole *vcon = opaque;
> +
> +return vcon->port.guest_connected;
> +}
> +
>  static int virtconsole_initfn(VirtIOSerialPort *port)
>  {
>  VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
> @@ -133,6 +140,8 @@ static int virtconsole_initfn(VirtIOSerialPort *port)
>  if (vcon->chr) {
>  qemu_chr_add_handlers(vcon->chr, chr_can_read, chr_read, chr_event,
>vcon);
> +/* only user of chr_is_guest_connected so leave it as special cased*/
> +vcon->chr->chr_is_guest_connected = chr_is_guest_connected;
>  }
>  
>  return 0;
> diff --git a/include/char/char.h b/include/char/char.h
> index 0326b2a..b41ddc0 100644
> --- a/include/char/char.h
> +++ b/include/char/char.h
> @@ -52,6 +52,7 @@ typedef struct {
>  #define CHR_TIOCM_RTS0x004
>  
>  typedef void IOEventHandler(void *opaque, int event);
> +typedef bool IOIsGuestConnectedHandler(void *opaque);
>  
>  struct CharDriverState {
>  void (*init)(struct CharDriverState *s);
> @@ -64,6 +65,7 @@ struct CharDriverState {
>  IOEventHandler *chr_event;
>  IOCanReadHandler *chr_can_read;
>  IOReadHandler *chr_read;
> +IOIsGuestConnectedHandler *chr_is_guest_connected;
>  void *handler_opaque;
>  void (*chr_close)(struct CharDriverState *chr);
>  void (*chr_accept_input)(struct CharDriverState *chr);
> @@ -229,6 +231,15 @@ void qemu_chr_be_write(CharDriverState *s, uint8_t *buf, 
> int len);
>   */
>  void qemu_chr_be_event(CharDriverState *s, int event);
>  
> +/**
> + * @qemu_chr_be_is_fe_connected:
> + *
> + * Back end calls this to check if the front end is connected.
> + *
> + * Returns: true if the guest (front end) is connected, false otherwise.
> + */
> +bool qemu_chr_be_is_fe_connected(CharDriverState *s);
> +
>  void qemu_chr_add_handlers(CharDriverState *s,
> IOCanReadHandler *fd_can_read,
> IOReadHandler *fd_read,
> diff --git a/qemu-char.c b/qemu-char.c
> index 4e011df..77a501a 100644
> --- a/qemu-char.c
> +++ b/qemu-char.c
> @@ -120,6 +120,15 @@ void qemu_chr_be_event(CharDriverState *s, int event)
>  s->chr_event(s->handler_opaque, event);
>  }
>  
> +bool qemu_chr_be_is_fe_connected(CharDriverState *s)
> +{
> +if (s->chr_is_guest_connected) {
> +return s->chr_is_guest_connected(s->handler_opaque);
> +}
> +/* default to always connected */
> +return true;
> +}
> +
>  static gboolean qemu_chr_generic_open_bh(gpointer opaque)
>  {
>  CharDriverState *s = opaque;
> -- 
> 1.8.1.4




Re: [Qemu-devel] [PATCH] rdma: don't make pages writeable if not requiested

2013-03-21 Thread Michael S. Tsirkin
On Thu, Mar 21, 2013 at 11:11:15AM -0600, Jason Gunthorpe wrote:
> On Thu, Mar 21, 2013 at 11:39:47AM +0200, Michael S. Tsirkin wrote:
> > On Thu, Mar 21, 2013 at 02:13:38AM -0700, Roland Dreier wrote:
> > > On Thu, Mar 21, 2013 at 1:51 AM, Michael S. Tsirkin  
> > > wrote:
> > > >> In that case, no, I don't see any reason for LOCAL_WRITE, since the
> > > >> only RDMA operations that will access this memory are remote reads.
> > > >
> > > > What is the meaning of LOCAL_WRITE then? There are no local
> > > > RDMA writes as far as I can see.
> > > 
> > > Umm, it means you're giving the local adapter permission to write to
> > > that memory.  So you can use it as a receive buffer or as the target
> > > for remote data from an RDMA read operation.
> > 
> > Well RDMA read has it's own flag, IB_ACCESS_REMOTE_READ.
> > I don't see why do you need to give adapter permission
> 
> The access flags have to do with what independent access remote nodes
> get. There are four major cases:
> 
> access = IBV_ACCESS_REMOTE_READ says the adaptor will let remote nodes
> read the memory.
> 
> access = 0 (ie IBV_ACCESS_LOCAL_READ) says that only the adaptor, under
> the direct control of the application, can read this memory. Remote
> nodes are barred.
> 
> access = IBV_ACCESS_REMOTE_WRITE|IBV_ACCESS_LOCAL_WRITE says the adaptor
> will let remote nodes write the memory
> 
> access = IBV_ACCESS_LOCAL_WRITE bars remote nodes from writing to that
> memory. Only the adaptor, under the direct control of the application,
> can write the memory.

This is the one I find redundant. Since the write will be done
by the adaptor under direct control by the application,
why does it make sense to declare this beforehand?
If you don't want to allow local write access to memory,
just do not post any receive WRs with this address.
If you posted and regret it, reset the QP to cancel.

But IB spec specified LOCAL_WRITE in this redundant way so I guess
applications expect it to have the semantics defined there, I just
didn't remember what they are.

No way around it then, need another flag.



Re: [Qemu-devel] [PATCHv3 3/9] buffer_is_zero: use vector optimizations if possible

2013-03-21 Thread Eric Blake
On 03/21/2013 09:57 AM, Peter Lieven wrote:
> performance gain on SSE2 is approx. 20-25%. altivec
> is not tested. performance for unsigned long arithmetic
> is unchanged.
> 
> Signed-off-by: Peter Lieven 
> ---
>  util/cutils.c |5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/util/cutils.c b/util/cutils.c
> index 6d079ac..52205a2 100644
> --- a/util/cutils.c
> +++ b/util/cutils.c
> @@ -210,6 +210,11 @@ bool buffer_is_zero(const void *buf, size_t len)
>  long d0, d1, d2, d3;
>  const long * const data = buf;
>  
> +/* use vector optimized zero check if possible */
> +if (can_use_buffer_find_nonzero_offset(buf,len)) {

Space after comma.

> +return buffer_find_nonzero_offset(buf, len)==len;

And still missing spaces around the '==', even though I pointed it out
in v2.  Run your series through checkpatch.pl.

As whitespace cleanups are trivial, you can send v4 with:

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCHv3 2/9] cutils: add a function to find non-zero content in a buffer

2013-03-21 Thread Eric Blake
On 03/21/2013 09:57 AM, Peter Lieven wrote:
> this adds buffer_find_nonzero_offset() which is a SSE2/Altives

s/Altives/Altivec/

> optimized function that searches for non-zero content in a
> buffer.
> 
> due to the optimizations used in the function there are restrictions
> on buffer address and search length. the function
> can_use_buffer_find_nonzero_content() can be used to check if
> the function can be used safely.
> 
> Signed-off-by: Peter Lieven 
> ---
>  include/qemu-common.h |3 +++
>  util/cutils.c |   50 
> +
>  2 files changed, 53 insertions(+)

> +inline bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len);
> +inline size_t buffer_find_nonzero_offset(const void *buf, size_t len);

Ouch.  It is okay to add a 'static inline' function, but then the
implementation must live in this header.  Otherwise, the function must
not be inline, or you risk linker errors.

> +++ b/util/cutils.c
> @@ -143,6 +143,56 @@ int qemu_fdatasync(int fd)
>  }
>  
>  /*
> + * Searches for an area with non-zero content in a buffer
> + *
> + * Attention! The len must be a multiple of 8 * sizeof(VECTYPE) 

Should we call out BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR instead of a
magic number here?  But I'm okay with leaving it as-is.

> + * and addr must be a multiple of sizeof(VECTYPE) due to 

Trailing whitespace (here, and on several other lines).  Please run your
series through scripts/checkpatch.pl before submitting v4.

> + * restriction of optimizations in this function.
> + * 
> + * can_use_buffer_find_nonzero_offset() can be used to check
> + * these requirements.
> + * 
> + * The return value is the offset of the non-zero area rounded
> + * down to 8 * sizeof(VECTYPE). If the buffer is all zero 

Same comment on this use of '8'.

> + * the return value is equal to len.
> + */
> +
> +inline size_t buffer_find_nonzero_offset(const void *buf, size_t len)

s/inline// (or move it to a 'static inline' definition in the .h)

> +{
> +VECTYPE *p = (VECTYPE *)buf;
> +VECTYPE zero = ZERO_SPLAT;
> +size_t i;
> +

You copied the 'Attention! ...' message from buffer_is_zero, which
currently asserts that its condition is held.  Therefore, consistency
would argue that you should assert your preconditions here, even if it
adds more to the code size.  But this is something where a maintainer
might have a better opinion on whether to keep the code robust with an
assert(), or whether the faster operation without sanity checking is
more appropriate (in which case a followup to remove the assert from
buffer_is_zero would make sense).

>   * Checks if a buffer is all zeroes
>   *
>   * Attention! The len must be a multiple of 4 * sizeof(long) due to
> 

Cleaning up whitespace is trivial; but the incorrect use of 'inline'
requires a v4.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH] ifname=xxx for -netdev bridge

2013-03-21 Thread Alexandre Kandalintsev
Hi!


Here is the patch that allows us to specify the name of tap interface
when -netdev bridge is used. It's like -netdev tap,ifname=xxx, but for
bridges.


** Motivation **

We've got zillions of VMs and would like to see meaningful names of tap
interfaces. This is really useful for for, e.g., system administrators
in case they want to run tcpdump on it.


** How it works **

Just specify a ifname= parameter as it is done if --netdev tap is used.
However, as it requires root privs, the interface renaming is
actually done by qemu-bridge-helper. --netdev tap,ifname=xxx will fail
if qemu is launched not from root.


** TODO **

1. Update docs
2. I'm afraid that net_init_tap should not run helper with
--br=DEFAULT_BRIDGE_INTERFACE . At least bridge name should be tunnable.
But this is a future work.
3. May be we should call qemu-bridge-helper for tap interface renamings
because it always has root privs?


From 079027fe3696de2e2adc8e60377b995dd9548eac Mon Sep 17 00:00:00 2001
From: Alexandre Kandalintsev 
Date: Thu, 21 Mar 2013 18:48:12 +0100
Subject: [PATCH] added support for ifname in -netdev bridge


Signed-off-by: Alexandre Kandalintsev 
---
 include/net/net.h|  1 +
 net/tap.c| 25 -
 qapi-schema.json |  1 +
 qemu-bridge-helper.c | 12 ++--
 4 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index cb049a1..4b25eb5 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -172,6 +172,7 @@ NetClientState *net_hub_port_find(int hub_id);
 #define DEFAULT_NETWORK_DOWN_SCRIPT "/etc/qemu-ifdown"
 #define DEFAULT_BRIDGE_HELPER CONFIG_QEMU_HELPERDIR "/qemu-bridge-helper"
 #define DEFAULT_BRIDGE_INTERFACE "br0"
+#define DEFAULT_BRIDGE_IFNAME "tap%d"
 
 void qdev_set_nic_properties(DeviceState *dev, NICInfo *nd);
 
diff --git a/net/tap.c b/net/tap.c
index daab350..d2e4bfc 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -424,11 +424,11 @@ static int recv_fd(int c)
 return len;
 }
 
-static int net_bridge_run_helper(const char *helper, const char *bridge)
+static int net_bridge_run_helper(const char *helper, const char *bridge, const 
char *ifname)
 {
 sigset_t oldmask, mask;
 int pid, status;
-char *args[5];
+char *args[6];
 char **parg;
 int sv[2];
 
@@ -446,6 +446,7 @@ static int net_bridge_run_helper(const char *helper, const 
char *bridge)
 int open_max = sysconf(_SC_OPEN_MAX), i;
 char fd_buf[6+10];
 char br_buf[6+IFNAMSIZ] = {0};
+char ifname_buf[9+IFNAMSIZ] = {0};
 char helper_cmd[PATH_MAX + sizeof(fd_buf) + sizeof(br_buf) + 15];
 
 for (i = 0; i < open_max; i++) {
@@ -459,6 +460,10 @@ static int net_bridge_run_helper(const char *helper, const 
char *bridge)
 
 snprintf(fd_buf, sizeof(fd_buf), "%s%d", "--fd=", sv[1]);
 
+if (ifname) {
+  snprintf(ifname_buf, sizeof(ifname_buf), "%s%s", "--ifname=", 
ifname);
+}
+
 if (strrchr(helper, ' ') || strrchr(helper, '\t')) {
 /* assume helper is a command */
 
@@ -466,8 +471,8 @@ static int net_bridge_run_helper(const char *helper, const 
char *bridge)
 snprintf(br_buf, sizeof(br_buf), "%s%s", "--br=", bridge);
 }
 
-snprintf(helper_cmd, sizeof(helper_cmd), "%s %s %s %s",
- helper, "--use-vnet", fd_buf, br_buf);
+snprintf(helper_cmd, sizeof(helper_cmd), "%s %s %s %s %s",
+ helper, "--use-vnet", fd_buf, br_buf, ifname_buf);
 
 parg = args;
 *parg++ = (char *)"sh";
@@ -486,6 +491,7 @@ static int net_bridge_run_helper(const char *helper, const 
char *bridge)
 *parg++ = (char *)"--use-vnet";
 *parg++ = fd_buf;
 *parg++ = br_buf;
+*parg++ = ifname_buf;
 *parg++ = NULL;
 
 execv(helper, args);
@@ -524,7 +530,7 @@ int net_init_bridge(const NetClientOptions *opts, const 
char *name,
 NetClientState *peer)
 {
 const NetdevBridgeOptions *bridge;
-const char *helper, *br;
+const char *helper, *br, *ifname;
 
 TAPState *s;
 int fd, vnet_hdr;
@@ -534,8 +540,9 @@ int net_init_bridge(const NetClientOptions *opts, const 
char *name,
 
 helper = bridge->has_helper ? bridge->helper : DEFAULT_BRIDGE_HELPER;
 br = bridge->has_br ? bridge->br : DEFAULT_BRIDGE_INTERFACE;
+ifname = bridge->has_ifname ? bridge->ifname : DEFAULT_BRIDGE_IFNAME;
 
-fd = net_bridge_run_helper(helper, br);
+fd = net_bridge_run_helper(helper, br, ifname);
 if (fd == -1) {
 return -1;
 }
@@ -686,11 +693,12 @@ int net_init_tap(const NetClientOptions *opts, const char 
*name,
 const char *script = NULL; /* suppress wrong "uninit'd use" gcc warning */
 const char *downscript = NULL;
 const char *vhostfdname;
+const char *br;
 char ifname[128];
 
 assert(opts->kind == NET_CLIENT_OPTIONS_KIND_TAP);
 tap

Re: [Qemu-devel] [PATCH] rdma: don't make pages writeable if not requiested

2013-03-21 Thread Michael S. Tsirkin
On Thu, Mar 21, 2013 at 11:57:32AM -0600, Jason Gunthorpe wrote:
> On Thu, Mar 21, 2013 at 07:42:37PM +0200, Michael S. Tsirkin wrote:
> 
> > It doesn't actually, and our app would sometimes write to these pages.
> > It simply does not care which version does the remote get in this case
> > since we track writes and resend later.
> 
> Heh, somehow I thought you might say that :)
> 
> A new flag seems like the only way then - maybe:
>   IBV_ACCESS_NON_COHERENT - The adaptor and the CPU do not share
>  a coherent view of registered memory. Memory writes from the CPU
>  after ibv_reg_mr completes may not be reflected in the memory
>  viewed by the adaptor.
> 
>  Can only be combined with read only access permissions.
> 
> Jason

I kind of like _GIFT for name, gifts are nice :) But yes that's exactly
the semantics we need.

-- 
MST



[Qemu-devel] [PATCH] linux-user: change do_semop to return target errno when unsuccessful

2013-03-21 Thread Petar Jovanovic
From: Petar Jovanovic 

do_semop() is called from two places, and one of these fails to convert
return error to target errno when semop fails. This patch changes the
function to always return target errno in case of an unsuccessful call.

Signed-off-by: Petar Jovanovic 
---
 linux-user/syscall.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index ee82a2d..3c4c155 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -2764,7 +2764,7 @@ static inline abi_long do_semop(int semid, abi_long ptr, 
unsigned nsops)
 if (target_to_host_sembuf(sops, ptr, nsops))
 return -TARGET_EFAULT;
 
-return semop(semid, sops, nsops);
+return get_errno(semop(semid, sops, nsops));
 }
 
 struct target_msqid_ds
@@ -6957,7 +6957,7 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
 #endif
 #ifdef TARGET_NR_semop
 case TARGET_NR_semop:
-ret = get_errno(do_semop(arg1, arg2, arg3));
+ret = do_semop(arg1, arg2, arg3);
 break;
 #endif
 #ifdef TARGET_NR_semctl
-- 
1.7.9.5




Re: [Qemu-devel] [PATCH] rdma: don't make pages writeable if not requiested

2013-03-21 Thread Michael S. Tsirkin
On Thu, Mar 21, 2013 at 11:21:50AM -0600, Jason Gunthorpe wrote:
> On Thu, Mar 21, 2013 at 07:15:25PM +0200, Michael S. Tsirkin wrote:
> 
> > No because application does this:
> > init page
> > 
> > ...
> > 
> > after a lot of time
> > 
> > ..
> > 
> > register
> > send
> > unregister
> > 
> > so it can not be read only.
> 
> mprotect(READONLY)
> register
> send
> unregister
> mprotect(WRITABLE)
> 
> ?
> With something like GIFT the app already has to give up writing to the
> pages while they are moving, so changing the protection seems in line
> with that?
> 
> Jason

It doesn't actually, and our app would sometimes write to these pages.
It simply does not care which version does the remote get in this case
since we track writes and resend later.

Also this is per-page, MRs have byte granularity so easier to use.

-- 
MST



Re: [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs

2013-03-21 Thread Michael S. Tsirkin
On Thu, Mar 21, 2013 at 06:44:14PM +0100, Juan Quintela wrote:
> "Michael S. Tsirkin"  wrote:
> > On Thu, Mar 21, 2013 at 06:27:42PM +0200, Orit Wasserman wrote:
> >> On 03/21/2013 06:16 PM, Michael S. Tsirkin wrote:
> >> > On Thu, Mar 21, 2013 at 06:05:40PM +0200, Orit Wasserman wrote:
> >> >> This way we send one big buffer instead of many small ones
> >> >>
> >> >> Signed-off-by: Orit Wasserman 
> >> > 
> >> > Why does this happen BTW?
> >> 
> >> It happens in the last phase when we send the device state that
> >> consists of a lot
> >> bytes and int field that are written using qemu_put_byte/be16/...
> >> 
> >
> > Confused  I thought device_state does not use _nocopy?
> > My idea of using vmsplice relies exactly on this:
> > we can not splice device state ...
> 
> 
> As it is today, I am not sure that we can use vmsplice() because we
> are sending:
> 
> 
> 
> 
> 
> 
> 
> 
> 
> We can optimize at some pount to write a bigger/different header and
> sent a bunch of pages together, but just now we don't have that code.
> 
> Later, Juan.

Sending the page can do vmsplice, can't it?
Multipage is likely a good idea anyway, e.g. RDMA wants to
do this too.



Re: [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs

2013-03-21 Thread Juan Quintela
"Michael S. Tsirkin"  wrote:
> On Thu, Mar 21, 2013 at 06:27:42PM +0200, Orit Wasserman wrote:
>> On 03/21/2013 06:16 PM, Michael S. Tsirkin wrote:
>> > On Thu, Mar 21, 2013 at 06:05:40PM +0200, Orit Wasserman wrote:
>> >> This way we send one big buffer instead of many small ones
>> >>
>> >> Signed-off-by: Orit Wasserman 
>> > 
>> > Why does this happen BTW?
>> 
>> It happens in the last phase when we send the device state that
>> consists of a lot
>> bytes and int field that are written using qemu_put_byte/be16/...
>> 
>
> Confused  I thought device_state does not use _nocopy?
> My idea of using vmsplice relies exactly on this:
> we can not splice device state ...


As it is today, I am not sure that we can use vmsplice() because we
are sending:









We can optimize at some pount to write a bigger/different header and
sent a bunch of pages together, but just now we don't have that code.

Later, Juan.



Re: [Qemu-devel] [PATCH v3 0/9] Migration: Remove copying of guest ram pages

2013-03-21 Thread Juan Quintela
Orit Wasserman  wrote:
> In migration all data is copied to a static buffer in QEMUFile,
> this hurts our network bandwidth and CPU usage especially with large guests.
> We switched to iovec for storing different buffers to send (even a byte field 
> is
> considered as a buffer) and use writev to send the iovec.
> writev was chosen (as apposed to sendmsg) because it supprts non socket fds.
>   
> Guest memory pages are not copied by calling a new function 
> qemu_put_buffer_no_copy.
> The page header data and device state data are still copied into the static
> buffer. This data consists of a lot of bytes and integer fields and the static
> buffer is used to store it during batching.
> Another improvement is changing qemu_putbe64/32/16 to create a single
> buffer instead of several byte sized buffer.
>
> git repository: git://github.com/oritwas/qemu.git sendv_v2

it is still sendv_v2 or sendv_v3 O:-)

> Change from v2:
> Always send data for the iovec even if writev_buffer is not implemented.
> Coalesce adjacent iovecs to create one big buffer from small adjacent buffer.
>
> Changes from v1:
> Use iov_send for socket.
> Make writev_buffer optional and if it is not implemented use put_buffer
>
> Future work: Make number of iovec changeable
>
> Orit Wasserman (9):
>   Add QemuFileWritevBuffer QemuFileOps
>   Add socket_writev_buffer function
>   Update bytes_xfer in qemu_put_byte
>   Store the data to send also in iovec
>   Use writev ops if available
>   More optimized qemu_put_be64/32/16
>   Add qemu_put_buffer_no_copy
>   Use qemu_put_buffer_no_copy for guest memory pages
>   coalesce adjacent iovecs
>
>  arch_init.c   |   2 +-
>  include/migration/qemu-file.h |  12 
>  savevm.c  | 127 
> ++
>  3 files changed, 116 insertions(+), 25 deletions(-)

Really nice series.  Just a couple of small nits.

Later, Juan.



Re: [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs

2013-03-21 Thread Juan Quintela
Orit Wasserman  wrote:
> This way we send one big buffer instead of many small ones
>
> Signed-off-by: Orit Wasserman 
> ---
>  savevm.c | 21 +
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/savevm.c b/savevm.c
> index 50e8fb2..13a533b 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -634,8 +634,14 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t 
> *buf, int size)
>  abort();
>  }
>  
> -f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
> -f->iov[f->iovcnt++].iov_len = size;


> +/* check for adjoint buffer and colace them */

s/colace/coalesce/

> +if (f->iovcnt > 0 && buf == f->iov[f->iovcnt - 1].iov_base +
> +f->iov[f->iovcnt - 1].iov_len) {
> +f->iov[f->iovcnt - 1].iov_len += size;
> +} else {
> +f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
> +f->iov[f->iovcnt++].iov_len = size;
> +}

Can we create a function for this?
Would help make sure that we don't forget to "fix" both places when we
have changes.

Later, Juan.



Re: [Qemu-devel] [PATCH v3 3/9] Update bytes_xfer in qemu_put_byte

2013-03-21 Thread Orit Wasserman
On 03/21/2013 07:38 PM, Eric Blake wrote:
> On 03/21/2013 10:05 AM, Orit Wasserman wrote:
>> Signed-off-by: Orit Wasserman 
>> ---
>>  savevm.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/savevm.c b/savevm.c
>> index baa45ae..686c8c8 100644
>> --- a/savevm.c
>> +++ b/savevm.c
>> @@ -647,6 +647,8 @@ void qemu_put_byte(QEMUFile *f, int v)
>>  
>>  f->buf[f->buf_index++] = v;
>>  f->is_write = 1;
>> +f->bytes_xfer += 1;
> 
> Why ' += 1' instead of the shorter '++'?
> 
It compiles to the same code but I can change it :)



Re: [Qemu-devel] [PATCH v3 3/9] Update bytes_xfer in qemu_put_byte

2013-03-21 Thread Eric Blake
On 03/21/2013 10:05 AM, Orit Wasserman wrote:
> Signed-off-by: Orit Wasserman 
> ---
>  savevm.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/savevm.c b/savevm.c
> index baa45ae..686c8c8 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -647,6 +647,8 @@ void qemu_put_byte(QEMUFile *f, int v)
>  
>  f->buf[f->buf_index++] = v;
>  f->is_write = 1;
> +f->bytes_xfer += 1;

Why ' += 1' instead of the shorter '++'?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 7/9] Add qemu_put_buffer_no_copy

2013-03-21 Thread Orit Wasserman
On 03/21/2013 07:34 PM, Juan Quintela wrote:
> Orit Wasserman  wrote:
>> This allow us to add a buffer to the iovec to send without copying it
>> into the static buffer.
>>
>> Signed-off-by: Orit Wasserman 
>> ---
>>  include/migration/qemu-file.h |  5 +
>>  savevm.c  | 37 -
>>  2 files changed, 33 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
>> index 8d3da9b..5168be2 100644
>> --- a/include/migration/qemu-file.h
>> +++ b/include/migration/qemu-file.h
>> @@ -75,6 +75,11 @@ int qemu_fclose(QEMUFile *f);
>>  int64_t qemu_ftell(QEMUFile *f);
>>  void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
>>  void qemu_put_byte(QEMUFile *f, int v);
>> +/*
>> + * put_buffer without copying the buffer.
>> + * The buffer should be available till it is sent.
>> + */
>> +void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size);
>>  
>>  static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
>>  {
>> diff --git a/savevm.c b/savevm.c
>> index 83aa9e7..fbfb9e3 100644
>> --- a/savevm.c
>> +++ b/savevm.c
>> @@ -621,6 +621,30 @@ int qemu_fclose(QEMUFile *f)
>>  return ret;
>>  }
>>  
>> +
>> +void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size)
>> +{
>> +if (f->last_error) {
>> +return;
>> +}
>> +
>> +if (f->is_write == 0 && f->buf_index > 0) {
>> +fprintf(stderr,
>> +"Attempted to write to buffer while read buffer is not 
>> empty\n");
>> +abort();
>> +}
> 
> I don't understand this test at all (yes, I know that the test already
> existed).
> 
> We shouldn't never arrived qemu_put_buffer() with a QEMUFile with
> f->is_write == 0.
> 
> Change it for one assert()?
> 
>> +f->iov[f->iovcnt].iov_base = f->buf + f->buf_index;
>> +f->iov[f->iovcnt++].iov_len = size;
> 
> This is clearly wrong, or I have completely missunderstood it (I will
> give a 50% to each posiblitiy).
> 
> Here we should be using "buf" and "size" passed as paramenters, f->buf
> and f->buf_index shouldn't be used, no?
you are right, it somehow moved to the next patch :(
I will send a fixed v4 after you finish review the series ..
> 
>> +
>> +f->is_write = 1;
> 
> is_write is completely redundant, and should just be a:
> 
> f->ops->put_buffer test (or now with writev).  We only set it up when
> there is anything to be written?
> 
> But again, this is independent of this series.
> 
I agree, maybe we need a cleanup series and remove it.
It will certainly simplify this code

Orit




Re: [Qemu-devel] [PATCH v3 8/9] Use qemu_put_buffer_no_copy for guest memory pages

2013-03-21 Thread Juan Quintela
Orit Wasserman  wrote:
> This will remove an unneeded copy of guest memory pages.
> For the page header and device state we still copy the data to the
> static buffer the other option is to allocate the memory on demand
> which is more expensive.
>
> Signed-off-by: Orit Wasserman 
> ---
>  arch_init.c | 2 +-
>  savevm.c| 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index 98e2bc6..27b53eb 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -481,7 +481,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>  /* XBZRLE overflow or normal page */
>  if (bytes_sent == -1) {
>  bytes_sent = save_block_hdr(f, block, offset, cont, 
> RAM_SAVE_FLAG_PAGE);
> -qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
> +qemu_put_buffer_no_copy(f, p, TARGET_PAGE_SIZE);
>  bytes_sent += TARGET_PAGE_SIZE;
>  acct_info.norm_pages++;
>  }

Once here, shouldn't we also change:

block-migration.c::blk_send()

qemu_put_buffer(f, blk->buf, BLOCK_SIZE);

to nocopy?

Again, this can be an additional patch.


> diff --git a/savevm.c b/savevm.c
> index fbfb9e3..50e8fb2 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -634,7 +634,7 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t 
> *buf, int size)
>  abort();
>  }
>  
> -f->iov[f->iovcnt].iov_base = f->buf + f->buf_index;
> +f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
>  f->iov[f->iovcnt++].iov_len = size;

This bit should be in the previous patch, the error that I pointed?

Later, Juan.



Re: [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs

2013-03-21 Thread Orit Wasserman
On 03/21/2013 06:29 PM, Michael S. Tsirkin wrote:
> On Thu, Mar 21, 2013 at 06:27:42PM +0200, Orit Wasserman wrote:
>> On 03/21/2013 06:16 PM, Michael S. Tsirkin wrote:
>>> On Thu, Mar 21, 2013 at 06:05:40PM +0200, Orit Wasserman wrote:
 This way we send one big buffer instead of many small ones

 Signed-off-by: Orit Wasserman 
>>>
>>> Why does this happen BTW?
>>
>> It happens in the last phase when we send the device state that consists of 
>> a lot
>> bytes and int field that are written using qemu_put_byte/be16/...
>>
> 
> Confused  I thought device_state does not use _nocopy?
> My idea of using vmsplice relies exactly on this:
> we can not splice device state ...

qemu_put_buffer calls qemu_put_buffer_no_copy (you know code reuse)
So I guess we will need to add some flag if we want to use vmsplice or not
or split no_copy into external and internal. It won't be a big change

> 
>>>
 ---
  savevm.c | 21 +
  1 file changed, 17 insertions(+), 4 deletions(-)

 diff --git a/savevm.c b/savevm.c
 index 50e8fb2..13a533b 100644
 --- a/savevm.c
 +++ b/savevm.c
 @@ -634,8 +634,14 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const 
 uint8_t *buf, int size)
  abort();
  }
  
 -f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
 -f->iov[f->iovcnt++].iov_len = size;
 +/* check for adjoint buffer and colace them */
 +if (f->iovcnt > 0 && buf == f->iov[f->iovcnt - 1].iov_base +
 +f->iov[f->iovcnt - 1].iov_len) {
 +f->iov[f->iovcnt - 1].iov_len += size;
 +} else {
 +f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
 +f->iov[f->iovcnt++].iov_len = size;
 +}
  
  f->is_write = 1;
  f->bytes_xfer += size;
 @@ -690,8 +696,15 @@ void qemu_put_byte(QEMUFile *f, int v)
  f->buf[f->buf_index++] = v;
  f->is_write = 1;
  f->bytes_xfer += 1;
 -f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1);
 -f->iov[f->iovcnt++].iov_len = 1;
 +
 +/* check for adjoint buffer and colace them */
 +if (f->iovcnt > 0 && f->buf + (f->buf_index - 1)  ==
 +f->iov[f->iovcnt - 1].iov_base + f->iov[f->iovcnt - 1].iov_len) {
 +f->iov[f->iovcnt - 1].iov_len += 1;
 +} else {
 +f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1);
 +f->iov[f->iovcnt++].iov_len = 1;
 +}
  
  if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) {
  qemu_fflush(f);
 -- 
 1.7.11.7




  1   2   3   4   >