date:20130221

[Qemu-devel] (iSCSI-)Readcache in Qemu?

2013-02-21 Thread Peter Lieven


Hi,

is there any project or plans to integrate a readcache in qemu either for iscsi 
or for the whole block layer?

I was thinking of 2 options:
a) assign (up to) a fixed amount of ram for readcaching of a VMs disk I/O. 
Without SG_IO this seems
to be quite easy as there are only a handful of operations to intercept.
b) add SSDs to a Node and allow a VM to use up to X MB of that SSD as 
read-cache. This should still be
a benefit if the storage for the VMs is on a NAS.

I would like to do that independent of the OS for 2 reasons. First with iSCSI 
the OS
is not involved. And second I would like to be able to add a memory limit for 
the cache.

Peter

Re: [Qemu-devel] [RFC PATCH 3/3] KVM: s390: Hook up ioeventfds.

2013-02-21 Thread Cornelia Huck

On Thu, 21 Feb 2013 22:42:41 +0200
"Michael S. Tsirkin"  wrote:

> On Thu, Feb 21, 2013 at 07:14:31PM +0100, Cornelia Huck wrote:
> > On Thu, 21 Feb 2013 18:34:59 +0200
> > "Michael S. Tsirkin"  wrote:
> > 
> > > On Thu, Feb 21, 2013 at 04:21:43PM +0100, Cornelia Huck wrote:
> > > > On Thu, 21 Feb 2013 16:39:05 +0200
> > > > "Michael S. Tsirkin"  wrote:
> > > > 
> > > > > On Thu, Feb 21, 2013 at 02:13:00PM +0100, Cornelia Huck wrote:
> > > > > > As s390 doesn't use memory writes for virtio notifcations, create
> > > > > > a special kind of ioeventfd instead that hooks up into diagnose
> > > > > > 0x500 (kvm hypercall) with function code 3 (virtio-ccw 
> > > > > > notification).
> > > > > > 
> > > > > > Signed-off-by: Cornelia Huck 
> > > > > 
> > > > > Do we really have to put virtio specific stuff into kvm?
> > > > > How about we add generic functionality to match GPRs
> > > > > on a hypercall and signal an eventfd?
> > > > 
> > > > Worth a try implementing that.
> > > > 
> > > > > 
> > > > > Also, it's a bit unfortunate that this doesn't use
> > > > > the io bus datastructure, long term the linked list handling
> > > > > might become a bottleneck, using shared code this could maybe
> > > > > benefit from performance optimizations there.
> > > > 
> > > > The linked list stuff was more like an initial implementation that
> > > > could be improved later.
> > > > 
> > > > > io bus data structure currently has the ability to match on
> > > > > two 64 bit fields (addr/datamatch) and a signed 32 bit one (length).
> > > > > Isn't this sufficient for your purposes?
> > > > > How about sticking subchannel id in address, vq in data match
> > > > > and using io bus?
> > > > 
> > > > I can give that a try. (I must admit that I didn't look at the iobus
> > > > stuff in detail.)
> > > > 
> > > > > 
> > > > > BTW maybe we could do this for the user interface too,
> > > > > while I'm not 100% sure it's the cleanest thing to do
> > > > > (or will work), it would certainly minimize the patchset.
> > > > 
> > > > You mean integrating with the generic interface and dropping the new
> > > > ARCH flag?
> > > 
> > > Not sure about the flag but we could use the general structure
> > > without an arch-specific format, if that's a good fit.
> > > 
> > So I have something that seems to do what I want. I'll see if I can
> > morph it into something presentable tomorrow.
> > 
> > diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
> > index b58dd86..3c43e30 100644
> > --- a/arch/s390/kvm/Kconfig
> > +++ b/arch/s390/kvm/Kconfig
> > @@ -22,6 +22,7 @@ config KVM
> > select PREEMPT_NOTIFIERS
> > select ANON_INODES
> > select HAVE_KVM_CPU_RELAX_INTERCEPT
> > +   select HAVE_KVM_EVENTFD
> > ---help---
> >   Support hosting paravirtualized guest machines using the SIE
> >   virtualization capability on the mainframe. This should work
> > diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
> > index 3975722..8fe9d65 100644
> > --- a/arch/s390/kvm/Makefile
> > +++ b/arch/s390/kvm/Makefile
> > @@ -6,7 +6,7 @@
> >  # it under the terms of the GNU General Public License (version 2 only)
> >  # as published by the Free Software Foundation.
> >  
> > -common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o)
> > +common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o eventfd.o)
> >  
> >  ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
> >  
> > diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
> > index a390687..7fc195e 100644
> > --- a/arch/s390/kvm/diag.c
> > +++ b/arch/s390/kvm/diag.c
> > @@ -104,6 +104,20 @@ static int __diag_ipl_functions(struct kvm_vcpu *vcpu)
> > return -EREMOTE;
> >  }
> >  
> > +static int __diag_virtio_hypercall(struct kvm_vcpu *vcpu)
> > +{
> > +   int ret, idx;
> > +   u64 vq = vcpu->run->s.regs.gprs[3];
> > +
> > +   idx = srcu_read_lock(&vcpu->kvm->srcu);
> > +   ret = kvm_io_bus_write(vcpu->kvm, KVM_CSS_BUS,
> > +   vcpu->run->s.regs.gprs[2],
> > +   vcpu->run->s.regs.gprs[1],
> > +   &vq);
> 
> Hmm, do you really need three gprs? If not, we could
> just pass len == 8, which would be cleaner.
> Also might as well drop the vq variable, no?

If I want to pass generic hypercalls, I need to pass the subcode (in
gpr 1) in the len variable. If all we'll ever need is the virtio-ccw
notify hypercall, gprs 2 and 3 are enough. Would perhaps make the
common code less hacky, but we'd lose (unneeded?) flexibility.

> 
> > +   srcu_read_unlock(&vcpu->kvm->srcu, idx);
> > +   return ret;
> > +}
> > +
> >  int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
> >  {
> > int code = (vcpu->arch.sie_block->ipb & 0xfff) >> 16;
> > @@ -118,6 +132,8 @@ int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
> > return __diag_time_slice_end_directed(vcpu);
> > case 0x308:
> > return __diag_ipl_functions(vcpu);
> > +   case 0x500:
> > +   return __diag_virtio_hypercall(vcpu);
> > default:
>

Re: [Qemu-devel] [PATCH] machine: correct macro name for default boot_order

2013-02-21 Thread li guang

在 2013-02-20三的 09:28 +0100，Markus Armbruster写道：
> liguang  writes:
> 
> > DEFAULT_MACHINE_OPTIONS is setting default boot_order,
> > while QEMUMachine already has default_machine_opts
> > to encapsulate some default options, so change it to
> > DEFAULT_MACHINE_BOOT_ORDER.
> 
> Right now, DEFAULT_MACHINE_OPTIONS contains just a .boot_order
> initializer.  But that's not necessarily so; it could contain anything.
> Avik, Anthony, you wrote or reviewed the patch that added it, what do
> you think?

DEFAULT_MACHINE_OPTIONS seems same with default_machine_opts which is 
already a member of QEMUMachine struct

Re: [Qemu-devel] [PATCH 4/9] target-i386: convert 'hv_relaxed' to static property

2013-02-21 Thread Igor Mammedov

On Wed, 20 Feb 2013 08:55:42 -0300
Eduardo Habkost  wrote:

> On Wed, Feb 20, 2013 at 10:03:37AM +0100, Igor Mammedov wrote:
> > On Tue, 19 Feb 2013 15:45:16 -0300
> > Eduardo Habkost  wrote:
> > 
> > > On Mon, Feb 11, 2013 at 05:35:06PM +0100, Igor Mammedov wrote:
> > > > Signed-off-by: Igor Mammedov 
> > > > ---
> > > >  target-i386/cpu.c |   35 ++-
> > > >  1 files changed, 34 insertions(+), 1 deletions(-)
> > > > 
> > > > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > > > index 1f14b65..b804031 100644
> > > > --- a/target-i386/cpu.c
> > > > +++ b/target-i386/cpu.c
> > > > @@ -528,6 +528,38 @@ PropertyInfo qdev_prop_spinlocks = {
> > > >  .defval =
> > > > _defval  \ }
> > > >  
> > > > +static void x86_get_hv_relaxed(Object *obj, Visitor *v, void *opaque,
> > > > + const char *name, Error **errp)
> > > > +{
> > > > +bool value = hyperv_relaxed_timing_enabled();
> > > > +
> > > > +visit_type_bool(v, &value, name, errp);
> > > > +}
> > > > +
> > > > +static void x86_set_hv_relaxed(Object *obj, Visitor *v, void *opaque,
> > > > + const char *name, Error **errp)
> > > > +{
> > > > +bool value;
> > > > +
> > > > +visit_type_bool(v, &value, name, errp);
> > > > +if (error_is_set(errp)) {
> > > > +return;
> > > > +}
> > > > +hyperv_enable_relaxed_timing(value);
> > > > +}
> > > > +
> > > > +PropertyInfo qdev_prop_hv_relaxed = {
> > > > +.name  = "boolean",
> > > > +.get   = x86_get_hv_relaxed,
> > > > +.set   = x86_set_hv_relaxed,
> > > > +};
> > > > +#define DEFINE_PROP_HV_RELAXED(_n, _defval)
> > > > {  \
> > > > +.name  =
> > > > _n,   \
> > > > +.info  =
> > > > &qdev_prop_hv_relaxed,\
> > > > +.qtype =
> > > > QTYPE_QBOOL,  \
> > > > +.defval =
> > > > _defval  \ +}
> > > > +
> > > >  static Property cpu_x86_properties[] = {
> > > >  DEFINE_PROP_FAMILY("family"),
> > > >  DEFINE_PROP_MODEL("model"),
> > > > @@ -538,6 +570,7 @@ static Property cpu_x86_properties[] = {
> > > >  DEFINE_PROP_MODEL_ID("model-id"),
> > > >  DEFINE_PROP_TSC_FREQ("tsc-frequency"),
> > > >  DEFINE_PROP_HV_SPINLOCKS("hv-spinlocks",
> > > > HYPERV_SPINLOCK_NEVER_RETRY),
> > > > +DEFINE_PROP_HV_RELAXED("hv-relaxed", false),
> > > 
> > > Why not simply make it a X86CPU struct field, so we don't need a special
> > > PropertyInfo?
> > > 
> > > The whole contents of target-i386/hyperv.c are getters/setters for three
> > > static variables that should have been X86CPU fields in the first place.
> > I went via less intrusive approach to avoid breaking anything during
> > conversion. Can we proceed with conversion first and than decide whether to
> > move hv_* into CPU or not?
> 
> Personally, I find the complex getter+setter+PropertyInfo+DEFINE_PROP_*
> code above more complex and harder to review (thus harder to make me
> confident it won't break anything) than simply moving a static variable
> to a X86CPU field.
That isn't as easy as you say. That would be more intrusive since it requires
to hack all the code that access this static variables, and I do not have
anything to test that kind of change. While using the same hv_* functions from
cpu_x86_parse_featurestr() in setters is easy to test with much less chance to
regress.

> 
> Also, it doesn't even make sense to have a X86CPU property available for
> a static variable that is not per-CPU. What do we gain by making it look
> like a per-X86CPU property if it is not?
getting/setting it. Aside of academic interest, It's not likely you'll have
CPUs with different hv_* values set ever and expect guest to work.

You can make this movement later if you like.

> 
> 
> > 
> > > 
> > > 
> > > >  DEFINE_PROP_END_OF_LIST(),
> > > >   };
> > > >  
> > > > @@ -1468,7 +1501,7 @@ static void cpu_x86_parse_featurestr(X86CPU *cpu,
> > > > char *features, Error **errp) } else if (!strcmp(featurestr, 
> > > > "enforce")) {
> > > >  check_cpuid = enforce_cpuid = 1;
> > > >  } else if (!strcmp(featurestr, "hv_relaxed")) {
> > > > -hyperv_enable_relaxed_timing(true);
> > > > +object_property_parse(OBJECT(cpu), "on", "hv-relaxed", 
> > > > errp);
> > > >  } else if (!strcmp(featurestr, "hv_vapic")) {
> > > >  hyperv_enable_vapic_recommended(true);
> > > >  } else {
> > > > -- 
> > > > 1.7.1
> > > > 
> > > > 
> > > 
> > 
> 
> -- 
> Eduardo
> 


-- 
Regards,
  Igor

[Qemu-devel] [PATCH V2] help: add docs for multiqueue tap options

2013-02-21 Thread Jason Wang

Cc: Markus Armbruster 
Cc: qemu-sta...@nongnu.org
Signed-off-by: Jason Wang 
---
Changes from V1:
- Add the missing docs for 'queues' (Markus)
---
 qapi-schema.json |8 
 qemu-options.hx  |5 -
 2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index 7275b5d..b3844e6 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2504,6 +2504,9 @@
 #
 # @fd: #optional file descriptor of an already opened tap
 #
+# @fds: #optional multiple file descriptors of already opened multiqueue 
capable
+# tap
+#
 # @script: #optional script to initialize the interface
 #
 # @downscript: #optional script to shut down the interface
@@ -2518,8 +2521,13 @@
 #
 # @vhostfd: #optional file descriptor of an already opened vhost net device
 #
+# @vhostfds: #optional file descriptors of multiple already opened vhost net
+# devices
+#
 # @vhostforce: #optional vhost on for non-MSIX virtio guests
 #
+# @queues: #optional number of queues to be created for multiqueue capable tap
+#
 # Since 1.2
 ##
 { 'type': 'NetdevTapOptions',
diff --git a/qemu-options.hx b/qemu-options.hx
index 4bc9c85..3928620 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1354,7 +1354,7 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
 "-net tap[,vlan=n][,name=str],ifname=name\n"
 "connect the host TAP network interface to VLAN 'n'\n"
 #else
-"-net 
tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,helper=helper][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h][,vhostforce=on|off]\n"
+"-net 
tap[,vlan=n][,name=str][,fd=h][,fds=x:y:...:z][,ifname=name][,script=file][,downscript=dfile][,helper=helper][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h][,vhostfds=x:y:...:z][,vhostforce=on|off][,queues=n]\n"
 "connect the host TAP network interface to VLAN 'n'\n"
 "use network scripts 'file' (default=" 
DEFAULT_NETWORK_SCRIPT ")\n"
 "to configure it and 'dfile' (default=" 
DEFAULT_NETWORK_DOWN_SCRIPT ")\n"
@@ -1363,6 +1363,7 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
 "use network helper 'helper' (default=" 
DEFAULT_BRIDGE_HELPER ") to\n"
 "configure it\n"
 "use 'fd=h' to connect to an already opened TAP 
interface\n"
+"use 'fds=x:y:...:z' to connect to already opened 
multiqueue capable TAP interfaces\n"
 "use 'sndbuf=nbytes' to limit the size of the send buffer 
(the\n"
 "default is disabled 'sndbuf=0' to enable flow control set 
'sndbuf=1048576')\n"
 "use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap 
flag\n"
@@ -1371,6 +1372,8 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
 "(only has effect for virtio guests which use MSIX)\n"
 "use vhostforce=on to force vhost on for non-MSIX virtio 
guests\n"
 "use 'vhostfd=h' to connect to an already opened vhost net 
device\n"
+"use 'vhostfds=x:y:...:z to connect to multiple already 
opened vhost net devices\n"
+"use 'queues=n' to specify the number of queues to be 
created for multiqueue TAP\n"
 "-net bridge[,vlan=n][,name=str][,br=bridge][,helper=helper]\n"
 "connects a host TAP network interface to a host bridge 
device 'br'\n"
 "(default=" DEFAULT_BRIDGE_INTERFACE ") using the program 
'helper'\n"
-- 
1.7.1

[Qemu-devel] [PATCH] ui/gtk: Fix build (missing include for setlocale)

2013-02-21 Thread Stefan Weil

At least for Ubuntu Linux locale.h is needed.

Signed-off-by: Stefan Weil 
---

This is a build regression, please apply without waiting for qemu-trivial.

Thanks,
Stefan

 ui/gtk.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/ui/gtk.c b/ui/gtk.c
index 29156be..5f91de4 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
1.7.10.4

Re: [Qemu-devel] GTK UI is now the default

2013-02-21 Thread Stefan Weil

Am 22.02.2013 07:08, schrieb Stefan Weil:
> Am 22.02.2013 00:04, schrieb Anthony Liguori:
>> Since this is a pretty visible change for a lot of people, I thought I'd
>> send a top level note.  The GTK UI is now committed and is the default
>> UI provided it's available.
>>
>> For anyone counting, it's been a little more than 7 years in the making:
>>
>> http://article.gmane.org/gmane.comp.emulators.qemu/9726
>>
>> If you want to try it, make sure you have the gtk-2.0 development
>> packages installed and the VTE development packages.
>>
>> Regards,
>>
>> Anthony Liguori
> Hi Anthony,
>
> I appreciate that the gtk ui is now available in git master.
> Alas, the current code did not address some issues which
> I had reported earlier. Most important:
>
> * The code does not compile on current Debian Squeeze
> because it requires a newer version of vte:
> /gtk.c:860: error: ‘VtePty’ undeclared (first use in this function)
> and lots of more compiler errors.
>
> * There is no w32/w64 support for vte, and it would need
> a really big effort to get a vte binary for those hosts.
> They need a gtk ui without vte.
>
> * New: Ubuntu does not provide /usr/lib/pkgconfig/gtk+-2.0.pc,
> so Ubuntu hosts won't compile the new gtk ui.

Sorry, I was wrong here. There is a
/usr/lib/i386-linux-gnu/pkgconfig/gtk+-2.0.pc which works.

Ubuntu still needs a patch for ui/gtk.c (coming with separate mail).

>
> Regards,
> Stefan
>
>
>

Re: [Qemu-devel] [PATCH 9/9] target-i386: set [+-]feature using static properties

2013-02-21 Thread Igor Mammedov

On Tue, 19 Feb 2013 16:03:04 -0300
Eduardo Habkost  wrote:

> On Mon, Feb 11, 2013 at 05:35:11PM +0100, Igor Mammedov wrote:
> >  * Define static properties for cpuid feature bits
> > * property names of CPUID features are changed to have "f-" prefix,
> >   so that it would be easy to distinguish them from other properties.
> > 
> >  * Convert [+-]cpuid_features to a set(QDict) of key, value pairs, where
> >  +feat => (f-feat, on)
> >  -feat => (f-feat, off)
> >  [+-] unknown feat => (feat, on/off) - it's expected to be rejected
> >by property setter later
> >QDict is used as convinience sturcture to keep -foo semantic.
> >Once all +-foo are parsed, collected features are applied to CPU 
> > instance.
> > 
> >  * Cleanup unused anymore:
> >  add_flagname_to_bitmaps(), lookup_feature(), altcmp(), sstrcmp(),
> > 
> >  * Rename lowlevel 'kvmclock' into 'f-kvmclock1' and introduce
> >legacy composite property 'f-kvmclock' that sets both 'f-kvmclock1'
> >and 'f-kvmclock2' feature bits to mantain legacy kvmclock behavior
> > 
> > Signed-off-by: Igor Mammedov 
> > ---
> >  v4:
> >- major patch reordering, rebasing on current qom-cpu-next
> >- merged patches: "define static properties..." and "set [+-]feature..."
> >- merge in "make 'f-kvmclock' compatible..." to aviod behavior breakage
> >- use register name in feature macros, so that if we rename cpuid_* 
> > fields,
> >  it wouldn't involve mass rename in features array.
> >- when converting feature names into property names, replace '_' with 
> > '-',
> >Requested-By: Don Slutz ,
> >  mgs-id: <5085d4aa.1060...@cloudswitch.com>
> > 
> >  v3:
> >- new static properties after rebase:
> >   - add features "f-rdseed" & "f-adx"
> >   - add features added by c8acc380
> >   - add features f-kvm_steal_tm and f-kvmclock_stable
> >   - ext4 features f-xstore, f-xstore-en, f-xcrypt, f-xcrypt-en,
> > f-ace2, f-ace2-en, f-phe, f-phe-en, f-pmm, f-pmm-en
> > 
> >- f-hypervisor set default to false as for the rest of features
> >- define helper FEAT for declaring CPUID feature properties to
> >  make shorter lines (<80 characters)
> > 
> >  v2:
> >- use X86CPU as a type to count of offset correctly, because env field
> >  isn't starting at CPUstate beginning, but located after it.
> > ---
> >  target-i386/cpu.c |  348 
> > +
> >  1 files changed, 242 insertions(+), 106 deletions(-)
> > 
> > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > index fcfe8ec..2a5a5b5 100644
> > --- a/target-i386/cpu.c
> > +++ b/target-i386/cpu.c
> > @@ -656,6 +656,65 @@ PropertyInfo qdev_prop_enforce = {
> >  .defval = _defval  
> > \
> >  }
> >  
> > +static void x86_cpu_get_kvmclock(Object *obj, Visitor *v, void *opaque,
> > + const char *name, Error **errp)
> > +{
> > +X86CPU *cpu = X86_CPU(obj);
> > +bool value = cpu->env.cpuid_kvm_features;
> > +value = (value & KVM_FEATURE_CLOCKSOURCE) &&
> > +(value & KVM_FEATURE_CLOCKSOURCE2);
> > +visit_type_bool(v, &value, name, errp);
> > +}
> > +
> > +static void x86_cpu_set_kvmclock(Object *obj, Visitor *v, void *opaque,
> > + const char *name, Error **errp)
> > +{
> > +X86CPU *cpu = X86_CPU(obj);
> > +bool value;
> > +visit_type_bool(v, &value, name, errp);
> > +if (value == true) {
> > +cpu->env.cpuid_kvm_features |= KVM_FEATURE_CLOCKSOURCE |
> > +  KVM_FEATURE_CLOCKSOURCE2;
> > +} else {
> > +cpu->env.cpuid_kvm_features &= ~(KVM_FEATURE_CLOCKSOURCE |
> > +   KVM_FEATURE_CLOCKSOURCE2);
> > +}
> > +}
> > +
> > +PropertyInfo qdev_prop_kvmclock = {
> > +.name  = "boolean",
> > +.get   = x86_cpu_get_kvmclock,
> > +.set   = x86_cpu_set_kvmclock,
> > +};
> > +#define DEFINE_PROP_KVMCLOCK(_n) { 
> > \
> > +.name  = _n,   
> > \
> > +.info  = &qdev_prop_kvmclock   
> > \
> > +}
> 
> Instead of the complexity of a new PropertyInfo struct, I would rather
> have a "enable_both_kvmclock_bits" field in X86CPU, that would be
> handled on x86_cpu_realize() and set the other feature bits. Then we
> could have a plain struct-field property with no special PropertyInfo
> struct.
No extra fields pls, unless we have to.

> 
> Or, maybe we shouldn't even add a "kvmclock" property at all, and
> translate the legacy "kvmclock" flag to kvmclock1|kvmclock2 inside
> parse_featurestr()?
That's what I was proposing a while back. Lets do it this way.

> 
> Except for that, patch looks good overall, but I still need to review
> the

Re: [Qemu-devel] GTK UI is now the default

2013-02-21 Thread Stefan Weil

Am 22.02.2013 00:04, schrieb Anthony Liguori:
> Since this is a pretty visible change for a lot of people, I thought I'd
> send a top level note.  The GTK UI is now committed and is the default
> UI provided it's available.
>
> For anyone counting, it's been a little more than 7 years in the making:
>
> http://article.gmane.org/gmane.comp.emulators.qemu/9726
>
> If you want to try it, make sure you have the gtk-2.0 development
> packages installed and the VTE development packages.
>
> Regards,
>
> Anthony Liguori

Hi Anthony,

I appreciate that the gtk ui is now available in git master.
Alas, the current code did not address some issues which
I had reported earlier. Most important:

* The code does not compile on current Debian Squeeze
because it requires a newer version of vte:
/gtk.c:860: error: ‘VtePty’ undeclared (first use in this function)
and lots of more compiler errors.

* There is no w32/w64 support for vte, and it would need
a really big effort to get a vte binary for those hosts.
They need a gtk ui without vte.

* New: Ubuntu does not provide /usr/lib/pkgconfig/gtk+-2.0.pc,
so Ubuntu hosts won't compile the new gtk ui.

Regards,
Stefan

Re: [Qemu-devel] [PATCH] net: reduce the unnecessary memory allocation of multiqueue

2013-02-21 Thread Jason Wang

On 02/22/2013 06:32 AM, Anthony Liguori wrote:
> Jason Wang  writes:
>
>> Edivaldo reports a problem that the array of NetClientState in NICState is 
>> too
>> large - MAX_QUEUE_NUM(1024) which will wastes memory even if multiqueue is 
>> not
>> used.
>>
>> Instead of static arrays, solving this issue by allocating the queues on 
>> demand
>> for both the NetClientState array in NICState and VirtIONetQueue array in
>> VirtIONet.
>>
>> Tested by myself, with single virtio-net-pci device. The memory allocation is
>> almost the same as when multiqueue is not merged.
>>
>> Cc: Edivaldo de Araujo Pereira 
>> Cc: qemu-sta...@nongnu.org
>> Signed-off-by: Jason Wang 
>> ---
>>  hw/virtio-net.c   |6 --
>>  include/net/net.h |2 +-
>>  net/net.c |   19 +--
>>  3 files changed, 14 insertions(+), 13 deletions(-)
>>
>> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
>> index 573c669..70ab641 100644
>> --- a/hw/virtio-net.c
>> +++ b/hw/virtio-net.c
>> @@ -44,7 +44,7 @@ typedef struct VirtIONet
>>  VirtIODevice vdev;
>>  uint8_t mac[ETH_ALEN];
>>  uint16_t status;
>> -VirtIONetQueue vqs[MAX_QUEUE_NUM];
>> +VirtIONetQueue *vqs;
>>  VirtQueue *ctrl_vq;
>>  NICState *nic;
>>  uint32_t tx_timeout;
>> @@ -1326,8 +1326,9 @@ VirtIODevice *virtio_net_init(DeviceState *dev, 
>> NICConf *conf,
>>  n->vdev.set_status = virtio_net_set_status;
>>  n->vdev.guest_notifier_mask = virtio_net_guest_notifier_mask;
>>  n->vdev.guest_notifier_pending = virtio_net_guest_notifier_pending;
>> +n->max_queues = MAX(conf->queues, 1);
> I think you mean MIN here.

Then the at most 1 queue will be created. MAX is used to create at least
one queue when conf->queues is zero which means the nic were not created
with netdev.
>> +n->vqs = g_malloc0(sizeof(VirtIONetQueue) * n->max_queues);
>>  n->vqs[0].rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
>> -n->max_queues = conf->queues;
>>  n->curr_queues = 1;
>>  n->vqs[0].n = n;
>>  n->tx_timeout = net->txtimer;
>> @@ -1397,6 +1398,7 @@ void virtio_net_exit(VirtIODevice *vdev)
>>  
>>  g_free(n->mac_table.macs);
>>  g_free(n->vlans);
>> +g_free(n->vqs);
>>  
>>  for (i = 0; i < n->max_queues; i++) {
>>  VirtIONetQueue *q = &n->vqs[i];
>> diff --git a/include/net/net.h b/include/net/net.h
>> index 43a045e..cb049a1 100644
>> --- a/include/net/net.h
>> +++ b/include/net/net.h
>> @@ -72,7 +72,7 @@ struct NetClientState {
>>  };
>>  
>>  typedef struct NICState {
>> -NetClientState ncs[MAX_QUEUE_NUM];
>> +NetClientState *ncs;
>>  NICConf *conf;
>>  void *opaque;
>>  bool peer_deleted;
>> diff --git a/net/net.c b/net/net.c
>> index be03a8d..6262ed0 100644
>> --- a/net/net.c
>> +++ b/net/net.c
>> @@ -235,23 +235,20 @@ NICState *qemu_new_nic(NetClientInfo *info,
>> const char *name,
>> void *opaque)
>>  {
>> -NetClientState *nc;
>>  NetClientState **peers = conf->peers.ncs;
>>  NICState *nic;
>> -int i;
>> +int i, queues = MAX(1, conf->queues);
> And here.  This patch is what had created problems for me.

Same as above, so I think the code is ok or anything I miss?
>
> Regards,
>
> Anthony Liguori
>
>>  
>>  assert(info->type == NET_CLIENT_OPTIONS_KIND_NIC);
>>  assert(info->size >= sizeof(NICState));
>>  
>> -nc = qemu_new_net_client(info, peers[0], model, name);
>> -nc->queue_index = 0;
>> -
>> -nic = qemu_get_nic(nc);
>> +nic = g_malloc0(info->size + sizeof(NetClientState) * queues);
>> +nic->ncs = (void *)nic + info->size;
>>  nic->conf = conf;
>>  nic->opaque = opaque;
>>  
>> -for (i = 1; i < conf->queues; i++) {
>> -qemu_net_client_setup(&nic->ncs[i], info, peers[i], model, nc->name,
>> +for (i = 0; i < queues; i++) {
>> +qemu_net_client_setup(&nic->ncs[i], info, peers[i], model, name,
>>NULL);
>>  nic->ncs[i].queue_index = i;
>>  }
>> @@ -261,7 +258,7 @@ NICState *qemu_new_nic(NetClientInfo *info,
>>  
>>  NetClientState *qemu_get_subqueue(NICState *nic, int queue_index)
>>  {
>> -return &nic->ncs[queue_index];
>> +return nic->ncs + queue_index;
>>  }
>>  
>>  NetClientState *qemu_get_queue(NICState *nic)
>> @@ -273,7 +270,7 @@ NICState *qemu_get_nic(NetClientState *nc)
>>  {
>>  NetClientState *nc0 = nc - nc->queue_index;
>>  
>> -return DO_UPCAST(NICState, ncs[0], nc0);
>> +return (NICState *)((void *)nc0 - nc->info->size);
>>  }
>>  
>>  void *qemu_get_nic_opaque(NetClientState *nc)
>> @@ -368,6 +365,8 @@ void qemu_del_nic(NICState *nic)
>>  qemu_cleanup_net_client(nc);
>>  qemu_free_net_client(nc);
>>  }
>> +
>> +g_free(nic);
>>  }
>>  
>>  void qemu_foreach_nic(qemu_nic_foreach func, void *opaque)
>> -- 
>> 1.7.1
>

Re: [Qemu-devel] [PATCH] xhci: fix bad print specifier

2013-02-21 Thread Stefan Weil

Am 21.02.2013 22:58, schrieb Hervé Poussineau:
> This fixes the following compilation error:
> hw/usb/hcd-xhci.c:1156:17: error: format ‘%llx’ expects argument of type
> ‘long long unsigned int’, but argument 4 has type ‘unsigned int’
>
> Signed-off-by: Hervé Poussineau 
> ---
>  hw/usb/hcd-xhci.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
> index 5796102..07afdee 100644
> --- a/hw/usb/hcd-xhci.c
> +++ b/hw/usb/hcd-xhci.c
> @@ -1152,8 +1152,8 @@ static XHCIStreamContext 
> *xhci_find_stream(XHCIEPContext *epctx,
>  
>  if (sctx->sct == -1) {
>  xhci_dma_read_u32s(epctx->xhci, sctx->pctx, ctx, sizeof(ctx));
> -fprintf(stderr, "%s: init sctx #%d @ %lx: %08x %08x\n", __func__,
> -streamid, sctx->pctx, ctx[0], ctx[1]);
> +fprintf(stderr, "%s: init sctx #%d @ " DMA_ADDR_FMT ": %08x %08x\n",
> +__func__, streamid, sctx->pctx, ctx[0], ctx[1]);
>  sct = (ctx[0] >> 1) & 0x07;
>  if (epctx->lsa && sct != 1) {
>  *cc_error = CC_INVALID_STREAM_TYPE_ERROR;


Reviewed-by: Stefan Weil 

Anthony, this is a build regression in git master, so we should not
wait for qemu-trivial to apply it.

Thanks,

Stefan

Re: [Qemu-devel] [PATCH v4 4/6] introduce new vma archive format

2013-02-21 Thread Dietmar Maurer

> Not in qemu.  There's a reason that we ask for clean specs, independent of
> source code.  That is the only way that we can later change the source code to
> do something more efficient or to define an extension, and still have clean
> documentation of what the extensions are, vs. how an older version will 
> behave.
> The initial implementation source code might be easy to read, but that 
> condition
> is not guaranteed to last.
> 
> If reading the source code to determine the header format is as easy as you 
> say,
> then you should have no problem writing the spec as detailed as we have asked.

Sure, that is no problem. But so far there is no indication that this code will 
be added
to qemu.

Re: [Qemu-devel] [PATCH] help: add docs for multiqueue tap options

2013-02-21 Thread Jason Wang

On 02/21/2013 11:12 PM, Markus Armbruster wrote:
> Jason Wang  writes:
>
>> Cc: Markus Armbruster 
>> Cc: Jason Wang 
>> Signed-off-by: Jason Wang 
>> ---
>> This patch is neede for 1.4 stable also.
> Forgot to mention, the recommended way to nominate for stable is to cc:
> stable in the commit message, like this:
>
> Cc: qemu-sta...@nongnu.org
>

Ah, thanks for reminding.

Re: [Qemu-devel] [PATCH] help: add docs for multiqueue tap options

2013-02-21 Thread Jason Wang

On 02/21/2013 11:09 PM, Markus Armbruster wrote:
> Jason Wang  writes:
>
>> Cc: Markus Armbruster 
>> Cc: Jason Wang 
>> Signed-off-by: Jason Wang 
>> ---
>> This patch is neede for 1.4 stable also.
>> ---
>>  qapi-schema.json |6 ++
>>  qemu-options.hx  |4 +++-
>>  2 files changed, 9 insertions(+), 1 deletions(-)
>>
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index 7275b5d..cd7ea25 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -2504,6 +2504,9 @@
>>  #
>>  # @fd: #optional file descriptor of an already opened tap
>>  #
>> +# @fds: #optional multiple file descriptors of already opened multiqueue 
>> capable
>> +# tap
>> +#
>>  # @script: #optional script to initialize the interface
>>  #
>>  # @downscript: #optional script to shut down the interface
>> @@ -2518,6 +2521,9 @@
>>  #
>>  # @vhostfd: #optional file descriptor of an already opened vhost net device
>>  #
>> +# @vhostfds: #optional file descriptors of multiple already opened vhost net
>> +# devices
>> +#
>>  # @vhostforce: #optional vhost on for non-MSIX virtio guests
>>  #
>>  # Since 1.2
> Now only 'queues' remains undocumented.  Could you take care of that,
> too?  Separate patch, if you like.

Oh, I missed that, will send a v2.
> Quick question to help me understand the feature: is "fds=N" equivalent
> to "fd=N"?  Is "vhostfds=N" equivalent to "vhostfd=N"?

Yes, they are equivalent.
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 4bc9c85..2832d82 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -1354,7 +1354,7 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
>>  "-net tap[,vlan=n][,name=str],ifname=name\n"
>>  "connect the host TAP network interface to VLAN 'n'\n"
>>  #else
>> -"-net 
>> tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,helper=helper][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h][,vhostforce=on|off]\n"
>> +"-net 
>> tap[,vlan=n][,name=str][,fd=h][,fds=x:y:...:z][,ifname=name][,script=file][,downscript=dfile][,helper=helper][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h][,vhostfds=x:y:...:z][,vhostforce=on|off]\n"
>>  "connect the host TAP network interface to VLAN 'n'\n"
>>  "use network scripts 'file' (default=" 
>> DEFAULT_NETWORK_SCRIPT ")\n"
>>  "to configure it and 'dfile' (default=" 
>> DEFAULT_NETWORK_DOWN_SCRIPT ")\n"
>> @@ -1363,6 +1363,7 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
>>  "use network helper 'helper' (default=" 
>> DEFAULT_BRIDGE_HELPER ") to\n"
>>  "configure it\n"
>>  "use 'fd=h' to connect to an already opened TAP 
>> interface\n"
>> +"use 'fds=x:y:...:z' to connect to already opened 
>> multiqueue capable TAP interfaces\n"
>>  "use 'sndbuf=nbytes' to limit the size of the send 
>> buffer (the\n"
>>  "default is disabled 'sndbuf=0' to enable flow control 
>> set 'sndbuf=1048576')\n"
>>  "use vnet_hdr=off to avoid enabling the IFF_VNET_HDR 
>> tap flag\n"
>> @@ -1371,6 +1372,7 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
>>  "(only has effect for virtio guests which use 
>> MSIX)\n"
>>  "use vhostforce=on to force vhost on for non-MSIX 
>> virtio guests\n"
>>  "use 'vhostfd=h' to connect to an already opened vhost 
>> net device\n"
>> +"use 'vhostfds=x:y:...:z to connect to multiple already 
>> opened vhost net devices\n"
>>  "-net bridge[,vlan=n][,name=str][,br=bridge][,helper=helper]\n"
>>  "connects a host TAP network interface to a host bridge 
>> device 'br'\n"
>>  "(default=" DEFAULT_BRIDGE_INTERFACE ") using the 
>> program 'helper'\n"
> Reviewed-by: Markus Armbruster

[Qemu-devel] [PATCH v2 3/6] bitops: change BITS_TO_LONGS

2013-02-21 Thread liguang

Signed-off-by: liguang 
---
 include/qemu/bitops.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
index 74e14e5..7758792 100644
--- a/include/qemu/bitops.h
+++ b/include/qemu/bitops.h
@@ -20,7 +20,7 @@
 #define BIT(nr)(1UL << (nr))
 #define BIT_MASK(nr)   (1UL << ((nr) % BITS_PER_LONG))
 #define BIT_WORD(nr)   ((nr) / BITS_PER_LONG)
-#define BITS_TO_LONGS(nr)  DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
+#define BITS_TO_LONGS(nr)  DIV_ROUND_UP(nr, BITS_PER_LONG)
 
 /**
  * bitops_ffs - find first bit in word.
-- 
1.7.2.5

[Qemu-devel] [PATCH v2 2/6] pc/numa: refactor bios_init function

2013-02-21 Thread liguang

orginally, numa data was packed into an array,
which was implicit and hard to maintain, we
define a struct for this data, hope to be as
clear as enough.
also, we only pass cpumask of corresponding
nodes to seabios, and leave the paring work
for it.

Signed-off-by: liguang 
---
 hw/pc.c |   40 ++--
 1 files changed, 18 insertions(+), 22 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index d010c75..f227c12 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -562,13 +562,21 @@ static unsigned int pc_apic_id_limit(unsigned int 
max_cpus)
 return x86_cpu_apic_id_from_index(max_cpus - 1) + 1;
 }
 
+typedef struct SRATData {
+uint64_t apic_map; /* size is MAX_NODES */
+uint64_t memory_size;
+} SRATData;
+
 static void *bios_init(void)
 {
 void *fw_cfg;
 uint8_t *smbios_table;
 size_t smbios_len;
-uint64_t *numa_fw_cfg;
-int i, j;
+struct FwNUMACfg {
+uint32_t nr_node;
+SRATData *srat_data;
+} fw_cfg_numa;
+int i;
 unsigned int apic_id_limit = pc_apic_id_limit(max_cpus);
 
 fw_cfg = fw_cfg_init(FW_CFG_CTL_IOPORT, FW_CFG_DATA_IOPORT, 0, 0);
@@ -601,28 +609,16 @@ static void *bios_init(void)
  &e820_table, sizeof(e820_table));
 
 fw_cfg_add_bytes(fw_cfg, FW_CFG_HPET, &hpet_cfg, sizeof(hpet_cfg));
-/* allocate memory for the NUMA channel: one (64bit) word for the number
- * of nodes, one word for each VCPU->node and one word for each node to
- * hold the amount of memory.
- */
-numa_fw_cfg = g_new0(uint64_t, 1 + apic_id_limit + nb_numa_nodes);
-numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
-for (i = 0; i < max_cpus; i++) {
-unsigned int apic_id = x86_cpu_apic_id_from_index(i);
-assert(apic_id < apic_id_limit);
-for (j = 0; j < nb_numa_nodes; j++) {
-if (test_bit(i, node_cpumask[j])) {
-numa_fw_cfg[apic_id + 1] = cpu_to_le64(j);
-break;
-}
-}
-}
+
+fw_cfg_numa.srat_data = g_new0(SRATData, nb_numa_nodes);
+fw_cfg_numa.nr_node = cpu_to_le64(nb_numa_nodes);
+
 for (i = 0; i < nb_numa_nodes; i++) {
-numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(node_mem[i]);
+fw_cfg_numa.srat_data[i].apic_map = *node_cpumask[i];
+fw_cfg_numa.srat_data[i].memory_size = cpu_to_le64(node_mem[i]);
 }
-fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg,
- (1 + apic_id_limit + nb_numa_nodes) *
- sizeof(*numa_fw_cfg));
+
+fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, &fw_cfg_numa, sizeof(fw_cfg_numa));
 
 return fw_cfg;
 }
-- 
1.7.2.5

[Qemu-devel] [PATCH v2 4/6] pc: format load_linux()

2013-02-21 Thread liguang

seems this function was in wrong coding format
so try correct it boldly.

Signed-off-by: liguang 
---
 hw/pc.c |   90 +++---
 1 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index f227c12..fd3a68c 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -639,8 +639,8 @@ static long get_file_size(FILE *f)
 
 static void load_linux(void *fw_cfg,
const char *kernel_filename,
-  const char *initrd_filename,
-  const char *kernel_cmdline,
+   const char *initrd_filename,
+   const char *kernel_cmdline,
hwaddr max_ram_size)
 {
 uint16_t protocol;
@@ -657,11 +657,11 @@ static void load_linux(void *fw_cfg,
 /* load the kernel header */
 f = fopen(kernel_filename, "rb");
 if (!f || !(kernel_size = get_file_size(f)) ||
-   fread(header, 1, MIN(ARRAY_SIZE(header), kernel_size), f) !=
-   MIN(ARRAY_SIZE(header), kernel_size)) {
-   fprintf(stderr, "qemu: could not load kernel '%s': %s\n",
-   kernel_filename, strerror(errno));
-   exit(1);
+fread(header, 1, MIN(ARRAY_SIZE(header), kernel_size), f) !=
+MIN(ARRAY_SIZE(header), kernel_size)) {
+fprintf(stderr, "qemu: could not load kernel '%s': %s\n",
+kernel_filename, strerror(errno));
+exit(1);
 }
 
 /* kernel protocol version */
@@ -669,48 +669,48 @@ static void load_linux(void *fw_cfg,
 fprintf(stderr, "header magic: %#x\n", ldl_p(header+0x202));
 #endif
 if (ldl_p(header+0x202) == 0x53726448)
-   protocol = lduw_p(header+0x206);
+protocol = lduw_p(header+0x206);
 else {
-   /* This looks like a multiboot kernel. If it is, let's stop
-  treating it like a Linux kernel. */
+/* This looks like a multiboot kernel. If it is, let's stop
+   treating it like a Linux kernel. */
 if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
kernel_cmdline, kernel_size, header))
 return;
-   protocol = 0;
+protocol = 0;
 }
 
 if (protocol < 0x200 || !(header[0x211] & 0x01)) {
-   /* Low kernel */
-   real_addr= 0x9;
-   cmdline_addr = 0x9a000 - cmdline_size;
-   prot_addr= 0x1;
+/* Low kernel */
+real_addr= 0x9;
+cmdline_addr = 0x9a000 - cmdline_size;
+prot_addr= 0x1;
 } else if (protocol < 0x202) {
-   /* High but ancient kernel */
-   real_addr= 0x9;
-   cmdline_addr = 0x9a000 - cmdline_size;
-   prot_addr= 0x10;
+/* High but ancient kernel */
+real_addr= 0x9;
+cmdline_addr = 0x9a000 - cmdline_size;
+prot_addr= 0x10;
 } else {
-   /* High and recent kernel */
-   real_addr= 0x1;
-   cmdline_addr = 0x2;
-   prot_addr= 0x10;
+/* High and recent kernel */
+real_addr= 0x1;
+cmdline_addr = 0x2;
+prot_addr= 0x10;
 }
 
 #if 0
 fprintf(stderr,
-   "qemu: real_addr = 0x" TARGET_FMT_plx "\n"
-   "qemu: cmdline_addr  = 0x" TARGET_FMT_plx "\n"
-   "qemu: prot_addr = 0x" TARGET_FMT_plx "\n",
-   real_addr,
-   cmdline_addr,
-   prot_addr);
+"qemu: real_addr = 0x" TARGET_FMT_plx "\n"
+"qemu: cmdline_addr  = 0x" TARGET_FMT_plx "\n"
+"qemu: prot_addr = 0x" TARGET_FMT_plx "\n",
+real_addr,
+cmdline_addr,
+prot_addr);
 #endif
 
 /* highest address for loading the initrd */
 if (protocol >= 0x203)
-   initrd_max = ldl_p(header+0x22c);
+initrd_max = ldl_p(header+0x22c);
 else
-   initrd_max = 0x37ff;
+initrd_max = 0x37ff;
 
 if (initrd_max >= max_ram_size-ACPI_DATA_SIZE)
initrd_max = max_ram_size-ACPI_DATA_SIZE-1;
@@ -720,10 +720,10 @@ static void load_linux(void *fw_cfg,
 fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
 
 if (protocol >= 0x202) {
-   stl_p(header+0x228, cmdline_addr);
+stl_p(header+0x228, cmdline_addr);
 } else {
-   stw_p(header+0x20, 0xA33F);
-   stw_p(header+0x22, cmdline_addr-real_addr);
+stw_p(header+0x20, 0xA33F);
+stw_p(header+0x22, cmdline_addr-real_addr);
 }
 
 /* handle vga= parameter */
@@ -749,22 +749,22 @@ static void load_linux(void *fw_cfg,
If this code is substantially changed, you may want to consider
incrementing the revision. */
 if (protocol >= 0x200)
-   header[0x210] = 0xB0;
+header[0x210] = 0xB0;
 
 /* heap */
 if (protocol >= 0x201) {
-   header[0x211] |= 0x80;  /* CAN_USE_HEAP */
-   stw_p(header+0x224, cmdline_addr-real_addr-0x200);
+header[0x211] |= 0x80; /* CAN_

[Qemu-devel] [PATCH v2 1/6] pc/bios: move common BIOS_CFG_IOPORT into fw_cfg.h

2013-02-21 Thread liguang

BIOS_CFG_IOPORT are commonly used, so move it to fw_cfg.h
bochs_bios_init seems misleading, bios may be seabios,
seabios is not only for bochs, and also we are in qemu.

Signed-off-by: liguang 
---
 hw/fw_cfg.h |4 
 hw/pc.c |9 -
 hw/sun4u.c  |3 +--
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/hw/fw_cfg.h b/hw/fw_cfg.h
index 05c8df1..6b3147d 100644
--- a/hw/fw_cfg.h
+++ b/hw/fw_cfg.h
@@ -38,6 +38,10 @@
 
 #define FW_CFG_INVALID  0x
 
+#define FW_CFG_CTL_IOPORT 0x510
+#define FW_CFG_DATA_IOPORT 0x511
+
+
 #ifndef NO_QEMU_PROTOS
 typedef struct FWCfgFile {
 uint32_t  size;/* file size */
diff --git a/hw/pc.c b/hw/pc.c
index 34b6dff..d010c75 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -64,7 +64,6 @@
 
 /* Leave a chunk of memory at the top of RAM for the BIOS ACPI tables.  */
 #define ACPI_DATA_SIZE   0x1
-#define BIOS_CFG_IOPORT 0x510
 #define FW_CFG_ACPI_TABLES (FW_CFG_ARCH_LOCAL + 0)
 #define FW_CFG_SMBIOS_ENTRIES (FW_CFG_ARCH_LOCAL + 1)
 #define FW_CFG_IRQ0_OVERRIDE (FW_CFG_ARCH_LOCAL + 2)
@@ -556,14 +555,14 @@ int e820_add_entry(uint64_t address, uint64_t length, 
uint32_t type)
  * This function returns the limit for the APIC ID value, so that all
  * CPU APIC IDs are < pc_apic_id_limit().
  *
- * This is used for FW_CFG_MAX_CPUS. See comments on bochs_bios_init().
+ * This is used for FW_CFG_MAX_CPUS. See comments on bios_init().
  */
 static unsigned int pc_apic_id_limit(unsigned int max_cpus)
 {
 return x86_cpu_apic_id_from_index(max_cpus - 1) + 1;
 }
 
-static void *bochs_bios_init(void)
+static void *bios_init(void)
 {
 void *fw_cfg;
 uint8_t *smbios_table;
@@ -572,7 +571,7 @@ static void *bochs_bios_init(void)
 int i, j;
 unsigned int apic_id_limit = pc_apic_id_limit(max_cpus);
 
-fw_cfg = fw_cfg_init(BIOS_CFG_IOPORT, BIOS_CFG_IOPORT + 1, 0, 0);
+fw_cfg = fw_cfg_init(FW_CFG_CTL_IOPORT, FW_CFG_DATA_IOPORT, 0, 0);
 /* FW_CFG_MAX_CPUS is a bit confusing/problematic on x86:
  *
  * SeaBIOS needs FW_CFG_MAX_CPUS for CPU hotplug, but the CPU hotplug
@@ -954,7 +953,7 @@ void *pc_memory_init(MemoryRegion *system_memory,
 option_rom_mr,
 1);
 
-fw_cfg = bochs_bios_init();
+fw_cfg = bios_init();
 rom_set_fw(fw_cfg);
 
 if (linux_boot) {
diff --git a/hw/sun4u.c b/hw/sun4u.c
index 9fbda29..1bdc443 100644
--- a/hw/sun4u.c
+++ b/hw/sun4u.c
@@ -76,7 +76,6 @@
 #define PROM_FILENAME"openbios-sparc64"
 #define NVRAM_SIZE   0x2000
 #define MAX_IDE_BUS  2
-#define BIOS_CFG_IOPORT  0x510
 #define FW_CFG_SPARC64_WIDTH (FW_CFG_ARCH_LOCAL + 0x00)
 #define FW_CFG_SPARC64_HEIGHT (FW_CFG_ARCH_LOCAL + 0x01)
 #define FW_CFG_SPARC64_DEPTH (FW_CFG_ARCH_LOCAL + 0x02)
@@ -877,7 +876,7 @@ static void sun4uv_init(MemoryRegion *address_space_mem,
graphic_width, graphic_height, graphic_depth,
(uint8_t *)&nd_table[0].macaddr);
 
-fw_cfg = fw_cfg_init(BIOS_CFG_IOPORT, BIOS_CFG_IOPORT + 1, 0, 0);
+fw_cfg = fw_cfg_init(FW_CFG_CTL_IOPORT, FW_CFG_DATA_IOPORT, 0, 0);
 fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
 fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1);
 fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
-- 
1.7.2.5

[Qemu-devel] [PATCH v2 5/6] load_linux: report open kernel file & its size error

2013-02-21 Thread liguang

Signed-off-by: liguang 
---
 hw/pc.c |   16 +---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index fd3a68c..30b3262 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -652,12 +652,22 @@ static void load_linux(void *fw_cfg,
 char *vmode;
 
 /* Align to 16 bytes as a paranoia measure */
-cmdline_size = (strlen(kernel_cmdline)+16) & ~15;
+cmdline_size = QEMU_ALIGN_UP(strlen(kernel_cmdline), 16);
 
 /* load the kernel header */
 f = fopen(kernel_filename, "rb");
-if (!f || !(kernel_size = get_file_size(f)) ||
-fread(header, 1, MIN(ARRAY_SIZE(header), kernel_size), f) !=
+if (!f) {
+fprintf(stderr, "can't open kernel image file: %s\n",
+strerror(errno));
+exit(1);
+}
+kernel_size = get_file_size(f);
+if (kernel_size <= 0) {
+fprintf(stderr, "can't get size of kernel image file: %s\n",
+strerror(errno));
+exit(1);
+}
+if (fread(header, 1, MIN(ARRAY_SIZE(header), kernel_size), f) !=
 MIN(ARRAY_SIZE(header), kernel_size)) {
 fprintf(stderr, "qemu: could not load kernel '%s': %s\n",
 kernel_filename, strerror(errno));
-- 
1.7.2.5

[Qemu-devel] [PATCH v2 0/6] change acpi numa info format passed from qemu to seabios

2013-02-21 Thread liguang

orginally, numa info was packed into an array
of 64-bit data which was implicit and hard to
maintain, so we define a struct for these info,
hope to be as clear as enough.

these changes also involved seabios paches 
which was sent to seabios mail-list.

v2:
changes to fix coding style
spotted by Blue Swirl 

Li Guang (6)
 pc/bios: move common BIOS_CFG_IOPORT into fw_cfg.h
 pc/numa: refactor bios_init function
 bitops: change BITS_TO_LONGS
 pc: format load_linux()
 load_linux: report open kernel file & its size error
 load_linux: change kernel header size allocation

hw/fw_cfg.h   | 4 
hw/pc.c   | 169 -
hw/sun4u.c| 3 +--
include/qemu/bitops.h | 2 +-
4 files changed, 94 insertions(+), 84 deletions(-)

[Qemu-devel] [PATCH v2 6/6] load_linux: change kernel header size allocation

2013-02-21 Thread liguang

it's not necessary to alloc 8K bytes for kernel
header, 0.5K is enough.

Signed-off-by: liguang 
---
 hw/pc.c |   14 --
 1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 30b3262..4f78d4f 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -637,6 +637,9 @@ static long get_file_size(FILE *f)
 return size;
 }
 
+#define HEADER_SIZE 0x200
+#define HEADER_SIGNATURE 0x53726448
+
 static void load_linux(void *fw_cfg,
const char *kernel_filename,
const char *initrd_filename,
@@ -646,7 +649,7 @@ static void load_linux(void *fw_cfg,
 uint16_t protocol;
 int setup_size, kernel_size, initrd_size = 0, cmdline_size;
 uint32_t initrd_max;
-uint8_t header[8192], *setup, *kernel, *initrd_data;
+uint8_t header[HEADER_SIZE], *setup, *kernel, *initrd_data;
 hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0;
 FILE *f;
 char *vmode;
@@ -667,8 +670,7 @@ static void load_linux(void *fw_cfg,
 strerror(errno));
 exit(1);
 }
-if (fread(header, 1, MIN(ARRAY_SIZE(header), kernel_size), f) !=
-MIN(ARRAY_SIZE(header), kernel_size)) {
+if (fread(header, 1, HEADER_SIZE, f) <= 0) {
 fprintf(stderr, "qemu: could not load kernel '%s': %s\n",
 kernel_filename, strerror(errno));
 exit(1);
@@ -678,9 +680,9 @@ static void load_linux(void *fw_cfg,
 #if 0
 fprintf(stderr, "header magic: %#x\n", ldl_p(header+0x202));
 #endif
-if (ldl_p(header+0x202) == 0x53726448)
-protocol = lduw_p(header+0x206);
-else {
+if (ldl_p(header + 0x202) == HEADER_SIGNATURE) {
+protocol = lduw_p(header + 0x206);
+} else {
 /* This looks like a multiboot kernel. If it is, let's stop
treating it like a Linux kernel. */
 if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
-- 
1.7.2.5

[Qemu-devel] [PATCH v2] Fix guest OS hang when 64bit PCI bar present

2013-02-21 Thread Alexey Korolev

This patch addresses the issue fully described here:
http://lists.nongnu.org/archive/html/qemu-devel/2013-02/msg01804.html

Linux kernels prior to 2.6.36 do not disable the PCI device during
enumeration process. Since lower and higher parts of a 64bit BAR
are programmed separately this leads to qemu receiving a request to occupy
a completely wrong address region for a short period of time.
We have found that the boot process screws up completely if kvm-apic range
is overlapped even for a short period of time (it is fine for other
regions though).

This patch raises the priority of the kvm-apic memory region, so it is
never pushed out by PCI devices. The patch is quite safe as it does not
touch memory manager.

Signed-off-by: Alexey Korolev 
Signed-off-by: Michael S. Tsirkin 
---
 hw/sysbus.c   |   27 +++
 hw/sysbus.h   |2 ++
 target-i386/cpu.c |3 ++-
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/hw/sysbus.c b/hw/sysbus.c
index 6d9d1df..50c7232 100644
--- a/hw/sysbus.c
+++ b/hw/sysbus.c
@@ -48,7 +48,8 @@ void sysbus_connect_irq(SysBusDevice *dev, int n, qemu_irq 
irq)
 }
 }
 
-void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr)
+static void sysbus_mmio_map_common(SysBusDevice *dev, int n, hwaddr addr,
+   bool may_overlap, unsigned priority)
 {
 assert(n >= 0 && n < dev->num_mmio);
 
@@ -61,11 +62,29 @@ void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr)
 memory_region_del_subregion(get_system_memory(), dev->mmio[n].memory);
 }
 dev->mmio[n].addr = addr;
-memory_region_add_subregion(get_system_memory(),
-addr,
-dev->mmio[n].memory);
+if (may_overlap) {
+memory_region_add_subregion_overlap(get_system_memory(),
+addr,
+dev->mmio[n].memory,
+priority);
+}
+else {
+memory_region_add_subregion(get_system_memory(),
+addr,
+dev->mmio[n].memory);
+}
 }
 
+void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr)
+{
+sysbus_mmio_map_common(dev, n, addr, false, 0);
+}
+
+void sysbus_mmio_map_overlap(SysBusDevice *dev, int n, hwaddr addr,
+ unsigned priority)
+{
+sysbus_mmio_map_common(dev, n, addr, true, priority);
+}
 
 /* Request an IRQ source.  The actual IRQ object may be populated later.  */
 void sysbus_init_irq(SysBusDevice *dev, qemu_irq *p)
diff --git a/hw/sysbus.h b/hw/sysbus.h
index a7fcded..2100bd7 100644
--- a/hw/sysbus.h
+++ b/hw/sysbus.h
@@ -56,6 +56,8 @@ void sysbus_init_ioports(SysBusDevice *dev, pio_addr_t 
ioport, pio_addr_t size);
 
 void sysbus_connect_irq(SysBusDevice *dev, int n, qemu_irq irq);
 void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr);
+void sysbus_mmio_map_overlap(SysBusDevice *dev, int n, hwaddr addr,
+ unsigned priority);
 void sysbus_add_memory(SysBusDevice *dev, hwaddr addr,
MemoryRegion *mem);
 void sysbus_add_memory_overlap(SysBusDevice *dev, hwaddr addr,
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index dfcf86e..4cf27eb 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2078,7 +2078,8 @@ static void x86_cpu_apic_init(X86CPU *cpu, Error **errp)
 /* NOTE: the APIC is directly connected to the CPU - it is not
on the global memory bus. */
 /* XXX: what if the base changes? */
-sysbus_mmio_map(SYS_BUS_DEVICE(env->apic_state), 0, MSI_ADDR_BASE);
+sysbus_mmio_map_overlap(SYS_BUS_DEVICE(env->apic_state), 0,
+MSI_ADDR_BASE, 0x1000);
 apic_mapped = 1;
 }
 }
-- 
1.7.9.5

[Qemu-devel] [PATCH v4 RESEND 0/5] sheepdog: unix domain socket support

2013-02-21 Thread MORITA Kazutaka

This series makes sheepdog accept URI syntax, and adds a unix domain
socket support for a connection between qemu and local sheepdog server
based on the syntax.

Changes from v3:
 - fix wrong URI syntax in the commit log

Changes from v2:
 - fix coding style in tcp_connect
 - accept URI syntax

Changes from v1:
 - split patch for easy review
 - move set_nodelay to lib/osdep.c
 - remove redundant error checks
 - add a bit more explanation to qemu-options.hx

MORITA Kazutaka (5):
  slirp/tcp_subr.c: fix coding style in tcp_connect
  move socket_set_nodelay to osdep.c
  sheepdog: accept URIs
  sheepdog: use inet_connect to simplify connect code
  sheepdog: add support for connecting to unix domain socket

 block/sheepdog.c   | 315 ++---
 gdbstub.c  |   5 +-
 include/qemu/sockets.h |   1 +
 qemu-char.c|   6 -
 qemu-doc.texi  |  22 ++--
 qemu-options.hx|  18 +--
 slirp/tcp_subr.c   | 139 +++---
 util/osdep.c   |   6 +
 8 files changed, 290 insertions(+), 222 deletions(-)

-- 
1.8.1.3.566.gaa39828

[Qemu-devel] [PATCH v4 RESEND 4/5] sheepdog: use inet_connect to simplify connect code

2013-02-21 Thread MORITA Kazutaka

This uses the form ":" for the representation of the
sheepdog server to use inet_connect.

Signed-off-by: MORITA Kazutaka 
---
 block/sheepdog.c | 111 +++
 1 file changed, 30 insertions(+), 81 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index bfa8a00..b5cbdfe 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -22,7 +22,7 @@
 #define SD_PROTO_VER 0x01
 
 #define SD_DEFAULT_ADDR "localhost"
-#define SD_DEFAULT_PORT "7000"
+#define SD_DEFAULT_PORT 7000
 
 #define SD_OP_CREATE_AND_WRITE_OBJ  0x01
 #define SD_OP_READ_OBJ   0x02
@@ -298,8 +298,7 @@ typedef struct BDRVSheepdogState {
 bool is_snapshot;
 uint32_t cache_flags;
 
-char *addr;
-char *port;
+char *host_spec;
 int fd;
 
 CoMutex lock;
@@ -447,56 +446,18 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, 
QEMUIOVector *qiov,
 return acb;
 }
 
-static int connect_to_sdog(const char *addr, const char *port)
+static int connect_to_sdog(BDRVSheepdogState *s)
 {
-char hbuf[NI_MAXHOST], sbuf[NI_MAXSERV];
-int fd, ret;
-struct addrinfo hints, *res, *res0;
-
-if (!addr) {
-addr = SD_DEFAULT_ADDR;
-port = SD_DEFAULT_PORT;
-}
+int fd;
+Error *err = NULL;
 
-memset(&hints, 0, sizeof(hints));
-hints.ai_socktype = SOCK_STREAM;
+fd = inet_connect(s->host_spec, &err);
 
-ret = getaddrinfo(addr, port, &hints, &res0);
-if (ret) {
-error_report("unable to get address info %s, %s",
- addr, strerror(errno));
-return -errno;
+if (err != NULL) {
+qerror_report_err(err);
+error_free(err);
 }
 
-for (res = res0; res; res = res->ai_next) {
-ret = getnameinfo(res->ai_addr, res->ai_addrlen, hbuf, sizeof(hbuf),
-  sbuf, sizeof(sbuf), NI_NUMERICHOST | NI_NUMERICSERV);
-if (ret) {
-continue;
-}
-
-fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
-if (fd < 0) {
-continue;
-}
-
-reconnect:
-ret = connect(fd, res->ai_addr, res->ai_addrlen);
-if (ret < 0) {
-if (errno == EINTR) {
-goto reconnect;
-}
-close(fd);
-break;
-}
-
-dprintf("connected to %s:%s\n", addr, port);
-goto success;
-}
-fd = -errno;
-error_report("failed connect to %s:%s", addr, port);
-success:
-freeaddrinfo(res0);
 return fd;
 }
 
@@ -798,9 +759,8 @@ static int get_sheep_fd(BDRVSheepdogState *s)
 {
 int ret, fd;
 
-fd = connect_to_sdog(s->addr, s->port);
+fd = connect_to_sdog(s);
 if (fd < 0) {
-error_report("%s", strerror(errno));
 return fd;
 }
 
@@ -836,12 +796,8 @@ static int sd_parse_uri(BDRVSheepdogState *s, const char 
*filename,
 pstrcpy(vdi, SD_MAX_VDI_LEN, uri->path + 1);
 
 /* sheepdog[+tcp]://[host:port]/vdiname */
-s->addr = g_strdup(uri->server ?: SD_DEFAULT_ADDR);
-if (uri->port) {
-s->port = g_strdup_printf("%d", uri->port);
-} else {
-s->port = g_strdup(SD_DEFAULT_PORT);
-}
+s->host_spec = g_strdup_printf("%s:%d", uri->server ?: SD_DEFAULT_ADDR,
+   uri->port ?: SD_DEFAULT_PORT);
 
 /* snapshot tag */
 if (uri->fragment) {
@@ -935,7 +891,7 @@ static int find_vdi_name(BDRVSheepdogState *s, char 
*filename, uint32_t snapid,
 unsigned int wlen, rlen = 0;
 char buf[SD_MAX_VDI_LEN + SD_MAX_VDI_TAG_LEN];
 
-fd = connect_to_sdog(s->addr, s->port);
+fd = connect_to_sdog(s);
 if (fd < 0) {
 return fd;
 }
@@ -1178,9 +1134,8 @@ static int sd_open(BlockDriverState *bs, const char 
*filename, int flags)
 s->is_snapshot = true;
 }
 
-fd = connect_to_sdog(s->addr, s->port);
+fd = connect_to_sdog(s);
 if (fd < 0) {
-error_report("failed to connect");
 ret = fd;
 goto out;
 }
@@ -1213,9 +1168,8 @@ out:
 return ret;
 }
 
-static int do_sd_create(char *filename, int64_t vdi_size,
-uint32_t base_vid, uint32_t *vdi_id, int snapshot,
-const char *addr, const char *port)
+static int do_sd_create(BDRVSheepdogState *s, char *filename, int64_t vdi_size,
+uint32_t base_vid, uint32_t *vdi_id, int snapshot)
 {
 SheepdogVdiReq hdr;
 SheepdogVdiRsp *rsp = (SheepdogVdiRsp *)&hdr;
@@ -1223,7 +1177,7 @@ static int do_sd_create(char *filename, int64_t vdi_size,
 unsigned int wlen, rlen = 0;
 char buf[SD_MAX_VDI_LEN];
 
-fd = connect_to_sdog(addr, port);
+fd = connect_to_sdog(s);
 if (fd < 0) {
 return fd;
 }
@@ -1390,7 +1344,7 @@ static int sd_create(const char *filename, 
QEMUOptionParameter *options)
 bdrv_delete(bs);
 }
 
-ret = do_sd_create(vdi, vdi_size, base_vid, &vid, 0, s->addr, s->port);
+ret = do_sd_create(s, vd

[Qemu-devel] [PATCH v4 RESEND 1/5] slirp/tcp_subr.c: fix coding style in tcp_connect

2013-02-21 Thread MORITA Kazutaka

Fix coding style in tcp_connect before the next patch.

Signed-off-by: MORITA Kazutaka 
---
 slirp/tcp_subr.c | 140 ---
 1 file changed, 72 insertions(+), 68 deletions(-)

diff --git a/slirp/tcp_subr.c b/slirp/tcp_subr.c
index 1542e43..317dc07 100644
--- a/slirp/tcp_subr.c
+++ b/slirp/tcp_subr.c
@@ -384,83 +384,87 @@ int tcp_fconnect(struct socket *so)
  * the time it gets to accept(), so... We simply accept
  * here and SYN the local-host.
  */
-void
-tcp_connect(struct socket *inso)
+void tcp_connect(struct socket *inso)
 {
-   Slirp *slirp = inso->slirp;
-   struct socket *so;
-   struct sockaddr_in addr;
-   socklen_t addrlen = sizeof(struct sockaddr_in);
-   struct tcpcb *tp;
-   int s, opt;
+Slirp *slirp = inso->slirp;
+struct socket *so;
+struct sockaddr_in addr;
+socklen_t addrlen = sizeof(struct sockaddr_in);
+struct tcpcb *tp;
+int s, opt;
 
-   DEBUG_CALL("tcp_connect");
-   DEBUG_ARG("inso = %lx", (long)inso);
+DEBUG_CALL("tcp_connect");
+DEBUG_ARG("inso = %lx", (long)inso);
 
-   /*
-* If it's an SS_ACCEPTONCE socket, no need to socreate()
-* another socket, just use the accept() socket.
-*/
-   if (inso->so_state & SS_FACCEPTONCE) {
-   /* FACCEPTONCE already have a tcpcb */
-   so = inso;
-   } else {
-   if ((so = socreate(slirp)) == NULL) {
-   /* If it failed, get rid of the pending connection */
-   closesocket(accept(inso->s,(struct sockaddr 
*)&addr,&addrlen));
-   return;
-   }
-   if (tcp_attach(so) < 0) {
-   free(so); /* NOT sofree */
-   return;
-   }
-   so->so_laddr = inso->so_laddr;
-   so->so_lport = inso->so_lport;
-   }
+/*
+ * If it's an SS_ACCEPTONCE socket, no need to socreate()
+ * another socket, just use the accept() socket.
+ */
+if (inso->so_state & SS_FACCEPTONCE) {
+/* FACCEPTONCE already have a tcpcb */
+so = inso;
+} else {
+so = socreate(slirp);
+if (so == NULL) {
+/* If it failed, get rid of the pending connection */
+closesocket(accept(inso->s, (struct sockaddr *)&addr, &addrlen));
+return;
+}
+if (tcp_attach(so) < 0) {
+free(so); /* NOT sofree */
+return;
+}
+so->so_laddr = inso->so_laddr;
+so->so_lport = inso->so_lport;
+}
 
-   (void) tcp_mss(sototcpcb(so), 0);
+tcp_mss(sototcpcb(so), 0);
 
-   if ((s = accept(inso->s,(struct sockaddr *)&addr,&addrlen)) < 0) {
-   tcp_close(sototcpcb(so)); /* This will sofree() as well */
-   return;
-   }
-   socket_set_nonblock(s);
-   opt = 1;
-   setsockopt(s,SOL_SOCKET,SO_REUSEADDR,(char *)&opt,sizeof(int));
-   opt = 1;
-   setsockopt(s,SOL_SOCKET,SO_OOBINLINE,(char *)&opt,sizeof(int));
-   opt = 1;
-   setsockopt(s,IPPROTO_TCP,TCP_NODELAY,(char *)&opt,sizeof(int));
-
-   so->so_fport = addr.sin_port;
-   so->so_faddr = addr.sin_addr;
-   /* Translate connections from localhost to the real hostname */
-if (so->so_faddr.s_addr == 0 ||
-(so->so_faddr.s_addr & loopback_mask) ==
-(loopback_addr.s_addr & loopback_mask)) {
-so->so_faddr = slirp->vhost_addr;
-}
+s = accept(inso->s, (struct sockaddr *)&addr, &addrlen);
+if (s < 0) {
+tcp_close(sototcpcb(so)); /* This will sofree() as well */
+return;
+}
+socket_set_nonblock(s);
+opt = 1;
+setsockopt(s, SOL_SOCKET, SO_REUSEADDR, (char *)&opt, sizeof(int));
+opt = 1;
+setsockopt(s, SOL_SOCKET, SO_OOBINLINE, (char *)&opt, sizeof(int));
+opt = 1;
+setsockopt(s, IPPROTO_TCP, TCP_NODELAY, (char *)&opt, sizeof(int));
+
+so->so_fport = addr.sin_port;
+so->so_faddr = addr.sin_addr;
+/* Translate connections from localhost to the real hostname */
+if (so->so_faddr.s_addr == 0 ||
+(so->so_faddr.s_addr & loopback_mask) ==
+(loopback_addr.s_addr & loopback_mask)) {
+so->so_faddr = slirp->vhost_addr;
+}
 
-   /* Close the accept() socket, set right state */
-   if (inso->so_state & SS_FACCEPTONCE) {
-   closesocket(so->s); /* If we only accept once, close the 
accept() socket */
-   so->so_state = SS_NOFDREF; /* Don't select it yet, even though 
we have an FD */
-  /* if it's not FACCEPTONCE, it's 
already NOFDREF */
-   }
-   so->s = s;
-   so->so_state |= SS_INCOMING;
+/* Close the accept() socket, set right state */
+if (inso->so_state & SS_FACCEPTONCE) {
+/* If we only accept once, close the accept() socket */
+closesocket(so->s);
+
+/* Don't

[Qemu-devel] [PATCH v4 RESEND 5/5] sheepdog: add support for connecting to unix domain socket

2013-02-21 Thread MORITA Kazutaka

This patch adds support for a unix domain socket for a connection
between qemu and local sheepdog server.  You can use the unix domain
socket with the following syntax:

 $ qemu sheepdog+unix:///?socket=[#snapid]

Signed-off-by: MORITA Kazutaka 
---
 block/sheepdog.c | 82 +++-
 qemu-doc.texi|  6 +
 qemu-options.hx  |  2 +-
 3 files changed, 77 insertions(+), 13 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index b5cbdfe..c711c28 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -299,6 +299,7 @@ typedef struct BDRVSheepdogState {
 uint32_t cache_flags;
 
 char *host_spec;
+bool is_unix;
 int fd;
 
 CoMutex lock;
@@ -451,7 +452,18 @@ static int connect_to_sdog(BDRVSheepdogState *s)
 int fd;
 Error *err = NULL;
 
-fd = inet_connect(s->host_spec, &err);
+if (s->is_unix) {
+fd = unix_connect(s->host_spec, &err);
+} else {
+fd = inet_connect(s->host_spec, &err);
+
+if (err == NULL) {
+int ret = socket_set_nodelay(fd);
+if (ret < 0) {
+error_report("%s", strerror(errno));
+}
+}
+}
 
 if (err != NULL) {
 qerror_report_err(err);
@@ -757,7 +769,7 @@ static int aio_flush_request(void *opaque)
  */
 static int get_sheep_fd(BDRVSheepdogState *s)
 {
-int ret, fd;
+int fd;
 
 fd = connect_to_sdog(s);
 if (fd < 0) {
@@ -766,13 +778,6 @@ static int get_sheep_fd(BDRVSheepdogState *s)
 
 socket_set_nonblock(fd);
 
-ret = socket_set_nodelay(fd);
-if (ret) {
-error_report("%s", strerror(errno));
-closesocket(fd);
-return -errno;
-}
-
 qemu_aio_set_fd_handler(fd, co_read_response, NULL, aio_flush_request, s);
 return fd;
 }
@@ -789,15 +794,42 @@ static int sd_parse_uri(BDRVSheepdogState *s, const char 
*filename,
 return -EINVAL;
 }
 
+/* transport */
+if (!strcmp(uri->scheme, "sheepdog")) {
+s->is_unix = false;
+} else if (!strcmp(uri->scheme, "sheepdog+tcp")) {
+s->is_unix = false;
+} else if (!strcmp(uri->scheme, "sheepdog+unix")) {
+s->is_unix = true;
+} else {
+ret = -EINVAL;
+goto out;
+}
+
 if (uri->path == NULL || !strcmp(uri->path, "/")) {
 ret = -EINVAL;
 goto out;
 }
 pstrcpy(vdi, SD_MAX_VDI_LEN, uri->path + 1);
 
-/* sheepdog[+tcp]://[host:port]/vdiname */
-s->host_spec = g_strdup_printf("%s:%d", uri->server ?: SD_DEFAULT_ADDR,
-   uri->port ?: SD_DEFAULT_PORT);
+qp = query_params_parse(uri->query);
+if (qp->n > 1 || (s->is_unix && !qp->n) || (!s->is_unix && qp->n)) {
+ret = -EINVAL;
+goto out;
+}
+
+if (s->is_unix) {
+/* sheepdog+unix:///vdiname?socket=path */
+if (uri->server || uri->port || strcmp(qp->p[0].name, "socket")) {
+ret = -EINVAL;
+goto out;
+}
+s->host_spec = g_strdup(qp->p[0].value);
+} else {
+/* sheepdog[+tcp]://[host:port]/vdiname */
+s->host_spec = g_strdup_printf("%s:%d", uri->server ?: SD_DEFAULT_ADDR,
+   uri->port ?: SD_DEFAULT_PORT);
+}
 
 /* snapshot tag */
 if (uri->fragment) {
@@ -2098,9 +2130,35 @@ static BlockDriver bdrv_sheepdog_tcp = {
 .create_options = sd_create_options,
 };
 
+static BlockDriver bdrv_sheepdog_unix = {
+.format_name= "sheepdog",
+.protocol_name  = "sheepdog+unix",
+.instance_size  = sizeof(BDRVSheepdogState),
+.bdrv_file_open = sd_open,
+.bdrv_close = sd_close,
+.bdrv_create= sd_create,
+.bdrv_getlength = sd_getlength,
+.bdrv_truncate  = sd_truncate,
+
+.bdrv_co_readv  = sd_co_readv,
+.bdrv_co_writev = sd_co_writev,
+.bdrv_co_flush_to_disk  = sd_co_flush_to_disk,
+
+.bdrv_snapshot_create   = sd_snapshot_create,
+.bdrv_snapshot_goto = sd_snapshot_goto,
+.bdrv_snapshot_delete   = sd_snapshot_delete,
+.bdrv_snapshot_list = sd_snapshot_list,
+
+.bdrv_save_vmstate  = sd_save_vmstate,
+.bdrv_load_vmstate  = sd_load_vmstate,
+
+.create_options = sd_create_options,
+};
+
 static void bdrv_sheepdog_init(void)
 {
 bdrv_register(&bdrv_sheepdog);
 bdrv_register(&bdrv_sheepdog_tcp);
+bdrv_register(&bdrv_sheepdog_unix);
 }
 block_init(bdrv_sheepdog_init);
diff --git a/qemu-doc.texi b/qemu-doc.texi
index d4eb5eb..9f41589 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -865,6 +865,12 @@ qemu-img create -b sheepdog:///@var{base}#@var{tag} 
sheepdog:///@var{image}
 where @var{base} is a image name of the source snapshot and @var{tag}
 is its tag name.
 
+You can use an unix socket instead of an inet socket:
+
+@example
+qemu-system-i386 sheepdog+unix:///@var{image}?socket=@var{path}
+@end example
+
 If the Sheepdog daemon doesn't run on the local host, you need to
 specify one of the Sheepdog servers

[Qemu-devel] [PATCH v4 RESEND 2/5] move socket_set_nodelay to osdep.c

2013-02-21 Thread MORITA Kazutaka

Signed-off-by: MORITA Kazutaka 
---
 block/sheepdog.c   | 11 +--
 gdbstub.c  |  5 ++---
 include/qemu/sockets.h |  1 +
 qemu-char.c|  6 --
 slirp/tcp_subr.c   |  3 +--
 util/osdep.c   |  6 ++
 6 files changed, 11 insertions(+), 21 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index d466b23..51e75ad 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -787,15 +787,6 @@ static int aio_flush_request(void *opaque)
 !QLIST_EMPTY(&s->pending_aio_head);
 }
 
-static int set_nodelay(int fd)
-{
-int ret, opt;
-
-opt = 1;
-ret = setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (char *)&opt, sizeof(opt));
-return ret;
-}
-
 /*
  * Return a socket discriptor to read/write objects.
  *
@@ -814,7 +805,7 @@ static int get_sheep_fd(BDRVSheepdogState *s)
 
 socket_set_nonblock(fd);
 
-ret = set_nodelay(fd);
+ret = socket_set_nodelay(fd);
 if (ret) {
 error_report("%s", strerror(errno));
 closesocket(fd);
diff --git a/gdbstub.c b/gdbstub.c
index 32dfea9..e414ad9 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -2841,7 +2841,7 @@ static void gdb_accept(void)
 GDBState *s;
 struct sockaddr_in sockaddr;
 socklen_t len;
-int val, fd;
+int fd;
 
 for(;;) {
 len = sizeof(sockaddr);
@@ -2858,8 +2858,7 @@ static void gdb_accept(void)
 }
 
 /* set short latency */
-val = 1;
-setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (char *)&val, sizeof(val));
+socket_set_nodelay(fd);
 
 s = g_malloc0(sizeof(GDBState));
 s->c_cpu = first_cpu;
diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
index 803ae17..6125bf7 100644
--- a/include/qemu/sockets.h
+++ b/include/qemu/sockets.h
@@ -34,6 +34,7 @@ int inet_aton(const char *cp, struct in_addr *ia);
 int qemu_socket(int domain, int type, int protocol);
 int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen);
 int socket_set_cork(int fd, int v);
+int socket_set_nodelay(int fd);
 void socket_set_block(int fd);
 void socket_set_nonblock(int fd);
 int send_all(int fd, const void *buf, int len1);
diff --git a/qemu-char.c b/qemu-char.c
index e4b0f53..3e16b5b 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2365,12 +2365,6 @@ static void tcp_chr_telnet_init(int fd)
 send(fd, (char *)buf, 3, 0);
 }
 
-static void socket_set_nodelay(int fd)
-{
-int val = 1;
-setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (char *)&val, sizeof(val));
-}
-
 static int tcp_chr_add_client(CharDriverState *chr, int fd)
 {
 TCPCharDriver *s = chr->opaque;
diff --git a/slirp/tcp_subr.c b/slirp/tcp_subr.c
index 317dc07..7b7ad60 100644
--- a/slirp/tcp_subr.c
+++ b/slirp/tcp_subr.c
@@ -430,8 +430,7 @@ void tcp_connect(struct socket *inso)
 setsockopt(s, SOL_SOCKET, SO_REUSEADDR, (char *)&opt, sizeof(int));
 opt = 1;
 setsockopt(s, SOL_SOCKET, SO_OOBINLINE, (char *)&opt, sizeof(int));
-opt = 1;
-setsockopt(s, IPPROTO_TCP, TCP_NODELAY, (char *)&opt, sizeof(int));
+socket_set_nodelay(s);
 
 so->so_fport = addr.sin_port;
 so->so_faddr = addr.sin_addr;
diff --git a/util/osdep.c b/util/osdep.c
index 5b51a03..c408261 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -63,6 +63,12 @@ int socket_set_cork(int fd, int v)
 #endif
 }
 
+int socket_set_nodelay(int fd)
+{
+int v = 1;
+return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &v, sizeof(v));
+}
+
 int qemu_madvise(void *addr, size_t len, int advice)
 {
 if (advice == QEMU_MADV_INVALID) {
-- 
1.8.1.3.566.gaa39828

[Qemu-devel] [PATCH v4 RESEND 3/5] sheepdog: accept URIs

2013-02-21 Thread MORITA Kazutaka

The URI syntax is consistent with the NBD and Gluster syntax.  The
syntax is

  sheepdog[+tcp]://[host:port]/vdiname[#snapid|#tag]

Signed-off-by: MORITA Kazutaka 
---
 block/sheepdog.c | 139 +--
 qemu-doc.texi|  16 +++
 qemu-options.hx  |  18 ++-
 3 files changed, 117 insertions(+), 56 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 51e75ad..bfa8a00 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -13,6 +13,7 @@
  */
 
 #include "qemu-common.h"
+#include "qemu/uri.h"
 #include "qemu/error-report.h"
 #include "qemu/sockets.h"
 #include "block/block_int.h"
@@ -816,8 +817,52 @@ static int get_sheep_fd(BDRVSheepdogState *s)
 return fd;
 }
 
+static int sd_parse_uri(BDRVSheepdogState *s, const char *filename,
+char *vdi, uint32_t *snapid, char *tag)
+{
+URI *uri;
+QueryParams *qp = NULL;
+int ret = 0;
+
+uri = uri_parse(filename);
+if (!uri) {
+return -EINVAL;
+}
+
+if (uri->path == NULL || !strcmp(uri->path, "/")) {
+ret = -EINVAL;
+goto out;
+}
+pstrcpy(vdi, SD_MAX_VDI_LEN, uri->path + 1);
+
+/* sheepdog[+tcp]://[host:port]/vdiname */
+s->addr = g_strdup(uri->server ?: SD_DEFAULT_ADDR);
+if (uri->port) {
+s->port = g_strdup_printf("%d", uri->port);
+} else {
+s->port = g_strdup(SD_DEFAULT_PORT);
+}
+
+/* snapshot tag */
+if (uri->fragment) {
+*snapid = strtoul(uri->fragment, NULL, 10);
+if (*snapid == 0) {
+pstrcpy(tag, SD_MAX_VDI_TAG_LEN, uri->fragment);
+}
+} else {
+*snapid = CURRENT_VDI_ID; /* search current vdi */
+}
+
+out:
+if (qp) {
+query_params_free(qp);
+}
+uri_free(uri);
+return ret;
+}
+
 /*
- * Parse a filename
+ * Parse a filename (old syntax)
  *
  * filename must be one of the following formats:
  *   1. [vdiname]
@@ -836,9 +881,11 @@ static int get_sheep_fd(BDRVSheepdogState *s)
 static int parse_vdiname(BDRVSheepdogState *s, const char *filename,
  char *vdi, uint32_t *snapid, char *tag)
 {
-char *p, *q;
-int nr_sep;
+char *p, *q, *uri;
+const char *host_spec, *vdi_spec;
+int nr_sep, ret;
 
+strstart(filename, "sheepdog:", (const char **)&filename);
 p = q = g_strdup(filename);
 
 /* count the number of separators */
@@ -851,38 +898,32 @@ static int parse_vdiname(BDRVSheepdogState *s, const char 
*filename,
 }
 p = q;
 
-/* use the first two tokens as hostname and port number. */
+/* use the first two tokens as host_spec. */
 if (nr_sep >= 2) {
-s->addr = p;
+host_spec = p;
 p = strchr(p, ':');
-*p++ = '\0';
-
-s->port = p;
+p++;
 p = strchr(p, ':');
 *p++ = '\0';
 } else {
-s->addr = NULL;
-s->port = 0;
+host_spec = "";
 }
 
-pstrcpy(vdi, SD_MAX_VDI_LEN, p);
+vdi_spec = p;
 
-p = strchr(vdi, ':');
+p = strchr(vdi_spec, ':');
 if (p) {
-*p++ = '\0';
-*snapid = strtoul(p, NULL, 10);
-if (*snapid == 0) {
-pstrcpy(tag, SD_MAX_VDI_TAG_LEN, p);
-}
-} else {
-*snapid = CURRENT_VDI_ID; /* search current vdi */
+*p++ = '#';
 }
 
-if (s->addr == NULL) {
-g_free(q);
-}
+uri = g_strdup_printf("sheepdog://%s/%s", host_spec, vdi_spec);
 
-return 0;
+ret = sd_parse_uri(s, uri, vdi, snapid, tag);
+
+g_free(q);
+g_free(uri);
+
+return ret;
 }
 
 static int find_vdi_name(BDRVSheepdogState *s, char *filename, uint32_t snapid,
@@ -1097,16 +1138,19 @@ static int sd_open(BlockDriverState *bs, const char 
*filename, int flags)
 uint32_t snapid;
 char *buf = NULL;
 
-strstart(filename, "sheepdog:", (const char **)&filename);
-
 QLIST_INIT(&s->inflight_aio_head);
 QLIST_INIT(&s->pending_aio_head);
 s->fd = -1;
 
 memset(vdi, 0, sizeof(vdi));
 memset(tag, 0, sizeof(tag));
-if (parse_vdiname(s, filename, vdi, &snapid, tag) < 0) {
-ret = -EINVAL;
+
+if (strstr(filename, "://")) {
+ret = sd_parse_uri(s, filename, vdi, &snapid, tag);
+} else {
+ret = parse_vdiname(s, filename, vdi, &snapid, tag);
+}
+if (ret < 0) {
 goto out;
 }
 s->fd = get_sheep_fd(s);
@@ -1275,17 +1319,17 @@ static int sd_create(const char *filename, 
QEMUOptionParameter *options)
 char vdi[SD_MAX_VDI_LEN], tag[SD_MAX_VDI_TAG_LEN];
 uint32_t snapid;
 bool prealloc = false;
-const char *vdiname;
 
 s = g_malloc0(sizeof(BDRVSheepdogState));
 
-strstart(filename, "sheepdog:", &vdiname);
-
 memset(vdi, 0, sizeof(vdi));
 memset(tag, 0, sizeof(tag));
-if (parse_vdiname(s, vdiname, vdi, &snapid, tag) < 0) {
-error_report("invalid filename");
-ret = -EINVAL;
+if (strstr(filename, "://")) {
+ret = sd_parse_uri(s, fi

[Qemu-devel] [PATCH] qemu: define a TOM register to report the base of PCI

2013-02-21 Thread Xudong Hao

Define a TOM(top of memory) register to report the base of PCI memory, update
memory region dynamically.

The use case of this register defination is for Xen till now.

Signed-off-by: Xudong Hao 
Signed-off-by: Xiantao Zhang 
---
 hw/pc.h   |  4 ---
 hw/pc_piix.c  |  6 -
 hw/piix_pci.c | 79 ---
 3 files changed, 59 insertions(+), 30 deletions(-)

diff --git a/hw/pc.h b/hw/pc.h
index fbcf43d..2a60490 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -134,10 +134,6 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int 
*piix_devfn,
 MemoryRegion *address_space_mem,
 MemoryRegion *address_space_io,
 ram_addr_t ram_size,
-hwaddr pci_hole_start,
-hwaddr pci_hole_size,
-hwaddr pci_hole64_start,
-hwaddr pci_hole64_size,
 MemoryRegion *pci_memory,
 MemoryRegion *ram_memory);
 
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 0a6923d..98cf467 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -130,12 +130,6 @@ static void pc_init1(MemoryRegion *system_memory,
 if (pci_enabled) {
 pci_bus = i440fx_init(&i440fx_state, &piix3_devfn, &isa_bus, gsi,
   system_memory, system_io, ram_size,
-  below_4g_mem_size,
-  0x1ULL - below_4g_mem_size,
-  0x1ULL + above_4g_mem_size,
-  (sizeof(hwaddr) == 4
-   ? 0
-   : ((uint64_t)1 << 62)),
   pci_memory, ram_memory);
 } else {
 pci_bus = NULL;
diff --git a/hw/piix_pci.c b/hw/piix_pci.c
index 3d79c73..88bd688 100644
--- a/hw/piix_pci.c
+++ b/hw/piix_pci.c
@@ -86,6 +86,14 @@ struct PCII440FXState {
 #define I440FX_PAM_SIZE 7
 #define I440FX_SMRAM0x72
 
+/* The maximum vaule of TOM(top of memory) register in I440FX
+ * is 1G, so it doesn't meet any popular virutal machines, so
+ * define another register to report the base of PCI memory.
+ * Use four bytes 0xb0-0xb3 for this purpose, they are originally
+ * resevered for host bridge.
+ * */
+#define I440FX_PCI_HOLE_BASE 0xb0
+
 static void piix3_set_irq(void *opaque, int pirq, int level);
 static PCIINTxRoute piix3_route_intx_pin_to_irq(void *opaque, int pci_intx);
 static void piix3_write_config_xen(PCIDevice *dev,
@@ -101,6 +109,43 @@ static int pci_slot_get_pirq(PCIDevice *pci_dev, int 
pci_intx)
 return (pci_intx + slot_addend) & 3;
 }
 
+
+static void i440fx_update_pci_mem_hole(PCII440FXState *f, bool del)
+{
+ram_addr_t above_4g_mem_size;
+hwaddr pci_hole_start, pci_hole_size, pci_hole64_start, pci_hole64_size;
+
+pci_hole_start = pci_default_read_config(&f->dev, I440FX_PCI_HOLE_BASE, 4);
+pci_hole_size = 0x1ULL - pci_hole_start;
+
+if (ram_size >= pci_hole_start) {
+above_4g_mem_size = ram_size - pci_hole_start;
+} else {
+above_4g_mem_size = 0;
+}
+pci_hole64_start = 0x1ULL + above_4g_mem_size;
+pci_hole64_size = sizeof(hwaddr) == 4 ? 0 : ((uint64_t)1 << 62);
+
+if (del) {
+memory_region_del_subregion(f->system_memory, &f->pci_hole);
+if (pci_hole64_size) {
+memory_region_del_subregion(f->system_memory, &f->pci_hole_64bit);
+}
+}
+
+memory_region_init_alias(&f->pci_hole, "pci-hole", f->pci_address_space,
+ pci_hole_start, pci_hole_size);
+memory_region_add_subregion(f->system_memory, pci_hole_start, 
&f->pci_hole);
+memory_region_init_alias(&f->pci_hole_64bit, "pci-hole64",
+ f->pci_address_space,
+ pci_hole64_start, pci_hole64_size);
+if (pci_hole64_size) {
+memory_region_add_subregion(f->system_memory, pci_hole64_start,
+&f->pci_hole_64bit);
+}
+}
+
+
 static void i440fx_update_memory_mappings(PCII440FXState *d)
 {
 int i;
@@ -136,6 +181,9 @@ static void i440fx_write_config(PCIDevice *dev,
 range_covers_byte(address, len, I440FX_SMRAM)) {
 i440fx_update_memory_mappings(d);
 }
+if (range_covers_byte(address, len, I440FX_PCI_HOLE_BASE)) {
+i440fx_update_pci_mem_hole(d, true);
+}
 }
 
 static int i440fx_load_old(QEMUFile* f, void *opaque, int version_id)
@@ -203,6 +251,16 @@ static int i440fx_initfn(PCIDevice *dev)
 
 d->dev.config[I440FX_SMRAM] = 0x02;
 
+/* Emulate top of memory, here use 0xe000 as default val*/
+if (xen_enabled()) {
+d->dev.config[I440FX_PCI_HOLE_BASE] = 0xf0;
+} else {
+d->dev.config[I440FX_PCI_HOLE_BASE] = 0xe0;
+}
+d->dev.config[I440FX_PCI_HOLE_BASE + 1] = 0x00;
+d->dev.config[I440FX_PCI_HOLE_BASE + 2] = 0x00;
+d->dev.config[I440FX_PCI_HOLE_BASE + 3] = 0x00;
+

Re: [Qemu-devel] [PATCH for-1.4] pc: tag apic as overlap region

2013-02-21 Thread Alexey Korolev

On Wed, Feb 20, 2013 at 4:20 AM, Michael S. Tsirkin  wrote:
> apic overlaps PCI space. On real hardware it has
> higher priority, emulate this correctly.
>
> This should addresses the following issue:
>
>> Subject: Re: [BUG] Guest OS hangs on boot when 64bit BAR present 
>> (kvm-apic-msi resource conflict)
>> Sometime ago I reported an issue about guest OS hang when 64bit BAR present.
>> http://lists.gnu.org/archive/html/qemu-devel/2012-01/msg03189.html
>> http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg00413.html
>>
>> Some more investigation has been done, so in this post I'll try to explain 
>> why it happens and offer possible solutions:
>>
>> *When the issue happens*
>> The issue occurs on Linux guest OS if kernel version <2.6.36
>> A Guest OS hangs on boot when a 64bit PCI BAR is present in a system (if we 
>> use ivshmem driver for example) and occupies range within first
>> 4 GB.
>>
>> *How to reproduce*
>> I used the following qemu command to reproduce the case:
>> /usr/local/bin/qemu-system-x86_64 -M pc-1.3 -enable-kvm -m 2000 -smp 
>> 1,sockets=1,cores=1,threads=1 -name Rh5332 -chardev
>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/Rh5332.monitor,server,nowait
>>  -mon chardev=charmonitor,id=monitor,mode=readline -rtc
>> base=utc -boot cd -drive 
>> file=/home/akorolev/rh5332.img,if=none,id=drive-ide0-0-0,format=raw -device
>> ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -chardev 
>> file,id=charserial0,path=/home/akorolev/serial.log -device
>> isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us 
>> -vga cirrus -device ivshmem,shm,size=32M-device
>> virtio-balloon-pci,id=balloon0
>>
>> Tried different guests: Centos 5.8 64bit, RHEL 5.3 32bit, FC 12 64bit on all 
>> machines hang occurs in 100% cases
>>
>> *Why it happens*
>> The issue basically comes from Linux PCI enumeration code.
>>
>> The OS enumerates 64BIT bars when device is enabled using the following 
>> procedure.
>> 1. Write all FF's to lower half of 64bit BAR
>> 2. Write address back to lower half of 64bit BAR
>> 3. Write all FF's to higher half of 64bit BAR
>> 4. Write address back to higher half of 64bit BAR
>>
>> For qemu it means that  qemu pci_default_write_config() recevies all FFs for 
>> lower part of the 64bit BAR.
>> Then it applies the mask and converts the value to "All FF's - size + 1" 
>> (FE00 if size is 32MB).
>>
>> So for short period of time the range [0xFE00 - 0x] will be 
>> occupied by ivshmem resource.
>> For some reason it is lethal for further boot process.
>>
>> We have found that boot process screws up completely if kvm-apic-msi range 
>> is overlapped even for short period of time.  (We still don't
>> know why it happens, hope that the qemu maintainers can answer?)
>>
>> If we look at kvm-apic-msi memory region it is a non-overlapable memory 
>> region with hardcoded address range [0xFEE0 - 0xFEF0].
>>
>> Here is a log we collected from render_memory_regions:
>>
>>  system overlap 0 pri 0 [0x0 - 0x7fff]
>>  kvmvapic-rom overlap 1 pri 1000 [0xca000 - 0xcd000]
>>  pc.ram overlap 0 pri 0 [0xca000 - 0xcd000]
>>  ++ pc.ram [0xca000 - 0xcd000] is added to view
>>  
>>  smram-region overlap 1 pri 1 [0xa - 0xc]
>>  pci overlap 0 pri 0 [0xa - 0xc]
>>  cirrus-lowmem-container overlap 1 pri 1 [0xa - 0xc]
>>  cirrus-low-memory overlap 0 pri 0 [0xa - 0xc]
>> ++cirrus-low-memory [0xa - 0xc] is added to view
>>  kvm-ioapic overlap 0 pri 0 [0xfec0 - 0xfec01000]
>> ++kvm-ioapic [0xfec0 - 0xfec01000] is added to view
>>  pci-hole64 overlap 0 pri 0 [0x1 - 0x4001]
>>  pci overlap 0 pri 0 [0x1 - 0x4001]
>>  pci-hole overlap 0 pri 0 [0x7d00 - 0x1]
>>  pci overlap 0 pri 0 [0x7d00 - 0x1]
>>  ivshmem-bar2-container overlap 1 pri 1 [0xfe00 - 
>> 0x1]
>>  ivshmem.bar2 overlap 0 pri 0 [0xfe00 - 0x1]
>> ++ivshmem.bar2 [0xfe00 - 0xfec0] is added to view
>> ++ivshmem.bar2  [0xfec01000 - 0x1] is added to view
>>  ivshmem-mmio overlap 1 pri 1 [0xfebf1000 - 0xfebf1100]
>>  e1000-mmio overlap 1 pri 1 [0xfeba - 0xfebc]
>>  cirrus-mmio overlap 1 pri 1 [0xfebf - 0xfebf1000]
>>  cirrus-pci-bar0 overlap 1 pri 1 [0xfa00 - 0xfc00]
>>  vga.vram overlap 1 pri 1 [0xfa00 - 0xfa80]
>> ++vga.vram [0xfa00 - 0xfa80] is added to view
>>  cirrus-bitblt-mmio overlap 0 pri 0 [0xfb00 - 0xfb40]
>> ++cirrus-bitblt-mmio [0xfb00 - 0xfb40] is added to 
>> view
>>  cirrus-linear-io overlap 0 pri 0 [0xfa00 - 0xfa80]
>>  pc.bios overlap 0 pri 0 [0xfffe

Re: [Qemu-devel] [Qemu-ppc] [PATCH 1/3] powerpc iommu: multiple TCE requests enabled

2013-02-21 Thread Alexey Kardashevskiy


On 22/02/13 09:52, David Gibson wrote:

On Tue, Feb 19, 2013 at 06:43:35PM +1100, Alexey Kardashevskiy wrote:

Currently only single TCE entry per requiest is supported (H_PUT_TCE).
However PAPR+ specification allows multiple entry requests such as
H_PUT_TCE_INDIRECT and H_STUFFF_TCE. Having less transitions to the host
kernel via ioctls, support of these calls can accelerate IOMMU operations.

The patch adds a check for the KVM_CAP_PPC_MULTITCE capability and
if it is supported, QEMU adds the "call-multi-tce" property to hypertas
list which triggers the guest to use H_PUT_TCE_INDIRECT and H_STUFF_TCE
instead of H_PUT_TCE.

Signed-off-by: Alexey Kardashevskiy 

Conflicts:
hw/spapr_iommu.c
linux-headers/linux/kvm.h


Try to remember to remove the conflict messages before you send out.


---
  hw/spapr.c|   12 ++--
  hw/spapr_iommu.c  |   71 +
  linux-headers/linux/kvm.h |1 +
  3 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index 2ec0cd0..231a7b6 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -233,6 +233,9 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
  CPUPPCState *env;
  uint32_t start_prop = cpu_to_be32(initrd_base);
  uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
+char hypertas_propm[] = 
"hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt"
+"\0hcall-tce\0hcall-vio\0hcall-splpar\0hcall-bulk"
+"\0hcall-multi-tce";
  char hypertas_prop[] = 
"hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt"
  "\0hcall-tce\0hcall-vio\0hcall-splpar\0hcall-bulk";
  char qemu_hypertas_prop[] = "hcall-memop1";
@@ -406,8 +409,13 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
  /* RTAS */
  _FDT((fdt_begin_node(fdt, "rtas")));

-_FDT((fdt_property(fdt, "ibm,hypertas-functions", hypertas_prop,
-   sizeof(hypertas_prop;
+if (kvm_check_extension(kvm_state, KVM_CAP_PPC_MULTITCE)) {
+_FDT((fdt_property(fdt, "ibm,hypertas-functions", hypertas_propm,
+   sizeof(hypertas_propm;
+} else {
+_FDT((fdt_property(fdt, "ibm,hypertas-functions", hypertas_prop,
+   sizeof(hypertas_prop;
+}


You've implemented the multitce hypercalls in qemu, but because of the
kvm capability check, you'll never advertise them in full emu.
Instead you should always advertise them as available, and the kvm
capability will just be a question of whether they go fast (through
kvm) or slow (through qemu).


So we do not need the KVM_CAP_PPC_MULTITCE capability at all as we are not 
going to support real mode without multi-tce support in the host kernel, is 
that correct?




--
Alexey

[Qemu-devel] CPU scheduling with TCG in SMP system emulation mode

2013-02-21 Thread Max Filippov

Hello.

Do I understand it right that there's no dedicated mechanism
other than icount that would switch current CPU in emulated
SMP system and that in the absence of icount such scheduling
is a side effect of interrupt delivery to the current CPU?

-- 
Thanks.
-- Max

Re: [Qemu-devel] [PATCH] seabios: Add a dummy PCI slot to irq mapping function

2013-02-21 Thread Kevin O'Connor

On Thu, Feb 21, 2013 at 09:12:23AM -0700, Alex Williamson wrote:
> This should never get called, but if we somehow get a new chipset
> that fails to implement their own pci_slot_get_irq function, fail
> gracefully and add a debug log message.

Thanks.  I pushed this and the second part of your previous patch
series.

-Kevin

Re: [Qemu-devel] [Qemu-ppc] [PATCH 1/3] powerpc iommu: multiple TCE requests enabled

2013-02-21 Thread David Gibson

On Tue, Feb 19, 2013 at 06:43:35PM +1100, Alexey Kardashevskiy wrote:
> Currently only single TCE entry per requiest is supported (H_PUT_TCE).
> However PAPR+ specification allows multiple entry requests such as
> H_PUT_TCE_INDIRECT and H_STUFFF_TCE. Having less transitions to the host
> kernel via ioctls, support of these calls can accelerate IOMMU operations.
> 
> The patch adds a check for the KVM_CAP_PPC_MULTITCE capability and
> if it is supported, QEMU adds the "call-multi-tce" property to hypertas
> list which triggers the guest to use H_PUT_TCE_INDIRECT and H_STUFF_TCE
> instead of H_PUT_TCE.
> 
> Signed-off-by: Alexey Kardashevskiy 
> 
> Conflicts:
>   hw/spapr_iommu.c
>   linux-headers/linux/kvm.h

Try to remember to remove the conflict messages before you send out.

> ---
>  hw/spapr.c|   12 ++--
>  hw/spapr_iommu.c  |   71 
> +
>  linux-headers/linux/kvm.h |1 +
>  3 files changed, 82 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/spapr.c b/hw/spapr.c
> index 2ec0cd0..231a7b6 100644
> --- a/hw/spapr.c
> +++ b/hw/spapr.c
> @@ -233,6 +233,9 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
>  CPUPPCState *env;
>  uint32_t start_prop = cpu_to_be32(initrd_base);
>  uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
> +char hypertas_propm[] = 
> "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt"
> +"\0hcall-tce\0hcall-vio\0hcall-splpar\0hcall-bulk"
> +"\0hcall-multi-tce";
>  char hypertas_prop[] = 
> "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt"
>  "\0hcall-tce\0hcall-vio\0hcall-splpar\0hcall-bulk";
>  char qemu_hypertas_prop[] = "hcall-memop1";
> @@ -406,8 +409,13 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
>  /* RTAS */
>  _FDT((fdt_begin_node(fdt, "rtas")));
>  
> -_FDT((fdt_property(fdt, "ibm,hypertas-functions", hypertas_prop,
> -   sizeof(hypertas_prop;
> +if (kvm_check_extension(kvm_state, KVM_CAP_PPC_MULTITCE)) {
> +_FDT((fdt_property(fdt, "ibm,hypertas-functions", hypertas_propm,
> +   sizeof(hypertas_propm;
> +} else {
> +_FDT((fdt_property(fdt, "ibm,hypertas-functions", hypertas_prop,
> +   sizeof(hypertas_prop;
> +}

You've implemented the multitce hypercalls in qemu, but because of the
kvm capability check, you'll never advertise them in full emu.
Instead you should always advertise them as available, and the kvm
capability will just be a question of whether they go fast (through
kvm) or slow (through qemu).

>  _FDT((fdt_property(fdt, "qemu,hypertas-functions", qemu_hypertas_prop,
> sizeof(qemu_hypertas_prop;
>  
> diff --git a/hw/spapr_iommu.c b/hw/spapr_iommu.c
> index 5904c1c..94630c1 100644
> --- a/hw/spapr_iommu.c
> +++ b/hw/spapr_iommu.c
> @@ -234,6 +234,75 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, 
> target_ulong ioba,
>  return H_SUCCESS;
>  }
>  
> +static target_ulong h_put_tce_indirect(PowerPCCPU *cpu,
> +   sPAPREnvironment *spapr,
> +   target_ulong opcode, target_ulong 
> *args)
> +{
> +int i;
> +target_ulong liobn = args[0];
> +target_ulong ioba = args[1];
> +target_ulong tce_list = args[2];
> +target_ulong npages = args[3];
> +target_ulong *tces, ret = 0;
> +sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
> +
> +if (liobn & 0xULL) {
> +hcall_dprintf("spapr_vio_put_tce on out-of-boundsw LIOBN "
> +  TARGET_FMT_lx "\n", liobn);
> +return H_PARAMETER;
> +}
> +
> +tces = (target_ulong *) qemu_get_ram_ptr(tce_list & 
> ~SPAPR_TCE_PAGE_MASK);
> +
> +if (tcet) {
> +for (i = 0; (i < npages) && !ret; ++i, ioba += SPAPR_TCE_PAGE_SIZE) {
> +ret = put_tce_emu(tcet, ioba, tces[i]);
> +}
> +return ret;

The !ret in the loop condition is really easy to miss.  I'd prefer an
explicit if (ret != 0) within the loop, even though it's longer.

> +}
> +#ifdef DEBUG_TCE
> +fprintf(stderr, "%s on liobn=" TARGET_FMT_lx /*%s*/
> +"  ioba 0x" TARGET_FMT_lx "  TCE 0x" TARGET_FMT_lx "\n",
> +__func__, liobn, /*dev->qdev.id, */ioba, tce);
> +#endif
> +
> +return H_PARAMETER;
> +}
> +
> +static target_ulong h_stuff_tce(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> +  target_ulong opcode, target_ulong *args)
> +{
> +int i;
> +target_ulong liobn = args[0];
> +target_ulong ioba = args[1];
> +target_ulong tce_value = args[2];
> +target_ulong npages = args[3];
> +target_ulong ret = 0;
> +sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
> +
> +if (liobn & 0xULL) {
> +hcall_dprintf("spapr_vio_put_tce on out-of-boundsw LIOBN "
> +

Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-21 Thread mdroth

On Thu, Feb 21, 2013 at 11:12:01PM +0100, Paolo Bonzini wrote:
> Il 21/02/2013 22:07, mdroth ha scritto:
> >> > 
> >> > 100% agree.  In particular hw/dataplane/event-poll.c should be the first
> >> > to go away, but AioContext provides the functionality that Ping Fan
> >> > needs.  But hw/dataplane/vring.c will probably be here for a longer
> > Has there been any discussion around introducing something similar to
> > AioContexts for fd handlers? This would avoid the dataplane-specific hooks
> > needed for NetClients in this series.
> 
> AioContext can include file descriptors on POSIX systems (used for NBD
> and other network backends), see aio_set_fd_handler.

Sorry, was using "fd handlers" too generally. I mean specifically for
the qemu_set_fd_handler interfaces, where we currently rely on a single list
of IOHandlerRecords for registration and a single loop to drive them. Would
be nice if we could drive subsets of those via mini main loops, similar to the
way dataplane threads would with a particular AioContext via aio_poll (or 
perhaps
the exact same way)

Currently, Ping Fan's patches basically do this already by accessing a
global reference to the vnet worker thread and attaching events/handlers to
it's event loop via a new set of registration functions (PATCH 7).

I think generalizing this by basing qemu_set_fd_handler() around
AioContext, or something similar, would help to extend support to other
NetClient implementations without requiring dataplane-specific hooks
throughout.

> 
> Paolo
>

[Qemu-devel] Build failure with -werror on i386

2013-02-21 Thread David Holsgrove

Configuring QEMU as

./configure --target-list=i386-softmmu --cpu=i386 --enable-werror

results in following error

cc1: warnings being treated as errors
qemu-char.c: In function 'qmp_ringbuf_write':
qemu-char.c:2764: error: passing argument 2 of 'g_base64_decode' from 
incompatible pointer type
/usr/include/glib-2.0/glib/gbase64.h:49: note: expected 'gsize *' but argument 
is of type 'size_t *'

A git-blame (not a bisect) seems to indicate this was introduced with the 
following commit.

commit 1f590cf9455c571799d1bfc0777255fa0796d4da
Author: Lei Li 
Date:   Fri Jan 25 00:03:20 2013 +0800

QAPI: Introduce memchar-write QMP command

Signed-off-by: Lei Li 
Signed-off-by: Luiz Capitulino 

This was produced on an Ubuntu 10.04 x86_64 machine.

regards,
David

Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle

2013-02-21 Thread mdroth

On Thu, Feb 21, 2013 at 06:16:10PM +, Matthew Anderson wrote:
> If this isn't the correct list just let me know,
> 
> I've run into a bug whereby a Windows guest (tested on Server 2008R2 and 
> 2012) no longer receives RTC ticks when it has been idle for a random amount 
> of time. HPET is disabled and the guest is running Hyper-V relaxed timers 
> (same situation without hv_relaxed). The guest clock stands still and the 
> qemu process uses very little CPU (<0.5%, normally it's >5% when the guest is 
> idle) . Eventually the guest stops responding to network requests but if you 
> open the guest console via VNC and move the mouse around it comes back to 
> life and QEMU replays the lost RTC ticks and the guest recovers. I've also 
> been able to make it recover by querying the clock over the network via the 
> net time command, you can see the clock stand still for 30 seconds then it 
> replays the ticks and catches up.
> 
> I've tried to reproduce the issue but it seems fairly illusive, the only way 
> I've been able to reproduce it is by letting the VM's idle and waiting. 
> Sometimes it's hours and sometimes minutes. Can anyone suggest a way to 
> narrow the issue down?
> 
> Qemu command line is-
> /usr/bin/kvm -name SQL01 -S -M pc-0.14 -cpu qemu64,hv_relaxed -enable-kvm -m 
> 2048 -smp 2,sockets=2,cores=1,threads=1 -uuid 
> 5f54333b-c250-aa72-c979-39d156814b85 -no-user-config -nodefaults -chardev 
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/iHost-SQL01.monitor,server,nowait
>  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime 
> -no-hpet -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 
> -drive 
> file=/mnt/gluster1-norep/iHost/SQL01.qed,if=none,id=drive-virtio-disk0,format=qed,cache=writeback
>  -device 
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0
>  -drive 
> file=/mnt/gluster1-norep/iHost/SQL01-Data.qed,if=none,id=drive-virtio-disk2,format=qed,cache=writeback
>  -device 
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk2,id=virtio-disk2
>  -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw -device 
> ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev 
> tap,fd=29,id=hostnet0,vhost=on,vhostfd=39 -device 
> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:2c:8d:23,bus=pci.0,addr=0x3
>  -chardev pty,id=charserial0 -device 
> isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 
> 127.0.0.1:22 -vga cirrus -device 
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
> 
> Environment is -
> Mainline 3.7.5 and 3.8.0
> Qemu 1.2.2, 1.3.1 and 1.4.0

Were all of these with -M pc-0.14? Only thing that stands out to me is
kernel_irqchip being disabled in your case. -M 1.1 and higher will
enable it by default. Worth a shot.

> Scientific Linux 6.3
> KSM enabled, transparent hugepages disabled.
> Dual Xeon 5650
> 192GB
> 
> Thanks all
>

[Qemu-devel] GTK UI is now the default

2013-02-21 Thread Anthony Liguori


Since this is a pretty visible change for a lot of people, I thought I'd
send a top level note.  The GTK UI is now committed and is the default
UI provided it's available.

For anyone counting, it's been a little more than 7 years in the making:

http://article.gmane.org/gmane.comp.emulators.qemu/9726

If you want to try it, make sure you have the gtk-2.0 development
packages installed and the VTE development packages.

Regards,

Anthony Liguori

Re: [Qemu-devel] [PATCH] net: reduce the unnecessary memory allocation of multiqueue

2013-02-21 Thread Anthony Liguori

Jason Wang  writes:

> Edivaldo reports a problem that the array of NetClientState in NICState is too
> large - MAX_QUEUE_NUM(1024) which will wastes memory even if multiqueue is not
> used.
>
> Instead of static arrays, solving this issue by allocating the queues on 
> demand
> for both the NetClientState array in NICState and VirtIONetQueue array in
> VirtIONet.
>
> Tested by myself, with single virtio-net-pci device. The memory allocation is
> almost the same as when multiqueue is not merged.
>
> Cc: Edivaldo de Araujo Pereira 
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Jason Wang 
> ---
>  hw/virtio-net.c   |6 --
>  include/net/net.h |2 +-
>  net/net.c |   19 +--
>  3 files changed, 14 insertions(+), 13 deletions(-)
>
> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> index 573c669..70ab641 100644
> --- a/hw/virtio-net.c
> +++ b/hw/virtio-net.c
> @@ -44,7 +44,7 @@ typedef struct VirtIONet
>  VirtIODevice vdev;
>  uint8_t mac[ETH_ALEN];
>  uint16_t status;
> -VirtIONetQueue vqs[MAX_QUEUE_NUM];
> +VirtIONetQueue *vqs;
>  VirtQueue *ctrl_vq;
>  NICState *nic;
>  uint32_t tx_timeout;
> @@ -1326,8 +1326,9 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf 
> *conf,
>  n->vdev.set_status = virtio_net_set_status;
>  n->vdev.guest_notifier_mask = virtio_net_guest_notifier_mask;
>  n->vdev.guest_notifier_pending = virtio_net_guest_notifier_pending;
> +n->max_queues = MAX(conf->queues, 1);

I think you mean MIN here.

> +n->vqs = g_malloc0(sizeof(VirtIONetQueue) * n->max_queues);
>  n->vqs[0].rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
> -n->max_queues = conf->queues;
>  n->curr_queues = 1;
>  n->vqs[0].n = n;
>  n->tx_timeout = net->txtimer;
> @@ -1397,6 +1398,7 @@ void virtio_net_exit(VirtIODevice *vdev)
>  
>  g_free(n->mac_table.macs);
>  g_free(n->vlans);
> +g_free(n->vqs);
>  
>  for (i = 0; i < n->max_queues; i++) {
>  VirtIONetQueue *q = &n->vqs[i];
> diff --git a/include/net/net.h b/include/net/net.h
> index 43a045e..cb049a1 100644
> --- a/include/net/net.h
> +++ b/include/net/net.h
> @@ -72,7 +72,7 @@ struct NetClientState {
>  };
>  
>  typedef struct NICState {
> -NetClientState ncs[MAX_QUEUE_NUM];
> +NetClientState *ncs;
>  NICConf *conf;
>  void *opaque;
>  bool peer_deleted;
> diff --git a/net/net.c b/net/net.c
> index be03a8d..6262ed0 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -235,23 +235,20 @@ NICState *qemu_new_nic(NetClientInfo *info,
> const char *name,
> void *opaque)
>  {
> -NetClientState *nc;
>  NetClientState **peers = conf->peers.ncs;
>  NICState *nic;
> -int i;
> +int i, queues = MAX(1, conf->queues);

And here.  This patch is what had created problems for me.

Regards,

Anthony Liguori

>  
>  assert(info->type == NET_CLIENT_OPTIONS_KIND_NIC);
>  assert(info->size >= sizeof(NICState));
>  
> -nc = qemu_new_net_client(info, peers[0], model, name);
> -nc->queue_index = 0;
> -
> -nic = qemu_get_nic(nc);
> +nic = g_malloc0(info->size + sizeof(NetClientState) * queues);
> +nic->ncs = (void *)nic + info->size;
>  nic->conf = conf;
>  nic->opaque = opaque;
>  
> -for (i = 1; i < conf->queues; i++) {
> -qemu_net_client_setup(&nic->ncs[i], info, peers[i], model, nc->name,
> +for (i = 0; i < queues; i++) {
> +qemu_net_client_setup(&nic->ncs[i], info, peers[i], model, name,
>NULL);
>  nic->ncs[i].queue_index = i;
>  }
> @@ -261,7 +258,7 @@ NICState *qemu_new_nic(NetClientInfo *info,
>  
>  NetClientState *qemu_get_subqueue(NICState *nic, int queue_index)
>  {
> -return &nic->ncs[queue_index];
> +return nic->ncs + queue_index;
>  }
>  
>  NetClientState *qemu_get_queue(NICState *nic)
> @@ -273,7 +270,7 @@ NICState *qemu_get_nic(NetClientState *nc)
>  {
>  NetClientState *nc0 = nc - nc->queue_index;
>  
> -return DO_UPCAST(NICState, ncs[0], nc0);
> +return (NICState *)((void *)nc0 - nc->info->size);
>  }
>  
>  void *qemu_get_nic_opaque(NetClientState *nc)
> @@ -368,6 +365,8 @@ void qemu_del_nic(NICState *nic)
>  qemu_cleanup_net_client(nc);
>  qemu_free_net_client(nc);
>  }
> +
> +g_free(nic);
>  }
>  
>  void qemu_foreach_nic(qemu_nic_foreach func, void *opaque)
> -- 
> 1.7.1

Re: [Qemu-devel] [PATCH v4 0/6] Efficient VM backup for qemu

2013-02-21 Thread Paolo Bonzini

Il 21/02/2013 16:56, Dietmar Maurer ha scritto:
>> > In _your_ use case. That means that we should support using non-seekable
>> > pipes, but it doesn't mean that any other use case is irrelevant. In fact 
>> > I think
>> > backing up to a normal raw or qcow2 image file is the interesting case for 
>> > the
>> > majority of people.
> VMs can have more than one disk. How do you backup them into a single raw or 
> qcow2 image?
> And where do you store the configuration inside a qcow2 file?

You use 1 directory per backup.

Paolo

Re: [Qemu-devel] [PATCH] fix wrong output with 'info chardev' for tcp socket.

2013-02-21 Thread mdroth

On Fri, Feb 22, 2013 at 12:29:44AM +0400, Michael Tokarev wrote:
> 22.02.2013 00:20, Serge E. Hallyn wrote:
> > The snprintf format isn't taking into account the new 'left' and
> > 'right' variables (for ipv6 []) when placing the ':', which should
> > go immediately before the port.
> 
> This fixes actual isse (also found by Serge), where `info chardev'
> prints `tcp:127.0.0.1,server,nowait' for a monitor running on port
> .
> 
> This is definitely a stable material (CCed).
> 
> Reviewed-by: Michael Tokarev 

Reviewed-by: Michael Roth 

> 
> Thanks!
> 
> /mjt
> 
> > Signed-off-by: Serge Hallyn 
> > ---
> >  qemu-char.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/qemu-char.c b/qemu-char.c
> > index e4b0f53..3e152e1 100644
> > --- a/qemu-char.c
> > +++ b/qemu-char.c
> > @@ -2482,7 +2482,7 @@ static CharDriverState *qemu_chr_open_socket_fd(int 
> > fd, bool do_nodelay,
> >  s->do_nodelay = do_nodelay;
> >  getnameinfo((struct sockaddr *) &ss, ss_len, host, sizeof(host),
> >  serv, sizeof(serv), NI_NUMERICHOST | NI_NUMERICSERV);
> > -snprintf(chr->filename, 256, "%s:%s:%s%s%s%s",
> > +snprintf(chr->filename, 256, "%s:%s%s%s:%s%s",
> >   is_telnet ? "telnet" : "tcp",
> >   left, host, right, serv,
> >   is_listen ? ",server" : "");
> > 
> 
>

Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-21 Thread Paolo Bonzini

Il 21/02/2013 22:07, mdroth ha scritto:
>> > 
>> > 100% agree.  In particular hw/dataplane/event-poll.c should be the first
>> > to go away, but AioContext provides the functionality that Ping Fan
>> > needs.  But hw/dataplane/vring.c will probably be here for a longer
> Has there been any discussion around introducing something similar to
> AioContexts for fd handlers? This would avoid the dataplane-specific hooks
> needed for NetClients in this series.

AioContext can include file descriptors on POSIX systems (used for NBD
and other network backends), see aio_set_fd_handler.

Paolo

Re: [Qemu-devel] [PATCH] dataplane: remove EventPoll in favor of AioContext

2013-02-21 Thread Paolo Bonzini

Il 21/02/2013 22:32, mdroth ha scritto:
> On Thu, Feb 21, 2013 at 05:29:55PM +0100, Paolo Bonzini wrote:
>> During the review of the dataplane code, the EventPoll API morphed itself
>> (not concidentially) into something very very similar to an AioContext.
>> Thus, it is trivial to convert virtio-blk-dataplane to use AioContext,
>> and a first baby step towards letting dataplane talk directly to the
>> QEMU block layer.
>>
>> The only interesting note is the value-copy of EventNotifiers.  At least
>> in my opinion this is part of the EventNotifier API and is even portable
>> to Windows.  Of course, in this case you should not close the notifier's
>> underlying file descriptors or handle with event_notifier_cleanup.
>>
>> Signed-off-by: Paolo Bonzini 
>> ---
>>  hw/dataplane/Makefile.objs |   2 +-
>>  hw/dataplane/event-poll.c  | 100 
>> -
>>  hw/dataplane/event-poll.h  |  40 --
>>  hw/dataplane/virtio-blk.c  |  41 ++-
>>  4 files changed, 22 insertions(+), 161 deletions(-)
>>  delete mode 100644 hw/dataplane/event-poll.c
>>  delete mode 100644 hw/dataplane/event-poll.h
>>
>> diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs
>> index 3e47d05..70c 100644
>> --- a/hw/dataplane/Makefile.objs
>> +++ b/hw/dataplane/Makefile.objs
>> @@ -1 +1 @@
>> -obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o event-poll.o ioq.o 
>> virtio-blk.o
>> +obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o ioq.o virtio-blk.o
>> diff --git a/hw/dataplane/event-poll.c b/hw/dataplane/event-poll.c
>> deleted file mode 100644
>> index 2b55c6e..000
>> --- a/hw/dataplane/event-poll.c
>> +++ /dev/null
>> @@ -1,100 +0,0 @@
>> -/*
>> - * Event loop with file descriptor polling
>> - *
>> - * Copyright 2012 IBM, Corp.
>> - * Copyright 2012 Red Hat, Inc. and/or its affiliates
>> - *
>> - * Authors:
>> - *   Stefan Hajnoczi 
>> - *
>> - * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> - * See the COPYING file in the top-level directory.
>> - *
>> - */
>> -
>> -#include 
>> -#include "hw/dataplane/event-poll.h"
>> -
>> -/* Add an event notifier and its callback for polling */
>> -void event_poll_add(EventPoll *poll, EventHandler *handler,
>> -EventNotifier *notifier, EventCallback *callback)
>> -{
>> -struct epoll_event event = {
>> -.events = EPOLLIN,
>> -.data.ptr = handler,
>> -};
>> -handler->notifier = notifier;
>> -handler->callback = callback;
>> -if (epoll_ctl(poll->epoll_fd, EPOLL_CTL_ADD,
>> -  event_notifier_get_fd(notifier), &event) != 0) {
>> -fprintf(stderr, "failed to add event handler to epoll: %m\n");
>> -exit(1);
>> -}
>> -}
>> -
>> -/* Event callback for stopping event_poll() */
>> -static void handle_stop(EventHandler *handler)
>> -{
>> -/* Do nothing */
>> -}
>> -
>> -void event_poll_init(EventPoll *poll)
>> -{
>> -/* Create epoll file descriptor */
>> -poll->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
>> -if (poll->epoll_fd < 0) {
>> -fprintf(stderr, "epoll_create1 failed: %m\n");
>> -exit(1);
>> -}
>> -
>> -/* Set up stop notifier */
>> -if (event_notifier_init(&poll->stop_notifier, 0) < 0) {
>> -fprintf(stderr, "failed to init stop notifier\n");
>> -exit(1);
>> -}
>> -event_poll_add(poll, &poll->stop_handler,
>> -   &poll->stop_notifier, handle_stop);
>> -}
>> -
>> -void event_poll_cleanup(EventPoll *poll)
>> -{
>> -event_notifier_cleanup(&poll->stop_notifier);
>> -close(poll->epoll_fd);
>> -poll->epoll_fd = -1;
>> -}
>> -
>> -/* Block until the next event and invoke its callback */
>> -void event_poll(EventPoll *poll)
>> -{
>> -EventHandler *handler;
>> -struct epoll_event event;
>> -int nevents;
>> -
>> -/* Wait for the next event.  Only do one event per call to keep the
>> - * function simple, this could be changed later. */
>> -do {
>> -nevents = epoll_wait(poll->epoll_fd, &event, 1, -1);
>> -} while (nevents < 0 && errno == EINTR);
>> -if (unlikely(nevents != 1)) {
>> -fprintf(stderr, "epoll_wait failed: %m\n");
>> -exit(1); /* should never happen */
>> -}
>> -
>> -/* Find out which event handler has become active */
>> -handler = event.data.ptr;
>> -
>> -/* Clear the eventfd */
>> -event_notifier_test_and_clear(handler->notifier);
> 
> Wouldn't we need to move this into the handle_io/handle_notify to maintain the
> old behavior?

Yes, thanks.

Paolo

Re: [Qemu-devel] [PATCH] slirp: fixed potential use-after-free of a socket

2013-02-21 Thread mdroth

On Thu, Feb 21, 2013 at 03:47:25PM -0600, mdroth wrote:
> On Fri, Feb 15, 2013 at 12:00:13PM +0100, Vitaly Chipounov wrote:
> > A socket may still have references to it in various queues
> > at the time it is freed, causing memory corruptions.
> > 
> > Signed-off-by: Vitaly Chipounov 

Meant to cc qemu-stable

> > ---
> >  slirp/socket.c |   29 +
> >  1 file changed, 29 insertions(+)
> > 
> > diff --git a/slirp/socket.c b/slirp/socket.c
> > index 77b0c98..8a7adc8 100644
> > --- a/slirp/socket.c
> > +++ b/slirp/socket.c
> > @@ -55,6 +55,33 @@ socreate(Slirp *slirp)
> >return(so);
> >  }
> > 
> > +/**
> > + * It may happen that a socket is still referenced in various
> > + * mbufs at the time it is freed. Clear all references to the
> > + * socket here.
> > + */
> > +static void soremovefromqueues(struct socket *so)
> > +{
> > +if (!so->so_queued) {
> > +return;
> > +}
> > +
> > +Slirp *slirp = so->slirp;
> > +struct mbuf *ifm;
> 
> Declarations should come at the beginning of a block/function (see: ...
> er, I could've sworn it was in HACKING or CODING_STYLE but maybe not.
> That's the standard in any case)
> 
> > +
> > +for (ifm = slirp->if_fastq.ifq_next; ifm != &slirp->if_fastq; ifm = 
> > ifm->ifq_next) {
> > +if (ifm->ifq_so == so) {
> > +ifm->ifq_so = NULL;
> > +}
> > +}
> > +
> > +for (ifm = slirp->if_batchq.ifq_next; ifm != &slirp->if_batchq; ifm = 
> > ifm->ifq_next) {
> > +if (ifm->ifq_so == so) {
> > +ifm->ifq_so = NULL;
> > +}
> > +}
> 
> Is the intention just to mark them null or actually remove? Not sure
> which, but either the logic should change or the function name should
> (soinvalidate() or something along that line if the former).
> 
> > +}
> > +
> >  /*
> >   * remque and free a socket, clobber cache
> >   */
> > @@ -79,6 +106,8 @@ sofree(struct socket *so)
> >if(so->so_next && so->so_prev)
> >  remque(so);  /* crashes if so is not in a queue */
> > 
> > +  soremovefromqueues(so);
> > +
> >free(so);
> >  }
> > 
> > -- 
> > 1.7.10
> > 
> >

Re: [Qemu-devel] [PATCH] slirp: fixed potential use-after-free of a socket

2013-02-21 Thread mdroth

On Fri, Feb 15, 2013 at 12:00:13PM +0100, Vitaly Chipounov wrote:
> A socket may still have references to it in various queues
> at the time it is freed, causing memory corruptions.
> 
> Signed-off-by: Vitaly Chipounov 
> ---
>  slirp/socket.c |   29 +
>  1 file changed, 29 insertions(+)
> 
> diff --git a/slirp/socket.c b/slirp/socket.c
> index 77b0c98..8a7adc8 100644
> --- a/slirp/socket.c
> +++ b/slirp/socket.c
> @@ -55,6 +55,33 @@ socreate(Slirp *slirp)
>return(so);
>  }
> 
> +/**
> + * It may happen that a socket is still referenced in various
> + * mbufs at the time it is freed. Clear all references to the
> + * socket here.
> + */
> +static void soremovefromqueues(struct socket *so)
> +{
> +if (!so->so_queued) {
> +return;
> +}
> +
> +Slirp *slirp = so->slirp;
> +struct mbuf *ifm;

Declarations should come at the beginning of a block/function (see: ...
er, I could've sworn it was in HACKING or CODING_STYLE but maybe not.
That's the standard in any case)

> +
> +for (ifm = slirp->if_fastq.ifq_next; ifm != &slirp->if_fastq; ifm = 
> ifm->ifq_next) {
> +if (ifm->ifq_so == so) {
> +ifm->ifq_so = NULL;
> +}
> +}
> +
> +for (ifm = slirp->if_batchq.ifq_next; ifm != &slirp->if_batchq; ifm = 
> ifm->ifq_next) {
> +if (ifm->ifq_so == so) {
> +ifm->ifq_so = NULL;
> +}
> +}

Is the intention just to mark them null or actually remove? Not sure
which, but either the logic should change or the function name should
(soinvalidate() or something along that line if the former).

> +}
> +
>  /*
>   * remque and free a socket, clobber cache
>   */
> @@ -79,6 +106,8 @@ sofree(struct socket *so)
>if(so->so_next && so->so_prev)
>  remque(so);  /* crashes if so is not in a queue */
> 
> +  soremovefromqueues(so);
> +
>free(so);
>  }
> 
> -- 
> 1.7.10
> 
>

[Qemu-devel] [PATCH] xhci: fix bad print specifier

2013-02-21 Thread Hervé Poussineau

This fixes the following compilation error:
hw/usb/hcd-xhci.c:1156:17: error: format ‘%llx’ expects argument of type
‘long long unsigned int’, but argument 4 has type ‘unsigned int’

Signed-off-by: Hervé Poussineau 
---
 hw/usb/hcd-xhci.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index 5796102..07afdee 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -1152,8 +1152,8 @@ static XHCIStreamContext *xhci_find_stream(XHCIEPContext 
*epctx,
 
 if (sctx->sct == -1) {
 xhci_dma_read_u32s(epctx->xhci, sctx->pctx, ctx, sizeof(ctx));
-fprintf(stderr, "%s: init sctx #%d @ %lx: %08x %08x\n", __func__,
-streamid, sctx->pctx, ctx[0], ctx[1]);
+fprintf(stderr, "%s: init sctx #%d @ " DMA_ADDR_FMT ": %08x %08x\n",
+__func__, streamid, sctx->pctx, ctx[0], ctx[1]);
 sct = (ctx[0] >> 1) & 0x07;
 if (epctx->lsa && sct != 1) {
 *cc_error = CC_INVALID_STREAM_TYPE_ERROR;
-- 
1.7.10.4

[Qemu-devel] [PATCH v2] vl.c: Support multiple CPU ranges on -numa option

2013-02-21 Thread Eduardo Habkost

This allows ":" to be used a separator between each CPU range, so the
command-line may look like:

  -numa node,cpus=A-B:C-D

Note that the following format, currently used by libvirt:

  -numa nodes,cpus=A-B,C-D

will _not_ work, as "," is the option separator for the command-line
option parser, and it would require changing the -numa option parsing
code to handle "cpus" as a special case.

Signed-off-by: Eduardo Habkost 
---
Changes v2:
 - Use ":" as separator
 - Document the new format
---
 qemu-options.hx | 10 --
 vl.c| 14 +-
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 4bc9c85..b135044 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -95,12 +95,18 @@ specifies the maximum number of hotpluggable CPUs.
 ETEXI
 
 DEF("numa", HAS_ARG, QEMU_OPTION_numa,
-"-numa node[,mem=size][,cpus=cpu[-cpu]][,nodeid=node]\n", QEMU_ARCH_ALL)
+"-numa node[,mem=size][,cpus=cpu[-cpu][:...]][,nodeid=node]\n", 
QEMU_ARCH_ALL)
 STEXI
 @item -numa @var{opts}
 @findex -numa
 Simulate a multi node NUMA system. If mem and cpus are omitted, resources
-are split equally.
+are split equally. Multiple CPU ranges in the "cpus" option may be separated
+by colons. e.g.:
+
+@example
+-numa node,mem=1024,cpus=0-1:4-5:8-9
+-numa node,mem=1024,cpus=2-3:6-7:10-11
+@end example
 ETEXI
 
 DEF("add-fd", HAS_ARG, QEMU_OPTION_add_fd,
diff --git a/vl.c b/vl.c
index 955d2ff..fe632db 100644
--- a/vl.c
+++ b/vl.c
@@ -1244,7 +1244,7 @@ char *get_boot_devices_list(size_t *size)
 return list;
 }
 
-static void numa_node_parse_cpus(int nodenr, const char *cpus)
+static void numa_node_parse_cpu_range(int nodenr, const char *cpus)
 {
 char *endptr;
 unsigned long long value, endvalue;
@@ -1288,6 +1288,18 @@ error:
 exit(1);
 }
 
+static void numa_node_parse_cpus(int nodenr, const char *option)
+{
+char **parts;
+int i;
+
+parts = g_strsplit(option, ":", 0);
+for (i = 0; parts[i]; i++) {
+numa_node_parse_cpu_range(nodenr, parts[i]);
+}
+g_strfreev(parts);
+}
+
 static void numa_add(const char *optarg)
 {
 char option[128];
-- 
1.8.1

Re: [Qemu-devel] [PATCH] dataplane: remove EventPoll in favor of AioContext

2013-02-21 Thread mdroth

On Thu, Feb 21, 2013 at 05:29:55PM +0100, Paolo Bonzini wrote:
> During the review of the dataplane code, the EventPoll API morphed itself
> (not concidentially) into something very very similar to an AioContext.
> Thus, it is trivial to convert virtio-blk-dataplane to use AioContext,
> and a first baby step towards letting dataplane talk directly to the
> QEMU block layer.
> 
> The only interesting note is the value-copy of EventNotifiers.  At least
> in my opinion this is part of the EventNotifier API and is even portable
> to Windows.  Of course, in this case you should not close the notifier's
> underlying file descriptors or handle with event_notifier_cleanup.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  hw/dataplane/Makefile.objs |   2 +-
>  hw/dataplane/event-poll.c  | 100 
> -
>  hw/dataplane/event-poll.h  |  40 --
>  hw/dataplane/virtio-blk.c  |  41 ++-
>  4 files changed, 22 insertions(+), 161 deletions(-)
>  delete mode 100644 hw/dataplane/event-poll.c
>  delete mode 100644 hw/dataplane/event-poll.h
> 
> diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs
> index 3e47d05..70c 100644
> --- a/hw/dataplane/Makefile.objs
> +++ b/hw/dataplane/Makefile.objs
> @@ -1 +1 @@
> -obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o event-poll.o ioq.o 
> virtio-blk.o
> +obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o ioq.o virtio-blk.o
> diff --git a/hw/dataplane/event-poll.c b/hw/dataplane/event-poll.c
> deleted file mode 100644
> index 2b55c6e..000
> --- a/hw/dataplane/event-poll.c
> +++ /dev/null
> @@ -1,100 +0,0 @@
> -/*
> - * Event loop with file descriptor polling
> - *
> - * Copyright 2012 IBM, Corp.
> - * Copyright 2012 Red Hat, Inc. and/or its affiliates
> - *
> - * Authors:
> - *   Stefan Hajnoczi 
> - *
> - * This work is licensed under the terms of the GNU GPL, version 2 or later.
> - * See the COPYING file in the top-level directory.
> - *
> - */
> -
> -#include 
> -#include "hw/dataplane/event-poll.h"
> -
> -/* Add an event notifier and its callback for polling */
> -void event_poll_add(EventPoll *poll, EventHandler *handler,
> -EventNotifier *notifier, EventCallback *callback)
> -{
> -struct epoll_event event = {
> -.events = EPOLLIN,
> -.data.ptr = handler,
> -};
> -handler->notifier = notifier;
> -handler->callback = callback;
> -if (epoll_ctl(poll->epoll_fd, EPOLL_CTL_ADD,
> -  event_notifier_get_fd(notifier), &event) != 0) {
> -fprintf(stderr, "failed to add event handler to epoll: %m\n");
> -exit(1);
> -}
> -}
> -
> -/* Event callback for stopping event_poll() */
> -static void handle_stop(EventHandler *handler)
> -{
> -/* Do nothing */
> -}
> -
> -void event_poll_init(EventPoll *poll)
> -{
> -/* Create epoll file descriptor */
> -poll->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
> -if (poll->epoll_fd < 0) {
> -fprintf(stderr, "epoll_create1 failed: %m\n");
> -exit(1);
> -}
> -
> -/* Set up stop notifier */
> -if (event_notifier_init(&poll->stop_notifier, 0) < 0) {
> -fprintf(stderr, "failed to init stop notifier\n");
> -exit(1);
> -}
> -event_poll_add(poll, &poll->stop_handler,
> -   &poll->stop_notifier, handle_stop);
> -}
> -
> -void event_poll_cleanup(EventPoll *poll)
> -{
> -event_notifier_cleanup(&poll->stop_notifier);
> -close(poll->epoll_fd);
> -poll->epoll_fd = -1;
> -}
> -
> -/* Block until the next event and invoke its callback */
> -void event_poll(EventPoll *poll)
> -{
> -EventHandler *handler;
> -struct epoll_event event;
> -int nevents;
> -
> -/* Wait for the next event.  Only do one event per call to keep the
> - * function simple, this could be changed later. */
> -do {
> -nevents = epoll_wait(poll->epoll_fd, &event, 1, -1);
> -} while (nevents < 0 && errno == EINTR);
> -if (unlikely(nevents != 1)) {
> -fprintf(stderr, "epoll_wait failed: %m\n");
> -exit(1); /* should never happen */
> -}
> -
> -/* Find out which event handler has become active */
> -handler = event.data.ptr;
> -
> -/* Clear the eventfd */
> -event_notifier_test_and_clear(handler->notifier);

Wouldn't we need to move this into the handle_io/handle_notify to maintain the
old behavior?

Re: [Qemu-devel] [PATCH 24028/24028] Evaluate breakpoint condition on target.Mechanism: translate gdb bytecode to TCG code and add to the translation block. Most of code is located in the new file tra

2013-02-21 Thread Peter Maydell

On 21 February 2013 14:02, Anna Neiman  wrote:
> +#if defined(TARGET_ARM)
> +cpu_get_reg_var_func = cpu_get_reg_var_arm;
> +#else
> +cpu_get_reg_var_func = 0;
> +#endif

No new for-each-target #ifdef ladders, please.
Consider also whether you can abstract away some of the
TARGET_LONG ifdefs.

You need to look up what the C spec says about local variable
values after longjmp() :-)

Some of this patch is touching parts of QEMU which are critical
and hard to understand, so you'll need to provide good patch
commit messages to convince us that your changes are correct
and safe.

Strong agreement with other review comments about splitting the
patch and that just exposing every target's cpu_env local variable
is a really bad idea.

thanks
-- PMM

Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-21 Thread mdroth

On Thu, Feb 21, 2013 at 06:05:44PM +0100, Paolo Bonzini wrote:
> Il 21/02/2013 17:33, Stefan Hajnoczi ha scritto:
> > Interesting patch series.  I want to share my thoughts on the status of
> > dataplane and what I'm currently working on.  There will probably be
> > some overlap that we can coordinate.
> > 
> > First, I want to eventually delete hw/dataplane/ :).  That is because
> > dataplane duplicates an event loop, thread-friendly guest RAM access,
> > and virtio.  QEMU already has all this functionality except it's not
> > thread-friendly.
> > 
> > Because hw/dataplane/ will eventually go away we should avoid adding new
> > code where possible.
> 
> 100% agree.  In particular hw/dataplane/event-poll.c should be the first
> to go away, but AioContext provides the functionality that Ping Fan
> needs.  But hw/dataplane/vring.c will probably be here for a longer

Has there been any discussion around introducing something similar to
AioContexts for fd handlers? This would avoid the dataplane-specific hooks
needed for NetClients in this series.

Re: [Qemu-devel] [PATCH 6/9] virtio net: introduce dataplane for virtio net

2013-02-21 Thread mdroth

On Thu, Feb 21, 2013 at 08:54:50PM +0800, Liu Ping Fan wrote:
> From: Liu Ping Fan 
> 
> This is a emulation to virtio-blk dataplane, which push the data
> handling out of biglock. And it is a try to implement this process
> in userspace, while vhost-net in kernel.
> 
> Signed-off-by: Liu Ping Fan 
> ---
>  hw/dataplane/virtio-net.c |  422 
> +
>  hw/dataplane/virtio-net.h |   26 +++
>  hw/virtio-net.c   |   56 +-
>  hw/virtio-net.h   |   61 +++
>  4 files changed, 517 insertions(+), 48 deletions(-)
>  create mode 100644 hw/dataplane/virtio-net.c
>  create mode 100644 hw/dataplane/virtio-net.h
> 
> diff --git a/hw/dataplane/virtio-net.c b/hw/dataplane/virtio-net.c
> new file mode 100644
> index 000..9a1795d
> --- /dev/null
> +++ b/hw/dataplane/virtio-net.c
> @@ -0,0 +1,422 @@
> +/* Copyright IBM, Corp. 2013
> + *
> + * Based on vhost-net and virtio-blk dataplane code
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + */
> +#include "hw/virtio.h"
> +#include "qemu/iov.h"
> +#include "vring.h"
> +#include 
> +#include "net/net.h"
> +#include "net/checksum.h"
> +#include "net/tap.h"
> +#include "virtio-net.h"
> +#include "qemu/error-report.h"
> +
> +typedef struct VirtIONetDataPlane {
> +int async_tx_head;
> +Vring *rx_vring;
> +Vring *tx_vring;
> +EventHandler *rx_handler;
> +EventHandler *tx_handler;
> +bool stop;
> +} VirtIONetDataPlane;
> +
> +WorkThread virt_net_thread;
> +
> +#define VRING_MAX 128
> +
> +static int32_t virtnet_tx(VirtIONet *n, VirtQueue *vq);
> +
> +static void virtnet_tx_complete(struct NetClientState *nc, ssize_t sz)
> +{
> +int ret;
> +VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
> +
> +vring_push(n->dp->tx_vring, n->dp->async_tx_head, 0);
> +ret = virtnet_tx(n, n->tx_vq);
> +if (ret != -EBUSY) {
> +vring_enable_notification(&n->vdev, n->dp->tx_vring);
> +}
> +}
> +
> +static int virtnet_tx(VirtIONet *n, VirtQueue *vq)
> +{
> +struct iovec out_iov[VRING_MAX], sg[VRING_MAX];
> +struct iovec *snd, *end = &out_iov[VRING_MAX];
> +int head;
> +unsigned int out_num, in_num, sg_num;
> +int ret;
> +int num_packets = 0;
> +
> +if (!(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK)) {
> +return num_packets;
> +}
> +
> +assert(n->vdev.vm_running);
> +
> +if (n->async_tx.elem.out_num) {
> +return num_packets;
> +}
> +
> +while (true) {
> +head = vring_pop(&n->vdev, n->dp->tx_vring, out_iov, end, &out_num,
> +&in_num);
> +if (head < 0) {
> +break;
> +}
> +snd = out_iov;
> +assert(n->host_hdr_len <= n->guest_hdr_len);
> +if (n->host_hdr_len != n->guest_hdr_len) {
> +sg_num = iov_copy(sg, ARRAY_SIZE(sg),
> +   out_iov, out_num,
> +   0, n->host_hdr_len);
> +sg_num += iov_copy(sg + sg_num, ARRAY_SIZE(sg) - sg_num,
> + out_iov, out_num,
> + n->guest_hdr_len, -1);
> +out_num = sg_num;
> +snd = sg;
> +}
> +
> +ret = qemu_sendv_packet_async(&n->nic->nc, snd, out_num,
> +virtnet_tx_complete);
> +if (ret == 0) {
> +n->dp->async_tx_head = head;
> +return -EBUSY;
> +}
> +vring_push(n->dp->tx_vring, head, 0);
> +if (num_packets++ > n->tx_burst) {
> +break;
> +}

I'm not sure why we'd break here: if we're sending out lots of packets
should we keep notifications disabled and continue sending them till
we'd block? Is it to avoid starving the rx side?

> +}
> +
> +return num_packets;
> +}
> +
> +static void virtnet_handle_tx(VirtIODevice *vdev, VirtQueue *vq)
> +{
> +int32 ret;
> +VirtIONet *n = (VirtIONet *)vdev;
> +
> +/* This happens when device was stopped but VCPU wasn't. */
> +if (!n->vdev.vm_running) {
> +return;
> +}
> +vring_disable_notification(vdev, n->dp->tx_vring);
> +ret = virtnet_tx(n, vq);
> +if (ret != -EBUSY) {
> +vring_enable_notification(vdev, n->dp->tx_vring);
> +}
> +}
> +
> +
> +static int virtio_net_can_receive(NetClientState *nc)
> +{
> +VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
> +if (!n->vdev.vm_running) {
> +return 0;
> +}
> +if (!(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK)) {
> +return 0;
> +}
> +
> +return 1;
> +}
> +
> +/* peek but not use */
> +static int rx_mergeable_buf_sz(VirtIONet *n)
> +{
> +uint16_t start, idx, head;
> +int total = 0;
> +Vring *vring = n->dp->rx_vring;
> +struct vring_desc *dsc;
> +struct vring_desc *base;
> +
> +for (start = vring->last_avail_idx; start != vring->vr.avail->idx;
> +start++) {
> +head = start%vring->vr

Re: [Qemu-devel] [PATCH] vl.c: Support multiple CPU ranges on -numa option

2013-02-21 Thread Eduardo Habkost

On Thu, Feb 21, 2013 at 09:23:22PM +0100, Markus Armbruster wrote:
> Eduardo Habkost  writes:
> 
> > This allows "," to be used a separator between each CPU range.  Note
> > that commas inside key=value command-line options have to be escaped
> > using ",,", so the command-line will look like:
> >
> >   -numa node,cpus=A,,B,,C,,D
> 
> This is really, really ugly, and an embarrassment to document.  Which
> you didn't ;)

I was trying to have an intermediate solution using the current -numa
parser. I have patches in my queue that will change the code to properly
use QemuOpts later.

It would be interesting to support the "A,B,C,D" format in config files,
though, as it is simple and straighforward when no escaping is required.

> 
> What about
> 
> -numa node,cpus=A,cpus=B,cpus=C,cpus=D

Looks better for the command-line usage, at least. I will give it a try.

> 
> Yes, QemuOpts lets you do that. Getting all the values isn't as easy as
> it could be (unless you use Laszlo's opt-visitor), but that could be
> improved.

Guess what: -numa doesn't even use QemuOpts, and I am not sure the
current format of -numa will allow QemuOpts to be used easily. I expect
the proper solution using QemuOpts to involve having a
standards-compliant "numa-node" config section instead of this weird
"-numa ,..." format where the only valid  that ever existed
was "node".

But I believe it will be feasible to allow "cpus=A,cpus=B" using the
current parser, before we convert to a proper QemuOpts-based
implementaiton.

> 
> > Note that the following format, currently used by libvirt:
> >
> >   -numa nodes,cpus=A,B,C,D
> >
> > will _not_ work yet, as "," is the option separator for the command-line
> > option parser, and it will require changing the -numa option parsing
> > code to handle "cpus" as a special case.
> 
> No way.

Agreed.  :-)

The bad news is that libvirt uses this format since forever, this format
never worked, and nobody ever noticed that this was broken.

-- 
Eduardo

Re: [Qemu-devel] [PATCH 5/9] event poll: enable event poll handle more than one event each time

2013-02-21 Thread mdroth

On Thu, Feb 21, 2013 at 08:54:49PM +0800, Liu Ping Fan wrote:
> From: Liu Ping Fan 
> 
> Set handled event count at initialized time for the dedicated
> thread.
> 
> Signed-off-by: Liu Ping Fan 
> ---
>  hw/dataplane/event-poll.c |   31 ++-
>  hw/dataplane/event-poll.h |4 +++-
>  hw/dataplane/virtio-blk.c |2 +-
>  3 files changed, 18 insertions(+), 19 deletions(-)
> 
> diff --git a/hw/dataplane/event-poll.c b/hw/dataplane/event-poll.c
> index e19f3f3..3009787 100644
> --- a/hw/dataplane/event-poll.c
> +++ b/hw/dataplane/event-poll.c
> @@ -25,6 +25,7 @@ void event_poll_add(EventPoll *poll, EventHandler *handler,
>  };
>  handler->notifier = notifier;
>  handler->callback = callback;
> +
>  if (epoll_ctl(poll->epoll_fd, EPOLL_CTL_ADD,
>event_notifier_get_fd(notifier), &event) != 0) {
>  fprintf(stderr, "failed to add event handler to epoll: %m\n");
> @@ -74,7 +75,7 @@ static void handle_stop(EventHandler *handler, uint32_t 
> events)
>  /* Do nothing */
>  }
> 
> -void event_poll_init(EventPoll *poll)
> +void event_poll_init(EventPoll *poll, int num)
>  {
>  /* Create epoll file descriptor */
>  poll->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
> @@ -83,6 +84,8 @@ void event_poll_init(EventPoll *poll)
>  exit(1);
>  }
> 
> +poll->num = num + 1;
> +poll->events = (struct epoll_event *)g_malloc(poll->num * sizeof(struct 
> epoll_event));
>  /* Set up stop notifier */
>  if (event_notifier_init(&poll->stop_notifier, 0) < 0) {
>  fprintf(stderr, "failed to init stop notifier\n");
> @@ -103,27 +106,21 @@ void event_poll_cleanup(EventPoll *poll)
>  void event_poll(EventPoll *poll)
>  {
>  EventHandler *handler;
> -struct epoll_event event;
> -int nevents;
> +struct epoll_event *events;
> +int nevents, i;
> 
> -/* Wait for the next event.  Only do one event per call to keep the
> - * function simple, this could be changed later. */
> +events = poll->events;
>  do {
> -nevents = epoll_wait(poll->epoll_fd, &event, 1, -1);
> +nevents = epoll_wait(poll->epoll_fd, events, poll->num, -1);
>  } while (nevents < 0 && errno == EINTR);
> -if (unlikely(nevents != 1)) {
> -fprintf(stderr, "epoll_wait failed: %m\n");
> -exit(1); /* should never happen */
> -}
> 
> -/* Find out which event handler has become active */
> -handler = event.data.ptr;
> +for (i = 0; i < nevents; i++) {
> +/* Find out which event handler has become active */
> +handler = events[i].data.ptr;
> 
> -/* Clear the eventfd */
> -event_notifier_test_and_clear(handler->notifier);
> -
> -/* Handle the event */
> -handler->callback(handler, event.events);
> +/* Handle the event */
> +handler->callback(handler, events[i].events);
> +}

I see where it's needed for vnet dataplane, but removing the test and
clear might cause problems for vblock dataplane. Perhaps we could add
a flag during notifier registration to control the behavior? Not sure
if the AioContext stuff has this already.

>  }
> 
>  /* Stop event_poll()
> diff --git a/hw/dataplane/event-poll.h b/hw/dataplane/event-poll.h
> index ff9712b..f08884b 100644
> --- a/hw/dataplane/event-poll.h
> +++ b/hw/dataplane/event-poll.h
> @@ -30,6 +30,8 @@ typedef struct {
>  int epoll_fd;   /* epoll(2) file descriptor */
>  EventNotifier stop_notifier;/* stop poll notifier */
>  EventHandler stop_handler;  /* stop poll handler */
> +int num;
> +struct epoll_event *events;
>  } EventPoll;
> 
>  void event_poll_add(EventPoll *poll, EventHandler *handler,
> @@ -40,7 +42,7 @@ void event_poll_del_fd(EventPoll *poll, int fd);
>  void event_poll_modify_fd(EventPoll *poll, int fd, uint32_t type,
>  EventHandler *handler);
> 
> -void event_poll_init(EventPoll *poll);
> +void event_poll_init(EventPoll *poll, int num);
>  void event_poll_cleanup(EventPoll *poll);
>  void event_poll(EventPoll *poll);
>  void event_poll_notify(EventPoll *poll);
> diff --git a/hw/dataplane/virtio-blk.c b/hw/dataplane/virtio-blk.c
> index b60211a..c6a43d8 100644
> --- a/hw/dataplane/virtio-blk.c
> +++ b/hw/dataplane/virtio-blk.c
> @@ -395,7 +395,7 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
>  return;
>  }
> 
> -event_poll_init(&s->event_poll);
> +event_poll_init(&s->event_poll, 2);
> 
>  /* Set up guest notifier (irq) */
>  if (s->vdev->binding->set_guest_notifiers(s->vdev->binding_opaque,
> -- 
> 1.7.4.4
> 
>

Re: [Qemu-devel] [RFC PATCH RDMA support v2: 5/6] connection-setup code between client/server

2013-02-21 Thread Michael S. Tsirkin

I don't think people mind using RDMA specifically,
small amounts of data can be sent using SEND commands.
But it's cleaner not to use TCP for parts of data.


On Tue, Feb 19, 2013 at 01:03:59AM -0500, Michael R. Hines wrote:
> 
> This sounds great. My team was just discussing customizing
> QEMUFileOps to be more "rdma-friendly".
> 
> We weren't certain if it would be well received, so this is good to know.
> 
> I'll try to scrounge up a patch of some kind before sending out a v3
> and then do my best to get rid of the migrate_use_rdma() checks
> everywhere...
> 
> Thanks for the suggestion,
> 
> - Michael
> 
> On 02/18/2013 05:52 AM, Paolo Bonzini wrote:
> >Il 14/02/2013 20:29, Michael R. Hines ha scritto:
> >>>Are you still using the tcp for transferring device state? If so you
> >>>can call the tcp functions from the migration rdma code as a first
> >>>step but I would prefer it to use RDMA too.
> >>This is the crux of the problem of using RDMA for migration: Currently
> >>all of the QEMU migration control logic and device state goes through
> >>the the QEMUFile implementation. RDMA, however is by nature a zero-copy
> >>solution and is incompatible with QEMUFile.
> >With the patches I sent yesterday, there is no more buffering involved
> >in migration.c.  All data goes straight from arch_init.c to a QEMUFile.
> >
> >QEMUFile still does some buffering, but this should change with other
> >patches that Orit is working on.
> >
> >>Using RDMA for transferring device state is not recommended: Setuping an
> >>RDMA requires registering the memory locations on both sides with the
> >>RDMA hardware. This is not ideal because this would require pinning the
> >>memory holding the device state and then issuing the RDMA transfer for
> >>*each* type of device - which would require changing the control path of
> >>every type of migrated device in QEMU.
> >Yes, this would not work well.  However, you can (I think) define a
> >QEMUFileOps implementation for RDMA that:
> >
> >1) registers the buffer of a QEMUFile with the RDMA hardware;
> >
> >2) in its get_buffer (receive) and put_buffer (send) callbacks, issues a
> >synchronous RDMA transfer;
> >
> >3) unregisters the buffer in the close callback.
> >
> >As a proof of concept, this would also work (though it would make no
> >sense) for transferring the RAM; in the end of course it would be used
> >only for the device state.
> >
> >It's not a problem to add more operations to QEMUFileOps or similar.
> >It's also not a problem to change the way buf is allocated, if you need
> >it to be page-aligned or something like that.
> >
> >It is much better than adding migration_use_rdma() everywhere.
> >
> >Paolo
>

Re: [Qemu-devel] [PATCH] vl.c: Support multiple CPU ranges on -numa option

2013-02-21 Thread Markus Armbruster

Anthony Liguori  writes:

> Markus Armbruster  writes:
>
>> Eduardo Habkost  writes:
>>
>>> This allows "," to be used a separator between each CPU range.  Note
>>> that commas inside key=value command-line options have to be escaped
>>> using ",,", so the command-line will look like:
>>>
>>>   -numa node,cpus=A,,B,,C,,D
>>
>> This is really, really ugly, and an embarrassment to document.  Which
>> you didn't ;)
>>
>> What about
>>
>> -numa node,cpus=A,cpus=B,cpus=C,cpus=D
>>
>> Yes, QemuOpts lets you do that.  Getting all the values isn't as easy as
>> it could be (unless you use Laszlo's opt-visitor), but that could be
>> improved.
>
> No more of this.
>
>  -numa node,cpus=A:B:C:D 
>
> if you want to express a list.

Okay for command line and human monitor, just don't let it bleed into
QMP.

Re: [Qemu-devel] Block I/O optimizations

2013-02-21 Thread Loic Dachary

Hi Stefan,

Thanks a lot, I'm looking forward to it ;-)

Cheers

On 02/21/2013 09:11 AM, Stefan Hajnoczi wrote:
> On Mon, Feb 18, 2013 at 7:19 PM, Loic Dachary  wrote:
>> I recently tried to figure out the best and easiest ways to increase block 
>> I/O performances with qemu. Not being a qemu expert, I expected to find a 
>> few optimization tricks. Much to my surprise, it appears that there are many 
>> significant improvements being worked on. This is excellent news :-)
>>
>> However, I'm not sure I understand how they all fit together. It's probably 
>> quite obvious from the developer point of view but I would very much 
>> appreciate an overview of how dataplane, vhost-blk, ELVIS etc. should be 
>> used or developed to maximize I/O performances. Are there documents I should 
>> read ? If not, would someone be willing to share bits of wisdom ?
> 
> Hi Loic,
> There will be more information on dataplane shortly.  I'll write up a
> blog post and share the link with you.
> 
> Stefan
> 
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [RFC PATCH 3/3] KVM: s390: Hook up ioeventfds.

2013-02-21 Thread Michael S. Tsirkin

On Thu, Feb 21, 2013 at 07:14:31PM +0100, Cornelia Huck wrote:
> On Thu, 21 Feb 2013 18:34:59 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > On Thu, Feb 21, 2013 at 04:21:43PM +0100, Cornelia Huck wrote:
> > > On Thu, 21 Feb 2013 16:39:05 +0200
> > > "Michael S. Tsirkin"  wrote:
> > > 
> > > > On Thu, Feb 21, 2013 at 02:13:00PM +0100, Cornelia Huck wrote:
> > > > > As s390 doesn't use memory writes for virtio notifcations, create
> > > > > a special kind of ioeventfd instead that hooks up into diagnose
> > > > > 0x500 (kvm hypercall) with function code 3 (virtio-ccw notification).
> > > > > 
> > > > > Signed-off-by: Cornelia Huck 
> > > > 
> > > > Do we really have to put virtio specific stuff into kvm?
> > > > How about we add generic functionality to match GPRs
> > > > on a hypercall and signal an eventfd?
> > > 
> > > Worth a try implementing that.
> > > 
> > > > 
> > > > Also, it's a bit unfortunate that this doesn't use
> > > > the io bus datastructure, long term the linked list handling
> > > > might become a bottleneck, using shared code this could maybe
> > > > benefit from performance optimizations there.
> > > 
> > > The linked list stuff was more like an initial implementation that
> > > could be improved later.
> > > 
> > > > io bus data structure currently has the ability to match on
> > > > two 64 bit fields (addr/datamatch) and a signed 32 bit one (length).
> > > > Isn't this sufficient for your purposes?
> > > > How about sticking subchannel id in address, vq in data match
> > > > and using io bus?
> > > 
> > > I can give that a try. (I must admit that I didn't look at the iobus
> > > stuff in detail.)
> > > 
> > > > 
> > > > BTW maybe we could do this for the user interface too,
> > > > while I'm not 100% sure it's the cleanest thing to do
> > > > (or will work), it would certainly minimize the patchset.
> > > 
> > > You mean integrating with the generic interface and dropping the new
> > > ARCH flag?
> > 
> > Not sure about the flag but we could use the general structure
> > without an arch-specific format, if that's a good fit.
> > 
> So I have something that seems to do what I want. I'll see if I can
> morph it into something presentable tomorrow.
> 
> diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
> index b58dd86..3c43e30 100644
> --- a/arch/s390/kvm/Kconfig
> +++ b/arch/s390/kvm/Kconfig
> @@ -22,6 +22,7 @@ config KVM
>   select PREEMPT_NOTIFIERS
>   select ANON_INODES
>   select HAVE_KVM_CPU_RELAX_INTERCEPT
> + select HAVE_KVM_EVENTFD
>   ---help---
> Support hosting paravirtualized guest machines using the SIE
> virtualization capability on the mainframe. This should work
> diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
> index 3975722..8fe9d65 100644
> --- a/arch/s390/kvm/Makefile
> +++ b/arch/s390/kvm/Makefile
> @@ -6,7 +6,7 @@
>  # it under the terms of the GNU General Public License (version 2 only)
>  # as published by the Free Software Foundation.
>  
> -common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o)
> +common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o eventfd.o)
>  
>  ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
>  
> diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
> index a390687..7fc195e 100644
> --- a/arch/s390/kvm/diag.c
> +++ b/arch/s390/kvm/diag.c
> @@ -104,6 +104,20 @@ static int __diag_ipl_functions(struct kvm_vcpu *vcpu)
>   return -EREMOTE;
>  }
>  
> +static int __diag_virtio_hypercall(struct kvm_vcpu *vcpu)
> +{
> + int ret, idx;
> + u64 vq = vcpu->run->s.regs.gprs[3];
> +
> + idx = srcu_read_lock(&vcpu->kvm->srcu);
> + ret = kvm_io_bus_write(vcpu->kvm, KVM_CSS_BUS,
> + vcpu->run->s.regs.gprs[2],
> + vcpu->run->s.regs.gprs[1],
> + &vq);

Hmm, do you really need three gprs? If not, we could
just pass len == 8, which would be cleaner.
Also might as well drop the vq variable, no?

> + srcu_read_unlock(&vcpu->kvm->srcu, idx);
> + return ret;
> +}
> +
>  int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
>  {
>   int code = (vcpu->arch.sie_block->ipb & 0xfff) >> 16;
> @@ -118,6 +132,8 @@ int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
>   return __diag_time_slice_end_directed(vcpu);
>   case 0x308:
>   return __diag_ipl_functions(vcpu);
> + case 0x500:
> + return __diag_virtio_hypercall(vcpu);
>   default:
>   return -EOPNOTSUPP;
>   }
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index f822d36..04d2454 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -142,6 +142,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>   case KVM_CAP_ONE_REG:
>   case KVM_CAP_ENABLE_CAP:
>   case KVM_CAP_S390_CSS_SUPPORT:
> + case KVM_CAP_IOEVENTFD:
>   r = 1;
>   break;
>   case KVM_CAP_NR_VCPUS:
> diff --git a/include/linux/kvm_host.h b/i

Re: [Qemu-devel] [PATCH] vl.c: Support multiple CPU ranges on -numa option

2013-02-21 Thread Anthony Liguori

Markus Armbruster  writes:

> Eduardo Habkost  writes:
>
>> This allows "," to be used a separator between each CPU range.  Note
>> that commas inside key=value command-line options have to be escaped
>> using ",,", so the command-line will look like:
>>
>>   -numa node,cpus=A,,B,,C,,D
>
> This is really, really ugly, and an embarrassment to document.  Which
> you didn't ;)
>
> What about
>
> -numa node,cpus=A,cpus=B,cpus=C,cpus=D
>
> Yes, QemuOpts lets you do that.  Getting all the values isn't as easy as
> it could be (unless you use Laszlo's opt-visitor), but that could be
> improved.

No more of this.

 -numa node,cpus=A:B:C:D 

if you want to express a list.

Regards,

Anthony Liguori

>
>> Note that the following format, currently used by libvirt:
>>
>>   -numa nodes,cpus=A,B,C,D
>>
>> will _not_ work yet, as "," is the option separator for the command-line
>> option parser, and it will require changing the -numa option parsing
>> code to handle "cpus" as a special case.
>
> No way.

Re: [Qemu-devel] [PATCH] fix wrong output with 'info chardev' for tcp socket.

2013-02-21 Thread Michael Tokarev

22.02.2013 00:20, Serge E. Hallyn wrote:
> The snprintf format isn't taking into account the new 'left' and
> 'right' variables (for ipv6 []) when placing the ':', which should
> go immediately before the port.

This fixes actual isse (also found by Serge), where `info chardev'
prints `tcp:127.0.0.1,server,nowait' for a monitor running on port
.

This is definitely a stable material (CCed).

Reviewed-by: Michael Tokarev 

Thanks!

/mjt

> Signed-off-by: Serge Hallyn 
> ---
>  qemu-char.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/qemu-char.c b/qemu-char.c
> index e4b0f53..3e152e1 100644
> --- a/qemu-char.c
> +++ b/qemu-char.c
> @@ -2482,7 +2482,7 @@ static CharDriverState *qemu_chr_open_socket_fd(int fd, 
> bool do_nodelay,
>  s->do_nodelay = do_nodelay;
>  getnameinfo((struct sockaddr *) &ss, ss_len, host, sizeof(host),
>  serv, sizeof(serv), NI_NUMERICHOST | NI_NUMERICSERV);
> -snprintf(chr->filename, 256, "%s:%s:%s%s%s%s",
> +snprintf(chr->filename, 256, "%s:%s%s%s:%s%s",
>   is_telnet ? "telnet" : "tcp",
>   left, host, right, serv,
>   is_listen ? ",server" : "");
>

Re: [Qemu-devel] [RFC PATCH RDMA support v2: 2/6] create migration-rdma.c for core RDMA migration code

2013-02-21 Thread Michael S. Tsirkin

On Mon, Feb 11, 2013 at 05:49:53PM -0500, Michael R. Hines wrote:
> +struct rdma_data {

Picking on this but it's everywhere: should be RdmaData, with
a typedef. Please check CODING_STYLE and follow the rules.

Also, prefixing everything with rdma_ means I can't figure out
which functions are local and which from rdma_cm. Let's find
some other prefix.

> +char *host;
> +int port;
> +int enabled;
> +int gidx;
> +union ibv_gid gid;
> +
> +struct rdma_context rdma_ctx;
> +struct rdma_ram_blocks rdma_ram_blocks;
> +
> +/* This is used for synchronization: We use
> +   IBV_WR_SEND to send it after all IBV_WR_RDMA_WRITEs
> +   are done. When the receiver gets it, it can be certain
> +   that all the RDMAs are completed. */

/*
 * Always
 * like
 * this
 */

/* Never
   like
   this */

I'll reserve judgement until the protocol is documented,
but I see a somewhat troubling lack of handling for cross-endian
and cross-verion compatibility.
Did I miss it?

-- 
MST

Re: [Qemu-devel] [PATCH] vl.c: Support multiple CPU ranges on -numa option

2013-02-21 Thread Markus Armbruster

Eduardo Habkost  writes:

> This allows "," to be used a separator between each CPU range.  Note
> that commas inside key=value command-line options have to be escaped
> using ",,", so the command-line will look like:
>
>   -numa node,cpus=A,,B,,C,,D

This is really, really ugly, and an embarrassment to document.  Which
you didn't ;)

What about

-numa node,cpus=A,cpus=B,cpus=C,cpus=D

Yes, QemuOpts lets you do that.  Getting all the values isn't as easy as
it could be (unless you use Laszlo's opt-visitor), but that could be
improved.

> Note that the following format, currently used by libvirt:
>
>   -numa nodes,cpus=A,B,C,D
>
> will _not_ work yet, as "," is the option separator for the command-line
> option parser, and it will require changing the -numa option parsing
> code to handle "cpus" as a special case.

No way.

[Qemu-devel] [PATCH] fix wrong output with 'info chardev' for tcp socket.

2013-02-21 Thread Serge E. Hallyn

The snprintf format isn't taking into account the new 'left' and
'right' variables (for ipv6 []) when placing the ':', which should
go immediately before the port.

Signed-off-by: Serge Hallyn 
---
 qemu-char.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qemu-char.c b/qemu-char.c
index e4b0f53..3e152e1 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2482,7 +2482,7 @@ static CharDriverState *qemu_chr_open_socket_fd(int fd, 
bool do_nodelay,
 s->do_nodelay = do_nodelay;
 getnameinfo((struct sockaddr *) &ss, ss_len, host, sizeof(host),
 serv, sizeof(serv), NI_NUMERICHOST | NI_NUMERICSERV);
-snprintf(chr->filename, 256, "%s:%s:%s%s%s%s",
+snprintf(chr->filename, 256, "%s:%s%s%s:%s%s",
  is_telnet ? "telnet" : "tcp",
  left, host, right, serv,
  is_listen ? ",server" : "");
-- 
1.8.1.2

Re: [Qemu-devel] [PATCH 3/9] event poll: make epoll work for normal fd

2013-02-21 Thread mdroth

On Thu, Feb 21, 2013 at 08:54:47PM +0800, Liu Ping Fan wrote:
> From: Liu Ping Fan 
> 
> When event poll can work with normal fd, we can port them
> onto the event loop.
> 
> Signed-off-by: Liu Ping Fan 
> ---
>  hw/dataplane/event-poll.c |   36 
>  hw/dataplane/event-poll.h |8 
>  2 files changed, 44 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/dataplane/event-poll.c b/hw/dataplane/event-poll.c
> index 2b55c6e..b7dea53 100644
> --- a/hw/dataplane/event-poll.c
> +++ b/hw/dataplane/event-poll.c
> @@ -32,6 +32,42 @@ void event_poll_add(EventPoll *poll, EventHandler *handler,
>  }
>  }
> 
> +void event_poll_add_fd(EventPoll *poll, int fd, uint32_t type,
> +EventHandler *handler)
> +{
> +struct epoll_event event = {
> +.events = type,
> +.data.ptr = handler,
> +};
> +
> +if (epoll_ctl(poll->epoll_fd, EPOLL_CTL_ADD, fd, &event) != 0) {
> +fprintf(stderr, "failed to add event fd handler to epoll: %m\n");
> +exit(1);
> +}
> +
> +}

An alternative to adding these interfaces would be initializing an
EventNotifier using event_notifier_init_fd() and then passing that in to
event_poll_add()

I think doing it this way would be applicable on top of Paolo's
patches to switch dataplane to using AioContext as well.

> +void event_poll_del_fd(EventPoll *poll, int fd)
> +{
> +if (epoll_ctl(poll->epoll_fd, EPOLL_CTL_DEL, fd, NULL) != 0) {
> +fprintf(stderr, "failed to del event handler to epoll: %m\n");
> +exit(1);
> +}
> +}
> +
> +
> +void event_poll_modify_fd(EventPoll *poll, int fd, uint32_t type,
> +EventHandler *handler)
> +{
> +struct epoll_event event = {
> +.events = type,
> +.data.ptr = handler,
> +};
> +if (epoll_ctl(poll->epoll_fd, EPOLL_CTL_MOD, fd, &event) != 0) {
> +fprintf(stderr, "failed to modify event handler to epoll: %m\n");
> +exit(1);
> +}
> +}
> +
>  /* Event callback for stopping event_poll() */
>  static void handle_stop(EventHandler *handler)
>  {
> diff --git a/hw/dataplane/event-poll.h b/hw/dataplane/event-poll.h
> index 3e8d3ec..606138c 100644
> --- a/hw/dataplane/event-poll.h
> +++ b/hw/dataplane/event-poll.h
> @@ -22,6 +22,8 @@ typedef void EventCallback(EventHandler *handler);
>  struct EventHandler {
>  EventNotifier *notifier;/* eventfd */
>  EventCallback *callback;/* callback function */
> +void *opaque;
> +int fd; /* normal fd*/
>  };
> 
>  typedef struct {
> @@ -32,6 +34,12 @@ typedef struct {
> 
>  void event_poll_add(EventPoll *poll, EventHandler *handler,
>  EventNotifier *notifier, EventCallback *callback);
> +void event_poll_add_fd(EventPoll *poll, int fd, uint32_t type,
> +EventHandler *handler);
> +void event_poll_del_fd(EventPoll *poll, int fd);
> +void event_poll_modify_fd(EventPoll *poll, int fd, uint32_t type,
> +EventHandler *handler);
> +
>  void event_poll_init(EventPoll *poll);
>  void event_poll_cleanup(EventPoll *poll);
>  void event_poll(EventPoll *poll);
> -- 
> 1.7.4.4
> 
>

Re: [Qemu-devel] [RFC PATCH RDMA support v2: 5/6] connection-setup code between client/server

2013-02-21 Thread Michael S. Tsirkin

On Tue, Feb 19, 2013 at 10:41:08AM -0500, Michael R. Hines wrote:
> On 02/18/2013 03:24 AM, Orit Wasserman wrote:
> >Hi Michael, The guest device state is quite small (~100K probably
> >less) especially when compared to the guest memory and we already
> >are pinning the guest memory for RDMA any way I was actually
> >wondering about the memory pinning, do we pin all guest memory
> >pages as migration starts or on demand?
> 
> The patch supports both methods. There's a function called
> "rdma_server_prepare()" which optionally pins all the memory in
> advanced.
> 
> We prefer on-demand pinning, of course, because we want to preserve
> the ability to do ballooning and the occasional madvise() calls.
> 
> The patch defaults to pinning everything right now for performance
> evaluation.. later I'll make sure to switch that off when
> we've coalesced on a solution for the actual RDMA transfer itself.

How well does the incremental pinning work?
The pin everything approach does not look very useful
because it conflicts with memory overcommit.

> >For the guest memory pages, sending the pages directly without
> >QemuFile (that does buffering) is better, I would suggest
> >implementing an QemuRDMAFile for this. It will have a new API for
> >the memory pages (zero copy) so instead of using qemu_put_buffer
> >we will call qemu_rdma_buffer or it can reimplement
> >qemu_put_buffer (you need to add offset). As for device state
> >which is sent in the last phase and is small you can modify the
> >current implementation. (well Paolo sent patches that are changing
> >this but I think buffering is still an option) The current
> >migration implementation copies the device state into a buffer and
> >than send the data from the buffer (QemuBufferedFile). You only
> >need to pin this buffer, and RDMA it after all the device state
> >was written into it. Regards, Orit
> I like it .. can you help me understand - how much different
> is this design than the "QEMUFileOps" design that Paulo suggested?
> 
> Or it basically the same? reimplementing
> qemu_put_buffer/get_buffer() for RDMA purposes..
> 
> - Michael
>

Re: [Qemu-devel] [RFC PATCH RDMA support v2: 5/6] connection-setup code between client/server

2013-02-21 Thread Michael S. Tsirkin

On Thu, Feb 14, 2013 at 02:29:09PM -0500, Michael R. Hines wrote:
> Orit (and anthony if you're not busy),
> 
> I forgot to respond to this very important comment:
> 
> On 02/13/2013 03:46 AM, Orit Wasserman wrote:
> 
> Are you still using the tcp for transferring device state? If so you can
> call the tcp functions from the migration rdma code as a first step but I
> would prefer it to use RDMA too.
> 
> 
> This is the crux of the problem of using RDMA for migration: Currently all of
> the QEMU migration control logic and device state goes through the the 
> QEMUFile
> implementation. RDMA, however is by nature a zero-copy solution and is
> incompatible with QEMUFile.

RDMA might be overkill but you could reuse the same connection,
using send instructions.

> Using RDMA for transferring device state is not recommended: Setuping an RDMA
> requires registering the memory locations on both sides with the RDMA 
> hardware.
> This is not ideal because this would require pinning the memory holding the
> device state and then issuing the RDMA transfer for *each* type of device -
> which would require changing the control path of every type of migrated device
> in QEMU.
> 
> Currently the Patch we submitted bypasses QEMUFile. It does just issues the
> RDMA transfer for the memory that was dirtied and then continues along with 
> the
> rest of the migration call path normally.
> 
> In an ideal world, we would prefer a hyrbid approach, something like:
> 
> Begin Migration Iteration Round:
> 1. stop VCPU
> 2. start iterative pass over memory
> 3. send control signals (if any) / device state to QEMUFile
> 4. When a dirty memory page is found, do:
>  a) Instruct the QEMUFile to block
>  b) Issue the RDMA transfer
>  c) Instruct the QEMUFile to unblock
> 5. resume VCPU
> 
> This would require a "smarter" QEMUFile implementation that understands when 
> to
> block and for how long.
> 
> Comments?
> 
> - Michael

Re: [Qemu-devel] [PATCH v2 2/2] qom/object.c: Allow itf cast with num_itfs = 0

2013-02-21 Thread Anthony Liguori

Anthony Liguori  writes:

> Peter Crosthwaite  writes:
>
>> num_interfaces only tells you how many interfaces the concrete child class 
>> has
>> (as defined in the TypeInfo). This means if you have a child class which 
>> defines
>> no interfaces of its own, but its parent has interfaces you cannot cast to 
>> those
>> parent interfaces.
>>
>> Fixed changing the guard to check the class->interfaces list instead (which 
>> is
>> a complete flattened list of implemented interfaces).
>>
>> Signed-off-by: Peter Crosthwaite 
>
> I must admit I don't understand it yet, but this breaks hot unplug.

It was a bad bisect.  It doesn't always happen and I wasn't running
enough tests to catch it.  Am going again now.  Stay tuned.

Regards,

Anthony Liguori

>
> Here is the qemu-test output:
>
> [aliguori@ccnode4 qemu-test]$ QEMU_TEST_SEED=45240 ./qemu-test 
> ~/build/qemu/x86_64-softmmu/qemu-system-x86_64 tests/device-del.sh
> Using RANDOM seed 45240
> Testing nic
> Formatting '.tmp-31849/disk.img', fmt=qcow2 size=10737418240 encryption=off 
> cluster_size=65536 lazy_refcounts=off 
> /home/aliguori/build/qemu/x86_64-softmmu/qemu-system-x86_64 -kernel 
> /usr/local/share/qemu-jeos/kernel-x86_64-pc -initrd 
> .tmp-31849/initramfs-31849.img.gz -device isa-debug-exit -append 
> console=ttyS0 seed=45240 -nographic -enable-kvm -device virtio-balloon-pci 
> -pidfile .tmp-31849/pidfile-31849.pid -qmp 
> unix:.tmp-31849/qmpsock-31849.sock,server,nowait
> [0.00] Initializing cgroup subsys cpuset
> [0.00] Initializing cgroup subsys cpu
> [0.00] Linux version 3.4.0 (root@ccnode4) (gcc version 4.6.4 20120830 
> (prerelease) (GCC) ) #2 SMP Mon Dec 3 19:40:41 CST 2012
> [0.00] Command line: console=ttyS0 seed=45240
> [0.00] BIOS-provided physical RAM map:
>
> <- snip ->
>
> [0.865631] Freeing unused kernel memory: 584k freed
> [0.866320] Write protecting the kernel read-only data: 12288k
> [0.868626] Freeing unused kernel memory: 640k freed
> [0.873366] Freeing unused kernel memory: 1724k freed
> Setting guest RANDOM seed to 45240
> *** Running tests ***
> /tests/device-del.sh
> Running test /tests/device-del.sh...
> ** waiting for hotplug **
> ./tests/device-del.sh: line 36: test: !=: unary operator expected
> ** waiting for remove **
> ** waiting for guest to see device **
> ./qemu-test: line 99: 31883 Segmentation fault  (core dumped) "$@"
>
> The backtrace is:
>
> Program terminated with signal 11, Segmentation fault.
> #0  qemu_bh_delete (bh=0x0) at /home/aliguori/git/qemu/async.c:116
> 116   bh->scheduled = 0;
> Missing separate debuginfos, use: debuginfo-install 
> bzip2-libs-1.0.6-7.fc18.x86_64 glib2-2.34.2-2.fc18.x86_64 
> glibc-2.16-28.fc18.x86_64 libaio-0.3.109-6.fc18.x86_64 
> libgcc-4.7.2-8.fc18.x86_64 pixman-0.26.2-5.fc18.x86_64 
> xen-libs-4.2.1-5.fc18.x86_64 xz-libs-5.1.2-2alpha.fc18.x86_64 
> zlib-1.2.7-9.fc18.x86_64
> (gdb) bt
> #0  qemu_bh_delete (bh=0x0) at /home/aliguori/git/qemu/async.c:116
> #1  0x7f55cac9bd95 in virtio_net_exit (vdev=0x7f55cd6eb960)
> at /home/aliguori/git/qemu/hw/virtio-net.c:1413
> #2  0x7f55cabde7f9 in virtio_net_exit_pci (pci_dev=0x7f55cd668260)
> at /home/aliguori/git/qemu/hw/virtio-pci.c:1016
> #3  0x7f55cab92e4a in pci_unregister_device (dev=)
> at /home/aliguori/git/qemu/hw/pci/pci.c:889
> #4  0x7f55caba7485 in device_unparent (obj=)
> at /home/aliguori/git/qemu/hw/qdev.c:773
> #5  0x7f55cac1ea0f in object_unparent (obj=0x7f55cd668260)
> at /home/aliguori/git/qemu/qom/object.c:370
> #6  0x7f55caba7a4d in qdev_free (dev=)
> at /home/aliguori/git/qemu/hw/qdev.c:270
> #7  0x7f55cab38da0 in acpi_piix_eject_slot (s=0x7f55cd68fd50, 
> slots=) at /home/aliguori/git/qemu/hw/acpi_piix4.c:306
> #8  0x7f55caca70f2 in access_with_adjusted_size (addr=addr@entry=8, 
> value=value@entry=0x7f55c8104c28, size=4, access_size_min= out>, 
> access_size_max=, access=access@entry=
> 0x7f55caca7710 , opaque=opaque@entry=
> 0x7f55cd690540) at /home/aliguori/git/qemu/memory.c:364
> #9  0x7f55caca8767 in memory_region_iorange_write (
> iorange=, offset=8, width=4, data=32)
> at /home/aliguori/git/qemu/memory.c:439
>
> It very much looks like a double free to me of the device.
>
> The test is doing a simple device_del in the monitor.
>
> Regards,
>
> Anthony Liguori
>
>> ---
>> changed from v1:
>> Change guard implementation rather than removed it (Paolo review)
>>
>>  qom/object.c |3 ++-
>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/qom/object.c b/qom/object.c
>> index 4b72a64..3d638ff 100644
>> --- a/qom/object.c
>> +++ b/qom/object.c
>> @@ -449,7 +449,8 @@ ObjectClass *object_class_dynamic_cast(ObjectClass 
>> *class,
>>  TypeImpl *type = class->type;
>>  ObjectClass *ret = NULL;
>>  
>> -if (type->num_interfaces && type_is_ancestor(target_type, 
>> type_interface)) {
>> +if (type->class->interfaces &&
>> +

Re: [Qemu-devel] [PATCH v2 2/2] qom/object.c: Allow itf cast with num_itfs = 0

2013-02-21 Thread Anthony Liguori

Peter Crosthwaite  writes:

> num_interfaces only tells you how many interfaces the concrete child class has
> (as defined in the TypeInfo). This means if you have a child class which 
> defines
> no interfaces of its own, but its parent has interfaces you cannot cast to 
> those
> parent interfaces.
>
> Fixed changing the guard to check the class->interfaces list instead (which is
> a complete flattened list of implemented interfaces).
>
> Signed-off-by: Peter Crosthwaite 

I must admit I don't understand it yet, but this breaks hot unplug.

Here is the qemu-test output:

[aliguori@ccnode4 qemu-test]$ QEMU_TEST_SEED=45240 ./qemu-test 
~/build/qemu/x86_64-softmmu/qemu-system-x86_64 tests/device-del.sh
Using RANDOM seed 45240
Testing nic
Formatting '.tmp-31849/disk.img', fmt=qcow2 size=10737418240 encryption=off 
cluster_size=65536 lazy_refcounts=off 
/home/aliguori/build/qemu/x86_64-softmmu/qemu-system-x86_64 -kernel 
/usr/local/share/qemu-jeos/kernel-x86_64-pc -initrd 
.tmp-31849/initramfs-31849.img.gz -device isa-debug-exit -append console=ttyS0 
seed=45240 -nographic -enable-kvm -device virtio-balloon-pci -pidfile 
.tmp-31849/pidfile-31849.pid -qmp 
unix:.tmp-31849/qmpsock-31849.sock,server,nowait
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 3.4.0 (root@ccnode4) (gcc version 4.6.4 20120830 
(prerelease) (GCC) ) #2 SMP Mon Dec 3 19:40:41 CST 2012
[0.00] Command line: console=ttyS0 seed=45240
[0.00] BIOS-provided physical RAM map:

<- snip ->

[0.865631] Freeing unused kernel memory: 584k freed
[0.866320] Write protecting the kernel read-only data: 12288k
[0.868626] Freeing unused kernel memory: 640k freed
[0.873366] Freeing unused kernel memory: 1724k freed
Setting guest RANDOM seed to 45240
*** Running tests ***
/tests/device-del.sh
Running test /tests/device-del.sh...
** waiting for hotplug **
./tests/device-del.sh: line 36: test: !=: unary operator expected
** waiting for remove **
** waiting for guest to see device **
./qemu-test: line 99: 31883 Segmentation fault  (core dumped) "$@"

The backtrace is:

Program terminated with signal 11, Segmentation fault.
#0  qemu_bh_delete (bh=0x0) at /home/aliguori/git/qemu/async.c:116
116 bh->scheduled = 0;
Missing separate debuginfos, use: debuginfo-install 
bzip2-libs-1.0.6-7.fc18.x86_64 glib2-2.34.2-2.fc18.x86_64 
glibc-2.16-28.fc18.x86_64 libaio-0.3.109-6.fc18.x86_64 
libgcc-4.7.2-8.fc18.x86_64 pixman-0.26.2-5.fc18.x86_64 
xen-libs-4.2.1-5.fc18.x86_64 xz-libs-5.1.2-2alpha.fc18.x86_64 
zlib-1.2.7-9.fc18.x86_64
(gdb) bt
#0  qemu_bh_delete (bh=0x0) at /home/aliguori/git/qemu/async.c:116
#1  0x7f55cac9bd95 in virtio_net_exit (vdev=0x7f55cd6eb960)
at /home/aliguori/git/qemu/hw/virtio-net.c:1413
#2  0x7f55cabde7f9 in virtio_net_exit_pci (pci_dev=0x7f55cd668260)
at /home/aliguori/git/qemu/hw/virtio-pci.c:1016
#3  0x7f55cab92e4a in pci_unregister_device (dev=)
at /home/aliguori/git/qemu/hw/pci/pci.c:889
#4  0x7f55caba7485 in device_unparent (obj=)
at /home/aliguori/git/qemu/hw/qdev.c:773
#5  0x7f55cac1ea0f in object_unparent (obj=0x7f55cd668260)
at /home/aliguori/git/qemu/qom/object.c:370
#6  0x7f55caba7a4d in qdev_free (dev=)
at /home/aliguori/git/qemu/hw/qdev.c:270
#7  0x7f55cab38da0 in acpi_piix_eject_slot (s=0x7f55cd68fd50, 
slots=) at /home/aliguori/git/qemu/hw/acpi_piix4.c:306
#8  0x7f55caca70f2 in access_with_adjusted_size (addr=addr@entry=8, 
value=value@entry=0x7f55c8104c28, size=4, access_size_min=, 
access_size_max=, access=access@entry=
0x7f55caca7710 , opaque=opaque@entry=
0x7f55cd690540) at /home/aliguori/git/qemu/memory.c:364
#9  0x7f55caca8767 in memory_region_iorange_write (
iorange=, offset=8, width=4, data=32)
at /home/aliguori/git/qemu/memory.c:439

It very much looks like a double free to me of the device.

The test is doing a simple device_del in the monitor.

Regards,

Anthony Liguori

> ---
> changed from v1:
> Change guard implementation rather than removed it (Paolo review)
>
>  qom/object.c |3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/qom/object.c b/qom/object.c
> index 4b72a64..3d638ff 100644
> --- a/qom/object.c
> +++ b/qom/object.c
> @@ -449,7 +449,8 @@ ObjectClass *object_class_dynamic_cast(ObjectClass *class,
>  TypeImpl *type = class->type;
>  ObjectClass *ret = NULL;
>  
> -if (type->num_interfaces && type_is_ancestor(target_type, 
> type_interface)) {
> +if (type->class->interfaces &&
> +type_is_ancestor(target_type, type_interface)) {
>  int found = 0;
>  GSList *i;
>  
> -- 
> 1.7.0.4

Re: [Qemu-devel] [RFC PATCH RDMA support v2: 4/6] initialize RDMA options when QEMU first runs on command-line

2013-02-21 Thread Michael S. Tsirkin

On Tue, Feb 19, 2013 at 09:42:45AM +0100, Paolo Bonzini wrote:
> Il 19/02/2013 07:00, Michael R. Hines ha scritto:
> > Yes, this is done at migration time (see functions "rdma_client_init"
> > and "rdma_server_prepare()")
> > 
> > To explain the host and port:
> > 
> > The separate host and port are used by the library "librdmacm". This
> > library performs a network translation between the IP address and a
> > unique infiniband user-level Port number and the physical interface that
> > has the RDMA capabilities. This library requires an IP address and port
> > bound specifically to the requested RDMA interface to work.
> > 
> > The patch does not assume that the network interface used for TCP
> > traffic will necessarily be the same as the interface used for RDMA
> > traffic.
> 
> Of course the best thing to do would be to have all traffic on the RDMA
> interface... :)
> 
> Paolo

You can't do this with infiniband, RDMA is only possible once the
connection is established.


> > Alternatively, this host and port could be specified using the QMP
> > "migrate" command, but this command already has the URI for the TCP side
> > of things reserved.
> > 
> > If you guys like, we could specify a *second* URI on the QMP command
> > line - we don't really have a preference.
> > 
> > Either way is fine whatever the consensus is.
> > 
> > - Michael
>

Re: [Qemu-devel] [PATCH 24028/24028] Evaluate breakpoint condition on target.

2013-02-21 Thread Paul Brook

In addition to the comments others made about patch formatting, etc:

> +/* conditional breakpoint evaluation on target*/
> +pstrcat(buf, sizeof(buf), ";ConditionalBreakpoints+");

I'm pretty sure this is a lie for most targets, given later on we have:

> +#if defined(TARGET_ARM)
> +cpu_get_reg_var_func = cpu_get_reg_var_arm;
> +#else
> +cpu_get_reg_var_func = 0;
> +#endif

> +for (i = 0 ; i < bp_cond_len ; i++) {
> +if (!isxdigit(*p) || !isxdigit(*(p + 1))) {
> +bp_cond_len = 0 ;
> +g_free(bp_cond_expr);
> +bp_cond_expr = NULL;
> +perror("Error in breakpoint condition");

perror is the wrong way to report a malformed gdb command.

> +#if TARGET_LONG_SIZE == 4
> +typedef float target_double;
> +#else /* TARGET_LONG_SIZE == 8 */
> +typedef double target_double;
> +#endif

This clearly has nothing to do with the target double precision floating point 
type.

> +int qemu_rw_debug_flag;

This appears to be a write-only variable.

> +#define BP_AGENT_MAX_COND_SIZE 1024

By my reading this isn't the maximim size, it's the maximum stack depth.

> +void cpu_get_reg_var_arm(TCGv var, int reg)
> +{
> +tcg_gen_mov_i32(var, cpu_R[reg]);
> +}

Looks like it will break horribly when the user requests anything other than 
r0-r15.  And r15 is probably also wrong.

> +bswap16(val);

Clearly wrong.

> +fprintf(stderr,
> +"GDB agent: const 64 is not supported for 32 bit

This is not a good way to report user errors.  Several other occurances.

> +static target_long bp_agent_get_arg(const uint8_t *cond_exp,
>...
> +case 4:
> +default:

I'd be amazed if this default case is correct.

> +/*for case error , ex.buffer overloading -
> +  need to set labels anyway in order to avoid segmentation fault  */

Sounds like you're failing to check for errors somewhere else.

Re: [Qemu-devel] [RFC PATCH RDMA support v2: 2/6] create migration-rdma.c for core RDMA migration code

2013-02-21 Thread Michael S. Tsirkin

On Mon, Feb 11, 2013 at 05:49:53PM -0500, Michael R. Hines wrote:
> From: "Michael R. Hines" 
> 
> 
> Signed-off-by: Michael R. Hines 
> ---
>  include/qemu/rdma.h |  281 ++
>  migration-rdma.c| 1444 
> +++
>  2 files changed, 1725 insertions(+)
>  create mode 100644 include/qemu/rdma.h
>  create mode 100644 migration-rdma.c

Could you document the protocol used, including
sync using send etc?
A good place to put it would be docs/


> diff --git a/include/qemu/rdma.h b/include/qemu/rdma.h
> new file mode 100644
> index 000..2dc2519
> --- /dev/null
> +++ b/include/qemu/rdma.h
> @@ -0,0 +1,281 @@
> +/*
> + *  Copyright (C) 2013 Michael R. Hines 
> + *  Copyright (C) 2013 Jiuxing Liu 
> + *
> + *  RDMA data structures and helper functions (for migration)
> + *
> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License as published by
> + *  the Free Software Foundation; under version 2 of the License.
> + *
> + *  This program is distributed in the hope that it will be useful,
> + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *  GNU General Public License for more details.
> + *
> + *  You should have received a copy of the GNU General Public License
> + *  along with this program; if not, see .
> + */
> +
> +#ifndef _RDMA_H
> +#define _RDMA_H
> +
> +#include "config-host.h"
> +#ifdef CONFIG_RDMA
> +#include 
> +#endif
> +#include "monitor/monitor.h"
> +
> +extern int rdmaport;
> +extern char rdmahost[64];
> +
> +struct rdma_context {
> +/* cm_id also has ibv_conext, rdma_event_channel, and ibv_qp in
> +   cm_id->verbs, cm_id->channel, and cm_id->qp. */
> +struct rdma_cm_id *cm_id;
> +struct rdma_cm_id *listen_id;
> +
> +struct ibv_context *verbs;
> +struct rdma_event_channel *channel;
> +struct ibv_qp *qp;
> +
> +struct ibv_comp_channel *comp_channel;
> +struct ibv_pd *pd;
> +struct ibv_cq *cq;
> +};
> +
> +static inline void rdma_init_context(struct rdma_context *rdma_ctx)
> +{
> +rdma_ctx->cm_id = NULL;
> +rdma_ctx->listen_id = NULL;
> +rdma_ctx->verbs = NULL;
> +rdma_ctx->channel = NULL;
> +rdma_ctx->qp = NULL;
> +rdma_ctx->comp_channel = NULL;
> +rdma_ctx->pd = NULL;
> +rdma_ctx->cq = NULL;
> +}
> +
> +void cpu_physical_memory_reset_dirty_all(void);
> +
> +int rdma_resolve_host(struct rdma_context *rdma_ctx,
> +const char *host, int port);
> +int rdma_alloc_pd_cq(struct rdma_context *rdma_ctx);
> +int rdma_alloc_qp(struct rdma_context *rdma_ctx);
> +int rdma_migrate_connect(struct rdma_context *rdma_ctx,
> +void *in_data, int *in_len, void *out_data, int out_len);
> +int rdma_migrate_accept(struct rdma_context *rdma_ctx,
> +void *in_data, int *in_len, void *out_data, int out_len);
> +void rdma_migrate_disconnect(struct rdma_context *rdma_ctx);
> +
> +/* Instead of registering whole ram blocks, we can register them in smaller
> + * chunks. This may be benefial if the ram blocks have holes in them */
> +#define RDMA_CHUNK_REGISTRATION
> +
> +#define RDMA_LAZY_REGISTRATION
> +
> +#define RDMA_REG_CHUNK_SHIFT 20
> +#define RDMA_REG_CHUNK_SIZE (1UL << (RDMA_REG_CHUNK_SHIFT))
> +#define RDMA_REG_CHUNK_INDEX(start_addr, host_addr) \
> +(((unsigned long)(host_addr) >> RDMA_REG_CHUNK_SHIFT) - \
> +((unsigned long)(start_addr) >> RDMA_REG_CHUNK_SHIFT))
> +#define RDMA_REG_NUM_CHUNKS(rdma_ram_block) \
> +(RDMA_REG_CHUNK_INDEX((rdma_ram_block)->local_host_addr,\
> +(rdma_ram_block)->local_host_addr +\
> +(rdma_ram_block)->length) + 1)
> +#define RDMA_REG_CHUNK_START(rdma_ram_block, i) ((uint8_t *)\
> +unsigned long)((rdma_ram_block)->local_host_addr) >> \
> +RDMA_REG_CHUNK_SHIFT) + (i)) << \
> +RDMA_REG_CHUNK_SHIFT))
> +#define RDMA_REG_CHUNK_END(rdma_ram_block, i) \
> +(RDMA_REG_CHUNK_START(rdma_ram_block, i) + \
> + RDMA_REG_CHUNK_SIZE)
> +
> +struct rdma_ram_block {
> +uint8_t *local_host_addr;
> +uint64_t remote_host_addr;
> +uint64_t offset;
> +uint64_t length;
> +struct ibv_mr **pmr;
> +struct ibv_mr *mr;
> +uint32_t remote_rkey;
> +};
> +
> +struct rdma_remote_ram_block {
> +uint64_t remote_host_addr;
> +uint64_t offset;
> +uint64_t length;
> +uint32_t remote_rkey;
> +};
> +
> +#define RDMA_MAX_RAM_BLOCKS 64
> +
> +struct rdma_ram_blocks {
> +int num_blocks;
> +struct rdma_ram_block block[RDMA_MAX_RAM_BLOCKS];
> +};
> +
> +struct rdma_remote_ram_blocks {
> +int num_blocks;
> +struct rdma_remote_ram_block block[RDMA_MAX_RAM_BLOCKS];
> +};
> +
> +int rdma_init_ram_blocks(struct rdma_ram_blocks *rdma_ram_blocks);
> +int rdma_reg_chunk_ram_blocks(struct rdma_conte

Re: [Qemu-devel] [edk2] OvmfPkg reset [was: Distinguish between reset types]

2013-02-21 Thread Laszlo Ersek

On 02/21/13 20:31, Paolo Bonzini wrote:
> Il 21/02/2013 19:24, Laszlo Ersek ha scritto:
 (1) The reset capability that OVMF exports via ACPI -- I agree that I
 should be effecting the 0xCF9 thing in the appropriate table.
>> On a second thought, this will require a new build -D flag, or a PCD.
>>
>> I'm not worried about the ACPI 1.0 --> ACPI 2.0 change in the FADT, the
>> table struct itself forward compatible.
>>
>> However currently we're not saying anything about the reset capabilities
>> of the platform. A client looking at the FADT for reset info will find
>> nothing and follow its own logic, which may or may not include writing
>> to 0xCF9, but we don't have any part in it.
>>
>> If now the FADT starts to claim 0xCF9 on a qemu version that doesn't
>> actually support it, we could mislead the client. Unless we can
>> interrogate qemu about the support (and I think we can't), we'll have to
>> depend on a build-time option. (Or should I shoehorn it into -D CSM_ENABLE?)
>>
>> Jordan, what do you think?
> 
> ACPI tables are hosed enough on some real system that we can assume that
> all guests will fall back to something else---typically a keyboard
> controller reset.

That works for me. Thx.

(BTW I can "plausibly justify" why a BIOS vendor would advertize 0xCF9
(or anything else) in RESET_REG, but deny it by clearing RESET_REG_SUP:
they're probably not sure what hardware the BIOS will be flashed to. So
RESET_REG is a hint for OSPM, but if it doesn't work, "we told you so in
RESET_REG_SUP!" :) Repudiation of responsibility. Maybe we should follow
suit :))

Laszlo

[Qemu-devel] [PATCH] vl.c: Support multiple CPU ranges on -numa option

2013-02-21 Thread Eduardo Habkost

This allows "," to be used a separator between each CPU range.  Note
that commas inside key=value command-line options have to be escaped
using ",,", so the command-line will look like:

  -numa node,cpus=A,,B,,C,,D

Note that the following format, currently used by libvirt:

  -numa nodes,cpus=A,B,C,D

will _not_ work yet, as "," is the option separator for the command-line
option parser, and it will require changing the -numa option parsing
code to handle "cpus" as a special case.

Signed-off-by: Eduardo Habkost 
---
 vl.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/vl.c b/vl.c
index 955d2ff..cd247be 100644
--- a/vl.c
+++ b/vl.c
@@ -1244,7 +1244,7 @@ char *get_boot_devices_list(size_t *size)
 return list;
 }
 
-static void numa_node_parse_cpus(int nodenr, const char *cpus)
+static void numa_node_parse_cpu_range(int nodenr, const char *cpus)
 {
 char *endptr;
 unsigned long long value, endvalue;
@@ -1288,6 +1288,18 @@ error:
 exit(1);
 }
 
+static void numa_node_parse_cpus(int nodenr, const char *option)
+{
+char **parts;
+int i;
+
+parts = g_strsplit(option, ",", 0);
+for (i = 0; parts[i]; i++) {
+numa_node_parse_cpu_range(nodenr, parts[i]);
+}
+g_strfreev(parts);
+}
+
 static void numa_add(const char *optarg)
 {
 char option[128];
-- 
1.8.1

Re: [Qemu-devel] [edk2] OvmfPkg reset [was: Distinguish between reset types]

2013-02-21 Thread Paolo Bonzini

Il 21/02/2013 19:24, Laszlo Ersek ha scritto:
>> > (1) The reset capability that OVMF exports via ACPI -- I agree that I
>> > should be effecting the 0xCF9 thing in the appropriate table.
> On a second thought, this will require a new build -D flag, or a PCD.
> 
> I'm not worried about the ACPI 1.0 --> ACPI 2.0 change in the FADT, the
> table struct itself forward compatible.
> 
> However currently we're not saying anything about the reset capabilities
> of the platform. A client looking at the FADT for reset info will find
> nothing and follow its own logic, which may or may not include writing
> to 0xCF9, but we don't have any part in it.
> 
> If now the FADT starts to claim 0xCF9 on a qemu version that doesn't
> actually support it, we could mislead the client. Unless we can
> interrogate qemu about the support (and I think we can't), we'll have to
> depend on a build-time option. (Or should I shoehorn it into -D CSM_ENABLE?)
> 
> Jordan, what do you think?

ACPI tables are hosed enough on some real system that we can assume that
all guests will fall back to something else---typically a keyboard
controller reset.

See the sequence that Windows does: ACPI, kbd, ACPI, kbd.

Paolo

Re: [Qemu-devel] [edk2] [RFC PATCH] Distinguish between reset types

2013-02-21 Thread Anthony Liguori

Paolo Bonzini  writes:

> Il 21/02/2013 18:12, David Woodhouse ha scritto:
>>> > Since
>>> > - I like kvm, and
>>> > - I cannot easily change my hardware (which doesn't support UG), and
>>> > - I prefer to run the RHEL-6 kernel which has "old" KVM,
>>> > I depend on David's fix for the PAM registers.
>> Which means your suspend/resume is broken. Yay! :)
>> 
>> Anthony's patchset should hopefully fix that.
>> 
>> I wonder what it would take to work around the KVM bug in qemu though;
>> can we actually *emulate* 16-bit mode instead of using KVM, on the
>> kernels where KVM is broken? And then transition back to using KVM when
>> we switch to 32-bit or 64-bit mode?
>
> I don't think you want to do that.  It would be really slow, it would
> trade some emulation bugs for others, and people who mind security would
> not appreciate the additional attack surface of TCG.
>
> (BTW, that's how the very first version of KVM worked.  It didn't use
> hardware virtualization at all until entering 64-bit mode, IIRC).

Pre-released versions, yes.  Avi started with emulating up to 64-bit and
worked up from there.  None of the public versions ever implemented
this.

Regards,

Anthony Liguori

>
> I think we just have to fix it in 1.5, and suggest a newer kernel to
> everyone else.
>
> Paolo

Re: [Qemu-devel] [PATCH v4 00/10] main-loop: switch to g_poll(3) on POSIX hosts

2013-02-21 Thread Anthony Liguori

Paolo Bonzini  writes:

> Il 21/02/2013 18:25, Anthony Liguori ha scritto:
>>> >
>>> > What would be more problematic is the chardev flow control patches,
>>> > which use the glib main loop directly.  I don't recall your KVM forum
>>> > presentation---did you need RT prioritization of the serial port too?
>> It uses GSources which don't need a full glib main loop.  We just need
>> to be able to support glib event dispatch from whatever our main loop
>> is.
>
> The lock in g_main_context_{prepare,query,check,dispatch} is not
> RT-friendly.  We could have contention on that lock between the iothread
> and the VCPU thread.

Since I assume we'll have multiple I/O threads by then, I also assume
that we'll run RT events on one or more dedicated I/O thread that only
handles RT events.

In terms of attaching events to I/O threads, it's pretty simple to delay
the GSource attach to a bottom half on a non-RT I/O thread if the VCPU
needs to add a glib based event.

Regards,

Anthony Liguori

>
> Paolo
>
>> I would assume that we'd treat any glib event with the same priority if
>> we ever had a RT-iothread.

Re: [Qemu-devel] [PATCH v4 00/10] main-loop: switch to g_poll(3) on POSIX hosts

2013-02-21 Thread Jan Kiszka

On 2013-02-21 18:10, Paolo Bonzini wrote:
> Il 21/02/2013 17:29, Jan Kiszka ha scritto:
>> In this context, I'd like to recall a detail: Real-time prioritization
>> of those I/O threads will most probably require locking with
>> prio-inversion avoidance (*). In that case external libs without a
>> chance to tune or replace their internal locking (like glib) will be a
>> no-go for those kind of I/O threads.
>>
>> The glib mainloop might be fine for all the rest, but we will likely
>> also need event loops with RT-compatible locking. So this refactoring
>> should keep the door open for alternative implementations.
> 
> Yes, this refactoring is fine.  It doesn't use the glib mainloop any
> more than before.  AioContext is fine as an RT-compatible event loop.
> You can use both as a GSource or manually from a separate thread.
> 
> What would be more problematic is the chardev flow control patches,
> which use the glib main loop directly.  I don't recall your KVM forum
> presentation---did you need RT prioritization of the serial port too?

Not in my use case (that was the RTC).

I didn't come across a scenario where you have to emulated a serial port
with real-time constraints so far but I would not exclude it. Hmm, think
of serial over LAN: UART device is remote, virtualized legacy system
shall talk to it as if it was locally connected, and that in a timely
manner.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

[Qemu-devel] BUG: RTC issue when Windows guest is idle

2013-02-21 Thread Matthew Anderson

If this isn't the correct list just let me know,

I've run into a bug whereby a Windows guest (tested on Server 2008R2 and 2012) 
no longer receives RTC ticks when it has been idle for a random amount of time. 
HPET is disabled and the guest is running Hyper-V relaxed timers (same 
situation without hv_relaxed). The guest clock stands still and the qemu 
process uses very little CPU (<0.5%, normally it's >5% when the guest is idle) 
. Eventually the guest stops responding to network requests but if you open the 
guest console via VNC and move the mouse around it comes back to life and QEMU 
replays the lost RTC ticks and the guest recovers. I've also been able to make 
it recover by querying the clock over the network via the net time command, you 
can see the clock stand still for 30 seconds then it replays the ticks and 
catches up.

I've tried to reproduce the issue but it seems fairly illusive, the only way 
I've been able to reproduce it is by letting the VM's idle and waiting. 
Sometimes it's hours and sometimes minutes. Can anyone suggest a way to narrow 
the issue down?

Qemu command line is-
/usr/bin/kvm -name SQL01 -S -M pc-0.14 -cpu qemu64,hv_relaxed -enable-kvm -m 
2048 -smp 2,sockets=2,cores=1,threads=1 -uuid 
5f54333b-c250-aa72-c979-39d156814b85 -no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/iHost-SQL01.monitor,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-hpet 
-no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
file=/mnt/gluster1-norep/iHost/SQL01.qed,if=none,id=drive-virtio-disk0,format=qed,cache=writeback
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0
 -drive 
file=/mnt/gluster1-norep/iHost/SQL01-Data.qed,if=none,id=drive-virtio-disk2,format=qed,cache=writeback
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk2,id=virtio-disk2
 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw -device 
ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev 
tap,fd=29,id=hostnet0,vhost=on,vhostfd=39 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:2c:8d:23,bus=pci.0,addr=0x3 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-device usb-tablet,id=input0 -vnc 127.0.0.1:22 -vga cirrus -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4

Environment is -
Mainline 3.7.5 and 3.8.0
Qemu 1.2.2, 1.3.1 and 1.4.0
Scientific Linux 6.3
KSM enabled, transparent hugepages disabled.
Dual Xeon 5650
192GB

Thanks all

[Qemu-devel] [PATCH 1/2] Fix dma interrupt

2013-02-21 Thread Amadeusz Sławiński

In openbios (drivers/ide.c) they are set to

000d  0002 
000e  0003 
000f  0004 
(The last one seems to be not implemented in qemu)

It follows convention of how they are set on real machines,
both ide and dma ones are increased

Real machine one:
http://web.archive.org/web/20090107151044/http://penguinppc.org/historical/dev-trees-html/g4_agp_500_2.html
0013 0001 000b 
0014 0001 000c 
0015 0001 000d 

Signed-off-by: Amadeusz Sławiński 
---
 hw/ppc/mac_newworld.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
index 065ea87..a08a6b2 100644
--- a/hw/ppc/mac_newworld.c
+++ b/hw/ppc/mac_newworld.c
@@ -370,7 +370,7 @@ static void ppc_core99_init(QEMUMachineInitArgs *args)
 qdev_connect_gpio_out(dev, 1, pic[0x0d]); /* IDE */
 qdev_connect_gpio_out(dev, 2, pic[0x02]); /* IDE DMA */
 qdev_connect_gpio_out(dev, 3, pic[0x0e]); /* IDE */
-qdev_connect_gpio_out(dev, 4, pic[0x02]); /* IDE DMA */
+qdev_connect_gpio_out(dev, 4, pic[0x03]); /* IDE DMA */
 macio_init(macio, pic_mem, escc_bar);
 
 /* We only emulate 2 out of 3 IDE controllers for now */
-- 
1.8.1.4

[Qemu-devel] OvmfPkg reset [was: Distinguish between reset types]

2013-02-21 Thread Laszlo Ersek

> (1) The reset capability that OVMF exports via ACPI -- I agree that I
> should be effecting the 0xCF9 thing in the appropriate table.

On a second thought, this will require a new build -D flag, or a PCD.

I'm not worried about the ACPI 1.0 --> ACPI 2.0 change in the FADT, the
table struct itself forward compatible.

However currently we're not saying anything about the reset capabilities
of the platform. A client looking at the FADT for reset info will find
nothing and follow its own logic, which may or may not include writing
to 0xCF9, but we don't have any part in it.

If now the FADT starts to claim 0xCF9 on a qemu version that doesn't
actually support it, we could mislead the client. Unless we can
interrogate qemu about the support (and I think we can't), we'll have to
depend on a build-time option. (Or should I shoehorn it into -D CSM_ENABLE?)

Jordan, what do you think?

(Of course this *too* would go away if qemu itself prepared all ACPI
tables over fw_cfg for whichever firmware the user selects. Then we
wouldn't have to duplicate ACPI work between SeaBIOS and OVMF any
longer. This task has been on my todo list forever, but it never seems
to near the top.

Maybe moving to a hermit cabin for a month and sleeping even less would
make it happen. I have no clue how others do it.)

Thanks!
Laszlo

Re: [Qemu-devel] [PATCH 01/41] migration: simplify while loop

2013-02-21 Thread Juan Quintela

Paolo Bonzini  wrote:
> Unify the goto around the loop, with the exit condition at the end of it.
> Both can be expressed as "while (ret >= 0)".
>
> Signed-off-by: Paolo Bonzini 

Reviewed-by: Juan Quintela

[Qemu-devel] [PULL] check-qjson: More thorough testing of UTF-8 in strings

2013-02-21 Thread Luiz Capitulino

From: Markus Armbruster 

Test cases are scraped from Markus Kuhn's UTF-8 decoder capability and
stress test at
http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt

Unfortunately, both JSON parser and formatter misbehave right now.
This test expects current, incorrect results.  They're all clearly
marked, and are to be replaced by correct ones as the bugs get fixed.
See comments in new utf8_string() for details.

Signed-off-by: Markus Armbruster 
Signed-off-by: Luiz Capitulino 
---
 tests/check-qjson.c | 664 
 1 file changed, 664 insertions(+)

diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 32ffb43..ec85a0c 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -1,8 +1,10 @@
 /*
  * Copyright IBM, Corp. 2009
+ * Copyright (c) 2013 Red Hat Inc.
  *
  * Authors:
  *  Anthony Liguori   
+ *  Markus Armbruster ,
  *
  * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
  * See the COPYING.LIB file in the top-level directory.
@@ -131,6 +133,667 @@ static void single_quote_string(void)
 }
 }
 
+static void utf8_string(void)
+{
+/*
+ * FIXME Current behavior for invalid UTF-8 sequences is
+ * incorrect.  This test expects current, incorrect results.
+ * They're all marked "bug:" below, and are to be replaced by
+ * correct ones as the bugs get fixed.
+ *
+ * The JSON parser rejects some invalid sequences, but accepts
+ * others without correcting the problem.
+ *
+ * The JSON formatter replaces some invalid sequences by U+ (a
+ * noncharacter), and goes wonky for others.
+ *
+ * For both directions, we should either reject all invalid
+ * sequences, or minimize overlong sequences and replace all other
+ * invalid sequences by a suitable replacement character.  A
+ * common choice for replacement is U+FFFD.
+ *
+ * Problem: we can't easily deal with embedded U+.  Parsing
+ * the JSON string "this \\u" is fun" yields "this \0 is fun",
+ * which gets misinterpreted as NUL-terminated "this ".  We should
+ * consider using overlong encoding \xC0\x80 for U+ ("modified
+ * UTF-8").
+ *
+ * Test cases are scraped from Markus Kuhn's UTF-8 decoder
+ * capability and stress test at
+ * http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
+ */
+static const struct {
+const char *json_in;
+const char *utf8_out;
+const char *json_out;   /* defaults to @json_in */
+const char *utf8_in;/* defaults to @utf8_out */
+} test_cases[] = {
+/*
+ * Bug markers used here:
+ * - bug: not corrected
+ *   JSON parser fails to correct invalid sequence(s)
+ * - bug: rejected
+ *   JSON parser rejects invalid sequence(s)
+ *   We may choose to define this as feature
+ * - bug: want "\"...\""
+ *   JSON formatter produces incorrect result, this is the
+ *   correct one, assuming replacement character U+
+ * - bug: want "..." (no \")
+ *   JSON parser produces incorrect result, this is the
+ *   correct one, assuming replacement character U+
+ *   We may choose to reject instead of replace
+ * Not marked explicitly, but trivial to find:
+ * - JSON formatter replacing invalid sequence by \\u is a
+ *   bug if we want it to fail for invalid sequences.
+ */
+
+/* 1  Some correct UTF-8 text */
+{
+/* a bit of German */
+"\"Falsches \xC3\x9C" "ben von Xylophonmusik qu\xC3\xA4lt"
+" jeden gr\xC3\xB6\xC3\x9F" "eren Zwerg.\"",
+"Falsches \xC3\x9C" "ben von Xylophonmusik qu\xC3\xA4lt"
+" jeden gr\xC3\xB6\xC3\x9F" "eren Zwerg.",
+"\"Falsches \\u00DCben von Xylophonmusik qu\\u00E4lt"
+" jeden gr\\u00F6\\u00DFeren Zwerg.\"",
+},
+{
+/* a bit of Greek */
+"\"\xCE\xBA\xE1\xBD\xB9\xCF\x83\xCE\xBC\xCE\xB5\"",
+"\xCE\xBA\xE1\xBD\xB9\xCF\x83\xCE\xBC\xCE\xB5",
+"\"\\u03BA\\u1F79\\u03C3\\u03BC\\u03B5\"",
+},
+/* 2  Boundary condition test cases */
+/* 2.1  First possible sequence of a certain length */
+/* 2.1.1  1 byte U+ */
+{
+"\"\\u\"",
+"", /* bug: want overlong "\xC0\x80" */
+"\"\"", /* bug: want "\"\\u\"" */
+},
+/* 2.1.2  2 bytes U+0080 */
+{
+"\"\xC2\x80\"",
+"\xC2\x80",
+"\"\\u0080\"",
+},
+/* 2.1.3  3 bytes U+0800 */
+{
+"\"\xE0\xA0\x80\"",
+"\xE0\xA0\x80",
+"\"\\u0800\"",
+},
+/* 2.1.4  4 bytes U+1 */
+{
+"\"\xF0\x90\x80\x80\"",
+"\xF0\x90\x80\x80",
+"\"\\u0400\\uFF

[Qemu-devel] [PULL] QMP queue

2013-02-21 Thread Luiz Capitulino

The changes (since 70aa41b56ce3f34fceac44e828ba2d8cc19523ee) are available
in the following repository:

git://repo.or.cz/qemu/qmp-unstable.git queue/qmp

Markus Armbruster (1):
  check-qjson: More thorough testing of UTF-8 in strings

 tests/check-qjson.c | 664 
 1 file changed, 664 insertions(+)

-- 
1.8.1.2

Re: [Qemu-devel] [RFC PATCH 3/3] KVM: s390: Hook up ioeventfds.

2013-02-21 Thread Cornelia Huck

On Thu, 21 Feb 2013 18:34:59 +0200
"Michael S. Tsirkin"  wrote:

> On Thu, Feb 21, 2013 at 04:21:43PM +0100, Cornelia Huck wrote:
> > On Thu, 21 Feb 2013 16:39:05 +0200
> > "Michael S. Tsirkin"  wrote:
> > 
> > > On Thu, Feb 21, 2013 at 02:13:00PM +0100, Cornelia Huck wrote:
> > > > As s390 doesn't use memory writes for virtio notifcations, create
> > > > a special kind of ioeventfd instead that hooks up into diagnose
> > > > 0x500 (kvm hypercall) with function code 3 (virtio-ccw notification).
> > > > 
> > > > Signed-off-by: Cornelia Huck 
> > > 
> > > Do we really have to put virtio specific stuff into kvm?
> > > How about we add generic functionality to match GPRs
> > > on a hypercall and signal an eventfd?
> > 
> > Worth a try implementing that.
> > 
> > > 
> > > Also, it's a bit unfortunate that this doesn't use
> > > the io bus datastructure, long term the linked list handling
> > > might become a bottleneck, using shared code this could maybe
> > > benefit from performance optimizations there.
> > 
> > The linked list stuff was more like an initial implementation that
> > could be improved later.
> > 
> > > io bus data structure currently has the ability to match on
> > > two 64 bit fields (addr/datamatch) and a signed 32 bit one (length).
> > > Isn't this sufficient for your purposes?
> > > How about sticking subchannel id in address, vq in data match
> > > and using io bus?
> > 
> > I can give that a try. (I must admit that I didn't look at the iobus
> > stuff in detail.)
> > 
> > > 
> > > BTW maybe we could do this for the user interface too,
> > > while I'm not 100% sure it's the cleanest thing to do
> > > (or will work), it would certainly minimize the patchset.
> > 
> > You mean integrating with the generic interface and dropping the new
> > ARCH flag?
> 
> Not sure about the flag but we could use the general structure
> without an arch-specific format, if that's a good fit.
> 
So I have something that seems to do what I want. I'll see if I can
morph it into something presentable tomorrow.

diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index b58dd86..3c43e30 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -22,6 +22,7 @@ config KVM
select PREEMPT_NOTIFIERS
select ANON_INODES
select HAVE_KVM_CPU_RELAX_INTERCEPT
+   select HAVE_KVM_EVENTFD
---help---
  Support hosting paravirtualized guest machines using the SIE
  virtualization capability on the mainframe. This should work
diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
index 3975722..8fe9d65 100644
--- a/arch/s390/kvm/Makefile
+++ b/arch/s390/kvm/Makefile
@@ -6,7 +6,7 @@
 # it under the terms of the GNU General Public License (version 2 only)
 # as published by the Free Software Foundation.
 
-common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o)
+common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o eventfd.o)
 
 ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
 
diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
index a390687..7fc195e 100644
--- a/arch/s390/kvm/diag.c
+++ b/arch/s390/kvm/diag.c
@@ -104,6 +104,20 @@ static int __diag_ipl_functions(struct kvm_vcpu *vcpu)
return -EREMOTE;
 }
 
+static int __diag_virtio_hypercall(struct kvm_vcpu *vcpu)
+{
+   int ret, idx;
+   u64 vq = vcpu->run->s.regs.gprs[3];
+
+   idx = srcu_read_lock(&vcpu->kvm->srcu);
+   ret = kvm_io_bus_write(vcpu->kvm, KVM_CSS_BUS,
+   vcpu->run->s.regs.gprs[2],
+   vcpu->run->s.regs.gprs[1],
+   &vq);
+   srcu_read_unlock(&vcpu->kvm->srcu, idx);
+   return ret;
+}
+
 int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
 {
int code = (vcpu->arch.sie_block->ipb & 0xfff) >> 16;
@@ -118,6 +132,8 @@ int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
return __diag_time_slice_end_directed(vcpu);
case 0x308:
return __diag_ipl_functions(vcpu);
+   case 0x500:
+   return __diag_virtio_hypercall(vcpu);
default:
return -EOPNOTSUPP;
}
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index f822d36..04d2454 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -142,6 +142,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_ONE_REG:
case KVM_CAP_ENABLE_CAP:
case KVM_CAP_S390_CSS_SUPPORT:
+   case KVM_CAP_IOEVENTFD:
r = 1;
break;
case KVM_CAP_NR_VCPUS:
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3b768ef..59be516 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -148,6 +148,7 @@ struct kvm_io_bus {
 enum kvm_bus {
KVM_MMIO_BUS,
KVM_PIO_BUS,
+   KVM_CSS_BUS,
KVM_NR_BUSES
 };
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 9a2db57..1df0766 100644
--- a/include/uapi/linux/kvm.h

[Qemu-devel] [PATCH] hw/usb/redirect.c: add missing parameter to usb_wakeup()

2013-02-21 Thread Eduardo Habkost

commit 8550a02d1239415342959f6a32d178bc05c557cc broke the build because
the usb_wakeup() call wasn't changed to include the new 'stream'
parameter.

This changes the call to pass 0 as the 'stream' parameter.

Signed-off-by: Eduardo Habkost 
---
 hw/usb/redirect.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index 7078403..c519b9b 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -1897,7 +1897,7 @@ static void usbredir_interrupt_packet(void *priv, 
uint64_t id,
 }
 
 if (QTAILQ_EMPTY(&dev->endpoint[EP2I(ep)].bufpq)) {
-usb_wakeup(usb_ep_get(&dev->dev, USB_TOKEN_IN, ep & 0x0f));
+usb_wakeup(usb_ep_get(&dev->dev, USB_TOKEN_IN, ep & 0x0f), 0);
 }
 
 /* bufp_alloc also adds the packet to the ep queue */
-- 
1.8.1

Re: [Qemu-devel] [PATCH 12/14] usb-core: usb3 streams

2013-02-21 Thread Eduardo Habkost

On Thu, Feb 21, 2013 at 10:59:11AM +0100, Gerd Hoffmann wrote:
> This patch adds support for usb3 streams to the usb subsystem core.
> This is just adding a streams field / parameter in a number of places.
> 
> Signed-off-by: Gerd Hoffmann 

This broke the build on master:

 CChw/usb/redirect.o
hw/usb/redirect.c: In function ‘usbredir_interrupt_packet’:
hw/usb/redirect.c:1900:13: error: too few arguments to function ‘usb_wakeup’
In file included from hw/usb/redirect.c:41:0:
./hw/usb.h:435:6: note: declared here
make: *** [hw/usb/redirect.o] Error 1

-- 
Eduardo

Re: [Qemu-devel] [edk2] [RFC PATCH] Distinguish between reset types

2013-02-21 Thread Anthony Liguori

David Woodhouse  writes:

> On Thu, 2013-02-21 at 18:10 +0100, Laszlo Ersek wrote:
>> Since
>> - I like kvm, and
>> - I cannot easily change my hardware (which doesn't support UG), and
>> - I prefer to run the RHEL-6 kernel which has "old" KVM,
>> I depend on David's fix for the PAM registers.
>
> Which means your suspend/resume is broken. Yay! :)
>
> Anthony's patchset should hopefully fix that.

Will post a new version later today.  Am testing now with Laszlo's mock
up from a previous note.

I'm also going to write a qtest test case to try and validate all of
this behavior (minus the KVM bits).

> I wonder what it would take to work around the KVM bug in qemu though;
> can we actually *emulate* 16-bit mode instead of using KVM, on the
> kernels where KVM is broken? And then transition back to using KVM when
> we switch to 32-bit or 64-bit mode?

It's in the realm of possibility but ugly enough that we never did it.

I even worked on this way back in the day with Xen.  It sucked pretty
bad.  The challenge in support SMP eventually scared me off of it.

http://new-wiki.xen.org/old-wiki/xenwiki/HVM/V2E.html

Regards,

Anthony Liguori

>
> -- 
> dwmw2

Re: [Qemu-devel] [PATCH v4 00/10] main-loop: switch to g_poll(3) on POSIX hosts

2013-02-21 Thread Paolo Bonzini

Il 21/02/2013 18:25, Anthony Liguori ha scritto:
>> >
>> > What would be more problematic is the chardev flow control patches,
>> > which use the glib main loop directly.  I don't recall your KVM forum
>> > presentation---did you need RT prioritization of the serial port too?
> It uses GSources which don't need a full glib main loop.  We just need
> to be able to support glib event dispatch from whatever our main loop
> is.

The lock in g_main_context_{prepare,query,check,dispatch} is not
RT-friendly.  We could have contention on that lock between the iothread
and the VCPU thread.

Paolo

> I would assume that we'd treat any glib event with the same priority if
> we ever had a RT-iothread.

[Qemu-devel] [PATCH 2/2] xnu kernel expects FLUSH to be cleared on STOP

2013-02-21 Thread Amadeusz Sławiński

otherwise it gets stuck in a loop
so clear it when unsetting run when flush is set

void
IODBDMAStop( volatile IODBDMAChannelRegisters *registers)
{

IOSetDBDMAChannelControl( registers,
IOClearDBDMAChannelControlBits( kdbdmaRun )
| IOSetDBDMAChannelControlBits(  kdbdmaFlush ));

DBDMA: writel 0x0b00 <= 0xa0002000
DBDMA: channel 0x16 reg 0x0
DBDMA: status 0x2000

while( IOGetDBDMAChannelStatus( registers) & (
kdbdmaActive | kdbdmaFlush))
eieio();

DBDMA: readl 0x0b04 => 0x2000
DBDMA: channel 0x16 reg 0x1
DBDMA: readl 0x0b04 => 0x2000
DBDMA: channel 0x16 reg 0x1
DBDMA: readl 0x0b04 => 0x2000
DBDMA: channel 0x16 reg 0x1
DBDMA: readl 0x0b04 => 0x2000
DBDMA: channel 0x16 reg 0x1
it continues to get printed

}

Signed-off-by: Amadeusz Sławiński 
---
 hw/mac_dbdma.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/mac_dbdma.c b/hw/mac_dbdma.c
index b894ab2..34c1757 100644
--- a/hw/mac_dbdma.c
+++ b/hw/mac_dbdma.c
@@ -688,6 +688,10 @@ dbdma_control_write(DBDMA_channel *ch)
 if ((ch->regs[DBDMA_STATUS] & RUN) && !(status & RUN)) {
 /* RUN is cleared */
 status &= ~(ACTIVE|DEAD);
+if ((status & FLUSH) && ch->flush) {
+ch->flush(&ch->io);
+status &= ~FLUSH;
+   }
 }
 
 DBDMA_DPRINTF("status 0x%08x\n", status);
-- 
1.8.1.4

[Qemu-devel] [Bug 1130533] Re: Documentation cannot be build since commit c70a01e449536c616c85ab820c6fbad7d7e9cf39

2013-02-21 Thread FredBezies

Tried it. You can add it, it works !

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1130533

Title:
  Documentation cannot be build since commit
  c70a01e449536c616c85ab820c6fbad7d7e9cf39

Status in QEMU:
  New

Bug description:
  I tried to build git -based qemu and when documentation is processed I
  got this error :

  ./qemu-options.texi:1526: unknown command `list'
  ./qemu-options.texi:1526: table requires an argument: the formatter for @item
  ./qemu-options.texi:1526: warning: @table has text but no @item

  Looks like commit c70a01e449536c616c85ab820c6fbad7d7e9cf39 is guilty
  ?!

  Or any modification related to documentation, I think.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1130533/+subscriptions

[Qemu-devel] [Bug 1130533] Re: Documentation cannot be build since commit c70a01e449536c616c85ab820c6fbad7d7e9cf39

2013-02-21 Thread Stefan Hajnoczi

Please try this patch:
http://patchwork.ozlabs.org/patch/12/

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1130533

Title:
  Documentation cannot be build since commit
  c70a01e449536c616c85ab820c6fbad7d7e9cf39

Status in QEMU:
  New

Bug description:
  I tried to build git -based qemu and when documentation is processed I
  got this error :

  ./qemu-options.texi:1526: unknown command `list'
  ./qemu-options.texi:1526: table requires an argument: the formatter for @item
  ./qemu-options.texi:1526: warning: @table has text but no @item

  Looks like commit c70a01e449536c616c85ab820c6fbad7d7e9cf39 is guilty
  ?!

  Or any modification related to documentation, I think.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1130533/+subscriptions

Re: [Qemu-devel] [PATCH 11/41] migration: simplify error handling

2013-02-21 Thread Paolo Bonzini

Il 21/02/2013 18:34, Juan Quintela ha scritto:
> This move buffered_flush() to inside the iothread lock.  At least the
> commit message needs to be changed.

No, it doesn't... Here is the full body of the migration thread:

qemu_mutex_lock_iothread();
qemu_savevm_state_begin(s->file, &s->params);

while (s->state == MIG_STATE_ACTIVE) {
int64_t current_time = qemu_get_clock_ms(rt_clock);
uint64_t pending_size;

if (s->bytes_xfer < s->xfer_limit) {
/* call qemu_savevm_state_* */
}

qemu_mutex_unlock_iothread();
if (current_time >= initial_time + BUFFER_DELAY) {
/* yadda yadda */
s->bytes_xfer = 0;
initial_time = current_time;
}
if (!last_round && (s->bytes_xfer >= s->xfer_limit)) {
/* usleep expects microseconds */
g_usleep((initial_time + BUFFER_DELAY - current_time)*1000);
}
buffered_flush(s);
qemu_mutex_lock_iothread();

if (qemu_file_get_error(s->file)) {
migrate_fd_error(s);
}
}

qemu_mutex_unlock_iothread();

> Looking at the rest of the series before thinking if that is the right
> approach.

The series is fully bisectable.  There should be no thread-unsafe patch,
nor any state where blocking calls are done with iothread lock taken
(except at the end of migration where qemu_savevm_state_complete runs
with iothread lock taken; but this happens much later than this patch).

Paolo

Re: [Qemu-devel] [PATCH v4 4/6] introduce new vma archive format

2013-02-21 Thread Eric Blake

On 02/21/2013 08:32 AM, Dietmar Maurer wrote:
 Hint - look at how the qcow2 file format is specified.  You need a
 lot more details - enough that someone could independently implement
 a program to read and create vma images that would be compatible
 with what your implementation produces.
>>>
>>> The format is really simple, you have the definitions in the header
>>> file, and a reference implementation. I am quite sure any experienced
>>> developer can write an implementation in a few hours.
>>>
>>> We do not talk about an extremely complex format like 'qcow2' here.
>>
>> This is not an excuse for lacking details in the spec. Quite the opposite. If
>> someone has to look at your code in order to implement the format, you have
>> failed as a spec writer.
> 
> I do not agree here. Clean source code is as good as a spec.

Not in qemu.  There's a reason that we ask for clean specs, independent
of source code.  That is the only way that we can later change the
source code to do something more efficient or to define an extension,
and still have clean documentation of what the extensions are, vs. how
an older version will behave.  The initial implementation source code
might be easy to read, but that condition is not guaranteed to last.

If reading the source code to determine the header format is as easy as
you say, then you should have no problem writing the spec as detailed as
we have asked.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 13/41] migration: prepare to access s->state outside critical sections

2013-02-21 Thread Paolo Bonzini

Il 21/02/2013 18:42, Juan Quintela ha scritto:
> Paolo Bonzini  wrote:
>> Accessing s->state outside the big QEMU lock will simplify a bit the
>> locking/unlocking of the iothread lock.
>>
>> Signed-off-by: Paolo Bonzini 
> 
> Reviewed-by: Juan Quintela 
> 
> We compensate the locking removal you did on the previous patch.

I thought this was thread-safe at any step (and it took a while to reach
that state :)), did I miss something?

>> +__sync_val_compare_and_swap(&s->state, MIG_STATE_ACTIVE, 
>> MIG_STATE_ERROR);
> 
> This can be done later, but can we change this to a macro/inline
> function:
> 
> inline void migration_set_state(int state)
> {
> __sync_val_compare_and_swap(&s->state, MIG_STATE_ACTIVE, state);
> }
> 
> or something like that?

Yes, especially so that we can add a tracepoint as in the
recently-posted series.

Paolo

Re: [Qemu-devel] [PATCH 4/4] piix_pci: Implement reset for i440FX PAM configuration

2013-02-21 Thread Laszlo Ersek

On 02/21/13 12:02, David Woodhouse wrote:
> On Thu, 2013-02-21 at 09:17 +, David Woodhouse wrote:
>> Thanks. I suppose I'd better test that I haven't broken suspend/resume
>> first. I don't care about OS/2 or VDISK, but it's vaguely possible that
>> suspend/resume might be another "reset" user which actually *would* be
>> offended if the PAM configuration got reset, causing the BIOS to do a
>> complete reinit.
>>
>> If that breaks, this will have to wait for the 'soft-reset' bits that
>> Anthony is looking at.
> 
> Ah crap, it breaks. Suspend/resume now does a full PAM reset, so SeaBIOS
> doesn't see that it's a resume and goes through a full hardware init and
> reboot.

:(

But at least now I understand why the PAM regs were not reset before,
and why SeaBIOS has to jump through the qemu_prep_reset() hoop... I'm
afraid people had this entire discussion before :(

I'm sorry I couldn't think of this earlier... May be related to the fact
that I could never really appreciate the "suspend in a VM" thing.

Laszlo

Re: [Qemu-devel] [PATCH 13/41] migration: prepare to access s->state outside critical sections

2013-02-21 Thread Juan Quintela

Paolo Bonzini  wrote:
> Accessing s->state outside the big QEMU lock will simplify a bit the
> locking/unlocking of the iothread lock.
>
> Signed-off-by: Paolo Bonzini 

Reviewed-by: Juan Quintela 

We compensate the locking removal you did on the previous patch.

> +__sync_val_compare_and_swap(&s->state, MIG_STATE_ACTIVE, 
> MIG_STATE_ERROR);

This can be done later, but can we change this to a macro/inline
function:

inline void migration_set_state(int state)
{
__sync_val_compare_and_swap(&s->state, MIG_STATE_ACTIVE, state);
}

or something like that?

Re: [Qemu-devel] [PATCH v4 3/6] add backup related monitor commands

2013-02-21 Thread Luiz Capitulino

On Thu, 21 Feb 2013 14:47:08 +0100
Stefan Hajnoczi  wrote:

> On Thu, Feb 21, 2013 at 06:21:32AM +, Dietmar Maurer wrote:
> > > > +##
> > > > +# @backup:
> > > > +#
> > > > +# Starts a VM backup.
> > > > +#
> > > > +# @backup-file: the backup file name
> > > > +#
> > > > +# @format: format of the backup file
> > > > +#
> > > > +# @config-filename: #optional name of a configuration file to include
> > > > +into # the backup archive.
> > > > +#
> > > > +# @speed: #optional the maximum speed, in bytes per second # #
> > > > +@devlist: #optional list of block device names (separated by ',', ';'
> > > > +# or ':'). By default the backup includes all writable block devices.
> > > > +#
> > > > +# Returns: the uuid of the backup job # # Since: 1.5.0 ## {
> > > > +'command': 'backup', 'data': { 'backup-file': 'str',
> > > > + '*format': 'BackupFormat',
> > > > + '*config-file': 'str',
> > > > + '*devlist': 'str', '*speed': 'int'
> > > > +},
> > > 
> > > devlist should be ['String'].
> > 
> > I want to be able to use that command on the human monitor.
> > That is no longer possible if I use ['String']?

Unless I'm missing something, it should work just fine. The HMP command
can have an argument which is a device list, then it will have to parse
that list and transform it into the list expected by QAPI functions, which
is what is going to be passed to the qmp_ function.

Re: [Qemu-devel] [PATCH 11/41] migration: simplify error handling

2013-02-21 Thread Juan Quintela

Paolo Bonzini  wrote:
> Always use qemu_file_get_error to detect errors, since that is how
> QEMUFile itself drops I/O after an error occurs.  There is no need
> to propagate and check return values all the time.
>
> Also remove the "complete" member, since we know that it is set (via
> migrate_fd_cleanup) only when the state changes.
>
> Signed-off-by: Paolo Bonzini 

This move buffered_flush() to inside the iothread lock.  At least the
commit message needs to be changed.

Looking at the rest of the series before thinking if that is the right
approach.

Later, Juan.

Re: [Qemu-devel] [RFC for-1.4] Revert "block: fix block tray status"

2013-02-21 Thread Luiz Capitulino

On Thu, 7 Feb 2013 11:31:08 -0200
Luiz Capitulino  wrote:

> On Thu, 07 Feb 2013 14:28:33 +0100
> Kevin Wolf  wrote:
> 
> > Am 07.02.2013 14:15, schrieb Luiz Capitulino:
> > > On Thu, 07 Feb 2013 14:12:10 +0100
> > > Kevin Wolf  wrote:
> > >> I think the right solution is to move the bdrv_dev_change_media_cb()
> > >> call to those callers of bdrv_close() that actually need it. I haven't
> > >> checked it yet in detail, but at the first sight it looks like
> > >> eject_device() and do_drive_del() are the only ones that need it.
> > >>
> > >> Such a change I'd rather move to 1.5, though. Do you think it's
> > >> important to get rid of these (admittedly odd) artefacts for 1.4?
> > > 
> > > I don't have any reports of this causing issues to QMP clients, so
> > > we can defer to 1.5. Worst case we merge this fix for a stable release.
> > 
> > Okay, then that's the plan. Are you going to send a patch for 1.5 or
> > should I make sure to have it somewhere on a todo list?
> 
> I can do it, but not this week.

It turns out I couldn't work on it this week, and I'll be off for some
time starting Monday next week.

Re: [Qemu-devel] [PATCH v4 00/10] main-loop: switch to g_poll(3) on POSIX hosts

2013-02-21 Thread Anthony Liguori

Paolo Bonzini  writes:

> Il 21/02/2013 17:29, Jan Kiszka ha scritto:
>> In this context, I'd like to recall a detail: Real-time prioritization
>> of those I/O threads will most probably require locking with
>> prio-inversion avoidance (*). In that case external libs without a
>> chance to tune or replace their internal locking (like glib) will be a
>> no-go for those kind of I/O threads.
>> 
>> The glib mainloop might be fine for all the rest, but we will likely
>> also need event loops with RT-compatible locking. So this refactoring
>> should keep the door open for alternative implementations.
>
> Yes, this refactoring is fine.  It doesn't use the glib mainloop any
> more than before.  AioContext is fine as an RT-compatible event loop.
> You can use both as a GSource or manually from a separate thread.
>
> What would be more problematic is the chardev flow control patches,
> which use the glib main loop directly.  I don't recall your KVM forum
> presentation---did you need RT prioritization of the serial port too?

It uses GSources which don't need a full glib main loop.  We just need
to be able to support glib event dispatch from whatever our main loop
is.

I would assume that we'd treat any glib event with the same priority if
we ever had a RT-iothread.

Regards,

Anthony Liguori

>
> Paolo
>
>> Jan
>> 
>> (*) Scenario: RT-iothread with highest prio wants to take a lock that is
>> held briefly by low-prio maintenance thread which is interrupted by
>> long-running medium-prio VCPU thread. This can delay or prevent the
>> event injection. Of course, this depends on the existence of such
>> dependencies.

Re: [Qemu-devel] [PATCH 05/41] block-migration: remove useless calls to blk_mig_cleanup

2013-02-21 Thread Juan Quintela

Paolo Bonzini  wrote:
> Now that the cancel callback is called consistently for all errors,
> we can avoid doing its work in the other callbacks.
>
> Signed-off-by: Paolo Bonzini 

Reviewed-by: Juan Quintela

1 2 3 4 >

1 - 100 of 307 matches

Mail list logo