date:20091103


On 11/04/2009 05:45 AM, Andrew Olney wrote:

> OK, if I convert the raw image to qcow2, it boots fine.
>
> Why should a raw image give a blue screen?


Sounds like a serious bug.  What is the host filesystem?  Did you 
upgrade the host kernel or qemu?


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Autotest] [PATCH] [RFC] KVM test: Major control file cleanup

2009-11-03 Thread Yolkfull Chow

On Wed, Oct 28, 2009 at 02:04:59PM -0400, Michael Goldish wrote:
> 
> - "Lucas Meneghel Rodrigues"  wrote:
> 
> > On Wed, Oct 28, 2009 at 1:43 PM, Michael Goldish 
> > wrote:
> > > Sounds great, except it won't allow you to debug your configuration
> > > using kvm_config.py.  So the question now is what's more important:
> > > the ability to debug or ease of use when running from the server.
> > 
> > Here we have 2 use cases:
> > 
> > 1) Users of the web interface, that (hopefully) have canned test sets
> > that work reliably. Ability to debug stuff is less important on this
> > scenario.
> > 2) People developing tests, and in this case ability to debug config
> > is very important
> > 
> > I see the following options:
> > 
> > 1) Document as part of the "test development guide" that, in order to
> > be able to debug stuff, all the test sets are to be written to
> > the config file and can then be parsed using kvm_config.
> > 2) If we write all dictionaries generated by that particular
> > configuration to files inside the job results directory, we still
> > have debug ability for all use cases (I am starting to like this idea
> > very much, as I type).
> > 
> > So I'd then implement option 2) and refactor the control file with
> > the test sets defined inside strings in the control file, then you can
> > see how it looks? How about that?
> 
> Sounds fine.
> - Where exactly will the test list appear?
> - We should also allow printing of verbose debug output ("parsing variants
> block, 9000 dicts in current context...") by passing something to the
> constructor of the config object.
> - We should make it clear to the user that he/she must rename the control
> file (to control.lucas for example) or else it may be overwritten on the
> next git-fetch or -pull.
> 
> I'm still not sure it's a great idea to make config debugging harder, so
> if anyone other than Lucas who uses the KVM test is reading this, please
> let us know if you ever use kvm_config.py and if you think the ability to
> print the list of test dicts is important.

Hi Michael,

I have often used kvm_config.py to print lists of selected test dicts,
and I think it's necessary to keep this feature. IMHO, option 2) that Lucas
proposed is a good idea. What do you think? I hope I haven't missed
something. :)
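
For illustration only, option 2) could look roughly like the sketch below.
It assumes the config parser hands back a list of plain Python dicts with a
"name" key; dump_test_dicts() and the file name test_dicts.txt are
hypothetical and not part of Autotest.

```python
import os
import tempfile


def dump_test_dicts(dicts, results_dir, filename="test_dicts.txt"):
    """Write every test dict to a file in results_dir; return the file path."""
    os.makedirs(results_dir, exist_ok=True)
    path = os.path.join(results_dir, filename)
    with open(path, "w") as f:
        for index, d in enumerate(dicts):
            # One header line per dict, then its keys in a stable order.
            f.write("dict %d: %s\n" % (index, d.get("name", "<unnamed>")))
            for key in sorted(d):
                f.write("    %s = %s\n" % (key, d[key]))
    return path


if __name__ == "__main__":
    # Hypothetical dict contents, just to show the output layout.
    demo = [{"name": "guest.boot", "type": "boot"}]
    with open(dump_test_dicts(demo, tempfile.mkdtemp())) as f:
        print(f.read())
```

With something like this the list stays inspectable after a server-side run,
which is the part of kvm_config.py I would miss.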

> ___
> Autotest mailing list
> autot...@test.kernel.org
> http://test.kernel.org/cgi-bin/mailman/listinfo/autotest

Re: vga_arb warning [was: mmotm 2009-11-01-10-01 uploaded]

2009-11-03 Thread Andrew Morton


Please cc me on mmotm bug reports!

On Sun, 01 Nov 2009 22:47:05 +0100 Jiri Slaby  wrote:

> On 11/01/2009 07:07 PM, a...@linux-foundation.org wrote:
> > The mm-of-the-moment snapshot 2009-11-01-10-01 has been uploaded to
> 
> Hi, I got the following warning while booting an image in qemu-kvm:
> 
> WARNING: at fs/attr.c:158 notify_change+0x2da/0x310()
> Hardware name:
> Modules linked in:
> Pid: 1, comm: swapper Not tainted 2.6.32-rc5-mm1_64 #862
> Call Trace:
>  [] warn_slowpath_common+0x78/0xb0
>  [] warn_slowpath_null+0xf/0x20
>  [] notify_change+0x2da/0x310
>  [] ? fsnotify_create+0x48/0x60
>  [] ? vfs_mknod+0xbb/0xe0
>  [] devtmpfs_create_node+0x1e6/0x270
>  [] ? sysfs_addrm_finish+0x20/0x280
>  [] ? __sysfs_add_one+0x26/0xf0
>  [] ? sysfs_do_create_link+0xcc/0x160
>  [] device_add+0x1e0/0x5b0
>  [] ? pm_runtime_init+0xa1/0xb0
>  [] ? device_pm_init+0x65/0x70
>  [] device_register+0x19/0x20
>  [] device_create_vargs+0xf0/0x120
>  [] device_create+0x2c/0x30
>  [] ? __register_chrdev+0x86/0xf0
>  [] ? __class_create+0x69/0xa0
>  [] ? mutex_lock+0x19/0x50
>  [] misc_register+0x93/0x170
>  [] ? vga_arb_device_init+0x0/0x77
>  [] vga_arb_device_init+0x13/0x77
>  [] ? vga_arb_device_init+0x0/0x77
>  [] do_one_initcall+0x37/0x190
>  [] kernel_init+0x172/0x1c8
>  [] child_rip+0xa/0x20
>  [] ? kernel_init+0x0/0x1c8
>  [] ? child_rip+0x0/0x20

There's a -mm-only debug patch:

http://userweb.kernel.org/~akpm/mmotm/broken-out/notify_change-callers-must-hold-i_mutex.patch

--- a/fs/attr.c~notify_change-callers-must-hold-i_mutex
+++ a/fs/attr.c
@@ -155,6 +155,8 @@ int notify_change(struct dentry * dentry
 			return -EPERM;
 	}
 
+	WARN_ON_ONCE(!mutex_is_locked(&inode->i_mutex));
+
 	now = current_fs_time(inode->i_sb);
 
 	attr->ia_ctime = now;
_

I forget why it was added.


It looks like it's blaming the open-coded notify_change() call in
devtmpfs_create_node().
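
As a user-space analogy only (Python, not kernel code), the check the debug
patch adds behaves roughly like the sketch below. The names mirror the kernel
ones purely for readability. Like mutex_is_locked(), Lock.locked() only says
that somebody holds the lock, not that the caller does.

```python
import threading
import warnings

i_mutex = threading.Lock()   # stands in for inode->i_mutex
_warned_once = False         # gives the WARN_ON_ONCE "only once" behaviour


def notify_change(attrs, changes):
    """Apply attribute changes; complain once if nobody holds i_mutex."""
    global _warned_once
    if not i_mutex.locked() and not _warned_once:
        _warned_once = True
        warnings.warn("notify_change() called without i_mutex held")
    attrs.update(changes)
    return attrs
```

A caller that wraps the call in "with i_mutex:" stays quiet; an open-coded
call without the lock triggers the warning the first time only.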


Re: xp guest, blue screen c0000221 on boot

2009-11-03 Thread Andrew Olney


OK, if I convert the raw image to qcow2, it boots fine.

Why should a raw image give a blue screen?

Avi Kivity wrote:

> On 10/28/2009 05:27 PM, Andrew Olney wrote:
>> Thanks. In pursuing this suggestion I discovered that I also can't
>> make new XP VMs. Setup fails with "the disk may be damaged".
>>
>> The image was created with
>>
>> qemu-img create xp_new.img 13G
>>
>> And the setup command is
>>
>> kvm -cdrom xp/xp_pro.iso -hda xp_new.img -boot d -m 512 -no-acpi -usb
>> -usbdevice tablet
>
> What happens if you use a new qcow2 disk?  Does it fail in the same way?



Re: xp guest, blue screen c0000221 on boot

2009-11-03 Thread Andrew Olney


Thanks for the suggestion. I've successfully installed on a new qcow2 image.

Strangely all of my xp raw images no longer work. That makes me think 
that this is more than just a corrupt image.


I've been using kvm for several years with raw images, so I don't think 
the BIOS could be a factor either.


If there's a way to get my raw images to work, that would be nice. I'm 
trying to convert one of the raw images to qcow2 -- perhaps that will work.
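
For reference, the conversion can be scripted along these lines. This is a
small sketch that only builds and prints the command; the image file names
are placeholders, and -f / -O are the qemu-img convert options for the input
and output formats.

```python
import shlex


def qemu_img_convert_cmd(src, dst, src_fmt="raw", dst_fmt="qcow2"):
    """Build the argv for converting a disk image with qemu-img convert."""
    return ["qemu-img", "convert", "-f", src_fmt, "-O", dst_fmt, src, dst]


# Placeholder file names; pass the list to subprocess.run() to execute it.
cmd = qemu_img_convert_cmd("xp_old.img", "xp_old.qcow2")
print(" ".join(shlex.quote(arg) for arg in cmd))
```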


Avi Kivity wrote:

> On 10/28/2009 05:27 PM, Andrew Olney wrote:
>> Thanks. In pursuing this suggestion I discovered that I also can't
>> make new XP VMs. Setup fails with "the disk may be damaged".
>>
>> The image was created with
>>
>> qemu-img create xp_new.img 13G
>>
>> And the setup command is
>>
>> kvm -cdrom xp/xp_pro.iso -hda xp_new.img -boot d -m 512 -no-acpi -usb
>> -usbdevice tablet
>
> What happens if you use a new qcow2 disk?  Does it fail in the same way?



Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Paul E. McKenney

On Tue, Nov 03, 2009 at 01:14:06PM -0500, Gregory Haskins wrote:
> Gregory Haskins wrote:
> > Eric Dumazet wrote:
> >> Michael S. Tsirkin a écrit :
> >>> +static void handle_tx(struct vhost_net *net)
> >>> +{
> >>> + struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
> >>> + unsigned head, out, in, s;
> >>> + struct msghdr msg = {
> >>> + .msg_name = NULL,
> >>> + .msg_namelen = 0,
> >>> + .msg_control = NULL,
> >>> + .msg_controllen = 0,
> >>> + .msg_iov = vq->iov,
> >>> + .msg_flags = MSG_DONTWAIT,
> >>> + };
> >>> + size_t len, total_len = 0;
> >>> + int err, wmem;
> >>> + size_t hdr_size;
> >>> + struct socket *sock = rcu_dereference(vq->private_data);
> >>> + if (!sock)
> >>> + return;
> >>> +
> >>> + wmem = atomic_read(&sock->sk->sk_wmem_alloc);
> >>> + if (wmem >= sock->sk->sk_sndbuf)
> >>> + return;
> >>> +
> >>> + use_mm(net->dev.mm);
> >>> + mutex_lock(&vq->mutex);
> >>> + vhost_no_notify(vq);
> >>> +
> >> using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
> >> suspect
> >> that your use of RCU is not correct.
> >>
> >> 1) rcu_dereference() should be done inside an rcu_read_lock() section, and
> >>    we are not allowed to sleep in such a section.
> >>    (Quoting Documentation/RCU/whatisRCU.txt:
> >>     It is illegal to block while in an RCU read-side critical section.)
> >>
> >> 2) mutex_lock() can sleep (ie block)
> >>
> > 
> > 
> > Michael,
> >   I warned you that this needed better documentation ;)
> > 
> > Eric,
> >   I think I flagged this once before, but Michael convinced me that it
> > was indeed "ok", if but perhaps a bit unconventional.  I will try to
> > find the thread.
> > 
> > Kind Regards,
> > -Greg
> > 
> 
> Here it is:
> 
> http://lkml.org/lkml/2009/8/12/173

What was happening in that case was that the rcu_dereference()
was being used in a workqueue item.  The role of rcu_read_lock()
was taken on by the start of execution of the workqueue item, of
rcu_read_unlock() by the end of execution of the workqueue item, and
of synchronize_rcu() by flush_workqueue().  This does work, at least
assuming that flush_workqueue() operates as advertised, which it appears
to at first glance.
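
A user-space analogy of that workqueue scheme, in Python rather than kernel
C, might look like the sketch below. The queue, the Resource class and the
retire() helper are all made up for illustration: each work item is the
read-side section, and waiting for the queue to drain plays the part of the
grace period.

```python
import queue
import threading


class Resource:
    """Stands in for the object published through the shared pointer."""
    def __init__(self):
        self.closed = False


shared = {"ptr": Resource()}   # plays the role of vq->private_data
work = queue.Queue()           # plays the role of the workqueue
seen_closed = []               # records the state each item observed


def worker():
    while True:
        item = work.get()
        if item is None:       # sentinel: stop the worker
            work.task_done()
            return
        res = shared["ptr"]    # one read per item, like rcu_dereference()
        if res is not None:
            seen_closed.append(res.closed)
        work.task_done()       # end of item = end of the read-side section


def retire():
    """Unpublish the resource, wait for queued items, then release it."""
    old = shared["ptr"]
    shared["ptr"] = None       # like rcu_assign_pointer(ptr, NULL)
    work.join()                # like flush_workqueue(): the grace period
    old.closed = True          # safe now: no item can still hold `old`


t = threading.Thread(target=worker)
t.start()
for _ in range(100):
    work.put("tx")
retire()
work.put(None)
t.join()
```

The scheme only holds as long as every reader runs as a work item on that
queue, which is exactly the assumption in question for handle_tx().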

The above code looks somewhat different, however -- I don't see
handle_tx() being executed in the context of a work queue.  Instead
it appears to be in an interrupt handler.

So what is the story?  Using synchronize_irq() or some such?

Thanx, Paul

Re: 2.6.31.4 panic: CRED: put_cred_rcu() sees ffff880204e58c00 with usage 82150912

2009-11-03 Thread Nikola Ciprich

Hello Marcelo,
well, my report might have been a bit misleading: 2.6.31.2 has been running
there for almost 3 weeks.
So the problem didn't occur JUST after the upgrade. But it never occurred
before it, while we were using older kernels (various 2.6.30.x, 2.6.29.x
and older).
I've updated the machine to 2.6.31.5, and it's been running without problems
since then, but it's been just a few days, so we'll see.
I'll report if the problem appears again.
Thanks.
regards
nik

On Tue, Nov 03, 2009 at 12:46:38PM -0200, Marcelo Tosatti wrote:
> On Fri, Oct 30, 2009 at 12:15:34PM +0100, Nikola Ciprich wrote:
> > Ouch, typo in subject, it's 2.6.31.1 of course. sorry about that.
> > also CCing kvm.
> > n.
> > 
> > On Fri, Oct 30, 2009 at 12:06:32PM +0100, Nikola Ciprich wrote:
> > > Hi,
> > > some time ago, I updated my KVM hosting machine to 2.6.31.1 and it just 
> > > died horribly:
> 
> Nikola,
> 
> Upgraded from what? Did you experience the crash again?
> 
> > > Oct 30 10:45:17 vbox [706369.133516] Kernel panic - not syncing: CRED: 
> > > put_cred_rcu() sees 880204e58c00 with usage 82150912
> > > Oct 30 10:45:17 vbox [706369.133519]
> > > Oct 30 10:45:17 vbox [706369.144990] Pid: 19, comm: ksoftirqd/5 Not 
> > > tainted 2.6.31lb.02 #1
> > > Oct 30 10:45:17 vbox [706369.151554] Call Trace:
> > > Oct 30 10:45:17 vbox [706369.154332]  
> > > Oct 30 10:45:17 vbox [] panic+0xaa/0x180
> > > Oct 30 10:45:17 vbox [706369.160280]  [] ? 
> > > _spin_unlock+0x30/0x60
> > > Oct 30 10:45:17 vbox [706369.166256]  [] ? 
> > > add_partial+0x21/0x90
> > > Oct 30 10:45:17 vbox [706369.172155]  [] ? 
> > > __slab_free+0x92/0x3c0
> > > Oct 30 10:45:17 vbox [706369.178127]  [] ? 
> > > file_free_rcu+0x37/0x50
> > > Oct 30 10:45:17 vbox [706369.184198]  [] 
> > > put_cred_rcu+0x75/0x80
> > > Oct 30 10:45:17 vbox [706369.190008]  [] 
> > > __rcu_process_callbacks+0x125/0x250
> > > Oct 30 10:45:17 vbox [706369.197020]  [] 
> > > rcu_process_callbacks+0x39/0x60
> > > Oct 30 10:45:17 vbox [706369.203624]  [] 
> > > __do_softirq+0xc1/0x250
> > > Oct 30 10:45:17 vbox [706369.209506]  [] ? 
> > > ksoftirqd+0x0/0x1a0
> > > Oct 30 10:45:17 vbox [706369.215182]  [] 
> > > call_softirq+0x1c/0x30
> > > Oct 30 10:45:17 vbox [706369.220986]  [] 
> > > do_softirq+0x3d/0x80
> > > Oct 30 10:45:17 vbox [706369.227317]  [] ? 
> > > ksoftirqd+0x0/0x1a0
> > > Oct 30 10:45:17 vbox [706369.233020]  [] 
> > > ksoftirqd+0x84/0x1a0
> > > Oct 30 10:45:17 vbox [706369.238622]  [] 
> > > kthread+0xa6/0xb0
> > > Oct 30 10:45:17 vbox [706369.243956]  [] 
> > > child_rip+0xa/0x20
> > > Oct 30 10:45:17 vbox [706369.249390]  [] ? 
> > > kthread+0x0/0xb0
> > > Oct 30 10:45:17 vbox [706369.254806]  [] ? 
> > > child_rip+0x0/0x20
> > > Oct 30 10:45:17 vbox [706369.260454] Rebooting in 10 seconds..
> > > (trace is obtained from netconsole, so hopefully it's not mangled).
> > > The machine was running ~30 KVM guests, it's 8CPU 16GB x86_64, when it 
> > > crashed, it was only
> > > moderately loaded. Never had this (or any other) kind of crash on it 
> > > before.
> > > I know there's 2.6.31.5 already out, but I'm not sure if some related 
> > > problem has been
> > > reported/fixed and I'm obviously not able to quickly test/reproduce it 
> > > with latest kernel,
> > > so I'm rather reporting.
> > > Should more information/testing/etc be required, I'll be glad to help
> > > regards
> > > nik
> 

-- 
-
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:+420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-



Re: [PATCH 14/27] Add book3s_64 specific opcode emulation

2009-11-03 Thread Benjamin Herrenschmidt

On Tue, 2009-11-03 at 10:06 +0100, Alexander Graf wrote:

> > DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there
> > are HID bits to make it work on 32 bytes only, and an extra DCBZL insn
> > that always clears a full cache line (128 bytes).
> 
> Yes. We only come here when we patched the dcbz opcodes to invalid  
> instructions because cache line size of target == 32.
> On 970 with MSR_HV = 0 we actually use the dcbz 32-bytes mode.
> 
> Admittedly though, this could be a lot more clever.

Yeah well, we also really need to fix ppc32 Linux to use the device-tree
provided cache line size :-) For 64-bits, that should already be the
case, and thus the emulation trick shouldn't be useful as long as you
properly provide the guest with the right size in the device-tree.

(Though glibc can be nasty, afaik it might load up optimized variants of
some routines with hard wired cache line sizes based on the CPU type)
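
To make the size dependence concrete, here is a toy model in Python (not the
KVM emulation code): the same effective address clears a different range
depending on the cache line size the CPU, or the emulation, assumes.
emulate_dcbz() and the bytearray standing in for guest memory are
illustrative only.

```python
def emulate_dcbz(memory, addr, line_size=32):
    """Zero the line_size-byte block of `memory` that contains `addr`."""
    if line_size <= 0 or line_size & (line_size - 1):
        raise ValueError("line_size must be a power of two")
    start = addr & ~(line_size - 1)          # round down to the line boundary
    if start + line_size > len(memory):
        raise IndexError("cache line extends past the end of memory")
    memory[start:start + line_size] = bytes(line_size)
    return start


mem = bytearray(b"\xff" * 256)
emulate_dcbz(mem, 70, 32)     # clears the 32-byte line containing address 70
emulate_dcbz(mem, 200, 128)   # clears the whole 128-byte line containing 200
```

A guest that hard-wires 32 but runs with 128-byte lines would clear four
times as much memory as it expects, hence the need to get the size right.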

> >> +  switch (sprn) {
> >> +  case SPRN_IBAT0U ... SPRN_IBAT3L:
> >> +  bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT0U) / 2];
> >> +  break;
> >> +  case SPRN_IBAT4U ... SPRN_IBAT7L:
> >> +  bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT4U) / 2];
> >> +  break;
> >> +  case SPRN_DBAT0U ... SPRN_DBAT3L:
> >> +  bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT0U) / 2];
> >> +  break;
> >> +  case SPRN_DBAT4U ... SPRN_DBAT7L:
> >> +  bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT4U) / 2];
> >> +  break;
> >
> > Do xBAT4..7 have the same SPR numbers on all CPUs?  They are CPU- 
> > specific
> > SPRs, after all.  Some CPUs have only six, some only four, some  
> > none, btw.
> 
> For now only Linux runs which only uses the first 3(?) IIRC. But yes,  
> it's probably worth looking into at one point or the other.
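
The index arithmetic in the quoted hunk can be sketched as below (Python,
illustration only). The SPR numbers are assumptions taken from memory of the
Linux headers and, as noted above, may differ per CPU. Note that as quoted,
both the 0-3 and the 4-7 ranges compute slots 0..3; whether the second range
needs an offset depends on the array layout, which the quote does not show.

```python
# Assumed SPR numbers, for illustration only; check them against the CPU manual.
SPRN_IBAT0U = 0x210
SPRN_IBAT4U = 0x230


def bat_index(sprn, base, first=0):
    """Map an upper/lower BAT SPR number to its slot in the BAT array."""
    # Each BAT has two consecutive SPRs (upper, lower), hence the // 2.
    return first + (sprn - base) // 2


def is_upper(sprn, base):
    """True for the upper half of a BAT pair, False for the lower half."""
    return (sprn - base) % 2 == 0


print(bat_index(SPRN_IBAT0U + 7, SPRN_IBAT0U))       # IBAT3L
print(bat_index(SPRN_IBAT4U, SPRN_IBAT4U))           # as in the quoted hunk
print(bat_index(SPRN_IBAT4U, SPRN_IBAT4U, first=4))  # with an offset of 4
```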
> 
> >
> >> +  case SPRN_HID0:
> >> +  to_book3s(vcpu)->hid[0] = vcpu->arch.gpr[rs];
> >> +  break;
> >> +  case SPRN_HID1:
> >> +  to_book3s(vcpu)->hid[1] = vcpu->arch.gpr[rs];
> >> +  break;
> >> +  case SPRN_HID2:
> >> +  to_book3s(vcpu)->hid[2] = vcpu->arch.gpr[rs];
> >> +  break;
> >> +  case SPRN_HID4:
> >> +  to_book3s(vcpu)->hid[4] = vcpu->arch.gpr[rs];
> >> +  break;
> >> +  case SPRN_HID5:
> >> +  to_book3s(vcpu)->hid[5] = vcpu->arch.gpr[rs];
> >
> > HIDs are different per CPU; and worse, different CPUs have different
> > registers (SPR #s) for the same register name!
> 
> Sigh :-(

On the other hand, you can probably just "Swallow" all of these and
Linux won't even notice, except for the case of the sleep state maybe on
6xx/7xx/7xxx. Just a matter of knowing what you are emulating as guest.

Cheers,
Ben.



Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Eric Dumazet

Michael S. Tsirkin a écrit :
> 
> Paul, you acked this previously. Should I add your acked-by line so
> people calm down?  If you would rather I replace
> rcu_dereference/rcu_assign_pointer with rmb/wmb, I can do this.
> Or maybe patch Documentation to explain this RCU usage?
> 

So you believe I am over-reacting to this dubious use of RCU?

RCU documentation is already very complex; we don't need to add yet another
subtle use and make it less readable.

It seems you use the 'RCU API' in drivers/vhost/net.c as convenient macros:

#define rcu_dereference(p)	({ \
		typeof(p) _p1 = ACCESS_ONCE(p); \
		smp_read_barrier_depends(); \
		(_p1); \
	})

#define rcu_assign_pointer(p, v) \
	({ \
		if (!__builtin_constant_p(v) || \
		    ((v) != NULL)) \
			smp_wmb(); \
		(p) = (v); \
	})


There are plenty of regular uses of smp_wmb() in the kernel that are not
related to Read-Copy Update; there is nothing wrong with using barriers
with appropriate comments.

(And you already use mb(), wmb(), rmb(), smp_wmb() in your patch)


BTW there is at least one locking bug in vhost_net_set_features()

Apparently, mutex_unlock() doesn't trigger a fault if the mutex is not locked
by the current thread... even with DEBUG_MUTEXES / DEBUG_LOCK_ALLOC


static void vhost_net_set_features(struct vhost_net *n, u64 features)
{
	size_t hdr_size = features & (1 << VHOST_NET_F_VIRTIO_NET_HDR) ?
		sizeof(struct virtio_net_hdr) : 0;
	int i;
	mutex_unlock(&n->dev.mutex);	/* <-- here: unlock with no prior lock */
	n->dev.acked_features = features;
	smp_wmb();
	for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
		mutex_lock(&n->vqs[i].mutex);
		n->vqs[i].hdr_size = hdr_size;
		mutex_unlock(&n->vqs[i].mutex);
	}
	mutex_unlock(&n->dev.mutex);
	vhost_net_flush(n);
}
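
As a user-space illustration (Python, not the vhost code), the imbalance
looks like this. The Dev class and both functions are made up for the
example, and the "balanced" version assumes the first call was meant to be a
lock. Python's threading.Lock happens to raise on releasing an unlocked lock,
which is the kind of check one would hope for here.

```python
import threading


class Dev:
    """Toy stand-in for the device: one device lock plus one lock per queue."""
    def __init__(self, nvqs=2):
        self.mutex = threading.Lock()
        self.vq_mutex = [threading.Lock() for _ in range(nvqs)]
        self.vq_hdr_size = [0] * nvqs
        self.acked_features = 0


def set_features_unbalanced(dev, features, hdr_size):
    dev.mutex.release()              # mirrors the leading mutex_unlock()
    dev.acked_features = features
    dev.mutex.release()


def set_features_balanced(dev, features, hdr_size):
    with dev.mutex:                  # acquire on entry, release on exit
        dev.acked_features = features
        for i, vq_lock in enumerate(dev.vq_mutex):
            with vq_lock:
                dev.vq_hdr_size[i] = hdr_size


dev = Dev()
try:
    set_features_unbalanced(dev, 1, 10)
except RuntimeError as exc:
    print("unbalanced unlock detected:", exc)
set_features_balanced(dev, 1, 10)
```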

Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

Eric Dumazet wrote:
> Gregory Haskins a écrit :
>> Gregory Haskins wrote:
>>> Eric Dumazet wrote:
>>>> Michael S. Tsirkin a écrit :
>>>>
>>>> using rcu_dereference() and mutex_lock() at the same time seems wrong, I
>>>> suspect
>>>> that your use of RCU is not correct.
>>>>
>>>> 1) rcu_dereference() should be done inside an rcu_read_lock() section, and
>>>>    we are not allowed to sleep in such a section.
>>>>    (Quoting Documentation/RCU/whatisRCU.txt:
>>>>     It is illegal to block while in an RCU read-side critical section.)
>>>>
>>>> 2) mutex_lock() can sleep (ie block)

>>> Michael,
>>>   I warned you that this needed better documentation ;)
>>>
>>> Eric,
>>>   I think I flagged this once before, but Michael convinced me that it
>>> was indeed "ok", if but perhaps a bit unconventional.  I will try to
>>> find the thread.
>>>
>>> Kind Regards,
>>> -Greg
>>>
>> Here it is:
>>
>> http://lkml.org/lkml/2009/8/12/173
>>
> 
> Yes, this doesn't convince me at all, and could be a precedent for a wrong RCU 
> use.
> People wanting to use RCU do a grep on kernel sources to find how to correctly
> use RCU.
> 
> Michael, please use existing locking/barrier mechanisms, and not pretend to 
> use RCU.

Yes, I would tend to agree with you.  In fact, I think I suggested that
a normal barrier should be used instead of abusing rcu_dereference().

But as far as his code is concerned, I think it technically works
properly, and that was my main point.  Also note that combining
rcu_dereference() with mutex_lock() is not necessarily broken, per se: it
could be an srcu-based critical section created by the caller, for
instance.  It would be perfectly legal to sleep on the mutex if that
were the case.

To me, the bigger issue is that an rcu_dereference() without any
apparent hint of a corresponding read-side critical section is simply
confusing to a reviewer.  smp_rmb() (or whatever is proper in this case)
is probably more appropriate.

Kind Regards,
-Greg


Re: [Qemu-devel] [PATCH 0/4] megaraid_sas HBA emulation

2009-11-03 Thread Gerd Hoffmann


On 10/30/09 09:12, Hannes Reinecke wrote:

> Gerd Hoffmann wrote:
>> http://repo.or.cz/w/qemu/kraxel.git?a=shortlog;h=refs/heads/scsi.v1
>>
>> It is far from being completed, will continue tomorrow.  Should give an
>> idea of the direction I'm heading to though.  Comments welcome.
>
> Yep, this looks good.


More bits available now at:

http://repo.or.cz/w/qemu/kraxel.git/shortlog/refs/heads/scsi.v3

Please have a look at the new interface, this commit:

http://repo.or.cz/w/qemu/kraxel.git/commitdiff/9c825dac540282dd4d5f5f660ca13af617888037

The workflow I have in mind is:

  ->request_new()

Returns a parsed request, i.e. all fields of req->cmd are filled in
already, so the host adapter can easily check whether it is a read or
write and how many bytes will be transferred.


  ->request_run()

Execute the command.  iovec is passed to the dma_bdrv_*() functions, 
which means it should contain guest physical addresses.


When the request is done the complete callback is called.  For commands 
which complete immediately the callback might be called before 
request_run() returns.  Or maybe don't call the callback at all then? 
We can easily indicate using the return value that we are done already.


  ->request_del()

Release request when done.
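
To make the intended call sequence concrete, here is a toy model in Python.
It is not the QEMU interface: the function names only mirror the callbacks
above, the CDB parsing handles nothing but 10-byte READ/WRITE commands, and
it picks the "signal immediate completion via the return value and skip the
callback" variant.

```python
WRITE_10 = 0x2A   # SCSI WRITE(10) opcode

pending = []      # requests that will complete asynchronously


class Request:
    def __init__(self, cdb):
        self.cdb = bytes(cdb)
        self.is_write = self.cdb[0] == WRITE_10
        # READ(10)/WRITE(10) carry the block count in bytes 7-8, big-endian.
        self.blocks = int.from_bytes(self.cdb[7:9], "big")
        self.released = False


def request_new(cdb):
    """Return a request whose command fields are already parsed."""
    return Request(cdb)


def request_run(req, iovec, complete, immediate):
    """Start the request; return True if it is already done on return."""
    if immediate:
        return True                   # done already, callback is skipped
    pending.append((req, complete))   # the callback fires later
    return False


def finish_pending():
    """Deliver completions for everything that was started asynchronously."""
    while pending:
        req, complete = pending.pop(0)
        complete(req)


def request_del(req):
    """Release the request once the adapter is done with it."""
    req.released = True
```

The host adapter would inspect req.is_write and req.blocks right after
request_new(), before it has mapped any guest memory.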

Questions & comments are welcome.

cheers,
  Gerd

Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

On Tue, Nov 03, 2009 at 07:03:55PM +0100, Eric Dumazet wrote:
> Michael S. Tsirkin a écrit :
> > +static void handle_tx(struct vhost_net *net)
> > +{
> > +   struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
> > +   unsigned head, out, in, s;
> > +   struct msghdr msg = {
> > +   .msg_name = NULL,
> > +   .msg_namelen = 0,
> > +   .msg_control = NULL,
> > +   .msg_controllen = 0,
> > +   .msg_iov = vq->iov,
> > +   .msg_flags = MSG_DONTWAIT,
> > +   };
> > +   size_t len, total_len = 0;
> > +   int err, wmem;
> > +   size_t hdr_size;
> > +   struct socket *sock = rcu_dereference(vq->private_data);
> > +   if (!sock)
> > +   return;
> > +
> > +   wmem = atomic_read(&sock->sk->sk_wmem_alloc);
> > +   if (wmem >= sock->sk->sk_sndbuf)
> > +   return;
> > +
> > +   use_mm(net->dev.mm);
> > +   mutex_lock(&vq->mutex);
> > +   vhost_no_notify(vq);
> > +
> 
> using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
> suspect
> that your use of RCU is not correct.
> 
> 1) rcu_dereference() should be done inside an rcu_read_lock() section, and
>    we are not allowed to sleep in such a section.
>    (Quoting Documentation/RCU/whatisRCU.txt:
>     It is illegal to block while in an RCU read-side critical section.)
> 
> 2) mutex_lock() can sleep (ie block)

This use is correct. See the comment in vhost.h. This use of RCU has been
acked by Paul E. McKenney (paul...@linux.vnet.ibm.com) as well.
There are many ways to use RCU, not all of which involve rcu_read_lock().


Re: libvirt bug #532480

2009-11-03 Thread Vadim Rozenfeld

On 11/03/2009 06:48 PM, Brian Jackson wrote:

> On Tuesday 03 November 2009 06:02:42 am roma1390 wrote:
>> Libvirt thinks that bug #532480 must be addressed to the qemu/kvm team.
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=532480
>
> For future reference, adding some overview to your email instead of making all
> the devs with arguably limited time go read through a bug report is probably a
> good idea.
>
>> Any ideas how to fix this issue?
>
> Iirc, it's being worked on. And yes, it is the responsibility of the developers
> of said drivers to do the signing. Keep watching the url from the bug for
> updated drivers. Until then, there are workarounds to this issue also
> mentioned at that url.

http://sourceforge.net/projects/kvm/files/kvm-driver-disc/20080318/kvm-driver-disc-20080318.iso/download
doesn't contain viostor.sys (virtual block driver) at all.

The drivers at http://people.redhat.com/~yvugenfi/24.09.2009/viostor.zip are
not signed (there are no cat files inside), so you need to sign them yourself.
Download and install the WDK first, then take a look at the
Selfsign_example.cmd example. It is pretty self-explanatory, I think. You can
also find tons of information about driver signing by searching the Internet.

btw, as a developer I care about passing the WHQL tests, not about signing.

Regards,
Vadim


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

On Tue, Nov 03, 2009 at 07:51:35PM +0100, Eric Dumazet wrote:
> Gregory Haskins a écrit :
> > Gregory Haskins wrote:
> >> Eric Dumazet wrote:
> >>> Michael S. Tsirkin a écrit :
> >>>> +static void handle_tx(struct vhost_net *net)
> >>>> +{
> >>>> +	struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
> >>>> +	unsigned head, out, in, s;
> >>>> +	struct msghdr msg = {
> >>>> +		.msg_name = NULL,
> >>>> +		.msg_namelen = 0,
> >>>> +		.msg_control = NULL,
> >>>> +		.msg_controllen = 0,
> >>>> +		.msg_iov = vq->iov,
> >>>> +		.msg_flags = MSG_DONTWAIT,
> >>>> +	};
> >>>> +	size_t len, total_len = 0;
> >>>> +	int err, wmem;
> >>>> +	size_t hdr_size;
> >>>> +	struct socket *sock = rcu_dereference(vq->private_data);
> >>>> +	if (!sock)
> >>>> +		return;
> >>>> +
> >>>> +	wmem = atomic_read(&sock->sk->sk_wmem_alloc);
> >>>> +	if (wmem >= sock->sk->sk_sndbuf)
> >>>> +		return;
> >>>> +
> >>>> +	use_mm(net->dev.mm);
> >>>> +	mutex_lock(&vq->mutex);
> >>>> +	vhost_no_notify(vq);
> >>>> +
> >>> using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
> >>> suspect
> >>> that your use of RCU is not correct.
> >>>
> >>> 1) rcu_dereference() should be done inside an rcu_read_lock() section, and
> >>>    we are not allowed to sleep in such a section.
> >>>    (Quoting Documentation/RCU/whatisRCU.txt:
> >>>     It is illegal to block while in an RCU read-side critical section.)
> >>>
> >>> 2) mutex_lock() can sleep (ie block)
> >>>
> >>
> >> Michael,
> >>   I warned you that this needed better documentation ;)
> >>
> >> Eric,
> >>   I think I flagged this once before, but Michael convinced me that it
> >> was indeed "ok", if but perhaps a bit unconventional.  I will try to
> >> find the thread.
> >>
> >> Kind Regards,
> >> -Greg
> >>
> > 
> > Here it is:
> > 
> > http://lkml.org/lkml/2009/8/12/173
> > 
> 
> Yes, this doesn't convince me at all, and could be a precedent for a wrong RCU 
> use.
> People wanting to use RCU do a grep on kernel sources to find how to correctly
> use RCU.
> 
> Michael, please use existing locking/barrier mechanisms, and not pretend to 
> use RCU.
> 
> Some automatic tools might barf later.
> 
> For example, we could add a debugging facility to check that 
> rcu_dereference() is used
> in an appropriate context, ie conflict with existing mutex_lock() debugging 
> facility.


Paul, you acked this previously. Should I add your acked-by line so
people calm down?  If you would rather I replace
rcu_dereference/rcu_assign_pointer with rmb/wmb, I can do this.
Or maybe patch Documentation to explain this RCU usage?

-- 
MST

Re: XP blue screen with qemu-kvm-0.11.0

2009-11-03 Thread Ross Boylan

On Sat, 2009-10-31 at 15:21 +0300, Michael Tokarev wrote:
> Ross Boylan wrote:
> > My XP VM was working OK, and then started crashing shortly after it
> > logged me in.  There were no obvious changes at the time.  I built the
> > latest qemu-kvm, but the problem persists.
> > 
> > I am running 32 bit XP on Intel(R) Xeon(R) CPU E5420  @ 2.50GHz (8 cores
> > total), Debian GNU/Linux mostly Lenny (amd64), but with some more recent
> > stuff.  In particular, the kernel is 2.6.30-8 and I pulled in the
> > kernel-headers package to match before building kvm.  However, libc6 and
> > libc6-dev are at Lenny's 2.7-18 version.
> 
> Libc is basically irrelevant here.  What matters are the host kernel
> and kvm version.
> 
> > $ ./XP.sh
> > ++ sudo vdeq bin/qemu-system-x86_64 -net 
> > nic,vlan=1,macaddr=52:54:a0:12:01:00 -net 
> > vde,vlan=1,sock=/var/run/vde2/tap0.ctl -boot c -vga std -hda 
> > /dev/turtle/XP01 -soundhw es1370 -localtime -m 1G -smp 2
> > arg ,vlan=1,sock=/var/run/vde2/tap0.ctl
> > TUNGETIFF ioctl() failed: Invalid argument
> > TUNSETOFFLOAD ioctl() failed: Bad address
> > oss: Could not initialize DAC
> > oss: Failed to open `/dev/dsp'
> > oss: Reason: Device or resource busy
> > oss: Could not initialize DAC
> > oss: Failed to open `/dev/dsp'
> > oss: Reason: Device or resource busy
> > audio: Failed to create voice `es1370.dac2'
> > # and more sound-related complaints
> 
> Switch to alsa to get your audio working.
I don't see an alsa option for kvm/qemu.  I'm already running alsa, but
under KDE which tends to grab the device.
> 
> > The VM starts; I see the initial XP screen with the 4 colors; I see the
> > background I get when I log in (it logs me in directly without prompt);
> > and then (pretty fast) I get a blue screen.  The stop code is 0x8E, and
> > the text says to check disk space and BIOS options.
> 
> What are the BIOS files your kvm uses?  Are they by any chance
> from some old qemu install?
They appear to be from the latest install, since strace shows various
bios files loading from /usr/local/kvm/share.  The invoking environment
was a little different from the real run, since strace vdeq 
apparently traced vdeq but not kvm calls.  So I just ran the kvm bare.

> 
> Does kvm deb from http://www.corpit.ru/debian/tls/kvm/ expose the same
> issue?
Yes.  However, as it fails it left a reverberating sound (fragment of
the Windows login tone).

I tried starting in safe mode.  XP said there was new hardware: the
video (-vga std).  It could not find a driver on the internet(!? the
device was identified as a VGA controller).  Then it told me the driver
had been installed (after I hit finish).  I rebooted in regular mode.
This time there was no sound, but the machine failed again with STOP
0x8E (as before).  The video appeared to be working throughout this,
showing a window that exceeded vanilla VGA resolution.

Ross



Re: [Linux-fbdev-devel] [Qemu-devel] Re: [PATCH] Add VirtIO Frame Buffer Support

2009-11-03 Thread Paolo Bonzini


On 11/03/2009 05:05 PM, Avi Kivity wrote:

On 11/03/2009 05:29 PM, Ondrej Zajicek wrote:

On Tue, Nov 03, 2009 at 11:38:18AM +0200, Avi Kivity wrote:

On 11/03/2009 01:25 PM, Vincent Hanquez wrote:

not sure if i'm missing the point here, but couldn't it be
hypothetically
extended to stuff 3d (or video& more 2d accel ?) commands too ? I can't
imagine the cirrus or stdvga driver be able to do that ever ;)


cirrus has pretty good 2d acceleration. 3D is a mega-project though.

Cirrus has no blending/compositing hardware support.
Paravirtualized graphics can easily support full XRender-style
2D acceleration.


What does that entail? 3/4 operand raster ops?


With alpha compositing.

Paolo


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Eric Dumazet

Gregory Haskins wrote:
> Gregory Haskins wrote:
>> Eric Dumazet wrote:
>>> Michael S. Tsirkin wrote:
>>>> +static void handle_tx(struct vhost_net *net)
>>>> +{
>>>> +   struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
>>>> +   unsigned head, out, in, s;
>>>> +   struct msghdr msg = {
>>>> +      .msg_name = NULL,
>>>> +      .msg_namelen = 0,
>>>> +      .msg_control = NULL,
>>>> +      .msg_controllen = 0,
>>>> +      .msg_iov = vq->iov,
>>>> +      .msg_flags = MSG_DONTWAIT,
>>>> +   };
>>>> +   size_t len, total_len = 0;
>>>> +   int err, wmem;
>>>> +   size_t hdr_size;
>>>> +   struct socket *sock = rcu_dereference(vq->private_data);
>>>> +   if (!sock)
>>>> +      return;
>>>> +
>>>> +   wmem = atomic_read(&sock->sk->sk_wmem_alloc);
>>>> +   if (wmem >= sock->sk->sk_sndbuf)
>>>> +      return;
>>>> +
>>>> +   use_mm(net->dev.mm);
>>>> +   mutex_lock(&vq->mutex);
>>>> +   vhost_no_notify(vq);
>>>> +
>>> using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
>>> suspect
>>> that your use of RCU is not correct.
>>>
>>> 1) rcu_dereference() should be done inside an rcu_read_lock() section, and
>>>    we are not allowed to sleep in such a section.
>>>    (Quoting Documentation/RCU/whatisRCU.txt :
>>>      It is illegal to block while in an RCU read-side critical section, )
>>>
>>> 2) mutex_lock() can sleep (ie block)
>>>
>>
>> Michael,
>>   I warned you that this needed better documentation ;)
>>
>> Eric,
>>   I think I flagged this once before, but Michael convinced me that it
>> was indeed "ok", if but perhaps a bit unconventional.  I will try to
>> find the thread.
>>
>> Kind Regards,
>> -Greg
>>
> 
> Here it is:
> 
> http://lkml.org/lkml/2009/8/12/173
> 

Yes, this doesn't convince me at all, and could be a precedent for wrong RCU
use. People wanting to use RCU do a grep on kernel sources to find how to
correctly use RCU.

Michael, please use existing locking/barrier mechanisms, and do not pretend to
use RCU.

Some automatic tools might barf later.

For example, we could add a debugging facility to check that rcu_dereference()
is used in an appropriate context, ie conflict with existing mutex_lock()
debugging facility.
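
As an illustration of the "existing locking" alternative suggested above, here
is a minimal userspace sketch, assuming a pthread mutex in place of the kernel
mutex. The types backend_sketch and queue_sketch and both helper functions are
hypothetical stand-ins, not the vhost structures or API. The point it shows is
only that the pointer is read under the same mutex that serializes updates, so
no RCU primitive is needed on the read side.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the socket and virtqueue; not the vhost types. */
struct backend_sketch {
    int sndbuf;   /* capacity, loosely modelled on sk_sndbuf */
    int wmem;     /* amount in flight, loosely modelled on sk_wmem_alloc */
};

struct queue_sketch {
    pthread_mutex_t mutex;
    struct backend_sketch *backend;   /* NULL when detached; guarded by mutex */
};

/* Writer side: swap the backend under the mutex and hand back the old one. */
static struct backend_sketch *queue_set_backend(struct queue_sketch *q,
                                                struct backend_sketch *b)
{
    pthread_mutex_lock(&q->mutex);
    struct backend_sketch *old = q->backend;
    q->backend = b;
    pthread_mutex_unlock(&q->mutex);
    return old;
}

/* Reader side: take the mutex first, then look at the pointer.
 * Returns 1 if there was room to "transmit", 0 otherwise. */
static int queue_try_tx(struct queue_sketch *q)
{
    int sent = 0;

    pthread_mutex_lock(&q->mutex);
    struct backend_sketch *b = q->backend;
    if (b && b->wmem < b->sndbuf) {
        b->wmem++;
        sent = 1;
    }
    pthread_mutex_unlock(&q->mutex);
    return sent;
}
```

The trade-off is that every reader now contends on the mutex, which is
presumably what the original code was trying to avoid on its early-exit path.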



Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

Eric Dumazet wrote:
> Michael S. Tsirkin wrote:
>> +static void handle_tx(struct vhost_net *net)
>> +{
>> +struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
>> +unsigned head, out, in, s;
>> +struct msghdr msg = {
>> +.msg_name = NULL,
>> +.msg_namelen = 0,
>> +.msg_control = NULL,
>> +.msg_controllen = 0,
>> +.msg_iov = vq->iov,
>> +.msg_flags = MSG_DONTWAIT,
>> +};
>> +size_t len, total_len = 0;
>> +int err, wmem;
>> +size_t hdr_size;
>> +struct socket *sock = rcu_dereference(vq->private_data);
>> +if (!sock)
>> +return;
>> +
>> +wmem = atomic_read(&sock->sk->sk_wmem_alloc);
>> +if (wmem >= sock->sk->sk_sndbuf)
>> +return;
>> +
>> +use_mm(net->dev.mm);
>> +mutex_lock(&vq->mutex);
>> +vhost_no_notify(vq);
>> +
> 
> using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
> suspect
> that your use of RCU is not correct.
> 
> 1) rcu_dereference() should be done inside an rcu_read_lock() section, and
>    we are not allowed to sleep in such a section.
>    (Quoting Documentation/RCU/whatisRCU.txt :
>      It is illegal to block while in an RCU read-side critical section, )
> 
> 2) mutex_lock() can sleep (ie block)
> 


Michael,
  I warned you that this needed better documentation ;)

Eric,
  I think I flagged this once before, but Michael convinced me that it
was indeed "ok", if but perhaps a bit unconventional.  I will try to
find the thread.

Kind Regards,
-Greg




[PATCHv7 3/3] vhost_net: a kernel-level virtio server

What it is: vhost net is a character device that can be used to reduce
the number of system calls involved in virtio networking.
Existing virtio net code is used in the guest without modification.

There's similarity with vringfd, with some differences and reduced scope
- uses eventfd for signalling
- structures can be moved around in memory at any time (good for
  migration, bug work-arounds in userspace)
- write logging is supported (good for migration)
- support memory table and not just an offset (needed for kvm)
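
The eventfd signalling mentioned in the list above can be illustrated from
userspace with the Linux eventfd(2) call. This is a minimal sketch, not the
vhost interface: the helper names notify and drain are hypothetical, and the
descriptor is assumed to have been created with EFD_NONBLOCK so that an empty
counter does not block the reader.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Add n to the eventfd counter, i.e. signal the other side n times.
 * Returns 0 on success, -1 on failure. */
static int notify(int efd, uint64_t n)
{
    return write(efd, &n, sizeof(n)) == (ssize_t)sizeof(n) ? 0 : -1;
}

/* Read and reset the counter. Returns the number of accumulated signals,
 * or 0 if nothing is pending (the non-blocking read fails in that case). */
static uint64_t drain(int efd)
{
    uint64_t count = 0;

    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return 0;
    return count;
}
```

Several notifications issued before the reader runs are coalesced into one
read, which is one reason a counter-based descriptor suits this kind of
signalling.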

common virtio related code has been put in a separate file vhost.c and
can be made into a separate module if/when more backends appear.  I used
Rusty's lguest.c as the source for developing this part : this supplied
me with witty comments I wouldn't be able to write myself.

What it is not: vhost net is not a bus, and not a generic new system
call. No assumptions are made on how guest performs hypercalls.
Userspace hypervisors are supported as well as kvm.

How it works: Basically, we connect virtio frontend (configured by
userspace) to a backend. The backend could be a network device, or a tap
device.  Backend is also configured by userspace, including vlan/mac
etc.

Status: This works for me, and I haven't seen any crashes.
Compared to userspace, people reported improved latency (as I save up to
4 system calls per packet), as well as better bandwidth and CPU
utilization.

Features that I plan to look at in the future:
- mergeable buffers
- zero copy
- scalability tuning: figure out the best threading model to use

Acked-by: Arnd Bergmann 
Signed-off-by: Michael S. Tsirkin 
---
 MAINTAINERS|9 +
 arch/x86/kvm/Kconfig   |1 +
 drivers/Makefile   |1 +
 drivers/vhost/Kconfig  |   11 +
 drivers/vhost/Makefile |2 +
 drivers/vhost/net.c|  633 +
 drivers/vhost/vhost.c  |  970 
 drivers/vhost/vhost.h  |  158 +++
 include/linux/Kbuild   |1 +
 include/linux/miscdevice.h |1 +
 include/linux/vhost.h  |  126 ++
 11 files changed, 1913 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vhost/Kconfig
 create mode 100644 drivers/vhost/Makefile
 create mode 100644 drivers/vhost/net.c
 create mode 100644 drivers/vhost/vhost.c
 create mode 100644 drivers/vhost/vhost.h
 create mode 100644 include/linux/vhost.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 8824115..980a69b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5619,6 +5619,15 @@ S:   Maintained
 F: Documentation/filesystems/vfat.txt
 F: fs/fat/
 
+VIRTIO HOST (VHOST)
+M: "Michael S. Tsirkin" 
+L: kvm@vger.kernel.org
+L: virtualizat...@lists.osdl.org
+L: net...@vger.kernel.org
+S: Maintained
+F: drivers/vhost/
+F: include/linux/vhost.h
+
 VIA RHINE NETWORK DRIVER
 M: Roger Luethi 
 S: Maintained
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index b84e571..94f44d9 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -64,6 +64,7 @@ config KVM_AMD
 
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
+source drivers/vhost/Kconfig
 source drivers/lguest/Kconfig
 source drivers/virtio/Kconfig
 
diff --git a/drivers/Makefile b/drivers/Makefile
index 6ee53c7..81e3659 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_HID)   += hid/
 obj-$(CONFIG_PPC_PS3)  += ps3/
 obj-$(CONFIG_OF)   += of/
 obj-$(CONFIG_SSB)  += ssb/
+obj-$(CONFIG_VHOST_NET)+= vhost/
 obj-$(CONFIG_VIRTIO)   += virtio/
 obj-$(CONFIG_VLYNQ)+= vlynq/
 obj-$(CONFIG_STAGING)  += staging/
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
new file mode 100644
index 000..d955406
--- /dev/null
+++ b/drivers/vhost/Kconfig
@@ -0,0 +1,11 @@
+config VHOST_NET
+   tristate "Host kernel accelerator for virtio net"
+   depends on NET && EVENTFD
+   ---help---
+ This kernel module can be loaded in host kernel to accelerate
+ guest networking with virtio_net. Not to be confused with virtio_net
+ module itself which needs to be loaded in guest kernel.
+
+ To compile this driver as a module, choose M here: the module will
+ be called vhost_net.
+
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
new file mode 100644
index 000..72dd020
--- /dev/null
+++ b/drivers/vhost/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_VHOST_NET) += vhost_net.o
+vhost_net-y := vhost.o net.o
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
new file mode 100644
index 000..4af3b98
--- /dev/null
+++ b/drivers/vhost/net.c
@@ -0,0 +1,633 @@
+/* Copyright (C) 2009 Red Hat, Inc.
+ * Author: Michael S. Tsirkin 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ *
+ * virtio-net server in host kernel.
+ */
+
+#include 
+#include 
+#inclu

Re: [RFC] make cpu creation happen inside the right thread.

On Tue, Nov 03, 2009 at 12:35:08PM -0200, Glauber Costa wrote:
> Right now, we issue cpu creation from the i/o thread, and then shoot a thread
> from inside that code. Over the last months, a lot of subtle bugs were
> reported, usually arising from the very fragile order of that initialization.
> 
> I propose we rethink that a little. This is a patch that received basic
> testing only, and I'd like to hear opinions on the overall direction. The idea
> is to issue the new thread as early as possible. The first direct benefits I
> can identify are that we no longer have to rely on on_vcpu-like schemes for
> issuing vcpu ioctls, since we are already on the right thread. Apic creation
> has far fewer spots for race conditions as well.
> 
> I am implementing this on qemu-kvm first, since we can show the benefits of it
> a bit better in there (since we already support smp)
> 
> Let me know what you guys think

Makes sense to me. You still need on_vcpu for issuing vcpu ioctls though
(after initialization).


Re: [RFC] make cpu creation happen inside the right thread.

On Tue, Nov 03, 2009 at 03:46:07PM -0200, Marcelo Tosatti wrote:
> On Tue, Nov 03, 2009 at 12:35:08PM -0200, Glauber Costa wrote:
> > Right now, we issue cpu creation from the i/o thread, and then shoot a
> > thread from inside that code. Over the last months, a lot of subtle bugs
> > were reported, usually arising from the very fragile order of that
> > initialization.
> > 
> > I propose we rethink that a little. This is a patch that received basic
> > testing only, and I'd like to hear opinions on the overall direction. The
> > idea is to issue the new thread as early as possible. The first direct
> > benefits I can identify are that we no longer have to rely on on_vcpu-like
> > schemes for issuing vcpu ioctls, since we are already on the right thread.
> > Apic creation has far fewer spots for race conditions as well.
> > 
> > I am implementing this on qemu-kvm first, since we can show the benefits of
> > it a bit better in there (since we already support smp)
> > 
> > Let me know what you guys think
> 
> Makes sense to me. You still need on_vcpu for issuing vcpu ioctls though
> (after initialization).

Yes, but I believe we can avoid most of them. There is a performance hit from
using it, but I am not so concerned with that. The nasty races that arise from
it are more of a concern to me.

[PATCH 2/2] qemu-kvm: x86: Add support for event states

2009-11-03 Thread Jan Kiszka

This patch extends the qemu-kvm state sync logic with the event substate
from the new VCPU state interface, giving access to yet missing
exception, interrupt and NMI states.

The patch does not switch the rest of qemu-kvm's code to the new
interface as it is expected to be morphed into upstream's version
anyway. Instead, a full conversion will be submitted for upstream.

Signed-off-by: Jan Kiszka 
---

 qemu-kvm-x86.c|   85 +
 target-i386/cpu.h |4 ++
 target-i386/machine.c |4 ++
 3 files changed, 93 insertions(+), 0 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index e03a4ba..b12b103 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -903,6 +903,81 @@ static void get_seg(SegmentCache *lhs, const struct 
kvm_segment *rhs)
| (rhs->avl * DESC_AVL_MASK);
 }
 
+static void kvm_get_events(CPUState *env)
+{
+#ifdef KVM_CAP_VCPU_STATE
+struct {
+struct kvm_vcpu_state header;
+struct kvm_vcpu_substate substates[1];
+} request;
+struct kvm_x86_event_state events;
+int r;
+
+request.header.nsubstates = 1;
+request.header.substates[0].type = KVM_X86_VCPU_STATE_EVENTS;
+request.header.substates[0].offset = (size_t)&events - (size_t)&request;
+r = kvm_vcpu_ioctl(env, KVM_GET_VCPU_STATE, &request);
+if (r == 0) {
+if (events.exception.injected) {
+env->exception_index = events.exception.nr;
+env->error_code = events.exception.error_code;
+} else {
+env->exception_index = -1;
+}
+
+env->interrupt_injected =
+events.interrupt.injected ? events.interrupt.nr : -1;
+env->soft_interrupt = events.interrupt.soft;
+
+env->nmi_injected = events.nmi.injected;
+env->nmi_pending = events.nmi.pending;
+if (events.nmi.masked) {
+env->hflags2 |= HF2_NMI_MASK;
+} else {
+env->hflags2 &= ~HF2_NMI_MASK;
+}
+
+env->sipi_vector = events.sipi_vector;
+
+return;
+}
+#endif
+env->nmi_injected = 0;
+env->nmi_pending = 0;
+env->hflags2 &= ~HF2_NMI_MASK;
+}
+
+static void kvm_set_events(CPUState *env)
+{
+#ifdef KVM_CAP_VCPU_STATE
+struct {
+struct kvm_vcpu_state header;
+struct kvm_vcpu_substate substates[1];
+} request;
+struct kvm_x86_event_state events;
+
+request.header.nsubstates = 1;
+request.header.substates[0].type = KVM_X86_VCPU_STATE_EVENTS;
+request.header.substates[0].offset = (size_t)&events - (size_t)&request;
+
+events.exception.injected = (env->exception_index >= 0);
+events.exception.nr = env->exception_index;
+events.exception.error_code = env->error_code;
+
+events.interrupt.injected = (env->interrupt_injected >= 0);
+events.interrupt.nr = env->interrupt_injected;
+events.interrupt.soft = env->soft_interrupt;
+
+events.nmi.injected = env->nmi_injected;
+events.nmi.pending = env->nmi_pending;
+events.nmi.masked = !!(env->hflags2 & HF2_NMI_MASK);
+
+events.sipi_vector = env->sipi_vector;
+
+kvm_vcpu_ioctl(env, KVM_SET_VCPU_STATE, &request);
+#endif
+}
+
 void kvm_arch_load_regs(CPUState *env)
 {
 struct kvm_regs regs;
@@ -1019,6 +1094,8 @@ void kvm_arch_load_regs(CPUState *env)
 rc = kvm_set_msrs(env, msrs, n);
 if (rc == -1)
 perror("kvm_set_msrs FAILED");
+
+kvm_set_events(env);
 }
 
 void kvm_load_tsc(CPUState *env)
@@ -1215,6 +1292,8 @@ void kvm_arch_save_regs(CPUState *env)
 return;
 }
 }
+
+kvm_get_events(env);
 }
 
 static void do_cpuid_ent(struct kvm_cpuid_entry2 *e, uint32_t function,
@@ -1383,7 +1462,10 @@ int kvm_arch_init_vcpu(CPUState *cenv)
 kvm_tpr_vcpu_start(cenv);
 #endif
 
+cenv->exception_index = -1;
 cenv->interrupt_injected = -1;
+cenv->nmi_injected = 0;
+cenv->nmi_pending = 0;
 
 return 0;
 }
@@ -1453,7 +1535,10 @@ void kvm_arch_push_nmi(void *opaque)
 
 void kvm_arch_cpu_reset(CPUState *env)
 {
+env->exception_index = -1;
 env->interrupt_injected = -1;
+env->nmi_injected = 0;
+env->nmi_pending = 0;
 kvm_arch_load_regs(env);
 if (!cpu_is_bsp(env)) {
if (kvm_irqchip_in_kernel()) {
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index a638e70..863c5a1 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -711,6 +711,10 @@ typedef struct CPUX86State {
 /* For KVM */
 uint32_t mp_state;
 int32_t interrupt_injected;
+uint8_t soft_interrupt;
+uint8_t nmi_injected;
+uint8_t nmi_pending;
+uint32_t sipi_vector;
 
 /* in order to simplify APIC support, we leave this pointer to the
user */
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 6bd447f..f066e6a 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -452,6 +452,10 @@ static const VMStateDescription vmstate_cpu = {
 VMSTATE_INT32_V(interrupt_injected, CPUState, 9),

[PATCH 1/2] qemu-kvm: x86: Refactor use of interrupt_bitmap

2009-11-03 Thread Jan Kiszka

Drop interrupt_bitmap from the cpustate and solely rely on the integer
interrupt_injected. This prepares us for the new injected-interrupt
interface, which will deprecate the bitmap, while preserving
compatibility.

Signed-off-by: Jan Kiszka 
---

Note: A corresponding version for upstream is on the way as well.
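
For readers following the conversion, the mapping between the 256-bit bitmap
and the single integer can be sketched in standalone C as below. This is an
illustration only, not code from the patch: the function names are
hypothetical, and a portable loop stands in for QEMU's ctz64() helper so the
snippet does not depend on host-utils.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BITMAP_WORDS (256 / 64)   /* same shape as interrupt_bitmap[256 / 64] */

/* Count trailing zero bits. The caller must pass a non-zero value. */
static int ctz64_portable(uint64_t v)
{
    int n = 0;

    while ((v & 1) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Encode: clear the bitmap, then set the one bit for the injected vector.
 * A negative value means "nothing injected" and leaves the bitmap empty. */
static void bitmap_from_injected(uint64_t bitmap[BITMAP_WORDS], int32_t injected)
{
    memset(bitmap, 0, BITMAP_WORDS * sizeof(bitmap[0]));
    if (injected >= 0) {
        bitmap[injected / 64] |= (uint64_t)1 << (injected % 64);
    }
}

/* Decode: return the lowest vector set in the bitmap, or -1 if it is empty. */
static int32_t injected_from_bitmap(const uint64_t bitmap[BITMAP_WORDS])
{
    for (int i = 0; i < BITMAP_WORDS; i++) {
        if (bitmap[i]) {
            return i * 64 + ctz64_portable(bitmap[i]);
        }
    }
    return -1;
}
```

Because at most one vector is expected to be set at a time, the round trip
through the integer loses no information in that case.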

 qemu-kvm-x86.c|   19 +--
 target-i386/cpu.h |3 +--
 target-i386/kvm.c |   24 +---
 target-i386/machine.c |   24 ++--
 4 files changed, 37 insertions(+), 33 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 9df0d83..e03a4ba 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -946,7 +946,11 @@ void kvm_arch_load_regs(CPUState *env)
 fpu.mxcsr = env->mxcsr;
 kvm_set_fpu(env, &fpu);
 
-memcpy(sregs.interrupt_bitmap, env->interrupt_bitmap, 
sizeof(sregs.interrupt_bitmap));
+memset(sregs.interrupt_bitmap, 0, sizeof(sregs.interrupt_bitmap));
+if (env->interrupt_injected >= 0) {
+sregs.interrupt_bitmap[env->interrupt_injected / 64] |=
+(uint64_t)1 << (env->interrupt_injected % 64);
+}
 
 if ((env->eflags & VM_MASK)) {
set_v8086_seg(&sregs.cs, &env->segs[R_CS]);
@@ -1104,7 +1108,14 @@ void kvm_arch_save_regs(CPUState *env)
 
 kvm_get_sregs(env, &sregs);
 
-memcpy(env->interrupt_bitmap, sregs.interrupt_bitmap, 
sizeof(env->interrupt_bitmap));
+env->interrupt_injected = -1;
+for (i = 0; i < ARRAY_SIZE(sregs.interrupt_bitmap); i++) {
+if (sregs.interrupt_bitmap[i]) {
+n = ctz64(sregs.interrupt_bitmap[i]);
+env->interrupt_injected = i * 64 + n;
+break;
+}
+}
 
 get_seg(&env->segs[R_CS], &sregs.cs);
 get_seg(&env->segs[R_DS], &sregs.ds);
@@ -1371,6 +1382,9 @@ int kvm_arch_init_vcpu(CPUState *cenv)
 #ifdef KVM_EXIT_TPR_ACCESS
 kvm_tpr_vcpu_start(cenv);
 #endif
+
+cenv->interrupt_injected = -1;
+
 return 0;
 }
 
@@ -1439,6 +1453,7 @@ void kvm_arch_push_nmi(void *opaque)
 
 void kvm_arch_cpu_reset(CPUState *env)
 {
+env->interrupt_injected = -1;
 kvm_arch_load_regs(env);
 if (!cpu_is_bsp(env)) {
if (kvm_irqchip_in_kernel()) {
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 4605fd2..a638e70 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -709,8 +709,8 @@ typedef struct CPUX86State {
 MTRRVar mtrr_var[8];
 
 /* For KVM */
-uint64_t interrupt_bitmap[256 / 64];
 uint32_t mp_state;
+int32_t interrupt_injected;
 
 /* in order to simplify APIC support, we leave this pointer to the
user */
@@ -727,7 +727,6 @@ typedef struct CPUX86State {
 uint16_t fpus_vmstate;
 uint16_t fptag_vmstate;
 uint16_t fpregs_format_vmstate;
-int32_t pending_irq_vmstate;
 } CPUX86State;
 
 CPUX86State *cpu_x86_init(const char *cpu_model);
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 24c9903..33f7d65 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -23,6 +23,7 @@
 #include "kvm.h"
 #include "cpu.h"
 #include "gdbstub.h"
+#include "host-utils.h"
 
 #ifdef KVM_UPSTREAM
 //#define DEBUG_KVM
@@ -408,9 +409,11 @@ static int kvm_put_sregs(CPUState *env)
 {
 struct kvm_sregs sregs;
 
-memcpy(sregs.interrupt_bitmap,
-   env->interrupt_bitmap,
-   sizeof(sregs.interrupt_bitmap));
+memset(sregs.interrupt_bitmap, 0, sizeof(sregs.interrupt_bitmap));
+if (env->interrupt_injected >= 0) {
+sregs.interrupt_bitmap[env->interrupt_injected / 64] |=
+(uint64_t)1 << (env->interrupt_injected % 64);
+}
 
 if ((env->eflags & VM_MASK)) {
set_v8086_seg(&sregs.cs, &env->segs[R_CS]);
@@ -518,15 +521,22 @@ static int kvm_get_sregs(CPUState *env)
 {
 struct kvm_sregs sregs;
 uint32_t hflags;
-int ret;
+int bit, i, ret;
 
 ret = kvm_vcpu_ioctl(env, KVM_GET_SREGS, &sregs);
 if (ret < 0)
 return ret;
 
-memcpy(env->interrupt_bitmap, 
-   sregs.interrupt_bitmap,
-   sizeof(sregs.interrupt_bitmap));
+/* There can only be one pending IRQ set in the bitmap at a time, so try
+   to find it and save its number instead (-1 for none). */
+env->interrupt_injected = -1;
+for (i = 0; i < ARRAY_SIZE(sregs.interrupt_bitmap); i++) {
+if (sregs.interrupt_bitmap[i]) {
+bit = ctz64(sregs.interrupt_bitmap[i]);
+env->interrupt_injected = i * 64 + bit;
+break;
+}
+}
 
 get_seg(&env->segs[R_CS], &sregs.cs);
 get_seg(&env->segs[R_DS], &sregs.ds);
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 2b88fea..6bd447f 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -2,7 +2,6 @@
 #include "hw/boards.h"
 #include "hw/pc.h"
 #include "hw/isa.h"
-#include "host-utils.h"
 
 #include "exec-all.h"
 #include "kvm.h"
@@ -321,7 +320,7 @@ static const VMStateInfo vmstate_hack_uint64_as_uint32 = {
 static void cpu_pre_save(void *opaq

Re: [PATCHv7 2/3] mm: export use_mm/unuse_mm to modules

Michael S. Tsirkin wrote:
> vhost net module wants to do copy to/from user from a kernel thread,
> which needs use_mm. Export it to modules.
> 
> Acked-by: Andrea Arcangeli 
> Signed-off-by: Michael S. Tsirkin 

I need this too:

Acked-by: Gregory Haskins 

> ---
>  mm/mmu_context.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/mmu_context.c b/mm/mmu_context.c
> index ded9081..0777654 100644
> --- a/mm/mmu_context.c
> +++ b/mm/mmu_context.c
> @@ -5,6 +5,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -37,6 +38,7 @@ void use_mm(struct mm_struct *mm)
>   if (active_mm != mm)
>   mmdrop(active_mm);
>  }
> +EXPORT_SYMBOL_GPL(use_mm);
>  
>  /*
>   * unuse_mm
> @@ -56,3 +58,4 @@ void unuse_mm(struct mm_struct *mm)
>   enter_lazy_tlb(mm, tsk);
>   task_unlock(tsk);
>  }
> +EXPORT_SYMBOL_GPL(unuse_mm);





Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

Gregory Haskins wrote:
> Eric Dumazet wrote:
>> Michael S. Tsirkin wrote:
>>> +static void handle_tx(struct vhost_net *net)
>>> +{
>>> +   struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
>>> +   unsigned head, out, in, s;
>>> +   struct msghdr msg = {
>>> +   .msg_name = NULL,
>>> +   .msg_namelen = 0,
>>> +   .msg_control = NULL,
>>> +   .msg_controllen = 0,
>>> +   .msg_iov = vq->iov,
>>> +   .msg_flags = MSG_DONTWAIT,
>>> +   };
>>> +   size_t len, total_len = 0;
>>> +   int err, wmem;
>>> +   size_t hdr_size;
>>> +   struct socket *sock = rcu_dereference(vq->private_data);
>>> +   if (!sock)
>>> +   return;
>>> +
>>> +   wmem = atomic_read(&sock->sk->sk_wmem_alloc);
>>> +   if (wmem >= sock->sk->sk_sndbuf)
>>> +   return;
>>> +
>>> +   use_mm(net->dev.mm);
>>> +   mutex_lock(&vq->mutex);
>>> +   vhost_no_notify(vq);
>>> +
>> using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
>> suspect
>> that your use of RCU is not correct.
>>
>> 1) rcu_dereference() should be done inside an rcu_read_lock() section, and
>>    we are not allowed to sleep in such a section.
>>    (Quoting Documentation/RCU/whatisRCU.txt :
>>      It is illegal to block while in an RCU read-side critical section, )
>>
>> 2) mutex_lock() can sleep (ie block)
>>
> 
> 
> Michael,
>   I warned you that this needed better documentation ;)
> 
> Eric,
>   I think I flagged this once before, but Michael convinced me that it
> was indeed "ok", if but perhaps a bit unconventional.  I will try to
> find the thread.
> 
> Kind Regards,
> -Greg
> 

Here it is:

http://lkml.org/lkml/2009/8/12/173

Kind Regards,
-Greg




Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Eric Dumazet

Michael S. Tsirkin wrote:
> +static void handle_tx(struct vhost_net *net)
> +{
> + struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
> + unsigned head, out, in, s;
> + struct msghdr msg = {
> + .msg_name = NULL,
> + .msg_namelen = 0,
> + .msg_control = NULL,
> + .msg_controllen = 0,
> + .msg_iov = vq->iov,
> + .msg_flags = MSG_DONTWAIT,
> + };
> + size_t len, total_len = 0;
> + int err, wmem;
> + size_t hdr_size;
> + struct socket *sock = rcu_dereference(vq->private_data);
> + if (!sock)
> + return;
> +
> + wmem = atomic_read(&sock->sk->sk_wmem_alloc);
> + if (wmem >= sock->sk->sk_sndbuf)
> + return;
> +
> + use_mm(net->dev.mm);
> + mutex_lock(&vq->mutex);
> + vhost_no_notify(vq);
> +

using rcu_dereference() and mutex_lock() at the same time seems wrong, I suspect
that your use of RCU is not correct.

1) rcu_dereference() should be done inside an rcu_read_lock() section, and
   we are not allowed to sleep in such a section.
   (Quoting Documentation/RCU/whatisRCU.txt :
     It is illegal to block while in an RCU read-side critical section, )

2) mutex_lock() can sleep (ie block)


Re: kvm problem: bonding network interface breaks dhcp

2009-11-03 Thread Matthew Palmer

On Tue, Nov 03, 2009 at 04:45:48PM +0100, Harald Dunkel wrote:
> I am trying to use a bonding network interface as a bridge
> for a virtual machine (kvm). Host and guest are both running
> 2.6.31.5. Problem: The guest does not receive the DHCPOFFER
> reply sent by my dhcp server. There is no such problem if
> the host uses just a single network interface instead of
> bond0.

The output of brctl show, ip addr list, and cat /proc/net/bonding/bond*
might be helpful.

- Matt

Re: [Linux-fbdev-devel] [Qemu-devel] Re: [PATCH] Add VirtIO Frame Buffer Support

2009-11-03 Thread Ondrej Zajicek

On Tue, Nov 03, 2009 at 06:05:13PM +0200, Avi Kivity wrote:
>>> cirrus has pretty good 2d acceleration.  3D is a mega-project though.
>>>  
>> Cirrus has no blending/compositing hardware support.
>> Paravirtualized graphics can easily support full XRender-style
>> 2D acceleration.
>
> What does that entail? 3/4 operand raster ops?

Yes, basically three operand render/composite operation and rendering of
some 2D primitives (trapezoids).

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."



Re: libvirt bug #532480

2009-11-03 Thread Brian Jackson

On Tuesday 03 November 2009 06:02:42 am roma1390 wrote:
> Libvirt thinks that bug #532480 must be addressed to the qemu/kvm team.
> 
>https://bugzilla.redhat.com/show_bug.cgi?id=532480

For future reference, adding some overview to your email instead of making all
the devs, with arguably limited time, go read through a bug report is probably
a good idea.

> 
> Any ideas how to fix this issue?

Iirc, it's being worked on. And yes, it is the responsibility of the developers
of said drivers to do the signing. Keep watching the URL from the bug for
updated drivers. Until then, there are workarounds for this issue, also
mentioned at that URL.

> 

[RFC] make cpu creation happen inside the right thread.

Right now, we issue cpu creation from the i/o thread, and then shoot a thread
from inside that code. Over the last months, a lot of subtle bugs were reported,
usually arising from the very fragile order of that initialization.

I propose we rethink that a little. This is a patch that received basic testing
only, and I'd like to hear opinions on the overall direction. The idea is to
issue the new thread as early as possible. The first direct benefits I can
identify are that we no longer have to rely on on_vcpu-like schemes for issuing
vcpu ioctls, since we are already on the right thread. Apic creation has far
fewer spots for race conditions as well.

I am implementing this on qemu-kvm first, since we can show the benefits of it
a bit better in there (since we already support smp).

Let me know what you guys think.
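
To make the direction concrete, here is a minimal standalone sketch of the
idea using plain pthreads, assuming nothing about QEMU internals. The types
vcpu_sketch and vcpu_request and the functions vcpu_thread_fn and start_vcpu
are hypothetical names for illustration; they are not part of the patch. The
new thread allocates and initializes its own state, then hands the result
back to the creator through a condition variable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-vcpu state; not QEMU's CPUState. */
struct vcpu_sketch {
    int index;
    pthread_t creator;   /* thread that performed the initialization */
};

/* Hand-off area shared between the starting thread and the new thread. */
struct vcpu_request {
    int index;
    pthread_mutex_t lock;
    pthread_cond_t ready;
    struct vcpu_sketch *vcpu;   /* filled in by the new thread */
    int done;
};

/* Runs on the new thread: all initialization happens here, so nothing
 * needs to be shipped over from another thread. */
static void *vcpu_thread_fn(void *opaque)
{
    struct vcpu_request *req = opaque;
    struct vcpu_sketch *vcpu = malloc(sizeof(*vcpu));

    if (vcpu) {
        vcpu->index = req->index;
        vcpu->creator = pthread_self();
    }

    pthread_mutex_lock(&req->lock);
    req->vcpu = vcpu;
    req->done = 1;
    pthread_cond_signal(&req->ready);
    pthread_mutex_unlock(&req->lock);

    /* A real vcpu thread would enter its run loop here. */
    return NULL;
}

/* Start the thread first, then wait until it reports its state.
 * Returns NULL if the thread could not be created or allocation failed. */
static struct vcpu_sketch *start_vcpu(int index, pthread_t *thread)
{
    struct vcpu_request req = { .index = index, .vcpu = NULL, .done = 0 };

    pthread_mutex_init(&req.lock, NULL);
    pthread_cond_init(&req.ready, NULL);

    if (pthread_create(thread, NULL, vcpu_thread_fn, &req) == 0) {
        pthread_mutex_lock(&req.lock);
        while (!req.done) {
            pthread_cond_wait(&req.ready, &req.lock);
        }
        pthread_mutex_unlock(&req.lock);
    }

    pthread_cond_destroy(&req.ready);
    pthread_mutex_destroy(&req.lock);
    return req.vcpu;
}
```

The creator blocks only until initialization is reported, so later code can
still assume the state exists once start_vcpu() returns.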

Signed-off-by: Glauber Costa 
CC: Marcelo Tosatti 
CC: Avi Kivity 
CC: Jan Kiszka 
CC: Anthony Liguori 
---
 cpu-defs.h |2 +-
 hw/acpi.c  |2 +-
 hw/pc.c|   26 --
 hw/pc.h|2 +-
 qemu-kvm-x86.c |5 +
 qemu-kvm.c |   44 +++-
 qemu-kvm.h |2 ++
 vl.c   |2 --
 8 files changed, 53 insertions(+), 32 deletions(-)

diff --git a/cpu-defs.h b/cpu-defs.h
index cf502e9..6d026e0 100644
--- a/cpu-defs.h
+++ b/cpu-defs.h
@@ -139,7 +139,7 @@ typedef struct CPUWatchpoint {
 struct qemu_work_item;
 
 struct KVMCPUState {
-pthread_t thread;
+pthread_t *thread;
 int signalled;
 struct qemu_work_item *queued_work_first, *queued_work_last;
 int regs_modified;
diff --git a/hw/acpi.c b/hw/acpi.c
index 7564abf..cc68188 100644
--- a/hw/acpi.c
+++ b/hw/acpi.c
@@ -781,7 +781,7 @@ void qemu_system_cpu_hot_add(int cpu, int state)
 CPUState *env;
 
 if (state && !qemu_get_cpu(cpu)) {
-env = pc_new_cpu(model);
+pc_new_cpu(model, &env);
 if (!env) {
 fprintf(stderr, "cpu %d creation failed\n", cpu);
 return;
diff --git a/hw/pc.c b/hw/pc.c
index 83012a9..53e7273 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1013,29 +1013,26 @@ int cpu_is_bsp(CPUState *env)
 return env->cpuid_apic_id == 0;
 }
 
-CPUState *pc_new_cpu(const char *cpu_model)
+void pc_new_cpu(const char *cpu_model, CPUState **env)
 {
-CPUState *env;
-
-env = cpu_init(cpu_model);
-if (!env) {
+*env = cpu_init(cpu_model);
+if (!*env) {
 fprintf(stderr, "Unable to find x86 CPU definition\n");
 exit(1);
 }
-env->kvm_cpu_state.regs_modified = 1;
-if ((env->cpuid_features & CPUID_APIC) || smp_cpus > 1) {
-env->cpuid_apic_id = env->cpu_index;
+(*env)->kvm_cpu_state.regs_modified = 1;
+if (((*env)->cpuid_features & CPUID_APIC) || smp_cpus > 1) {
+(*env)->cpuid_apic_id = (*env)->cpu_index;
 /* APIC reset callback resets cpu */
-apic_init(env);
+apic_init(*env);
 } else {
-qemu_register_reset((QEMUResetHandler*)cpu_reset, env);
+qemu_register_reset((QEMUResetHandler*)cpu_reset, *env);
 }
 
 /* kvm needs this to run after the apic is initialized. Otherwise,
  * it can access invalid state and crash.
  */
-qemu_init_vcpu(env);
-return env;
+qemu_init_vcpu(*env);
 }
 
 /* PC hardware initialisation */
@@ -1055,7 +1052,6 @@ static void pc_init1(ram_addr_t ram_size,
 PCIBus *pci_bus;
 ISADevice *isa_dev;
 int piix3_devfn = -1;
-CPUState *env;
 qemu_irq *cpu_irq;
 qemu_irq *isa_irq;
 qemu_irq *i8259;
@@ -1086,8 +1082,10 @@ static void pc_init1(ram_addr_t ram_size,
 if (kvm_enabled()) {
 kvm_set_boot_cpu_id(0);
 }
+
 for (i = 0; i < smp_cpus; i++) {
-env = pc_new_cpu(cpu_model);
+//pc_new_cpu(cpu_model, &env);
+ kvm_init_vcpu(NULL);
 }
 
 vmport_init();
diff --git a/hw/pc.h b/hw/pc.h
index 93eb34d..f931380 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -100,7 +100,7 @@ extern int fd_bootchk;
 
 void ioport_set_a20(int enable);
 int ioport_get_a20(void);
-CPUState *pc_new_cpu(const char *cpu_model);
+void pc_new_cpu(const char *cpu_model, CPUState **env);
 
 /* acpi.c */
 extern int acpi_enabled;
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 0d263ca..4084312 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -160,6 +160,11 @@ int kvm_arch_create(kvm_context_t kvm, unsigned long 
phys_mem_bytes,
return 0;
 }
 
+void kvm_arch_create_vcpu(const char *model, CPUState **env)
+{
+pc_new_cpu("qemu64", env);
+}
+
 #ifdef KVM_EXIT_TPR_ACCESS
 
 static int kvm_handle_tpr_access(CPUState *env)
diff --git a/qemu-kvm.c b/qemu-kvm.c
index b58a457..f83d19a 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -436,10 +436,18 @@ void kvm_disable_pit_creation(kvm_context_t kvm)
 kvm->no_pit_creation = 1;
 }
 
-static void kvm_create_vcpu(CPUState *env, int id)
+
+static void kvm_create_vcpu(CPUState **_env)
 {
 long mmap_size;
 int r;
+int id;
+CPUState *env;
+
+kvm_arch

kvm problem: bonding network interface breaks dhcp

2009-11-03 Thread Harald Dunkel

Hi folks,

I am trying to use a bonding network interface as a bridge
for a virtual machine (kvm). Host and guest are both running
2.6.31.5. Problem: The guest does not receive the DHCPOFFER
reply sent by my dhcp server. There is no such problem if
the host uses just a single network interface instead of
bond0.

Looking at tcpdump on the Linux guest there are several dhcp
discover packets like

15:17:44.005306 00:16:36:2f:f1:d2 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), 
length 342: (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), 
length 328) 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 
00:16:36:2f:f1:d2, length 300, xid 0x4c31213d, secs 10, Flags [none]
  Client-Ethernet-Address 00:16:36:2f:f1:d2 [|bootp]

The dhcp server receives these packets, and sends out
a reply

15:17:45.927589 00:16:36:2f:f1:d2 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), 
length 342: (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), 
length 328) 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 
00:16:36:2f:f1:d2, length 300, xid 0x4c31213d, secs 10, Flags [none]
  Client-Ethernet-Address 00:16:36:2f:f1:d2 [|bootp]
15:17:45.927658 00:15:17:94:16:65 > 00:16:36:2f:f1:d2, ethertype IPv4 (0x0800), 
length 364: (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), 
length 350) 172.19.96.123.67 > 172.19.97.243.68: BOOTP/DHCP, Reply, length 322, 
xid 0x4c31213d, secs 10, Flags [none]
  Your-IP 172.19.97.243
  Client-Ethernet-Address 00:16:36:2f:f1:d2 [|bootp]

This reply never shows up on the guest.


iptables is not set, of course. sysctl.conf says

net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0


Any helpful comment would be highly appreciated.


Many thanx

Harri
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Linux-fbdev-devel] [Qemu-devel] Re: [PATCH] Add VirtIO Frame Buffer Support


On 11/03/2009 05:29 PM, Ondrej Zajicek wrote:
> On Tue, Nov 03, 2009 at 11:38:18AM +0200, Avi Kivity wrote:
> > On 11/03/2009 01:25 PM, Vincent Hanquez wrote:
> > > not sure if i'm missing the point here, but couldn't it be hypothetically
> > > extended to stuff 3d (or video & more 2d accel ?) commands too ? I can't
> > > imagine the cirrus or stdvga driver be able to do that ever ;)
> >
> > cirrus has pretty good 2d acceleration.  3D is a mega-project though.
>
> Cirrus has no blending/compositing hardware support.
> Paravirtualized graphics can easily support full XRender-style
> 2D acceleration.

What does that entail? 3/4 operand raster ops?

--
error compiling committee.c: too many arguments to function


Re: [PATCH] Add VirtIO Frame Buffer Support


On 11/03/2009 05:14 PM, Anthony Liguori wrote:
> Avi Kivity wrote:
> > On 11/03/2009 10:26 AM, Alexander Graf wrote:
> > > Exactly. In fact, I'm even scared to reboot mine because I might end
> > > up in a 3270 terminal. The whole text only crap keeps people from
> > > using this platform! And that's what I want to change here.
> >
> > Ok.  I oppose paravirtualization for its own sake and only support it
> > if there's no other way to get performance.  In this case it buys us
> > basic functionality which is surprisingly missing on native, that's
> > arguably even more important.
>
> There is no "native" on s390.  Everything is "paravirtual".

I meant native as in "what they usually do without our stuff".

--
error compiling committee.c: too many arguments to function


Re: CPU change causes hanging of .NET apps


On 11/03/2009 05:11 PM, Timur Safin wrote:
> My totally noob in QEMU guess -
> my bet it's CR4.OSFXSR which is controlled by presence of
> cpuid.1.edx[24] - FXSR bit (FXSAVE and FXRSTOR) instructions.

That would affect floating point as well.

> I'm curious - is there any way in QEMU to redefine returned cpuid leaf
> values? It will be interesting to see results if that given bit will
> be disabled in configuration.

qemu -cpu host,-flag

--
error compiling committee.c: too many arguments to function


RE: vhost-net patches

2009-11-03 Thread Shirley Ma

Hello Xiaohui,

On Tue, 2009-11-03 at 09:06 +0800, Xin, Xiaohui wrote:
> Hi, Michael,
> What's your deferring skb allocation patch mentioned here, may you
> elaborate it a little more detailed?

That's my patch. It was submitted a few months ago. Here is the link to
this RFC patch:
http://www.mail-archive.com/kvm@vger.kernel.org/msg20777.html

It is a patch for guest receiving. Right now, the kvm guest pre-allocates
skbs; in the worst case, when receiving a large packet with mergeable
buffers, 15/16 skbs need to be freed. Avi and Michael gave me some
comments. I will post the updated patch after running a few more tests in
a few days.

Thanks
Shirley


Re: [RFC] allow userspace to set MSR no-ops

On Wed, Oct 28, 2009 at 01:23:07PM -0400, David Windsor wrote:
> Hi,
> 
> I've encountered a situation in which I would like to allow userspace
> to set the MSRs which KVM should not emulate and instead implement
> these as no-ops.
> 
> I have not seen any work in this space, furthermore there is an item
> on the KVM TODO that is very similar to what I'm trying to do.
> 
> The userspace interface is an extension of kvm_vcpu_ioctl, adding the
> KVM_SET_MSRS_NOOP flag.  It takes a struct kvm_msrs as a list of which
> MSRs should be no-ops and adds the field noop to struct kvm_msr_entry.
>  This patch only affects vmx, but if the approach is sane, I can
> extend it to support svm as well.

Does the ignore_msrs kvm.ko parameter achieve what you want?


Re: [Linux-fbdev-devel] [Qemu-devel] Re: [PATCH] Add VirtIO Frame Buffer Support

2009-11-03 Thread Ondrej Zajicek

On Tue, Nov 03, 2009 at 11:38:18AM +0200, Avi Kivity wrote:
> On 11/03/2009 01:25 PM, Vincent Hanquez wrote:
> > not sure if i'm missing the point here, but couldn't it be hypothetically
> > extended to stuff 3d (or video&  more 2d accel ?) commands too ? I can't
> > imagine the cirrus or stdvga driver be able to do that ever ;)
> >
> 
> cirrus has pretty good 2d acceleration.  3D is a mega-project though.

Cirrus has no blending/compositing hardware support.
Paravirtualized graphics can easily support full XRender-style
2D acceleration.

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."


Re: [PATCH] Add VirtIO Frame Buffer Support

2009-11-03 Thread Anthony Liguori


Avi Kivity wrote:
> On 11/03/2009 10:26 AM, Alexander Graf wrote:
> > Exactly. In fact, I'm even scared to reboot mine because I might end
> > up in a 3270 terminal. The whole text only crap keeps people from
> > using this platform! And that's what I want to change here.
>
> Ok.  I oppose paravirtualization for its own sake and only support it
> if there's no other way to get performance.  In this case it buys us
> basic functionality which is surprisingly missing on native, that's
> arguably even more important.

There is no "native" on s390.  Everything is "paravirtual".

Regards,

Anthony Liguori


Re: CPU change causes hanging of .NET apps

2009-11-03 Thread Timur Safin

2009/11/3 Avi Kivity :
> On 11/03/2009 04:56 PM, Erik Rull wrote:
>>
>> I took all flags, same effect as without all of these flags.  :(
>>
>> Any other idea?
>
> It's probably the cache size query.
>
> Does -cpu host work?
>

My totally noob in QEMU guess -
my bet it's CR4.OSFXSR which is controlled by presence of
cpuid.1.edx[24] - FXSR bit (FXSAVE and FXRSTOR) instructions.

I'm curious - is there any way in QEMU to redefine returned cpuid leaf
values? It will be interesting to see results if that given bit will
be disabled in configuration.
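
For reference, the bit in question can be checked and masked with a few lines of Python. This is only a sketch: `has_fxsr` and `clear_fxsr` are hypothetical helper names, and the EDX value is a made-up sample, not something read from the host or guest discussed in this thread.

```python
# CPUID leaf 1 reports FXSR (FXSAVE/FXRSTOR support) in EDX bit 24.
FXSR_BIT = 24

def has_fxsr(edx: int) -> bool:
    """Return True if the FXSR bit is set in a CPUID.1:EDX value."""
    return bool((edx >> FXSR_BIT) & 1)

def clear_fxsr(edx: int) -> int:
    """Return the 32-bit EDX value with the FXSR bit masked off."""
    return edx & ~(1 << FXSR_BIT) & 0xFFFFFFFF

# Illustrative EDX value only (an assumption, not taken from this thread).
sample_edx = 0x078BFBFF
print(has_fxsr(sample_edx))              # is FXSR advertised in the sample?
print(has_fxsr(clear_fxsr(sample_edx)))  # and after masking the bit?
```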

Best Regards,
Timur

Re: CPU change causes hanging of .NET apps


On 11/03/2009 04:56 PM, Erik Rull wrote:
> I took all flags, same effect as without all of these flags.  :(
>
> Any other idea?

It's probably the cache size query.

Does -cpu host work?

--
error compiling committee.c: too many arguments to function


Re: CPU change causes hanging of .NET apps

2009-11-03 Thread Erik Rull


Avi Kivity wrote:
> On 11/02/2009 01:45 AM, Erik Rull wrote:
> > Hi Avi,
>
> Please don't top-post.
>
> > the Host CPU is a Intel Core2Duo - VT capable and enabled!
>
> The problem is that one of the flags that -cpu core2duo enables is
> implemented incorrectly, so it leads to .net breakage.
>
> These flags are pni, lm, nx, ssse3, syscall.
>
> Please try -cpu core2duo,-pni,-lm,-nx,-ssse3,-syscall.  If it works,
> remove features one by one until it doesn't and let us know the results.

I took all flags, same effect as without all of these flags.  :(

Any other idea?

- Erik
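
For anyone repeating the flag-by-flag test suggested above, the sequence of -cpu arguments could be generated roughly like this. It is a sketch only: `cpu_option` and `elimination_steps` are hypothetical helper names, and the order in which flags are re-enabled is an arbitrary assumption.

```python
# Flags listed in this thread as enabled by -cpu core2duo.
FLAGS = ["pni", "lm", "nx", "ssse3", "syscall"]

def cpu_option(model, disabled):
    """Build a -cpu argument that turns off the given flags."""
    return ",".join([model] + ["-" + flag for flag in disabled])

def elimination_steps(model, flags):
    """Yield -cpu arguments, starting with every flag disabled and
    re-enabling one more flag (from the end of the list) at each step."""
    for keep_disabled in range(len(flags), -1, -1):
        yield cpu_option(model, flags[:keep_disabled])

for option in elimination_steps("core2duo", FLAGS):
    print("-cpu", option)
```

The first step that breaks the guest points at the flag that was re-enabled last.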

Re: 2.6.31.4 panic: CRED: put_cred_rcu() sees ffff880204e58c00 with usage 82150912

On Fri, Oct 30, 2009 at 12:15:34PM +0100, Nikola Ciprich wrote:
> Ouch, typo in subject, it's 2.6.31.1 of course. sorry about that.
> also CCing kvm.
> n.
> 
> On Fri, Oct 30, 2009 at 12:06:32PM +0100, Nikola Ciprich wrote:
> > Hi,
> > some time ago, I updated my KVM hosting machine to 2.6.31.1 and it just 
> > died horribly:

Nikola,

Upgraded from what? Did you experience the crash again?

> > Oct 30 10:45:17 vbox [706369.133516] Kernel panic - not syncing: CRED: 
> > put_cred_rcu() sees 880204e58c00 with usage 82150912
> > Oct 30 10:45:17 vbox [706369.133519]
> > Oct 30 10:45:17 vbox [706369.144990] Pid: 19, comm: ksoftirqd/5 Not tainted 
> > 2.6.31lb.02 #1
> > Oct 30 10:45:17 vbox [706369.151554] Call Trace:
> > Oct 30 10:45:17 vbox [706369.154332]  
> > Oct 30 10:45:17 vbox [] panic+0xaa/0x180
> > Oct 30 10:45:17 vbox [706369.160280]  [] ? 
> > _spin_unlock+0x30/0x60
> > Oct 30 10:45:17 vbox [706369.166256]  [] ? 
> > add_partial+0x21/0x90
> > Oct 30 10:45:17 vbox [706369.172155]  [] ? 
> > __slab_free+0x92/0x3c0
> > Oct 30 10:45:17 vbox [706369.178127]  [] ? 
> > file_free_rcu+0x37/0x50
> > Oct 30 10:45:17 vbox [706369.184198]  [] 
> > put_cred_rcu+0x75/0x80
> > Oct 30 10:45:17 vbox [706369.190008]  [] 
> > __rcu_process_callbacks+0x125/0x250
> > Oct 30 10:45:17 vbox [706369.197020]  [] 
> > rcu_process_callbacks+0x39/0x60
> > Oct 30 10:45:17 vbox [706369.203624]  [] 
> > __do_softirq+0xc1/0x250
> > Oct 30 10:45:17 vbox [706369.209506]  [] ? 
> > ksoftirqd+0x0/0x1a0
> > Oct 30 10:45:17 vbox [706369.215182]  [] 
> > call_softirq+0x1c/0x30
> > Oct 30 10:45:17 vbox [706369.220986]  [] 
> > do_softirq+0x3d/0x80
> > Oct 30 10:45:17 vbox [706369.227317]  [] ? 
> > ksoftirqd+0x0/0x1a0
> > Oct 30 10:45:17 vbox [706369.233020]  [] 
> > ksoftirqd+0x84/0x1a0
> > Oct 30 10:45:17 vbox [706369.238622]  [] kthread+0xa6/0xb0
> > Oct 30 10:45:17 vbox [706369.243956]  [] 
> > child_rip+0xa/0x20
> > Oct 30 10:45:17 vbox [706369.249390]  [] ? 
> > kthread+0x0/0xb0
> > Oct 30 10:45:17 vbox [706369.254806]  [] ? 
> > child_rip+0x0/0x20
> > Oct 30 10:45:17 vbox [706369.260454] Rebooting in 10 seconds..
> > (trace is obtained from netconsole, so hopefully it's not mangled).
> > The machine was running ~30 KVM guests, it's 8CPU 16GB x86_64, when it 
> > crashed, it was only
> > moderately loaded. Never had this (or any other) kind of crash on it before.
> > I know there's 2.6.31.5 already out, but I'm not sure if some related 
> > problem has been
> > reported/fixed and I'm obviously not able to quicky test/reproduce it with 
> > latest kernel,
> > so I'm rather reporting.
> > Should more information/testing/etc be required, I'll be glad to help
> > regards
> > nik


Re: [PATCH 03/11] Handle asynchronous page fault in a PV guest.


On 11/03/2009 04:32 PM, Marcelo Tosatti wrote:
> Any attempt to access the swapped out data will cause a #PF vmexit,
> since the translation is marked as not present. If there's swapin in
> progress, you wait for that swapin, otherwise start swapin and wait.
>
> Its not as efficient as paravirt because you have to wait for a timer
> interrupt and the guest scheduler to decide to taskswitch, but OTOH its
> transparent.

With a dyntick guest the timer interrupt will come at the end of the
time slice, likely after the page has been swapped in.  That leaves smp
reschedule interrupts and non-dyntick guests.

An advantage is that there is one code path for apf and non-apf.
Another is that interrupts are processed, improving timekeeping and
maybe responsiveness.



--
error compiling committee.c: too many arguments to function


Re: [PATCH 03/11] Handle asynchronous page fault in a PV guest.

On Tue, Nov 03, 2009 at 04:25:33PM +0200, Gleb Natapov wrote:
> On Tue, Nov 03, 2009 at 12:14:23PM -0200, Marcelo Tosatti wrote:
> > On Sun, Nov 01, 2009 at 01:56:22PM +0200, Gleb Natapov wrote:
> > > Asynchronous page fault notifies vcpu that page it is trying to access
> > > is swapped out by a host. In response guest puts a task that caused the
> > > fault to sleep until page is swapped in again. When missing page is
> > > brought back into the memory guest is notified and task resumes execution.
> > 
> > Can't you apply this to non-paravirt guests, and continue to deliver
> > interrupts while waiting for the swapin? 
> > 
> > It should allow the guest to schedule a different task.
> But how can I make the guest to not run the task that caused the fault?

Any attempt to access the swapped out data will cause a #PF vmexit,
since the translation is marked as not present. If there's swapin in
progress, you wait for that swapin, otherwise start swapin and wait.

It's not as efficient as paravirt because you have to wait for a timer
interrupt and the guest scheduler to decide to taskswitch, but OTOH it's
transparent.


Re: [PATCH 03/11] Handle asynchronous page fault in a PV guest.

2009-11-03 Thread Gleb Natapov

On Tue, Nov 03, 2009 at 12:14:23PM -0200, Marcelo Tosatti wrote:
> On Sun, Nov 01, 2009 at 01:56:22PM +0200, Gleb Natapov wrote:
> > Asynchronous page fault notifies vcpu that page it is trying to access
> > is swapped out by a host. In response guest puts a task that caused the
> > fault to sleep until page is swapped in again. When missing page is
> > brought back into the memory guest is notified and task resumes execution.
> 
> Can't you apply this to non-paravirt guests, and continue to deliver
> interrupts while waiting for the swapin? 
> 
> It should allow the guest to schedule a different task.
But how can I make the guest to not run the task that caused the fault?

--
Gleb.

Re: [PATCH 03/11] Handle asynchronous page fault in a PV guest.

On Sun, Nov 01, 2009 at 01:56:22PM +0200, Gleb Natapov wrote:
> Asynchronous page fault notifies vcpu that page it is trying to access
> is swapped out by a host. In response guest puts a task that caused the
> fault to sleep until page is swapped in again. When missing page is
> brought back into the memory guest is notified and task resumes execution.

Can't you apply this to non-paravirt guests, and continue to deliver
interrupts while waiting for the swapin? 

It should allow the guest to schedule a different task.


Re: [PATCH] tests: The order of the fields are in reverse one isr stack.

Applied both, thanks.

On Thu, Oct 29, 2009 at 11:12:57AM +0200, Gleb Natapov wrote:
> 
> Signed-off-by: Gleb Natapov 
> diff --git a/kvm/user/test/x86/apic.c b/kvm/user/test/x86/apic.c
> index 4e89c77..b6718ec 100644
> --- a/kvm/user/test/x86/apic.c
> +++ b/kvm/user/test/x86/apic.c

Re: [PATCH] KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG

On Fri, Oct 30, 2009 at 12:46:59PM +0100, Jan Kiszka wrote:
> Decouple KVM_GUESTDBG_INJECT_DB and KVM_GUESTDBG_INJECT_BP from
> KVM_GUESTDBG_ENABLE, their are actually orthogonal. At this chance,
> avoid triggering the WARN_ON in kvm_queue_exception if there is already
> an exception pending and reject such invalid requests.
> 
> Signed-off-by: Jan Kiszka 

Applied, thanks.


Re: [PATCH] KVM: x86: Fix KVM_GET_CLOCK

On Tue, Nov 03, 2009 at 12:49:05PM +0100, Jan Kiszka wrote:
> The flags field of kvm_clock_data is supposed to indicate the
> availability of additional fields one day. There are none yet, so clear
> it. Moreover, drop the bogus check of this field and return 0 on
> success.
> 
> Signed-off-by: Jan Kiszka 
> ---
> 
> Note: This replaces "Clear flags field on return from KVM_GET_CLOCK".
> 
>  arch/x86/kvm/x86.c |8 +++-
>  1 files changed, 3 insertions(+), 5 deletions(-)

Applied, thanks.


KVM Live Migration

2009-11-03 Thread Gilberto Nunes Ferreira

Hi all,

This is my first post...

I successfully migrated a VM with these commands
(on kvm-0):

r...@kvm-0:~# virsh list --all
Connecting to uri: qemu:///system
 Id Name                 State
----------------------------------
  1 win2003              running

r...@kvm-0:~# virsh migrate --live win2003 qemu+ssh://kvm-1/system

But when the VM starts on kvm-1, it starts in a paused state!

I have no idea why this happens.

So, to unpause the VM, I log in on kvm-1 and run virt-manager to
unpause it...

I use DRBD as shared storage for the VMs...

Any help will be welcome.

Thanks

Gilberto Nunes Ferreira
IT
Selbetti Gestão de Documentos
Phone: +55 (47) 3441-6004
Mobile: +55 (47) 8861-6672


Re: [PATCHv6 1/3] tun: export underlying socket

2009-11-03 Thread Arnd Bergmann

On Tuesday 03 November 2009, Michael S. Tsirkin wrote:
> > What was your reason for changing?
> 
> It turns out socket structure is really bound to specific a file, so we
> can not have 2 files referencing the same socket.  Instead, as I say
> above, it's possible to make sendmsg/recvmsg work on tap file directly.

Ah, I see.

> I have implemented this (patch below), but decided to go with the simple
> thing first.  Since no userspace-visible changes are involved, let's do
> this by small steps: it will be easier to figure out when vhost
> is upstream.

This may even make it easier for me to do the same with macvtap
if I resume work on that.

> @@ -416,8 +422,8 @@ int sock_map_fd(struct socket *sock, int flags)
>  
>  static struct socket *sock_from_file(struct file *file, int *err)
>  {
> -   if (file->f_op == &socket_file_ops)
> -   return file->private_data;  /* set in sock_map_fd */
> +   if (file->f_op->get_socket)
> +   return file->f_op->get_socket(file);
>  
> *err = -ENOTSOCK;

Or maybe do both (socket_file_ops and get_socket), to avoid an indirect
function call.

Arnd <><

Re: [PATCHv6 1/3] tun: export underlying socket

On Tue, Nov 03, 2009 at 01:12:33PM +0100, Arnd Bergmann wrote:
> On Monday 02 November 2009, Michael S. Tsirkin wrote:
> > Tun device looks similar to a packet socket
> > in that both pass complete frames from/to userspace.
> > 
> > This patch fills in enough fields in the socket underlying tun driver
> > to support sendmsg/recvmsg operations, and message flags
> > MSG_TRUNC and MSG_DONTWAIT, and exports access to this socket
> > to modules.  Regular read/write behaviour is unchanged.
> > 
> > This way, code using raw sockets to inject packets
> > into a physical device, can support injecting
> > packets into host network stack almost without modification.
> > 
> > First user of this interface will be vhost virtualization
> > accelerator.
> 
> You mentioned before that you wanted to export the socket
> using some ioctl function returning an open file descriptor,
> which seemed to be a cleaner approach than this one.

Note that a similar feature can be implemented on top of tun_get_socket,
as seen from patch below.

> What was your reason for changing?

It turns out the socket structure is really bound to a specific file, so we
cannot have 2 files referencing the same socket.  Instead, as I say
above, it's possible to make sendmsg/recvmsg work on the tap file directly.

For vhost, the advantage of such a feature over using tun_get_socket
directly would be that vhost module won't depend on tun module then.  I
have implemented this (patch below), but decided to go with the simple
thing first.  Since no userspace-visible changes are involved, let's do
this by small steps: it will be easier to figure out when vhost
is upstream.


---

Note: patch below applies on top of patch tun: export underlying socket.
It is not intended for merge yet.

net: convert tun device to socket

Add callback to file_ops to retrieve socket from
file structure. Use this to make tun character device
accept sendmsg/recvmsg calls.

Signed-off-by: Michael S. Tsirkin 

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index b58095a..53e1806 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1405,7 +1405,8 @@ static const struct file_operations tun_fops = {
.unlocked_ioctl = tun_chr_ioctl,
.open   = tun_chr_open,
.release = tun_chr_close,
-   .fasync = tun_chr_fasync
+   .fasync = tun_chr_fasync,
+   .get_socket = tun_get_socket,
 };
 
 static struct miscdevice tun_miscdev = {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2620a8c..f2b381f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1506,6 +1506,9 @@ struct file_operations {
ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
int (*setlease)(struct file *, long, struct file_lock **);
+#ifdef CONFIG_NET
+   struct socket *(*get_socket)(struct file *file);
+#endif
 };
 
 struct inode_operations {
diff --git a/net/socket.c b/net/socket.c
index 9dff31c..700efcb 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -119,6 +119,11 @@ static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len,
unsigned int flags);
 
+static struct socket *sock_get_socket(struct file *file)
+{
+   return file->private_data;  /* set in sock_map_fd */
+}
+
 /*
  * Socket files have a set of 'special' operations as well as the generic file ones. These don't appear
  * in the operation structures but are done directly via the socketcall() multiplexor.
@@ -141,6 +146,7 @@ static const struct file_operations socket_file_ops = {
.sendpage = sock_sendpage,
.splice_write = generic_splice_sendpage,
.splice_read =  sock_splice_read,
+   .get_socket =   sock_get_socket,
 };
 
 /*
@@ -416,8 +422,8 @@ int sock_map_fd(struct socket *sock, int flags)
 
 static struct socket *sock_from_file(struct file *file, int *err)
 {
-   if (file->f_op == &socket_file_ops)
-   return file->private_data;  /* set in sock_map_fd */
+   if (file->f_op->get_socket)
+   return file->f_op->get_socket(file);
 
*err = -ENOTSOCK;
return NULL;


-- 
MST

Re: [KVM-AUTOTEST][PATCH] Fix kvm_config.py -f mode

2009-11-03 Thread Lucas Meneghel Rodrigues

Ooops, fixed. Thanks Ryan!

On Mon, Nov 2, 2009 at 9:28 PM, Ryan Harper  wrote:
> kvm_config.py supports specifying a different filename for
> test config.  This patch fixes the option parsing parameters.
> Currently it uses 'store_true' which stores the value True into
> the filename variable; we want the 'store' mode which will store
> the value of the option (aka, the filename) in the variable.
>
> Signed-off-by: Ryan Harper 
>
> diff --git a/client/tests/kvm/kvm_config.py b/client/tests/kvm/kvm_config.py
> index 3114c07..52de4c7 100755
> --- a/client/tests/kvm/kvm_config.py
> +++ b/client/tests/kvm/kvm_config.py
> @@ -501,7 +501,7 @@ class config:
>
>  if __name__ == "__main__":
>     parser = optparse.OptionParser()
> -    parser.add_option('-f', '--file', dest="filename", action='store_true',
> +    parser.add_option('-f', '--file', dest="filename", action='store',
>                       help='path to a config file that will be parsed. '
>                            'If not specified, will parse kvm_tests.cfg '
>                            'located inside the kvm test dir.')
>
>
>
> --
> Ryan Harper
> Software Engineer; Linux Technology Center
> IBM Corp., Austin, Tx
> ry...@us.ibm.com
>
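
For the record, the difference between the two optparse actions described in the patch can be reproduced with a small standalone script. This is a sketch, not the kvm_config.py code: `parse` is a hypothetical helper and `my_tests.cfg` is a made-up file name.

```python
import optparse

def parse(action, argv):
    """Parse argv with -f/--file declared using the given optparse action."""
    parser = optparse.OptionParser()
    parser.add_option('-f', '--file', dest="filename", action=action)
    return parser.parse_args(argv)

argv = ['-f', 'my_tests.cfg']  # hypothetical file name

# store_true: -f is a boolean switch, so the file name is left over
# as a positional argument and filename is just True.
options, args = parse('store_true', argv)
print(options.filename, args)

# store: -f consumes the next argument as its value.
options, args = parse('store', argv)
print(options.filename, args)
```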



-- 
Lucas

Re: [PATCHv6 1/3] tun: export underlying socket

2009-11-03 Thread Arnd Bergmann

On Monday 02 November 2009, Michael S. Tsirkin wrote:
> Tun device looks similar to a packet socket
> in that both pass complete frames from/to userspace.
> 
> This patch fills in enough fields in the socket underlying tun driver
> to support sendmsg/recvmsg operations, and message flags
> MSG_TRUNC and MSG_DONTWAIT, and exports access to this socket
> to modules.  Regular read/write behaviour is unchanged.
> 
> This way, code using raw sockets to inject packets
> into a physical device, can support injecting
> packets into host network stack almost without modification.
> 
> First user of this interface will be vhost virtualization
> accelerator.

You mentioned before that you wanted to export the socket
using some ioctl function returning an open file descriptor,
which seemed to be a cleaner approach than this one.

What was your reason for changing?

> index 3f5fd52..404abe0 100644
> --- a/include/linux/if_tun.h
> +++ b/include/linux/if_tun.h
> @@ -86,4 +86,18 @@ struct tun_filter {
> __u8   addr[0][ETH_ALEN];
>  };
>  
> +#ifdef __KERNEL__
> +#if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
> +struct socket *tun_get_socket(struct file *);
> +#else
> +#include 
> +#include 
> +struct file;
> +struct socket;
> +static inline struct socket *tun_get_socket(struct file *f)
> +{
> +   return ERR_PTR(-EINVAL);
> +}
> +#endif /* CONFIG_TUN */
> +#endif /* __KERNEL__ */
>  #endif /* __IF_TUN_H */

Is this a leftover from testing? Exporting the function for !__KERNEL__
seems pointless.

Arnd <><


Re: [PATCHv6 3/3] vhost_net: a kernel-level virtio server

On Mon, Nov 02, 2009 at 04:05:58PM -0800, Daniel Walker wrote:
> 
> Random style issues below .. Part of this is just stuff checkpatch
> found.

Thanks very much, I'll fix these.

[PATCH] KVM: x86: Fix KVM_GET_CLOCK

2009-11-03 Thread Jan Kiszka

The flags field of kvm_clock_data is supposed to indicate the
availability of additional fields one day. There are none yet, so clear
it. Moreover, drop the bogus check of this field and return 0 on
success.

Signed-off-by: Jan Kiszka 
---

Note: This replaces "Clear flags field on return from KVM_GET_CLOCK".

 arch/x86/kvm/x86.c |8 +++-
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1f68798..7344405 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2577,14 +2577,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
ktime_get_ts(&now);
now_ns = timespec_to_ns(&now);
user_ns.clock = kvm->arch.kvmclock_offset + now_ns;
+   user_ns.flags = 0;
 
+   r = -EFAULT;
if (copy_to_user(argp, &user_ns, sizeof(user_ns)))
-   r =  -EFAULT;
-
-   r = -EINVAL;
-   if (user_ns.flags)
goto out;
-
+   r = 0;
break;
}
 
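
Since the hunk is short, the change in the return path can be modeled outside the kernel. The following Python sketch follows the lines of the diff above and nothing else; `get_clock_old` and `get_clock_new` are hypothetical names, and `copy_fails` stands in for a failing copy_to_user().

```python
import errno

def get_clock_old(copy_fails, flags):
    """Model of the return path before the patch."""
    r = 0
    if copy_fails:
        r = -errno.EFAULT
    r = -errno.EINVAL   # unconditionally overwrites any -EFAULT
    if flags:
        return r        # 'goto out'
    return r            # 'break' with r still set to -EINVAL

def get_clock_new(copy_fails):
    """Model of the return path after the patch (flags is cleared to 0)."""
    r = -errno.EFAULT
    if copy_fails:
        return r        # 'goto out'
    r = 0
    return r            # 'break' with r == 0

print(get_clock_old(False, 0), get_clock_new(False))
print(get_clock_old(True, 0), get_clock_new(True))
```

In this model the old path never yields 0, and a failed copy is reported as -EINVAL rather than -EFAULT; the new path returns 0 on success and -EFAULT on a failed copy.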

Re: [PATCHv4 0/6] qemu-kvm: vhost net support

On Mon, Nov 02, 2009 at 04:58:39PM -0600, Anthony Liguori wrote:
> Hi Michael,
>
> I'll reserve individual patch review until they're in a mergable state,  
> but I do have some comments about the overall integration architecture.
>
> Generally speaking, I think the integration unnecessarily invasive.  It  
> adds things to the virtio infrastructure that shouldn't be there like  
> the irqfd/queuefd bindings.  It also sneaks in things like raw backend  
> support which really isn't needed.
>
> I think we can do better.  Here's what I suggest:
>
> The long term goal should be to have a NetDevice interface that looks  
> very much like virtio-net but as an API, not an ABI.  Roughly, it would  
> look something like:
>
> struct NetDevice {
>   int add_xmit(NetDevice *dev, struct iovec *iov, int iovcnt, void *token);
>   int add_recv(NetDevice *dev, struct iovec *iov, int iovcnt, void *token);
>
>   void *get_xmit(NetDevice *dev);
>   void *get_recv(NetDevice *dev);
>
>   void kick(NetDevice *dev);
>
>   ...
> };
>
> That gives us a better API for use with virtio-net, e1000, etc.

This is not much different from what we have now with VLANClientState,
is it?
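
Editor's sketch: for readers following the proposal, the quoted NetDevice outline could be spelled as compilable C roughly like this. The function-pointer layout, the opaque field and the single-slot toy backend are assumptions made for illustration; none of this is QEMU code, and it does not settle whether VLANClientState already covers the same ground.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

typedef struct NetDevice NetDevice;

/* The quoted outline as a table of operations (hypothetical layout). */
struct NetDevice {
    int   (*add_xmit)(NetDevice *dev, struct iovec *iov, int iovcnt, void *token);
    int   (*add_recv)(NetDevice *dev, struct iovec *iov, int iovcnt, void *token);
    void *(*get_xmit)(NetDevice *dev);   /* returns a completed token or NULL */
    void *(*get_recv)(NetDevice *dev);
    void  (*kick)(NetDevice *dev);
    void  *opaque;                       /* backend-private state */
};

/* Toy backend: one pending and one completed transmit token. */
struct toy_state {
    void *pending;
    void *done;
};

static int toy_add_xmit(NetDevice *dev, struct iovec *iov, int iovcnt, void *token)
{
    struct toy_state *s = dev->opaque;

    (void)iov;
    (void)iovcnt;
    if (s->pending)
        return -1;                       /* queue full */
    s->pending = token;
    return 0;
}

/* Kicking "transmits" the pending buffer and marks it completed. */
static void toy_kick(NetDevice *dev)
{
    struct toy_state *s = dev->opaque;

    if (s->pending) {
        s->done = s->pending;
        s->pending = NULL;
    }
}

static void *toy_get_xmit(NetDevice *dev)
{
    struct toy_state *s = dev->opaque;
    void *token = s->done;

    s->done = NULL;
    return token;
}
```

A frontend would queue a buffer with add_xmit, call kick, and later collect the token through get_xmit, which is the virtio-style add/kick/get cycle the proposal describes.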

> Assuming we had this interface, I think a natural extension would be:
>
> int add_ring(NetDevice *dev, void *address);
> int add_kickfd(NetDevice *dev, int fd);
>
> For slot management, it really should happen outside of the NetDevice  
> structure.  We'll need a slot notifier mechanism such that we can keep  
> this up to date as things change.

Yes.

> vhost-net becomes a NetDevice.  It can support things like the e1000 by
> doing ring translation behind the scenes.

And the point would be?

> virtio-net can be fast pathed  
> in the case that we're using KVM but otherwise, it would also rely on  
> the ring translation.

Won't it be easier to just keep using existing code?

>  N.B. in the case vhost-net is fast pathed, it requires a different
>  device in QEMU that uses a separate virtio transport.  We should
>  reuse as much code as possible obviously.  It doesn't make sense to
>  have all of the virtio-pci code and virtio-net code in place when we
>  aren't using it.

Note that all of virtio-pci and setup parts of virtio-net are reused.
The only things we are *not* re-using are send/receive and callbacks in
virtio-net.

> All this said, I'm *not* suggesting you have to implement all of this to  
> get vhost-net merged.  Rather, I'm suggesting that we should try to  
> structure the current vhost-net implementation to complement this  
> architecture assuming we all agree this is the sane thing to do.  That  
> means I would make the following changes to your series:
>
> - move vhost-net support to a VLANClientState backend.
> - do not introduce a raw socket backend
> - if for some reason you want to back to tap and raw, those should be  
> options to the vhost-net backend.
> - when fast pathing with vhost-net, we should introduce interfaces to  
> VLANClientState similar to add_ring and add_kickfd.  They'll be very  
> specific to vhost-net for now, but that's okay.
> - sort out the layering of vhost-net within the virtio infrastructure.
> vhost-net should really be its own qdev device.  I don't see very much
> code reuse happening right now so I don't understand why it's not that
> way currently.
>
> Regards,
>
> Anthony Liguori

What you propose short-term is workable.  So basically, vhost would be
an option supported by backends.  virtio net would go ahead and activate
it if available and other frontends will ignore it and just keep
injecting packets through regular interfaces.

-- 
MST

Re: error while loading state for instance 0x0 of device 'kvmclock'

On Mon, Nov 02, 2009 at 04:37:15PM +0100, Jan Kiszka wrote:
> Avi Kivity wrote:
> > On 11/02/2009 12:28 PM, Jan Kiszka wrote:
> >> Jan Kiszka wrote:
> >>
> >>> Hi,
> >>>
> >>> current qemu-kvm.git gives me the message "qemu: warning: error while
> >>> loading state for instance 0x0 of device 'kvmclock'" when I run a simple
> >>> "savevm" followed by a "loadvm 1". What's broken here?
> >>>  
> >> OK, this is due to "KVM: add flags to kvm_clock_data" (958b0c5497): the
> >> flags field is not cleared on KVM_SET_CLOCK. Will post a fix.
> >>
> >> But the above kernel commit is also broken: KVM_GET_CLOCK checks
> >> uninitialized user_ns.flags (probably instead of the user's value). This
> >> raises the question if the caller of KVM_GET_CLOCK is also supposed to
> >> pass kvm_clock_data with flags cleared down to the kernel. Could someone
> >> clarify this so I could fix it accordingly?
> >>
> > 
> > I'd make KVM_GET_CLOCK set the flags, not get them.  So if we add new 
> > fields, we just set a new bit and userspace can read it.
> 
> This makes sense and actually fixes the issue completely. Patch on its
> way...

Agreed. This is the best behaviour I can devise, indeed.

Re: [PATCH] KVM: x86: Clear flags field on return from KVM_GET_CLOCK

On Mon, Nov 02, 2009 at 04:41:54PM +0100, Jan Kiszka wrote:
> This field is supposed to indicate the availability of additional fields
> one day. There are none yet, so clear it - and drop the bogus check,
> too.
> 
> Signed-off-by: Jan Kiszka 
Makes sense.

Acked-by: Glauber Costa 



Re: [PATCHv4 1/6] qemu/virtio: move features to an inline function

On Mon, Nov 02, 2009 at 04:33:53PM -0600, Anthony Liguori wrote:
> Michael S. Tsirkin wrote:
>> devices should have the final say over which virtio features they
>> support. E.g. indirect entries may or may not make sense in the context
>> of virtio-console. In particular, for vhost, we do not want to report to
>> guest bits not supported by kernel backend.  Move the common bits from
>> virtio-pci to an inline function and let each device call it.
>>
>> No functional changes.
>>   
>
> This is a layering violation.  There are transport specific features and  
> device specific features.  The virtio-net device should have no  
> knowledge or nack'ing ability for transport features.

We could pass "vhost" flag to virtio, and have virtio query the device
for features. Would that be better?

> If you need to change transport features, it suggests you're modeling  
> things incorrectly and should be supplying an alternative transport  
> implementation.
> Regards,
>
> Anthony Liguori

Yes, you can make vhost an alternative transport in qemu.  This might be
one way to handle this. However, this seems to go contrary to your
previous proposal to make vhost a networking back end. Which will it be?
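
Editor's sketch: the disagreement is about who gets to mask feature bits. One way to picture the approach in the patch description (a shared inline helper that each device calls, with the backend able to veto bits) is shown below. The two bit numbers are the virtio feature bits as commonly defined; the helper names and the backend_mask parameter are hypothetical and not taken from the series.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_F_NOTIFY_ON_EMPTY     24
#define VIRTIO_RING_F_INDIRECT_DESC  28

/* Bits that used to be added unconditionally by the transport. */
static inline uint32_t common_features(void)
{
    return (1u << VIRTIO_F_NOTIFY_ON_EMPTY) |
           (1u << VIRTIO_RING_F_INDIRECT_DESC);
}

/* A device combines its own bits with the common ones, then drops
 * whatever the (hypothetical) kernel backend cannot handle. */
static uint32_t device_get_features(uint32_t device_bits, uint32_t backend_mask)
{
    return (device_bits | common_features()) & backend_mask;
}
```

The layering objection is visible in the sketch: the device ends up deciding about transport-level bits such as indirect descriptors, which is exactly what an alternative transport would avoid.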

-- 
MST

Re: [Qemu-devel] Re: [PATCH] Add VirtIO Frame Buffer Support

2009-11-03 Thread Vincent Hanquez

On Tue, Nov 03, 2009 at 07:39:34AM +0100, Alexander Graf wrote:
>
> On 03.11.2009, at 07:34, Avi Kivity wrote:
>
>> On 11/03/2009 08:27 AM, Alexander Graf wrote:
>>>
>>>> How does it work today?
>>>
>>> You boot into a TERM=dumb line based emulation on 3270 (worst thing  
>>> haunting people's nightmares ever), trying to get out of that mode  
>>> as quickly as possible and off into SSH / VNC.
>>
>> Despite the coolness factor, IMO a few minutes during install time do 
>> not justify a new hardware model and a new driver.
>
> It's more than just coolness factor. There are use cases out there 
> (www.susestudio.com) that don't want to rely on the guest exporting a VNC 
> server to the outside just to access graphics. You also want to see boot 
> messages, have a console login screen, be able to debug things without 
> switching between virtio-console and vnc, etc. etc.
>
> The hardware model isn't exactly new either. It's just the next logical 
> step to a full PV machine using virtio. If the virtio-fb stuff turns out 
> to be really fast and reliable, I could even imagine it being the default 
> target for kvm on ppc as well, as we can't switch resolutions on the fly 
> there atm.

not sure if i'm missing the point here, but couldn't it be hypothetically
extended to stuff 3d (or video & more 2d accel ?) commands too ? I can't
imagine the cirrus or stdvga driver be able to do that ever ;)

-- 
Vincent

Re: [Qemu-devel] Re: [PATCH] Add VirtIO Frame Buffer Support


On 11/03/2009 01:25 PM, Vincent Hanquez wrote:

> not sure if i'm missing the point here, but couldn't it be hypothetically
> extended to stuff 3d (or video & more 2d accel ?) commands too ? I can't
> imagine the cirrus or stdvga driver be able to do that ever ;)
   


cirrus has pretty good 2d acceleration.  3D is a mega-project though.

--
error compiling committee.c: too many arguments to function


Re: [PATCH 14/27] Add book3s_64 specific opcode emulation

2009-11-03 Thread Alexander Graf



On 03.11.2009, at 09:47, Segher Boessenkool wrote:


> Nice patchset.  Some comments on the emulation part:


Cool, thanks for looking through them!


>> +#define OP_31_XOP_EIOIO	854
>
> You mean EIEIO.


Probably, yeah.


>> +   case 19:
>> +   switch (get_xop(inst)) {
>> +   case OP_19_XOP_RFID:
>> +   case OP_19_XOP_RFI:
>> +   vcpu->arch.pc = vcpu->arch.srr0;
>> +   kvmppc_set_msr(vcpu, vcpu->arch.srr1);
>> +   *advance = 0;
>> +   break;
>
> I think you should only emulate the insns that exist on whatever the guest
> pretends to be.  RFID exists only on 64-bit implementations.  Same comment
> everywhere else.


True.




>> +   case OP_31_XOP_EIOIO:
>> +   break;
>
> Have you always executed an eieio or sync when you get here, or
> do you just not allow direct access to I/O devices?  Other context
> synchronising insns are not enough, they do not broadcast on the
> bus.


There is no device passthrough yet :-). It's theoretically possible,  
but nothing for it is implemented so far.





>> +   case OP_31_XOP_DCBZ:
>> +   {
>> +   ulong rb =  vcpu->arch.gpr[get_rb(inst)];
>> +   ulong ra = 0;
>> +   ulong addr;
>> +   u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
>> +
>> +   if (get_ra(inst))
>> +   ra = vcpu->arch.gpr[get_ra(inst)];
>> +
>> +   addr = (ra + rb) & ~31ULL;
>> +   if (!(vcpu->arch.msr & MSR_SF))
>> +   addr &= 0xffffffff;
>> +
>> +   if (kvmppc_st(vcpu, addr, 32, zeros)) {
>
> DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there
> are HID bits to make it work on 32 bytes only, and an extra DCBZL insn
> that always clears a full cache line (128 bytes).


Yes. We only come here when we patched the dcbz opcodes to invalid  
instructions because cache line size of target == 32.

On 970 with MSR_HV = 0 we actually use the dcbz 32-bytes mode.

Admittedly though, this could be a lot more clever.
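
Editor's sketch: the address computation under discussion, with the line size made a parameter so the 32-byte and 128-byte cases can be compared. The function name and parameters are hypothetical; the caller is assumed to pass ra = 0 when the RA field of the instruction is 0, as the quoted hunk does.

```c
#include <assert.h>
#include <stdint.h>

/* Compute the start of the block a dcbz would clear.
 * line_size must be a power of two (32 for the patched-opcode path,
 * 128 for a full 970 cache line). */
static uint64_t dcbz_target(uint64_t ra, uint64_t rb, int is_64bit,
                            uint64_t line_size)
{
    uint64_t addr;

    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

    addr = (ra + rb) & ~(line_size - 1);  /* align down to the block start */
    if (!is_64bit)
        addr &= 0xffffffffULL;            /* MSR[SF] clear: 32-bit addressing */
    return addr;
}
```

For example, dcbz_target(0x1005, 0x20, 1, 32) gives 0x1020, while the same operands with a 128-byte line give 0x1000, which is the difference Segher is pointing at.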


>> +   switch (sprn) {
>> +   case SPRN_IBAT0U ... SPRN_IBAT3L:
>> +   bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT0U) / 2];
>> +   break;
>> +   case SPRN_IBAT4U ... SPRN_IBAT7L:
>> +   bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT4U) / 2];
>> +   break;
>> +   case SPRN_DBAT0U ... SPRN_DBAT3L:
>> +   bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT0U) / 2];
>> +   break;
>> +   case SPRN_DBAT4U ... SPRN_DBAT7L:
>> +   bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT4U) / 2];
>> +   break;
>
> Do xBAT4..7 have the same SPR numbers on all CPUs?  They are CPU-specific
> SPRs, after all.  Some CPUs have only six, some only four, some none, btw.


For now only Linux runs which only uses the first 3(?) IIRC. But yes,  
it's probably worth looking into at one point or the other.
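
Editor's sketch: the SPR-to-BAT index mapping from the quoted hunk, written as a standalone function. Note that, as quoted, the IBAT4..7 case computes (sprn - SPRN_IBAT4U) / 2, which yields indices 0..3 again; whether that is intended cannot be told from this excerpt. The sketch assumes an eight-entry array and adds 4 for the upper bank. The SPR numbers are the ones I recall from Linux's asm/reg.h and should be treated as assumptions, which is also Segher's point about them being CPU-specific.

```c
#include <assert.h>

/* Assumed SPR numbers (upper/lower pairs are consecutive). */
enum {
    SPRN_IBAT0U = 0x210,   /* IBAT0U..IBAT3L: 0x210..0x217 */
    SPRN_IBAT4U = 0x230    /* IBAT4U..IBAT7L: 0x230..0x237 */
};

/* Map an instruction-BAT SPR number to an index 0..7, or -1 if the
 * SPR is not an IBAT register under the assumptions above. */
static int ibat_index(int sprn)
{
    if (sprn >= SPRN_IBAT0U && sprn < SPRN_IBAT0U + 8)
        return (sprn - SPRN_IBAT0U) / 2;
    if (sprn >= SPRN_IBAT4U && sprn < SPRN_IBAT4U + 8)
        return 4 + (sprn - SPRN_IBAT4U) / 2;
    return -1;
}

/* Even SPR numbers are the upper half of a BAT pair, odd ones the lower. */
static int bat_is_upper(int sprn)
{
    return (sprn & 1) == 0;
}
```

A per-CPU table of valid SPR ranges would be the natural extension if guests with four or six BATs are to be modelled faithfully.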





>> +   case SPRN_HID0:
>> +   to_book3s(vcpu)->hid[0] = vcpu->arch.gpr[rs];
>> +   break;
>> +   case SPRN_HID1:
>> +   to_book3s(vcpu)->hid[1] = vcpu->arch.gpr[rs];
>> +   break;
>> +   case SPRN_HID2:
>> +   to_book3s(vcpu)->hid[2] = vcpu->arch.gpr[rs];
>> +   break;
>> +   case SPRN_HID4:
>> +   to_book3s(vcpu)->hid[4] = vcpu->arch.gpr[rs];
>> +   break;
>> +   case SPRN_HID5:
>> +   to_book3s(vcpu)->hid[5] = vcpu->arch.gpr[rs];
>
> HIDs are different per CPU; and worse, different CPUs have different
> registers (SPR #s) for the same register name!


Sigh :-(


>> +   /* guest HID5 set can change is_dcbz32 */
>> +   if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
>> +   (mfmsr() & MSR_HV))
>> +   vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
>> +   break;
>
> Wait, does this mean you allow other HID writes when MSR[HV] isn't
> set?  All HIDs (and many other SPRs) cannot be read or written in
> supervisor mode.


When we're running in MSR_HV=0 mode on a 970 we can use the 32 byte  
dcbz HID flag. So all we need to do is tell our entry/exit code to set  
this bit.


If we're on 970 on a hypervisor or on a non-970 though we can't use  
the HID5 bit, so we need to binary patch the opcodes.


So in order to emulate real 970 behavior, we need to be able to  
emulate that HID5 bit too! That's what this chunk of code does - it  
basically sets us in dcbz32 mode when allowed on 970 guests.


Alex


Re: [PATCH 0/9] S390x KVM support


On 11/02/2009 10:23 PM, Alexander Graf wrote:

> Any progress on the patch? This is really important to make KVM work
> properly on S390. I'd even go as far as suggesting it for linux-stable.

   


I forgot all about it, sorry.  Marcelo, can you commit it?

--
error compiling committee.c: too many arguments to function


Re: [PATCH] Add VirtIO Frame Buffer Support


On 11/03/2009 10:26 AM, Alexander Graf wrote:
> Exactly. In fact, I'm even scared to reboot mine because I might end
> up in a 3270 terminal. The whole text only crap keeps people from
> using this platform! And that's what I want to change here.


Ok.  I oppose paravirtualization for its own sake and only support it if
there's no other way to get performance.  In this case it buys us basic
functionality which is surprisingly missing on native; that's arguably
even more important.


--
error compiling committee.c: too many arguments to function


Re: [PATCH 14/27] Add book3s_64 specific opcode emulation

2009-11-03 Thread Segher Boessenkool


Nice patchset.  Some comments on the emulation part:


+#define OP_31_XOP_EIOIO	854


You mean EIEIO.


+   case 19:
+   switch (get_xop(inst)) {
+   case OP_19_XOP_RFID:
+   case OP_19_XOP_RFI:
+   vcpu->arch.pc = vcpu->arch.srr0;
+   kvmppc_set_msr(vcpu, vcpu->arch.srr1);
+   *advance = 0;
+   break;


I think you should only emulate the insns that exist on whatever the guest
pretends to be.  RFID exists only on 64-bit implementations.  Same comment
everywhere else.


+   case OP_31_XOP_EIOIO:
+   break;


Have you always executed an eieio or sync when you get here, or
do you just not allow direct access to I/O devices?  Other context
synchronising insns are not enough, they do not broadcast on the
bus.


+   case OP_31_XOP_DCBZ:
+   {
+   ulong rb =  vcpu->arch.gpr[get_rb(inst)];
+   ulong ra = 0;
+   ulong addr;
+   u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
+
+   if (get_ra(inst))
+   ra = vcpu->arch.gpr[get_ra(inst)];
+
+   addr = (ra + rb) & ~31ULL;
+   if (!(vcpu->arch.msr & MSR_SF))
+   addr &= 0xffffffff;
+
+   if (kvmppc_st(vcpu, addr, 32, zeros)) {


DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there
are HID bits to make it work on 32 bytes only, and an extra DCBZL insn
that always clears a full cache line (128 bytes).


+   switch (sprn) {
+   case SPRN_IBAT0U ... SPRN_IBAT3L:
+   bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT0U) / 2];
+   break;
+   case SPRN_IBAT4U ... SPRN_IBAT7L:
+   bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT4U) / 2];
+   break;
+   case SPRN_DBAT0U ... SPRN_DBAT3L:
+   bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT0U) / 2];
+   break;
+   case SPRN_DBAT4U ... SPRN_DBAT7L:
+   bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT4U) / 2];
+   break;


Do xBAT4..7 have the same SPR numbers on all CPUs?  They are CPU-specific
SPRs, after all.  Some CPUs have only six, some only four, some none, btw.



+   case SPRN_HID0:
+   to_book3s(vcpu)->hid[0] = vcpu->arch.gpr[rs];
+   break;
+   case SPRN_HID1:
+   to_book3s(vcpu)->hid[1] = vcpu->arch.gpr[rs];
+   break;
+   case SPRN_HID2:
+   to_book3s(vcpu)->hid[2] = vcpu->arch.gpr[rs];
+   break;
+   case SPRN_HID4:
+   to_book3s(vcpu)->hid[4] = vcpu->arch.gpr[rs];
+   break;
+   case SPRN_HID5:
+   to_book3s(vcpu)->hid[5] = vcpu->arch.gpr[rs];


HIDs are different per CPU; and worse, different CPUs have different
registers (SPR #s) for the same register name!


+   /* guest HID5 set can change is_dcbz32 */
+   if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
+   (mfmsr() & MSR_HV))
+   vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
+   break;


Wait, does this mean you allow other HID writes when MSR[HV] isn't
set?  All HIDs (and many other SPRs) cannot be read or written in
supervisor mode.


Segher


Re: [PATCH] Add VirtIO Frame Buffer Support

2009-11-03 Thread Alexander Graf



On 03.11.2009, at 09:20, Avi Kivity wrote:


On 11/03/2009 09:50 AM, Alexander Graf wrote:


Ok, imagine this was not this unloved S390 odd architecture but  
X86. The only output choices you have are:


1) virtio-console
2) VNC / SSH over network
3) virtio-fb

Now you want to configure a server, probably using yast and all  
those nice graphical utilities, but still enable a firewall so  
people outside don't intrude your machine. Well, you managed to  
configure the firewall by luck to allow VNC, but now you  
reconfigured it and something broke - but VNC was your only chance  
to access the machine. Oops...


x86 has real framebuffers, so software and people expect it.  s390  
doesn't.  How do people manage now?


They cope with what's there. Fortunately we're in a position to change  
things, so we don't have to stick with the worse.



You also want to see boot messages, have a console login screen,


virtio-console does that, except for the penguins.  Better, since  
you can scroll back.


It doesn't do graphics. Ever used yast in text mode?


Once you're in, start ssh+X or vnc.  Again, what do people do now?


Exactly that. Again, it works but is not ideal. If we can improve user  
experience why work against it?


The hardware model isn't exactly new either. It's just the next  
logical step to a full PV machine using virtio. If the virtio-fb  
stuff turns out to be really fast and reliable, I could even  
imagine it being the default target for kvm on ppc as well, as we  
can't switch resolutions on the fly there atm.




We could with vmware-vga.


The vmware-port stuff is pretty much tied onto X86. I don't think  
modifying EAX is that easy on PPC ;-).


Yes, though we can probably make it work on ppc with minimal  
modifications.


Is it worth it? We can also just implement a virtio mouse event dev  
plus fb and be good. That way we control the whole stack without  
risking to break vmware.


Why?  the guest will typically have networking when it's set up,  
so it should have network access during install.  You can easily  
use slirp redirection and the built-in dhcp server to set this  
up with relatively few hassles.


That's how I use it right now. It's no fun.



The toolstack should hide the unfun parts.


You can't hide guest configuration. We as a distribution control  
the kernel. We don't control the user's configuration as that's by  
design the user's choice. The only thing we can do is give users  
meaningful choices to choose from - and having graphics available  
is definitely one of them.


Well, if the user chooses not to have networking then vnc or ssh+x  
definitely fail.  That would be a strange choice for a server machine.


It's actually rather common on S390, though admittedly not that much  
on Linux+S390. There are more ways for inter node communication than  
networking. You can talk to another VM on the same machine without any  
network whatsoever. That way you can set up an isolated job (your bank  
transfer database for example) that is always protected by a proxy to  
the outside world.


Seriously, try to ask someone internally to get access to an S390.  
I think you'll understand my motivations a lot better after having  
used it for a bit.


I actually have a s390 vm (RHEL 4 IIRC).  It acts just like any  
other remote machine over ssh except that it's especially slow  
(probably the host is overloaded).  Of course I wouldn't dream of  
trying to install something like that though.


Exactly. In fact, I'm even scared to reboot mine because I might end  
up in a 3270 terminal. The whole text only crap keeps people from  
using this platform! And that's what I want to change here.


Alex

Re: [PATCH] Add VirtIO Frame Buffer Support