Re: kvm problem: bonding network interface breaks dhcp
Hi Matt,

Matthew Palmer wrote:
>
> The output of brctl show, ip addr list, and cat /proc/net/bonding/bond*
> might be helpful.
>

Sure. Using the bridge on the bonding interface (while the guest was
running) I got:

# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.001517ab0a59       no              bond0
                                                        vnet0

# ip addr list
1: lo: mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth2: mtu 1500 qdisc pfifo_fast master bond0 state UP qlen 1000
    link/ether 00:15:17:ab:0a:59 brd ff:ff:ff:ff:ff:ff
3: eth1: mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:30:48:c6:e0:98 brd ff:ff:ff:ff:ff:ff
4: eth3: mtu 1500 qdisc pfifo_fast master bond0 state UP qlen 1000
    link/ether 00:15:17:ab:0a:59 brd ff:ff:ff:ff:ff:ff
5: _rename: mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:30:48:c6:e0:99 brd ff:ff:ff:ff:ff:ff
6: eth4: mtu 1500 qdisc pfifo_fast master bond0 state UP qlen 1000
    link/ether 00:15:17:ab:0a:59 brd ff:ff:ff:ff:ff:ff
7: eth5: mtu 1500 qdisc pfifo_fast master bond0 state UP qlen 1000
    link/ether 00:15:17:ab:0a:59 brd ff:ff:ff:ff:ff:ff
8: bond0: mtu 1500 qdisc noqueue state UP
    link/ether 00:15:17:ab:0a:59 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::215:17ff:feab:a59/64 scope link
       valid_lft forever preferred_lft forever
51: br0: mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 00:15:17:ab:0a:59 brd ff:ff:ff:ff:ff:ff
    inet 172.19.96.25/23 brd 172.19.97.255 scope global br0
    inet6 fe80::215:17ff:feab:a59/64 scope link
       valid_lft forever preferred_lft forever
52: vnet0: mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether c6:d7:7b:fb:02:35 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::c4d7:7bff:fefb:235/64 scope link
       valid_lft forever preferred_lft forever

# cat /proc/net/bonding/bond*
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:15:17:ab:0a:59

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:15:17:ab:0a:58

Slave Interface: eth4
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:15:17:ab:0a:5b

Slave Interface: eth5
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:15:17:ab:0a:5a

Without bonding I got:

# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.001517ab0a59       no              eth2
                                                        vnet0

# ip addr list
1: lo: mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth2: mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:15:17:ab:0a:59 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::215:17ff:feab:a59/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:30:48:c6:e0:98 brd ff:ff:ff:ff:ff:ff
4: eth3: mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether 00:15:17:ab:0a:58 brd ff:ff:ff:ff:ff:ff
5: _rename: mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:30:48:c6:e0:99 brd ff:ff:ff:ff:ff:ff
6: eth4: mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether 00:15:17:ab:0a:5b brd ff:ff:ff:ff:ff:ff
7: eth5: mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether 00:15:17:ab:0a:5a brd ff:ff:ff:ff:ff:ff
8: bond0: mtu 1500 qdisc noqueue state DOWN
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
53: br0: mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 00:15:17:ab:0a:59 brd ff:ff:ff:ff:ff:ff
    inet 172.19.96.25/23 brd 172.19.97.255 scope global br0
    inet6 fe80::215:17ff:feab:a59/64 scope link
       valid_lft forever preferred_lft forever
54: vnet0: mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether fe:2f:ce:cc:ec:ac brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc2f:ceff:fecc:ecac/64 scope link
       valid_lft forever preferred_lft forever

Hope this helps.
Regards Harri -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: xp guest, blue screen c0000221 on boot
On 11/04/2009 05:45 AM, Andrew Olney wrote:
> OK, if I convert the raw image to qcow2, it boots fine. Why should a
> raw image give a blue screen?

Sounds like a serious bug. What is the host filesystem? Did you upgrade
the host kernel or qemu?

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [Autotest] [PATCH] [RFC] KVM test: Major control file cleanup
On Wed, Oct 28, 2009 at 02:04:59PM -0400, Michael Goldish wrote: > > - "Lucas Meneghel Rodrigues" wrote: > > > On Wed, Oct 28, 2009 at 1:43 PM, Michael Goldish > > wrote: > > > Sounds great, except it won't allow you to debug your configuration > > > using kvm_config.py. So the question now is what's more important > > -- > > > the ability to debug or ease of use when running from the server. > > > > Here we have 2 use cases: > > > > 1) Users of the web interface, that (hopefully) have canned test sets > > that work reliably. Ability to debug stuff is less important on this > > scenario. > > 2) People developing tests, and in this case ability to debug config > > is very important > > > > I see the following options: > > > > 1) Document as part of the "test development guide" that, in order to > > be able to debug stuff, that all the test sets are to be written to > > the config file and then, can be parsed using kvm_config. > > 2) If we write all dictionaries generated by that particular > > configuration on files inside the job results directory, we still > > have > > debug ability for all use cases (I am starting to like this idea very > > much, as I type). > > > > So I'd then implement option 2) and refactor the control file with > > the > > test sets defined inside strings in the control file, then you can > > see > > how it looks? How about that? > > Sounds fine. > - Where exactly will the test list appear? > - We should also allow printing of verbose debug output ("parsing variants > block, 9000 dicts in current context...") by passing something to the > constructor of the config object. > - We should make it clear to the user that he/she must rename the control > file (to control.lucas for example) or else it may be overwritten on the > next git-fetch or -pull. 
>
> I'm still not sure it's a great idea to make config debugging harder, so
> if anyone other than Lucas who uses the KVM test is reading this, please
> let us know if you ever use kvm_config.py and if you think the ability to
> print the list of test dicts is important.

Hi Michael,

I have often used kvm_config.py to print lists of selected test dicts,
and I think it's necessary to keep this feature.
IMHO, option 2) that Lucas proposed is a good idea. What do you think?

Hope I haven't missed something. :)
Re: vga_arb warning [was: mmotm 2009-11-01-10-01 uploaded]
Please cc me on mmotm bug reports!

On Sun, 01 Nov 2009 22:47:05 +0100 Jiri Slaby wrote:
> On 11/01/2009 07:07 PM, a...@linux-foundation.org wrote:
> > The mm-of-the-moment snapshot 2009-11-01-10-01 has been uploaded to
>
> Hi, I got the following warning while booting an image in qemu-kvm:
>
> WARNING: at fs/attr.c:158 notify_change+0x2da/0x310()
> Hardware name:
> Modules linked in:
> Pid: 1, comm: swapper Not tainted 2.6.32-rc5-mm1_64 #862
> Call Trace:
>  [] warn_slowpath_common+0x78/0xb0
>  [] warn_slowpath_null+0xf/0x20
>  [] notify_change+0x2da/0x310
>  [] ? fsnotify_create+0x48/0x60
>  [] ? vfs_mknod+0xbb/0xe0
>  [] devtmpfs_create_node+0x1e6/0x270
>  [] ? sysfs_addrm_finish+0x20/0x280
>  [] ? __sysfs_add_one+0x26/0xf0
>  [] ? sysfs_do_create_link+0xcc/0x160
>  [] device_add+0x1e0/0x5b0
>  [] ? pm_runtime_init+0xa1/0xb0
>  [] ? device_pm_init+0x65/0x70
>  [] device_register+0x19/0x20
>  [] device_create_vargs+0xf0/0x120
>  [] device_create+0x2c/0x30
>  [] ? __register_chrdev+0x86/0xf0
>  [] ? __class_create+0x69/0xa0
>  [] ? mutex_lock+0x19/0x50
>  [] misc_register+0x93/0x170
>  [] ? vga_arb_device_init+0x0/0x77
>  [] vga_arb_device_init+0x13/0x77
>  [] ? vga_arb_device_init+0x0/0x77
>  [] do_one_initcall+0x37/0x190
>  [] kernel_init+0x172/0x1c8
>  [] child_rip+0xa/0x20
>  [] ? kernel_init+0x0/0x1c8
>  [] ? child_rip+0x0/0x20

There's a -mm-only debug patch:
http://userweb.kernel.org/~akpm/mmotm/broken-out/notify_change-callers-must-hold-i_mutex.patch

--- a/fs/attr.c~notify_change-callers-must-hold-i_mutex
+++ a/fs/attr.c
@@ -155,6 +155,8 @@ int notify_change(struct dentry * dentry
 			return -EPERM;
 	}
 
+	WARN_ON_ONCE(!mutex_is_locked(&inode->i_mutex));
+
 	now = current_fs_time(inode->i_sb);
 
 	attr->ia_ctime = now;
_

I forget why it was added.

It looks like it's blaming the open-coded notify_change() call in
devtmpfs_create_node().
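For readers who have not met this kind of check, the following is a minimal userspace sketch of what a warn-once assertion does. It is not kernel code: `struct sketch_inode`, its `locked` flag and `notify_change_sketch()` are hypothetical stand-ins for the inode, mutex_is_locked() and notify_change(), and the macro is only a simplified analogue of WARN_ON_ONCE() (for one thing, it does not return the condition).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Number of warnings actually printed; only here so the sketch can be checked. */
static int warnings_printed;

/* Print a warning the first time cond is true for a given call site. */
static void warn_once(bool cond, bool *already_warned,
                      const char *expr, const char *file, int line)
{
    if (cond && !*already_warned) {
        *already_warned = true;
        warnings_printed++;
        fprintf(stderr, "WARNING: at %s:%d (%s)\n", file, line, expr);
    }
}

/* Simplified analogue of WARN_ON_ONCE(): one static flag per call site. */
#define WARN_ON_ONCE_SKETCH(cond)                                   \
    do {                                                            \
        static bool warned_;                                        \
        warn_once((cond), &warned_, #cond, __FILE__, __LINE__);     \
    } while (0)

/* Hypothetical stand-in for an inode; 'locked' plays the role of i_mutex. */
struct sketch_inode {
    bool locked;
    long ctime;
};

/* Stand-in for notify_change(): complain once if the caller forgot the lock. */
static int notify_change_sketch(struct sketch_inode *inode, long now)
{
    WARN_ON_ONCE_SKETCH(!inode->locked);
    inode->ctime = now;
    return 0;
}
```

Calling notify_change_sketch() repeatedly on an unlocked inode prints a single warning and still performs the update, which matches the pattern in the report above: one warning at boot, no functional failure.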
Re: xp guest, blue screen c0000221 on boot
OK, if I convert the raw image to qcow2, it boots fine. Why should a raw
image give a blue screen?

Avi Kivity wrote:
> On 10/28/2009 05:27 PM, Andrew Olney wrote:
>> Thanks. In pursuing this suggestion I discovered that I also can't
>> make new XP VMs. Setup fails with "the disk may be damaged".
>>
>> The image was created with
>>
>>   qemu-img create xp_new.img 13G
>>
>> And the setup command is
>>
>>   kvm -cdrom xp/xp_pro.iso -hda xp_new.img -boot d -m 512 -no-acpi -usb -usbdevice tablet
>
> What happens if you use a new qcow2 disk? Does it fail in the same way?
Re: xp guest, blue screen c0000221 on boot
Thanks for the suggestion. I've successfully installed on a new qcow2
image.

Strangely all of my xp raw images no longer work. That makes me think
that this is more than just a corrupt image. I've been using kvm for
several years with raw images, so I don't think the BIOS could be a
factor either.

If there's a way to get my raw images to work, that would be nice. I'm
trying to convert one of the raw images to qcow2 -- perhaps that will
work.

Avi Kivity wrote:
> On 10/28/2009 05:27 PM, Andrew Olney wrote:
>> Thanks. In pursuing this suggestion I discovered that I also can't
>> make new XP VMs. Setup fails with "the disk may be damaged".
>>
>> The image was created with
>>
>>   qemu-img create xp_new.img 13G
>>
>> And the setup command is
>>
>>   kvm -cdrom xp/xp_pro.iso -hda xp_new.img -boot d -m 512 -no-acpi -usb -usbdevice tablet
>
> What happens if you use a new qcow2 disk? Does it fail in the same way?
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
On Tue, Nov 03, 2009 at 01:14:06PM -0500, Gregory Haskins wrote: > Gregory Haskins wrote: > > Eric Dumazet wrote: > >> Michael S. Tsirkin a écrit : > >>> +static void handle_tx(struct vhost_net *net) > >>> +{ > >>> + struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX]; > >>> + unsigned head, out, in, s; > >>> + struct msghdr msg = { > >>> + .msg_name = NULL, > >>> + .msg_namelen = 0, > >>> + .msg_control = NULL, > >>> + .msg_controllen = 0, > >>> + .msg_iov = vq->iov, > >>> + .msg_flags = MSG_DONTWAIT, > >>> + }; > >>> + size_t len, total_len = 0; > >>> + int err, wmem; > >>> + size_t hdr_size; > >>> + struct socket *sock = rcu_dereference(vq->private_data); > >>> + if (!sock) > >>> + return; > >>> + > >>> + wmem = atomic_read(&sock->sk->sk_wmem_alloc); > >>> + if (wmem >= sock->sk->sk_sndbuf) > >>> + return; > >>> + > >>> + use_mm(net->dev.mm); > >>> + mutex_lock(&vq->mutex); > >>> + vhost_no_notify(vq); > >>> + > >> using rcu_dereference() and mutex_lock() at the same time seems wrong, I > >> suspect > >> that your use of RCU is not correct. > >> > >> 1) rcu_dereference() should be done inside a read_rcu_lock() section, and > >>we are not allowed to sleep in such a section. > >>(Quoting Documentation/RCU/whatisRCU.txt : > >> It is illegal to block while in an RCU read-side critical section, ) > >> > >> 2) mutex_lock() can sleep (ie block) > >> > > > > > > Michael, > > I warned you that this needed better documentation ;) > > > > Eric, > > I think I flagged this once before, but Michael convinced me that it > > was indeed "ok", if but perhaps a bit unconventional. I will try to > > find the thread. > > > > Kind Regards, > > -Greg > > > > Here it is: > > http://lkml.org/lkml/2009/8/12/173 What was happening in that case was that the rcu_dereference() was being used in a workqueue item. 
The role of rcu_read_lock() was taken on by the start of execution of
the workqueue item, of rcu_read_unlock() by the end of execution of the
workqueue item, and of synchronize_rcu() by flush_workqueue(). This
does work, at least assuming that flush_workqueue() operates as
advertised, which it appears to at first glance.

The above code looks somewhat different, however -- I don't see
handle_tx() being executed in the context of a work queue. Instead
it appears to be in an interrupt handler.

So what is the story? Using synchronize_irq() or some such?

							Thanx, Paul
Re: 2.6.31.4 panic: CRED: put_cred_rcu() sees ffff880204e58c00 with usage 82150912
Hello Marcelo,

well, my report might have been a bit misleading: 2.6.31.2 has been
running there for almost 3 weeks, so the problem didn't occur JUST after
the upgrade. But it never occurred before it, while we were using older
kernels (various 2.6.30.x, 2.6.29.x and older).
I've updated the machine to 2.6.31.5, and it's been running without
problems since then, but it's been just a few days, so we'll see. I'll
report if the problem appears again.

Thanks.
regards
nik

On Tue, Nov 03, 2009 at 12:46:38PM -0200, Marcelo Tosatti wrote:
> On Fri, Oct 30, 2009 at 12:15:34PM +0100, Nikola Ciprich wrote:
> > Ouch, typo in subject, it's 2.6.31.1 of course. sorry about that.
> > also CCing kvm.
> > n.
> >
> > On Fri, Oct 30, 2009 at 12:06:32PM +0100, Nikola Ciprich wrote:
> > > Hi,
> > > some time ago, I updated my KVM hosting machine to 2.6.31.1 and it just
> > > died horribly:
>
> Nikola,
>
> Upgraded from what? Did you experience the crash again?
>
> > > Oct 30 10:45:17 vbox [706369.133516] Kernel panic - not syncing: CRED: put_cred_rcu() sees 880204e58c00 with usage 82150912
> > > Oct 30 10:45:17 vbox [706369.133519]
> > > Oct 30 10:45:17 vbox [706369.144990] Pid: 19, comm: ksoftirqd/5 Not tainted 2.6.31lb.02 #1
> > > Oct 30 10:45:17 vbox [706369.151554] Call Trace:
> > > Oct 30 10:45:17 vbox [706369.154332]
> > > Oct 30 10:45:17 vbox [] panic+0xaa/0x180
> > > Oct 30 10:45:17 vbox [706369.160280] [] ? _spin_unlock+0x30/0x60
> > > Oct 30 10:45:17 vbox [706369.166256] [] ? add_partial+0x21/0x90
> > > Oct 30 10:45:17 vbox [706369.172155] [] ? __slab_free+0x92/0x3c0
> > > Oct 30 10:45:17 vbox [706369.178127] [] ? file_free_rcu+0x37/0x50
> > > Oct 30 10:45:17 vbox [706369.184198] [] put_cred_rcu+0x75/0x80
> > > Oct 30 10:45:17 vbox [706369.190008] [] __rcu_process_callbacks+0x125/0x250
> > > Oct 30 10:45:17 vbox [706369.197020] [] rcu_process_callbacks+0x39/0x60
> > > Oct 30 10:45:17 vbox [706369.203624] [] __do_softirq+0xc1/0x250
> > > Oct 30 10:45:17 vbox [706369.209506] [] ? ksoftirqd+0x0/0x1a0
> > > Oct 30 10:45:17 vbox [706369.215182] [] call_softirq+0x1c/0x30
> > > Oct 30 10:45:17 vbox [706369.220986] [] do_softirq+0x3d/0x80
> > > Oct 30 10:45:17 vbox [706369.227317] [] ? ksoftirqd+0x0/0x1a0
> > > Oct 30 10:45:17 vbox [706369.233020] [] ksoftirqd+0x84/0x1a0
> > > Oct 30 10:45:17 vbox [706369.238622] [] kthread+0xa6/0xb0
> > > Oct 30 10:45:17 vbox [706369.243956] [] child_rip+0xa/0x20
> > > Oct 30 10:45:17 vbox [706369.249390] [] ? kthread+0x0/0xb0
> > > Oct 30 10:45:17 vbox [706369.254806] [] ? child_rip+0x0/0x20
> > > Oct 30 10:45:17 vbox [706369.260454] Rebooting in 10 seconds..
> > > (trace is obtained from netconsole, so hopefully it's not mangled).
> > > The machine was running ~30 KVM guests, it's 8CPU 16GB x86_64, when it
> > > crashed, it was only moderately loaded. Never had this (or any other)
> > > kind of crash on it before.
> > > I know there's 2.6.31.5 already out, but I'm not sure if some related
> > > problem has been reported/fixed and I'm obviously not able to quickly
> > > test/reproduce it with latest kernel, so I'm rather reporting.
> > > Should more information/testing/etc be required, I'll be glad to help
> > > regards
> > > nik
>

-- 
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
On Tue, 2009-11-03 at 10:06 +0100, Alexander Graf wrote: > > DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there > > are HID bits to make it work on 32 bytes only, and an extra DCBZL insn > > that always clears a full cache line (128 bytes). > > Yes. We only come here when we patched the dcbz opcodes to invalid > instructions because cache line size of target == 32. > On 970 with MSR_HV = 0 we actually use the dcbz 32-bytes mode. > > Admittedly though, this could be a lot more clever. Yeah well, we also really need to fix ppc32 Linux to use the device-tree provided cache line size :-) For 64-bits, that should already be the case, and thus the emulation trick shouldn't be useful as long as you properly provide the guest with the right size in the device-tree. (Though glibc can be nasty, afaik it might load up optimized variants of some routines with hard wired cache line sizes based on the CPU type) > >> + switch (sprn) { > >> + case SPRN_IBAT0U ... SPRN_IBAT3L: > >> + bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT0U) / 2]; > >> + break; > >> + case SPRN_IBAT4U ... SPRN_IBAT7L: > >> + bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT4U) / 2]; > >> + break; > >> + case SPRN_DBAT0U ... SPRN_DBAT3L: > >> + bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT0U) / 2]; > >> + break; > >> + case SPRN_DBAT4U ... SPRN_DBAT7L: > >> + bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT4U) / 2]; > >> + break; > > > > Do xBAT4..7 have the same SPR numbers on all CPUs? They are CPU- > > specific > > SPRs, after all. Some CPUs have only six, some only four, some > > none, btw. > > For now only Linux runs which only uses the first 3(?) IIRC. But yes, > it's probably worth looking into at one point or the other. 
>
> >
> >> +	case SPRN_HID0:
> >> +		to_book3s(vcpu)->hid[0] = vcpu->arch.gpr[rs];
> >> +		break;
> >> +	case SPRN_HID1:
> >> +		to_book3s(vcpu)->hid[1] = vcpu->arch.gpr[rs];
> >> +		break;
> >> +	case SPRN_HID2:
> >> +		to_book3s(vcpu)->hid[2] = vcpu->arch.gpr[rs];
> >> +		break;
> >> +	case SPRN_HID4:
> >> +		to_book3s(vcpu)->hid[4] = vcpu->arch.gpr[rs];
> >> +		break;
> >> +	case SPRN_HID5:
> >> +		to_book3s(vcpu)->hid[5] = vcpu->arch.gpr[rs];
> >
> > HIDs are different per CPU; and worse, different CPUs have different
> > registers (SPR #s) for the same register name!
>
> Sigh :-(

On the other hand, you can probably just "swallow" all of these and
Linux won't even notice, except maybe for the sleep state on
6xx/7xx/7xxx. It's just a matter of knowing what you are emulating as
guest.

Cheers,
Ben.
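A small sketch of the index arithmetic in the BAT switch quoted earlier in this message may help. Each BAT is an upper/lower register pair with adjacent SPR numbers, so dividing the offset from the first register of a range by two gives the BAT index. The SPR numbers below are the ones I recall from the Linux powerpc headers and should be treated as an assumption, especially for BAT4..7, which the thread points out are CPU-specific. `struct bat_slot` and `decode_bat_sprn()` are hypothetical names. Note also that the quoted hunk, as it appears here, indexes both ranges from 0; the sketch adds an offset of 4 for the second range so that the eight BATs land in distinct slots, which is my guess at the intent rather than something the thread states.

```c
#include <assert.h>
#include <stdbool.h>

/* SPR numbers as recalled from the Linux powerpc headers (assumption;
 * BAT4..7 are CPU-specific, as noted in the thread). */
enum {
    SKETCH_IBAT0U = 0x210, SKETCH_IBAT3L = 0x217,
    SKETCH_DBAT0U = 0x218, SKETCH_DBAT3L = 0x21f,
    SKETCH_IBAT4U = 0x230, SKETCH_IBAT7L = 0x237,
    SKETCH_DBAT4U = 0x238, SKETCH_DBAT7L = 0x23f,
};

struct bat_slot {
    bool is_data;   /* DBAT (true) or IBAT (false) */
    int  index;     /* 0..7 */
    bool is_upper;  /* upper (U) or lower (L) half of the pair */
};

/* Returns true and fills *slot if sprn is a BAT register, false otherwise. */
static bool decode_bat_sprn(int sprn, struct bat_slot *slot)
{
    int base, first_index;

    if (sprn >= SKETCH_IBAT0U && sprn <= SKETCH_IBAT3L) {
        base = SKETCH_IBAT0U; first_index = 0; slot->is_data = false;
    } else if (sprn >= SKETCH_DBAT0U && sprn <= SKETCH_DBAT3L) {
        base = SKETCH_DBAT0U; first_index = 0; slot->is_data = true;
    } else if (sprn >= SKETCH_IBAT4U && sprn <= SKETCH_IBAT7L) {
        base = SKETCH_IBAT4U; first_index = 4; slot->is_data = false;
    } else if (sprn >= SKETCH_DBAT4U && sprn <= SKETCH_DBAT7L) {
        base = SKETCH_DBAT4U; first_index = 4; slot->is_data = true;
    } else {
        return false;
    }

    /* U and L registers alternate, U first, so each pair spans two numbers. */
    slot->index = first_index + (sprn - base) / 2;
    slot->is_upper = ((sprn - base) % 2) == 0;
    return true;
}
```

A per-CPU table of SPR numbers would be the natural next step if the emulated CPU type is known, since that is exactly the variation the review comment warns about.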
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
Michael S. Tsirkin wrote:
>
> Paul, you acked this previously. Should I add you acked-by line so
> people calm down? If you would rather I replace
> rcu_dereference/rcu_assign_pointer with rmb/wmb, I can do this.
> Or maybe patch Documentation to explain this RCU usage?
>

So you believe I am over-reacting to this dubious use of RCU?

The RCU documentation is already very complex; we don't need to add yet
another subtle use and make it less readable.

It seems you use the 'RCU api' in drivers/vhost/net.c as convenient
macros:

#define rcu_dereference(p)     ({ \
				typeof(p) _p1 = ACCESS_ONCE(p); \
				smp_read_barrier_depends(); \
				(_p1); \
				})

#define rcu_assign_pointer(p, v) \
	({ \
		if (!__builtin_constant_p(v) || \
		    ((v) != NULL)) \
			smp_wmb(); \
		(p) = (v); \
	})

There are plenty of regular uses of smp_wmb() in the kernel that are not
related to Read Copy Update, and there is nothing wrong with using
barriers with appropriate comments.

(And you already use mb(), wmb(), rmb(), smp_wmb() in your patch.)

BTW there is at least one locking bug in vhost_net_set_features().

Apparently, mutex_unlock() doesn't trigger a fault if the mutex is not
locked by the current thread... even with DEBUG_MUTEXES / DEBUG_LOCK_ALLOC

static void vhost_net_set_features(struct vhost_net *n, u64 features)
{
	size_t hdr_size = features & (1 << VHOST_NET_F_VIRTIO_NET_HDR) ?
		sizeof(struct virtio_net_hdr) : 0;
	int i;
<>	mutex_unlock(&n->dev.mutex);
	n->dev.acked_features = features;
	smp_wmb();
	for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
		mutex_lock(&n->vqs[i].mutex);
		n->vqs[i].hdr_size = hdr_size;
		mutex_unlock(&n->vqs[i].mutex);
	}
	mutex_unlock(&n->dev.mutex);
	vhost_net_flush(n);
}
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
Eric Dumazet wrote:
> Gregory Haskins wrote:
>> Gregory Haskins wrote:
>>> Eric Dumazet wrote:
>>>> Michael S. Tsirkin wrote:
>>>> using rcu_dereference() and mutex_lock() at the same time seems wrong, I suspect
>>>> that your use of RCU is not correct.
>>>>
>>>> 1) rcu_dereference() should be done inside a rcu_read_lock() section, and
>>>>    we are not allowed to sleep in such a section.
>>>>    (Quoting Documentation/RCU/whatisRCU.txt :
>>>>     It is illegal to block while in an RCU read-side critical section, )
>>>>
>>>> 2) mutex_lock() can sleep (ie block)
>>>>
>>> Michael,
>>> I warned you that this needed better documentation ;)
>>>
>>> Eric,
>>> I think I flagged this once before, but Michael convinced me that it
>>> was indeed "ok", if but perhaps a bit unconventional. I will try to
>>> find the thread.
>>>
>>> Kind Regards,
>>> -Greg
>>>
>> Here it is:
>>
>> http://lkml.org/lkml/2009/8/12/173
>>
>
> Yes, this doesnt convince me at all, and could be a precedent for a wrong RCU use.
> People wanting to use RCU do a grep on kernel sources to find how to correctly
> use RCU.
>
> Michael, please use existing locking/barrier mechanisms, and not pretend to use RCU.

Yes, I would tend to agree with you. In fact, I think I suggested that
a normal barrier should be used instead of abusing rcu_dereference().

But as far as his code is concerned, I think it technically works
properly, and that was my main point. Also note that the combination of
rcu_dereference() and mutex_lock() is not necessarily broken, per se:
it could be an srcu-based critical section created by the caller, for
instance. It would be perfectly legal to sleep on the mutex if that
were the case.

To me, the bigger issue is that the rcu_dereference() without any
apparent hint of a corresponding read-side critical section (RSCS) is
simply confusing as a reviewer. smp_rmb() (or whatever is proper in
this case) is probably more appropriate.

Kind Regards,
-Greg
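As an illustration of the "plain barriers instead of the RCU API" suggestion, here is a userspace analogy written with C11 atomics. It is not kernel code and not what the vhost patch does: `struct sketch_backend`, `publish_backend()` and `load_backend()` are hypothetical names, and release/acquire ordering is used as a rough userspace counterpart of an smp_wmb() before the pointer store and a read barrier after the pointer load. Acquire is stronger than the dependency ordering that rcu_dereference() relies on, so treat this as a sketch of the idea only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the object hung off vq->private_data. */
struct sketch_backend {
    int fd;
};

/* The published pointer; NULL means "no backend attached". */
static _Atomic(struct sketch_backend *) private_data;

/* Writer side: everything written to *b before this call is visible to a
 * reader that observes the new pointer (release ordering). */
static void publish_backend(struct sketch_backend *b)
{
    atomic_store_explicit(&private_data, b, memory_order_release);
}

/* Reader side: loads through the returned pointer cannot be reordered
 * before the pointer load itself (acquire ordering). */
static struct sketch_backend *load_backend(void)
{
    return atomic_load_explicit(&private_data, memory_order_acquire);
}
```

What this does not provide is any lifetime guarantee: something still has to ensure the old object is not freed while a reader holds the pointer. In the thread that role is played by flush_workqueue(), which is the part under discussion.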
Re: [Qemu-devel] [PATCH 0/4] megaraid_sas HBA emulation
On 10/30/09 09:12, Hannes Reinecke wrote:
> Gerd Hoffmann wrote:
>> http://repo.or.cz/w/qemu/kraxel.git?a=shortlog;h=refs/heads/scsi.v1
>>
>> It is far from being completed, will continue tomorrow. Should give an
>> idea of the direction I'm heading to though. Comments welcome.
>
> Yep, this looks good.

More bits available now at:
  http://repo.or.cz/w/qemu/kraxel.git/shortlog/refs/heads/scsi.v3

Please have a look at the new interface, this commit:
  http://repo.or.cz/w/qemu/kraxel.git/commitdiff/9c825dac540282dd4d5f5f660ca13af617888037

The workflow I have in mind is:

->request_new()
    Returns a parsed request, i.e. all fields of req->cmd are filled
    already, so the host adapter can easily check whether it is a read
    or write and how many bytes will be transferred.

->request_run()
    Execute the command. iovec is passed to the dma_bdrv_*() functions,
    which means it should contain guest physical addresses. When the
    request is done the complete callback is called. For commands which
    complete immediately the callback might be called before
    request_run() returns. Or maybe don't call the callback at all
    then? We can easily indicate using the return value that we are
    done already.

->request_del()
    Release request when done.

Questions & comments are welcome.

cheers,
  Gerd
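The three-step workflow above could be sketched roughly as follows. This is not the interface from the linked commit: every name here (`SketchRequest`, `sketch_request_new()` and so on) is hypothetical, the CDB parsing handles only READ(10)/WRITE(10), and the I/O is simulated. The sketch takes the second option raised in the message, where a command that completes immediately reports that through the return value of the run step and the callback is not invoked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum { SKETCH_RUN_DONE = 0, SKETCH_RUN_PENDING = 1 };

typedef struct SketchRequest SketchRequest;
typedef void (*sketch_complete_fn)(SketchRequest *req, int status);

struct SketchRequest {
    unsigned char opcode;
    bool is_write;          /* data flows from guest to device */
    size_t xfer_len;        /* bytes to transfer, 0 for non-data commands */
    int status;             /* valid once the request has completed */
    int completions;        /* completion count, kept only for checking */
    sketch_complete_fn complete;
};

/* request_new(): parse the CDB up front so the adapter can inspect it. */
static SketchRequest *sketch_request_new(const unsigned char *cdb,
                                         size_t block_size,
                                         sketch_complete_fn complete)
{
    SketchRequest *req = calloc(1, sizeof(*req));
    if (!req)
        return NULL;
    req->opcode = cdb[0];
    req->complete = complete;
    if (cdb[0] == 0x28 || cdb[0] == 0x2a) {     /* READ(10) / WRITE(10) */
        size_t blocks = ((size_t)cdb[7] << 8) | cdb[8];
        req->is_write = (cdb[0] == 0x2a);
        req->xfer_len = blocks * block_size;
    }
    return req;
}

/* Called when the (simulated) asynchronous I/O finishes. */
static void sketch_request_finish(SketchRequest *req, int status)
{
    req->status = status;
    req->completions++;
    if (req->complete)
        req->complete(req, status);
}

/* request_run(): non-data commands finish at once and say so through the
 * return value; the completion callback is not invoked for them. */
static int sketch_request_run(SketchRequest *req)
{
    if (req->xfer_len == 0) {
        req->status = 0;
        return SKETCH_RUN_DONE;
    }
    /* A real backend would hand the iovec to the block layer here. */
    return SKETCH_RUN_PENDING;
}

/* request_del(): release the request once the adapter is done with it. */
static void sketch_request_del(SketchRequest *req)
{
    free(req);
}
```

Signalling immediate completion through the return value keeps the adapter from having to cope with a callback that fires while it is still inside the run call, at the cost of two completion paths for the adapter to handle.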
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
On Tue, Nov 03, 2009 at 07:03:55PM +0100, Eric Dumazet wrote:
> Michael S. Tsirkin wrote:
> > +static void handle_tx(struct vhost_net *net)
> > +{
> > +	struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
> > +	unsigned head, out, in, s;
> > +	struct msghdr msg = {
> > +		.msg_name = NULL,
> > +		.msg_namelen = 0,
> > +		.msg_control = NULL,
> > +		.msg_controllen = 0,
> > +		.msg_iov = vq->iov,
> > +		.msg_flags = MSG_DONTWAIT,
> > +	};
> > +	size_t len, total_len = 0;
> > +	int err, wmem;
> > +	size_t hdr_size;
> > +	struct socket *sock = rcu_dereference(vq->private_data);
> > +	if (!sock)
> > +		return;
> > +
> > +	wmem = atomic_read(&sock->sk->sk_wmem_alloc);
> > +	if (wmem >= sock->sk->sk_sndbuf)
> > +		return;
> > +
> > +	use_mm(net->dev.mm);
> > +	mutex_lock(&vq->mutex);
> > +	vhost_no_notify(vq);
> > +
>
> using rcu_dereference() and mutex_lock() at the same time seems wrong, I suspect
> that your use of RCU is not correct.
>
> 1) rcu_dereference() should be done inside a rcu_read_lock() section, and
>    we are not allowed to sleep in such a section.
>    (Quoting Documentation/RCU/whatisRCU.txt :
>     It is illegal to block while in an RCU read-side critical section, )
>
> 2) mutex_lock() can sleep (ie block)

This use is correct. See comment in vhost.h

This use of RCU has been acked by Paul E. McKenney
(paul...@linux.vnet.ibm.com) as well.

There are many ways to use RCU, not all of which involve rcu_read_lock().
Re: libvirt bug #532480
On 11/03/2009 06:48 PM, Brian Jackson wrote:
> On Tuesday 03 November 2009 06:02:42 am roma1390 wrote:
>> Libvirt thinks that bug #532480 must be addressed to the qemu/kvm team.
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=532480
>
> For future reference, adding some overview to your email instead of
> making all the devs with arguably limited time go read through a bug
> report is probably a good idea.
>
>> Any ideas how to fix this issue?
>
> Iirc, it's being worked on. And yes, it is the responsibility of the
> developers of said drivers to do the signing. Keep watching the url
> from the bug for updated drivers. Until then, there are workarounds to
> this issue also mentioned at that url.

http://sourceforge.net/projects/kvm/files/kvm-driver-disc/20080318/kvm-driver-disc-20080318.iso/download
doesn't contain viostor.sys (virtual block driver) at all.

The http://people.redhat.com/~yvugenfi/24.09.2009/viostor.zip drivers
are not signed (there are no cat files inside), so you need to sign them
yourself. Download and install the WDK first, then take a look at the
Selfsign_example.cmd example. It is pretty self-explanatory, I think.
You can also find tons of information about driver signing by searching
the Internet.

btw, as a developer I care about passing the WHQL tests, not about
signing.

Regards,
Vadim
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
On Tue, Nov 03, 2009 at 07:51:35PM +0100, Eric Dumazet wrote: > Gregory Haskins a écrit : > > Gregory Haskins wrote: > >> Eric Dumazet wrote: > >>> Michael S. Tsirkin a écrit : > +static void handle_tx(struct vhost_net *net) > +{ > +struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX]; > +unsigned head, out, in, s; > +struct msghdr msg = { > +.msg_name = NULL, > +.msg_namelen = 0, > +.msg_control = NULL, > +.msg_controllen = 0, > +.msg_iov = vq->iov, > +.msg_flags = MSG_DONTWAIT, > +}; > +size_t len, total_len = 0; > +int err, wmem; > +size_t hdr_size; > +struct socket *sock = rcu_dereference(vq->private_data); > +if (!sock) > +return; > + > +wmem = atomic_read(&sock->sk->sk_wmem_alloc); > +if (wmem >= sock->sk->sk_sndbuf) > +return; > + > +use_mm(net->dev.mm); > +mutex_lock(&vq->mutex); > +vhost_no_notify(vq); > + > >>> using rcu_dereference() and mutex_lock() at the same time seems wrong, I > >>> suspect > >>> that your use of RCU is not correct. > >>> > >>> 1) rcu_dereference() should be done inside a read_rcu_lock() section, and > >>>we are not allowed to sleep in such a section. > >>>(Quoting Documentation/RCU/whatisRCU.txt : > >>> It is illegal to block while in an RCU read-side critical section, ) > >>> > >>> 2) mutex_lock() can sleep (ie block) > >>> > >> > >> Michael, > >> I warned you that this needed better documentation ;) > >> > >> Eric, > >> I think I flagged this once before, but Michael convinced me that it > >> was indeed "ok", if but perhaps a bit unconventional. I will try to > >> find the thread. > >> > >> Kind Regards, > >> -Greg > >> > > > > Here it is: > > > > http://lkml.org/lkml/2009/8/12/173 > > > > Yes, this doesnt convince me at all, and could be a precedent for a wrong RCU > use. > People wanting to use RCU do a grep on kernel sources to find how to correctly > use RCU. > > Michael, please use existing locking/barrier mechanisms, and not pretend to > use RCU. > > Some automatic tools might barf later. 
>
> For example, we could add a debugging facility to check that rcu_dereference() is used
> in an appropriate context, ie conflict with existing mutex_lock() debugging facility.

Paul, you acked this previously. Should I add your Acked-by line so
people calm down? If you would rather I replace
rcu_dereference/rcu_assign_pointer with rmb/wmb, I can do this.
Or maybe patch Documentation to explain this RCU usage?

-- 
MST
Re: XP blue screen with qemu-kvm-0.11.0
On Sat, 2009-10-31 at 15:21 +0300, Michael Tokarev wrote: > Ross Boylan wrote: > > My XP VM was working OK, and then started crashing shortly after it > > logged me in. There were no obvious changes at the time. I built the > > latest qemu-kvm, but the problem persists. > > > > I am running 32 bit XP on Intel(R) Xeon(R) CPU E5420 @ 2.50GHz (8 cores > > total), Debian GNU/Linux mostly Lenny (amd64), but with some more recent > > stuff. In particular, the kernel is 2.6.30-8 and I pulled in the > > kernel-headers package to match before building kvm. However, libc6 and > > libc6-dev are at Lenny's 2.7-18 version. > > Libc is basically irrelevant here. What matters are the host kernel > and kvm version. > > > $ ./XP.sh > > ++ sudo vdeq bin/qemu-system-x86_64 -net > > nic,vlan=1,macaddr=52:54:a0:12:01:00 -net > > vde,vlan=1,sock=/var/run/vde2/tap0.ctl -boot c -vga std -hda > > /dev/turtle/XP01 -soundhw es1370 -localtime -m 1G -smp 2 > > arg ,vlan=1,sock=/var/run/vde2/tap0.ctl > > TUNGETIFF ioctl() failed: Invalid argument > > TUNSETOFFLOAD ioctl() failed: Bad address > > oss: Could not initialize DAC > > oss: Failed to open `/dev/dsp' > > oss: Reason: Device or resource busy > > oss: Could not initialize DAC > > oss: Failed to open `/dev/dsp' > > oss: Reason: Device or resource busy > > audio: Failed to create voice `es1370.dac2' > > # and more sound-related complaints > > Switch to alsa to get your audio working. I don't see an alsa option for kvm/qemu. I'm already running alsa, but under KDE which tends to grab the device. > > > The VM starts; I see the initial XP screen with the 4 colors; I see the > > background I get when I log in (it logs me in directly without prompt); > > and then (pretty fast) I get a blue screen. The stop code is 0x8E, and > > the text says to check disk space and BIOS options. > > What's the bios files your kvm uses? Are they by a change > from some old qemu install? 
They appear to be from the latest install, since strace shows various bios
files loading from /usr/local/kvm/share. The invoking environment was a
little different from the real run, since strace vdeq apparently traced vdeq
but not kvm calls. So I just ran the kvm bare.

>
> Does kvm deb from http://www.corpit.ru/debian/tls/kvm/ expose the same
> issue?

Yes. However, as it failed it left a reverberating sound (a fragment of the
Windows login tone).

I tried starting in safe mode. XP said there was new hardware: the video
(-vga std). It could not find a driver on the internet(!? the device was
identified as a VGA controller). Then it told me the driver had been
installed (after I hit finish).

I rebooted in regular mode. This time there was no sound, but the machine
failed again with STOP 0x8E (as before).

The video appeared to be working throughout this, showing a window that
exceeded vanilla VGA resolution.

Ross
Re: [Linux-fbdev-devel] [Qemu-devel] Re: [PATCH] Add VirtIO Frame Buffer Support
On 11/03/2009 05:05 PM, Avi Kivity wrote:
> On 11/03/2009 05:29 PM, Ondrej Zajicek wrote:
>> On Tue, Nov 03, 2009 at 11:38:18AM +0200, Avi Kivity wrote:
>>> On 11/03/2009 01:25 PM, Vincent Hanquez wrote:
>>>> not sure if i'm missing the point here, but couldn't it be
>>>> hypothetically extended to stuff 3d (or video& more 2d accel ?)
>>>> commands too ? I can't imagine the cirrus or stdvga driver be able
>>>> to do that ever ;)
>>>
>>> cirrus has pretty good 2d acceleration. 3D is a mega-project though.
>>
>> Cirrus has no blending/compositing hardware support.
>> Paravirtualized graphics can easily support full XRender-style
>> 2D acceleration.
>
> What do that entail? 3/4 operand raster ops?

With alpha compositing.

Paolo
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
Gregory Haskins a écrit : > Gregory Haskins wrote: >> Eric Dumazet wrote: >>> Michael S. Tsirkin a écrit : +static void handle_tx(struct vhost_net *net) +{ + struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX]; + unsigned head, out, in, s; + struct msghdr msg = { + .msg_name = NULL, + .msg_namelen = 0, + .msg_control = NULL, + .msg_controllen = 0, + .msg_iov = vq->iov, + .msg_flags = MSG_DONTWAIT, + }; + size_t len, total_len = 0; + int err, wmem; + size_t hdr_size; + struct socket *sock = rcu_dereference(vq->private_data); + if (!sock) + return; + + wmem = atomic_read(&sock->sk->sk_wmem_alloc); + if (wmem >= sock->sk->sk_sndbuf) + return; + + use_mm(net->dev.mm); + mutex_lock(&vq->mutex); + vhost_no_notify(vq); + >>> using rcu_dereference() and mutex_lock() at the same time seems wrong, I >>> suspect >>> that your use of RCU is not correct. >>> >>> 1) rcu_dereference() should be done inside a read_rcu_lock() section, and >>>we are not allowed to sleep in such a section. >>>(Quoting Documentation/RCU/whatisRCU.txt : >>> It is illegal to block while in an RCU read-side critical section, ) >>> >>> 2) mutex_lock() can sleep (ie block) >>> >> >> Michael, >> I warned you that this needed better documentation ;) >> >> Eric, >> I think I flagged this once before, but Michael convinced me that it >> was indeed "ok", if but perhaps a bit unconventional. I will try to >> find the thread. >> >> Kind Regards, >> -Greg >> > > Here it is: > > http://lkml.org/lkml/2009/8/12/173 > Yes, this doesnt convince me at all, and could be a precedent for a wrong RCU use. People wanting to use RCU do a grep on kernel sources to find how to correctly use RCU. Michael, please use existing locking/barrier mechanisms, and not pretend to use RCU. Some automatic tools might barf later. For example, we could add a debugging facility to check that rcu_dereference() is used in an appropriate context, ie conflict with existing mutex_lock() debugging facility. 
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
Eric Dumazet wrote: > Michael S. Tsirkin a écrit : >> +static void handle_tx(struct vhost_net *net) >> +{ >> +struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX]; >> +unsigned head, out, in, s; >> +struct msghdr msg = { >> +.msg_name = NULL, >> +.msg_namelen = 0, >> +.msg_control = NULL, >> +.msg_controllen = 0, >> +.msg_iov = vq->iov, >> +.msg_flags = MSG_DONTWAIT, >> +}; >> +size_t len, total_len = 0; >> +int err, wmem; >> +size_t hdr_size; >> +struct socket *sock = rcu_dereference(vq->private_data); >> +if (!sock) >> +return; >> + >> +wmem = atomic_read(&sock->sk->sk_wmem_alloc); >> +if (wmem >= sock->sk->sk_sndbuf) >> +return; >> + >> +use_mm(net->dev.mm); >> +mutex_lock(&vq->mutex); >> +vhost_no_notify(vq); >> + > > using rcu_dereference() and mutex_lock() at the same time seems wrong, I > suspect > that your use of RCU is not correct. > > 1) rcu_dereference() should be done inside a read_rcu_lock() section, and >we are not allowed to sleep in such a section. >(Quoting Documentation/RCU/whatisRCU.txt : > It is illegal to block while in an RCU read-side critical section, ) > > 2) mutex_lock() can sleep (ie block) > Michael, I warned you that this needed better documentation ;) Eric, I think I flagged this once before, but Michael convinced me that it was indeed "ok", if but perhaps a bit unconventional. I will try to find the thread. Kind Regards, -Greg signature.asc Description: OpenPGP digital signature
[PATCHv7 3/3] vhost_net: a kernel-level virtio server
What it is: vhost net is a character device that can be used to reduce
the number of system calls involved in virtio networking.
Existing virtio net code is used in the guest without modification.

There's similarity with vringfd, with some differences and reduced scope
- uses eventfd for signalling
- structures can be moved around in memory at any time (good for
  migration, bug work-arounds in userspace)
- write logging is supported (good for migration)
- support memory table and not just an offset (needed for kvm)

Common virtio related code has been put in a separate file vhost.c and
can be made into a separate module if/when more backends appear. I used
Rusty's lguest.c as the source for developing this part : this supplied
me with witty comments I wouldn't be able to write myself.

What it is not: vhost net is not a bus, and not a generic new system
call. No assumptions are made on how guest performs hypercalls.
Userspace hypervisors are supported as well as kvm.

How it works: Basically, we connect virtio frontend (configured by
userspace) to a backend. The backend could be a network device, or a tap
device. Backend is also configured by userspace, including vlan/mac
etc.

Status: This works for me, and I haven't seen any crashes.
Compared to userspace, people reported improved latency (as I save up to
4 system calls per packet), as well as better bandwidth and CPU
utilization.

Features that I plan to look at in the future:
- mergeable buffers
- zero copy
- scalability tuning: figure out the best threading model to use

Acked-by: Arnd Bergmann
Signed-off-by: Michael S.
Tsirkin --- MAINTAINERS|9 + arch/x86/kvm/Kconfig |1 + drivers/Makefile |1 + drivers/vhost/Kconfig | 11 + drivers/vhost/Makefile |2 + drivers/vhost/net.c| 633 + drivers/vhost/vhost.c | 970 drivers/vhost/vhost.h | 158 +++ include/linux/Kbuild |1 + include/linux/miscdevice.h |1 + include/linux/vhost.h | 126 ++ 11 files changed, 1913 insertions(+), 0 deletions(-) create mode 100644 drivers/vhost/Kconfig create mode 100644 drivers/vhost/Makefile create mode 100644 drivers/vhost/net.c create mode 100644 drivers/vhost/vhost.c create mode 100644 drivers/vhost/vhost.h create mode 100644 include/linux/vhost.h diff --git a/MAINTAINERS b/MAINTAINERS index 8824115..980a69b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5619,6 +5619,15 @@ S: Maintained F: Documentation/filesystems/vfat.txt F: fs/fat/ +VIRTIO HOST (VHOST) +M: "Michael S. Tsirkin" +L: kvm@vger.kernel.org +L: virtualizat...@lists.osdl.org +L: net...@vger.kernel.org +S: Maintained +F: drivers/vhost/ +F: include/linux/vhost.h + VIA RHINE NETWORK DRIVER M: Roger Luethi S: Maintained diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index b84e571..94f44d9 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -64,6 +64,7 @@ config KVM_AMD # OK, it's a little counter-intuitive to do this, but it puts it neatly under # the virtualization menu. 
+source drivers/vhost/Kconfig source drivers/lguest/Kconfig source drivers/virtio/Kconfig diff --git a/drivers/Makefile b/drivers/Makefile index 6ee53c7..81e3659 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -106,6 +106,7 @@ obj-$(CONFIG_HID) += hid/ obj-$(CONFIG_PPC_PS3) += ps3/ obj-$(CONFIG_OF) += of/ obj-$(CONFIG_SSB) += ssb/ +obj-$(CONFIG_VHOST_NET)+= vhost/ obj-$(CONFIG_VIRTIO) += virtio/ obj-$(CONFIG_VLYNQ)+= vlynq/ obj-$(CONFIG_STAGING) += staging/ diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig new file mode 100644 index 000..d955406 --- /dev/null +++ b/drivers/vhost/Kconfig @@ -0,0 +1,11 @@ +config VHOST_NET + tristate "Host kernel accelerator for virtio net" + depends on NET && EVENTFD + ---help--- + This kernel module can be loaded in host kernel to accelerate + guest networking with virtio_net. Not to be confused with virtio_net + module itself which needs to be loaded in guest kernel. + + To compile this driver as a module, choose M here: the module will + be called vhost_net. + diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile new file mode 100644 index 000..72dd020 --- /dev/null +++ b/drivers/vhost/Makefile @@ -0,0 +1,2 @@ +obj-$(CONFIG_VHOST_NET) += vhost_net.o +vhost_net-y := vhost.o net.o diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c new file mode 100644 index 000..4af3b98 --- /dev/null +++ b/drivers/vhost/net.c @@ -0,0 +1,633 @@ +/* Copyright (C) 2009 Red Hat, Inc. + * Author: Michael S. Tsirkin + * + * This work is licensed under the terms of the GNU GPL, version 2. + * + * virtio-net server in host kernel. + */ + +#include +#include +#inclu
Re: [RFC] make cpu creation happen inside the right thread.
On Tue, Nov 03, 2009 at 12:35:08PM -0200, Glauber Costa wrote:
> Right now, we issue cpu creation from the i/o thread, and then shoot a
> thread from inside that code. Over the last months, a lot of subtle bugs
> were reported, usually arising from the very fragile order of that
> initialization.
>
> I propose we rethink that a little. This is a patch that received basic
> testing only, and I'd like to hear on the overall direction. The idea is
> to issue the new thread as early as possible. The first direct benefits I
> can identify are that we no longer have to rely at on_vcpu-like schemes
> for issuing vcpu ioctls, since we are already on the right thread. Apic
> creation has far less spots for race conditions as well.
>
> I am implementing this on qemu-kvm first, since we can show the benefits
> of it a bit better in there (since we already support smp)
>
> Let me know what you guys think

Makes sense to me. You still need on_vcpu for issuing vcpu ioctls though
(after initialization).
Re: [RFC] make cpu creation happen inside the right thread.
On Tue, Nov 03, 2009 at 03:46:07PM -0200, Marcelo Tosatti wrote:
> On Tue, Nov 03, 2009 at 12:35:08PM -0200, Glauber Costa wrote:
> > Right now, we issue cpu creation from the i/o thread, and then shoot a
> > thread from inside that code. Over the last months, a lot of subtle
> > bugs were reported, usually arising from the very fragile order of
> > that initialization.
> >
> > I propose we rethink that a little. This is a patch that received
> > basic testing only, and I'd like to hear on the overall direction. The
> > idea is to issue the new thread as early as possible. The first direct
> > benefits I can identify are that we no longer have to rely at
> > on_vcpu-like schemes for issuing vcpu ioctls, since we are already on
> > the right thread. Apic creation has far less spots for race conditions
> > as well.
> >
> > I am implementing this on qemu-kvm first, since we can show the
> > benefits of it a bit better in there (since we already support smp)
> >
> > Let me know what you guys think
>
> Makes sense to me. You still need on_vcpu for issuing vcpu ioctls though
> (after initialization).

Yes, but I believe we can avoid most of them. There is a performance hit
from using it, but I am not so concerned with that. The nasty races that
arise from it are more of a concern to me.
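To make the direction discussed in this thread a bit more concrete, here is a small pthread sketch of "start the thread first, do the per-cpu setup on that thread, and have the creator wait until setup is reported done". It is an illustration only, not the qemu-kvm code: `struct vcpu`, `vcpu_thread_fn()` and `vcpu_start()` are hypothetical names, and the "setup" is reduced to recording which thread ran it.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-vcpu state; not QEMU's CPUState. */
struct vcpu {
    pthread_t thread;       /* handle filled in by pthread_create() */
    pthread_t owner;        /* thread that actually ran the setup */
    bool created;           /* set once setup has finished */
    pthread_mutex_t lock;
    pthread_cond_t cond;
};

static void *vcpu_thread_fn(void *arg)
{
    struct vcpu *v = arg;

    /* All per-vcpu setup happens here, on the vcpu's own thread, so no
     * cross-thread "run this on the vcpu" call is needed for it. */
    pthread_mutex_lock(&v->lock);
    v->owner = pthread_self();
    v->created = true;
    pthread_cond_signal(&v->cond);
    pthread_mutex_unlock(&v->lock);

    /* A real vcpu thread would enter its run loop here. */
    return NULL;
}

/* Start the thread first, then block until it reports that setup is
 * complete. Returns 0 on success or the pthread_create() error code. */
static int vcpu_start(struct vcpu *v)
{
    int err;

    v->created = false;
    pthread_mutex_init(&v->lock, NULL);
    pthread_cond_init(&v->cond, NULL);

    err = pthread_create(&v->thread, NULL, vcpu_thread_fn, v);
    if (err)
        return err;

    pthread_mutex_lock(&v->lock);
    while (!v->created)
        pthread_cond_wait(&v->cond, &v->lock);
    pthread_mutex_unlock(&v->lock);
    return 0;
}
```

The creator still has to synchronise once, at startup, but anything that must run on the vcpu thread during initialization can simply be called from vcpu_thread_fn(). As noted above, work issued from other threads after initialization would still need an on_vcpu-style handoff.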
[PATCH 2/2] qemu-kvm: x86: Add support for event states
This patch extends the qemu-kvm state sync logic with the event substate from the new VCPU state interface, giving access to yet missing exception, interrupt and NMI states. The patch does not switch the rest of qemu-kvm's code to the new interface as it is expected to be morphed into upstream's version anyway. Instead, a full conversion will be submitted for upstream. Signed-off-by: Jan Kiszka --- qemu-kvm-x86.c| 85 + target-i386/cpu.h |4 ++ target-i386/machine.c |4 ++ 3 files changed, 93 insertions(+), 0 deletions(-) diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c index e03a4ba..b12b103 100644 --- a/qemu-kvm-x86.c +++ b/qemu-kvm-x86.c @@ -903,6 +903,81 @@ static void get_seg(SegmentCache *lhs, const struct kvm_segment *rhs) | (rhs->avl * DESC_AVL_MASK); } +static void kvm_get_events(CPUState *env) +{ +#ifdef KVM_CAP_VCPU_STATE +struct { +struct kvm_vcpu_state header; +struct kvm_vcpu_substate substates[1]; +} request; +struct kvm_x86_event_state events; +int r; + +request.header.nsubstates = 1; +request.header.substates[0].type = KVM_X86_VCPU_STATE_EVENTS; +request.header.substates[0].offset = (size_t)&events - (size_t)&request; +r = kvm_vcpu_ioctl(env, KVM_GET_VCPU_STATE, &request); +if (r == 0) { +if (events.exception.injected) { +env->exception_index = events.exception.nr; +env->error_code = events.exception.error_code; +} else { +env->exception_index = -1; +} + +env->interrupt_injected = +events.interrupt.injected ? 
events.interrupt.nr : -1; +env->soft_interrupt = events.interrupt.soft; + +env->nmi_injected = events.nmi.injected; +env->nmi_pending = events.nmi.pending; +if (events.nmi.masked) { +env->hflags2 |= HF2_NMI_MASK; +} else { +env->hflags2 &= ~HF2_NMI_MASK; +} + +env->sipi_vector = events.sipi_vector; + +return; +} +#endif +env->nmi_injected = 0; +env->nmi_pending = 0; +env->hflags2 &= ~HF2_NMI_MASK; +} + +static void kvm_set_events(CPUState *env) +{ +#ifdef KVM_CAP_VCPU_STATE +struct { +struct kvm_vcpu_state header; +struct kvm_vcpu_substate substates[1]; +} request; +struct kvm_x86_event_state events; + +request.header.nsubstates = 1; +request.header.substates[0].type = KVM_X86_VCPU_STATE_EVENTS; +request.header.substates[0].offset = (size_t)&events - (size_t)&request; + +events.exception.injected = (env->exception_index >= 0); +events.exception.nr = env->exception_index; +events.exception.error_code = env->error_code; + +events.interrupt.injected = (env->interrupt_injected >= 0); +events.interrupt.nr = env->interrupt_injected; +events.interrupt.soft = env->soft_interrupt; + +events.nmi.injected = env->nmi_injected; +events.nmi.pending = env->nmi_pending; +events.nmi.masked = !!(env->hflags2 & HF2_NMI_MASK); + +events.sipi_vector = env->sipi_vector; + +kvm_vcpu_ioctl(env, KVM_SET_VCPU_STATE, &request); +#endif +} + void kvm_arch_load_regs(CPUState *env) { struct kvm_regs regs; @@ -1019,6 +1094,8 @@ void kvm_arch_load_regs(CPUState *env) rc = kvm_set_msrs(env, msrs, n); if (rc == -1) perror("kvm_set_msrs FAILED"); + +kvm_set_events(env); } void kvm_load_tsc(CPUState *env) @@ -1215,6 +1292,8 @@ void kvm_arch_save_regs(CPUState *env) return; } } + +kvm_get_events(env); } static void do_cpuid_ent(struct kvm_cpuid_entry2 *e, uint32_t function, @@ -1383,7 +1462,10 @@ int kvm_arch_init_vcpu(CPUState *cenv) kvm_tpr_vcpu_start(cenv); #endif +cenv->exception_index = -1; cenv->interrupt_injected = -1; +cenv->nmi_injected = 0; +cenv->nmi_pending = 0; return 0; } @@ -1453,7 
+1535,10 @@ void kvm_arch_push_nmi(void *opaque) void kvm_arch_cpu_reset(CPUState *env) { +env->exception_index = -1; env->interrupt_injected = -1; +env->nmi_injected = 0; +env->nmi_pending = 0; kvm_arch_load_regs(env); if (!cpu_is_bsp(env)) { if (kvm_irqchip_in_kernel()) { diff --git a/target-i386/cpu.h b/target-i386/cpu.h index a638e70..863c5a1 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -711,6 +711,10 @@ typedef struct CPUX86State { /* For KVM */ uint32_t mp_state; int32_t interrupt_injected; +uint8_t soft_interrupt; +uint8_t nmi_injected; +uint8_t nmi_pending; +uint32_t sipi_vector; /* in order to simplify APIC support, we leave this pointer to the user */ diff --git a/target-i386/machine.c b/target-i386/machine.c index 6bd447f..f066e6a 100644 --- a/target-i386/machine.c +++ b/target-i386/machine.c @@ -452,6 +452,10 @@ static const VMStateDescription vmstate_cpu = { VMSTATE_INT32_V(interrupt_injected, CPUState, 9),
[PATCH 1/2] qemu-kvm: x86: Refactor use of interrupt_bitmap
Drop interrupt_bitmap from the cpustate and solely rely on the integer interupt_injected. This prepares us for the new injected-interrupt interface, which will deprecate the bitmap, while preserving compatibility. Signed-off-by: Jan Kiszka --- Note: A corresponding version for upstream is on the way as well. qemu-kvm-x86.c| 19 +-- target-i386/cpu.h |3 +-- target-i386/kvm.c | 24 +--- target-i386/machine.c | 24 ++-- 4 files changed, 37 insertions(+), 33 deletions(-) diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c index 9df0d83..e03a4ba 100644 --- a/qemu-kvm-x86.c +++ b/qemu-kvm-x86.c @@ -946,7 +946,11 @@ void kvm_arch_load_regs(CPUState *env) fpu.mxcsr = env->mxcsr; kvm_set_fpu(env, &fpu); -memcpy(sregs.interrupt_bitmap, env->interrupt_bitmap, sizeof(sregs.interrupt_bitmap)); +memset(sregs.interrupt_bitmap, 0, sizeof(sregs.interrupt_bitmap)); +if (env->interrupt_injected >= 0) { +sregs.interrupt_bitmap[env->interrupt_injected / 64] |= +(uint64_t)1 << (env->interrupt_injected % 64); +} if ((env->eflags & VM_MASK)) { set_v8086_seg(&sregs.cs, &env->segs[R_CS]); @@ -1104,7 +1108,14 @@ void kvm_arch_save_regs(CPUState *env) kvm_get_sregs(env, &sregs); -memcpy(env->interrupt_bitmap, sregs.interrupt_bitmap, sizeof(env->interrupt_bitmap)); +env->interrupt_injected = -1; +for (i = 0; i < ARRAY_SIZE(sregs.interrupt_bitmap); i++) { +if (sregs.interrupt_bitmap[i]) { +n = ctz64(sregs.interrupt_bitmap[i]); +env->interrupt_injected = i * 64 + n; +break; +} +} get_seg(&env->segs[R_CS], &sregs.cs); get_seg(&env->segs[R_DS], &sregs.ds); @@ -1371,6 +1382,9 @@ int kvm_arch_init_vcpu(CPUState *cenv) #ifdef KVM_EXIT_TPR_ACCESS kvm_tpr_vcpu_start(cenv); #endif + +cenv->interrupt_injected = -1; + return 0; } @@ -1439,6 +1453,7 @@ void kvm_arch_push_nmi(void *opaque) void kvm_arch_cpu_reset(CPUState *env) { +env->interrupt_injected = -1; kvm_arch_load_regs(env); if (!cpu_is_bsp(env)) { if (kvm_irqchip_in_kernel()) { diff --git a/target-i386/cpu.h b/target-i386/cpu.h index 4605fd2..a638e70 
100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -709,8 +709,8 @@ typedef struct CPUX86State { MTRRVar mtrr_var[8]; /* For KVM */ -uint64_t interrupt_bitmap[256 / 64]; uint32_t mp_state; +int32_t interrupt_injected; /* in order to simplify APIC support, we leave this pointer to the user */ @@ -727,7 +727,6 @@ typedef struct CPUX86State { uint16_t fpus_vmstate; uint16_t fptag_vmstate; uint16_t fpregs_format_vmstate; -int32_t pending_irq_vmstate; } CPUX86State; CPUX86State *cpu_x86_init(const char *cpu_model); diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 24c9903..33f7d65 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -23,6 +23,7 @@ #include "kvm.h" #include "cpu.h" #include "gdbstub.h" +#include "host-utils.h" #ifdef KVM_UPSTREAM //#define DEBUG_KVM @@ -408,9 +409,11 @@ static int kvm_put_sregs(CPUState *env) { struct kvm_sregs sregs; -memcpy(sregs.interrupt_bitmap, - env->interrupt_bitmap, - sizeof(sregs.interrupt_bitmap)); +memset(sregs.interrupt_bitmap, 0, sizeof(sregs.interrupt_bitmap)); +if (env->interrupt_injected >= 0) { +sregs.interrupt_bitmap[env->interrupt_injected / 64] |= +(uint64_t)1 << (env->interrupt_injected % 64); +} if ((env->eflags & VM_MASK)) { set_v8086_seg(&sregs.cs, &env->segs[R_CS]); @@ -518,15 +521,22 @@ static int kvm_get_sregs(CPUState *env) { struct kvm_sregs sregs; uint32_t hflags; -int ret; +int bit, i, ret; ret = kvm_vcpu_ioctl(env, KVM_GET_SREGS, &sregs); if (ret < 0) return ret; -memcpy(env->interrupt_bitmap, - sregs.interrupt_bitmap, - sizeof(sregs.interrupt_bitmap)); +/* There can only be one pending IRQ set in the bitmap at a time, so try + to find it and save its number instead (-1 for none). 
*/ +env->interrupt_injected = -1; +for (i = 0; i < ARRAY_SIZE(sregs.interrupt_bitmap); i++) { +if (sregs.interrupt_bitmap[i]) { +bit = ctz64(sregs.interrupt_bitmap[i]); +env->interrupt_injected = i * 64 + bit; +break; +} +} get_seg(&env->segs[R_CS], &sregs.cs); get_seg(&env->segs[R_DS], &sregs.ds); diff --git a/target-i386/machine.c b/target-i386/machine.c index 2b88fea..6bd447f 100644 --- a/target-i386/machine.c +++ b/target-i386/machine.c @@ -2,7 +2,6 @@ #include "hw/boards.h" #include "hw/pc.h" #include "hw/isa.h" -#include "host-utils.h" #include "exec-all.h" #include "kvm.h" @@ -321,7 +320,7 @@ static const VMStateInfo vmstate_hack_uint64_as_uint32 = { static void cpu_pre_save(void *opaq
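The conversion this patch performs in both directions, between the 256-bit interrupt bitmap and a single integer (-1 for none), can be tried out in isolation with a sketch like the one below. The helper names `vector_to_bitmap()` and `bitmap_to_vector()` are made up for illustration, and `__builtin_ctzll` (a GCC/Clang builtin) stands in for the ctz64() helper the patch takes from host-utils.h.

```c
#include <stdint.h>
#include <string.h>

#define NR_VECTORS   256
#define BITMAP_WORDS (NR_VECTORS / 64)

/* Encode a single injected vector (or -1 for "none") as a bitmap. */
static void vector_to_bitmap(int32_t vector, uint64_t bitmap[BITMAP_WORDS])
{
    memset(bitmap, 0, BITMAP_WORDS * sizeof(bitmap[0]));
    if (vector >= 0 && vector < NR_VECTORS)
        bitmap[vector / 64] |= (uint64_t)1 << (vector % 64);
}

/* Return the lowest vector set in the bitmap, or -1 if it is empty.
 * This relies on at most one bit being set at a time, as the comment
 * in the patch states. */
static int32_t bitmap_to_vector(const uint64_t bitmap[BITMAP_WORDS])
{
    for (int i = 0; i < BITMAP_WORDS; i++) {
        if (bitmap[i])
            /* count trailing zeros = index of the lowest set bit */
            return i * 64 + __builtin_ctzll(bitmap[i]);
    }
    return -1;
}
```

For example, vector 200 lands in word 3 (200 / 64) at bit 8 (200 % 64), and converting that bitmap back yields 200 again.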
Re: [PATCHv7 2/3] mm: export use_mm/unuse_mm to modules
Michael S. Tsirkin wrote:
> vhost net module wants to do copy to/from user from a kernel thread,
> which needs use_mm. Export it to modules.
>
> Acked-by: Andrea Arcangeli
> Signed-off-by: Michael S. Tsirkin

I need this too:

Acked-by: Gregory Haskins

> ---
>  mm/mmu_context.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/mm/mmu_context.c b/mm/mmu_context.c
> index ded9081..0777654 100644
> --- a/mm/mmu_context.c
> +++ b/mm/mmu_context.c
> @@ -5,6 +5,7 @@
>
>  #include
>  #include
> +#include
>  #include
>
>  #include
> @@ -37,6 +38,7 @@ void use_mm(struct mm_struct *mm)
>  	if (active_mm != mm)
>  		mmdrop(active_mm);
>  }
> +EXPORT_SYMBOL_GPL(use_mm);
>
>  /*
>   * unuse_mm
> @@ -56,3 +58,4 @@ void unuse_mm(struct mm_struct *mm)
>  	enter_lazy_tlb(mm, tsk);
>  	task_unlock(tsk);
>  }
> +EXPORT_SYMBOL_GPL(unuse_mm);
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
Gregory Haskins wrote: > Eric Dumazet wrote: >> Michael S. Tsirkin a écrit : >>> +static void handle_tx(struct vhost_net *net) >>> +{ >>> + struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX]; >>> + unsigned head, out, in, s; >>> + struct msghdr msg = { >>> + .msg_name = NULL, >>> + .msg_namelen = 0, >>> + .msg_control = NULL, >>> + .msg_controllen = 0, >>> + .msg_iov = vq->iov, >>> + .msg_flags = MSG_DONTWAIT, >>> + }; >>> + size_t len, total_len = 0; >>> + int err, wmem; >>> + size_t hdr_size; >>> + struct socket *sock = rcu_dereference(vq->private_data); >>> + if (!sock) >>> + return; >>> + >>> + wmem = atomic_read(&sock->sk->sk_wmem_alloc); >>> + if (wmem >= sock->sk->sk_sndbuf) >>> + return; >>> + >>> + use_mm(net->dev.mm); >>> + mutex_lock(&vq->mutex); >>> + vhost_no_notify(vq); >>> + >> using rcu_dereference() and mutex_lock() at the same time seems wrong, I >> suspect >> that your use of RCU is not correct. >> >> 1) rcu_dereference() should be done inside a read_rcu_lock() section, and >>we are not allowed to sleep in such a section. >>(Quoting Documentation/RCU/whatisRCU.txt : >> It is illegal to block while in an RCU read-side critical section, ) >> >> 2) mutex_lock() can sleep (ie block) >> > > > Michael, > I warned you that this needed better documentation ;) > > Eric, > I think I flagged this once before, but Michael convinced me that it > was indeed "ok", if but perhaps a bit unconventional. I will try to > find the thread. > > Kind Regards, > -Greg > Here it is: http://lkml.org/lkml/2009/8/12/173 Kind Regards, -Greg signature.asc Description: OpenPGP digital signature
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
Michael S. Tsirkin wrote:
> +static void handle_tx(struct vhost_net *net)
> +{
> +	struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
> +	unsigned head, out, in, s;
> +	struct msghdr msg = {
> +		.msg_name = NULL,
> +		.msg_namelen = 0,
> +		.msg_control = NULL,
> +		.msg_controllen = 0,
> +		.msg_iov = vq->iov,
> +		.msg_flags = MSG_DONTWAIT,
> +	};
> +	size_t len, total_len = 0;
> +	int err, wmem;
> +	size_t hdr_size;
> +	struct socket *sock = rcu_dereference(vq->private_data);
> +	if (!sock)
> +		return;
> +
> +	wmem = atomic_read(&sock->sk->sk_wmem_alloc);
> +	if (wmem >= sock->sk->sk_sndbuf)
> +		return;
> +
> +	use_mm(net->dev.mm);
> +	mutex_lock(&vq->mutex);
> +	vhost_no_notify(vq);
> +

Using rcu_dereference() and mutex_lock() at the same time seems wrong; I
suspect that your use of RCU is not correct.

1) rcu_dereference() should be done inside an rcu_read_lock() section, and
   we are not allowed to sleep in such a section.
   (Quoting Documentation/RCU/whatisRCU.txt :
    It is illegal to block while in an RCU read-side critical section, )

2) mutex_lock() can sleep (i.e. block)
Re: kvm problem: bonding network interface breaks dhcp
On Tue, Nov 03, 2009 at 04:45:48PM +0100, Harald Dunkel wrote:
> I am trying to use a bonding network interface as a bridge
> for a virtual machine (kvm). Host and guest are both running
> 2.6.31.5. Problem: The guest does not receive the DHCPOFFER
> reply sent by my dhcp server. There is no such problem if
> the host uses just a single network interface instead of
> bond0.

The output of brctl show, ip addr list, and cat /proc/net/bonding/bond*
might be helpful.

- Matt
Re: [Linux-fbdev-devel] [Qemu-devel] Re: [PATCH] Add VirtIO Frame Buffer Support
On Tue, Nov 03, 2009 at 06:05:13PM +0200, Avi Kivity wrote:
>>> cirrus has pretty good 2d acceleration. 3D is a mega-project though.
>>>
>> Cirrus has no blending/compositing hardware support.
>> Paravirtualized graphics can easily support full XRender-style
>> 2D acceleration.
>
> What do that entail? 3/4 operand raster ops?

Yes, basically a three-operand render/composite operation and rendering
of some 2D primitives (trapezoids).

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
Re: libvirt bug #532480
On Tuesday 03 November 2009 06:02:42 am roma1390 wrote:
> Lib virt thinks that bug #532480 must be addressed to quemu/kvm team.
>
>https://bugzilla.redhat.com/show_bug.cgi?id=532480

For future reference, adding some overview to your email, instead of making
all the devs (who arguably have limited time) go read through a bug report,
is probably a good idea.

>
> Any ideas how to fix this issue?

IIRC, it's being worked on. And yes, signing is the responsibility of the
developers of said drivers. Keep watching the URL from the bug for updated
drivers. Until then, there are workarounds for this issue, also mentioned
at that URL.
[RFC] make cpu creation happen inside the right thread.
Right now, we issue cpu creation from the i/o thread, and then shoot a thread from inside that code. Over the last months, a lot of subtle bugs were reported, usually arising from the very fragile order of that initialization.

I propose we rethink that a little. This is a patch that received basic testing only, and I'd like to hear opinions on the overall direction.

The idea is to issue the new thread as early as possible. The first direct benefits I can identify are that we no longer have to rely on on_vcpu-like schemes for issuing vcpu ioctls, since we are already on the right thread. Apic creation has far fewer spots for race conditions as well.

I am implementing this on qemu-kvm first, since we can show the benefits of it a bit better in there (since we already support smp).

Let me know what you guys think.

Signed-off-by: Glauber Costa
CC: Marcelo Tosatti
CC: Avi Kivity
CC: Jan Kiszka
CC: Anthony Liguori
---
 cpu-defs.h |2 +-
 hw/acpi.c |2 +-
 hw/pc.c| 26 --
 hw/pc.h|2 +-
 qemu-kvm-x86.c |5 +
 qemu-kvm.c | 44 +++-
 qemu-kvm.h |2 ++
 vl.c |2 --
 8 files changed, 53 insertions(+), 32 deletions(-)

diff --git a/cpu-defs.h b/cpu-defs.h
index cf502e9..6d026e0 100644
--- a/cpu-defs.h
+++ b/cpu-defs.h
@@ -139,7 +139,7 @@ typedef struct CPUWatchpoint {
 struct qemu_work_item;
 struct KVMCPUState {
-    pthread_t thread;
+    pthread_t *thread;
     int signalled;
     struct qemu_work_item *queued_work_first, *queued_work_last;
     int regs_modified;
diff --git a/hw/acpi.c b/hw/acpi.c
index 7564abf..cc68188 100644
--- a/hw/acpi.c
+++ b/hw/acpi.c
@@ -781,7 +781,7 @@ void qemu_system_cpu_hot_add(int cpu, int state)
     CPUState *env;
     if (state && !qemu_get_cpu(cpu)) {
-        env = pc_new_cpu(model);
+        pc_new_cpu(model, &env);
         if (!env) {
             fprintf(stderr, "cpu %d creation failed\n", cpu);
             return;
diff --git a/hw/pc.c b/hw/pc.c
index 83012a9..53e7273 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1013,29 +1013,26 @@ int cpu_is_bsp(CPUState *env)
     return env->cpuid_apic_id == 0;
 }
-CPUState *pc_new_cpu(const char *cpu_model)
+void pc_new_cpu(const char *cpu_model, CPUState **env)
 {
-    CPUState *env;
-
-    env = cpu_init(cpu_model);
-    if (!env) {
+    *env = cpu_init(cpu_model);
+    if (!*env) {
         fprintf(stderr, "Unable to find x86 CPU definition\n");
         exit(1);
     }
-    env->kvm_cpu_state.regs_modified = 1;
-    if ((env->cpuid_features & CPUID_APIC) || smp_cpus > 1) {
-        env->cpuid_apic_id = env->cpu_index;
+    (*env)->kvm_cpu_state.regs_modified = 1;
+    if (((*env)->cpuid_features & CPUID_APIC) || smp_cpus > 1) {
+        (*env)->cpuid_apic_id = (*env)->cpu_index;
         /* APIC reset callback resets cpu */
-        apic_init(env);
+        apic_init(*env);
     } else {
-        qemu_register_reset((QEMUResetHandler*)cpu_reset, env);
+        qemu_register_reset((QEMUResetHandler*)cpu_reset, *env);
     }
     /* kvm needs this to run after the apic is initialized. Otherwise,
      * it can access invalid state and crash.
      */
-    qemu_init_vcpu(env);
-    return env;
+    qemu_init_vcpu(*env);
 }
 /* PC hardware initialisation */
@@ -1055,7 +1052,6 @@ static void pc_init1(ram_addr_t ram_size,
     PCIBus *pci_bus;
     ISADevice *isa_dev;
     int piix3_devfn = -1;
-    CPUState *env;
     qemu_irq *cpu_irq;
     qemu_irq *isa_irq;
     qemu_irq *i8259;
@@ -1086,8 +1082,10 @@ static void pc_init1(ram_addr_t ram_size,
     if (kvm_enabled()) {
         kvm_set_boot_cpu_id(0);
     }
+
     for (i = 0; i < smp_cpus; i++) {
-        env = pc_new_cpu(cpu_model);
+//pc_new_cpu(cpu_model, &env);
+        kvm_init_vcpu(NULL);
     }
     vmport_init();
diff --git a/hw/pc.h b/hw/pc.h
index 93eb34d..f931380 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -100,7 +100,7 @@ extern int fd_bootchk;
 void ioport_set_a20(int enable);
 int ioport_get_a20(void);
-CPUState *pc_new_cpu(const char *cpu_model);
+void pc_new_cpu(const char *cpu_model, CPUState **env);
 /* acpi.c */
 extern int acpi_enabled;
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 0d263ca..4084312 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -160,6 +160,11 @@ int kvm_arch_create(kvm_context_t kvm, unsigned long phys_mem_bytes,
     return 0;
 }
+void kvm_arch_create_vcpu(const char *model, CPUState **env)
+{
+    pc_new_cpu("qemu64", env);
+}
+
 #ifdef KVM_EXIT_TPR_ACCESS
 static int kvm_handle_tpr_access(CPUState *env)
diff --git a/qemu-kvm.c b/qemu-kvm.c
index b58a457..f83d19a 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -436,10 +436,18 @@ void kvm_disable_pit_creation(kvm_context_t kvm)
     kvm->no_pit_creation = 1;
 }
-static void kvm_create_vcpu(CPUState *env, int id)
+
+static void kvm_create_vcpu(CPUState **_env)
 {
     long mmap_size;
     int r;
+    int id;
+    CPUState *env;
+
+    kvm_arch
kvm problem: bonding network interface breaks dhcp
Hi folks,

I am trying to use a bonding network interface as a bridge for a virtual machine (kvm). Host and guest are both running 2.6.31.5.

Problem: The guest does not receive the DHCPOFFER reply sent by my dhcp server. There is no such problem if the host uses just a single network interface instead of bond0.

Looking at tcpdump on the Linux guest there are several dhcp discover packets like

15:17:44.005306 00:16:36:2f:f1:d2 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 342: (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328) 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:16:36:2f:f1:d2, length 300, xid 0x4c31213d, secs 10, Flags [none]
  Client-Ethernet-Address 00:16:36:2f:f1:d2 [|bootp]

The dhcp server receives these packets, and sends out a reply

15:17:45.927589 00:16:36:2f:f1:d2 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 342: (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328) 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:16:36:2f:f1:d2, length 300, xid 0x4c31213d, secs 10, Flags [none]
  Client-Ethernet-Address 00:16:36:2f:f1:d2 [|bootp]
15:17:45.927658 00:15:17:94:16:65 > 00:16:36:2f:f1:d2, ethertype IPv4 (0x0800), length 364: (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 350) 172.19.96.123.67 > 172.19.97.243.68: BOOTP/DHCP, Reply, length 322, xid 0x4c31213d, secs 10, Flags [none]
  Your-IP 172.19.97.243
  Client-Ethernet-Address 00:16:36:2f:f1:d2 [|bootp]

This reply never shows up on the guest. iptables is not set up, of course. sysctl.conf says

net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0

Any helpful comment would be highly appreciated.

Many thanx

Harri
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Linux-fbdev-devel] [Qemu-devel] Re: [PATCH] Add VirtIO Frame Buffer Support
On 11/03/2009 05:29 PM, Ondrej Zajicek wrote:
> On Tue, Nov 03, 2009 at 11:38:18AM +0200, Avi Kivity wrote:
>> On 11/03/2009 01:25 PM, Vincent Hanquez wrote:
>>> not sure if i'm missing the point here, but couldn't it be hypothetically
>>> extended to stuff 3d (or video & more 2d accel ?) commands too ? I can't
>>> imagine the cirrus or stdvga driver be able to do that ever ;)
>>
>> cirrus has pretty good 2d acceleration. 3D is a mega-project though.
>
> Cirrus has no blending/compositing hardware support. Paravirtualized
> graphics can easily support full XRender-style 2D acceleration.

What does that entail? 3/4 operand raster ops?

--
error compiling committee.c: too many arguments to function
Re: [PATCH] Add VirtIO Frame Buffer Support
On 11/03/2009 05:14 PM, Anthony Liguori wrote:
> Avi Kivity wrote:
>> On 11/03/2009 10:26 AM, Alexander Graf wrote:
>>> Exactly. In fact, I'm even scared to reboot mine because I might end up
>>> in a 3270 terminal. The whole text only crap keeps people from using
>>> this platform! And that's what I want to change here.
>>
>> Ok. I oppose paravirtualization for its own sake and only support it if
>> there's no other way to get performance. In this case it buys us basic
>> functionality which is surprisingly missing on native, that's arguably
>> even more important.
>
> There is no "native" on s390. Everything is "paravirtual".

I meant native as in "what they usually do without our stuff".

--
error compiling committee.c: too many arguments to function
Re: CPU change causes hanging of .NET apps
On 11/03/2009 05:11 PM, Timur Safin wrote:
> My totally noob in QEMU guess - my bet it's CR4.OSFXSR which is
> controlled by presence of cpuid.1.edx[24] - FXSR bit (FXSAVE and
> FXRSTOR) instructions.

That would affect floating point as well.

> I'm curious - is there any way in QEMU to redefine returned cpuid leaf
> values? It will be interesting to see results if that given bit will be
> disabled in configuration.

qemu -cpu host,-flag

--
error compiling committee.c: too many arguments to function
RE: vhost-net patches
Hello Xiaohui,

On Tue, 2009-11-03 at 09:06 +0800, Xin, Xiaohui wrote:
> Hi, Michael,
> What's your deferring skb allocation patch mentioned here, may you
> elaborate it a little more detailed?

That's my patch. It was submitted a few months ago. Here is the link to this RFC patch:
http://www.mail-archive.com/kvm@vger.kernel.org/msg20777.html

It is a patch for guest receiving. Right now, the kvm guest pre-allocates skbs; in the worst case, when receiving a large packet with mergeable buffers, 15/16 skbs need to be freed.

Avi and Michael gave me some comments. I will post the updated patch after running a few more tests in a few days.

Thanks
Shirley
Re: [RFC] allow userspace to set MSR no-ops
On Wed, Oct 28, 2009 at 01:23:07PM -0400, David Windsor wrote:
> Hi,
>
> I've encountered a situation in which I would like to allow userspace
> to set the MSRs which KVM should not emulate and instead implement
> these as no-ops.
>
> I have not seen any work in this space, furthermore there is an item
> on the KVM TODO that is very similar to what I'm trying to do.
>
> The userspace interface is an extension of kvm_vcpu_ioctl, adding the
> KVM_SET_MSRS_NOOP flag. It takes a struct kvm_msrs as a list of which
> MSRs should be no-ops and adds the field noop to struct kvm_msr_entry.
> This patch only affects vmx, but if the approach is sane, I can
> extend it to support svm as well.

Does the ignore_msrs kvm.ko parameter achieve what you want?
Re: [Linux-fbdev-devel] [Qemu-devel] Re: [PATCH] Add VirtIO Frame Buffer Support
On Tue, Nov 03, 2009 at 11:38:18AM +0200, Avi Kivity wrote:
> On 11/03/2009 01:25 PM, Vincent Hanquez wrote:
> > not sure if i'm missing the point here, but couldn't it be hypothetically
> > extended to stuff 3d (or video& more 2d accel ?) commands too ? I can't
> > imagine the cirrus or stdvga driver be able to do that ever ;)
> >
>
> cirrus has pretty good 2d acceleration. 3D is a mega-project though.

Cirrus has no blending/compositing hardware support. Paravirtualized graphics can easily support full XRender-style 2D acceleration.

--
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
Re: [PATCH] Add VirtIO Frame Buffer Support
Avi Kivity wrote:
> On 11/03/2009 10:26 AM, Alexander Graf wrote:
>> Exactly. In fact, I'm even scared to reboot mine because I might end up
>> in a 3270 terminal. The whole text only crap keeps people from using
>> this platform! And that's what I want to change here.
>
> Ok. I oppose paravirtualization for its own sake and only support it if
> there's no other way to get performance. In this case it buys us basic
> functionality which is surprisingly missing on native, that's arguably
> even more important.

There is no "native" on s390. Everything is "paravirtual".

Regards,

Anthony Liguori
Re: CPU change causes hanging of .NET apps
2009/11/3 Avi Kivity :
> On 11/03/2009 04:56 PM, Erik Rull wrote:
>>
>> I took all flags, same effect as without all of these flags. :(
>>
>> Any other idea?
>
> It's probably the cache size query.
>
> Does -cpu host work?
>

My totally noob in QEMU guess - my bet it's CR4.OSFXSR which is controlled by presence of cpuid.1.edx[24] - FXSR bit (FXSAVE and FXRSTOR) instructions.

I'm curious - is there any way in QEMU to redefine returned cpuid leaf values? It will be interesting to see results if that given bit will be disabled in configuration.

Best Regards,
Timur
Re: CPU change causes hanging of .NET apps
On 11/03/2009 04:56 PM, Erik Rull wrote:
> I took all flags, same effect as without all of these flags. :(
>
> Any other idea?

It's probably the cache size query.

Does -cpu host work?

--
error compiling committee.c: too many arguments to function
Re: CPU change causes hanging of .NET apps
Avi Kivity wrote:
> On 11/02/2009 01:45 AM, Erik Rull wrote:
>> Hi Avi,
>
> Please don't top-post.
>
>> the Host CPU is a Intel Core2Duo - VT capable and enabled!
>
> The problem is that one of the flags that -cpu core2duo enables is
> implemented incorrectly, so it leads to .net breakage. These flags are
> pni, lm, nx, ssse3, syscall.
>
> Please try -cpu core2duo,-pni,-lm,-nx,-ssse3,-syscall. If it works,
> remove features one by one until it doesn't and let us know the results.

I took all flags, same effect as without all of these flags. :(

Any other idea?

- Erik
Re: 2.6.31.4 panic: CRED: put_cred_rcu() sees ffff880204e58c00 with usage 82150912
On Fri, Oct 30, 2009 at 12:15:34PM +0100, Nikola Ciprich wrote:
> Ouch, typo in subject, it's 2.6.31.1 of course. sorry about that.
> also CCing kvm.
> n.
>
> On Fri, Oct 30, 2009 at 12:06:32PM +0100, Nikola Ciprich wrote:
> > Hi,
> > some time ago, I updated my KVM hosting machine to 2.6.31.1 and it just
> > died horribly:

Nikola,

Upgraded from what? Did you experience the crash again?

> > Oct 30 10:45:17 vbox [706369.133516] Kernel panic - not syncing: CRED: put_cred_rcu() sees 880204e58c00 with usage 82150912
> > Oct 30 10:45:17 vbox [706369.133519]
> > Oct 30 10:45:17 vbox [706369.144990] Pid: 19, comm: ksoftirqd/5 Not tainted 2.6.31lb.02 #1
> > Oct 30 10:45:17 vbox [706369.151554] Call Trace:
> > Oct 30 10:45:17 vbox [706369.154332]
> > Oct 30 10:45:17 vbox [] panic+0xaa/0x180
> > Oct 30 10:45:17 vbox [706369.160280] [] ? _spin_unlock+0x30/0x60
> > Oct 30 10:45:17 vbox [706369.166256] [] ? add_partial+0x21/0x90
> > Oct 30 10:45:17 vbox [706369.172155] [] ? __slab_free+0x92/0x3c0
> > Oct 30 10:45:17 vbox [706369.178127] [] ? file_free_rcu+0x37/0x50
> > Oct 30 10:45:17 vbox [706369.184198] [] put_cred_rcu+0x75/0x80
> > Oct 30 10:45:17 vbox [706369.190008] [] __rcu_process_callbacks+0x125/0x250
> > Oct 30 10:45:17 vbox [706369.197020] [] rcu_process_callbacks+0x39/0x60
> > Oct 30 10:45:17 vbox [706369.203624] [] __do_softirq+0xc1/0x250
> > Oct 30 10:45:17 vbox [706369.209506] [] ? ksoftirqd+0x0/0x1a0
> > Oct 30 10:45:17 vbox [706369.215182] [] call_softirq+0x1c/0x30
> > Oct 30 10:45:17 vbox [706369.220986] [] do_softirq+0x3d/0x80
> > Oct 30 10:45:17 vbox [706369.227317] [] ? ksoftirqd+0x0/0x1a0
> > Oct 30 10:45:17 vbox [706369.233020] [] ksoftirqd+0x84/0x1a0
> > Oct 30 10:45:17 vbox [706369.238622] [] kthread+0xa6/0xb0
> > Oct 30 10:45:17 vbox [706369.243956] [] child_rip+0xa/0x20
> > Oct 30 10:45:17 vbox [706369.249390] [] ? kthread+0x0/0xb0
> > Oct 30 10:45:17 vbox [706369.254806] [] ? child_rip+0x0/0x20
> > Oct 30 10:45:17 vbox [706369.260454] Rebooting in 10 seconds..
> > (trace is obtained from netconsole, so hopefully it's not mangled).
> > The machine was running ~30 KVM guests, it's 8CPU 16GB x86_64, when it
> > crashed, it was only moderately loaded. Never had this (or any other)
> > kind of crash on it before.
> > I know there's 2.6.31.5 already out, but I'm not sure if some related
> > problem has been reported/fixed and I'm obviously not able to quickly
> > test/reproduce it with latest kernel, so I'm rather reporting.
> > Should more information/testing/etc be required, I'll be glad to help
> > regards
> > nik
Re: [PATCH 03/11] Handle asynchronous page fault in a PV guest.
On 11/03/2009 04:32 PM, Marcelo Tosatti wrote:
> Any attempt to access the swapped out data will cause a #PF vmexit,
> since the translation is marked as not present. If there's swapin in
> progress, you wait for that swapin, otherwise start swapin and wait.
>
> It's not as efficient as paravirt because you have to wait for a timer
> interrupt and the guest scheduler to decide to taskswitch, but OTOH it's
> transparent.

With a dyntick guest the timer interrupt will come at the end of the time slice, likely after the page has been swapped in. That leaves smp reschedule interrupts and non-dyntick guests.

An advantage is that there is one code path for apf and non-apf. Another is that interrupts are processed, improving timekeeping and maybe responsiveness.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 03/11] Handle asynchronous page fault in a PV guest.
On Tue, Nov 03, 2009 at 04:25:33PM +0200, Gleb Natapov wrote:
> On Tue, Nov 03, 2009 at 12:14:23PM -0200, Marcelo Tosatti wrote:
> > On Sun, Nov 01, 2009 at 01:56:22PM +0200, Gleb Natapov wrote:
> > > Asynchronous page fault notifies vcpu that page it is trying to access
> > > is swapped out by a host. In response guest puts a task that caused the
> > > fault to sleep until page is swapped in again. When missing page is
> > > brought back into the memory guest is notified and task resumes execution.
> >
> > Can't you apply this to non-paravirt guests, and continue to deliver
> > interrupts while waiting for the swapin?
> >
> > It should allow the guest to schedule a different task.
> But how can I make the guest to not run the task that caused the fault?

Any attempt to access the swapped out data will cause a #PF vmexit, since the translation is marked as not present. If there's swapin in progress, you wait for that swapin, otherwise start swapin and wait.

It's not as efficient as paravirt because you have to wait for a timer interrupt and the guest scheduler to decide to taskswitch, but OTOH it's transparent.
Re: [PATCH 03/11] Handle asynchronous page fault in a PV guest.
On Tue, Nov 03, 2009 at 12:14:23PM -0200, Marcelo Tosatti wrote:
> On Sun, Nov 01, 2009 at 01:56:22PM +0200, Gleb Natapov wrote:
> > Asynchronous page fault notifies vcpu that page it is trying to access
> > is swapped out by a host. In response guest puts a task that caused the
> > fault to sleep until page is swapped in again. When missing page is
> > brought back into the memory guest is notified and task resumes execution.
>
> Can't you apply this to non-paravirt guests, and continue to deliver
> interrupts while waiting for the swapin?
>
> It should allow the guest to schedule a different task.
But how can I make the guest to not run the task that caused the fault?

--
	Gleb.
Re: [PATCH 03/11] Handle asynchronous page fault in a PV guest.
On Sun, Nov 01, 2009 at 01:56:22PM +0200, Gleb Natapov wrote:
> Asynchronous page fault notifies vcpu that page it is trying to access
> is swapped out by a host. In response guest puts a task that caused the
> fault to sleep until page is swapped in again. When missing page is
> brought back into the memory guest is notified and task resumes execution.

Can't you apply this to non-paravirt guests, and continue to deliver interrupts while waiting for the swapin?

It should allow the guest to schedule a different task.
Re: [PATCH] tests: The order of the fields are in reverse one isr stack.
Applied both, thanks.

On Thu, Oct 29, 2009 at 11:12:57AM +0200, Gleb Natapov wrote:
>
> Signed-off-by: Gleb Natapov
> diff --git a/kvm/user/test/x86/apic.c b/kvm/user/test/x86/apic.c
> index 4e89c77..b6718ec 100644
> --- a/kvm/user/test/x86/apic.c
> +++ b/kvm/user/test/x86/apic.c
Re: [PATCH] KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG
On Fri, Oct 30, 2009 at 12:46:59PM +0100, Jan Kiszka wrote:
> Decouple KVM_GUESTDBG_INJECT_DB and KVM_GUESTDBG_INJECT_BP from
> KVM_GUESTDBG_ENABLE, they are actually orthogonal. At this chance,
> avoid triggering the WARN_ON in kvm_queue_exception if there is already
> an exception pending and reject such invalid requests.
>
> Signed-off-by: Jan Kiszka

Applied, thanks.
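For readers following along, the behaviour described in the quoted commit message can be modelled roughly as below. This is only a user-space sketch under stated assumptions, not the applied patch: `struct vcpu_model`, `set_guest_debug()`, the `GUESTDBG_*` bit values and the `-EBUSY` return code are all made up for illustration; only the x86 vector numbers (#DB = 1, #BP = 3) are architectural facts.

```c
#include <errno.h>

/* Placeholder bit values for illustration only; the real
 * KVM_GUESTDBG_* constants are defined in the KVM headers. */
#define GUESTDBG_ENABLE    0x1u
#define GUESTDBG_INJECT_DB 0x2u
#define GUESTDBG_INJECT_BP 0x4u

#define DB_VECTOR 1 /* x86 debug exception */
#define BP_VECTOR 3 /* x86 breakpoint exception */

/* Hypothetical stand-in for the vcpu's pending-exception state. */
struct vcpu_model {
    int exception_pending;
    int exception_nr;
};

/* Injection is handled independently of GUESTDBG_ENABLE, and a
 * request is rejected while another exception is still pending
 * (the error code is an assumption, the mail does not name one). */
int set_guest_debug(struct vcpu_model *vcpu, unsigned int control)
{
    if (control & (GUESTDBG_INJECT_DB | GUESTDBG_INJECT_BP)) {
        if (vcpu->exception_pending)
            return -EBUSY;
        vcpu->exception_pending = 1;
        vcpu->exception_nr = (control & GUESTDBG_INJECT_DB) ?
                             DB_VECTOR : BP_VECTOR;
    }
    return 0;
}
```

Note that the injection branch never looks at the enable bit, which is the "orthogonal" part; the early return is what keeps a second injection from reaching the queueing step while one is outstanding.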
Re: [PATCH] KVM: x86: Fix KVM_GET_CLOCK
On Tue, Nov 03, 2009 at 12:49:05PM +0100, Jan Kiszka wrote:
> The flags field of kvm_clock_data is supposed to indicate the
> availability of additional fields one day. There are none yet, so clear
> it. Moreover, drop the bogus check of this field and return 0 on
> success.
>
> Signed-off-by: Jan Kiszka
> ---
>
> Note: This replaces "Clear flags field on return from KVM_GET_CLOCK".
>
> arch/x86/kvm/x86.c |8 +++-
> 1 files changed, 3 insertions(+), 5 deletions(-)

Applied, thanks.
KVM Live Migration
hi all

This is my first post...

I successfully migrated a VM with this command:

(on kvm-0)
r...@kvm-0:~# virsh list --all
Connecting to uri: qemu:///system
 Id Name State
----------------------------------
 1 win2003 running

r...@kvm-0:~# virsh migrate --live win2003 qemu+ssh://kvm-1/system

But when the VM starts on kvm-1, it starts in a paused state! I don't have any idea why this happens.

So, to unpause the VM, I log in on kvm-1 and run virt-manager to unpause it...

I use DRBD as shared storage for the VMs...

Any help will be welcome.

Thanks

Gilberto Nunes Ferreira
TI
Selbetti Gestão de Documentos
Telefone: +55 (47) 3441-6004
Celular: +55 (47) 8861-6672
Re: [PATCHv6 1/3] tun: export underlying socket
On Tuesday 03 November 2009, Michael S. Tsirkin wrote:
> > What was your reason for changing?
>
> It turns out socket structure is really bound to a specific file, so we
> can not have 2 files referencing the same socket. Instead, as I say
> above, it's possible to make sendmsg/recvmsg work on tap file directly.

Ah, I see.

> I have implemented this (patch below), but decided to go with the simple
> thing first. Since no userspace-visible changes are involved, let's do
> this by small steps: it will be easier to figure out when vhost
> is upstream.

This may even make it easier for me to do the same with macvtap if I resume work on that.

> @@ -416,8 +422,8 @@ int sock_map_fd(struct socket *sock, int flags)
>
>  static struct socket *sock_from_file(struct file *file, int *err)
>  {
> -	if (file->f_op == &socket_file_ops)
> -		return file->private_data;	/* set in sock_map_fd */
> +	if (file->f_op->get_socket)
> +		return file->f_op->get_socket(file);
>
>  	*err = -ENOTSOCK;

Or maybe do both (socket_file_ops and get_socket), to avoid an indirect function call.

	Arnd <><
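The "do both" suggestion could look roughly like the following. This is a user-space sketch only, assuming cut-down stand-ins for the kernel structures; `tap_like_fops`, `other_fops`, `generic_get_socket()` and the `indirect_calls` counter are hypothetical names added here to make the fast path observable.

```c
#include <stddef.h>
#include <errno.h>

/* Cut-down stand-ins for the kernel types, for illustration only. */
struct socket { int id; };
struct file;

struct file_operations {
    struct socket *(*get_socket)(struct file *file);
};

struct file {
    const struct file_operations *f_op;
    void *private_data;
};

int indirect_calls; /* counts trips through the function pointer */

static struct socket *generic_get_socket(struct file *file)
{
    indirect_calls++;
    return file->private_data;
}

const struct file_operations socket_file_ops = { .get_socket = generic_get_socket };
const struct file_operations tap_like_fops   = { .get_socket = generic_get_socket };
const struct file_operations other_fops      = { .get_socket = NULL };

struct socket *sock_from_file(struct file *file, int *err)
{
    /* Fast path: a real socket file needs no indirect call. */
    if (file->f_op == &socket_file_ops)
        return file->private_data;

    /* Slow path: any other file type that exports a socket. */
    if (file->f_op->get_socket)
        return file->f_op->get_socket(file);

    *err = -ENOTSOCK;
    return NULL;
}
```

With this arrangement plain sockets pay only for the pointer comparison, and only tap-like files go through the callback.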
Re: [PATCHv6 1/3] tun: export underlying socket
On Tue, Nov 03, 2009 at 01:12:33PM +0100, Arnd Bergmann wrote:
> On Monday 02 November 2009, Michael S. Tsirkin wrote:
> > Tun device looks similar to a packet socket
> > in that both pass complete frames from/to userspace.
> >
> > This patch fills in enough fields in the socket underlying tun driver
> > to support sendmsg/recvmsg operations, and message flags
> > MSG_TRUNC and MSG_DONTWAIT, and exports access to this socket
> > to modules. Regular read/write behaviour is unchanged.
> >
> > This way, code using raw sockets to inject packets
> > into a physical device, can support injecting
> > packets into host network stack almost without modification.
> >
> > First user of this interface will be vhost virtualization
> > accelerator.
>
> You mentioned before that you wanted to export the socket
> using some ioctl function returning an open file descriptor,
> which seemed to be a cleaner approach than this one.

Note that a similar feature can be implemented on top of tun_get_socket, as seen from patch below.

> What was your reason for changing?

It turns out socket structure is really bound to a specific file, so we can not have 2 files referencing the same socket. Instead, as I say above, it's possible to make sendmsg/recvmsg work on tap file directly.

For vhost, the advantage of such a feature over using tun_get_socket directly would be that vhost module won't depend on tun module then.

I have implemented this (patch below), but decided to go with the simple thing first. Since no userspace-visible changes are involved, let's do this by small steps: it will be easier to figure out when vhost is upstream.

---

Note: patch below applies on top of patch tun: export underlying socket. It is not intended for merge yet.

net: convert tun device to socket

Add callback to file_ops to retrieve socket from file structure. Use this to make tun character device accept sendmsg/recvmsg calls.

Signed-off-by: Michael S. Tsirkin

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index b58095a..53e1806 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1405,7 +1405,8 @@ static const struct file_operations tun_fops = {
 	.unlocked_ioctl = tun_chr_ioctl,
 	.open = tun_chr_open,
 	.release = tun_chr_close,
-	.fasync = tun_chr_fasync
+	.fasync = tun_chr_fasync,
+	.get_socket = tun_get_socket,
 };
 static struct miscdevice tun_miscdev = {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2620a8c..f2b381f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1506,6 +1506,9 @@ struct file_operations {
 	ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
 	ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
 	int (*setlease)(struct file *, long, struct file_lock **);
+#ifdef CONFIG_NET
+	struct socket *(*get_socket)(struct file *file);
+#endif
 };
 struct inode_operations {
diff --git a/net/socket.c b/net/socket.c
index 9dff31c..700efcb 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -119,6 +119,11 @@ static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
 				struct pipe_inode_info *pipe, size_t len,
 				unsigned int flags);
+static struct socket *sock_get_socket(struct file *file)
+{
+	return file->private_data;	/* set in sock_map_fd */
+}
+
 /*
  * Socket files have a set of 'special' operations as well as the generic file ones. These don't appear
  * in the operation structures but are done directly via the socketcall() multiplexor.
@@ -141,6 +146,7 @@ static const struct file_operations socket_file_ops = {
 	.sendpage = sock_sendpage,
 	.splice_write = generic_splice_sendpage,
 	.splice_read = sock_splice_read,
+	.get_socket = sock_get_socket,
 };
 /*
@@ -416,8 +422,8 @@ int sock_map_fd(struct socket *sock, int flags)
 static struct socket *sock_from_file(struct file *file, int *err)
 {
-	if (file->f_op == &socket_file_ops)
-		return file->private_data;	/* set in sock_map_fd */
+	if (file->f_op->get_socket)
+		return file->f_op->get_socket(file);
 	*err = -ENOTSOCK;
 	return NULL;
--
MST
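To make the idea of the patch easier to follow outside the kernel tree, here is a small user-space model of the callback lookup. It is a sketch under stated assumptions: the structures are cut down to the fields the lookup touches, and `plain_file_ops` is a hypothetical table standing in for any file type without a socket.

```c
#include <stddef.h>
#include <errno.h>

/* Cut-down stand-ins for the kernel types, for illustration only. */
struct socket { int id; };
struct file;

struct file_operations {
    /* Optional hook: NULL for files that are not backed by a socket. */
    struct socket *(*get_socket)(struct file *file);
};

struct file {
    const struct file_operations *f_op;
    void *private_data;
};

static struct socket *sock_get_socket(struct file *file)
{
    return file->private_data;
}

const struct file_operations socket_file_ops = { .get_socket = sock_get_socket };
const struct file_operations plain_file_ops  = { .get_socket = NULL };

/* Any file type that fills in get_socket is accepted; everything
 * else is reported as "not a socket". */
struct socket *sock_from_file(struct file *file, int *err)
{
    if (file->f_op->get_socket)
        return file->f_op->get_socket(file);

    *err = -ENOTSOCK;
    return NULL;
}
```

The point of the design is that the lookup no longer needs to know which operation tables exist: a driver opts in by providing the hook, which is what lets the tun character device be passed to sendmsg/recvmsg.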
Re: [KVM-AUTOTEST][PATCH] Fix kvm_config.py -f mode
Ooops, fixed. Thanks Ryan!

On Mon, Nov 2, 2009 at 9:28 PM, Ryan Harper wrote:
> kvm_config.py supports specifying a different filename for
> test config. This patch fixes the option parsing parameters.
> Currently it uses 'store_true' which stores the value True into
> the filename variable; we want the 'store' mode which will store
> the value of the option (aka, the filename) in the variable.
>
> Signed-off-by: Ryan Harper
>
> diff --git a/client/tests/kvm/kvm_config.py b/client/tests/kvm/kvm_config.py
> index 3114c07..52de4c7 100755
> --- a/client/tests/kvm/kvm_config.py
> +++ b/client/tests/kvm/kvm_config.py
> @@ -501,7 +501,7 @@ class config:
>
>  if __name__ == "__main__":
>      parser = optparse.OptionParser()
> -    parser.add_option('-f', '--file', dest="filename", action='store_true',
> +    parser.add_option('-f', '--file', dest="filename", action='store',
>                        help='path to a config file that will be parsed. '
>                        'If not specified, will parse kvm_tests.cfg '
>                        'located inside the kvm test dir.')
>
> --
> Ryan Harper
> Software Engineer; Linux Technology Center
> IBM Corp., Austin, Tx
> ry...@us.ibm.com

--
Lucas
Re: [PATCHv6 1/3] tun: export underlying socket
On Monday 02 November 2009, Michael S. Tsirkin wrote:
> Tun device looks similar to a packet socket
> in that both pass complete frames from/to userspace.
>
> This patch fills in enough fields in the socket underlying tun driver
> to support sendmsg/recvmsg operations, and message flags
> MSG_TRUNC and MSG_DONTWAIT, and exports access to this socket
> to modules. Regular read/write behaviour is unchanged.
>
> This way, code using raw sockets to inject packets
> into a physical device, can support injecting
> packets into host network stack almost without modification.
>
> First user of this interface will be vhost virtualization
> accelerator.

You mentioned before that you wanted to export the socket using some ioctl function returning an open file descriptor, which seemed to be a cleaner approach than this one. What was your reason for changing?

> index 3f5fd52..404abe0 100644
> --- a/include/linux/if_tun.h
> +++ b/include/linux/if_tun.h
> @@ -86,4 +86,18 @@ struct tun_filter {
>  	__u8 addr[0][ETH_ALEN];
>  };
>
> +#ifdef __KERNEL__
> +#if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
> +struct socket *tun_get_socket(struct file *);
> +#else
> +#include
> +#include
> +struct file;
> +struct socket;
> +static inline struct socket *tun_get_socket(struct file *f)
> +{
> +	return ERR_PTR(-EINVAL);
> +}
> +#endif /* CONFIG_TUN */
> +#endif /* __KERNEL__ */
>  #endif /* __IF_TUN_H */

Is this a leftover from testing? Exporting the function for !__KERNEL__ seems pointless.

	Arnd <><
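As background for the quoted stub, here is a user-space sketch of the error-pointer convention it relies on. The lowercase helpers `err_ptr()`, `ptr_err()` and `is_err()` are hypothetical reimplementations written for this note, modelled on the kernel's ERR_PTR/PTR_ERR/IS_ERR; `tun_get_socket_stub()` is likewise an illustrative name for the fallback that is used when tun is not built.

```c
#include <errno.h>
#include <stdint.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in the top page of the address space. */
static inline void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True when the pointer falls inside the reserved error range. */
static inline int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct file;
struct socket;

/* Fallback used when the tun driver is not available. */
struct socket *tun_get_socket_stub(struct file *f)
{
    (void)f;
    return err_ptr(-EINVAL);
}
```

A caller of such a function has to check the result with the is-error test rather than comparing against NULL, since the stub never returns NULL.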
Re: [PATCHv6 3/3] vhost_net: a kernel-level virtio server
On Mon, Nov 02, 2009 at 04:05:58PM -0800, Daniel Walker wrote:
>
> Random style issues below .. Part of this is just stuff checkpatch
> found.

Thanks very much, I'll fix these.
[PATCH] KVM: x86: Fix KVM_GET_CLOCK
The flags field of kvm_clock_data is supposed to indicate the availability of additional fields one day. There are none yet, so clear it. Moreover, drop the bogus check of this field and return 0 on success.

Signed-off-by: Jan Kiszka
---

Note: This replaces "Clear flags field on return from KVM_GET_CLOCK".

 arch/x86/kvm/x86.c |8 +++-
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1f68798..7344405 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2577,14 +2577,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		ktime_get_ts(&now);
 		now_ns = timespec_to_ns(&now);
 		user_ns.clock = kvm->arch.kvmclock_offset + now_ns;
+		user_ns.flags = 0;
+		r = -EFAULT;
 		if (copy_to_user(argp, &user_ns, sizeof(user_ns)))
-			r = -EFAULT;
-
-		r = -EINVAL;
-		if (user_ns.flags)
 			goto out;
-
+		r = 0;
 		break;
 	}
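The control-flow problem is perhaps easiest to see in a small user-space model. The sketch below is illustrative only: `struct clock_data`, `fake_copy_to_user()`, `get_clock_old()` and `get_clock_new()` are hypothetical stand-ins, and the `stale_flags` parameter models the fact that the field was never initialised before the fix.

```c
#include <errno.h>
#include <string.h>

struct clock_data {
    unsigned long long clock;
    unsigned int flags;
};

/* Hypothetical stand-in for copy_to_user(): nonzero means failure. */
static int fake_copy_to_user(struct clock_data *dst,
                             const struct clock_data *src)
{
    if (dst == NULL)
        return 1;
    memcpy(dst, src, sizeof(*dst));
    return 0;
}

/* Control flow before the patch. */
int get_clock_old(struct clock_data *dst, unsigned long long now_ns,
                  unsigned int stale_flags)
{
    struct clock_data user_ns;
    int r;

    user_ns.clock = now_ns;
    user_ns.flags = stale_flags; /* models the uninitialised field */
    if (fake_copy_to_user(dst, &user_ns))
        r = -EFAULT;

    r = -EINVAL; /* overwrites -EFAULT unconditionally */
    if (user_ns.flags)
        goto out;
    /* success path falls through with r still -EINVAL */
out:
    return r;
}

/* Control flow after the patch. */
int get_clock_new(struct clock_data *dst, unsigned long long now_ns)
{
    struct clock_data user_ns;
    int r;

    user_ns.clock = now_ns;
    user_ns.flags = 0;
    r = -EFAULT;
    if (fake_copy_to_user(dst, &user_ns))
        goto out;
    r = 0;
out:
    return r;
}
```

In the old flow every path ends with -EINVAL, so a failed copy is misreported and a successful one is reported as an error; in the new flow a failed copy yields -EFAULT and success yields 0.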
Re: [PATCHv4 0/6] qemu-kvm: vhost net support
On Mon, Nov 02, 2009 at 04:58:39PM -0600, Anthony Liguori wrote:
> Hi Michael,
>
> I'll reserve individual patch review until they're in a mergeable state,
> but I do have some comments about the overall integration architecture.
>
> Generally speaking, I think the integration is unnecessarily invasive. It
> adds things to the virtio infrastructure that shouldn't be there like
> the irqfd/queuefd bindings. It also sneaks in things like raw backend
> support which really isn't needed.
>
> I think we can do better. Here's what I suggest:
>
> The long term goal should be to have a NetDevice interface that looks
> very much like virtio-net but as an API, not an ABI. Roughly, it would
> look something like:
>
> struct NetDevice {
>     int add_xmit(NetDevice *dev, struct iovec *iov, int iovcnt, void *token);
>     int add_recv(NetDevice *dev, struct iovec *iov, int iovcnt, void *token);
>
>     void *get_xmit(NetDevice *dev);
>     void *get_recv(NetDevice *dev);
>
>     void kick(NetDevice *dev);
>
>     ...
> };
>
> That gives us a better API for use with virtio-net, e1000, etc.

This is not much different from what we have now with VLANClientState, is it?

> Assuming we had this interface, I think a natural extension would be:
>
>     int add_ring(NetDevice *dev, void *address);
>     int add_kickfd(NetDevice *dev, int fd);
>
> For slot management, it really should happen outside of the NetDevice
> structure. We'll need a slot notifier mechanism such that we can keep
> this up to date as things change.

Yes.

> vhost-net becomes a NetDevice. It can support things like the e1000 by
> doing ring translation behind the scenes.

And the point would be?

> virtio-net can be fast pathed
> in the case that we're using KVM but otherwise, it would also rely on
> the ring translation.

Won't it be easier to just keep using existing code?

> N.B. in the case vhost-net is fast pathed, it requires a different
> device in QEMU that uses a separate virtio transport. We should
> reuse as much code as possible obviously. It doesn't make sense to
> have all of the virtio-pci code and virtio-net code in place when we
> aren't using it.

Note that all of virtio-pci and setup parts of virtio-net are reused. The only things we are *not* re-using are send/receive and callbacks in virtio-net.

> All this said, I'm *not* suggesting you have to implement all of this to
> get vhost-net merged. Rather, I'm suggesting that we should try to
> structure the current vhost-net implementation to complement this
> architecture assuming we all agree this is the sane thing to do. That
> means I would make the following changes to your series:
>
> - move vhost-net support to a VLANClientState backend.
> - do not introduce a raw socket backend
> - if for some reason you want to back to tap and raw, those should be
>   options to the vhost-net backend.
> - when fast pathing with vhost-net, we should introduce interfaces to
>   VLANClientState similar to add_ring and add_kickfd. They'll be very
>   specific to vhost-net for now, but that's okay.
> - sort out the layering of vhost-net within the virtio infrastructure.
>   vhost-net should really be its own qdev device.
>   I don't see very much
>   code reuse happening right now so I don't understand why it's not that
>   way currently.
> Regards,
>
> Anthony Liguori

What you propose short-term is workable. So basically, vhost would be an option supported by backends. virtio net would go ahead and activate it if available and other frontends will ignore it and just keep injecting packets through regular interfaces.

--
MST
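The NetDevice outline quoted above is pseudocode; in C such an interface would normally be a table of function pointers. The following is a rough sketch of how part of it (add_xmit, get_xmit, kick) might look, paired with a toy backend that just completes whatever was queued. ToyBackend, the toy_* functions, the opaque pointer and the ring size are all hypothetical and are not taken from QEMU or from the patch series.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

typedef struct NetDevice NetDevice;

/* The proposed interface, expressed as a table of function pointers. */
struct NetDevice {
    int   (*add_xmit)(NetDevice *dev, struct iovec *iov, int iovcnt, void *token);
    void *(*get_xmit)(NetDevice *dev);
    void  (*kick)(NetDevice *dev);
    void  *opaque;                  /* backend-private state */
};

#define TOY_RING 8

/* Hypothetical toy backend: buffers become "done" when kicked. */
typedef struct ToyBackend {
    void  *pending[TOY_RING];
    int    n_pending;
    void  *done[TOY_RING];
    int    n_done;
    size_t bytes;                   /* total payload bytes queued */
} ToyBackend;

static int toy_add_xmit(NetDevice *dev, struct iovec *iov, int iovcnt, void *token)
{
    ToyBackend *b = dev->opaque;

    /* Bound the total number of outstanding tokens. */
    if (b->n_pending + b->n_done >= TOY_RING)
        return -1;
    for (int i = 0; i < iovcnt; i++)
        b->bytes += iov[i].iov_len;
    b->pending[b->n_pending++] = token;
    return 0;
}

static void toy_kick(NetDevice *dev)
{
    ToyBackend *b = dev->opaque;

    /* "Transmit" everything: move pending tokens to the done list. */
    for (int i = 0; i < b->n_pending; i++)
        b->done[b->n_done++] = b->pending[i];
    b->n_pending = 0;
}

static void *toy_get_xmit(NetDevice *dev)
{
    ToyBackend *b = dev->opaque;
    void *token;

    if (b->n_done == 0)
        return NULL;
    token = b->done[0];             /* oldest completion first */
    for (int i = 1; i < b->n_done; i++)
        b->done[i - 1] = b->done[i];
    b->n_done--;
    return token;
}

static void toy_init(NetDevice *dev, ToyBackend *b)
{
    memset(b, 0, sizeof(*b));
    dev->add_xmit = toy_add_xmit;
    dev->get_xmit = toy_get_xmit;
    dev->kick = toy_kick;
    dev->opaque = b;
}
```

A frontend would queue buffers with add_xmit(), call kick(), and later reap the tokens with get_xmit(). That add/kick/get cycle has the same shape as a virtio ring, which appears to be the point of the proposal.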
Re: error while loading state for instance 0x0 of device 'kvmclock'
On Mon, Nov 02, 2009 at 04:37:15PM +0100, Jan Kiszka wrote:
> Avi Kivity wrote:
> > On 11/02/2009 12:28 PM, Jan Kiszka wrote:
> >> Jan Kiszka wrote:
> >>
> >>> Hi,
> >>>
> >>> current qemu-kvm.git gives me the message "qemu: warning: error while
> >>> loading state for instance 0x0 of device 'kvmclock'" when I run a simple
> >>> "savevm" followed by a "loadvm 1". What's broken here?
> >>>
> >> OK, this is due to "KVM: add flags to kvm_clock_data" (958b0c5497): the
> >> flags field is not cleared on KVM_SET_CLOCK. Will post a fix.
> >>
> >> But the above kernel commit is also broken: KVM_GET_CLOCK checks
> >> uninitialized user_ns.flags (probably instead of the user's value). This
> >> raises the question if the caller of KVM_GET_CLOCK is also supposed to
> >> pass kvm_clock_data with flags cleared down to the kernel. Could someone
> >> clarify this so I could fix it accordingly?
> >>
> >
> > I'd make KVM_GET_CLOCK set the flags, not get them. So if we add new
> > fields, we just set a new bit and userspace can read it.
>
> This makes sense and actually fixes the issue completely. Patch on its
> way...

agreed. This is the best behaviour I can devise, indeed.
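The convention agreed on here, where the kernel sets flag bits to announce which optional fields are valid and userspace only reads them, can be sketched from the consumer's side as below. The flag CLOCK_FLAG_HAS_EXTRA and the field extra are purely hypothetical; the thread states that no such fields exist yet.

```c
#include <stdint.h>

/* Hypothetical future capability bit; nothing like it exists in the thread. */
#define CLOCK_FLAG_HAS_EXTRA (1u << 0)

/* Hypothetical layout: clock and flags, plus one imagined optional field. */
struct clock_data {
    uint64_t clock;
    uint32_t flags;
    uint32_t extra;   /* only meaningful if CLOCK_FLAG_HAS_EXTRA is set */
};

/* Consumer side: trust an optional field only if the producer advertised it. */
static uint32_t clock_extra_or(const struct clock_data *d, uint32_t fallback)
{
    return (d->flags & CLOCK_FLAG_HAS_EXTRA) ? d->extra : fallback;
}

/* Lets an older consumer detect bits it does not understand. */
static int clock_has_unknown_flags(const struct clock_data *d)
{
    return (d->flags & ~CLOCK_FLAG_HAS_EXTRA) != 0;
}
```

With flags cleared by the producer, as the fix does today, clock_extra_or() always returns the fallback, so a consumer written this way keeps working unchanged if a bit is defined later.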
Re: [PATCH] KVM: x86: Clear flags field on return from KVM_GET_CLOCK
On Mon, Nov 02, 2009 at 04:41:54PM +0100, Jan Kiszka wrote:
> This field is supposed to indicate the availability of additional fields
> one day. There are none yet, so clear it - and drop the bogus check,
> too.
>
> Signed-off-by: Jan Kiszka

Makes sense.

Acked-by: Glauber Costa
Re: [PATCHv4 1/6] qemu/virtio: move features to an inline function
On Mon, Nov 02, 2009 at 04:33:53PM -0600, Anthony Liguori wrote:
> Michael S. Tsirkin wrote:
>> devices should have the final say over which virtio features they
>> support. E.g. indirect entries may or may not make sense in the context
>> of virtio-console. In particular, for vhost, we do not want to report to
>> guest bits not supported by kernel backend. Move the common bits from
>> virtio-pci to an inline function and let each device call it.
>>
>> No functional changes.
>>
>
> This is a layering violation. There are transport specific features and
> device specific features. The virtio-net device should have no
> knowledge or nack'ing ability for transport features.

We could pass "vhost" flag to virtio, and have virtio query the device for features. Would that be better?

> If you need to change transport features, it suggests you're modeling
> things incorrectly and should be supplying an alternative transport
> implementation.
> Regards,
>
> Anthony Liguori

Yes, you can make vhost an alternative transport in qemu. This might be one way to handle this. However, this seems to go contrary to your previous proposal to make vhost a networking back end. Which will it be?

--
MST
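For readers following the argument, the approach the patch description outlines (common bits provided by an inline helper, with each device deciding what it finally reports) might look roughly like the sketch below. All names are hypothetical, the bit positions are illustrative, and the backend mask merely models the bits supported by the kernel backend; this is not the code from the series, and it takes no side in the layering question raised in the reply.

```c
#include <stdint.h>

/* Illustrative feature bits; the names are hypothetical. */
#define F_NOTIFY_ON_EMPTY (1u << 24)   /* transport-level */
#define F_INDIRECT_DESC   (1u << 28)   /* transport-level */
#define NET_F_MAC         (1u << 5)    /* device-level */

/* Common bits that every device used to get from the transport code. */
static inline uint32_t common_features(void)
{
    return F_NOTIFY_ON_EMPTY | F_INDIRECT_DESC;
}

/* Device-level hook: start from device + common bits, then drop anything
 * the active backend cannot handle. Pass ~0u for a backend that supports
 * everything (e.g. the userspace path). */
static uint32_t net_get_features(uint32_t backend_supported)
{
    uint32_t features = NET_F_MAC | common_features();

    return features & backend_supported;
}
```

For example, net_get_features(~F_INDIRECT_DESC) models a backend without indirect descriptor support: the guest is offered everything else, but never that bit.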
Re: [Qemu-devel] Re: [PATCH] Add VirtIO Frame Buffer Support
On Tue, Nov 03, 2009 at 07:39:34AM +0100, Alexander Graf wrote:
>
> On 03.11.2009, at 07:34, Avi Kivity wrote:
>
>> On 11/03/2009 08:27 AM, Alexander Graf wrote:
>>> How does it work today?
>>>
>>> You boot into a TERM=dumb line based emulation on 3270 (worst thing
>>> haunting people's nightmares ever), trying to get out of that mode
>>> as quickly as possible and off into SSH / VNC.
>>
>> Despite the coolness factor, IMO a few minutes during install time do
>> not justify a new hardware model and a new driver.
>
> It's more than just coolness factor. There are use cases out there
> (www.susestudio.com) that don't want to rely on the guest exporting a VNC
> server to the outside just to access graphics. You also want to see boot
> messages, have a console login screen, be able to debug things without
> switching between virtio-console and vnc, etc. etc.
>
> The hardware model isn't exactly new either. It's just the next logical
> step to a full PV machine using virtio. If the virtio-fb stuff turns out
> to be really fast and reliable, I could even imagine it being the default
> target for kvm on ppc as well, as we can't switch resolutions on the fly
> there atm.

Not sure if I'm missing the point here, but couldn't it be hypothetically extended to stuff 3d (or video & more 2d accel?) commands too? I can't imagine the cirrus or stdvga driver being able to do that ever ;)

--
Vincent
Re: [Qemu-devel] Re: [PATCH] Add VirtIO Frame Buffer Support
On 11/03/2009 01:25 PM, Vincent Hanquez wrote:
> not sure if i'm missing the point here, but couldn't it be hypothetically
> extended to stuff 3d (or video & more 2d accel ?) commands too ?
> I can't imagine the cirrus or stdvga driver be able to do that ever ;)

cirrus has pretty good 2d acceleration. 3D is a mega-project though.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
On 03.11.2009, at 09:47, Segher Boessenkool wrote:

> Nice patchset. Some comments on the emulation part:

Cool, thanks for looking through them!

>> +#define OP_31_XOP_EIOIO		854
>
> You mean EIEIO.

Probably, yeah.

>> +	case 19:
>> +		switch (get_xop(inst)) {
>> +		case OP_19_XOP_RFID:
>> +		case OP_19_XOP_RFI:
>> +			vcpu->arch.pc = vcpu->arch.srr0;
>> +			kvmppc_set_msr(vcpu, vcpu->arch.srr1);
>> +			*advance = 0;
>> +			break;
>
> I think you should only emulate the insns that exist on whatever the guest
> pretends to be. RFID exist only on 64-bit implementations. Same comment
> everywhere else.

True.

>> +		case OP_31_XOP_EIOIO:
>> +			break;
>
> Have you always executed an eieio or sync when you get here, or do you
> just not allow direct access to I/O devices? Other context synchronising
> insns are not enough, they do not broadcast on the bus.

There is no device passthrough yet :-). It's theoretically possible, but nothing for it is implemented so far.

>> +		case OP_31_XOP_DCBZ:
>> +		{
>> +			ulong rb = vcpu->arch.gpr[get_rb(inst)];
>> +			ulong ra = 0;
>> +			ulong addr;
>> +			u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
>> +
>> +			if (get_ra(inst))
>> +				ra = vcpu->arch.gpr[get_ra(inst)];
>> +
>> +			addr = (ra + rb) & ~31ULL;
>> +			if (!(vcpu->arch.msr & MSR_SF))
>> +				addr &= 0x;
>> +
>> +			if (kvmppc_st(vcpu, addr, 32, zeros)) {
>
> DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there
> are HID bits to make it work on 32 bytes only, and an extra DCBZL insn
> that always clears a full cache line (128 bytes).

Yes. We only come here when we patched the dcbz opcodes to invalid instructions because cache line size of target == 32. On 970 with MSR_HV = 0 we actually use the dcbz 32-bytes mode.

Admittedly though, this could be a lot more clever.

>> +	switch (sprn) {
>> +	case SPRN_IBAT0U ... SPRN_IBAT3L:
>> +		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT0U) / 2];
>> +		break;
>> +	case SPRN_IBAT4U ... SPRN_IBAT7L:
>> +		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT4U) / 2];
>> +		break;
>> +	case SPRN_DBAT0U ... SPRN_DBAT3L:
>> +		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT0U) / 2];
>> +		break;
>> +	case SPRN_DBAT4U ... SPRN_DBAT7L:
>> +		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT4U) / 2];
>> +		break;
>
> Do xBAT4..7 have the same SPR numbers on all CPUs? They are CPU-specific
> SPRs, after all. Some CPUs have only six, some only four, some none, btw.

For now only Linux runs which only uses the first 3(?) IIRC. But yes, it's probably worth looking into at one point or the other.

>> +	case SPRN_HID0:
>> +		to_book3s(vcpu)->hid[0] = vcpu->arch.gpr[rs];
>> +		break;
>> +	case SPRN_HID1:
>> +		to_book3s(vcpu)->hid[1] = vcpu->arch.gpr[rs];
>> +		break;
>> +	case SPRN_HID2:
>> +		to_book3s(vcpu)->hid[2] = vcpu->arch.gpr[rs];
>> +		break;
>> +	case SPRN_HID4:
>> +		to_book3s(vcpu)->hid[4] = vcpu->arch.gpr[rs];
>> +		break;
>> +	case SPRN_HID5:
>> +		to_book3s(vcpu)->hid[5] = vcpu->arch.gpr[rs];
>
> HIDs are different per CPU; and worse, different CPUs have different
> registers (SPR #s) for the same register name!

Sigh :-(

>> +		/* guest HID5 set can change is_dcbz32 */
>> +		if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
>> +		    (mfmsr() & MSR_HV))
>> +			vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
>> +		break;
>
> Wait, does this mean you allow other HID writes when MSR[HV] isn't set?
> All HIDs (and many other SPRs) cannot be read or written in supervisor
> mode.

When we're running in MSR_HV=0 mode on a 970 we can use the 32 byte dcbz HID flag. So all we need to do is tell our entry/exit code to set this bit.

If we're on 970 on a hypervisor or on a non-970 though we can't use the HID5 bit, so we need to binary patch the opcodes.

So in order to emulate real 970 behavior, we need to be able to emulate that HID5 bit too! That's what this chunk of code does - it basically sets us in dcbz32 mode when allowed on 970 guests.

Alex
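The address computation in the quoted dcbz hunk can be illustrated in isolation. The sketch below assumes the mask applied when MSR[SF] is clear keeps the low 32 bits, which is what 32-bit mode implies (the constant is truncated in the archived text), and dcbz32_addr() is a hypothetical helper name.

```c
#include <stdint.h>

#define MSR_SF (1ULL << 63)   /* 64-bit mode bit of the PowerPC MSR */

/* Effective address of a 32-byte dcbz: (RA|0) + RB, aligned down to 32. */
static uint64_t dcbz32_addr(uint64_t ra_val, uint64_t rb_val,
                            int ra_field, uint64_t msr)
{
    /* An RA field of 0 means the literal value 0, not GPR0. */
    uint64_t base = ra_field ? ra_val : 0;
    uint64_t addr = (base + rb_val) & ~31ULL;

    /* Assumption: in 32-bit mode only the low 32 bits are used. */
    if (!(msr & MSR_SF))
        addr &= 0xffffffffULL;
    return addr;
}
```

As the reply above says, this path is only reached when the guest's dcbz operates on 32 bytes, which is why the alignment is hard-coded to 32.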
Re: [PATCH 0/9] S390x KVM support
On 11/02/2009 10:23 PM, Alexander Graf wrote:
> Any progress on the patch? This is really important to make KVM work
> properly on S390. I'd even go as far as suggesting it for linux-stable.

I forgot all about it, sorry. Marcelo, can you commit it?

--
error compiling committee.c: too many arguments to function
Re: [PATCH] Add VirtIO Frame Buffer Support
On 11/03/2009 10:26 AM, Alexander Graf wrote:
> Exactly. In fact, I'm even scared to reboot mine because I might end up
> in a 3270 terminal. The whole text only crap keeps people from using
> this platform! And that's what I want to change here.

Ok. I oppose paravirtualization for its own sake and only support it if there's no other way to get performance. In this case it buys us basic functionality which is surprisingly missing on native, that's arguably even more important.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
Nice patchset. Some comments on the emulation part:

> +#define OP_31_XOP_EIOIO		854

You mean EIEIO.

> +	case 19:
> +		switch (get_xop(inst)) {
> +		case OP_19_XOP_RFID:
> +		case OP_19_XOP_RFI:
> +			vcpu->arch.pc = vcpu->arch.srr0;
> +			kvmppc_set_msr(vcpu, vcpu->arch.srr1);
> +			*advance = 0;
> +			break;

I think you should only emulate the insns that exist on whatever the guest pretends to be. RFID exist only on 64-bit implementations. Same comment everywhere else.

> +		case OP_31_XOP_EIOIO:
> +			break;

Have you always executed an eieio or sync when you get here, or do you just not allow direct access to I/O devices? Other context synchronising insns are not enough, they do not broadcast on the bus.

> +		case OP_31_XOP_DCBZ:
> +		{
> +			ulong rb = vcpu->arch.gpr[get_rb(inst)];
> +			ulong ra = 0;
> +			ulong addr;
> +			u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
> +
> +			if (get_ra(inst))
> +				ra = vcpu->arch.gpr[get_ra(inst)];
> +
> +			addr = (ra + rb) & ~31ULL;
> +			if (!(vcpu->arch.msr & MSR_SF))
> +				addr &= 0x;
> +
> +			if (kvmppc_st(vcpu, addr, 32, zeros)) {

DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there are HID bits to make it work on 32 bytes only, and an extra DCBZL insn that always clears a full cache line (128 bytes).

> +	switch (sprn) {
> +	case SPRN_IBAT0U ... SPRN_IBAT3L:
> +		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT0U) / 2];
> +		break;
> +	case SPRN_IBAT4U ... SPRN_IBAT7L:
> +		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT4U) / 2];
> +		break;
> +	case SPRN_DBAT0U ... SPRN_DBAT3L:
> +		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT0U) / 2];
> +		break;
> +	case SPRN_DBAT4U ... SPRN_DBAT7L:
> +		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT4U) / 2];
> +		break;

Do xBAT4..7 have the same SPR numbers on all CPUs? They are CPU-specific SPRs, after all. Some CPUs have only six, some only four, some none, btw.

> +	case SPRN_HID0:
> +		to_book3s(vcpu)->hid[0] = vcpu->arch.gpr[rs];
> +		break;
> +	case SPRN_HID1:
> +		to_book3s(vcpu)->hid[1] = vcpu->arch.gpr[rs];
> +		break;
> +	case SPRN_HID2:
> +		to_book3s(vcpu)->hid[2] = vcpu->arch.gpr[rs];
> +		break;
> +	case SPRN_HID4:
> +		to_book3s(vcpu)->hid[4] = vcpu->arch.gpr[rs];
> +		break;
> +	case SPRN_HID5:
> +		to_book3s(vcpu)->hid[5] = vcpu->arch.gpr[rs];

HIDs are different per CPU; and worse, different CPUs have different registers (SPR #s) for the same register name!

> +		/* guest HID5 set can change is_dcbz32 */
> +		if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
> +		    (mfmsr() & MSR_HV))
> +			vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
> +		break;

Wait, does this mean you allow other HID writes when MSR[HV] isn't set? All HIDs (and many other SPRs) cannot be read or written in supervisor mode.


Segher
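The BAT lookup in the quoted hunk relies on the upper and lower halves of each BAT occupying consecutive SPR numbers, so that (sprn - base) / 2 yields the BAT index and the low bit selects the half. A standalone sketch for the instruction BATs follows. The SPR numbers are the ones used by Linux on CPUs that implement these registers, to the best of my knowledge, and per the question above they should be treated as CPU-specific assumptions. Note also that, as quoted, the IBAT4..7 case indexes from 0 again and would therefore alias IBAT0..3; the offset of 4 below is an assumption about the intent, not something stated in the thread.

```c
#include <stdint.h>

/* SPR numbers as assumed for this sketch; CPU-specific in reality. */
enum {
    SPRN_IBAT0U = 528, SPRN_IBAT3L = 535,
    SPRN_IBAT4U = 560, SPRN_IBAT7L = 567
};

struct bat_ref {
    int index;      /* which BAT register pair, 0..7 */
    int is_lower;   /* 0 = upper half, 1 = lower half */
};

/* Returns 0 and fills *out for an IBAT SPR, or -1 for anything else. */
static int ibat_lookup(int sprn, struct bat_ref *out)
{
    int rel;

    if (sprn >= SPRN_IBAT0U && sprn <= SPRN_IBAT3L) {
        rel = sprn - SPRN_IBAT0U;
        out->index = rel / 2;
    } else if (sprn >= SPRN_IBAT4U && sprn <= SPRN_IBAT7L) {
        rel = sprn - SPRN_IBAT4U;
        out->index = 4 + rel / 2;   /* assumed offset, see lead-in */
    } else {
        return -1;
    }
    out->is_lower = rel & 1;        /* U and L halves alternate */
    return 0;
}
```

A per-CPU table of valid SPR ranges would be one way to address the concern that not every CPU has eight BATs, or any at all.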
Re: [PATCH] Add VirtIO Frame Buffer Support
On 03.11.2009, at 09:20, Avi Kivity wrote:

> On 11/03/2009 09:50 AM, Alexander Graf wrote:
>> Ok, imagine this was not this unloved S390 odd architecture but X86. The only output choices you have are:
>>
>> 1) virtio-console
>> 2) VNC / SSH over network
>> 3) virtio-fb
>>
>> Now you want to configure a server, probably using yast and all those nice graphical utilities, but still enable a firewall so people outside don't intrude your machine. Well, you managed to configure the firewall by luck to allow VNC, but now you reconfigured it and something broke - but VNC was your only chance to access the machine. Oops...
>
> x86 has real framebuffers, so software and people expect it. s390 doesn't. How do people manage now?

They cope with what's there. Fortunately we're in a position to change things, so we don't have to stick with the worse.

>>>> You also want to see boot messages, have a console login screen,
>>>
>>> virtio-console does that, except for the penguins. Better, since you can scroll back.
>>
>> It doesn't do graphics. Ever used yast in text mode?
>
> Once you're in, start ssh+X or vnc. Again, what do people do now?

Exactly that. Again, it works but is not ideal. If we can improve user experience why work against it?

>>>> The hardware model isn't exactly new either. It's just the next logical step to a full PV machine using virtio. If the virtio-fb stuff turns out to be really fast and reliable, I could even imagine it being the default target for kvm on ppc as well, as we can't switch resolutions on the fly there atm.
>>>
>>> We could with vmware-vga.
>>
>> The vmware-port stuff is pretty much tied onto X86. I don't think modifying EAX is that easy on PPC ;-).
>
> Yes, though we can probably make it work on ppc with minimal modifications.

Is it worth it? We can also just implement a virtio mouse event dev plus fb and be good. That way we control the whole stack without risking to break vmware.

>>>>> Why? the guest will typically have networking when it's set up, so it should have network access during install. You can easily use slirp redirection and the built-in dhcp server to set this up with relatively few hassles.
>>>>
>>>> That's how I use it right now. It's no fun.
>>>
>>> The toolstack should hide the unfun parts.
>>
>> You can't hide guest configuration. We as a distribution control the kernel. We don't control the user's configuration as that's by design the user's choice. The only thing we can do is give users meaningful choices to choose from - and having graphics available is definitely one of them.
>
> Well, if the user chooses not to have networking then vnc or ssh+x definitely fail. That would be a strange choice for a server machine.

It's actually rather common on S390, though admittedly not that much on Linux+S390. There are more ways for inter node communication than networking. You can talk to another VM on the same machine without any network whatsoever. That way you can set up an isolated job (your bank transfer database for example) that is always protected by a proxy to the outside world.

>> Seriously, try to ask someone internally to get access to an S390. I think you'll understand my motivations a lot better after having used it for a bit.
>
> I actually have a s390 vm (RHEL 4 IIRC). It acts just like any other remote machine over ssh except that it's especially slow (probably the host is overloaded). Of course I wouldn't dream of trying to install something like that though.

Exactly. In fact, I'm even scared to reboot mine because I might end up in a 3270 terminal. The whole text only crap keeps people from using this platform! And that's what I want to change here.

Alex
Re: [PATCH] Add VirtIO Frame Buffer Support
On 11/03/2009 09:50 AM, Alexander Graf wrote:
> Ok, imagine this was not this unloved S390 odd architecture but X86. The only output choices you have are:
>
> 1) virtio-console
> 2) VNC / SSH over network
> 3) virtio-fb
>
> Now you want to configure a server, probably using yast and all those nice graphical utilities, but still enable a firewall so people outside don't intrude your machine. Well, you managed to configure the firewall by luck to allow VNC, but now you reconfigured it and something broke - but VNC was your only chance to access the machine. Oops...

x86 has real framebuffers, so software and people expect it. s390 doesn't. How do people manage now?

>>> You also want to see boot messages, have a console login screen,
>>
>> virtio-console does that, except for the penguins. Better, since you can scroll back.
>
> It doesn't do graphics. Ever used yast in text mode?

Once you're in, start ssh+X or vnc. Again, what do people do now?

>>> The hardware model isn't exactly new either. It's just the next logical step to a full PV machine using virtio. If the virtio-fb stuff turns out to be really fast and reliable, I could even imagine it being the default target for kvm on ppc as well, as we can't switch resolutions on the fly there atm.
>>
>> We could with vmware-vga.
>
> The vmware-port stuff is pretty much tied onto X86. I don't think modifying EAX is that easy on PPC ;-).

Yes, though we can probably make it work on ppc with minimal modifications.

>>>> Why? the guest will typically have networking when it's set up, so it should have network access during install. You can easily use slirp redirection and the built-in dhcp server to set this up with relatively few hassles.
>>>
>>> That's how I use it right now. It's no fun.
>>
>> The toolstack should hide the unfun parts.
>
> You can't hide guest configuration. We as a distribution control the kernel. We don't control the user's configuration as that's by design the user's choice. The only thing we can do is give users meaningful choices to choose from - and having graphics available is definitely one of them.

Well, if the user chooses not to have networking then vnc or ssh+x definitely fail. That would be a strange choice for a server machine.

> Seriously, try to ask someone internally to get access to an S390. I think you'll understand my motivations a lot better after having used it for a bit.

I actually have a s390 vm (RHEL 4 IIRC). It acts just like any other remote machine over ssh except that it's especially slow (probably the host is overloaded). Of course I wouldn't dream of trying to install something like that though.

--
error compiling committee.c: too many arguments to function