Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
On Fri, Jan 21, 2011 at 09:48:29AM -0500, Rik van Riel wrote:
> >> Why? If a VCPU can't make progress because it's waiting for some
> >> resource, then why not schedule something else instead?
> >
> > In the process, "something else" can get more share of cpu resource than
> > it's entitled to, and that's where I was a bit concerned. I guess one
> > could employ hard-limits to cap "something else's" bandwidth where it is
> > of real concern (like clouds).
>
> I'd like to think I fixed those things in my yield_task_fair +
> yield_to + kvm_vcpu_on_spin patch series from yesterday.

Speaking of the spinlock-in-virtualized-environment problem as a whole, IMHO I don't think the kvm_vcpu_on_spin + yield changes will provide the best results, especially where ticketlocks are involved and are paravirtualized in the manner being discussed in this thread.

An important focus of pv-ticketlocks is to reduce lock _acquisition_ time by ensuring that the next-in-line vcpu gets to run as soon as possible when a ticket lock is released. With the way kvm_vcpu_on_spin + yield_to is implemented, I don't see how we can provide the best lock acquisition times for threads.

It would be nice, though, to compare the two approaches (the kvm_vcpu_on_spin optimization and the pv-ticketlock scheme) to get some real-world numbers. Unfortunately, I don't have access to the PLE-capable hardware required to test your kvm_vcpu_on_spin changes.

It may also be possible for pv-ticketlocks to track the owning vcpu and make use of a yield-to interface as a further optimization to avoid the "others-get-more-time" problem, but PeterZ rightly pointed out that PI would be a better solution there than yield-to.

So overall, IMO kvm_vcpu_on_spin + yield_to could be the best solution for unmodified guests, while paravirtualized ticketlocks plus some sort of PI would be a better solution where we have the luxury of modifying guest sources!
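To make the fairness argument above concrete, here is a toy Python model of a ticket lock — illustrative only, not the kernel's C implementation: tickets are handed out in strict order, so when the lock is released only the one next-in-line waiter can make progress, which is exactly why the hypervisor must get that specific vcpu running quickly.

```python
import threading

class TicketLock:
    """Toy model of a ticket spinlock (illustration only, not kernel code)."""

    def __init__(self):
        self.next_ticket = 0    # next ticket number to hand out
        self.now_serving = 0    # ticket currently allowed to hold the lock
        self._mutex = threading.Lock()  # models the atomic ticket fetch

    def acquire(self):
        # Atomically take a ticket, establishing a strict FIFO order.
        with self._mutex:
            my_ticket = self.next_ticket
            self.next_ticket += 1
        # Spin until our ticket comes up. A pv-ticketlock would, after a
        # while, hypercall into the host to block instead of burning CPU,
        # and the releasing side would ask the host to wake exactly the
        # vcpu holding ticket `now_serving`.
        while self.now_serving != my_ticket:
            pass
        return my_ticket

    def release(self):
        # Only the single waiter holding this ticket can now proceed.
        self.now_serving += 1
```

In a single thread the spin never triggers, which makes the ordering easy to see: tickets come back as 0, 1, 2, ... in acquisition order.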
- vatsa
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] KVM test: tests_base.cfg: Fixing tabs instead of whitespace
Those were introduced by previous netperf fixes.

Signed-off-by: Lucas Meneghel Rodrigues
---
 client/tests/kvm/tests_base.cfg.sample |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index bd2f720..cdfb3ad 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -687,13 +687,13 @@ variants:
 packet_size = 1500
 setup_cmd = "cd %s && tar xvfj netperf-2.4.5.tar.bz2 && cd netperf-2.4.5 && patch -p0 < ../wait_before_data.patch && ./configure && make"
 netserver_cmd = %s/netperf-2.4.5/src/netserver
- variants:
- - stream:
- netperf_cmd = %s/netperf-2.4.5/src/netperf -t %s -H %s -l 60 -- -m %s
- protocols = "TCP_STREAM TCP_MAERTS TCP_SENDFILE UDP_STREAM"
- - rr:
- netperf_cmd = %s/netperf-2.4.5/src/netperf -t %s -H %s -l 60 -- -r %s
- protocols = "TCP_RR TCP_CRR UDP_RR"
+variants:
+- stream:
+netperf_cmd = %s/netperf-2.4.5/src/netperf -t %s -H %s -l 60 -- -m %s
+protocols = "TCP_STREAM TCP_MAERTS TCP_SENDFILE UDP_STREAM"
+- rr:
+netperf_cmd = %s/netperf-2.4.5/src/netperf -t %s -H %s -l 60 -- -r %s
+protocols = "TCP_RR TCP_CRR UDP_RR"
 - ethtool:
 install setup unattended_install.cdrom
 type = ethtool
--
1.7.3.4
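As an aside, mixups like the one fixed above (tabs where the config format expects spaces) are easy to catch mechanically. This small checker is my own sketch, not part of the kvm test framework — it flags any line whose leading indentation contains a tab:

```python
def find_tab_indented_lines(text):
    """Return (line_number, line) pairs whose indentation contains a tab.

    Hypothetical helper for linting space-indented config files such as
    tests_base.cfg.sample; not part of the kvm autotest framework.
    """
    bad = []
    for number, line in enumerate(text.splitlines(), 1):
        # Leading whitespace is everything stripped off by lstrip().
        indent = line[:len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad.append((number, line))
    return bad
```

Running it over a config file before committing would have flagged the seven offending lines immediately.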
[PATCH 2/2] KVM test: Fix wrong parameter name for migrate_background
On unattended_install with background ping-pong migration. It was my mistake when modifying Jason's original patchset.

Signed-off-by: Lucas Meneghel Rodrigues
---
 client/tests/kvm/tests_base.cfg.sample |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index cdfb3ad..b82d1dc 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -103,7 +103,7 @@ variants:
 initrd = initrd.img
 nic_mode = tap
 # uncomment the following line to test the migration in parallel
-# migrate_with_background = yes
+# migrate_background = yes
 variants:
 # Install guest from cdrom
--
1.7.3.4
[PATCH 4/4] KVM test: Rename virtio_guest.py to virtio_console_guest.py
Signed-off-by: Lucas Meneghel Rodrigues --- client/tests/kvm/scripts/virtio_console_guest.py | 715 ++ client/tests/kvm/scripts/virtio_guest.py | 715 -- client/tests/kvm/tests/virtio_console.py | 20 +- 3 files changed, 725 insertions(+), 725 deletions(-) create mode 100755 client/tests/kvm/scripts/virtio_console_guest.py delete mode 100755 client/tests/kvm/scripts/virtio_guest.py diff --git a/client/tests/kvm/scripts/virtio_console_guest.py b/client/tests/kvm/scripts/virtio_console_guest.py new file mode 100755 index 000..35efb7d --- /dev/null +++ b/client/tests/kvm/scripts/virtio_console_guest.py @@ -0,0 +1,715 @@ +#!/usr/bin/python +# -*- coding: utf-8 -*- +""" +Auxiliary script used to send data between ports on guests. + +@copyright: 2010 Red Hat, Inc. +@author: Jiri Zupka (jzu...@redhat.com) +@author: Lukas Doktor (ldok...@redhat.com) +""" +import threading +from threading import Thread +import os, time, select, re, random, sys, array +import fcntl, subprocess, traceback, signal + +DEBUGPATH = "/sys/kernel/debug" +SYSFSPATH = "/sys/class/virtio-ports/" + +exiting = False + +class VirtioGuest: +""" +Test tools of virtio_ports. +""" +LOOP_NONE = 0 +LOOP_POLL = 1 +LOOP_SELECT = 2 + +def __init__(self): +self.files = {} +self.exit_thread = threading.Event() +self.threads = [] +self.ports = {} +self.poll_fds = {} +self.catch_signal = None +self.use_config = threading.Event() + + +def _readfile(self, name): +""" +Read file and return content as string + +@param name: Name of file +@return: Content of file as string +""" +out = "" +try: +f = open(name, "r") +out = f.read() +f.close() +except: +print "FAIL: Cannot open file %s" % (name) + +return out + + +def _get_port_status(self): +""" +Get info about ports from kernel debugfs. 
+ +@return: Ports dictionary of port properties +""" +ports = {} +not_present_msg = "FAIL: There's no virtio-ports dir in debugfs" +if (not os.path.ismount(DEBUGPATH)): +os.system('mount -t debugfs none %s' % (DEBUGPATH)) +try: +if not os.path.isdir('%s/virtio-ports' % (DEBUGPATH)): +print not_present_msg +except: +print not_present_msg +else: +viop_names = os.listdir('%s/virtio-ports' % (DEBUGPATH)) +for name in viop_names: +open_db_file = "%s/virtio-ports/%s" % (DEBUGPATH, name) +f = open(open_db_file, 'r') +port = {} +file = [] +for line in iter(f): +file.append(line) +try: +for line in file: +m = re.match("(\S+): (\S+)", line) +port[m.group(1)] = m.group(2) + +if (port['is_console'] == "yes"): +port["path"] = "/dev/hvc%s" % (port["console_vtermno"]) +# Console works like a serialport +else: +port["path"] = "/dev/%s" % name + +if (not os.path.exists(port['path'])): +print "FAIL: %s not exist" % port['path'] + +sysfspath = SYSFSPATH + name +if (not os.path.isdir(sysfspath)): +print "FAIL: %s not exist" % (sysfspath) + +info_name = sysfspath + "/name" +port_name = self._readfile(info_name).strip() +if (port_name != port["name"]): +print ("FAIL: Port info not match \n%s - %s\n%s - %s" % + (info_name , port_name, +"%s/virtio-ports/%s" % (DEBUGPATH, name), +port["name"])) +except AttributeError: +print ("In file " + open_db_file + + " are bad data\n"+ "".join(file).strip()) +print ("FAIL: Fail file data.") +return + +ports[port['name']] = port +f.close() + +return ports + + +def init(self, in_files): +""" +Init and check port properties. +""" +self.ports = self._get_port_status() + +if self.ports == None: +return +for item in in_files: +if (item[1] != self.ports[item[0]]["is_console"]): +print self.ports +print "FAIL: Host console is not like console on guest side\n" +print "PASS: Init and check virtioconsole files in system." + + +class Switch(Thread): +""" +Thread that sends data between ports. +""" +
[PATCH 1/4] KVM test: Renaming script bonding_setup.py to nic_bonding_guest.py
We'll establish a convention (not an extremely strict one, of course) for scripts run in the guest: we call them [test_name]_guest.py. Let's start by converting bonding_setup to this convention.

Signed-off-by: Lucas Meneghel Rodrigues
---
 client/tests/kvm/scripts/bonding_setup.py     |   37 -
 client/tests/kvm/scripts/nic_bonding_guest.py |   37 +
 client/tests/kvm/tests/nic_bonding.py         |    8 +++---
 3 files changed, 41 insertions(+), 41 deletions(-)
 delete mode 100644 client/tests/kvm/scripts/bonding_setup.py
 create mode 100644 client/tests/kvm/scripts/nic_bonding_guest.py

diff --git a/client/tests/kvm/scripts/bonding_setup.py b/client/tests/kvm/scripts/bonding_setup.py
deleted file mode 100644
index f2d4be9..000
--- a/client/tests/kvm/scripts/bonding_setup.py
+++ /dev/null
@@ -1,37 +0,0 @@
-import os, re, commands, sys
-"""This script is used to setup bonding, macaddr of bond0 should be assigned by
-argv1"""
-
-if len(sys.argv) != 2:
-    sys.exit(1)
-mac = sys.argv[1]
-eth_nums = 0
-ifconfig_output = commands.getoutput("ifconfig")
-re_eth = "eth[0-9]*"
-for ename in re.findall(re_eth, ifconfig_output):
-    eth_config_file = "/etc/sysconfig/network-scripts/ifcfg-%s" % ename
-    eth_config = """DEVICE=%s
-USERCTL=no
-ONBOOT=yes
-MASTER=bond0
-SLAVE=yes
-BOOTPROTO=none
-""" % ename
-    f = file(eth_config_file, 'w')
-    f.write(eth_config)
-    f.close()
-
-bonding_config_file = "/etc/sysconfig/network-scripts/ifcfg-bond0"
-bond_config = """DEVICE=bond0
-BOOTPROTO=dhcp
-NETWORKING_IPV6=no
-ONBOOT=yes
-USERCTL=no
-MACADDR=%s
-""" % mac
-f = file(bonding_config_file, "w")
-f.write(bond_config)
-f.close()
-os.system("modprobe bonding")
-os.system("service NetworkManager stop")
-os.system("service network restart")
diff --git a/client/tests/kvm/scripts/nic_bonding_guest.py b/client/tests/kvm/scripts/nic_bonding_guest.py
new file mode 100644
index 000..f2d4be9
--- /dev/null
+++ b/client/tests/kvm/scripts/nic_bonding_guest.py
@@ -0,0 +1,37 @@
+import os, re, commands, sys
+"""This script is used to setup bonding, macaddr of bond0 should be assigned by
+argv1"""
+
+if len(sys.argv) != 2:
+    sys.exit(1)
+mac = sys.argv[1]
+eth_nums = 0
+ifconfig_output = commands.getoutput("ifconfig")
+re_eth = "eth[0-9]*"
+for ename in re.findall(re_eth, ifconfig_output):
+    eth_config_file = "/etc/sysconfig/network-scripts/ifcfg-%s" % ename
+    eth_config = """DEVICE=%s
+USERCTL=no
+ONBOOT=yes
+MASTER=bond0
+SLAVE=yes
+BOOTPROTO=none
+""" % ename
+    f = file(eth_config_file, 'w')
+    f.write(eth_config)
+    f.close()
+
+bonding_config_file = "/etc/sysconfig/network-scripts/ifcfg-bond0"
+bond_config = """DEVICE=bond0
+BOOTPROTO=dhcp
+NETWORKING_IPV6=no
+ONBOOT=yes
+USERCTL=no
+MACADDR=%s
+""" % mac
+f = file(bonding_config_file, "w")
+f.write(bond_config)
+f.close()
+os.system("modprobe bonding")
+os.system("service NetworkManager stop")
+os.system("service network restart")
diff --git a/client/tests/kvm/tests/nic_bonding.py b/client/tests/kvm/tests/nic_bonding.py
index ca9d70a..52ce0ae 100644
--- a/client/tests/kvm/tests/nic_bonding.py
+++ b/client/tests/kvm/tests/nic_bonding.py
@@ -8,7 +8,7 @@ def run_nic_bonding(test, params, env):
     Nic bonding test in guest.
 
     1) Start guest with four nic models.
-    2) Setup bond0 in guest by script bonding_setup.py.
+    2) Setup bond0 in guest by script nic_bonding_guest.py.
     3) Execute file transfer test between guest and host.
     4) Repeatedly put down/up interfaces by set_link
     5) Execute file transfer test between guest and host.
@@ -34,9 +34,9 @@ def run_nic_bonding(test, params, env):
     vm = env.get_vm(params["main_vm"])
     vm.verify_alive()
     session_serial = vm.wait_for_serial_login(timeout=timeout)
-    script_path = kvm_utils.get_path(test.bindir, "scripts/bonding_setup.py")
-    vm.copy_files_to(script_path, "/tmp/bonding_setup.py")
-    cmd = "python /tmp/bonding_setup.py %s" % vm.get_mac_address()
+    script_path = kvm_utils.get_path(test.bindir, "scripts/nic_bonding_guest.py")
+    vm.copy_files_to(script_path, "/tmp/nic_bonding_guest.py")
+    cmd = "python /tmp/nic_bonding_guest.py %s" % vm.get_mac_address()
     session_serial.cmd(cmd)
 
     termination_event = threading.Event()
--
1.7.3.4
[PATCH 2/4] KVM test: Renaming join_mcast.py to multicast_guest.py
Signed-off-by: Lucas Meneghel Rodrigues --- client/tests/kvm/scripts/join_mcast.py | 37 --- client/tests/kvm/scripts/multicast_guest.py | 37 +++ client/tests/kvm/tests/multicast.py |4 +- 3 files changed, 39 insertions(+), 39 deletions(-) delete mode 100755 client/tests/kvm/scripts/join_mcast.py create mode 100755 client/tests/kvm/scripts/multicast_guest.py diff --git a/client/tests/kvm/scripts/join_mcast.py b/client/tests/kvm/scripts/join_mcast.py deleted file mode 100755 index 350cd5f..000 --- a/client/tests/kvm/scripts/join_mcast.py +++ /dev/null @@ -1,37 +0,0 @@ -#!/usr/bin/python -import socket, struct, os, signal, sys -# -*- coding: utf-8 -*- - -""" -Script used to join machine into multicast groups. - -@author Amos Kong -""" - -if __name__ == "__main__": -if len(sys.argv) < 4: -print """%s [mgroup_count] [prefix] [suffix] -mgroup_count: count of multicast addresses -prefix: multicast address prefix -suffix: multicast address suffix""" % sys.argv[0] -sys.exit() - -mgroup_count = int(sys.argv[1]) -prefix = sys.argv[2] -suffix = int(sys.argv[3]) - -s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) -for i in range(mgroup_count): -mcast = prefix + "." + str(suffix + i) -try: -mreq = struct.pack("4sl", socket.inet_aton(mcast), - socket.INADDR_ANY) -s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq) -except: -s.close() -print "Could not join multicast: %s" % mcast -raise - -print "join_mcast_pid:%s" % os.getpid() -os.kill(os.getpid(), signal.SIGSTOP) -s.close() diff --git a/client/tests/kvm/scripts/multicast_guest.py b/client/tests/kvm/scripts/multicast_guest.py new file mode 100755 index 000..350cd5f --- /dev/null +++ b/client/tests/kvm/scripts/multicast_guest.py @@ -0,0 +1,37 @@ +#!/usr/bin/python +import socket, struct, os, signal, sys +# -*- coding: utf-8 -*- + +""" +Script used to join machine into multicast groups. 
+ +@author Amos Kong +""" + +if __name__ == "__main__": +if len(sys.argv) < 4: +print """%s [mgroup_count] [prefix] [suffix] +mgroup_count: count of multicast addresses +prefix: multicast address prefix +suffix: multicast address suffix""" % sys.argv[0] +sys.exit() + +mgroup_count = int(sys.argv[1]) +prefix = sys.argv[2] +suffix = int(sys.argv[3]) + +s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) +for i in range(mgroup_count): +mcast = prefix + "." + str(suffix + i) +try: +mreq = struct.pack("4sl", socket.inet_aton(mcast), + socket.INADDR_ANY) +s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq) +except: +s.close() +print "Could not join multicast: %s" % mcast +raise + +print "join_mcast_pid:%s" % os.getpid() +os.kill(os.getpid(), signal.SIGSTOP) +s.close() diff --git a/client/tests/kvm/tests/multicast.py b/client/tests/kvm/tests/multicast.py index ddb7807..5dfecbc 100644 --- a/client/tests/kvm/tests/multicast.py +++ b/client/tests/kvm/tests/multicast.py @@ -53,9 +53,9 @@ def run_multicast(test, params, env): prefix = re.findall("\d+.\d+.\d+", mcast)[0] suffix = int(re.findall("\d+", mcast)[-1]) # copy python script to guest for joining guest to multicast groups -mcast_path = os.path.join(test.bindir, "scripts/join_mcast.py") +mcast_path = os.path.join(test.bindir, "scripts/multicast_guest.py") vm.copy_files_to(mcast_path, "/tmp") -output = session.cmd_output("python /tmp/join_mcast.py %d %s %d" % +output = session.cmd_output("python /tmp/multicast_guest.py %d %s %d" % (mgroup_count, prefix, suffix)) # if success to join multicast, the process will be paused, and return PID. -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] KVM test: renaming allocator.py to ksm_overcommit_guest.py
Signed-off-by: Lucas Meneghel Rodrigues --- client/tests/kvm/scripts/allocator.py| 237 -- client/tests/kvm/scripts/ksm_overcommit_guest.py | 237 ++ client/tests/kvm/tests/ksm_overcommit.py | 40 ++-- 3 files changed, 258 insertions(+), 256 deletions(-) delete mode 100755 client/tests/kvm/scripts/allocator.py create mode 100755 client/tests/kvm/scripts/ksm_overcommit_guest.py diff --git a/client/tests/kvm/scripts/allocator.py b/client/tests/kvm/scripts/allocator.py deleted file mode 100755 index 09dc004..000 --- a/client/tests/kvm/scripts/allocator.py +++ /dev/null @@ -1,237 +0,0 @@ -#!/usr/bin/python -# -*- coding: utf-8 -*- -""" -Auxiliary script used to allocate memory on guests. - -@copyright: 2008-2009 Red Hat Inc. -@author: Jiri Zupka (jzu...@redhat.com) -""" - - -import os, array, sys, struct, random, copy, inspect, tempfile, datetime, math - -PAGE_SIZE = 4096 # machine page size - -TMPFS_OVERHEAD = 0.0022 # overhead on 1MB of write data - - -class MemFill(object): -""" -Fills guest memory according to certain patterns. -""" -def __init__(self, mem, static_value, random_key): -""" -Constructor of MemFill class. - -@param mem: Amount of test memory in MB. -@param random_key: Seed of random series used for fill up memory. -@param static_value: Value used to fill all memory. 
-""" -if (static_value < 0 or static_value > 255): -print ("FAIL: Initialization static value" - "can be only in range (0..255)") -return - -self.tmpdp = tempfile.mkdtemp() -ret_code = os.system("mount -o size=%dM tmpfs %s -t tmpfs" % - ((mem+math.ceil(mem*TMPFS_OVERHEAD)), - self.tmpdp)) -if ret_code != 0: -if os.getuid() != 0: -print ("FAIL: Unable to mount tmpfs " - "(likely cause: you are not root)") -else: -print "FAIL: Unable to mount tmpfs" -else: -self.f = tempfile.TemporaryFile(prefix='mem', dir=self.tmpdp) -self.allocate_by = 'L' -self.npages = ((mem * 1024 * 1024) / PAGE_SIZE) -self.random_key = random_key -self.static_value = static_value -print "PASS: Initialization" - - -def __del__(self): -if os.path.ismount(self.tmpdp): -self.f.close() -os.system("umount %s" % (self.tmpdp)) - - -def compare_page(self, original, inmem): -""" -Compare pages of memory and print the differences found. - -@param original: Data that was expected to be in memory. -@param inmem: Data in memory. -""" -for ip in range(PAGE_SIZE / original.itemsize): -if (not original[ip] == inmem[ip]): # find which item is wrong -originalp = array.array("B") -inmemp = array.array("B") -originalp.fromstring(original[ip:ip+1].tostring()) -inmemp.fromstring(inmem[ip:ip+1].tostring()) -for ib in range(len(originalp)): # find wrong byte in item -if not (originalp[ib] == inmemp[ib]): -position = (self.f.tell() - PAGE_SIZE + ip * -original.itemsize + ib) -print ("Mem error on position %d wanted 0x%Lx and is " - "0x%Lx" % (position, originalp[ib], inmemp[ib])) - - -def value_page(self, value): -""" -Create page filled by value. - -@param value: String we want to fill the page with. -@return: return array of bytes size PAGE_SIZE. -""" -a = array.array("B") -for i in range((PAGE_SIZE / a.itemsize)): -try: -a.append(value) -except: -print "FAIL: Value can be only in range (0..255)" -return a - - -def random_page(self, seed): -""" -Create page filled by static random series. 
- -@param seed: Seed of random series. -@return: Static random array series. -""" -random.seed(seed) -a = array.array(self.allocate_by) -for i in range(PAGE_SIZE / a.itemsize): -a.append(random.randrange(0, sys.maxint)) -return a - - -def value_fill(self, value=None): -""" -Fill memory page by page, with value generated with value_page. - -@param value: Parameter to be passed to value_page. None to just use -what's on the attribute static_value. -""" -self.f.seek(0) -if value is None: -value = self.static_value -page = self.value_page(value) -for pages in range(self.npages): -page.tofile(self.f) -print "PASS: Mem value fill" - - -def value_check(se
[PATCH 0/4] Renaming scripts that we run on guests
For the sake of clarity, we establish a convention: scripts copied to guests and executed there by tests will be called [test_name]_guest.py. This patchset takes care of renaming the scripts.

Lucas Meneghel Rodrigues (4):
  KVM test: Renaming script bonding_setup.py to nic_bonding_guest.py
  KVM test: Renaming join_mcast.py to multicast_guest.py
  KVM test: renaming allocator.py to ksm_overcommit_guest.py
  KVM test: Rename virtio_guest.py to virtio_console_guest.py

 client/tests/kvm/scripts/allocator.py            |  237 ---
 client/tests/kvm/scripts/bonding_setup.py        |   37 --
 client/tests/kvm/scripts/join_mcast.py           |   37 --
 client/tests/kvm/scripts/ksm_overcommit_guest.py |  237 +++
 client/tests/kvm/scripts/multicast_guest.py      |   37 ++
 client/tests/kvm/scripts/nic_bonding_guest.py    |   37 ++
 client/tests/kvm/scripts/virtio_console_guest.py |  715 ++
 client/tests/kvm/scripts/virtio_guest.py         |  715 --
 client/tests/kvm/tests/ksm_overcommit.py         |   40 +-
 client/tests/kvm/tests/multicast.py              |    4 +-
 client/tests/kvm/tests/nic_bonding.py            |    8 +-
 client/tests/kvm/tests/virtio_console.py         |   20 +-
 12 files changed, 1063 insertions(+), 1061 deletions(-)
 delete mode 100755 client/tests/kvm/scripts/allocator.py
 delete mode 100644 client/tests/kvm/scripts/bonding_setup.py
 delete mode 100755 client/tests/kvm/scripts/join_mcast.py
 create mode 100755 client/tests/kvm/scripts/ksm_overcommit_guest.py
 create mode 100755 client/tests/kvm/scripts/multicast_guest.py
 create mode 100644 client/tests/kvm/scripts/nic_bonding_guest.py
 create mode 100755 client/tests/kvm/scripts/virtio_console_guest.py
 delete mode 100755 client/tests/kvm/scripts/virtio_guest.py
--
1.7.3.4
[PATCH 6/6] KVM test: Removing enospc pre and post scripts
As their functionality has been reimplemented as framework functionality Signed-off-by: Lucas Meneghel Rodrigues --- client/tests/kvm/scripts/enospc-post.py | 77 --- client/tests/kvm/scripts/enospc-pre.py | 73 - 2 files changed, 0 insertions(+), 150 deletions(-) delete mode 100755 client/tests/kvm/scripts/enospc-post.py delete mode 100755 client/tests/kvm/scripts/enospc-pre.py diff --git a/client/tests/kvm/scripts/enospc-post.py b/client/tests/kvm/scripts/enospc-post.py deleted file mode 100755 index c6714f2..000 --- a/client/tests/kvm/scripts/enospc-post.py +++ /dev/null @@ -1,77 +0,0 @@ -#!/usr/bin/python -""" -Simple script to setup enospc test environment -""" -import os, commands, sys - -SCRIPT_DIR = os.path.dirname(sys.modules[__name__].__file__) -KVM_TEST_DIR = os.path.abspath(os.path.join(SCRIPT_DIR, "..")) - -class SetupError(Exception): -""" -Simple wrapper for the builtin Exception class. -""" -pass - - -def find_command(cmd): -""" -Searches for a command on common paths, error if it can't find it. - -@param cmd: Command to be found. -""" -if os.path.exists(cmd): -return cmd -for dir in ["/usr/local/sbin", "/usr/local/bin", -"/usr/sbin", "/usr/bin", "/sbin", "/bin"]: -file = os.path.join(dir, cmd) -if os.path.exists(file): -return file -raise ValueError('Missing command: %s' % cmd) - - -def run(cmd, info=None): -""" -Run a command and throw an exception if it fails. -Optionally, you can provide additional contextual info. - -@param cmd: Command string. -@param reason: Optional string that explains the context of the failure. - -@raise: SetupError if command fails. 
-""" -print "Running '%s'" % cmd -cmd_name = cmd.split(' ')[0] -find_command(cmd_name) -status, output = commands.getstatusoutput(cmd) -if status: -e_msg = ('Command %s failed.\nStatus:%s\nOutput:%s' % - (cmd, status, output)) -if info is not None: -e_msg += '\nAdditional Info:%s' % info -raise SetupError(e_msg) - -return (status, output) - - -if __name__ == "__main__": -qemu_img_binary = os.environ['KVM_TEST_qemu_img_binary'] -if not os.path.isabs(qemu_img_binary): -qemu_img_binary = os.path.join(KVM_TEST_DIR, qemu_img_binary) -if not os.path.exists(qemu_img_binary): -raise SetupError('The qemu-img binary that is supposed to be used ' - '(%s) does not exist. Please verify your ' - 'configuration' % qemu_img_binary) - -run("lvremove -f vgtest") -status, output = run("losetup -a") -loopback_device = None -if output: -for line in output.splitlines(): -device = line.split(":")[0] -if "/tmp/enospc.raw" in line: -loopback_device = device -break -if loopback_device is not None: -run("losetup -d %s" % loopback_device) -run("rm -rf /tmp/enospc.raw /tmp/kvm_autotest_root/images/enospc.qcow2") diff --git a/client/tests/kvm/scripts/enospc-pre.py b/client/tests/kvm/scripts/enospc-pre.py deleted file mode 100755 index 1313de3..000 --- a/client/tests/kvm/scripts/enospc-pre.py +++ /dev/null @@ -1,73 +0,0 @@ -#!/usr/bin/python -""" -Simple script to setup enospc test environment -""" -import os, commands, sys - -SCRIPT_DIR = os.path.dirname(sys.modules[__name__].__file__) -KVM_TEST_DIR = os.path.abspath(os.path.join(SCRIPT_DIR, "..")) - -class SetupError(Exception): -""" -Simple wrapper for the builtin Exception class. -""" -pass - - -def find_command(cmd): -""" -Searches for a command on common paths, error if it can't find it. - -@param cmd: Command to be found. 
-""" -if os.path.exists(cmd): -return cmd -for dir in ["/usr/local/sbin", "/usr/local/bin", -"/usr/sbin", "/usr/bin", "/sbin", "/bin"]: -file = os.path.join(dir, cmd) -if os.path.exists(file): -return file -raise ValueError('Missing command: %s' % cmd) - - -def run(cmd, info=None): -""" -Run a command and throw an exception if it fails. -Optionally, you can provide additional contextual info. - -@param cmd: Command string. -@param reason: Optional string that explains the context of the failure. - -@raise: SetupError if command fails. -""" -print "Running '%s'" % cmd -cmd_name = cmd.split(' ')[0] -find_command(cmd_name) -status, output = commands.getstatusoutput(cmd) -if status: -e_msg = ('Command %s failed.\nStatus:%s\nOutput:%s' % - (cmd, status, output)) -if info is not None: -e_msg += '\nAdditional Info:%s' % info -raise SetupError(e_msg) - -return (status, output.strip()) - - -if __name__ == "__main__": -qemu_img_binary = os.environ['KVM_TEST_q
[PATCH 3/6] KVM test: Removing scripts/unattended.py
Now that its functionality was implemented as part of the framework. Signed-off-by: Lucas Meneghel Rodrigues --- client/tests/kvm/scripts/unattended.py | 543 client/tests/kvm/tests_base.cfg.sample |2 - 2 files changed, 0 insertions(+), 545 deletions(-) delete mode 100755 client/tests/kvm/scripts/unattended.py diff --git a/client/tests/kvm/scripts/unattended.py b/client/tests/kvm/scripts/unattended.py deleted file mode 100755 index e65fe46..000 --- a/client/tests/kvm/scripts/unattended.py +++ /dev/null @@ -1,543 +0,0 @@ -#!/usr/bin/python -""" -Simple script to setup unattended installs on KVM guests. -""" -# -*- coding: utf-8 -*- -import os, sys, shutil, tempfile, re, ConfigParser, glob, inspect, commands -import common - - -SCRIPT_DIR = os.path.dirname(sys.modules[__name__].__file__) -KVM_TEST_DIR = os.path.abspath(os.path.join(SCRIPT_DIR, "..")) - - -class SetupError(Exception): -""" -Simple wrapper for the builtin Exception class. -""" -pass - - -def find_command(cmd): -""" -Searches for a command on common paths, error if it can't find it. - -@param cmd: Command to be found. -""" -if os.path.exists(cmd): -return cmd -for dir in ["/usr/local/sbin", "/usr/local/bin", -"/usr/sbin", "/usr/bin", "/sbin", "/bin"]: -file = os.path.join(dir, cmd) -if os.path.exists(file): -return file -raise ValueError('Missing command: %s' % cmd) - - -def run(cmd, info=None): -""" -Run a command and throw an exception if it fails. -Optionally, you can provide additional contextual info. - -@param cmd: Command string. -@param reason: Optional string that explains the context of the failure. - -@raise: SetupError if command fails. 
-""" -print "Running '%s'" % cmd -cmd_name = cmd.split(' ')[0] -find_command(cmd_name) -status, output = commands.getstatusoutput(cmd) -if status: -e_msg = ('Command %s failed.\nStatus:%s\nOutput:%s' % - (cmd, status, output)) -if info is not None: -e_msg += '\nAdditional Info:%s' % info -raise SetupError(e_msg) - -return (status, output.strip()) - - -def cleanup(dir): -""" -If dir is a mountpoint, do what is possible to unmount it. Afterwards, -try to remove it. - -@param dir: Directory to be cleaned up. -""" -print "Cleaning up directory %s" % dir -if os.path.ismount(dir): -os.system('fuser -k %s' % dir) -run('umount %s' % dir, info='Could not unmount %s' % dir) -if os.path.isdir(dir): -shutil.rmtree(dir) - - -def clean_old_image(image): -""" -Clean a leftover image file from previous processes. If it contains a -mounted file system, do the proper cleanup procedures. - -@param image: Path to image to be cleaned up. -""" -if os.path.exists(image): -mtab = open('/etc/mtab', 'r') -mtab_contents = mtab.read() -mtab.close() -if image in mtab_contents: -os.system('fuser -k %s' % image) -os.system('umount %s' % image) -os.remove(image) - - -class Disk(object): -""" -Abstract class for Disk objects, with the common methods implemented. -""" -def __init__(self): -self.path = None - - -def setup_answer_file(self, filename, contents): -answer_file = open(os.path.join(self.mount, filename), 'w') -answer_file.write(contents) -answer_file.close() - - -def copy_to(self, src): -dst = os.path.join(self.mount, os.path.basename(src)) -if os.path.isdir(src): -shutil.copytree(src, dst) -elif os.path.isfile(src): -shutil.copyfile(src, dst) - - -def close(self): -os.chmod(self.path, 0755) -cleanup(self.mount) -print "Disk %s successfuly set" % self.path - - -class FloppyDisk(Disk): -""" -Represents a 1.44 MB floppy disk. We can copy files to it, and setup it in -convenient ways. 
-""" -def __init__(self, path): -print "Creating floppy unattended image %s" % path -qemu_img_binary = os.environ['KVM_TEST_qemu_img_binary'] -if not os.path.isabs(qemu_img_binary): -qemu_img_binary = os.path.join(KVM_TEST_DIR, qemu_img_binary) -if not os.path.exists(qemu_img_binary): -raise SetupError('The qemu-img binary that is supposed to be used ' - '(%s) does not exist. Please verify your ' - 'configuration' % qemu_img_binary) - -self.mount = tempfile.mkdtemp(prefix='floppy_', dir='/tmp') -self.virtio_mount = None -self.path = path -clean_old_image(path) -if not os.path.isdir(os.path.dirname(path)): -os.makedirs(os.path.dirname(path)) - -try: -c_cmd = '%s create -f raw %s 1440k' % (qemu_img_binary, path) -run(c_cm
[PATCH 5/6] KVM test: Turn enospc test pre/post actions into infrastructure
So we can get rid of the pre/post scripts. With the rearrangement we were able to achieve several advantages:

 - A more rigorous and paranoid cleanup phase
 - Better identification of the lvm devices, making conflicts with other
   devices in the host less likely
 - Use of the shared autotest API, avoiding code duplication

Signed-off-by: Lucas Meneghel Rodrigues
---
 client/tests/kvm/kvm_preprocessing.py  |    8 ++
 client/tests/kvm/test_setup.py         |  116 +--
 client/tests/kvm/tests/enospc.py       |    6 ++-
 client/tests/kvm/tests_base.cfg.sample |    5 +-
 4 files changed, 124 insertions(+), 11 deletions(-)

diff --git a/client/tests/kvm/kvm_preprocessing.py b/client/tests/kvm/kvm_preprocessing.py
index 081a13f..2713805 100644
--- a/client/tests/kvm/kvm_preprocessing.py
+++ b/client/tests/kvm/kvm_preprocessing.py
@@ -262,6 +262,10 @@ def preprocess(test, params, env):
         u = test_setup.UnattendedInstallConfig(test, params)
         u.setup()
 
+    if params.get("type") == "enospc":
+        e = test_setup.EnospcConfig(test, params)
+        e.setup()
+
     # Execute any pre_commands
     if params.get("pre_command"):
         process_command(test, params, env, params.get("pre_command"),
@@ -362,6 +366,10 @@ def postprocess(test, params, env):
         h = kvm_utils.HugePageConfig(params)
         h.cleanup()
 
+    if params.get("type") == "enospc":
+        e = test_setup.EnospcConfig(test, params)
+        e.cleanup()
+
     # Execute any post_commands
     if params.get("post_command"):
         process_command(test, params, env, params.get("post_command"),
diff --git a/client/tests/kvm/test_setup.py b/client/tests/kvm/test_setup.py
index b17c473..e906e18 100644
--- a/client/tests/kvm/test_setup.py
+++ b/client/tests/kvm/test_setup.py
@@ -2,7 +2,7 @@
 Library to perform pre/post test setup for KVM autotest.
""" import os, sys, shutil, tempfile, re, ConfigParser, glob, inspect, commands -import logging +import logging, time from autotest_lib.client.common_lib import error from autotest_lib.client.bin import utils @@ -42,6 +42,19 @@ def clean_old_image(image): os.remove(image) +def display_attributes(instance): +""" +Inspects a given class instance attributes and displays them, convenient +for debugging. +""" +logging.debug("Attributes set:") +for member in inspect.getmembers(instance): +name, value = member +attribute = getattr(instance, name) +if not (name.startswith("__") or callable(attribute) or not value): +logging.debug("%s: %s", name, value) + + class Disk(object): """ Abstract class for Disk objects, with the common methods implemented. @@ -472,13 +485,7 @@ class UnattendedInstallConfig(object): Uses an appropriate strategy according to each install model. """ logging.info("Starting unattended install setup") - -logging.debug("Variables set:") -for member in inspect.getmembers(self): -name, value = member -attribute = getattr(self, name) -if not (name.startswith("__") or callable(attribute) or not value): -logging.debug("%s: %s", name, value) +display_attributes(self) if self.unattended_file and (self.floppy or self.cdrom_unattended): self.setup_boot_disk() @@ -593,3 +600,96 @@ class HugePageConfig(object): return utils.system("echo 0 > %s" % self.kernel_hp_file) logging.debug("Hugepage memory successfuly dealocated") + + +class EnospcConfig(object): +""" +Performs setup for the test enospc. This is a borg class, similar to a +singleton. The idea is to keep state in memory for when we call cleanup() +on postprocessing. 
+""" +__shared_state = {} +def __init__(self, test, params): +self.__dict__ = self.__shared_state +root_dir = test.bindir +self.tmpdir = test.tmpdir +self.qemu_img_binary = params.get('qemu_img_binary') +if not os.path.isfile(self.qemu_img_binary): +self.qemu_img_binary = os.path.join(root_dir, +self.qemu_img_binary) +self.raw_file_path = os.path.join(self.tmpdir, 'enospc.raw') +# Here we're trying to choose fairly explanatory names so it's less +# likely that we run in conflict with other devices in the system +self.vgtest_name = params.get("vgtest_name") +self.lvtest_name = params.get("lvtest_name") +self.lvtest_device = "/dev/%s/%s" % (self.vgtest_name, self.lvtest_name) +image_dir = os.path.dirname(params.get("image_name")) +self.qcow_file_path = os.path.join(image_dir, 'enospc.qcow2') +try: +getattr(self, 'loopback') +except AttributeError: +self.loopback = '' + + +@error.context_aware +def setup(self): +logging.debug("Starting enospc setup") +
[PATCH 1/6] KVM test: Introducing test_setup library
In order to concentrate setup classes for the KVM autotest tests, create test_setup.py. This library will contain code used to perform actions prior to the actual test execution, putting some hooks on the test postprocessing code. The first class in there is the UnattendedInstallConfig class, which prepares the environment for unattended installs.

Advantages of doing this in framework code:

 - Setup errors are easier to figure out than having a 'pre command failed' error reason.
 - We can use test.tmpdir to store temp dirs, which makes things even cleaner and less intrusive in the system.
 - Less code duplication.

Signed-off-by: Lucas Meneghel Rodrigues
---
 client/tests/kvm/test_setup.py |  494
 1 files changed, 494 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/kvm/test_setup.py

diff --git a/client/tests/kvm/test_setup.py b/client/tests/kvm/test_setup.py
new file mode 100644
index 000..7b7ef14
--- /dev/null
+++ b/client/tests/kvm/test_setup.py
@@ -0,0 +1,494 @@
+"""
+Library to perform pre/post test setup for KVM autotest.
+"""
+import os, sys, shutil, tempfile, re, ConfigParser, glob, inspect, commands
+import logging
+from autotest_lib.client.common_lib import error
+from autotest_lib.client.bin import utils
+
+
+@error.context_aware
+def cleanup(dir):
+    """
+    If dir is a mountpoint, do what is possible to unmount it. Afterwards,
+    try to remove it.
+
+    @param dir: Directory to be cleaned up.
+    """
+    error.context("cleaning up unattended install directory %s" % dir)
+    if os.path.ismount(dir):
+        utils.run('fuser -k %s' % dir, ignore_status=True)
+        utils.run('umount %s' % dir)
+    if os.path.isdir(dir):
+        shutil.rmtree(dir)
+
+
+@error.context_aware
+def clean_old_image(image):
+    """
+    Clean a leftover image file from previous processes. If it contains a
+    mounted file system, do the proper cleanup procedures.
+
+    @param image: Path to image to be cleaned up.
+""" +error.context("cleaning up old leftover image %s" % image) +if os.path.exists(image): +mtab = open('/etc/mtab', 'r') +mtab_contents = mtab.read() +mtab.close() +if image in mtab_contents: +utils.run('fuser -k %s' % image, ignore_status=True) +utils.run('umount %s' % image) +os.remove(image) + + +class Disk(object): +""" +Abstract class for Disk objects, with the common methods implemented. +""" +def __init__(self): +self.path = None + + +def setup_answer_file(self, filename, contents): +utils.open_write_close(os.path.join(self.mount, filename), contents) + + +def copy_to(self, src): +dst = os.path.join(self.mount, os.path.basename(src)) +if os.path.isdir(src): +shutil.copytree(src, dst) +elif os.path.isfile(src): +shutil.copyfile(src, dst) + + +def close(self): +os.chmod(self.path, 0755) +cleanup(self.mount) +logging.debug("Disk %s successfuly set", self.path) + + +class FloppyDisk(Disk): +""" +Represents a 1.44 MB floppy disk. We can copy files to it, and setup it in +convenient ways. +""" +@error.context_aware +def __init__(self, path, qemu_img_binary, tmpdir): +error.context("Creating unattended install floppy image %s" % path) +self.tmpdir = tmpdir +self.mount = tempfile.mkdtemp(prefix='floppy_', dir=self.tmpdir) +self.virtio_mount = None +self.path = path +clean_old_image(path) +if not os.path.isdir(os.path.dirname(path)): +os.makedirs(os.path.dirname(path)) + +try: +c_cmd = '%s create -f raw %s 1440k' % (qemu_img_binary, path) +utils.run(c_cmd) +f_cmd = 'mkfs.msdos -s 1 %s' % path +utils.run(f_cmd) +m_cmd = 'mount -o loop,rw %s %s' % (path, self.mount) +utils.run(m_cmd) +except error.CmdError, e: +cleanup(self.mount) +raise + + +def _copy_virtio_drivers(self, virtio_floppy): +""" +Copy the virtio drivers on the virtio floppy to the install floppy. 
+ +1) Mount the floppy containing the viostor drivers +2) Copy its contents to the root of the install floppy +""" +virtio_mount = tempfile.mkdtemp(prefix='virtio_floppy_', +dir=self.tmpdir) + +pwd = os.getcwd() +try: +m_cmd = 'mount -o loop %s %s' % (virtio_floppy, virtio_mount) +utils.run(m_cmd) +os.chdir(virtio_mount) +path_list = glob.glob('*') +for path in path_list: +self.copy_to(path) +finally: +os.chdir(pwd) +cleanup(virtio_mount) + + +def setup_virtio_win2003(self, virtio_floppy, virtio_oemsetup_id): +""" +Setup the install floppy with the virtio s
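For reference, the floppy bootstrap performed by FloppyDisk.__init__ above boils down to three shell commands run in sequence. A sketch that merely assembles those commands (the paths and the helper name are hypothetical; the real code executes each one via utils.run):

```python
def floppy_setup_cmds(qemu_img_binary, path, mount_point):
    """Assemble the shell steps FloppyDisk.__init__ runs: create a raw
    1.44 MB image, lay down a FAT filesystem on it, and loop-mount it
    read-write so files can be copied in."""
    return [
        '%s create -f raw %s 1440k' % (qemu_img_binary, path),
        'mkfs.msdos -s 1 %s' % path,
        'mount -o loop,rw %s %s' % (path, mount_point),
    ]

cmds = floppy_setup_cmds('qemu-img', '/tmp/floppy.img', '/tmp/floppy_mnt')
```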
[PATCH 2/6] KVM test: Make unattended _install use the new pre script
Also, get rid of references to the old unattended install script. Signed-off-by: Lucas Meneghel Rodrigues --- client/tests/kvm/kvm_preprocessing.py |6 +- client/tests/kvm/tests_base.cfg.sample |2 -- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/client/tests/kvm/kvm_preprocessing.py b/client/tests/kvm/kvm_preprocessing.py index 41455cf..12adb6a 100644 --- a/client/tests/kvm/kvm_preprocessing.py +++ b/client/tests/kvm/kvm_preprocessing.py @@ -1,7 +1,7 @@ import sys, os, time, commands, re, logging, signal, glob, threading, shutil from autotest_lib.client.bin import test, utils from autotest_lib.client.common_lib import error -import kvm_vm, kvm_utils, kvm_subprocess, kvm_monitor, ppm_utils +import kvm_vm, kvm_utils, kvm_subprocess, kvm_monitor, ppm_utils, test_setup try: import PIL.Image except ImportError: @@ -258,6 +258,10 @@ def preprocess(test, params, env): h = kvm_utils.HugePageConfig(params) h.setup() +if params.get("type") == "unattended_install": +u = test_setup.UnattendedInstallConfig(test, params) +u.setup() + # Execute any pre_commands if params.get("pre_command"): process_command(test, params, env, params.get("pre_command"), diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample index 184a582..c727c32 100644 --- a/client/tests/kvm/tests_base.cfg.sample +++ b/client/tests/kvm/tests_base.cfg.sample @@ -97,7 +97,6 @@ variants: kill_vm_gracefully = yes kill_vm_on_error = yes force_create_image = yes -pre_command += " scripts/unattended.py;" extra_params += " -boot d" guest_port_unattended_install = 12323 kernel = vmlinuz @@ -381,7 +380,6 @@ variants: # The support VM is identical to the tested VM in every way # except for the image name which ends with '-supportvm'. 
type = unattended_install -pre_command += " scripts/unattended.py;" extra_params += " -boot d" force_create_image = yes kill_vm = yes -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/6] KVM config: Move HugePageConfig() to test_setup
So we concentrate the setup classes together. Signed-off-by: Lucas Meneghel Rodrigues --- client/tests/kvm/kvm_preprocessing.py |2 +- client/tests/kvm/kvm_utils.py | 101 - client/tests/kvm/test_setup.py| 101 + 3 files changed, 102 insertions(+), 102 deletions(-) diff --git a/client/tests/kvm/kvm_preprocessing.py b/client/tests/kvm/kvm_preprocessing.py index 12adb6a..081a13f 100644 --- a/client/tests/kvm/kvm_preprocessing.py +++ b/client/tests/kvm/kvm_preprocessing.py @@ -255,7 +255,7 @@ def preprocess(test, params, env): test.write_test_keyval({"kvm_userspace_version": kvm_userspace_version}) if params.get("setup_hugepages") == "yes": -h = kvm_utils.HugePageConfig(params) +h = test_setup.HugePageConfig(params) h.setup() if params.get("type") == "unattended_install": diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py index 632badb..78c9f25 100644 --- a/client/tests/kvm/kvm_utils.py +++ b/client/tests/kvm/kvm_utils.py @@ -1265,107 +1265,6 @@ class KvmLoggingConfig(logging_config.LoggingConfig): verbose=verbose) -class HugePageConfig: -def __init__(self, params): -""" -Gets environment variable values and calculates the target number -of huge memory pages. - -@param params: Dict like object containing parameters for the test. -""" -self.vms = len(params.objects("vms")) -self.mem = int(params.get("mem")) -self.max_vms = int(params.get("max_vms", 0)) -self.hugepage_path = '/mnt/kvm_hugepage' -self.hugepage_size = self.get_hugepage_size() -self.target_hugepages = self.get_target_hugepages() -self.kernel_hp_file = '/proc/sys/vm/nr_hugepages' - - -def get_hugepage_size(self): -""" -Get the current system setting for huge memory page size. 
-""" -meminfo = open('/proc/meminfo', 'r').readlines() -huge_line_list = [h for h in meminfo if h.startswith("Hugepagesize")] -try: -return int(huge_line_list[0].split()[1]) -except ValueError, e: -raise ValueError("Could not get huge page size setting from " - "/proc/meminfo: %s" % e) - - -def get_target_hugepages(self): -""" -Calculate the target number of hugepages for testing purposes. -""" -if self.vms < self.max_vms: -self.vms = self.max_vms -# memory of all VMs plus qemu overhead of 64MB per guest -vmsm = (self.vms * self.mem) + (self.vms * 64) -return int(vmsm * 1024 / self.hugepage_size) - - -@error.context_aware -def set_hugepages(self): -""" -Sets the hugepage limit to the target hugepage value calculated. -""" -error.context("setting hugepages limit to %s" % self.target_hugepages) -hugepage_cfg = open(self.kernel_hp_file, "r+") -hp = hugepage_cfg.readline() -while int(hp) < self.target_hugepages: -loop_hp = hp -hugepage_cfg.write(str(self.target_hugepages)) -hugepage_cfg.flush() -hugepage_cfg.seek(0) -hp = int(hugepage_cfg.readline()) -if loop_hp == hp: -raise ValueError("Cannot set the kernel hugepage setting " - "to the target value of %d hugepages." % - self.target_hugepages) -hugepage_cfg.close() -logging.debug("Successfuly set %s large memory pages on host ", - self.target_hugepages) - - -@error.context_aware -def mount_hugepage_fs(self): -""" -Verify if there's a hugetlbfs mount set. If there's none, will set up -a hugetlbfs mount using the class attribute that defines the mount -point. 
-""" -error.context("mounting hugepages path") -if not os.path.ismount(self.hugepage_path): -if not os.path.isdir(self.hugepage_path): -os.makedirs(self.hugepage_path) -cmd = "mount -t hugetlbfs none %s" % self.hugepage_path -utils.system(cmd) - - -def setup(self): -logging.debug("Number of VMs this test will use: %d", self.vms) -logging.debug("Amount of memory used by each vm: %s", self.mem) -logging.debug("System setting for large memory page size: %s", - self.hugepage_size) -logging.debug("Number of large memory pages needed for this test: %s", - self.target_hugepages) -self.set_hugepages() -self.mount_hugepage_fs() - - -@error.context_aware -def cleanup(self): -error.context("trying to dealocate hugepage memory") -try: -utils.system("umount %s" % self.hugepage_path) -except error.C
[RFC PATCH 2/2] device-assignment: Count required kvm memory slots
Each MMIO PCI BAR of an assigned device is directly mapped via a KVM memory slot to avoid bouncing reads and writes through qemu. KVM only provides a (small) fixed number of these slots, and attempting to exceed the unadvertised limit results in an abort. We can't reserve slots, but let's at least attempt to check whether there are enough available before adding a device.

The non-hotplug case is troublesome here because we have no visibility into what else might make use of these slots but hasn't yet been mapped. We used to limit the number of devices that could be specified on the command line using the -pcidevice option. The heuristic here seems to work and provides a similar limit.

We can also avoid using these memory slots by allowing devices to bounce mmio access through qemu. This is trivially accomplished by adding a force_slow=on option to pci-assign.

Signed-off-by: Alex Williamson
---
 hw/device-assignment.c |   59 +++-
 hw/device-assignment.h |    3 ++
 2 files changed, 61 insertions(+), 1 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index e97f565..0063a11 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -546,7 +546,9 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
                 ? PCI_BASE_ADDRESS_MEM_PREFETCH
                 : PCI_BASE_ADDRESS_SPACE_MEMORY;
 
-        if (cur_region->size & 0xFFF) {
+        if (pci_dev->features & ASSIGNED_DEVICE_FORCE_SLOW_MASK) {
+            slow_map = 1;
+        } else if (cur_region->size & 0xFFF) {
             fprintf(stderr, "PCI region %d at address 0x%llx "
                     "has size 0x%x, which is not a multiple of 4K. 
" "You might experience some performance hit " @@ -556,6 +558,10 @@ static int assigned_dev_register_regions(PCIRegion *io_regions, slow_map = 1; } +if (!slow_map) { +pci_dev->slots_needed++; +} + /* map physical memory */ pci_dev->v_addrs[i].e_physbase = cur_region->base_addr; pci_dev->v_addrs[i].u.r_virtbase = mmap(NULL, cur_region->size, @@ -1666,6 +1672,30 @@ static CPUReadMemoryFunc *msix_mmio_read[] = { static int assigned_dev_register_msix_mmio(AssignedDevice *dev) { +int i; +PCIRegion *pci_region = dev->real_device.regions; + +/* Determine if the MSI-X table splits a BAR, requiring the use of + * two memory slots, one to map each remaining part. */ +if (!(dev->features & ASSIGNED_DEVICE_FORCE_SLOW_MASK)) { +for (i = 0; i < dev->real_device.region_number; i++, pci_region++) { +if (!pci_region->valid) { +continue; +} + +if (ranges_overlap(pci_region->base_addr, pci_region->size, + dev->msix_table_addr, 0x1000)) { +target_phys_addr_t offset; + +offset = dev->msix_table_addr - pci_region->base_addr; +if (offset && pci_region->size > offset + 0x1000) { +dev->slots_needed++; +} +break; +} +} +} + dev->msix_table_page = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, 0, 0); @@ -1768,6 +1798,31 @@ static int assigned_initfn(struct PCIDevice *pci_dev) if (assigned_dev_register_msix_mmio(dev)) goto assigned_out; +if (!(dev->features & ASSIGNED_DEVICE_FORCE_SLOW_MASK)) { +int free_slots = kvm_free_slots(); +int total_slots = dev->slots_needed; + +if (!dev->dev.qdev.hotplugged) { +AssignedDevice *adev; + +QLIST_FOREACH(adev, &devs, next) { +total_slots += adev->slots_needed; +} + +/* This seems to work, but it's completely heuristically + * determined. Any number of things might make use of kvm + * memory slots before the guest starts mapping memory BARs. + * This is really just a guess. 
*/ +free_slots -= 13; +} + +if (total_slots > free_slots) { +error_report("pci-assign: Out of memory slots, need %d, have %d\n", + total_slots, free_slots); +goto assigned_out; +} +} + assigned_dev_load_option_rom(dev); QLIST_INSERT_HEAD(&devs, dev, next); @@ -1837,6 +1892,8 @@ static PCIDeviceInfo assign_info = { ASSIGNED_DEVICE_USE_IOMMU_BIT, true), DEFINE_PROP_BIT("prefer_msi", AssignedDevice, features, ASSIGNED_DEVICE_PREFER_MSI_BIT, true), +DEFINE_PROP_BIT("force_slow", AssignedDevice, features, +ASSIGNED_DEVICE_FORCE_SLOW_BIT, false), DEFINE_PROP_STRING("configfd", AssignedDevice
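The BAR-splitting test in assigned_dev_register_msix_mmio() above can be modeled compactly. A sketch (Python for brevity; the helper names are hypothetical, mirroring qemu's ranges_overlap() and the offset check in the patch):

```python
def ranges_overlap(start1, size1, start2, size2):
    """Sketch of qemu's ranges_overlap() helper: half-open intervals."""
    return start1 < start2 + size2 and start2 < start1 + size1

def extra_slots_for_msix(bar_base, bar_size, msix_table_addr):
    """If the 4K MSI-X table falls strictly inside a BAR, the BAR is
    split into two directly-mapped pieces, costing one extra slot."""
    if not ranges_overlap(bar_base, bar_size, msix_table_addr, 0x1000):
        return 0
    offset = msix_table_addr - bar_base
    if offset and bar_size > offset + 0x1000:
        return 1        # a mapped piece remains on each side of the table
    return 0            # table sits at an edge: only one piece remains
```

A table at the start or end of a BAR leaves a single remaining region, so only a table in the middle forces the second slot.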
[RFC PATCH 1/2] kvm: Allow querying free slots
KVM memory slots are used any place we want a guest to have direct access to a chunk of memory. Unfortunately, there's only a small, fixed number of them, and accidentally going over the limit causes an abort. Add a trivial interface so that callers can at least guess if they have a chance to successfully map memory. Signed-off-by: Alex Williamson --- kvm-all.c | 16 kvm.h |2 ++ 2 files changed, 18 insertions(+), 0 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 2f203dd..4fe3631 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -96,6 +96,22 @@ static KVMSlot *kvm_alloc_slot(KVMState *s) abort(); } +int kvm_free_slots(void) +{ +KVMState *s = kvm_state; +int i, j; + +for (i = 0, j = 0; i < ARRAY_SIZE(s->slots); i++) { +/* KVM private memory slots and used slots */ +if ((i >= 8 && i < 12) || s->slots[i].memory_size) { +continue; +} +j++; +} + +return j; +} + static KVMSlot *kvm_lookup_matching_slot(KVMState *s, target_phys_addr_t start_addr, target_phys_addr_t end_addr) diff --git a/kvm.h b/kvm.h index 02280a6..93da155 100644 --- a/kvm.h +++ b/kvm.h @@ -221,4 +221,6 @@ int kvm_irqchip_in_kernel(void); int kvm_set_irq(int irq, int level, int *status); +int kvm_free_slots(void); + #endif -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
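A toy model of the counting loop in the proposed kvm_free_slots(): the slot array is represented here as a dict of index -> memory_size, and the 32-slot total and private window 8-11 mirror the hard-coded values in the patch (both are assumptions about the build, not stable ABI):

```python
def kvm_free_slots(slot_sizes, nr_slots=32, private=range(8, 12)):
    """Count slots that are neither reserved for KVM-internal use
    (indices 8-11 here) nor already populated (nonzero memory_size)."""
    free = 0
    for i in range(nr_slots):
        if i in private or slot_sizes.get(i, 0):
            continue
        free += 1
    return free

# with 3 user slots occupied: 32 total - 4 private - 3 used = 25 free
n = kvm_free_slots({0: 1 << 20, 1: 4096, 2: 4096})
```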
[RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
When doing device assignment, we use cpu_register_physical_memory() to directly map the qemu mmap of the device resource into the address space of the guest. The unadvertised feature of the register physical memory code path on kvm, at least for this type of mapping, is that it needs to allocate an index from a small, fixed array of memory slots. Even better, if it can't get an index, the code aborts deep in the kvm specific bits, preventing the caller from having a chance to recover. It's really easy to hit this by hot adding too many assigned devices to a guest (pretty easy to hit with too many devices at instantiation time too, but the abort is slightly more bearable there). I'm assuming it's pretty difficult to make the memory slot array dynamically sized. If that's not the case, please let me know as that would be a much better solution. I'm not terribly happy with the solution in this series, it doesn't provide any guarantees whether a cpu_register_physical_memory() will succeed, only slightly better educated guesses. Are there better ideas how we could solve this? Thanks, Alex --- Alex Williamson (2): device-assignment: Count required kvm memory slots kvm: Allow querying free slots hw/device-assignment.c | 59 +++- hw/device-assignment.h |3 ++ kvm-all.c | 16 + kvm.h |2 ++ 4 files changed, 79 insertions(+), 1 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Flow Control and Port Mirroring Revisited
On Fri, Jan 21, 2011 at 11:59:30AM +0200, Michael S. Tsirkin wrote:
> On Thu, Jan 20, 2011 at 05:38:33PM +0900, Simon Horman wrote:
> > [ Trimmed Eric from CC list as vger was complaining that it is too long ]
> >
> > On Tue, Jan 18, 2011 at 11:41:22AM -0800, Rick Jones wrote:
> > > >So it won't be all that simple to implement well, and before we try,
> > > >I'd like to know whether there are applications that are helped
> > > >by it. For example, we could try to measure latency at various
> > > >pps and see whether the backpressure helps. netperf has -b, -w
> > > >flags which might help these measurements.
> > >
> > > Those options are enabled when one adds --enable-burst to the
> > > pre-compilation ./configure of netperf (one doesn't have to
> > > recompile netserver). However, if one is also looking at latency
> > > statistics via the -j option in the top-of-trunk, or simply at the
> > > histogram with --enable-histogram on the ./configure and a verbosity
> > > level of 2 (global -v 2) then one wants the very top of trunk
> > > netperf from:
> >
> > Hi,
> >
> > I have constructed a test where I run an un-paced UDP_STREAM test in
> > one guest and a paced omni rr test in another guest at the same time.
>
> Hmm, what is this supposed to measure? Basically each time you run an
> un-paced UDP_STREAM you get some random load on the network.
> You can't tell what it was exactly, only that it was between
> the send and receive throughput.

Rick mentioned in another email that I messed up my test parameters a bit, so I will re-run the tests, incorporating his suggestions.

What I was attempting to measure was the effect of an unpaced UDP_STREAM on the latency of more moderated traffic, because I am interested in what effect an abusive guest has on other guests and how that may be mitigated.

Could you suggest some tests that you feel are more appropriate? 
[PATCH 01/18] kvm: x86: Swallow KVM_EXIT_SET_TPR
From: Jan Kiszka This exit only triggers activity in the common exit path, but we should accept it in order to be able to detect unknown exit types. Signed-off-by: Jan Kiszka --- target-i386/kvm.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index fda07d2..0aeb079 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -1534,6 +1534,9 @@ int kvm_arch_handle_exit(CPUState *env, struct kvm_run *run) DPRINTF("handle_hlt\n"); ret = kvm_handle_halt(env); break; +case KVM_EXIT_SET_TPR: +ret = 1; +break; } return ret; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 03/18] kvm: Improve reporting of fatal errors
From: Jan Kiszka Report KVM_EXIT_UNKNOWN, KVM_EXIT_FAIL_ENTRY, and KVM_EXIT_EXCEPTION with more details to stderr. The latter two are so far x86-only, so move them into the arch-specific handler. Integrate the Intel real mode warning on KVM_EXIT_FAIL_ENTRY that qemu-kvm carries, but actually restrict it to Intel CPUs. Moreover, always dump the CPU state in case we fail. Signed-off-by: Jan Kiszka --- kvm-all.c | 22 -- target-i386/cpu.h |2 ++ target-i386/cpuid.c |5 ++--- target-i386/kvm.c | 33 + 4 files changed, 45 insertions(+), 17 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index eaf9272..10e1194 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -817,22 +817,22 @@ static int kvm_handle_io(uint16_t port, void *data, int direction, int size, #ifdef KVM_CAP_INTERNAL_ERROR_DATA static int kvm_handle_internal_error(CPUState *env, struct kvm_run *run) { - +fprintf(stderr, "KVM internal error."); if (kvm_check_extension(kvm_state, KVM_CAP_INTERNAL_ERROR_DATA)) { int i; -fprintf(stderr, "KVM internal error. 
Suberror: %d\n", -run->internal.suberror); - +fprintf(stderr, " Suberror: %d\n", run->internal.suberror); for (i = 0; i < run->internal.ndata; ++i) { fprintf(stderr, "extra data[%d]: %"PRIx64"\n", i, (uint64_t)run->internal.data[i]); } +} else { +fprintf(stderr, "\n"); } -cpu_dump_state(env, stderr, fprintf, 0); if (run->internal.suberror == KVM_INTERNAL_ERROR_EMULATION) { fprintf(stderr, "emulation failure\n"); if (!kvm_arch_stop_on_emulation_error(env)) { +cpu_dump_state(env, stderr, fprintf, 0); return 0; } } @@ -966,15 +966,8 @@ int kvm_cpu_exec(CPUState *env) ret = 1; break; case KVM_EXIT_UNKNOWN: -DPRINTF("kvm_exit_unknown\n"); -ret = -1; -break; -case KVM_EXIT_FAIL_ENTRY: -DPRINTF("kvm_exit_fail_entry\n"); -ret = -1; -break; -case KVM_EXIT_EXCEPTION: -DPRINTF("kvm_exit_exception\n"); +fprintf(stderr, "KVM: unknown exit, hardware reason %" PRIx64 "\n", +(uint64_t)run->hw.hardware_exit_reason); ret = -1; break; #ifdef KVM_CAP_INTERNAL_ERROR_DATA @@ -1001,6 +994,7 @@ int kvm_cpu_exec(CPUState *env) } while (ret > 0); if (ret < 0) { +cpu_dump_state(env, stderr, fprintf, 0); vm_stop(0); env->exit_request = 1; } diff --git a/target-i386/cpu.h b/target-i386/cpu.h index dddcd74..a457423 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -874,6 +874,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count, uint32_t *ecx, uint32_t *edx); int cpu_x86_register (CPUX86State *env, const char *cpu_model); void cpu_clear_apic_feature(CPUX86State *env); +void host_cpuid(uint32_t function, uint32_t count, +uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx); /* helper.c */ int cpu_x86_handle_mmu_fault(CPUX86State *env, target_ulong addr, diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c index 165045e..5382a28 100644 --- a/target-i386/cpuid.c +++ b/target-i386/cpuid.c @@ -103,9 +103,8 @@ typedef struct model_features_t { int check_cpuid = 0; int enforce_cpuid = 0; -static void host_cpuid(uint32_t function, uint32_t count, - uint32_t *eax, 
uint32_t *ebx, - uint32_t *ecx, uint32_t *edx) +void host_cpuid(uint32_t function, uint32_t count, +uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx) { #if defined(CONFIG_KVM) uint32_t vec[4]; diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 6b4abaa..0ba13fc 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -1525,8 +1525,19 @@ static int kvm_handle_halt(CPUState *env) return 1; } +static bool host_supports_vmx(void) +{ +uint32_t ecx, unused; + +host_cpuid(1, 0, &unused, &unused, &ecx, &unused); +return ecx & CPUID_EXT_VMX; +} + +#define VMX_INVALID_GUEST_STATE 0x8021 + int kvm_arch_handle_exit(CPUState *env, struct kvm_run *run) { +uint64_t code; int ret = 0; switch (run->exit_reason) { @@ -1537,6 +1548,28 @@ int kvm_arch_handle_exit(CPUState *env, struct kvm_run *run) case KVM_EXIT_SET_TPR: ret = 1; break; +case KVM_EXIT_FAIL_ENTRY: +code = run->fail_entry.hardware_entry_failure_reason; +fprintf(stderr, "KVM: entry failed, hardware error 0x%" PRIx64 "\n", +code); +if (host_supports_vmx() && code == VMX_INVALID_GUEST_STATE) { +fprintf(stderr, +"\nIf you're runnning a guest on an Intel machine without " +"unrestricted mode\n" +"supp
[PATCH 11/18] kvm: x86: Fix !CONFIG_KVM_PARA build
From: Jan Kiszka If we lack kvm_para.h, MSR_KVM_ASYNC_PF_EN is not defined. The change in kvm_arch_init_vcpu is just for consistency reasons. Signed-off-by: Jan Kiszka --- target-i386/kvm.c |8 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 825af42..feaf33d 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -319,7 +319,7 @@ int kvm_arch_init_vcpu(CPUState *env) uint32_t limit, i, j, cpuid_i; uint32_t unused; struct kvm_cpuid_entry2 *c; -#ifdef KVM_CPUID_SIGNATURE +#ifdef CONFIG_KVM_PARA uint32_t signature[3]; #endif @@ -855,7 +855,7 @@ static int kvm_put_msrs(CPUState *env, int level) kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr); kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr); -#ifdef KVM_CAP_ASYNC_PF +#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ASYNC_PF) kvm_msr_entry_set(&msrs[n++], MSR_KVM_ASYNC_PF_EN, env->async_pf_en_msr); #endif } @@ -1091,7 +1091,7 @@ static int kvm_get_msrs(CPUState *env) #endif msrs[n++].index = MSR_KVM_SYSTEM_TIME; msrs[n++].index = MSR_KVM_WALL_CLOCK; -#ifdef KVM_CAP_ASYNC_PF +#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ASYNC_PF) msrs[n++].index = MSR_KVM_ASYNC_PF_EN; #endif @@ -1167,7 +1167,7 @@ static int kvm_get_msrs(CPUState *env) } #endif break; -#ifdef KVM_CAP_ASYNC_PF +#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ASYNC_PF) case MSR_KVM_ASYNC_PF_EN: env->async_pf_en_msr = msrs[i].data; break; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 09/18] kvm: x86: Refactor msr_star/hsave_pa setup and checks
From: Jan Kiszka Simplify kvm_has_msr_star/hsave_pa to booleans and push their one-time initialization into kvm_arch_init. Also handle potential errors of that setup procedure. Signed-off-by: Jan Kiszka --- target-i386/kvm.c | 47 +++ 1 files changed, 19 insertions(+), 28 deletions(-) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index c4a22dd..454ddb1 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -54,6 +54,8 @@ #define BUS_MCEERR_AO 5 #endif +static bool has_msr_star; +static bool has_msr_hsave_pa; static int lm_capable_kernel; #ifdef KVM_CAP_EXT_CPUID @@ -459,13 +461,10 @@ void kvm_arch_reset_vcpu(CPUState *env) } } -int has_msr_star; -int has_msr_hsave_pa; - -static void kvm_supported_msrs(CPUState *env) +static int kvm_get_supported_msrs(KVMState *s) { static int kvm_supported_msrs; -int ret; +int ret = 0; /* first time */ if (kvm_supported_msrs == 0) { @@ -476,9 +475,9 @@ static void kvm_supported_msrs(CPUState *env) /* Obtain MSR list from KVM. These are the MSRs that we must * save/restore */ msr_list.nmsrs = 0; -ret = kvm_ioctl(env->kvm_state, KVM_GET_MSR_INDEX_LIST, &msr_list); +ret = kvm_ioctl(s, KVM_GET_MSR_INDEX_LIST, &msr_list); if (ret < 0 && ret != -E2BIG) { -return; +return ret; } /* Old kernel modules had a bug and could write beyond the provided memory. Allocate at least a safe amount of 1K. 
*/ @@ -487,17 +486,17 @@ static void kvm_supported_msrs(CPUState *env) sizeof(msr_list.indices[0]))); kvm_msr_list->nmsrs = msr_list.nmsrs; -ret = kvm_ioctl(env->kvm_state, KVM_GET_MSR_INDEX_LIST, kvm_msr_list); +ret = kvm_ioctl(s, KVM_GET_MSR_INDEX_LIST, kvm_msr_list); if (ret >= 0) { int i; for (i = 0; i < kvm_msr_list->nmsrs; i++) { if (kvm_msr_list->indices[i] == MSR_STAR) { -has_msr_star = 1; +has_msr_star = true; continue; } if (kvm_msr_list->indices[i] == MSR_VM_HSAVE_PA) { -has_msr_hsave_pa = 1; +has_msr_hsave_pa = true; continue; } } @@ -506,19 +505,7 @@ static void kvm_supported_msrs(CPUState *env) free(kvm_msr_list); } -return; -} - -static int kvm_has_msr_hsave_pa(CPUState *env) -{ -kvm_supported_msrs(env); -return has_msr_hsave_pa; -} - -static int kvm_has_msr_star(CPUState *env) -{ -kvm_supported_msrs(env); -return has_msr_star; +return ret; } static int kvm_init_identity_map_page(KVMState *s) @@ -543,9 +530,13 @@ static int kvm_init_identity_map_page(KVMState *s) int kvm_arch_init(KVMState *s, int smp_cpus) { int ret; - struct utsname utsname; +ret = kvm_get_supported_msrs(s); +if (ret < 0) { +return ret; +} + uname(&utsname); lm_capable_kernel = strcmp(utsname.machine, "x86_64") == 0; @@ -830,10 +821,10 @@ static int kvm_put_msrs(CPUState *env, int level) kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_CS, env->sysenter_cs); kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_ESP, env->sysenter_esp); kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_EIP, env->sysenter_eip); -if (kvm_has_msr_star(env)) { +if (has_msr_star) { kvm_msr_entry_set(&msrs[n++], MSR_STAR, env->star); } -if (kvm_has_msr_hsave_pa(env)) { +if (has_msr_hsave_pa) { kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave); } #ifdef TARGET_X86_64 @@ -1076,10 +1067,10 @@ static int kvm_get_msrs(CPUState *env) msrs[n++].index = MSR_IA32_SYSENTER_CS; msrs[n++].index = MSR_IA32_SYSENTER_ESP; msrs[n++].index = MSR_IA32_SYSENTER_EIP; -if (kvm_has_msr_star(env)) { +if (has_msr_star) { 
msrs[n++].index = MSR_STAR; } -if (kvm_has_msr_hsave_pa(env)) { +if (has_msr_hsave_pa) { msrs[n++].index = MSR_VM_HSAVE_PA; } msrs[n++].index = MSR_IA32_TSC; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
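The refactoring pattern in this patch — probe the kernel's MSR index list once during `kvm_arch_init` and cache the result in file-scope booleans — can be sketched roughly as follows. This is a simplified illustration, not the actual QEMU code: the `probe_supported_msrs` helper and the way the index list is passed in are stand-ins, and the MSR constants are shown only for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative MSR indices; the real values come from the kernel headers. */
#define MSR_STAR        0xc0000081u
#define MSR_VM_HSAVE_PA 0xc0010117u

static bool has_msr_star;
static bool has_msr_hsave_pa;

/* One-time scan of the kernel-reported MSR index list, in the spirit of
 * kvm_get_supported_msrs(): latch the cached flags, return 0 on success. */
static int probe_supported_msrs(const unsigned *indices, size_t nmsrs)
{
    for (size_t i = 0; i < nmsrs; i++) {
        if (indices[i] == MSR_STAR) {
            has_msr_star = true;
        } else if (indices[i] == MSR_VM_HSAVE_PA) {
            has_msr_hsave_pa = true;
        }
    }
    return 0;
}
```

Callers such as `kvm_put_msrs()` then test the plain booleans instead of re-running detection on every register sync, and any probe failure can be reported once from `kvm_arch_init`.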
[PATCH 00/18] [uq/master] Rebased patch queue, part I
In order to make progress with flushing my kvm-upstream queue without overloading the channels (38 further patches are pending), here comes part I against updated uq/master. Changes in this part compared to last postings: - Dropped "kvm: Drop return value of kvm_cpu_exec", we will actually need it later on. - Additional patch to swallow KVM_EXIT_SET_TPR (required now that we watch out for unknown exits). - Postponed MCE bits, they will follow later as part of a complete rework. CC: Glauber Costa Jan Kiszka (18): kvm: x86: Swallow KVM_EXIT_SET_TPR kvm: Stop on all fatal exit reasons kvm: Improve reporting of fatal errors x86: Optionally dump code bytes on cpu_dump_state kvm: x86: Align kvm_arch_put_registers code with comment kvm: x86: Prepare kvm_get_mp_state for in-kernel irqchip kvm: x86: Remove redundant mp_state initialization kvm: x86: Fix xcr0 reset mismerge kvm: x86: Refactor msr_star/hsave_pa setup and checks kvm: x86: Reset paravirtual MSRs kvm: x86: Fix !CONFIG_KVM_PARA build kvm: Drop smp_cpus argument from init functions kvm: Consolidate must-have capability checks kvm: x86: Rework identity map and TSS setup for larger BIOS sizes kvm: Flush coalesced mmio buffer on IO window exits kvm: Do not use qemu_fair_mutex kvm: x86: Implicitly clear nmi_injected/pending on reset kvm: x86: Only read/write MSR_KVM_ASYNC_PF_EN if supported configure| 39 ++--- cpu-all.h|2 + cpus.c |2 - kvm-all.c| 108 +++- kvm-stub.c |2 +- kvm.h| 14 +++- target-i386/cpu.h|8 ++- target-i386/cpuid.c |5 +- target-i386/helper.c | 21 + target-i386/kvm.c| 227 +++--- target-ppc/kvm.c | 10 ++- target-s390x/kvm.c |6 +- vl.c |2 +- 13 files changed, 256 insertions(+), 190 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 12/18] kvm: Drop smp_cpus argument from init functions
From: Jan Kiszka No longer used. Signed-off-by: Jan Kiszka --- kvm-all.c |4 ++-- kvm-stub.c |2 +- kvm.h |4 ++-- target-i386/kvm.c |2 +- target-ppc/kvm.c |2 +- target-s390x/kvm.c |2 +- vl.c |2 +- 7 files changed, 9 insertions(+), 9 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 41decde..8053f92 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -636,7 +636,7 @@ static CPUPhysMemoryClient kvm_cpu_phys_memory_client = { .migration_log = kvm_client_migration_log, }; -int kvm_init(int smp_cpus) +int kvm_init(void) { static const char upgrade_note[] = "Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n" @@ -749,7 +749,7 @@ int kvm_init(int smp_cpus) s->xcrs = kvm_check_extension(s, KVM_CAP_XCRS); #endif -ret = kvm_arch_init(s, smp_cpus); +ret = kvm_arch_init(s); if (ret < 0) { goto err; } diff --git a/kvm-stub.c b/kvm-stub.c index 33d4476..88682f2 100644 --- a/kvm-stub.c +++ b/kvm-stub.c @@ -58,7 +58,7 @@ int kvm_check_extension(KVMState *s, unsigned int extension) return 0; } -int kvm_init(int smp_cpus) +int kvm_init(void) { return -ENOSYS; } diff --git a/kvm.h b/kvm.h index ce08d42..a971752 100644 --- a/kvm.h +++ b/kvm.h @@ -34,7 +34,7 @@ struct kvm_run; /* external API */ -int kvm_init(int smp_cpus); +int kvm_init(void); int kvm_has_sync_mmu(void); int kvm_has_vcpu_events(void); @@ -105,7 +105,7 @@ int kvm_arch_get_registers(CPUState *env); int kvm_arch_put_registers(CPUState *env, int level); -int kvm_arch_init(KVMState *s, int smp_cpus); +int kvm_arch_init(KVMState *s); int kvm_arch_init_vcpu(CPUState *env); diff --git a/target-i386/kvm.c b/target-i386/kvm.c index feaf33d..016b67d 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -527,7 +527,7 @@ static int kvm_init_identity_map_page(KVMState *s) return 0; } -int kvm_arch_init(KVMState *s, int smp_cpus) +int kvm_arch_init(KVMState *s) { int ret; struct utsname utsname; diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index 849b404..3c05630 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ 
-56,7 +56,7 @@ static void kvm_kick_env(void *env) qemu_cpu_kick(env); } -int kvm_arch_init(KVMState *s, int smp_cpus) +int kvm_arch_init(KVMState *s) { #ifdef KVM_CAP_PPC_UNSET_IRQ cap_interrupt_unset = kvm_check_extension(s, KVM_CAP_PPC_UNSET_IRQ); diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c index adf4a9e..b177e10 100644 --- a/target-s390x/kvm.c +++ b/target-s390x/kvm.c @@ -70,7 +70,7 @@ #define SCLP_CMDW_READ_SCP_INFO 0x00020001 #define SCLP_CMDW_READ_SCP_INFO_FORCED 0x00120001 -int kvm_arch_init(KVMState *s, int smp_cpus) +int kvm_arch_init(KVMState *s) { return 0; } diff --git a/vl.c b/vl.c index 0292184..33f844f 100644 --- a/vl.c +++ b/vl.c @@ -2836,7 +2836,7 @@ int main(int argc, char **argv, char **envp) } if (kvm_allowed) { -int ret = kvm_init(smp_cpus); +int ret = kvm_init(); if (ret < 0) { if (!kvm_available()) { printf("KVM not supported for this target\n"); -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 08/18] kvm: x86: Fix xcr0 reset mismerge
From: Jan Kiszka For unknown reasons, xcr0 reset ended up in kvm_arch_update_guest_debug on upstream merge. Fix this and also remove the misleading comment (1 is THE reset value). Signed-off-by: Jan Kiszka --- target-i386/kvm.c |3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 07c75c0..c4a22dd 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -450,6 +450,7 @@ void kvm_arch_reset_vcpu(CPUState *env) env->interrupt_injected = -1; env->nmi_injected = 0; env->nmi_pending = 0; +env->xcr0 = 1; if (kvm_irqchip_in_kernel()) { env->mp_state = cpu_is_bsp(env) ? KVM_MP_STATE_RUNNABLE : KVM_MP_STATE_UNINITIALIZED; @@ -1759,8 +1760,6 @@ void kvm_arch_update_guest_debug(CPUState *env, struct kvm_guest_debug *dbg) ((uint32_t)len_code[hw_breakpoint[n].len] << (18 + n*4)); } } -/* Legal xcr0 for loading */ -env->xcr0 = 1; } #endif /* KVM_CAP_SET_GUEST_DEBUG */ -- 1.7.1
[PATCH 14/18] kvm: x86: Rework identity map and TSS setup for larger BIOS sizes
From: Jan Kiszka In order to support loading BIOSes > 256K, reorder the code, adjusting the base if the kernel supports moving the identity map. Signed-off-by: Jan Kiszka --- target-i386/kvm.c | 63 +--- 1 files changed, 30 insertions(+), 33 deletions(-) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 1db8227..72f9fdf 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -493,27 +493,9 @@ static int kvm_get_supported_msrs(KVMState *s) return ret; } -static int kvm_init_identity_map_page(KVMState *s) -{ -#ifdef KVM_CAP_SET_IDENTITY_MAP_ADDR -int ret; -uint64_t addr = 0xfffbc000; - -if (!kvm_check_extension(s, KVM_CAP_SET_IDENTITY_MAP_ADDR)) { -return 0; -} - -ret = kvm_vm_ioctl(s, KVM_SET_IDENTITY_MAP_ADDR, &addr); -if (ret < 0) { -fprintf(stderr, "kvm_set_identity_map_addr: %s\n", strerror(ret)); -return ret; -} -#endif -return 0; -} - int kvm_arch_init(KVMState *s) { +uint64_t identity_base = 0xfffbc000; int ret; struct utsname utsname; @@ -525,27 +507,42 @@ int kvm_arch_init(KVMState *s) uname(&utsname); lm_capable_kernel = strcmp(utsname.machine, "x86_64") == 0; -/* create vm86 tss. KVM uses vm86 mode to emulate 16-bit code - * directly. In order to use vm86 mode, a TSS is needed. Since this - * must be part of guest physical memory, we need to allocate it. */ - -/* this address is 3 pages before the bios, and the bios should present - * as unavaible memory. FIXME, need to ensure the e820 map deals with - * this? - */ /* - * Tell fw_cfg to notify the BIOS to reserve the range. + * On older Intel CPUs, KVM uses vm86 mode to emulate 16-bit code directly. + * In order to use vm86 mode, an EPT identity map and a TSS are needed. + * Since these must be part of guest physical memory, we need to allocate + * them, both by setting their start addresses in the kernel and by + * creating a corresponding e820 entry. We need 4 pages before the BIOS. + * + * Older KVM versions may not support setting the identity map base. 
In + * that case we need to stick with the default, i.e. a 256K maximum BIOS + * size. */ -if (e820_add_entry(0xfffbc000, 0x4000, E820_RESERVED) < 0) { -perror("e820_add_entry() table is full"); -exit(1); +#ifdef KVM_CAP_SET_IDENTITY_MAP_ADDR +if (kvm_check_extension(s, KVM_CAP_SET_IDENTITY_MAP_ADDR)) { +/* Allows up to 16M BIOSes. */ +identity_base = 0xfeffc000; + +ret = kvm_vm_ioctl(s, KVM_SET_IDENTITY_MAP_ADDR, &identity_base); +if (ret < 0) { +return ret; +} } -ret = kvm_vm_ioctl(s, KVM_SET_TSS_ADDR, 0xfffbd000); +#endif +/* Set TSS base one page after EPT identity map. */ +ret = kvm_vm_ioctl(s, KVM_SET_TSS_ADDR, identity_base + 0x1000); +if (ret < 0) { +return ret; +} + +/* Tell fw_cfg to notify the BIOS to reserve the range. */ +ret = e820_add_entry(identity_base, 0x4000, E820_RESERVED); if (ret < 0) { +fprintf(stderr, "e820_add_entry() table is full\n"); return ret; } -return kvm_init_identity_map_page(s); +return 0; } static void set_v8086_seg(struct kvm_segment *lhs, const SegmentCache *rhs) -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 04/18] x86: Optionally dump code bytes on cpu_dump_state
From: Jan Kiszka Introduce the cpu_dump_state flag CPU_DUMP_CODE and implement it for x86. This writes out the code bytes around the current instruction pointer. Make use of this feature in KVM to help debugging fatal vm exits. Signed-off-by: Jan Kiszka --- cpu-all.h|2 ++ kvm-all.c|4 ++-- target-i386/helper.c | 21 + 3 files changed, 25 insertions(+), 2 deletions(-) diff --git a/cpu-all.h b/cpu-all.h index 4ce4e83..ffbd6a4 100644 --- a/cpu-all.h +++ b/cpu-all.h @@ -765,6 +765,8 @@ int page_check_range(target_ulong start, target_ulong len, int flags); CPUState *cpu_copy(CPUState *env); CPUState *qemu_get_cpu(int cpu); +#define CPU_DUMP_CODE 0x0001 + void cpu_dump_state(CPUState *env, FILE *f, fprintf_function cpu_fprintf, int flags); void cpu_dump_statistics(CPUState *env, FILE *f, fprintf_function cpu_fprintf, diff --git a/kvm-all.c b/kvm-all.c index 10e1194..41decde 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -832,7 +832,7 @@ static int kvm_handle_internal_error(CPUState *env, struct kvm_run *run) if (run->internal.suberror == KVM_INTERNAL_ERROR_EMULATION) { fprintf(stderr, "emulation failure\n"); if (!kvm_arch_stop_on_emulation_error(env)) { -cpu_dump_state(env, stderr, fprintf, 0); +cpu_dump_state(env, stderr, fprintf, CPU_DUMP_CODE); return 0; } } @@ -994,7 +994,7 @@ int kvm_cpu_exec(CPUState *env) } while (ret > 0); if (ret < 0) { -cpu_dump_state(env, stderr, fprintf, 0); +cpu_dump_state(env, stderr, fprintf, CPU_DUMP_CODE); vm_stop(0); env->exit_request = 1; } diff --git a/target-i386/helper.c b/target-i386/helper.c index 6dfa27d..1217452 100644 --- a/target-i386/helper.c +++ b/target-i386/helper.c @@ -249,6 +249,9 @@ done: cpu_fprintf(f, "\n"); } +#define DUMP_CODE_BYTES_TOTAL50 +#define DUMP_CODE_BYTES_BACKWARD 20 + void cpu_dump_state(CPUState *env, FILE *f, fprintf_function cpu_fprintf, int flags) { @@ -434,6 +437,24 @@ void cpu_dump_state(CPUState *env, FILE *f, fprintf_function cpu_fprintf, cpu_fprintf(f, " "); } } +if (flags & CPU_DUMP_CODE) { 
+target_ulong base = env->segs[R_CS].base + env->eip; +target_ulong offs = MIN(env->eip, DUMP_CODE_BYTES_BACKWARD); +uint8_t code; +char codestr[3]; + +cpu_fprintf(f, "Code="); +for (i = 0; i < DUMP_CODE_BYTES_TOTAL; i++) { +if (cpu_memory_rw_debug(env, base - offs + i, &code, 1, 0) == 0) { +snprintf(codestr, sizeof(codestr), "%02x", code); +} else { +snprintf(codestr, sizeof(codestr), "??"); +} +cpu_fprintf(f, "%s%s%s%s", i > 0 ? " " : "", +i == offs ? "<" : "", codestr, i == offs ? ">" : ""); +} +cpu_fprintf(f, "\n"); +} } /***/ -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
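The dump format produced by the new CPU_DUMP_CODE path can be illustrated with a reduced sketch: the constants are shortened from 50/20 to keep the example small, guest memory is faked with a plain array, and `dump_code` is an invented stand-in for the loop in `cpu_dump_state()` (the real code reads via `cpu_memory_rw_debug` and prints `??` for unreadable bytes).

```c
#include <stdio.h>
#include <string.h>

#define DUMP_CODE_BYTES_TOTAL    8  /* shortened from 50 for the sketch */
#define DUMP_CODE_BYTES_BACKWARD 4  /* shortened from 20 */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Format a hex dump around 'eip' the way the patch does: up to
 * DUMP_CODE_BYTES_BACKWARD bytes before the instruction pointer, the
 * rest after, with the byte at EIP wrapped in <>. */
static void dump_code(char *out, const unsigned char *mem, unsigned eip)
{
    unsigned offs = MIN(eip, (unsigned)DUMP_CODE_BYTES_BACKWARD);

    out[0] = '\0';
    for (unsigned i = 0; i < DUMP_CODE_BYTES_TOTAL; i++) {
        char byte[8];

        snprintf(byte, sizeof(byte), "%s%s%02x%s", i ? " " : "",
                 i == offs ? "<" : "", mem[eip - offs + i],
                 i == offs ? ">" : "");
        strcat(out, byte);
    }
}
```

Clamping the backward offset to `eip` keeps the dump from reading below address zero of the code segment, which is why `offs` rather than a fixed window marks the `<>` position.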
[PATCH 02/18] kvm: Stop on all fatal exit reasons
From: Jan Kiszka Ensure that we stop the guest whenever we face a fatal or unknown exit reason. If we stop, we also have to enforce a cpu loop exit. Signed-off-by: Jan Kiszka --- kvm-all.c | 15 +++ target-i386/kvm.c |4 target-ppc/kvm.c |4 3 files changed, 19 insertions(+), 4 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 86ddbd6..eaf9272 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -815,7 +815,7 @@ static int kvm_handle_io(uint16_t port, void *data, int direction, int size, } #ifdef KVM_CAP_INTERNAL_ERROR_DATA -static void kvm_handle_internal_error(CPUState *env, struct kvm_run *run) +static int kvm_handle_internal_error(CPUState *env, struct kvm_run *run) { if (kvm_check_extension(kvm_state, KVM_CAP_INTERNAL_ERROR_DATA)) { @@ -833,13 +833,13 @@ static void kvm_handle_internal_error(CPUState *env, struct kvm_run *run) if (run->internal.suberror == KVM_INTERNAL_ERROR_EMULATION) { fprintf(stderr, "emulation failure\n"); if (!kvm_arch_stop_on_emulation_error(env)) { -return; +return 0; } } /* FIXME: Should trigger a qmp message to let management know * something went wrong. 
*/ -vm_stop(0); +return -1; } #endif @@ -967,16 +967,19 @@ int kvm_cpu_exec(CPUState *env) break; case KVM_EXIT_UNKNOWN: DPRINTF("kvm_exit_unknown\n"); +ret = -1; break; case KVM_EXIT_FAIL_ENTRY: DPRINTF("kvm_exit_fail_entry\n"); +ret = -1; break; case KVM_EXIT_EXCEPTION: DPRINTF("kvm_exit_exception\n"); +ret = -1; break; #ifdef KVM_CAP_INTERNAL_ERROR_DATA case KVM_EXIT_INTERNAL_ERROR: -kvm_handle_internal_error(env, run); +ret = kvm_handle_internal_error(env, run); break; #endif case KVM_EXIT_DEBUG: @@ -997,6 +1000,10 @@ int kvm_cpu_exec(CPUState *env) } } while (ret > 0); +if (ret < 0) { +vm_stop(0); +env->exit_request = 1; +} if (env->exit_request) { env->exit_request = 0; env->exception_index = EXCP_INTERRUPT; diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 0aeb079..6b4abaa 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -1537,6 +1537,10 @@ int kvm_arch_handle_exit(CPUState *env, struct kvm_run *run) case KVM_EXIT_SET_TPR: ret = 1; break; +default: +fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason); +ret = -1; +break; } return ret; diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index 5caa07c..849b404 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -307,6 +307,10 @@ int kvm_arch_handle_exit(CPUState *env, struct kvm_run *run) dprintf("handle halt\n"); ret = kvmppc_handle_halt(env); break; +default: +fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason); +ret = -1; +break; } return ret; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
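The control flow this patch introduces — handlers return a negative value for fatal or unknown exit reasons, and the outer loop reacts with `vm_stop()` plus a forced cpu loop exit — can be sketched as a plain dispatch function. The exit-reason names and values here are illustrative, not the real `KVM_EXIT_*` constants.

```c
/* Simplified dispatch in the spirit of the patch: ret > 0 means "handled,
 * keep running", ret < 0 means "fatal, stop the guest". */
enum { EXIT_IO = 0, EXIT_UNKNOWN = 1, EXIT_FAIL_ENTRY = 2 };

static int handle_exit(int exit_reason)
{
    switch (exit_reason) {
    case EXIT_IO:
        return 1;   /* handled, continue the run loop */
    case EXIT_UNKNOWN:
    case EXIT_FAIL_ENTRY:
        return -1;  /* fatal: caller stops the VM */
    default:
        return -1;  /* unknown exit reasons are treated as fatal too */
    }
}
```

The important property is that nothing falls through with `ret == 0` by accident: every reason is either explicitly handled or explicitly fatal.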
[PATCH 07/18] kvm: x86: Remove redundant mp_state initialization
From: Jan Kiszka kvm_arch_reset_vcpu initializes mp_state, and that function is invoked right after kvm_arch_init_vcpu. Signed-off-by: Jan Kiszka --- target-i386/kvm.c |2 -- 1 files changed, 0 insertions(+), 2 deletions(-) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 531b69e..07c75c0 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -321,8 +321,6 @@ int kvm_arch_init_vcpu(CPUState *env) uint32_t signature[3]; #endif -env->mp_state = KVM_MP_STATE_RUNNABLE; - env->cpuid_features &= kvm_arch_get_supported_cpuid(env, 1, 0, R_EDX); i = env->cpuid_ext_features & CPUID_EXT_HYPERVISOR; -- 1.7.1
[PATCH 05/18] kvm: x86: Align kvm_arch_put_registers code with comment
From: Jan Kiszka The ordering doesn't matter in this case, but it is better to keep it consistent. Signed-off-by: Jan Kiszka --- target-i386/kvm.c |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 0ba13fc..9bb34ab 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -1388,12 +1388,12 @@ int kvm_arch_put_registers(CPUState *env, int level) if (ret < 0) { return ret; } -/* must be last */ -ret = kvm_guest_debug_workarounds(env); +ret = kvm_put_debugregs(env); if (ret < 0) { return ret; } -ret = kvm_put_debugregs(env); +/* must be last */ +ret = kvm_guest_debug_workarounds(env); if (ret < 0) { return ret; } -- 1.7.1
[PATCH 10/18] kvm: x86: Reset paravirtual MSRs
From: Jan Kiszka Make sure to write the cleared MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK, and MSR_KVM_ASYNC_PF_EN to the kernel state so that a freshly booted guest cannot be disturbed by old values. Signed-off-by: Jan Kiszka CC: Glauber Costa --- target-i386/kvm.c |7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 454ddb1..825af42 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -845,6 +845,13 @@ static int kvm_put_msrs(CPUState *env, int level) if (smp_cpus == 1 || env->tsc != 0) { kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc); } +} +/* + * The following paravirtual MSRs have side effects on the guest or are + * too heavy for normal writeback. Limit them to reset or full state + * updates. + */ +if (level >= KVM_PUT_RESET_STATE) { kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr); kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr); -- 1.7.1
[PATCH 06/18] kvm: x86: Prepare kvm_get_mp_state for in-kernel irqchip
From: Jan Kiszka This code path will not yet be taken as we still lack in-kernel irqchip support. But qemu-kvm can already make use of it and drop its own mp_state access services. Signed-off-by: Jan Kiszka --- target-i386/kvm.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 9bb34ab..531b69e 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -1198,6 +1198,9 @@ static int kvm_get_mp_state(CPUState *env) return ret; } env->mp_state = mp_state.mp_state; +if (kvm_irqchip_in_kernel()) { +env->halted = (mp_state.mp_state == KVM_MP_STATE_HALTED); +} return 0; } -- 1.7.1
[PATCH 17/18] kvm: x86: Implicitly clear nmi_injected/pending on reset
From: Jan Kiszka All CPUX86State variables before CPU_COMMON are automatically cleared on reset. Reorder nmi_injected and nmi_pending to avoid having to touch them explicitly. Signed-off-by: Jan Kiszka --- target-i386/cpu.h |6 -- target-i386/kvm.c |2 -- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/target-i386/cpu.h b/target-i386/cpu.h index a457423..af701a4 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -699,6 +699,10 @@ typedef struct CPUX86State { uint32_t smbase; int old_exception; /* exception in flight */ +/* KVM states, automatically cleared on reset */ +uint8_t nmi_injected; +uint8_t nmi_pending; + CPU_COMMON /* processor features (e.g. for CPUID insn) */ @@ -726,8 +730,6 @@ typedef struct CPUX86State { int32_t exception_injected; int32_t interrupt_injected; uint8_t soft_interrupt; -uint8_t nmi_injected; -uint8_t nmi_pending; uint8_t has_error_code; uint32_t sipi_vector; uint32_t cpuid_kvm_features; diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 72f9fdf..b2c5ee0 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -435,8 +435,6 @@ void kvm_arch_reset_vcpu(CPUState *env) { env->exception_injected = -1; env->interrupt_injected = -1; -env->nmi_injected = 0; -env->nmi_pending = 0; env->xcr0 = 1; if (kvm_irqchip_in_kernel()) { env->mp_state = cpu_is_bsp(env) ? KVM_MP_STATE_RUNNABLE : -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 13/18] kvm: Consolidate must-have capability checks
From: Jan Kiszka Instead of splattering the code with #ifdefs and runtime checks for capabilities we cannot work without anyway, provide central test infrastructure for verifying their availability both at build and runtime. Signed-off-by: Jan Kiszka --- configure | 39 -- kvm-all.c | 67 +--- kvm.h | 10 +++ target-i386/kvm.c | 39 ++ target-ppc/kvm.c |4 +++ target-s390x/kvm.c |4 +++ 6 files changed, 79 insertions(+), 84 deletions(-) diff --git a/configure b/configure index 9a02d1f..4673bf0 100755 --- a/configure +++ b/configure @@ -1662,18 +1662,31 @@ if test "$kvm" != "no" ; then #if !defined(KVM_API_VERSION) || KVM_API_VERSION < 12 || KVM_API_VERSION > 12 #error Invalid KVM version #endif -#if !defined(KVM_CAP_USER_MEMORY) -#error Missing KVM capability KVM_CAP_USER_MEMORY -#endif -#if !defined(KVM_CAP_SET_TSS_ADDR) -#error Missing KVM capability KVM_CAP_SET_TSS_ADDR -#endif -#if !defined(KVM_CAP_DESTROY_MEMORY_REGION_WORKS) -#error Missing KVM capability KVM_CAP_DESTROY_MEMORY_REGION_WORKS -#endif -#if !defined(KVM_CAP_USER_NMI) -#error Missing KVM capability KVM_CAP_USER_NMI +EOF +must_have_caps="KVM_CAP_USER_MEMORY \ +KVM_CAP_DESTROY_MEMORY_REGION_WORKS \ +KVM_CAP_COALESCED_MMIO \ +KVM_CAP_SYNC_MMU \ + " +if test \( "$cpu" = "i386" -o "$cpu" = "x86_64" \) ; then + must_have_caps="$caps \ + KVM_CAP_SET_TSS_ADDR \ + KVM_CAP_EXT_CPUID \ + KVM_CAP_CLOCKSOURCE \ + KVM_CAP_NOP_IO_DELAY \ + KVM_CAP_PV_MMU \ + KVM_CAP_MP_STATE \ + KVM_CAP_USER_NMI \ + " +fi +for c in $must_have_caps ; do + cat >> $TMPC <> $TMPC <1) printf(", "); printf("%s",$2);}'` if test "$kvmerr" != "" ; then echo -e "${kvmerr}\n\ - NOTE: To enable KVM support, update your kernel to 2.6.29+ or install \ - recent kvm-kmod from http://sourceforge.net/projects/kvm."; +NOTE: To enable KVM support, update your kernel to 2.6.29+ or install \ +recent kvm-kmod from http://sourceforge.net/projects/kvm."; fi fi feature_not_found "kvm" diff --git a/kvm-all.c b/kvm-all.c index 8053f92..3a1f63b 100644 --- 
a/kvm-all.c +++ b/kvm-all.c @@ -63,9 +63,7 @@ struct KVMState int fd; int vmfd; int coalesced_mmio; -#ifdef KVM_CAP_COALESCED_MMIO struct kvm_coalesced_mmio_ring *coalesced_mmio_ring; -#endif int broken_set_mem_region; int migration_log; int vcpu_events; @@ -82,6 +80,12 @@ struct KVMState static KVMState *kvm_state; +static const KVMCapabilityInfo kvm_required_capabilites[] = { +KVM_CAP_INFO(USER_MEMORY), +KVM_CAP_INFO(DESTROY_MEMORY_REGION_WORKS), +KVM_CAP_LAST_INFO +}; + static KVMSlot *kvm_alloc_slot(KVMState *s) { int i; @@ -227,12 +231,10 @@ int kvm_init_vcpu(CPUState *env) goto err; } -#ifdef KVM_CAP_COALESCED_MMIO if (s->coalesced_mmio && !s->coalesced_mmio_ring) { s->coalesced_mmio_ring = (void *)env->kvm_run + s->coalesced_mmio * PAGE_SIZE; } -#endif ret = kvm_arch_init_vcpu(env); if (ret == 0) { @@ -401,7 +403,6 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr, int kvm_coalesce_mmio_region(target_phys_addr_t start, ram_addr_t size) { int ret = -ENOSYS; -#ifdef KVM_CAP_COALESCED_MMIO KVMState *s = kvm_state; if (s->coalesced_mmio) { @@ -412,7 +413,6 @@ int kvm_coalesce_mmio_region(target_phys_addr_t start, ram_addr_t size) ret = kvm_vm_ioctl(s, KVM_REGISTER_COALESCED_MMIO, &zone); } -#endif return ret; } @@ -420,7 +420,6 @@ int kvm_coalesce_mmio_region(target_phys_addr_t start, ram_addr_t size) int kvm_uncoalesce_mmio_region(target_phys_addr_t start, ram_addr_t size) { int ret = -ENOSYS; -#ifdef KVM_CAP_COALESCED_MMIO KVMState *s = kvm_state; if (s->coalesced_mmio) { @@ -431,7 +430,6 @@ int kvm_uncoalesce_mmio_region(target_phys_addr_t start, ram_addr_t size) ret = kvm_vm_ioctl(s, KVM_UNREGISTER_COALESCED_MMIO, &zone); } -#endif return ret; } @@ -481,6 +479,18 @@ static int kvm_check_many_ioeventfds(void) #endif } +static const KVMCapabilityInfo * +kvm_check_extension_list(KVMState *s, const KVMCapabilityInfo *list) +{ +while (list->name) { +if (!kvm_check_extension(s, list->value)) { +return list; +} +list++; +} +return NULL; +} 
+ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size, ram_addr_t phys_offset) { @@ -642,6 +652,7 @@ int kvm_init(void) "Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n"
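The table-driven capability check added by this patch can be sketched in isolation. `CapInfo` mirrors the spirit of the `KVMCapabilityInfo` table (a NULL-named sentinel terminates the list), and `fake_kernel_has` is an invented stand-in for `kvm_check_extension()` so the sketch is self-contained.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *name;
    int value;
} CapInfo;

/* Stand-in for kvm_check_extension(): pretend capability 42 is missing. */
static bool fake_kernel_has(int cap)
{
    return cap != 42;
}

/* Walk the sentinel-terminated table and return the first capability the
 * kernel lacks, or NULL if all required capabilities are present. */
static const CapInfo *check_extension_list(const CapInfo *list)
{
    while (list->name) {
        if (!fake_kernel_has(list->value)) {
            return list;
        }
        list++;
    }
    return NULL;
}
```

Returning the failing entry (rather than a bare error code) lets the caller print the capability's name in the error message, which is what replaces the scattered `#ifdef`/`#error` checks in configure and the runtime checks in kvm-all.c.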
[PATCH 18/18] kvm: x86: Only read/write MSR_KVM_ASYNC_PF_EN if supported
From: Jan Kiszka If the kernel does not support KVM_CAP_ASYNC_PF, it also does not know about the related MSR. So skip it during state synchronization in that case. Fixes annoying kernel warnings. Signed-off-by: Jan Kiszka --- target-i386/kvm.c | 13 +++-- 1 files changed, 11 insertions(+), 2 deletions(-) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index b2c5ee0..8e8880a 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -63,6 +63,9 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = { static bool has_msr_star; static bool has_msr_hsave_pa; +#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ASYNC_PF) +static bool has_msr_async_pf_en; +#endif static int lm_capable_kernel; static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max) @@ -164,6 +167,7 @@ static int get_para_features(CPUState *env) features |= (1 << para_features[i].feature); } } +has_msr_async_pf_en = features & (1 << KVM_FEATURE_ASYNC_PF); return features; } #endif @@ -828,7 +832,10 @@ static int kvm_put_msrs(CPUState *env, int level) env->system_time_msr); kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr); #if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ASYNC_PF) -kvm_msr_entry_set(&msrs[n++], MSR_KVM_ASYNC_PF_EN, env->async_pf_en_msr); +if (has_msr_async_pf_en) { +kvm_msr_entry_set(&msrs[n++], MSR_KVM_ASYNC_PF_EN, + env->async_pf_en_msr); +} #endif } #ifdef KVM_CAP_MCE @@ -1064,7 +1071,9 @@ static int kvm_get_msrs(CPUState *env) msrs[n++].index = MSR_KVM_SYSTEM_TIME; msrs[n++].index = MSR_KVM_WALL_CLOCK; #if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ASYNC_PF) -msrs[n++].index = MSR_KVM_ASYNC_PF_EN; +if (has_msr_async_pf_en) { +msrs[n++].index = MSR_KVM_ASYNC_PF_EN; +} #endif #ifdef KVM_CAP_MCE -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 15/18] kvm: Flush coalesced mmio buffer on IO window exits
From: Jan Kiszka We must flush pending mmio writes if we leave kvm_cpu_exec for an IO window. Otherwise we risk losing those requests when migrating to a different host during that window. Signed-off-by: Jan Kiszka --- kvm-all.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 3a1f63b..9976762 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -918,6 +918,8 @@ int kvm_cpu_exec(CPUState *env) cpu_single_env = env; kvm_arch_post_run(env, run); +kvm_flush_coalesced_mmio_buffer(); + if (ret == -EINTR || ret == -EAGAIN) { cpu_exit(env); DPRINTF("io window exit\n"); @@ -930,8 +932,6 @@ int kvm_cpu_exec(CPUState *env) abort(); } -kvm_flush_coalesced_mmio_buffer(); - ret = 0; /* exit loop */ switch (run->exit_reason) { case KVM_EXIT_IO: -- 1.7.1
[PATCH 16/18] kvm: Do not use qemu_fair_mutex
From: Jan Kiszka The imbalance in the hold time of qemu_global_mutex only exists in TCG mode. In contrast to TCG VCPUs, KVM drops the global lock during guest execution. We already avoid touching the fairness lock from the IO-thread in KVM mode, so also stop using it from the VCPU threads. Signed-off-by: Jan Kiszka --- cpus.c |2 -- 1 files changed, 0 insertions(+), 2 deletions(-) diff --git a/cpus.c b/cpus.c index 0309189..4c9928e 100644 --- a/cpus.c +++ b/cpus.c @@ -735,9 +735,7 @@ static sigset_t block_io_signals(void) void qemu_mutex_lock_iothread(void) { if (kvm_enabled()) { -qemu_mutex_lock(&qemu_fair_mutex); qemu_mutex_lock(&qemu_global_mutex); -qemu_mutex_unlock(&qemu_fair_mutex); } else { qemu_mutex_lock(&qemu_fair_mutex); if (qemu_mutex_trylock(&qemu_global_mutex)) { -- 1.7.1
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On Fri, Jan 21, 2011 at 6:17 PM, Jan Kiszka wrote: > On 2011-01-21 19:04, Blue Swirl wrote: >> On Fri, Jan 21, 2011 at 5:21 PM, Jan Kiszka wrote: >>> On 2011-01-21 17:37, Blue Swirl wrote: On Fri, Jan 21, 2011 at 8:46 AM, Gerd Hoffmann wrote: > Hi, > >> By the way, we don't have a QEMUState but instead use globals. > > /me wants to underline this. > > IMO it is absolutely pointless to worry about ways to pass around > kvm_state. > There never ever will be a serious need for that. > > We can stick with the current model of keeping global state in global > variables. And just do the same with kvm_state. > > Or we can move to have all state in a QEMUState struct which we'll pass > around basically everywhere. Then we can simply embed or reference > kvm_state there. > > I'd tend to stick with the global variables as I don't see the point in > having a QEMUstate. I doubt we'll ever see two virtual machines driven > by a > single qemu process. YMMV. Global variables are signs of a poor design. >>> >>> s/are/can be/. >>> QEMUState would not help that, instead more specific structures should be designed, much like what I've proposed for KVMState. Some of these new structures should be even passed around when it makes sense. But I'd not start kvm_state redesign around global variables or QEMUState. >>> >>> We do not need to move individual fields yet, but we need to define >>> classes of fields and strategies how to deal with them long-term. Then >>> we can move forward, and that already in the right direction. >> >> Excellent plan. >> >>> Obvious classes are >>> - static host capabilities and means for the KVM core to query them >> >> OK. There could be other host capabilities here in the future too, >> like Xen. I don't think there are any Xen capabilities ATM though but >> IIRC some recently sent patches had something like those. >> >>> - per-VM fields >> >> What is per-VM which is not machine or CPU architecture specific? 
> > I think it would suffice for a first step to consider all per-VM fields > as independent of CPU architecture or machine type. I'm afraid that would not be progress. >>> - fields related to memory management >> >> OK. >> >> I'd add fourth possible class: >> - device, CPU and machine configuration, like nographic, >> win2k_install_hack, no_hpet, smp_cpus etc. Maybe also >> irqchip_in_kernel could fit here, though it obviously depends on a >> host capability too. > > I would count everything that cannot be assigned to a concrete device > upfront to the dynamic state of a machine, thus class 2. The point is, > (potentially) every device of that machine requires access to it, just > like (indirectly, via the KVM core services) to some KVM VM state bits. The machine class should not be a catch-all, it would be like QEMUState or KVMState then. Perhaps each field or variable should be listed and given more thought. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On 2011-01-21 19:04, Blue Swirl wrote:
> On Fri, Jan 21, 2011 at 5:21 PM, Jan Kiszka wrote:
>> On 2011-01-21 17:37, Blue Swirl wrote:
>>> On Fri, Jan 21, 2011 at 8:46 AM, Gerd Hoffmann wrote:
>>>> Hi,
>>>>
>>>>> By the way, we don't have a QEMUState but instead use globals.
>>>>
>>>> /me wants to underline this.
>>>>
>>>> IMO it is absolutely pointless to worry about ways to pass around
>>>> kvm_state. There never ever will be a serious need for that.
>>>>
>>>> We can stick with the current model of keeping global state in global
>>>> variables. And just do the same with kvm_state.
>>>>
>>>> Or we can move to have all state in a QEMUState struct which we'll pass
>>>> around basically everywhere. Then we can simply embed or reference
>>>> kvm_state there.
>>>>
>>>> I'd tend to stick with the global variables as I don't see the point in
>>>> having a QEMUstate. I doubt we'll ever see two virtual machines driven
>>>> by a single qemu process. YMMV.
>>>
>>> Global variables are signs of a poor design.
>>
>> s/are/can be/.
>>
>>> QEMUState would not help that, instead more specific structures should
>>> be designed, much like what I've proposed for KVMState. Some of these
>>> new structures should be even passed around when it makes sense.
>>>
>>> But I'd not start kvm_state redesign around global variables or QEMUState.
>>
>> We do not need to move individual fields yet, but we need to define
>> classes of fields and strategies how to deal with them long-term. Then
>> we can move forward, and that already in the right direction.
>
> Excellent plan.
>
>> Obvious classes are
>> - static host capabilities and means for the KVM core to query them
>
> OK. There could be other host capabilities here in the future too,
> like Xen. I don't think there are any Xen capabilities ATM though but
> IIRC some recently sent patches had something like those.
>
>> - per-VM fields
>
> What is per-VM which is not machine or CPU architecture specific?

I think it would suffice for a first step to consider all per-VM fields
as independent of CPU architecture or machine type.
>> - fields related to memory management
>
> OK.
>
> I'd add fourth possible class:
> - device, CPU and machine configuration, like nographic,
>   win2k_install_hack, no_hpet, smp_cpus etc. Maybe also
>   irqchip_in_kernel could fit here, though it obviously depends on a
>   host capability too.

I would count everything that cannot be assigned to a concrete device
upfront to the dynamic state of a machine, thus class 2. The point is,
(potentially) every device of that machine requires access to it, just
like (indirectly, via the KVM core services) to some KVM VM state bits.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: Flow Control and Port Mirroring Revisited
>> I have constructed a test where I run an un-paced UDP_STREAM test in
>> one guest and a paced omni rr test in another guest at the same time.
>
> Hmm, what is this supposed to measure? Basically each time you run an
> un-paced UDP_STREAM you get some random load on the network.

Well, if the netperf is (effectively) pinned to a given CPU, presumably
it would be trying to generate UDP datagrams at the same rate each time.
Indeed though, no guarantee that rate would consistently get through each
time. But then, that is where one can use the confidence intervals options
to get an idea by how much the rate varied.

rick jones
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On Fri, Jan 21, 2011 at 5:21 PM, Jan Kiszka wrote:
> On 2011-01-21 17:37, Blue Swirl wrote:
>> On Fri, Jan 21, 2011 at 8:46 AM, Gerd Hoffmann wrote:
>>> Hi,
>>>
>>>> By the way, we don't have a QEMUState but instead use globals.
>>>
>>> /me wants to underline this.
>>>
>>> IMO it is absolutely pointless to worry about ways to pass around
>>> kvm_state. There never ever will be a serious need for that.
>>>
>>> We can stick with the current model of keeping global state in global
>>> variables. And just do the same with kvm_state.
>>>
>>> Or we can move to have all state in a QEMUState struct which we'll pass
>>> around basically everywhere. Then we can simply embed or reference
>>> kvm_state there.
>>>
>>> I'd tend to stick with the global variables as I don't see the point in
>>> having a QEMUstate. I doubt we'll ever see two virtual machines driven
>>> by a single qemu process. YMMV.
>>
>> Global variables are signs of a poor design.
>
> s/are/can be/.
>
>> QEMUState would not help that, instead more specific structures should
>> be designed, much like what I've proposed for KVMState. Some of these
>> new structures should be even passed around when it makes sense.
>>
>> But I'd not start kvm_state redesign around global variables or QEMUState.
>
> We do not need to move individual fields yet, but we need to define
> classes of fields and strategies how to deal with them long-term. Then
> we can move forward, and that already in the right direction.

Excellent plan.

> Obvious classes are
> - static host capabilities and means for the KVM core to query them

OK. There could be other host capabilities here in the future too,
like Xen. I don't think there are any Xen capabilities ATM though but
IIRC some recently sent patches had something like those.

> - per-VM fields

What is per-VM which is not machine or CPU architecture specific?

> - fields related to memory management

OK.
I'd add fourth possible class:
- device, CPU and machine configuration, like nographic,
  win2k_install_hack, no_hpet, smp_cpus etc. Maybe also
  irqchip_in_kernel could fit here, though it obviously depends on a
  host capability too.
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On 2011-01-21 17:37, Blue Swirl wrote:
> On Fri, Jan 21, 2011 at 8:46 AM, Gerd Hoffmann wrote:
>> Hi,
>>
>>> By the way, we don't have a QEMUState but instead use globals.
>>
>> /me wants to underline this.
>>
>> IMO it is absolutely pointless to worry about ways to pass around
>> kvm_state. There never ever will be a serious need for that.
>>
>> We can stick with the current model of keeping global state in global
>> variables. And just do the same with kvm_state.
>>
>> Or we can move to have all state in a QEMUState struct which we'll pass
>> around basically everywhere. Then we can simply embed or reference
>> kvm_state there.
>>
>> I'd tend to stick with the global variables as I don't see the point in
>> having a QEMUstate. I doubt we'll ever see two virtual machines driven
>> by a single qemu process. YMMV.
>
> Global variables are signs of a poor design.

s/are/can be/.

> QEMUState would not help
> that, instead more specific structures should be designed, much like
> what I've proposed for KVMState. Some of these new structures should
> be even passed around when it makes sense.
>
> But I'd not start kvm_state redesign around global variables or QEMUState.

We do not need to move individual fields yet, but we need to define
classes of fields and strategies how to deal with them long-term. Then
we can move forward, and that already in the right direction.

Obvious classes are
- static host capabilities and means for the KVM core to query them
- per-VM fields
- fields related to memory management

And we now need at least a plan for the second class to proceed with
the actual job.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
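To make the proposed split concrete, here is a rough model of the three field classes as separate structures. This is purely illustrative: none of these type or field names exist in QEMU, and the handful of capability fields are invented examples; the sketch only shows "host caps / per-VM / memory" kept apart rather than lumped into one global KVMState.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class KVMHostCaps:
    """Class 1: static host capabilities, queried once and then read-only."""
    max_vcpus: int
    has_irqfd: bool
    has_pit_state2: bool

@dataclass
class KVMVMState:
    """Class 2: per-VM fields (VM fd, in-kernel irqchip flag, ...)."""
    vm_fd: int = -1
    irqchip_in_kernel: bool = False

@dataclass
class KVMMemoryState:
    """Class 3: fields related to memory management (slot bookkeeping)."""
    slots: dict = field(default_factory=dict)  # slot id -> (gpa, size)

# Devices would receive only the structure they actually need, instead
# of reaching into one global blob:
caps = KVMHostCaps(max_vcpus=64, has_irqfd=True, has_pit_state2=True)
vm = KVMVMState(vm_fd=7, irqchip_in_kernel=True)
mem = KVMMemoryState()
mem.slots[0] = (0x0, 512 << 20)  # one 512 MiB slot at GPA 0
```

The frozen host-caps structure mirrors the "static, query-only" nature of class 1, while classes 2 and 3 stay mutable.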
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On Fri, Jan 21, 2011 at 8:46 AM, Gerd Hoffmann wrote:
> Hi,
>
>> By the way, we don't have a QEMUState but instead use globals.
>
> /me wants to underline this.
>
> IMO it is absolutely pointless to worry about ways to pass around
> kvm_state. There never ever will be a serious need for that.
>
> We can stick with the current model of keeping global state in global
> variables. And just do the same with kvm_state.
>
> Or we can move to have all state in a QEMUState struct which we'll pass
> around basically everywhere. Then we can simply embed or reference
> kvm_state there.
>
> I'd tend to stick with the global variables as I don't see the point in
> having a QEMUstate. I doubt we'll ever see two virtual machines driven
> by a single qemu process. YMMV.

Global variables are signs of a poor design. QEMUState would not help
that, instead more specific structures should be designed, much like
what I've proposed for KVMState. Some of these new structures should
be even passed around when it makes sense.

But I'd not start kvm_state redesign around global variables or QEMUState.
Re: [REPOST] [PATCH 3/3] Provide control over unmapped pages (v3)
On Fri, 21 Jan 2011, Balbir Singh wrote:

> * Christoph Lameter [2011-01-20 09:00:09]:
>
>> On Thu, 20 Jan 2011, Balbir Singh wrote:
>>
>>> + unmapped_page_control
>>> + [KNL] Available if CONFIG_UNMAPPED_PAGECACHE_CONTROL
>>> + is enabled. It controls the amount of unmapped memory
>>> + that is present in the system. This boot option plus
>>> + vm.min_unmapped_ratio (sysctl) provide granular control
>>
>> min_unmapped_ratio is there to guarantee that zone reclaim does not
>> reclaim all unmapped pages.
>>
>> What you want here is a max_unmapped_ratio.
>
> I thought about that, the logic for reusing min_unmapped_ratio was to
> keep a limit beyond which unmapped page cache shrinking should stop.

Right. That is the role of it. It's a minimum to leave. You want a
maximum size of the page cache.

> I think you are suggesting max_unmapped_ratio as the point at which
> shrinking should begin, right?

The role of min_unmapped_ratio is to never reclaim more pagecache if we
reach that ratio, even if we have to go off node for an allocation.
AFAICT what you propose is a maximum size of the page cache. If the
number of page cache pages goes beyond that then you trim the page cache
in background reclaim.

>>> + reclaim_unmapped_pages(priority, zone, &sc);
>>> +
>>> if (!zone_watermark_ok_safe(zone, order,
>>
>> Okay that means background reclaim does it. If so then we also want
>> zone reclaim to be able to work in the background I think.
>
> Anything specific you had in mind, works for me in testing, but is
> there anything specific that stands out in your mind that needs to be
> done?

Hmmm. So this would also work in a NUMA configuration, right. Limiting
the size of the page cache would avoid zone reclaim through these
limits. Page cache size would be limited by the max_unmapped_ratio.
zone_reclaim only would come into play if other allocations make the
memory on the node so tight that we would have to evict more page cache
pages in direct reclaim. Then zone_reclaim could go down to shrink the
page cache size to min_unmapped_ratio.
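The min/max distinction being discussed can be put in a little model. This is illustrative pseudo-logic, not kernel code: the ratios are expressed as percentages of a zone's pages, min_unmapped_ratio is the floor that reclaim never goes below, and the proposed max_unmapped_ratio is the cap above which background reclaim starts trimming.

```python
def unmapped_pages_to_reclaim(unmapped, zone_pages,
                              min_unmapped_ratio=1, max_unmapped_ratio=16):
    """Return how many unmapped page-cache pages to trim (illustrative).

    min_unmapped_ratio: percent of the zone that unmapped page cache may
    always keep -- reclaim never shrinks below this floor.
    max_unmapped_ratio: percent above which background reclaim kicks in
    and trims the cache back down to the cap.
    """
    min_pages = zone_pages * min_unmapped_ratio // 100
    max_pages = zone_pages * max_unmapped_ratio // 100
    if unmapped <= max_pages:
        return 0  # under the cap: background reclaim has nothing to do
    # Trim back to the cap, but never below the guaranteed minimum.
    return unmapped - max(max_pages, min_pages)

# A 100000-page zone holding 20000 unmapped pages with a 16% cap would
# have 4000 pages trimmed by background reclaim.
pages = unmapped_pages_to_reclaim(20000, 100000)
```

Direct zone_reclaim, as the message above notes, could still go further, down to the min_pages floor, when node memory is tight.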
Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
On 01/21/2011 09:02 AM, Srivatsa Vaddagiri wrote:
> On Thu, Jan 20, 2011 at 09:56:27AM -0800, Jeremy Fitzhardinge wrote:
>>> The key here is not to sleep when waiting for locks (as implemented by
>>> current patch-series, which can put other VMs at an advantage by giving
>>> them more time than they are entitled to)
>>
>> Why? If a VCPU can't make progress because its waiting for some
>> resource, then why not schedule something else instead?
>
> In the process, "something else" can get more share of cpu resource than
> its entitled to and that's where I was bit concerned. I guess one could
> employ hard-limits to cap "something else's" bandwidth where it is of
> real concern (like clouds).

I'd like to think I fixed those things in my yield_task_fair +
yield_to + kvm_vcpu_on_spin patch series from yesterday.

https://lkml.org/lkml/2011/1/20/403

--
All rights reversed
Re: [Qemu-devel] [PATCH] vhost: force vhost off for non-MSI guests
On 01/21/2011 03:55 AM, Michael S. Tsirkin wrote:
> On Thu, Jan 20, 2011 at 06:35:46PM -0700, Alex Williamson wrote:
>> On Thu, 2011-01-20 at 18:23 -0600, Anthony Liguori wrote:
>>> On 01/20/2011 10:07 AM, Michael S. Tsirkin wrote:
>>>> On Thu, Jan 20, 2011 at 09:43:57AM -0600, Anthony Liguori wrote:
>>>>> On 01/20/2011 09:35 AM, Michael S. Tsirkin wrote:
>>>>>> When MSI is off, each interrupt needs to be bounced through the io
>>>>>> thread when it's set/cleared, so vhost-net causes more context
>>>>>> switches and higher CPU utilization than userspace virtio which
>>>>>> handles networking in the same thread.
>>>>>>
>>>>>> We'll need to fix this by adding level irq support in kvm irqfd,
>>>>>> for now disable vhost-net in these configurations.
>>>>>>
>>>>>> Signed-off-by: Michael S. Tsirkin
>>>>>
>>>>> I actually think this should be a terminal error. The user asks for
>>>>> vhost-net, if we cannot enable it, we should exit.
>>>>>
>>>>> Or we should warn the user that they should expect bad performance.
>>>>> Silently doing something that the user has explicitly asked us not
>>>>> to do is not a good behavior.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Anthony Liguori
>>>>
>>>> The issue is that user has no control of the guest, and can not know
>>>> whether the guest enables MSI. So what you ask for will just make some
>>>> guests fail, and others fail sometimes. The user also has no way to
>>>> know that version X of kvm does not expose a way to inject level
>>>> interrupts with irqfd.
>>>>
>>>> We could have *another* flag that says "use vhost where it helps" but
>>>> then I think this is what everyone wants to do, anyway, and libvirt
>>>> already sets vhost=on so I prefer redefining the meaning of an
>>>> existing flag.
>>>
>>> In the very least, there needs to be a vhost=force.
>>>
>>> Having some sort of friendly default policy is fine but we need to
>>> provide a mechanism for a user to have the final say. If you want to
>>> redefine vhost=on to really mean, use the friendly default, that's fine
>>> by me, but only if the vhost=force option exists.
>>>
>>> I actually would think libvirt would want to use vhost=force.
>>> Debugging with vhost=on is going to be a royal pain in the ass if a
>>> user reports bad performance. Given the libvirt XML, you can't actually
>>> tell from the guest and the XML whether or not vhost was actually in
>>> use or not.
>>
>> If we add a force option, let's please distinguish hotplug from VM
>> creation time. The latter can abort. Hotplug should print an error and
>> fail the initfn.
>
> It can't abort at init - MSI is disabled at init, it needs to be enabled
> by the guest later. And aborting the guest in the middle of the run is a
> very bad idea.

What vhostforce=true will do is force vhost backend to be used even if
it is slower:

vhost=on,vhostforce=false   use vhost if we think it will improve performance
vhost=on,vhostforce=true    always use vhost
vhost=off,vhostforce=*      do not use vhost

Regards,

Anthony Liguori

> Thanks,
> Alex
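The vhost/vhostforce semantics proposed above can be captured as a tiny decision function. This is a model of the proposal for clarity, not QEMU code; the `guest_uses_msix` input stands in for "vhost will actually help here", since without MSI-X each interrupt bounces through the io thread and vhost is slower.

```python
def use_vhost(vhost_on: bool, vhost_force: bool, guest_uses_msix: bool) -> bool:
    """Model of the proposed vhost/vhostforce decision table."""
    if not vhost_on:
        return False            # vhost=off: never use vhost
    if vhost_force:
        return True             # vhostforce=true: always use vhost, even if slower
    # vhost=on,vhostforce=false: use vhost only where it helps, i.e. when
    # the guest has MSI-X enabled (level irqs can't go through irqfd yet)
    return guest_uses_msix
```

Note how this encodes the debugging concern raised earlier: with `vhost_on=True, vhost_force=False`, whether vhost is actually in use depends on guest behavior, not just on the configuration.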
Re: [Qemu-devel] [PATCH] vhost: force vhost off for non-MSI guests
On 01/21/2011 03:48 AM, Michael S. Tsirkin wrote:
> On Thu, Jan 20, 2011 at 06:23:36PM -0600, Anthony Liguori wrote:
>> On 01/20/2011 10:07 AM, Michael S. Tsirkin wrote:
>>> On Thu, Jan 20, 2011 at 09:43:57AM -0600, Anthony Liguori wrote:
>>>> On 01/20/2011 09:35 AM, Michael S. Tsirkin wrote:
>>>>> When MSI is off, each interrupt needs to be bounced through the io
>>>>> thread when it's set/cleared, so vhost-net causes more context
>>>>> switches and higher CPU utilization than userspace virtio which
>>>>> handles networking in the same thread.
>>>>>
>>>>> We'll need to fix this by adding level irq support in kvm irqfd,
>>>>> for now disable vhost-net in these configurations.
>>>>>
>>>>> Signed-off-by: Michael S. Tsirkin
>>>>
>>>> I actually think this should be a terminal error. The user asks for
>>>> vhost-net, if we cannot enable it, we should exit.
>>>>
>>>> Or we should warn the user that they should expect bad performance.
>>>> Silently doing something that the user has explicitly asked us not
>>>> to do is not a good behavior.
>>>>
>>>> Regards,
>>>>
>>>> Anthony Liguori
>>>
>>> The issue is that user has no control of the guest, and can not know
>>> whether the guest enables MSI. So what you ask for will just make some
>>> guests fail, and others fail sometimes. The user also has no way to
>>> know that version X of kvm does not expose a way to inject level
>>> interrupts with irqfd.
>>>
>>> We could have *another* flag that says "use vhost where it helps" but
>>> then I think this is what everyone wants to do, anyway, and libvirt
>>> already sets vhost=on so I prefer redefining the meaning of an
>>> existing flag.
>>
>> In the very least, there needs to be a vhost=force.
>>
>> Having some sort of friendly default policy is fine but we need to
>> provide a mechanism for a user to have the final say. If you want to
>> redefine vhost=on to really mean, use the friendly default, that's fine
>> by me, but only if the vhost=force option exists.
>
> OK, I will add that, probably as a separate flag as vhost is a boolean.
> This will get worse performance but it will be what the user asked for.
>
>> I actually would think libvirt would want to use vhost=force.
>> Debugging with vhost=on is going to be a royal pain in the ass if a
>> user reports bad performance. Given the libvirt XML, you can't actually
>> tell from the guest and the XML whether or not vhost was actually in
>> use or not.
>
> Yes you can: check MSI enabled in the guest, if it is - check vhost
> enabled in the XML. Not that bad at all, is it?

Until you automatically detect level triggered interrupt support for
irqfd. This means it's also dependent on a kernel feature too. Is there
any way to tell in QEMU that vhost was silently disabled?

Regards,

Anthony Liguori

>> Regards,
>>
>> Anthony Liguori
>
> We get worse performance without MSI anyway, how is this different?
> Maybe this is best handled by a documentation update? We always said:
> "use vhost=on to enable experimental in kernel accelerator\n"
> note 'enable' not 'require'. This is similar to how we specify nvectors:
> you can not make guest use the feature.
>
> How about this:
>
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 898561d..3c937c1 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -1061,6 +1061,7 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
>      "use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap flag\n"
>      "use vnet_hdr=on to make the lack of IFF_VNET_HDR support an error condition\n"
>      "use vhost=on to enable experimental in kernel accelerator\n"
> +    "(note: vhost=on has no effect unless guest uses MSI-X)\n"
>      "use 'vhostfd=h' to connect to an already opened vhost net device\n"
>  #endif
>      "-net socket[,vlan=n][,name=str][,fd=h][,listen=[host]:port][,connect=host:port]\n"
Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
On Thu, Jan 20, 2011 at 09:56:27AM -0800, Jeremy Fitzhardinge wrote:
> > The key here is not to sleep when waiting for locks (as implemented by
> > current patch-series, which can put other VMs at an advantage by giving
> > them more time than they are entitled to)
>
> Why? If a VCPU can't make progress because its waiting for some
> resource, then why not schedule something else instead?

In the process, "something else" can get more share of cpu resource than
it's entitled to and that's where I was a bit concerned. I guess one
could employ hard-limits to cap "something else's" bandwidth where it is
of real concern (like clouds).

> Presumably when the VCPU does become runnable, the scheduler will credit
> its previous blocked state and let it run in preference to something else.

which may not be sufficient for it to gain back bandwidth lost while
blocked (speaking of the mainline scheduler at least).

> > Is there a way we can dynamically expand the size of lock only upon
> > contention to include additional information like owning vcpu? Have the
> > lock point to a per-cpu area upon contention where additional details
> > can be stored perhaps?
>
> As soon as you add a pointer to the lock, you're increasing its size.

I didn't really mean to expand size statically. Rather have some bits of
the lock word store a pointer to a per-cpu area when there is contention
(somewhat similar to how bits of rt_mutex.owner are used). I haven't
thought through this in detail to see if that is possible though.

- vatsa
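The "steal bits from the lock word on contention" idea can be sketched roughly as follows. Everything here is invented for illustration: the bit layout, the choice to store a vcpu id rather than a real pointer, and the assumption that the ticket counters leave the upper bits free, which is exactly the open question in the message above.

```python
# Hypothetical 64-bit lock-word layout (not any real kernel layout):
CONTENDED = 1 << 63              # set once a waiter shows up
VCPU_SHIFT = 48                  # owning-vcpu hint lives in bits 48-62
VCPU_MASK = 0x7FFF

def mark_contended(lock_word: int, owner_vcpu: int) -> int:
    """On first contention, record the owning vcpu id in the upper bits."""
    assert lock_word < (1 << VCPU_SHIFT)  # ticket counters must fit below
    return lock_word | CONTENDED | ((owner_vcpu & VCPU_MASK) << VCPU_SHIFT)

def owner_hint(lock_word: int):
    """Return the owning-vcpu hint, or None if the lock is uncontended."""
    if not (lock_word & CONTENDED):
        return None
    return (lock_word >> VCPU_SHIFT) & VCPU_MASK

# Fast path stays a plain ticket word; the hint appears only on contention.
w = mark_contended(0x00010001, owner_vcpu=5)  # head/tail tickets + hint
```

A hypervisor-side yield-to (or PI) mechanism could then consult the hint to boost the lock holder, while uncontended acquire/release never pays for the extra state.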
Re: EPT: Misconfiguration
On Thu, Jan 20, 2011 at 12:48:00PM +0100, Ruben Kerkhof wrote:
> I'm suddenly getting lots of the following errors on a server running
> 2.36.7, but I have no idea what it means:
>
> 2011-01-20T12:41:18.358603+01:00 phy005 kernel: EPT: Misconfiguration.
> 2011-01-20T12:41:18.358621+01:00 phy005 kernel: EPT: GPA: 0x3dbff6b0
> 2011-01-20T12:41:18.358624+01:00 phy005 kernel: ept_misconfig_inspect_spte: spte 0x50743e007 level 4
> 2011-01-20T12:41:18.358627+01:00 phy005 kernel: ept_misconfig_inspect_spte: spte 0x523de2007 level 3
> 2011-01-20T12:41:18.358629+01:00 phy005 kernel: ept_misconfig_inspect_spte: spte 0x62336f007 level 2
> 2011-01-20T12:41:18.360109+01:00 phy005 kernel: ept_misconfig_inspect_spte: spte 0x1603a0730500d277 level 1
> 2011-01-20T12:41:18.360137+01:00 phy005 kernel: ept_misconfig_inspect_spte: rsvd_bits = 0x3a000
> 2011-01-20T12:41:18.360151+01:00 phy005 kernel: [ cut here ]

A shadow pagetable entry in memory has bits 45-49 set, which is not
allowed. It's probably bad memory if these errors were not present
before with the same workload and host software. Would be useful to see
what memtest86 says.
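The diagnosis can be checked against the logged values: bits of an EPT entry above the host's supported physical-address width must be zero, and extracting bits 45-49 of the level-1 spte from the log reproduces the pattern in the rsvd_bits line. This is a quick arithmetic check, not the kernel's actual inspection code.

```python
spte = 0x1603A0730500D277  # level-1 SPTE from the log above

# Pull out the offending reserved-bit region (bits 45-49 of the entry):
bad = (spte >> 45) & 0x1F
# bad == 0b11101, i.e. bits 45, 47, 48 and 49 of the entry are set,
# which is what makes this an EPT misconfiguration.

# The same pattern, viewed as an offset inside the high 32-bit word of
# the entry (bits 13-17 there), matches the logged "rsvd_bits = 0x3a000":
rsvd_bits = (spte >> 32) & 0x3E000
```

Note how the healthy level 2-4 sptes in the log (e.g. `0x62336f007`) have nothing set in that region, which is consistent with a single flipped-bits corruption of the leaf entry, hence the bad-memory suspicion.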
Re: [Qemu-devel] [PATCH] vhost: force vhost off for non-MSI guests
On Fri, Jan 21, 2011 at 06:19:13AM -0700, Alex Williamson wrote:
> On Fri, 2011-01-21 at 11:55 +0200, Michael S. Tsirkin wrote:
> > On Thu, Jan 20, 2011 at 06:35:46PM -0700, Alex Williamson wrote:
> > > On Thu, 2011-01-20 at 18:23 -0600, Anthony Liguori wrote:
> > > > On 01/20/2011 10:07 AM, Michael S. Tsirkin wrote:
> > > > > On Thu, Jan 20, 2011 at 09:43:57AM -0600, Anthony Liguori wrote:
> > > > >> On 01/20/2011 09:35 AM, Michael S. Tsirkin wrote:
> > > > >>> When MSI is off, each interrupt needs to be bounced through the io
> > > > >>> thread when it's set/cleared, so vhost-net causes more context
> > > > >>> switches and higher CPU utilization than userspace virtio which
> > > > >>> handles networking in the same thread.
> > > > >>>
> > > > >>> We'll need to fix this by adding level irq support in kvm irqfd,
> > > > >>> for now disable vhost-net in these configurations.
> > > > >>>
> > > > >>> Signed-off-by: Michael S. Tsirkin
> > > > >>
> > > > >> I actually think this should be a terminal error. The user asks for
> > > > >> vhost-net, if we cannot enable it, we should exit.
> > > > >>
> > > > >> Or we should warn the user that they should expect bad performance.
> > > > >> Silently doing something that the user has explicitly asked us not
> > > > >> to do is not a good behavior.
> > > > >>
> > > > >> Regards,
> > > > >>
> > > > >> Anthony Liguori
> > > > >
> > > > > The issue is that user has no control of the guest, and can not know
> > > > > whether the guest enables MSI. So what you ask for will just make
> > > > > some guests fail, and others fail sometimes.
> > > > > The user also has no way to know that version X of kvm does not
> > > > > expose a way to inject level interrupts with irqfd.
> > > > >
> > > > > We could have *another* flag that says "use vhost where it helps" but
> > > > > then I think this is what everyone wants to do, anyway, and libvirt
> > > > > already sets vhost=on so I prefer redefining the meaning of an
> > > > > existing flag.
> > > >
> > > > In the very least, there needs to be a vhost=force.
> > > >
> > > > Having some sort of friendly default policy is fine but we need to
> > > > provide a mechanism for a user to have the final say. If you want to
> > > > redefine vhost=on to really mean, use the friendly default, that's fine
> > > > by me, but only if the vhost=force option exists.
> > > >
> > > > I actually would think libvirt would want to use vhost=force. Debugging
> > > > with vhost=on is going to be a royal pain in the ass if a user reports
> > > > bad performance. Given the libvirt XML, you can't actually tell from
> > > > the guest and the XML whether or not vhost was actually in use or not.
> > >
> > > If we add a force option, let's please distinguish hotplug from VM
> > > creation time. The latter can abort. Hotplug should print an error and
> > > fail the initfn.
> >
> > It can't abort at init - MSI is disabled at init, it needs to be enabled
> > by the guest later. And aborting the guest in the middle of the run
> > is a very bad idea.
>
> Yeah, I was thinking about the ordering of device being added vs guest
> enabling MSI this morning. Waiting until the guest decides to try to
> start using the device to NAK it with an abort is very undesirable.
> What if when we have vhost=on,force, the device doesn't advertise an
> INTx (PCI_INTERRUPT_PIN = 0)?
>
> Alex

Then we break backward compatibility with old guests. I don't see what
the issue is really: It is trivial to check that the guest uses MSIX.

--
MST
Re: [Qemu-devel] [PATCH] vhost: force vhost off for non-MSI guests
On Fri, 2011-01-21 at 11:55 +0200, Michael S. Tsirkin wrote:
> On Thu, Jan 20, 2011 at 06:35:46PM -0700, Alex Williamson wrote:
> > On Thu, 2011-01-20 at 18:23 -0600, Anthony Liguori wrote:
> > > On 01/20/2011 10:07 AM, Michael S. Tsirkin wrote:
> > > > On Thu, Jan 20, 2011 at 09:43:57AM -0600, Anthony Liguori wrote:
> > > >> On 01/20/2011 09:35 AM, Michael S. Tsirkin wrote:
> > > >>> When MSI is off, each interrupt needs to be bounced through the io
> > > >>> thread when it's set/cleared, so vhost-net causes more context
> > > >>> switches and higher CPU utilization than userspace virtio which
> > > >>> handles networking in the same thread.
> > > >>>
> > > >>> We'll need to fix this by adding level irq support in kvm irqfd,
> > > >>> for now disable vhost-net in these configurations.
> > > >>>
> > > >>> Signed-off-by: Michael S. Tsirkin
> > > >>
> > > >> I actually think this should be a terminal error. The user asks for
> > > >> vhost-net, if we cannot enable it, we should exit.
> > > >>
> > > >> Or we should warn the user that they should expect bad performance.
> > > >> Silently doing something that the user has explicitly asked us not
> > > >> to do is not a good behavior.
> > > >>
> > > >> Regards,
> > > >>
> > > >> Anthony Liguori
> > > >
> > > > The issue is that user has no control of the guest, and can not know
> > > > whether the guest enables MSI. So what you ask for will just make
> > > > some guests fail, and others fail sometimes.
> > > > The user also has no way to know that version X of kvm does not
> > > > expose a way to inject level interrupts with irqfd.
> > > >
> > > > We could have *another* flag that says "use vhost where it helps" but
> > > > then I think this is what everyone wants to do, anyway, and libvirt
> > > > already sets vhost=on so I prefer redefining the meaning of an
> > > > existing flag.
> > >
> > > In the very least, there needs to be a vhost=force.
> > >
> > > Having some sort of friendly default policy is fine but we need to
> > > provide a mechanism for a user to have the final say. If you want to
> > > redefine vhost=on to really mean, use the friendly default, that's fine
> > > by me, but only if the vhost=force option exists.
> > >
> > > I actually would think libvirt would want to use vhost=force. Debugging
> > > with vhost=on is going to be a royal pain in the ass if a user reports
> > > bad performance. Given the libvirt XML, you can't actually tell from
> > > the guest and the XML whether or not vhost was actually in use or not.
> >
> > If we add a force option, let's please distinguish hotplug from VM
> > creation time. The latter can abort. Hotplug should print an error and
> > fail the initfn.
>
> It can't abort at init - MSI is disabled at init, it needs to be enabled
> by the guest later. And aborting the guest in the middle of the run
> is a very bad idea.

Yeah, I was thinking about the ordering of device being added vs guest
enabling MSI this morning. Waiting until the guest decides to try to
start using the device to NAK it with an abort is very undesirable.
What if when we have vhost=on,force, the device doesn't advertise an
INTx (PCI_INTERRUPT_PIN = 0)?

Alex
Re: Flow Control and Port Mirroring Revisited
On Thu, Jan 20, 2011 at 05:38:33PM +0900, Simon Horman wrote:
> [ Trimmed Eric from CC list as vger was complaining that it is too long ]
>
> On Tue, Jan 18, 2011 at 11:41:22AM -0800, Rick Jones wrote:
> > > So it won't be all that simple to implement well, and before we try,
> > > I'd like to know whether there are applications that are helped
> > > by it. For example, we could try to measure latency at various
> > > pps and see whether the backpressure helps. netperf has -b, -w
> > > flags which might help these measurements.
> >
> > Those options are enabled when one adds --enable-burst to the
> > pre-compilation ./configure of netperf (one doesn't have to
> > recompile netserver). However, if one is also looking at latency
> > statistics via the -j option in the top-of-trunk, or simply at the
> > histogram with --enable-histogram on the ./configure and a verbosity
> > level of 2 (global -v 2) then one wants the very top of trunk
> > netperf from:
>
> Hi,
>
> I have constructed a test where I run an un-paced UDP_STREAM test in
> one guest and a paced omni rr test in another guest at the same time.

Hmm, what is this supposed to measure? Basically each time you run an
un-paced UDP_STREAM you get some random load on the network. You can't
tell what it was exactly, only that it was between the send and receive
throughput.

> Briefly I get the following results from the omni test..
>
> 1. Omni test only:        MEAN_LATENCY=272.00
> 2. Omni and stream test:  MEAN_LATENCY=3423.00
> 3. cpu and net_cls group: MEAN_LATENCY=493.00
>    As per 2 plus cgroups are created for each guest
>    and guest tasks added to the groups
> 4. 100Mbit/s class:       MEAN_LATENCY=273.00
>    As per 3 plus the net_cls groups each have a 100Mbit/s HTB class
> 5. cpu.shares=128:        MEAN_LATENCY=652.00
>    As per 4 plus the cpu groups have cpu.shares set to 128
> 6.
Busy CPUS: MEAN_LATENCY=15126.00 >As per 5 but the CPUs are made busy using a simple shell while loop > > There is a bit of noise in the results as the two netperf invocations > aren't started at exactly the same moment > > For reference, my netperf invocations are: > netperf -c -C -t UDP_STREAM -H 172.17.60.216 -l 12 > netperf.omni -p 12866 -D -c -C -H 172.17.60.216 -t omni -j -v 2 -- -r 1 -d rr > -k foo -b 1 -w 200 -m 200 > > foo contains > PROTOCOL > THROUGHPUT,THROUGHPUT_UNITS > LOCAL_SEND_THROUGHPUT > LOCAL_RECV_THROUGHPUT > REMOTE_SEND_THROUGHPUT > REMOTE_RECV_THROUGHPUT > RT_LATENCY,MIN_LATENCY,MEAN_LATENCY,MAX_LATENCY > P50_LATENCY,P90_LATENCY,P99_LATENCY,STDDEV_LATENCY > LOCAL_CPU_UTIL,REMOTE_CPU_UTIL -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
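The resource-control steps behind results 3-5 can be sketched roughly as below. This is a dry-run sketch, not Simon's actual setup: the cgroup mount point, interface name, group name, and class IDs are all assumptions.

```shell
# Dry-run sketch of the cgroup/HTB setup in results 3-5; commands are
# echoed, not executed. Set RUN= (empty) to run for real, as root.
RUN=echo
DEV=eth0            # assumed host-side interface carrying guest traffic
CG=/sys/fs/cgroup   # assumed cgroup mount point
RATE=100mbit        # the 100Mbit/s HTB class from result 4
SHARES=128          # the cpu.shares value from result 5

# Result 3: a net_cls and cpu group per guest (guest tasks get added to
# them); classid 0x10001 maps to tc class 1:1 below.
$RUN mkdir -p $CG/net_cls/guest1 $CG/cpu/guest1
$RUN sh -c "echo 0x10001 > $CG/net_cls/guest1/net_cls.classid"

# Result 4: HTB class matching the net_cls classid, capped at $RATE
$RUN tc qdisc add dev $DEV root handle 1: htb
$RUN tc class add dev $DEV parent 1: classid 1:1 htb rate $RATE

# Result 5: reduce the guests' CPU weight
$RUN sh -c "echo $SHARES > $CG/cpu/guest1/cpu.shares"
```

With RUN=echo the script only prints the commands, which makes it safe to paste and adapt before running it for real.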
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
Gerd Hoffmann writes:
>   Hi,
>
>> By the way, we don't have a QEMUState but instead use globals.
>
> /me wants to underline this.
>
> IMO it is absolutely pointless to worry about ways to pass around
> kvm_state. There never ever will be a serious need for that.
>
> We can stick with the current model of keeping global state in global
> variables. And just do the same with kvm_state.
>
> Or we can move to have all state in a QEMUState struct which we'll
> pass around basically everywhere. Then we can simply embed or
> reference kvm_state there.
>
> I'd tend to stick with the global variables as I don't see the point
> in having a QEMUState. I doubt we'll ever see two virtual machines
> driven by a single qemu process. YMMV.

/me grabs the fat magic marker and underlines some more.
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
Gerd Hoffmann writes:
> On 01/20/11 20:39, Anthony Liguori wrote:
>> On 01/20/2011 02:44 AM, Gerd Hoffmann wrote:
>>>   Hi,
>>>
>>>> For (2), you cannot use bus=X,addr=Y because it makes assumptions
>>>> about the PCI topology which may change in newer -M pc's.
>>>
>>> Why should the PCI topology for 'pc' ever change?
>>>
>>> We'll probably get q35 support some day, but when this lands I expect
>>> we'll see a new machine type 'q35', so '-m q35' will pick the ich9
>>> chipset (which will have a different pci topology of course) and '-m
>>> pc' will pick the existing piix chipset (which will continue to look
>>> like it looks today).
>>
>> But then what's the default machine type? When I say -M pc, I really
>> mean the default machine.
>
> I'd tend to leave pc as default for a release cycle or two so we can
> hash out issues with q35, then flip the default once it got broader
> testing and runs stable.
>
>> At some point, "qemu-system-x86_64 -device virtio-net-pci,addr=2.0"
>>
>> is not going to be a reliable way to invoke qemu because there's no way
>> we can guarantee that slot 2 isn't occupied by a chipset device or some
>> other default device.
>
> Indeed. But qemu -M pc should continue to work though. 'pc' would be
> better named 'piix3', but renaming it now is probably not worth the
> trouble.

We mustn't change pc-0.14 & friends. We routinely change pc, but whether
an upgrade to q35 qualifies as a routine change is debatable.

If you don't want the PCI topology (and more) to change across QEMU
updates, consider using the versioned machine types.
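To make the last point concrete, a dry-run sketch of what pinning a versioned machine type looks like on the command line (the disk image name is a placeholder, and the exact set of machine types depends on the build; `-M ?` asks the binary to list them):

```shell
# Dry run: commands are echoed, not executed (set RUN= to run for real).
RUN=echo

# Ask the qemu binary which machine types it provides; a 0.14-era build
# would list both the floating 'pc' alias and frozen types like 'pc-0.14'.
$RUN qemu-system-x86_64 -M ?

# 'pc' tracks the latest board; 'pc-0.14' is frozen, so its PCI topology
# (and therefore addr=2.0 below) stays valid across QEMU upgrades.
$RUN qemu-system-x86_64 -M pc-0.14 -device virtio-net-pci,addr=2.0 disk.img
```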
Re: [Qemu-devel] [PATCH] vhost: force vhost off for non-MSI guests
On Thu, Jan 20, 2011 at 06:35:46PM -0700, Alex Williamson wrote:
> On Thu, 2011-01-20 at 18:23 -0600, Anthony Liguori wrote:
> > On 01/20/2011 10:07 AM, Michael S. Tsirkin wrote:
> > > On Thu, Jan 20, 2011 at 09:43:57AM -0600, Anthony Liguori wrote:
> > >> On 01/20/2011 09:35 AM, Michael S. Tsirkin wrote:
> > >>> When MSI is off, each interrupt needs to be bounced through the io
> > >>> thread when it's set/cleared, so vhost-net causes more context
> > >>> switches and higher CPU utilization than userspace virtio which
> > >>> handles networking in the same thread.
> > >>>
> > >>> We'll need to fix this by adding level irq support in kvm irqfd,
> > >>> for now disable vhost-net in these configurations.
> > >>>
> > >>> Signed-off-by: Michael S. Tsirkin
> > >>
> > >> I actually think this should be a terminal error. The user asks for
> > >> vhost-net, if we cannot enable it, we should exit.
> > >>
> > >> Or we should warn the user that they should expect bad performance.
> > >> Silently doing something that the user has explicitly asked us not
> > >> to do is not a good behavior.
> > >>
> > >> Regards,
> > >>
> > >> Anthony Liguori
> > >
> > > The issue is that user has no control of the guest, and can not know
> > > whether the guest enables MSI. So what you ask for will just make
> > > some guests fail, and others fail sometimes.
> > > The user also has no way to know that version X of kvm does not
> > > expose a way to inject level interrupts with irqfd.
> > >
> > > We could have *another* flag that says "use vhost where it helps" but
> > > then I think this is what everyone wants to do, anyway, and libvirt
> > > already sets vhost=on so I prefer redefining the meaning of an
> > > existing flag.
> >
> > In the very least, there needs to be a vhost=force.
> >
> > Having some sort of friendly default policy is fine but we need to
> > provide a mechanism for a user to have the final say. If you want to
> > redefine vhost=on to really mean, use the friendly default, that's fine
> > by me, but only if the vhost=force option exists.
> >
> > I actually would think libvirt would want to use vhost=force. Debugging
> > with vhost=on is going to be a royal pain in the ass if a user reports
> > bad performance. Given the libvirt XML, you can't actually tell from
> > the guest and the XML whether or not vhost was actually in use or not.
>
> If we add a force option, let's please distinguish hotplug from VM
> creation time. The latter can abort. Hotplug should print an error and
> fail the initfn.

It can't abort at init - MSI is disabled at init, it needs to be enabled
by the guest later. And aborting the guest in the middle of the run
is a very bad idea.

What vhostforce=true will do is force the vhost backend to be used even
if it is slower.

> Thanks,
>
> Alex
Re: [Qemu-devel] [PATCH] vhost: force vhost off for non-MSI guests
On Thu, Jan 20, 2011 at 06:23:36PM -0600, Anthony Liguori wrote:
> On 01/20/2011 10:07 AM, Michael S. Tsirkin wrote:
> > On Thu, Jan 20, 2011 at 09:43:57AM -0600, Anthony Liguori wrote:
> >> On 01/20/2011 09:35 AM, Michael S. Tsirkin wrote:
> >>> When MSI is off, each interrupt needs to be bounced through the io
> >>> thread when it's set/cleared, so vhost-net causes more context
> >>> switches and higher CPU utilization than userspace virtio which
> >>> handles networking in the same thread.
> >>>
> >>> We'll need to fix this by adding level irq support in kvm irqfd,
> >>> for now disable vhost-net in these configurations.
> >>>
> >>> Signed-off-by: Michael S. Tsirkin
> >>
> >> I actually think this should be a terminal error. The user asks for
> >> vhost-net, if we cannot enable it, we should exit.
> >>
> >> Or we should warn the user that they should expect bad performance.
> >> Silently doing something that the user has explicitly asked us not
> >> to do is not a good behavior.
> >>
> >> Regards,
> >>
> >> Anthony Liguori
> >
> > The issue is that user has no control of the guest, and can not know
> > whether the guest enables MSI. So what you ask for will just make
> > some guests fail, and others fail sometimes.
> > The user also has no way to know that version X of kvm does not expose
> > a way to inject level interrupts with irqfd.
> >
> > We could have *another* flag that says "use vhost where it helps" but
> > then I think this is what everyone wants to do, anyway, and libvirt
> > already sets vhost=on so I prefer redefining the meaning of an existing
> > flag.
>
> In the very least, there needs to be a vhost=force.
>
> Having some sort of friendly default policy is fine but we need to
> provide a mechanism for a user to have the final say. If you want
> to redefine vhost=on to really mean, use the friendly default,
> that's fine by me, but only if the vhost=force option exists.

OK, I will add that, probably as a separate flag as vhost is a boolean.
This will get worse performance but it will be what the user asked for.

> I actually would think libvirt would want to use vhost=force.
> Debugging with vhost=on is going to be a royal pain in the ass if a
> user reports bad performance. Given the libvirt XML, you can't
> actually tell from the guest and the XML whether or not vhost was
> actually in use or not.

Yes you can: check MSI enabled in the guest, if it is - check vhost
enabled in the XML. Not that bad at all, is it?

We get worse performance without MSI anyway, how is this different?

> Regards,
>
> Anthony Liguori

> > Maybe this is best handled by a documentation update?
> >
> > We always said:
> >     "use vhost=on to enable experimental in kernel accelerator\n"
> >
> > note 'enable' not 'require'. This is similar to how we specify
> > nvectors: you can not make guest use the feature.
> >
> > How about this:
> >
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 898561d..3c937c1 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -1061,6 +1061,7 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
> >      "use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap flag\n"
> >      "use vnet_hdr=on to make the lack of IFF_VNET_HDR support an error condition\n"
> >      "use vhost=on to enable experimental in kernel accelerator\n"
> > +    "(note: vhost=on has no effect unless guest uses MSI-X)\n"
> >      "use 'vhostfd=h' to connect to an already opened vhost net device\n"
> >  #endif
> >      "-net socket[,vlan=n][,name=str][,fd=h][,listen=[host]:port][,connect=host:port]\n"
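On the command line, the two behaviours being settled here might look as follows once the separate flag exists. A dry-run sketch: the netdev id, tap details, and image name are placeholders, and the flag spelling follows the vhostforce name mentioned earlier in the thread.

```shell
# Dry run: echo the invocations instead of executing them.
RUN=echo

# Friendly default: vhost is used only when it helps, i.e. once the
# guest enables MSI-X; otherwise qemu quietly falls back to userspace.
$RUN qemu-system-x86_64 -netdev tap,id=net0,vhost=on \
    -device virtio-net-pci,netdev=net0 disk.img

# Forced: the vhost backend is used even for non-MSI guests, trading
# extra io-thread bounces for "what the user asked for".
$RUN qemu-system-x86_64 -netdev tap,id=net0,vhost=on,vhostforce=on \
    -device virtio-net-pci,netdev=net0 disk.img
```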
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  Hi,

> By the way, we don't have a QEMUState but instead use globals.

/me wants to underline this.

IMO it is absolutely pointless to worry about ways to pass around
kvm_state. There never ever will be a serious need for that.

We can stick with the current model of keeping global state in global
variables. And just do the same with kvm_state.

Or we can move to have all state in a QEMUState struct which we'll
pass around basically everywhere. Then we can simply embed or
reference kvm_state there.

I'd tend to stick with the global variables as I don't see the point
in having a QEMUState. I doubt we'll ever see two virtual machines
driven by a single qemu process. YMMV.

cheers,
  Gerd
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On 01/20/11 20:39, Anthony Liguori wrote:
> On 01/20/2011 02:44 AM, Gerd Hoffmann wrote:
>>   Hi,
>>
>>> For (2), you cannot use bus=X,addr=Y because it makes assumptions
>>> about the PCI topology which may change in newer -M pc's.
>>
>> Why should the PCI topology for 'pc' ever change?
>>
>> We'll probably get q35 support some day, but when this lands I expect
>> we'll see a new machine type 'q35', so '-m q35' will pick the ich9
>> chipset (which will have a different pci topology of course) and '-m
>> pc' will pick the existing piix chipset (which will continue to look
>> like it looks today).
>
> But then what's the default machine type? When I say -M pc, I really
> mean the default machine.

I'd tend to leave pc as default for a release cycle or two so we can
hash out issues with q35, then flip the default once it got broader
testing and runs stable.

> At some point, "qemu-system-x86_64 -device virtio-net-pci,addr=2.0"
>
> is not going to be a reliable way to invoke qemu because there's no way
> we can guarantee that slot 2 isn't occupied by a chipset device or some
> other default device.

Indeed. But qemu -M pc should continue to work though. 'pc' would be
better named 'piix3', but renaming it now is probably not worth the
trouble.

cheers,
  Gerd
[Bug 26872] qemu stop responding if using kvm with usb passthru
https://bugzilla.kernel.org/show_bug.cgi?id=26872

alien.vi...@gmail.com changed:

           What      |Removed |Added
 ----------------------------------------------------------
           Status    |NEW     |RESOLVED
           Resolution|        |PATCH_ALREADY_AVAILABLE

--- Comment #1 from alien.vi...@gmail.com 2011-01-21 08:02:24 ---
user must enable MMU in kernel command line

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.