[Qemu-devel] Re: KVM call agenda for July 20
On 07/20/2010 12:46 AM, Chris Wright wrote:
> Please send in any agenda items you are interested in covering.

Last week's agenda, minus the item that we started to discuss.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
[Qemu-devel] Re: [SeaBIOS] [PATCH 2/7] seabios: shadow: make device finding more generic.
On Tue, Jul 13, 2010 at 06:45:00PM +0900, Isaku Yamahata wrote:
> On Mon, Jul 12, 2010 at 08:59:14PM -0400, Kevin O'Connor wrote:
> > On Mon, Jul 12, 2010 at 08:47:47PM +0900, Isaku Yamahata wrote:
> > > pam register offset is north bridge specific.
> > > So determine the offset based on found north bridge.
> >
> > Is it really just the offset that is north bridge specific? I thought
> > the entire process was very north bridge specific.
> >
> > If so, I don't think it makes sense to pass back the pam0 register -
> > instead the north bridge specific code should do the necessary work
> > (using helper functions if possible).
> >
> > I have the same concern with part 3 and 4 of this series.
>
> I440fx and Q35 (all Intel chipset?) are similar in registers
> which seabios programs, so I choice to abstract it at register offset level.
> I don't expect that other vendor's chipset support is wanted.

Although it isn't currently used, the memory locking support is useful
on real machines too. I'd prefer a solution that would work on both
qemu and real machines.

It's minor for part 2 of the series, but I found part 3/4 to be hard
to follow due to the way the flow of code jumps between machine
specific code in dev-i440fx.c and the smm code in smm.c.

> If you want more high level abstract, I'll respin the patch set.

I've been meaning to look through the full series of changes in your
repo, but have not yet had a chance to do so. I hope to get to that
in the next few days. Sorry for the delay.

-Kevin
[Qemu-devel] [RFC PATCH 12/14] KVM-test: Add a subtest of netperf
Generate network load with netperf: netserver is launched on the guest,
and the netperf client is executed on the host with different protocols.
If all clients execute successfully, the case passes. Test results are
recorded in result.txt.

For now this case only tests "TCP_RR TCP_CRR UDP_RR TCP_STREAM TCP_MAERTS
TCP_SENDFILE UDP_STREAM". DLPI is only supported on Unix and the unix domain
test is not necessary, so the DLPI and unix domain tests are dropped.

Signed-off-by: Amos Kong
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/netperf.py b/client/tests/kvm/tests/netperf.py
new file mode 100644
index 000..00a91f0
--- /dev/null
+++ b/client/tests/kvm/tests/netperf.py
@@ -0,0 +1,56 @@
+import logging, commands, os
+from autotest_lib.client.common_lib import error
+import kvm_subprocess, kvm_test_utils, kvm_utils
+
+def run_netperf(test, params, env):
+    """
+    Network stress test with netperf
+
+    1) Boot up a virtual machine
+    2) Launch netserver on guest
+    3) Execute netperf client on host with different protocols
+    4) Output the test result
+
+    @param test: Kvm test object
+    @param params: Dictionary with the test parameters.
+    @param env: Dictionary with test environment.
+    """
+    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+    session = kvm_test_utils.wait_for_login(vm,
+                         timeout=int(params.get("login_timeout", 360)))
+    netperf_dir = os.path.join(os.environ['AUTODIR'], "tests/netperf2")
+    setup_cmd = params.get("setup_cmd")
+    guest_ip = vm.get_address()
+    result_file = os.path.join(test.debugdir, "result.txt")
+
+    session.get_command_output("service iptables stop")
+    for i in params.get("netperf_files").split():
+        if not vm.copy_files_to(os.path.join(netperf_dir, i), "/tmp"):
+            raise error.TestError("Could not copy files to guest")
+    if session.get_command_status(setup_cmd % "/tmp", timeout=100) != 0:
+        raise error.TestFail("Fail to setup netperf on guest")
+    if session.get_command_status(params.get("netserver_cmd") % "/tmp") != 0:
+        raise error.TestFail("Fail to start netperf server on guest")
+
+    try:
+        logging.info("Setup and run netperf client on host")
+        s, o = commands.getstatusoutput(setup_cmd % netperf_dir)
+        if s != 0:
+            raise error.TestFail("Fail to setup netperf on host, o: %s" % o)
+        success = True
+        file(result_file, "w").write("Netperf Test Result\n")
+        for i in params.get("protocols").split():
+            cmd = params.get("netperf_cmd") % (netperf_dir, i, guest_ip)
+            logging.debug("Execute netperf client test: %s" % cmd)
+            s, o = commands.getstatusoutput(cmd)
+            if s != 0:
+                logging.error("Fail to execute netperf test, protocol:%s" % i)
+                success = False
+            else:
+                logging.info(o)
+                file(result_file, "a+").write("%s\n" % o)
+        if not success:
+            raise error.TestFail("Not all the test passed")
+    finally:
+        session.get_command_output("killall netserver")
+        session.close()
diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 7716d48..dec988e 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -398,6 +398,16 @@ variants:
         type = mac_change
         kill_vm = yes
 
+    - netperf: install setup unattended_install.cdrom
+        type = netperf
+        nic_mode = tap
+        netperf_files = netperf-2.4.5.tar.bz2 wait_before_data.patch
+        setup_cmd = "cd %s && tar xvfj netperf-2.4.5.tar.bz2 && cd netperf-2.4.5 && patch -p0 < ../wait_before_data.patch && ./configure && make"
+        netserver_cmd = %s/netperf-2.4.5/src/netserver
+        # test time is 60 seconds, set the buffer size to 1 for more hardware interrupt
+        netperf_cmd = %s/netperf-2.4.5/src/netperf -t %s -H %s -l 60 -- -m 1
+        protocols = "TCP_STREAM TCP_MAERTS TCP_RR TCP_CRR UDP_RR TCP_SENDFILE UDP_STREAM"
+
     - physical_resources_check: install setup unattended_install.cdrom
         type = physical_resources_check
         catch_uuid_cmd = dmidecode | awk -F: '/UUID/ {print $2}'
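For readers following the config, here is a minimal sketch (Python 3, not part of the patch) of how the netperf_cmd template expands into one command line per protocol, the way the loop in run_netperf() does. The directory and guest address in the usage note are made-up placeholders, and build_netperf_cmds() is a hypothetical helper name.

```python
# Sketch only: expand the netperf_cmd template from tests_base.cfg.sample.
# The template and protocol list are copied from the patch; everything else
# (function name, arguments) is illustrative.
NETPERF_CMD = "%s/netperf-2.4.5/src/netperf -t %s -H %s -l 60 -- -m 1"
PROTOCOLS = ("TCP_STREAM TCP_MAERTS TCP_RR TCP_CRR UDP_RR "
             "TCP_SENDFILE UDP_STREAM")


def build_netperf_cmds(netperf_dir, guest_ip, protocols=PROTOCOLS):
    """Return one netperf client command line per protocol name."""
    return [NETPERF_CMD % (netperf_dir, proto, guest_ip)
            for proto in protocols.split()]
```

Calling build_netperf_cmds("/tmp/netperf2", "192.168.122.10") would give seven command lines, one per entry in the protocols setting, each running for 60 seconds with a send size of 1 byte.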
[Qemu-devel] [RFC PATCH 02/14] KVM Test: Add a function get_interface_name() to kvm_net_utils.py
The function get_linux_ifname() is used to get the interface name of a
Linux guest through the MAC address of the specified NIC.

Signed-off-by: Jason Wang
Signed-off-by: Amos Kong
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/kvm_net_utils.py b/client/tests/kvm/kvm_net_utils.py
new file mode 100644
index 000..ede4965
--- /dev/null
+++ b/client/tests/kvm/kvm_net_utils.py
@@ -0,0 +1,18 @@
+import re
+
+def get_linux_ifname(session, mac_address):
+    """
+    Get the interface name through the mac address.
+
+    @param session: session to the virtual machine
+    @param mac_address: the macaddress of nic
+    """
+
+    output = session.get_command_output("ifconfig -a")
+
+    try:
+        ethname = re.findall("(\w+)\s+Link.*%s" % mac_address, output,
+                             re.IGNORECASE)[0]
+        return ethname
+    except IndexError:
+        return None
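As an illustration of the matching done here, the following sketch (Python 3, standalone) applies the same regular expression to a made-up sample of net-tools style "ifconfig -a" output. find_ifname() is a hypothetical stand-in that takes the text directly instead of a guest session, and the re.escape() call is an addition that is not in the patch.

```python
import re

# Made-up sample in the classic net-tools "ifconfig -a" layout.
SAMPLE_IFCONFIG = """\
eth0      Link encap:Ethernet  HWaddr 52:54:00:ab:cd:ef
          inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
"""


def find_ifname(ifconfig_output, mac_address):
    """Return the interface whose 'Link' line carries mac_address, else None."""
    # Same pattern as the patch; re.escape() is an extra safety measure.
    pattern = r"(\w+)\s+Link.*%s" % re.escape(mac_address)
    matches = re.findall(pattern, ifconfig_output, re.IGNORECASE)
    return matches[0] if matches else None
```

Because of re.IGNORECASE, a MAC address given in upper case still matches the lower-case HWaddr field. The pattern assumes the old "Link encap:... HWaddr ..." layout; output from other tools or newer ifconfig versions may need a different expression.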
[Qemu-devel] [RFC PATCH 14/14] KVM-test: Add subtest of testing offload by ethtool
The latest case contains TX/RX/SG/TSO/GSO/GRO/LRO tests. The RTL8139 NIC is
too old to support TSO and LRO, so the offload test is dropped for rtl8139.
LRO and GRO are only supported by the latest kernels, and the virtio nic
doesn't support receive offloading.

Initialize the callbacks first and execute all the sub tests one by one; all
the results are checked at the end. vhost should be enabled when executing
this test, so that most of the new features can be used. vhost doesn't
support VIRTIO_NET_F_MRG_RXBUF, so large packets are not checked in the
receive offload test.

Transfer files by scp between host and guest, and match the newly opened TCP
port with netstat. Capture the packet info with tcpdump; it contains the
packet length.

Signed-off-by: Amos Kong
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/ethtool.py b/client/tests/kvm/tests/ethtool.py
new file mode 100644
index 000..7274eae
--- /dev/null
+++ b/client/tests/kvm/tests/ethtool.py
@@ -0,0 +1,205 @@
+import time, os, logging, commands, re
+from autotest_lib.client.common_lib import error
+from autotest_lib.client.bin import utils
+import kvm_test_utils, kvm_utils, kvm_net_utils
+
+def run_ethtool(test, params, env):
+    """
+    Test offload functions of ethernet device by ethtool
+
+    1) Log into a guest
+    2) Initialize the callback of sub functions
+    3) Enable/disable sub function of NIC
+    4) Execute callback function
+    5) Check the return value
+    6) Restore original configuration
+
+    @param test: Kvm test object
+    @param params: Dictionary with the test parameters.
+    @param env: Dictionary with test environment.
+    """
+    def ethtool_get(type):
+        feature_pattern = {
+            'tx': 'tx.*checksumming',
+            'rx': 'rx.*checksumming',
+            'sg': 'scatter.*gather',
+            'tso': 'tcp.*segmentation.*offload',
+            'gso': 'generic.*segmentation.*offload',
+            'gro': 'generic.*receive.*offload',
+            'lro': 'large.*receive.*offload',
+            }
+        s, o = session.get_command_status_output("ethtool -k %s" % ethname)
+        try:
+            return re.findall("%s: (.*)" % feature_pattern.get(type), o)[0]
+        except IndexError:
+            logging.debug("Could not get %s status" % type)
+
+    def ethtool_set(type, status):
+        """
+        Set ethernet device offload status
+
+        @param type: Offload type name
+        @param status: New status will be changed to
+        """
+        logging.info("Try to set %s %s" % (type, status))
+        if status not in ["off", "on"]:
+            return False
+        cmd = "ethtool -K %s %s %s" % (ethname, type, status)
+        if ethtool_get(type) != status:
+            return session.get_command_status(cmd) == 0
+        if ethtool_get(type) != status:
+            logging.error("Fail to set %s %s" % (type, status))
+            return False
+        return True
+
+    def ethtool_save_params():
+        logging.info("Save ethtool configuration")
+        for i in supported_features:
+            feature_status[i] = ethtool_get(i)
+
+    def ethtool_restore_params():
+        logging.info("Restore ethtool configuration")
+        for i in supported_features:
+            ethtool_set(i, feature_status[i])
+
+    def compare_md5sum(name):
+        logging.info("Compare md5sum of the files on guest and host")
+        host_result = utils.hash_file(name, method="md5")
+        try:
+            o = session.get_command_output("md5sum %s" % name)
+            guest_result = re.findall("\w+", o)[0]
+        except IndexError:
+            logging.error("Could not get file md5sum in guest")
+            return False
+        logging.debug("md5sum: guest(%s), host(%s)" % (guest_result,
+                                                       host_result))
+        return guest_result == host_result
+
+    def transfer_file(src="guest"):
+        """
+        Transfer file by scp, use tcpdump to capture packets, then check the
+        return string.
+
+        @param src: Source host of transfer file
+        @return: Tuple (status, error msg/tcpdump result)
+        """
+        session2.get_command_status("rm -rf %s" % filename)
+        dd_cmd = "dd if=/dev/urandom of=%s bs=1M count=%s" % (filename,
+                                                   params.get("filesize"))
+        logging.info("Create file in source host, cmd: %s" % dd_cmd)
+        tcpdump_cmd = "tcpdump -lep -s 0 tcp -vv port ssh"
+        if src == "guest":
+            s = session.get_command_status(dd_cmd, timeout=360)
+            tcpdump_cmd += " and src %s" % guest_ip
+            copy_files_fun = vm.copy_files_from
+        else:
+            s, o = commands.getstatusoutput(dd_cmd)
+            tcpdump_cmd += " and dst %s" % guest_ip
+            copy_files_fun = vm.copy_files_to
+        if s != 0:
+            return (False, "Fail to create file by dd, cmd: %s" % dd_cmd)
+
+        # only capture the new tcp port aft
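To make the ethtool_get() lookup easier to follow, here is a small standalone sketch (Python 3) that applies the same feature patterns to a made-up "ethtool -k" listing in the older space-separated layout. parse_feature() is a hypothetical helper, not a function from the patch, and real ethtool output varies between versions.

```python
import re

# Patterns copied from the patch's feature_pattern dictionary.
FEATURE_PATTERN = {
    "tx": "tx.*checksumming",
    "rx": "rx.*checksumming",
    "sg": "scatter.*gather",
    "tso": "tcp.*segmentation.*offload",
    "gso": "generic.*segmentation.*offload",
    "gro": "generic.*receive.*offload",
    "lro": "large.*receive.*offload",
}

# Made-up sample output; this NIC reports no GRO/LRO lines at all.
SAMPLE_ETHTOOL_K = """\
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
"""


def parse_feature(output, feature):
    """Return 'on'/'off' for the feature, or None if it is not listed."""
    matches = re.findall("%s: (.*)" % FEATURE_PATTERN[feature], output)
    return matches[0] if matches else None
```

A feature that the driver does not report (GRO and LRO in the sample) simply yields None, which corresponds to the "Could not get ... status" debug path in ethtool_get().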
[Qemu-devel] [RFC PATCH 11/14] KVM-test: Add a subtest of changing mac address
Mainly test steps:
1. get a new mac from pool, and the old mac addr of guest.
2. execute the mac_change.sh in guest.
3. relogin to guest and query the interfaces info by `ifconfig`

Signed-off-by: Cao, Chen
Signed-off-by: Amos Kong
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/mac_change.py b/client/tests/kvm/tests/mac_change.py
new file mode 100644
index 000..dc93377
--- /dev/null
+++ b/client/tests/kvm/tests/mac_change.py
@@ -0,0 +1,66 @@
+import logging
+from autotest_lib.client.common_lib import error
+import kvm_utils, kvm_test_utils, kvm_net_utils
+
+
+def run_mac_change(test, params, env):
+    """
+    Change MAC Address of Guest.
+
+    1. get a new mac from pool, and the old mac addr of guest.
+    2. set new mac in guest and regain new IP.
+    3. re-log into guest with new mac
+
+    @param test: kvm test object
+    @param params: Dictionary with the test parameters
+    @param env: Dictionary with test environment.
+    """
+    timeout = int(params.get("login_timeout", 360))
+    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+    logging.info("Trying to log into guest '%s' by serial", vm.name)
+    session = kvm_utils.wait_for(lambda: vm.serial_login(),
+                                 timeout, 0, step=2)
+    if not session:
+        raise error.TestFail("Could not log into guest '%s'" % vm.name)
+
+    old_mac = vm.get_macaddr(0)
+    kvm_utils.put_mac_to_pool(vm.root_dir, old_mac, vm.instance)
+    new_mac = kvm_utils.get_mac_from_pool(vm.root_dir,
+                                          vm=vm.instance,
+                                          nic_index=0,
+                                          prefix=vm.mac_prefix)
+    logging.info("The initial MAC address is %s" % old_mac)
+    interface = kvm_net_utils.get_linux_ifname(session, old_mac)
+
+    # Start change mac address
+    logging.info("Changing mac address to %s" % new_mac)
+    change_cmd = "ifconfig %s down && ifconfig %s hw ether %s && ifconfig %s up"\
+                 % (interface, interface, new_mac, interface)
+    if session.get_command_status(change_cmd) != 0:
+        raise error.TestFail("Fail to send mac_change command")
+
+    # Verify whether mac address is changed to new one
+    logging.info("Verifying the new mac address")
+    if session.get_command_status("ifconfig | grep -i %s" % new_mac) != 0:
+        raise error.TestFail("Fail to change mac address")
+
+    # Restart `dhclient' to regain IP for new mac address
+    logging.info("Re-start the network to gain new ip")
+    dhclient_cmd = "dhclient -r && dhclient %s" % interface
+    session.sendline(dhclient_cmd)
+
+    # Re-log into the guest after changing mac address
+    if kvm_utils.wait_for(session.is_responsive, 120, 20, 3):
+        # Just warning when failed to see the session become dead,
+        # because there is a little chance the ip does not change.
+        logging.warn("The session is still responsive, settings may fail.")
+    session.close()
+
+    # Re-log into guest and check if session is responsive
+    logging.info("Re-log into the guest")
+    session = kvm_test_utils.wait_for_login(vm,
+              timeout=int(params.get("login_timeout", 360)))
+    if not session.is_responsive():
+        raise error.TestFail("The new session is not responsive.")
+
+    session.close()
diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 5515601..7716d48 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -394,6 +394,10 @@ variants:
         restart_vm = yes
         pxe_timeout = 60
 
+    - mac_change: install setup unattended_install.cdrom
+        type = mac_change
+        kill_vm = yes
+
     - physical_resources_check: install setup unattended_install.cdrom
         type = physical_resources_check
         catch_uuid_cmd = dmidecode | awk -F: '/UUID/ {print $2}'
@@ -1070,7 +1074,7 @@ variants:
 
     # Windows section
     - @Windows:
-        no autotest linux_s3 vlan_tag ioquit unattended_install.(url|nfs|remote_ks) jumbo file_transfer nicdriver_unload nic_promisc multicast
+        no autotest linux_s3 vlan_tag ioquit unattended_install.(url|nfs|remote_ks) jumbo file_transfer nicdriver_unload nic_promisc multicast mac_change
         shutdown_command = shutdown /s /f /t 0
         reboot_command = shutdown /r /f /t 0
         status_test_command = echo %errorlevel%
[Qemu-devel] [RFC PATCH 10/14] KVM-test: Add a subtest of pxe
This case just snoops tftp packets with tcpdump. It depends on a public
dhcp server; it would be better to test it through dnsmasq.

Signed-off-by: Jason Wang
Signed-off-by: Amos Kong
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/pxe.py b/client/tests/kvm/tests/pxe.py
new file mode 100644
index 000..8859aaa
--- /dev/null
+++ b/client/tests/kvm/tests/pxe.py
@@ -0,0 +1,30 @@
+import logging
+from autotest_lib.client.common_lib import error
+import kvm_subprocess, kvm_test_utils, kvm_utils
+
+
+def run_pxe(test, params, env):
+    """
+    PXE test:
+
+    1) Snoop the tftp packet in the tap device
+    2) Wait for some seconds
+    3) Check whether tftp packets were captured
+
+    @param test: kvm test object
+    @param params: Dictionary with the test parameters
+    @param env: Dictionary with test environment.
+    """
+    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+    timeout = int(params.get("pxe_timeout", 60))
+
+    logging.info("Try to boot from pxe")
+    status, output = kvm_subprocess.run_fg("tcpdump -nli %s" % vm.get_ifname(),
+                                           logging.debug,
+                                           "(pxe) ",
+                                           timeout)
+
+    logging.info("Analysing the tcpdump result...")
+    if not "tftp" in output:
+        raise error.TestFail("Couldn't find tftp packet in %s seconds" % timeout)
+    logging.info("Found tftp packet")
diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 9594a38..5515601 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -381,6 +381,19 @@ variants:
         mgroup_count = 20
         flood_minutes = 1
 
+    - pxe:
+        type = pxe
+        images = pxe
+        image_name_pxe = pxe-test
+        image_size_pxe = 1G
+        force_create_image_pxe = yes
+        remove_image_pxe = yes
+        extra_params += ' -boot n'
+        kill_vm_on_error = yes
+        network = bridge
+        restart_vm = yes
+        pxe_timeout = 60
+
     - physical_resources_check: install setup unattended_install.cdrom
         type = physical_resources_check
         catch_uuid_cmd = dmidecode | awk -F: '/UUID/ {print $2}'
[Qemu-devel] [RFC PATCH 05/14] KVM-test: Add a subtest jumbo
Set a different MTU according to the nic model, then ping from guest to
host to see whether the tested size can be received by the host.

Signed-off-by: Jason Wang
Signed-off-by: Amos Kong
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/jumbo.py b/client/tests/kvm/tests/jumbo.py
new file mode 100644
index 000..9f56a87
--- /dev/null
+++ b/client/tests/kvm/tests/jumbo.py
@@ -0,0 +1,133 @@
+import os, re, logging, commands, time, random
+from autotest_lib.client.common_lib import error
+import kvm_subprocess, kvm_test_utils, kvm_utils, kvm_net_utils
+
+def run_jumbo(test, params, env):
+    """
+    Test the RX jumbo frame function of vnics:
+    1) boot the vm
+    2) change the MTU of guest nics and host taps depends on the nic model
+    3) add the static arp entry for guest nic
+    4) wait for the MTU ok
+    5) verify the path mtu using ping
+    6) ping the guest with large frames
+    7) increment size ping
+    8) flood ping the guest with large frames
+    9) verify the path mtu
+    10) recover the mtu
+
+    @param test: kvm test object
+    @param params: Dictionary with the test parameters
+    @param env: Dictionary with test environment.
+    """
+
+    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+    session = kvm_test_utils.wait_for_login(vm)
+    mtu = params.get("mtu", "1500")
+    flood_time = params.get("flood_time", "300")
+    max_icmp_pkt_size = int(mtu) - 28
+
+    ifname = vm.get_ifname(0)
+    ip = vm.get_address(0)
+    if ip is None:
+        raise error.TestError("Could not get the ip address")
+
+    try:
+        # Environment preparation
+        ethname = kvm_net_utils.get_linux_ifname(session, vm.get_macaddr(0))
+
+        logging.info("Changing the mtu of guest ...")
+        guest_mtu_cmd = "ifconfig %s mtu %s" % (ethname , mtu)
+        s, o = session.get_command_status_output(guest_mtu_cmd)
+        if s != 0:
+            logging.error(o)
+            raise error.TestError("Fail to set the mtu of guest nic: %s"
+                                  % ethname)
+
+        logging.info("Changing the mtu of host tap ...")
+        host_mtu_cmd = "ifconfig %s mtu %s" % (ifname, mtu)
+        s, o = commands.getstatusoutput(host_mtu_cmd)
+        if s != 0:
+            raise error.TestError("Fail to set the mtu of %s" % ifname)
+
+        logging.info("Add a temporary static arp entry ...")
+        arp_add_cmd = "arp -s %s %s -i %s" % (ip, vm.get_macaddr(0), ifname)
+        s, o = commands.getstatusoutput(arp_add_cmd)
+        if s != 0 :
+            raise error.TestError("Fail to add temporary arp entry")
+
+        def is_mtu_ok():
+            s, o = kvm_net_utils.ping(ip, 1, interface = ifname,
+                                      packetsize = max_icmp_pkt_size,
+                                      hint = "do", timeout = 2)
+            if s != 0:
+                return False
+            else:
+                return True
+
+        def verify_mtu():
+            logging.info("Verify the path mtu")
+            s, o = kvm_net_utils.ping(ip, 10, interface = ifname,
+                                      packetsize = max_icmp_pkt_size,
+                                      hint = "do", timeout = 15)
+            if s != 0 :
+                logging.error(o)
+                raise error.TestFail("Path MTU is not as expected")
+            if kvm_net_utils.get_loss_ratio(o) != 0:
+                logging.error(o)
+                raise error.TestFail("Packet loss ratio during mtu verification"
+                                     " is not zero")
+
+        def flood_ping():
+            logging.info("Flood with large frames")
+            kvm_net_utils.ping(ip, interface = ifname,
+                               packetsize = max_icmp_pkt_size,
+                               flood = True, timeout = float(flood_time))
+
+        def large_frame_ping(count = 100):
+            logging.info("Large frame ping")
+            s, o = kvm_net_utils.ping(ip, count, interface = ifname,
+                                      packetsize = max_icmp_pkt_size,
+                                      timeout = float(count) * 2)
+            ratio = kvm_net_utils.get_loss_ratio(o)
+            if ratio != 0:
+                raise error.TestFail("Loss ratio of large frame ping is %s" \
+                                     % ratio)
+
+        def size_increase_ping(step = random.randrange(90, 110)):
+            logging.info("Size increase ping")
+            for size in range(0, max_icmp_pkt_size + 1, step):
+                logging.info("Ping %s with size %s" % (ip, size))
+                s, o = kvm_net_utils.ping(ip, 1, interface = ifname,
+                                          packetsize = size,
+                                          hint = "do", timeout = 1)
+                if s != 0:
+                    s, o = kvm_net_utils.ping(ip, 10, interface = ifname,
+
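A note on the "int(mtu) - 28" arithmetic used for max_icmp_pkt_size: an ICMP echo request carried over IPv4 spends 20 bytes on the IPv4 header (without options) and 8 bytes on the ICMP header, so the largest payload that fits in one unfragmented frame is MTU - 28. A tiny sketch in Python 3, with hypothetical names:

```python
IPV4_HEADER_LEN = 20  # bytes, IPv4 header without options
ICMP_HEADER_LEN = 8   # bytes, ICMP echo request/reply header


def max_icmp_payload(mtu):
    """Largest ping payload (in bytes) that fits in a single frame of this MTU."""
    return int(mtu) - IPV4_HEADER_LEN - ICMP_HEADER_LEN


print(max_icmp_payload("1500"))  # 1472, the usual Ethernet case
print(max_icmp_payload(9000))    # 8972, a common jumbo frame MTU
```

With the "do" hint (do not fragment), a ping with a payload one byte larger than this value should fail if the path MTU really equals the configured MTU.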
[Qemu-devel] [RFC PATCH 06/14] KVM-test: Add basic file transfer test
This is a basic test of transferring a file between host and guest. Try to
transfer a large file from host to guest, transfer it back to the host, then
compare the files with the diff command.

The default file size is 4000M and the scp timeout is 1000s. This means that
if the average speed is less than 4M/s, this test will fail.

We can extend this test by using another disk later; then we can transfer
larger files without being limited by the size of the first disk.

Signed-off-by: Amos Kong
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/file_transfer.py b/client/tests/kvm/tests/file_transfer.py
new file mode 100644
index 000..a20e62e
--- /dev/null
+++ b/client/tests/kvm/tests/file_transfer.py
@@ -0,0 +1,54 @@
+import logging, commands
+from autotest_lib.client.common_lib import error
+import kvm_utils, kvm_test_utils
+
+def run_file_transfer(test, params, env):
+    """
+    Test ethernet device function by transferring a file
+
+    1) Boot up a virtual machine
+    2) Create a large file by dd on host
+    3) Copy this file from host to guest
+    4) Copy this file from guest to host
+    5) Check if file transfers good
+
+    @param test: Kvm test object
+    @param params: Dictionary with the test parameters.
+    @param env: Dictionary with test environment.
+    """
+    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+    timeout=int(params.get("login_timeout", 360))
+    logging.info("Trying to log into guest '%s' by serial", vm.name)
+    session = kvm_utils.wait_for(lambda: vm.serial_login(),
+                                 timeout, 0, step=2)
+    if not session:
+        raise error.TestFail("Could not log into guest '%s'" % vm.name)
+
+    dir = test.tmpdir
+    scp_timeout = int(params.get("scp_timeout"))
+    cmd = "dd if=/dev/urandom of=%s/a.out bs=1M count=%d" % (dir, int(
+                                         params.get("filesize", 4000)))
+    try:
+        logging.info("Create file by dd command on host, cmd: %s" % cmd)
+        s, o = commands.getstatusoutput(cmd)
+        if s != 0:
+            raise error.TestError("Fail to create file, output:%s" % o)
+
+        logging.info("Transfer file from host to guest")
+        if not vm.copy_files_to("%s/a.out" % dir, "/tmp/b.out",
+                                timeout=scp_timeout):
+            raise error.TestFail("Fail to transfer file from host to guest")
+
+        logging.info("Transfer file from guest to host")
+        if not vm.copy_files_from("/tmp/b.out", "%s/c.out" % dir,
+                                  timeout=scp_timeout):
+            raise error.TestFail("Fail to transfer file from guest to host")
+
+        logging.debug(commands.getoutput("ls -l %s/[ac].out" % dir))
+        s, o = commands.getstatusoutput("diff %s/a.out %s/c.out" % (dir, dir))
+        if s != 0:
+            raise error.TestFail("File changed after transfer. Output:%s" % o)
+    finally:
+        session.get_command_status("rm -f /tmp/b.out")
+        commands.getoutput("rm -f %s/[ac].out" % dir)
+        session.close()
diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 7f7b56a..872674e 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -357,6 +357,11 @@ variants:
     - jumbo: install setup unattended_install.cdrom
         type = jumbo
 
+    - file_transfer: install setup unattended_install.cdrom
+        type = file_transfer
+        filesize = 4000
+        scp_timeout = 1000
+
     - physical_resources_check: install setup unattended_install.cdrom
         type = physical_resources_check
         catch_uuid_cmd = dmidecode | awk -F: '/UUID/ {print $2}'
@@ -1033,7 +1038,7 @@ variants:
 
     # Windows section
     - @Windows:
-        no autotest linux_s3 vlan_tag ioquit unattended_install.(url|nfs|remote_ks) jumbo
+        no autotest linux_s3 vlan_tag ioquit unattended_install.(url|nfs|remote_ks) jumbo file_transfer
         shutdown_command = shutdown /s /f /t 0
         reboot_command = shutdown /r /f /t 0
         status_test_command = echo %errorlevel%
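The 4M/s figure in the description follows directly from the two config values: each scp copy has to move filesize megabytes within scp_timeout seconds. A short sketch of that arithmetic (Python 3; min_average_speed() is a hypothetical helper, not part of the patch):

```python
def min_average_speed(filesize_mb, scp_timeout_s):
    """Minimum average throughput (MB/s) needed for one copy to finish in time."""
    return float(filesize_mb) / float(scp_timeout_s)


# Defaults from tests_base.cfg.sample: filesize = 4000, scp_timeout = 1000
print(min_average_speed(4000, 1000))  # 4.0
```

Since the timeout applies to each copy separately, both the host-to-guest and the guest-to-host transfer need to sustain at least this rate. When changing filesize, scp_timeout probably wants to be scaled along with it.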
[Qemu-devel] [RFC PATCH 09/14] KVM-test: Add a subtest of multicast
Use 'ping' to test sending/receiving multicast packets. A flood ping test is
also added. Limit the guest network to 'bridge' mode, because multicast
packets cannot be transmitted to the guest when using the 'user' network.
Add join_mcast.py for joining a machine into multicast groups.

Signed-off-by: Amos Kong
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/scripts/join_mcast.py b/client/tests/kvm/scripts/join_mcast.py
new file mode 100755
index 000..0d90e5c
--- /dev/null
+++ b/client/tests/kvm/scripts/join_mcast.py
@@ -0,0 +1,29 @@
+import socket, struct, os, signal, sys
+# this script is used to join machine into multicast groups
+# author: Amos Kong
+
+if len(sys.argv) < 4:
+    print """%s [mgroup_count] [prefix] [suffix]
+    mgroup_count: count of multicast addresses
+    prefix: multicast address prefix
+    suffix: multicast address suffix""" % sys.argv[0]
+    sys.exit()
+
+mgroup_count = int(sys.argv[1])
+prefix = sys.argv[2]
+suffix = int(sys.argv[3])
+
+s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+for i in range(mgroup_count):
+    mcast = prefix + "." + str(suffix + i)
+    try:
+        mreq = struct.pack("4sl", socket.inet_aton(mcast), socket.INADDR_ANY)
+        s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
+    except:
+        s.close()
+        print "Could not join multicast: %s" % mcast
+        raise
+
+print "join_mcast_pid:%s" % os.getpid()
+os.kill(os.getpid(), signal.SIGSTOP)
+s.close()
diff --git a/client/tests/kvm/tests/multicast.py b/client/tests/kvm/tests/multicast.py
new file mode 100644
index 000..6b0e106
--- /dev/null
+++ b/client/tests/kvm/tests/multicast.py
@@ -0,0 +1,67 @@
+import logging, commands, os, re
+from autotest_lib.client.common_lib import error
+import kvm_test_utils, kvm_net_utils
+
+
+def run_multicast(test, params, env):
+    """
+    Test multicast function of nic (rtl8139/e1000/virtio)
+
+    1) Create a VM
+    2) Join guest into multicast groups
+    3) Ping multicast addresses on host
+    4) Flood ping test with different size of packets
+    5) Final ping test and check if lose packet
+
+    @param test: Kvm test object
+    @param params: Dictionary with the test parameters.
+    @param env: Dictionary with test environment.
+    """
+    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+    session = kvm_test_utils.wait_for_login(vm,
+                                  timeout=int(params.get("login_timeout", 360)))
+
+    # stop iptable/selinux on guest/host
+    cmd = "/etc/init.d/iptables stop && echo 0 > /selinux/enforce"
+    session.get_command_status(cmd)
+    commands.getoutput(cmd)
+    # make sure guest replies to broadcasts
+    session.get_command_status("echo 0 > /proc/sys/net/ipv4/icmp_echo_ignore"
+            "_broadcasts && echo 0 > /proc/sys/net/ipv4/icmp_echo_ignore_all")
+
+    # base multicast address
+    mcast = params.get("mcast", "225.0.0.1")
+    # count of multicast addresses, less than 20
+    mgroup_count = int(params.get("mgroup_count", 5))
+    flood_minutes = float(params.get("flood_minutes", 10))
+    ifname = vm.get_ifname()
+    prefix = re.findall("\d+.\d+.\d+", mcast)[0]
+    suffix = int(re.findall("\d+", mcast)[-1])
+    # copy python script to guest for joining guest to multicast groups
+    mcast_path = os.path.join(test.bindir, "scripts/join_mcast.py")
+    vm.copy_files_to(mcast_path, "/tmp")
+    output = session.get_command_output("python /tmp/join_mcast.py %d %s %d" %
+                                        (mgroup_count, prefix, suffix))
+    # if success to join multicast the process will be paused, and return pid.
+    if not re.findall("join_mcast_pid:(\d+)", output):
+        raise error.TestFail("Can't join multicast groups,output:%s" % output)
+    pid = re.findall("join_mcast_pid:(\d+)", output)[0]
+
+    try:
+        for i in range(mgroup_count):
+            new_suffix = suffix + i
+            mcast = "%s.%d" % (prefix, new_suffix)
+            logging.info("Initial ping test, mcast: %s", mcast)
+            s, o = kvm_net_utils.ping(mcast, 10, interface=ifname, timeout=20)
+            if s != 0:
+                raise error.TestFail(" Ping return non-zero value %s" % o)
+            logging.info("Flood ping test, mcast: %s", mcast)
+            kvm_net_utils.ping(mcast, None, interface=ifname, flood=True,
+                               output_func=None, timeout=flood_minutes*60)
+            logging.info("Final ping test, mcast: %s", mcast)
+            s, o = kvm_net_utils.ping(mcast, 10, interface=ifname, timeout=20)
+            if s != 0:
+                raise error.TestFail(" Ping return non-zero value %s" % o)
+    finally:
+        session.get_command_output("kill -s SIGCONT %s" % pid)
+        session.close()
diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 9e2b9a0..9594a38 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -374,6 +374,13 @@ variants:
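As a small illustration of two pieces of this test, the sketch below (Python 3, standalone) derives the list of multicast group addresses from a base address and a count, and extracts the pid from the "join_mcast_pid:" line that join_mcast.py prints. Both function names are hypothetical; the splitting on the last dot is a simplification of the prefix/suffix regexes in the patch.

```python
import re


def multicast_groups(base, count):
    """Return `count` consecutive group addresses starting at `base`."""
    prefix, suffix = base.rsplit(".", 1)
    return ["%s.%d" % (prefix, int(suffix) + i) for i in range(count)]


def parse_join_pid(output):
    """Return the pid reported by join_mcast.py as an int, or None."""
    match = re.search(r"join_mcast_pid:(\d+)", output)
    return int(match.group(1)) if match else None
```

For example, multicast_groups("225.0.0.1", 3) yields 225.0.0.1 through 225.0.0.3. The sketch does no range checking, so a large count can run the last octet past 255; the "less than 20" comment in the test keeps the defaults well within bounds.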
[Qemu-devel] [RFC PATCH 13/14] KVM-test: Improve vlan subtest
This is an enhancement of the existing vlan test. Rename vlan_tag.py to
vlan.py, which is more reasonable.
. Setup arp from "/proc/sys/net/ipv4/conf/all/arp_ignore"
. Multiple vlans exist simultaneously
. Test ping between same and different vlans
. Test by TCP data transfer, flood ping between same vlan
. Maximal plumb/unplumb vlans

Signed-off-by: Amos Kong
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/vlan.py b/client/tests/kvm/tests/vlan.py
new file mode 100644
index 000..dc7611b
--- /dev/null
+++ b/client/tests/kvm/tests/vlan.py
@@ -0,0 +1,181 @@
+import logging, time, re
+from autotest_lib.client.common_lib import error
+import kvm_subprocess, kvm_test_utils, kvm_utils, kvm_net_utils
+
+def run_vlan(test, params, env):
+    """
+    Test 802.1Q vlan of NIC, config it by vconfig command.
+
+    1) Create two VMs
+    2) Setup guests in 10 different vlans by vconfig and using hard-coded
+       ip address
+    3) Test by ping between same and different vlans of two VMs
+    4) Test by TCP data transfer, flood ping between same vlan of two VMs
+    5) Test maximal plumb/unplumb vlans
+    6) Recover the vlan config
+
+    @param test: KVM test object.
+    @param params: Dictionary with the test parameters.
+    @param env: Dictionary with test environment.
+    """
+
+    vm = []
+    session = []
+    vm_ip = []
+    digest_origin = []
+    vlan_ip = ['', '']
+    ip_unit = ['1', '2']
+    subnet = params.get("subnet")
+    vlan_num = int(params.get("vlan_num"))
+    maximal = int(params.get("maximal"))
+    file_size = params.get("file_size")
+
+    vm.append(kvm_test_utils.get_living_vm(env, params.get("main_vm")))
+    vm.append(kvm_test_utils.get_living_vm(env, "vm2"))
+
+    def add_vlan(session, id, iface="eth0"):
+        if session.get_command_status("vconfig add %s %s" % (iface, id)) != 0:
+            raise error.TestError("Fail to add %s.%s" % (iface, id))
+
+    def set_ip_vlan(session, id, ip, iface="eth0"):
+        iface = "%s.%s" % (iface, id)
+        if session.get_command_status("ifconfig %s %s" % (iface, ip)) != 0:
+            raise error.TestError("Fail to configure ip for %s" % iface)
+
+    def set_arp_ignore(session, iface="eth0"):
+        ignore_cmd = "echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore"
+        if session.get_command_status(ignore_cmd) != 0:
+            raise error.TestError("Fail to set arp_ignore of %s" % session)
+
+    def rem_vlan(session, id, iface="eth0"):
+        rem_vlan_cmd = "if [[ -e /proc/net/vlan/%s ]];then vconfig rem %s;fi"
+        iface = "%s.%s" % (iface, id)
+        s = session.get_command_status(rem_vlan_cmd % (iface, iface))
+        return s
+
+    def nc_transfer(src, dst):
+        nc_port = kvm_utils.find_free_port(1025, 5334, vm_ip[dst])
+        listen_cmd = params.get("listen_cmd")
+        send_cmd = params.get("send_cmd")
+
+        #listen in dst
+        listen_cmd = listen_cmd % (nc_port, "receive")
+        session[dst].sendline(listen_cmd)
+        time.sleep(2)
+        #send file from src to dst
+        send_cmd = send_cmd % (vlan_ip[dst], str(nc_port), "file")
+        if session[src].get_command_status(send_cmd, timeout = 60) != 0:
+            raise error.TestFail ("Fail to send file"
+                                  " from vm%s to vm%s" % (src+1, dst+1))
+        s, o = session[dst].read_up_to_prompt(timeout=60)
+        if s != True:
+            raise error.TestFail ("Fail to receive file"
+                                  " from vm%s to vm%s" % (src+1, dst+1))
+        #check MD5 message digest of receive file in dst
+        output = session[dst].get_command_output("md5sum receive").strip()
+        digest_receive = re.findall(r'(\w+)', output)[0]
+        if digest_receive == digest_origin[src]:
+            logging.info("file succeed received in vm %s" % vlan_ip[dst])
+        else:
+            logging.info("digest_origin is %s" % digest_origin[src])
+            logging.info("digest_receive is %s" % digest_receive)
+            raise error.TestFail("File transferred differs from origin")
+        session[dst].get_command_status("rm -f receive")
+
+    for i in range(2):
+        session.append(kvm_test_utils.wait_for_login(vm[i],
+                       timeout=int(params.get("login_timeout", 360))))
+        if not session[i] :
+            raise error.TestError("Could not log into guest(vm%d)" % i)
+        logging.info("Logged in")
+
+        #get guest ip
+        vm_ip.append(vm[i].get_address())
+
+        #produce sized file in vm
+        dd_cmd = "dd if=/dev/urandom of=file bs=1024k count=%s"
+        if session[i].get_command_status(dd_cmd % file_size) != 0:
+            raise error.TestFail("File producing failed")
+        #record MD5 message digest of file
+        s, output =session[i].get_command_status_output("md5sum file",
+                                                        timeout=60)
+        if s != 0:
+            raise error.Test
[Qemu-devel] [RFC PATCH 04/14] KVM-test: Add a new subtest ping
This test uses ping to check the virtual nics. It contains two kinds of
test:
1. Packet loss ratio test, ping the guest with different size of packets.
2. Stress test, flood ping guest then use ordinary ping to test the network.

The interval and packet size can be configured through tests_base.cfg

Signed-off-by: Jason Wang
Signed-off-by: Amos Kong
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/ping.py b/client/tests/kvm/tests/ping.py
new file mode 100644
index 000..cfccda4
--- /dev/null
+++ b/client/tests/kvm/tests/ping.py
@@ -0,0 +1,71 @@
+import logging, time, re, commands
+from autotest_lib.client.common_lib import error
+import kvm_subprocess, kvm_test_utils, kvm_utils, kvm_net_utils
+
+
+def run_ping(test, params, env):
+    """
+    Ping the guest with different size of packets.
+
+    Packet Loss Test:
+    1) Ping the guest with different size/interval of packets.
+    Stress Test:
+    1) Flood ping the guest.
+    2) Check if the network is still usable.
+
+    @param test: Kvm test object
+    @param params: Dictionary with the test parameters
+    @param env: Dictionary with test environment.
+    """
+
+    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+    session = kvm_test_utils.wait_for_login(vm)
+
+    counts = params.get("ping_counts", 100)
+    flood_minutes = float(params.get("flood_minutes", 10))
+    nics = params.get("nics").split()
+    strict_check = params.get("strict_check", "no") == "yes"
+
+    packet_size = [0, 1, 4, 48, 512, 1440, 1500, 1505, 4054, 4055, 4096, 4192,
+                   8878, 9000, 32767, 65507]
+
+    try:
+        for i, nic in enumerate(nics):
+            ip = vm.get_address(i)
+            if not ip:
+                logging.error("Could not get the ip of nic index %d" % i)
+                continue
+
+            for size in packet_size:
+                logging.info("Ping with packet size %s" % size)
+                status, output = kvm_net_utils.ping(ip, 10,
+                                                    packetsize = size,
+                                                    timeout = 20)
+                if strict_check:
+                    ratio = kvm_net_utils.get_loss_ratio(output)
+                    if ratio != 0:
+                        raise error.TestFail(" Loss ratio is %s for packet size"
+                                             " %s" % (ratio, size))
+                else:
+                    if status != 0:
+                        raise error.TestFail(" Ping returns non-zero value %s" %
+                                             output)
+
+            logging.info("Flood ping test")
+            kvm_net_utils.ping(ip, None, flood = True, output_func= None,
+                               timeout = flood_minutes * 60)
+
+            logging.info("Final ping test")
+            status, output = kvm_net_utils.ping(ip, counts,
+                                                timeout = float(counts) * 1.5)
+            if strict_check:
+                ratio = kvm_net_utils.get_loss_ratio(output)
+                if ratio != 0:
+                    raise error.TestFail("Packet loss ratio is %s after flood"
+                                         % ratio)
+            else:
+                if status != 0:
+                    raise error.TestFail(" Ping returns non-zero value %s" %
+                                         output)
+    finally:
+        session.close()
diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 6710c00..4f58dc0 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -349,6 +349,11 @@ variants:
                 kill_vm_gracefully_vm2 = no
                 address_index_vm2 = 1

+    - ping: install setup unattended_install.cdrom
+        type = ping
+        counts = 100
+        flood_minutes = 10
+
     - physical_resources_check: install setup unattended_install.cdrom
         type = physical_resources_check
         catch_uuid_cmd = dmidecode | awk -F: '/UUID/ {print $2}'
[Qemu-devel] [RFC PATCH 01/14] KVM-test: Add a new macaddress pool algorithm
The old method uses the mac address from the configuration files, which can
lead to serious problems when multiple tests run on different hosts.

This patch adds a new macaddress pool algorithm. It generates the mac prefix
based on the mac address of the host, which eliminates duplicated mac
addresses between machines.

When the user has set mac_prefix in the configuration file, we should use it
instead of the dynamically generated mac prefix.

Other changes:
. Fix randomly generating mac address so that it corresponds to IEEE802.
. Update clone function to decide clone mac address or not.
. Update get_macaddr function.
. Add set_mac_address function.

New auto mac address pool algorithm:
If address_index is defined, the VM gets its mac from the config file and
then records the mac into address_pool.
If address_index is not defined, the VM calls get_mac_from_pool to auto-create
a mac and then records the mac into address_pool in the following format:
{'macpool': {'AE:9D:94:6A:9b:f9': ['20100310-165222-Wt7l:0']}}

  AE:9D:94:6A:9b:f9: mac address
  20100310-165222-Wt7l: instance attribute of VM
  0: index of NIC

Signed-off-by: Jason Wang
Signed-off-by: Feng Yang
Signed-off-by: Amos Kong
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index fb2d1c2..7c0946e 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -5,6 +5,7 @@ KVM test utility functions.
 """

 import time, string, random, socket, os, signal, re, logging, commands, cPickle
+import fcntl, shelve
 from autotest_lib.client.bin import utils
 from autotest_lib.client.common_lib import error, logging_config
 import kvm_subprocess
@@ -82,6 +83,104 @@ def get_sub_dict_names(dict, keyword):
 # Functions related to MAC/IP addresses

+def get_mac_from_pool(root_dir, vm, nic_index, prefix='00:11:22:33:'):
+    """
+    random generated mac address.
+
+    1) First try to generate macaddress based on the mac address prefix.
+    2) And then try to use total random generated mac address.
+
+    @param root_dir: Root dir for kvm
+    @param vm: Here we use instance of vm
+    @param nic_index: The index of nic.
+    @param prefix: Prefix of mac address.
+    @Return: Return mac address.
+    """
+
+    lock_filename = os.path.join(root_dir, "mac_lock")
+    lock_file = open(lock_filename, 'w')
+    fcntl.lockf(lock_file.fileno() ,fcntl.LOCK_EX)
+    mac_filename = os.path.join(root_dir, "address_pool")
+    mac_shelve = shelve.open(mac_filename, writeback=False)
+
+    mac_pool = mac_shelve.get("macpool")
+
+    if not mac_pool:
+        mac_pool = {}
+    found = False
+
+    val = "%s:%s" % (vm, nic_index)
+    for key in mac_pool.keys():
+        if val in mac_pool[key]:
+            mac_pool[key].append(val)
+            found = True
+            mac = key
+
+    while not found:
+        postfix = "%02x:%02x" % (random.randint(0x00,0xfe),
+                                 random.randint(0x00,0xfe))
+        mac = prefix + postfix
+        mac_list = mac.split(":")
+        # Clear multicast bit
+        mac_list[0] = int(mac_list[0],16) & 0xfe
+        # Set local assignment bit (IEEE802)
+        mac_list[0] = mac_list[0] | 0x02
+        mac_list[0] = "%02x" % mac_list[0]
+        mac = ":".join(mac_list)
+        if mac not in mac_pool.keys() or 0 == len(mac_pool[mac]):
+            mac_pool[mac] = ["%s:%s" % (vm,nic_index)]
+            found = True
+    mac_shelve["macpool"] = mac_pool
+    logging.debug("generating mac addr %s " % mac)
+
+    mac_shelve.close()
+    fcntl.lockf(lock_file.fileno(), fcntl.LOCK_UN)
+    lock_file.close()
+    return mac
+
+
+def put_mac_to_pool(root_dir, mac, vm):
+    """
+    Put the macaddress back to address pool
+
+    @param root_dir: Root dir for kvm
+    @param vm: Here we use instance attribute of vm
+    @param mac: mac address will be put.
+    @Return: mac address.
+    """
+
+    lock_filename = os.path.join(root_dir, "mac_lock")
+    lock_file = open(lock_filename, 'w')
+    fcntl.lockf(lock_file.fileno() ,fcntl.LOCK_EX)
+    mac_filename = os.path.join(root_dir, "address_pool")
+    mac_shelve = shelve.open(mac_filename, writeback=False)
+
+    mac_pool = mac_shelve.get("macpool")
+
+    if not mac_pool or (not mac in mac_pool):
+        logging.debug("Try to free a macaddress does no in pool")
+        logging.debug("macaddress is %s" % mac)
+        logging.debug("pool is %s" % mac_pool)
+    else:
+        if len(mac_pool[mac]) <= 1:
+            mac_pool.pop(mac)
+        else:
+            for value in mac_pool[mac]:
+                if vm in value:
+                    mac_pool[mac].remove(value)
+                    break
+            if len(mac_pool[mac]) == 0:
+                mac_pool.pop(mac)
+
+    mac_shelve["macpool"] = mac_pool
+    logging.debug("freeing mac addr %s " % mac)
+
+    mac_shelve.close()
+    fcntl.lockf(lock_file.fileno(), fcntl.LOCK_UN)
+    lock_file.close()
+    return
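The address generation step described in this patch can be sketched on its own, without the lock file and shelve storage. This is only an illustration under stated assumptions: make_local_unicast_mac and allocate_mac are hypothetical names, and the pool is a plain in-memory dict rather than the address_pool file used by the patch.

```python
import random


def make_local_unicast_mac(prefix="00:11:22:33:", rng=random):
    """Hypothetical helper: append two random octets to a 4-octet prefix,
    then force the first octet to unicast + locally administered."""
    suffix = "%02x:%02x" % (rng.randint(0x00, 0xfe), rng.randint(0x00, 0xfe))
    octets = (prefix + suffix).split(":")
    first = int(octets[0], 16)
    first &= 0xfe  # clear the multicast (I/G) bit
    first |= 0x02  # set the locally administered (U/L) bit
    octets[0] = "%02x" % first
    return ":".join(octets)


def allocate_mac(pool, owner, prefix="00:11:22:33:", rng=random):
    """Hypothetical helper: retry until a MAC with no owners is found,
    then record the owner string ("<vm instance>:<nic index>")."""
    while True:
        mac = make_local_unicast_mac(prefix, rng)
        if not pool.get(mac):
            pool[mac] = [owner]
            return mac
```

Note that with the default prefix the first octet is rewritten from 00 to 02, so generated addresses start with 02:11:22:33 rather than with the literal prefix.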
[Qemu-devel] [RFC PATCH 08/14] KVM-test: Add a subtest of nic promisc
This test mainly covers TCP sent from host to guest and from guest to host
while NIC promiscuous mode is repeatedly turned on and off.

Signed-off-by: Amos Kong
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/nic_promisc.py b/client/tests/kvm/tests/nic_promisc.py
new file mode 100644
index 000..9a0c979
--- /dev/null
+++ b/client/tests/kvm/tests/nic_promisc.py
@@ -0,0 +1,87 @@
+import logging, commands
+from autotest_lib.client.common_lib import error
+import kvm_utils, kvm_test_utils, kvm_net_utils
+
+def run_nic_promisc(test, params, env):
+    """
+    Test nic driver in promisc mode:
+
+    1) Boot up a guest
+    2) Repeatedly enable/disable promiscuous mode in guest
+    3) TCP data transmission from host to guest, and from guest to host,
+       with 1/1460/65000/1 bytes payloads
+    4) Clean temporary files
+    5) Stop enable/disable promiscuous mode change
+
+    @param test: kvm test object
+    @param params: Dictionary with the test parameters
+    @param env: Dictionary with test environment
+    """
+    timeout = int(params.get("login_timeout", 360))
+    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+    session = kvm_test_utils.wait_for_login(vm, timeout=timeout)
+    logging.info("Trying to log into guest '%s' by serial", vm.name)
+    session2 = kvm_utils.wait_for(lambda: vm.serial_login(),
+                                  timeout, 0, step=2)
+    if not session2:
+        raise error.TestFail("Could not log into guest '%s'" % vm.name)
+
+    def compare(filename):
+        cmd = "md5sum %s" % filename
+        s1, ret_host = commands.getstatusoutput(cmd)
+        s2, ret_guest = session.get_command_status_output(cmd)
+        if s1 != 0 or s2 != 0:
+            logging.debug("ret_host:%s, ret_guest:%s" % (ret_host, ret_guest))
+            logging.error("Could not get md5, cmd:%s" % cmd)
+            return False
+        if ret_host.strip() != ret_guest.strip():
+            logging.debug("ret_host :%s, ret_guest:%s" % (ret_host, ret_guest))
+            logging.error("Files' md5sum mismatch")
+            return False
+        return True
+
+    ethname = kvm_net_utils.get_linux_ifname(session, vm.get_macaddr(0))
+    set_promisc_cmd = "ip link set %s promisc on; sleep 0.01;" % ethname
+    set_promisc_cmd += "ip link set %s promisc off; sleep 0.01" % ethname
+    logging.info("Set promisc change repeatedly in guest")
+    session2.sendline("while true; do %s; done" % set_promisc_cmd)
+
+    dd_cmd = "dd if=/dev/urandom of=%s bs=%d count=1"
+    filename = "/tmp/nic_promisc_file"
+    file_size = params.get("file_size", "1, 1460, 65000, 1").split(",")
+    try:
+        for size in file_size:
+            logging.info("Create %s bytes file on host" % size)
+            s, o = commands.getstatusoutput(dd_cmd % (filename, int(size)))
+            if s != 0:
+                logging.debug("Output: %s"% o)
+                raise error.TestFail("Create file on host failed")
+
+            logging.info("Transfer file from host to guest")
+            if not vm.copy_files_to(filename, filename):
+                raise error.TestFail("File transfer failed")
+            if not compare(filename):
+                raise error.TestFail("Compare file failed")
+
+            logging.info("Create %s bytes file on guest" % size)
+            if session.get_command_status(dd_cmd % (filename, int(size)),
+                                          timeout=100) != 0:
+                raise error.TestFail("Create file on guest failed")
+
+            logging.info("Transfer file from guest to host")
+            if not vm.copy_files_from(filename, filename):
+                raise error.TestFail("File transfer failed")
+            if not compare(filename):
+                raise error.TestFail("Compare file failed")
+
+        logging.info("Clean temporal files")
+        cmd = "rm -f %s" % filename
+        s1, o = commands.getstatusoutput(cmd)
+        s2 = session.get_command_status(cmd)
+        if s1 != 0 or s2 != 0:
+            raise error.TestError("Fail to clean temporal files")
+    finally:
+        logging.info("Restore the %s to the nonpromisc mode" % ethname)
+        session2.close()
+        session.get_command_status("ip link set %s promisc off" % ethname)
+        session.close()
diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 03d15c0..9e2b9a0 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -370,6 +370,10 @@ variants:
         scp_timeout = 300
         thread_num = 10

+    - nic_promisc: install setup unattended_install.cdrom
+        type = nic_promisc
+        file_size = 1, 1460, 65000, 1
+
     - physical_resources_check: install setup unattended_install.cdrom
         type = physical_resources_check
         catch_uuid_cmd = dmidecode | awk -F: '/UU
[Qemu-devel] [RFC PATCH 07/14] KVM-test: Add a subtest of load/unload nic driver
Repeatedly load/unload the nic driver while threads transfer files between
guest and host at the same time, then check the md5sum.

Signed-off-by: Amos Kong
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/nicdriver_unload.py b/client/tests/kvm/tests/nicdriver_unload.py
new file mode 100644
index 000..22f9f44
--- /dev/null
+++ b/client/tests/kvm/tests/nicdriver_unload.py
@@ -0,0 +1,128 @@
+import logging, commands, threading, re, os
+from autotest_lib.client.common_lib import error
+import kvm_utils, kvm_test_utils, kvm_net_utils
+
+def run_nicdriver_unload(test, params, env):
+    """
+    Test nic driver
+
+    1) Boot a vm
+    2) Get the nic driver name
+    3) Repeatedly unload/load nic driver
+    4) Multi-session TCP transfer on test interface
+    5) Check the test interface should still work
+
+    @param test: KVM test object
+    @param params: Dictionary with the test parameters
+    @param env: Dictionary with test environment.
+    """
+    timeout = int(params.get("login_timeout", 360))
+    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+    session = kvm_test_utils.wait_for_login(vm, timeout=timeout)
+    logging.info("Trying to log into guest '%s' by serial", vm.name)
+    session2 = kvm_utils.wait_for(lambda: vm.serial_login(),
+                                  timeout, 0, step=2)
+    if not session2:
+        raise error.TestFail("Could not log into guest '%s'" % vm.name)
+
+    ethname = kvm_net_utils.get_linux_ifname(session, vm.get_macaddr(0))
+    try:
+        # FIXME: Try three ways to get nic driver name, because the
+        # modprobe.conf is dropped in latest system, and ethtool method is not
+        # supported by virtio_nic.
+
+        output = session.get_command_output("cat /etc/modprobe.conf")
+        driver = re.findall(r'%s (\w+)' % ethname,output)
+        if not driver:
+            output = session.get_command_output("ethtool -i %s" % ethname)
+            driver = re.findall(r'driver: (\w+)', output)
+        if not driver:
+            output = session.get_command_output("lspci -k")
+            driver = re.findall("Ethernet controller.*\n.*\n.*Kernel driver"
+                                " in use: (\w+)", output)
+        driver = driver[0]
+    except IndexError:
+        raise error.TestError("Could not find driver name")
+
+    logging.info("driver is %s" % driver)
+
+    class ThreadScp(threading.Thread):
+        def run(self):
+            remote_file = '/tmp/' + self.getName()
+            file_list.append(remote_file)
+            ret = vm.copy_files_to(file_name, remote_file, timeout=scp_timeout)
+            logging.debug("Copy result of %s: %s" % (remote_file, ret))
+
+    def compare(origin_file, receive_file):
+        cmd = "md5sum %s"
+        output1 = commands.getstatusoutput(cmd % origin_file)[1].strip()
+        check_sum1 = output1.split()[0]
+        s, output2 = session.get_command_status_output(cmd % receive_file)
+        if s != 0:
+            logging.error("Could not get md5sum of receive_file")
+            return False
+        check_sum2 = output2.strip().split()[0]
+        logging.debug("origin: %s, receive: %s" % (check_sum1, check_sum2))
+        if check_sum1 != check_sum2:
+            logging.error("md5sum doesn't match")
+            return False
+        return True
+
+    #produce sized file in host
+    file_size = params.get("file_size")
+    file_name = "/tmp/nicdriver_unload_file"
+    cmd = "dd if=/dev/urandom of=%s bs=%sM count=1"
+    s, o = commands.getstatusoutput(cmd % (file_name, file_size))
+    if s != 0:
+        raise error.TestFail("Fail to create file by dd")
+
+    connect_time = params.get("connect_time")
+    scp_timeout = int(params.get("scp_timeout"))
+    thread_num = int(params.get("thread_num"))
+    file_list = []
+
+    unload_load_cmd = "sleep %s && ifconfig %s down && modprobe -r %s && "
+    unload_load_cmd += "sleep 1 && modprobe %s && ifconfig %s up"
+    unload_load_cmd = unload_load_cmd % (connect_time, ethname, driver,
+                                         driver, ethname)
+    pid = os.fork()
+    if pid != 0:
+        logging.info("unload/load nic driver repeatedly in guest...")
+        while True:
+            logging.debug("Try to unload/load nic drive once")
+            if session2.get_command_status(unload_load_cmd, timeout=120) != 0:
+                session.get_command_output("rm -rf /tmp/Thread-*")
+                raise error.TestFail("Unload/load nic driver failed")
+            pid, s = os.waitpid(pid, os.WNOHANG)
+            status = os.WEXITSTATUS(s)
+            if (pid, status) != (0, 0):
+                logging.debug("Child process ending")
+                break
+    else:
+        logging.info("Multi-session tcp data transfer")
+        threads = []
+        for i in range(thread_num):
+            t = ThreadScp()
+            t.start()
+            threads.append(t)
+        for t in threads:
+
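The three-way driver lookup flagged by the FIXME above can be tried outside a guest session by feeding it captured command output. The sketch below is an assumption-laden illustration: find_nic_driver is a hypothetical name, it takes the text of /etc/modprobe.conf, `ethtool -i` and `lspci -k` as plain strings, and it reuses the regular expressions from the patch (with the interface name escaped).

```python
import re


def find_nic_driver(ethname, modprobe_conf="", ethtool_out="", lspci_out=""):
    """Hypothetical helper: return the first driver name found by trying
    the three sources in the same order as the patch, or None."""
    candidates = [
        (r"%s (\w+)" % re.escape(ethname), modprobe_conf),
        (r"driver: (\w+)", ethtool_out),
        (r"Ethernet controller.*\n.*\n.*Kernel driver in use: (\w+)",
         lspci_out),
    ]
    for pattern, text in candidates:
        found = re.findall(pattern, text)
        if found:
            return found[0]
    return None
```

Because the capture group is `\w+`, a hyphenated driver name would be cut at the first hyphen, and the lspci pattern assumes the "Kernel driver in use" line is exactly two lines below the controller line.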
[Qemu-devel] [RFC PATCH 03/14] KVM Test: Add a common ping module for network related tests
kvm_net_utils.py is just a place that wraps common network-related commands
used by the network-related tests.
Use -1 as the packet loss ratio when the ping output cannot be parsed.
Use quiet mode when doing the flood ping.

Signed-off-by: Jason Wang
Signed-off-by: Amos Kong
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/kvm_net_utils.py b/client/tests/kvm/kvm_net_utils.py
index ede4965..8a71858 100644
--- a/client/tests/kvm/kvm_net_utils.py
+++ b/client/tests/kvm/kvm_net_utils.py
@@ -1,4 +1,114 @@
-import re
+import logging, re, signal
+from autotest_lib.client.common_lib import error
+import kvm_subprocess, kvm_utils
+
+def get_loss_ratio(output):
+    """
+    Get the packet loss ratio from the output of ping
+
+    @param output
+    """
+    try:
+        return int(re.findall('(\d+)% packet loss', output)[0])
+    except IndexError:
+        logging.debug(output)
+        return -1
+
+def raw_ping(command, timeout, session, output_func):
+    """
+    Low-level ping command execution.
+
+    @param command: ping command
+    @param timeout: timeout of the ping command
+    @param session: local execution hint or session to execute the ping command
+    """
+    if session == "localhost":
+        process = kvm_subprocess.run_bg(command, output_func=output_func,
+                                        timeout=timeout)
+
+        # Send SIGINT signal to notify the timeout of running ping process,
+        # Because ping have the ability to catch the SIGINT signal so we can
+        # always get the packet loss ratio even if timeout.
+        if process.is_alive():
+            kvm_utils.kill_process_tree(process.get_pid(), signal.SIGINT)
+
+        status = process.get_status()
+        output = process.get_output()
+
+        process.close()
+        return status, output
+    else:
+        session.sendline(command)
+        status, output = session.read_up_to_prompt(timeout=timeout,
+                                                   print_func=output_func)
+        if status is False:
+            # Send ctrl+c (SIGINT) through ssh session
+            session.sendline("\003")
+            status, output2 = session.read_up_to_prompt(print_func=output_func)
+            output += output2
+            if status is False:
+                # We also need to use this session to query the return value
+                session.sendline("\003")
+
+        session.sendline(session.status_test_command)
+        s2, o2 = session.read_up_to_prompt()
+        if s2 is False:
+            status = -1
+        else:
+            try:
+                status = int(re.findall("\d+", o2)[0])
+            except:
+                status = -1
+
+        return status, output
+
+def ping(dest = "localhost", count = None, interval = None, interface = None,
+         packetsize = None, ttl = None, hint = None, adaptive = False,
+         broadcast = False, flood = False, timeout = 0,
+         output_func = logging.debug, session = "localhost"):
+    """
+    Wrapper of ping.
+
+    @param dest: destination address
+    @param count: count of icmp packet
+    @param interval: interval of two icmp echo request
+    @param interface: specified interface of the source address
+    @param packetsize: packet size of icmp
+    @param ttl: ip time to live
+    @param hint: path mtu discovery hint
+    @param adaptive: adaptive ping flag
+    @param broadcast: broadcast ping flag
+    @param flood: flood ping flag
+    @param timeout: timeout for the ping command
+    @param output_func: function used to log the result of ping
+    @param session: local execution hint or session to execute the ping command
+    """
+
+    command = "ping %s " % dest
+
+    if count is not None:
+        command += " -c %s" % count
+    if interval is not None:
+        command += " -i %s" % interval
+    if interface is not None:
+        command += " -I %s" % interface
+    if packetsize is not None:
+        command += " -s %s" % packetsize
+    if ttl is not None:
+        command += " -t %s" % ttl
+    if hint is not None:
+        command += " -M %s" % hint
+    if adaptive is True:
+        command += " -A"
+    if broadcast is True:
+        command += " -b"
+    if flood is True:
+        # temporary workaround as the kvm_subprocess may not properly handle
+        # the timeout for the output of flood ping
+        command += " -f -q"
+        output_func = None
+
+    return raw_ping(command, timeout, session, output_func)

 def get_linux_ifname(session, mac_address):
     """
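The two pure pieces of this module, parsing the loss ratio and assembling the command line, can be exercised without autotest. The sketch below is a cut-down illustration, not the module itself: build_ping_command is a hypothetical name covering only three of the options, and get_loss_ratio mirrors the patch's -1 convention for unparsable output.

```python
import re


def get_loss_ratio(output):
    """Return the integer packet loss percentage reported by ping,
    or -1 when no "N% packet loss" summary is present."""
    match = re.search(r"(\d+)% packet loss", output)
    if match is None:
        return -1
    return int(match.group(1))


def build_ping_command(dest, count=None, packetsize=None, flood=False):
    """Hypothetical helper: assemble a ping command line from a subset
    of the options handled by the ping() wrapper in the patch."""
    command = "ping %s" % dest
    if count is not None:
        command += " -c %s" % count
    if packetsize is not None:
        command += " -s %s" % packetsize
    if flood:
        # Quiet mode keeps flood ping from emitting per-packet output.
        command += " -f -q"
    return command
```

A caller that checks `ratio != 0`, as the ping subtest does in strict mode, will also treat the -1 "could not parse" value as a failure.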
[Qemu-devel] [Autotest][RFC PATCH 00/14] Patchset of network related subtests
The following series contains 11 network-related subtests. Suggestions about
correctness, design, and enhancements are welcome.

Thank you so much!

---

Amos Kong (14):
      KVM-test: Add a new macaddress pool algorithm
      KVM Test: Add a function get_interface_name() to kvm_net_utils.py
      KVM Test: Add a common ping module for network related tests
      KVM-test: Add a new subtest ping
      KVM-test: Add a subtest jumbo
      KVM-test: Add basic file transfer test
      KVM-test: Add a subtest of load/unload nic driver
      KVM-test: Add a subtest of nic promisc
      KVM-test: Add a subtest of multicast
      KVM-test: Add a subtest of pxe
      KVM-test: Add a subtest of changing mac address
      KVM-test: Add a subtest of netperf
      KVM-test: Improve vlan subtest
      KVM-test: Add subtest of testing offload by ethtool

 0 files changed, 0 insertions(+), 0 deletions(-)

--
Amos Kong
[Qemu-devel] Bonjour..
Hello to you,

Can you make a living from the internet? Many people ask themselves this question. The free methods available to everyone do not let you live off the internet (500 euros a month at the very most). These methods let you earn a little pocket money, which is already very good. Many tiered schemes promise very large sums in the form of annuities if you invest up front: beware, most of the time these sites never pay. But it really is possible to make a living on the internet.

If you want to know more, come to http://gagnezargentnet.awardspace.biz to learn how to go about making money on the internet, entirely legally of course. So if you are interested, you have no reason not to come and learn. Many have already done so and others will. It is not easy money, it is just another way of earning it.

If you have questions, I invite you to contact me by e-mail at the address below.

garcia_bast...@hotmail.com

Regards,
Garcia Bastien
Specialist in business and marketing on the Net.
[Qemu-devel] [Autotest PATCH 00/14] Patchset of network related subtests
The following series contains 11 network-related subtests. Suggestions about
correctness, design, and enhancements are welcome.

Thank you so much!

---

Amos Kong (14):
      KVM-test: Add a new macaddress pool algorithm
      KVM Test: Add a function get_interface_name() to kvm_net_utils.py
      KVM Test: Add a common ping module for network related tests
      KVM-test: Add a new subtest ping
      KVM-test: Add a subtest jumbo
      KVM-test: Add basic file transfer test
      KVM-test: Add a subtest of load/unload nic driver
      KVM-test: Add a subtest of nic promisc
      KVM-test: Add a subtest of multicast
      KVM-test: Add a subtest of pxe
      KVM-test: Add a subtest of changing mac address
      KVM-test: Add a subtest of netperf
      KVM-test: Improve vlan subtest
      KVM-test: Add subtest of testing offload by ethtool

 0 files changed, 0 insertions(+), 0 deletions(-)

--
Amos Kong
[Qemu-devel] [PULL] vhost, e1000
The following changes since commit 488243b0e9126daa5f1e7fb2e97717b66a977517:

  target-ppc: fix power mode checking on 7400/7410 (2010-07-19 00:33:29 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu.git for_anthony

Michael S. Tsirkin (4):
      Merge branch 'master' into pci
      e1000: fix access 4 bytes beyond buffer end
      e1000: secrc support
      vhost: fix miration during device start

 hw/e1000.c       |   12 ++--
 hw/pci-hotplug.c |    2 +-
 hw/pci.c         |   34 +-
 hw/pcnet.c       |   16 ++--
 hw/rtl8139.c     |    3 ---
 hw/vhost.c       |   21 +++--
 hw/virtio-net.c  |   41 -
 hw/vmware_vga.c  |    3 ---
 sysemu.h         |    1 -
 9 files changed, 77 insertions(+), 56 deletions(-)
[Qemu-devel] KVM call agenda for July 20
Please send in any agenda items you are interested in covering. thanks, -chris
[Qemu-devel] Re: [PATCH] block: Use error codes from lower levels for error message
On 19.07.2010 14:26, Kevin Wolf wrote:
> On 18.07.2010 21:42, Stefan Weil wrote:
> > "No such file or directory" is a misleading error message
> > when a user tries to open a file with wrong permissions.
> >
> > Cc: Kevin Wolf
> > Signed-off-by: Stefan Weil
> > ---
> >  block.c |   12
> >  1 files changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/block.c b/block.c
> > index f837876..2f80540 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -330,16 +330,20 @@ BlockDriver *bdrv_find_protocol(const char *filename)
> >      return NULL;
> >  }
> >
> > -static BlockDriver *find_image_format(const char *filename)
> > +static BlockDriver *find_image_format(const char *filename, int *error)
>
> Wouldn't it be a more natural interface to return an 0/-errno int and
> pass the BlockDriver* by reference? I think we already have some
> function that work this way in the block code, but I can't remember any
> that get an int *error.

... nor did I find a function which takes a BlockDriver**.
But if you prefer it like that, I can send a new version of the patch.

> >  {
> >      int ret, score, score_max;
> >      BlockDriver *drv1, *drv;
> >      uint8_t buf[2048];
> >      BlockDriverState *bs;
> >
> > +    *error = -ENOENT;
>
> Why -ENOENT is the default would be clearer if you moved it down next to
> the drv = NULL before the loop that searches for the driver.

What about the "return bdrv_find_format" lines? They need a default value,
too. And I did not want to change too much because I cannot run a complete
test for all cases. So setting *error at the beginning should be the safest
modification.

> Apart from these minor nitpicks it looks good.
>
> Kevin

Thanks.

Stefan
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On 07/19/2010 11:11 AM, Gleb Natapov wrote:
On Mon, Jul 19, 2010 at 10:54:03AM -0500, Anthony Liguori wrote:
On 07/19/2010 09:53 AM, Gleb Natapov wrote:
On Mon, Jul 19, 2010 at 09:45:58AM -0500, Anthony Liguori wrote:
On 07/19/2010 02:33 AM, Gleb Natapov wrote:
On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:

That is what I am worrying about too. If we are adding a device we have to be sure such a device can actually exist on real hw too, otherwise we may have problems later.

I don't understand why the constraints of real h/w have anything to do with this. Can you explain?

Each time we do something not architectural it causes us trouble later. So the constraints of real h/w are our constraints too.

Your constraints are purely artificial.

What is artificial about it? Each time we break them we suffer.

Just because something doesn't fit as an ISA or PCI device doesn't mean it can't exist in real life. There are plenty of one-off devices with odd interfaces.

And there are such that cause the cpu to stall for 6.5 seconds when you do io to them?

That would certainly be a poorly designed interface. I can appreciate your point and I think suggesting that we should implement an ad-hoc completion interface is reasonable. For instance,

    outl(FW_CFG_SET_INITRD_ADDR, addr)
    while !inb(FW_CFG_INITRD_READY):
        // spin

There are plenty of places that something like fw_cfg could live and still do DMA. It can directly hang off of the Southbridge. It doesn't necessarily need to be connected to the ISA/LPC buses.

Examples of real HW?

The IBM IMM, HP ILO, or Intel iAMT modules. They basically play an identical role to fw_cfg.

So what are their interfaces? Maybe we should emulate one.

The interface to firmware is private and changes from platform to platform. The IMM exposes various interfaces to the OS as it implements a number of legacy devices. It also exposes a side-channel (very similar to virtio-console) as a USB RNDIS driver. I believe it implements IPMI over a private ethernet type although I'd have to double check. It may actually use TCP/IP.

Of course it is possible to add a proper DMA interface to fw_cfg, but should we do it for such a small gain?

I think an ad-hoc DMA interface is perfectly reasonable to do. I agree that adding a more generic DMA interface is overkill.

It should look like real DMA at least. The justification for it should be better than "In our project we don't want to do this and we don't want to do that so our initrd is 100M now, so why not add a hack to qemu to load it 1 second faster so we can grow it some more".

I certainly agree that adding a polling interface for DMA completion is a reasonable requirement.

Regards,

Anthony Liguori

--
Gleb.
[Qemu-devel] [PATCH v2] loadvm: improve tests before bdrv_snapshot_goto()
This patch improves the resilience of the load_vmstate() function, doing further and better ordered tests. In load_vmstate(), if there is any error on bdrv_snapshot_goto(), except if the error is on VM state device, load_vmstate() will return zero and the VM will be started with major corruption chances. The current process: - test if there is any writable device without snapshot support - if exists return -error - get the device that saves the VM state, possible return -error but unlikely because it was tested earlier - flush I/O - run bdrv_snapshot_goto() on devices - if fails, give an warning and goes to the next (not good!) - if fails on the VM state device, return zero (not good!) - check if the requested snapshot exists on the device that saves the VM state and the state is not zero - if fails return -error - open the file with the VM state - if fails return -error - load the VM state - if fails return -error - return zero New behavior: - get the device that saves the VM state - if fails return -error - check if the requested snapshot exists on the device that saves the VM state and the state is not zero - if fails return -error - test if there is any writable device without snapshot support - if exists return -error - test if the devices with snapshot support have the requested snapshot - if anyone fails, return -error - flush I/O - run snapshot_goto() on devices - if anyone fails, return -error - open the file with the VM state - if fails return -error - load the VM state - if fails return -error - return zero do_loadvm must not call vm_start if any error has occurred in load_vmstate. 
Changelog from v1 --- - Use -ENOTSUP instead of -EINVAL when no device supports snapshots - Split the verification of the existance of an snapshot on the VM state device and the verification of the size of the saved VM state Signed-off-by: Miguel Di Ciurcio Filho --- monitor.c |3 +- savevm.c | 71 +--- 2 files changed, 36 insertions(+), 38 deletions(-) diff --git a/monitor.c b/monitor.c index 45fd482..aa60cfa 100644 --- a/monitor.c +++ b/monitor.c @@ -2270,8 +2270,9 @@ static void do_loadvm(Monitor *mon, const QDict *qdict) vm_stop(0); -if (load_vmstate(name) >= 0 && saved_vm_running) +if (load_vmstate(name) == 0 && saved_vm_running) { vm_start(); +} } int monitor_get_fd(Monitor *mon, const char *fdname) diff --git a/savevm.c b/savevm.c index ee27989..f21873e 100644 --- a/savevm.c +++ b/savevm.c @@ -1804,12 +1804,27 @@ void do_savevm(Monitor *mon, const QDict *qdict) int load_vmstate(const char *name) { -BlockDriverState *bs, *bs1; +BlockDriverState *bs, *bs_vm_state; QEMUSnapshotInfo sn; QEMUFile *f; int ret; -/* Verify if there is a device that doesn't support snapshots and is writable */ +bs_vm_state = bdrv_snapshots(); +if (!bs_vm_state) { +error_report("No block device supports snapshots"); +return -ENOTSUP; +} + +/* Don't even try to load empty VM states */ +ret = bdrv_snapshot_find(bs_vm_state, &sn, name); +if (ret < 0) { +return ret; +} else if (sn.vm_state_size == 0) { +return -EINVAL; +} + +/* Verify if there is any device that doesn't support snapshots and is +writable and check if the requested snapshot is available too. 
*/ bs = NULL; while ((bs = bdrv_next(bs))) { @@ -1822,63 +1837,45 @@ int load_vmstate(const char *name) bdrv_get_device_name(bs)); return -ENOTSUP; } -} -bs = bdrv_snapshots(); -if (!bs) { -error_report("No block device supports snapshots"); -return -EINVAL; +ret = bdrv_snapshot_find(bs, &sn, name); +if (ret < 0) { +error_report("Device '%s' does not have the requested snapshot '%s'", + bdrv_get_device_name(bs), name); +return ret; +} } /* Flush all IO requests so they don't interfere with the new state. */ qemu_aio_flush(); -bs1 = NULL; -while ((bs1 = bdrv_next(bs1))) { -if (bdrv_can_snapshot(bs1)) { -ret = bdrv_snapshot_goto(bs1, name); +bs = NULL; +while ((bs = bdrv_next(bs))) { +if (bdrv_can_snapshot(bs)) { +ret = bdrv_snapshot_goto(bs, name); if (ret < 0) { -switch(ret) { -case -ENOTSUP: -error_report("%sSnapshots not supported on device '%s'", - bs != bs1 ? "Warning: " : "", - bdrv_get_device_name(bs1)); -break; -case -ENOENT: -error_report("%sCould not find snapshot '%s' on device '%s'", - bs != bs1 ? "Warning: " : "", -
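The ordering described in the patch message (check every device first, change state only afterwards) can be sketched outside QEMU as below. This is a minimal illustration under stated assumptions, not the patch's code: struct mock_dev, mock_has_snapshot() and mock_load_snapshot() are hypothetical stand-ins for BlockDriverState and the bdrv_* helpers, and the error codes are only chosen to resemble the ones used in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a block device; not QEMU's BlockDriverState. */
struct mock_dev {
    const char *name;
    int writable;
    int can_snapshot;
    const char *snapshot;   /* name of the one snapshot it holds, or NULL */
    int reverted;           /* set to 1 once the device was rolled back */
};

static int mock_has_snapshot(const struct mock_dev *d, const char *snap)
{
    return d->snapshot != NULL && strcmp(d->snapshot, snap) == 0;
}

/*
 * Phase 1 inspects every device and modifies nothing.
 * Phase 2 runs only if phase 1 passed for all devices.
 * Returns 0 on success or a negative errno value.
 */
static int mock_load_snapshot(struct mock_dev *devs, size_t n, const char *snap)
{
    size_t i;

    assert(devs != NULL && snap != NULL);

    /* Phase 1: validation only. */
    for (i = 0; i < n; i++) {
        if (!devs[i].can_snapshot) {
            if (devs[i].writable) {
                return -ENOTSUP;    /* writable device without snapshots */
            }
            continue;               /* read-only device: nothing to revert */
        }
        if (!mock_has_snapshot(&devs[i], snap)) {
            return -ENOENT;         /* snapshot missing on this device */
        }
    }

    /* Phase 2: the state-changing step, reached only when all checks passed. */
    for (i = 0; i < n; i++) {
        if (devs[i].can_snapshot) {
            devs[i].reverted = 1;
        }
    }
    return 0;
}
```

With two devices where only the first holds the requested snapshot, the call returns -ENOENT and neither device is touched, which is the property the reordering is meant to provide.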
[Qemu-devel] Re: [PATCH] loadvm: improve tests before bdrv_snapshot_goto()
On Mon, Jul 19, 2010 at 11:22 AM, Kevin Wolf wrote: >> >> - /* Verify if there is a device that doesn't support snapshots and is >> writable */ >> + bs_vm_state = bdrv_snapshots(); >> + if (!bs_vm_state) { >> + error_report("No block device supports snapshots"); >> + return -EINVAL; > > -ENOTSUP? It was -EINVAL before, just kept it. But -ENOTSUP makes more sense. >> + /* Don't even try to load empty VM states */ >> + ret = bdrv_snapshot_find(bs_vm_state, &sn, name); >> + if ((ret >= 0) && (sn.vm_state_size == 0)) { >> + return -EINVAL; >> + } > > You can probably stop here already if ret < 0: > > ret = ... > if (ret < 0) { > return ret; > } else if (sn.vm_state_size == 0) { > return -EINVAL; > } > Better indeed. >> >> @@ -1821,64 +1834,46 @@ int load_vmstate(const char *name) >> error_report("Device '%s' is writable but does not support >> snapshots.", >> bdrv_get_device_name(bs)); >> return -ENOTSUP; >> + } else { > > The then branch has a return, so you don't need the else here and can > have the following code nested one level less. > Ack. I will send v2 shortly. Thanks, Miguel
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 05:47:40PM +0100, Richard W.M. Jones wrote: > On Mon, Jul 19, 2010 at 07:11:37PM +0300, Gleb Natapov wrote: > > And there are such that cause cpu to stall for 6.5 seconds when you do > > io to them? I never said that we should implement ISA or PCI device, I > > don't know why you bring them here. > > Where is "6.5 seconds" coming from? That is the *total boot time* > of the libguestfs appliance, and includes far far more than the > time taken to do the memcpy. > > I timed the call to cpu_physical_memory_write, and it takes 115 > milliseconds with my patch (for an initrd which is 113 MB). > And how much time does it take to load it using string PIO? 1 second 115 milliseconds? I thought 6.5 and 7.5 were image loading times, not total boot times. Stalling vcpu execution for 115 milliseconds may be unfortunate, but it is not as catastrophic as 6.5 seconds. But the interface will be there for everyone to use, so it may eventually be abused even more. > > It should look like real DMA at least. The justification for it should > > be better than "In our project we don't want to do this and we don't > > want to do that so our initrd is 100M now, so why not add hack to qemu > > to load it 1 second faster so we can grow it some more". > > Please don't make stuff up. We have a large initrd for perfectly good > reasons which I have outlined in a previous email. Those reasons do not look good to me at all. Cleaning up an existing distro is not much less work than creating a basic distro with only the things you need, but the result is much better. When I worked on an embedded project almost 10 years ago we tried to clean up a generic Red Hat Linux and the result was still huge. By building our own distro we were able to squeeze two root partitions into 64M compressed. You also do not want to consider putting things on a cdrom because of some issues that should be solvable. -- Gleb.
[Qemu-devel] [PATCH v3 1/2] QMP: Introduce the documentation for query-qdm
--- qemu-monitor.hx | 71 +++ 1 files changed, 71 insertions(+), 0 deletions(-) diff --git a/qemu-monitor.hx b/qemu-monitor.hx index 2af3de6..4e6062b 100644 --- a/qemu-monitor.hx +++ b/qemu-monitor.hx @@ -2490,6 +2490,77 @@ STEXI show device tree @item info qdm show qdev device model list +ETEXI +SQMP +query-qdm +- + +Describe the capabilities of all devices registered with qdev. + +The returned output is a json-array, each element is a json-object describing +a single device type. + +Each json-object contains the following: + +- "name": name of the device (json-string) +- "bus": the name of the bus type for the device (json-string) +- Possible values: PCI, SCSI, I2C, ISA, SSI, USB, virtio-serial-bus, System, +IDE, s390-virtio +- "alias": an alias by which the device is also known (json-string, optional) +- "description": description of the device (json-string, optional) +- "creatable": whether this device can be created by the user (json-boolean) +- "properties": a json-array where each item is a json-object that describes a + property of the device. If the device has no property to be setup, this item + will not be present. Each json-object contains the following: + - "name": the name of the property (json-string) + - "type": the json type of the property (json-string) +- Possible values: integer, string, boolean + +Example: + +-> { "execute": "query-qdm" } +<- { + "return": [ +{ + "name": "virtio-blk-pci", + "creatable": true, + "bus": "PCI", + "properties": [ + { + "name": "indirect_desc", + "type": "boolean" + }, + { + "name": "logical_block_size", + "type": "integer" + }, + { + "name": "opt_io_size", + "type": "integer" + }, + { + "name": "drive", + "type": "string" + } + ] +}, +{ + "name": "virtio-balloon-pci", + "creatable": true, + "bus": "PCI", + "properties": [ + { + "name": "indirect_desc", + "type": "boolean" + } + ] +}, + +] + +EQMP + +STEXI @item info roms show roms @end table -- 1.7.1
[Qemu-devel] [PATCH v3 2/2] monitor: Convert 'info qdm' to QMP
Converts the 'info qdm' command to QMP, allowing the discovery of all devices known to the QEMU binary without relying on command line parameters like -device ? and -device devtype,? This change does not modify the output of the 'info qdm' monitor command. Signed-off-by: Miguel Di Ciurcio Filho --- hw/qdev.c | 110 +++- hw/qdev.h |3 +- monitor.c |3 +- 3 files changed, 112 insertions(+), 4 deletions(-) diff --git a/hw/qdev.c b/hw/qdev.c index e99c73f..d24d42a 100644 --- a/hw/qdev.c +++ b/hw/qdev.c @@ -29,6 +29,7 @@ #include "qdev.h" #include "sysemu.h" #include "monitor.h" +#include "qjson.h" static int qdev_hotplug = 0; @@ -779,13 +780,118 @@ void do_info_qtree(Monitor *mon) qbus_print(mon, main_system_bus, 0); } -void do_info_qdm(Monitor *mon) +static void qdm_list_iter(QObject *obj, void *opaque) +{ + +Monitor *mon = opaque; +QDict *dev = qobject_to_qdict(obj); + +monitor_printf(mon, "name \"%s\", bus %s", qdict_get_str(dev, "name"), +qdict_get_str(dev, "bus")); + +if (qdict_haskey(dev, "alias")) { +monitor_printf(mon, ", alias \"%s\"", qdict_get_str(dev, "alias")); +} + +if (qdict_haskey(dev, "description")) { +monitor_printf(mon, ", desc \"%s\"", qdict_get_str(dev, "description")); +} + +if (!qdict_get_bool(dev, "creatable")) { +monitor_printf(mon, ", no-user"); +} + +monitor_printf(mon, "\n"); +} + +void do_info_qdm_print(Monitor *mon, const QObject *ret_data) +{ +QList *devs; + +devs = qobject_to_qlist(ret_data); +qlist_iter(devs, qdm_list_iter, mon); +} + +static const char *qdev_property_type_to_string(int type) +{ +switch (type) { +case PROP_TYPE_UINT8: +case PROP_TYPE_UINT16: +case PROP_TYPE_UINT32: +case PROP_TYPE_INT32: +case PROP_TYPE_UINT64: +return "integer"; +case PROP_TYPE_TADDR: +case PROP_TYPE_MACADDR: +case PROP_TYPE_DRIVE: +case PROP_TYPE_CHR: +case PROP_TYPE_STRING: +case PROP_TYPE_NETDEV: +return "string"; +case PROP_TYPE_BIT: + return "boolean"; +case PROP_TYPE_UNSPEC: +case PROP_TYPE_VLAN: +case PROP_TYPE_PTR: +return NULL; +} + +return 
NULL; +} + +void do_info_qdm(Monitor *mon, QObject **ret_data) { DeviceInfo *info; +QList *devs = qlist_new(); for (info = device_info_list; info != NULL; info = info->next) { -qdev_print_devinfo(info); +QObject *obj; +QDict *dev; +QList *props = qlist_new(); +Property *prop; + +for (prop = info->props; prop && prop->name; prop++) { +QObject *entry; +/* + * TODO: skip old and hackish stuff, they will be removed some day. + */ +if (!prop->info->parse || prop->info->type == PROP_TYPE_VLAN +|| prop->info->type == PROP_TYPE_PTR +|| prop->info->type == PROP_TYPE_UNSPEC) { +continue; +} + +const char *type = qdev_property_type_to_string(prop->info->type); + +entry = qobject_from_jsonf("{ 'name': %s, 'type': %s }", + prop->name, type); + +qlist_append_obj(props, entry); +} + +obj = qobject_from_jsonf("{ 'name': %s, 'bus': %s, 'creatable': %i }", + info->name, + info->bus_info->name, + info->no_user ? 0 : 1); + +dev = qobject_to_qdict(obj); + +if (!qlist_empty(props)) { +qdict_put(dev, "properties", props); +} + +if (info->alias) { +qdict_put(dev, "alias", qstring_from_str(info->alias)); +} + +if (info->desc) { +qdict_put(dev, "description", qstring_from_str(info->desc)); +} + +qlist_append(devs, dev); } + +*ret_data = QOBJECT(devs); } int do_device_add(Monitor *mon, const QDict *qdict, QObject **ret_data) diff --git a/hw/qdev.h b/hw/qdev.h index 678f8b7..3b0382b 100644 --- a/hw/qdev.h +++ b/hw/qdev.h @@ -184,7 +184,8 @@ void qbus_free(BusState *bus); /*** monitor commands ***/ void do_info_qtree(Monitor *mon); -void do_info_qdm(Monitor *mon); +void do_info_qdm_print(Monitor *mon, const QObject *ret_data); +void do_info_qdm(Monitor *mon, QObject **ret_data); int do_device_add(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_device_del(Monitor *mon, const QDict *qdict, QObject **ret_data); diff --git a/monitor.c b/monitor.c index 45fd482..66810f2 100644 --- a/monitor.c +++ b/monitor.c @@ -2565,7 +2565,8 @@ static const mon_cmd_t info_cmds[] = { .args_type = "", 
.params = "", .help = "show qdev device model list", -.mhandler.info = do_info_qdm, +.user_print = do_info_qdm_print, +.mhandler.info_new = do_info_qdm, }, { .name = "roms", -- 1.7.1
[Qemu-devel] [PATCH v3 0/2] QMP: Introduce query-qdm
This series introduces the documentation for the query-qdm command and the conversion of the monitor command 'info qdm' to QMP. The documentation and code are based on a patch previously sent to qemu-devel by Daniel P. Berrange: http://lists.gnu.org/archive/html/qemu-devel/2010-06/msg00931.html Changelog from v2 - - added IDE and s390-virtio as possible values for "bus" - specify that the "properties" list is optional, in case the device doesn't have anything to be set up - reworded the explanation of "creatable" - reverted the qdev/qmp split, just use "type" and its json equivalent representation - do not list legacy stuff: PROP_TYPE_(VLAN|PTR|UNSPEC) Changelog from v1 - - renamed "props" to "properties" - updated the examples - reworded the explanations of "name" and "description" - split "type" into a json-object, adding "qmp" and "qdev" - list all possible values for "bus" - list all possible values for "qdev" on "type" - list all possible values for "qmp" on "type" Changes from Daniel's original patch: - Split the patch in two, taking out the documentation from the code - Reworded some parts of the documentation and added data types - Small cleanups and renamed do_info_devices() to do_info_qdm() - Added do_info_qdm_print() to be used in the monitor Regards, Miguel --- Miguel Di Ciurcio Filho (2): QMP: Introduce the documentation for query-qdm monitor: Convert 'info qdm' to QMP hw/qdev.c | 110 ++- hw/qdev.h |3 +- monitor.c |3 +- qemu-monitor.hx | 71 +++ 4 files changed, 183 insertions(+), 4 deletions(-)
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 07:11:37PM +0300, Gleb Natapov wrote: > And there are such that cause cpu to stall for 6.5 seconds when you do > io to them? I never said that we should implement ISA or PCI device, I > don't know why you bring them here. Where is "6.5 seconds" coming from? That is the *total boot time* of the libguestfs appliance, and includes far far more than the time taken to do the memcpy. I timed the call to cpu_physical_memory_write, and it takes 115 milliseconds with my patch (for an initrd which is 113 MB). > It should look like real DMA at least. The justification for it should > be better than "In our project we don't want to do this and we don't > want to do that so our initrd is 100M now, so why not add hack to qemu > to load it 1 second faster so we can grow it some more". Please don't make stuff up. We have a large initrd for perfectly good reasons which I have outlined in a previous email. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://et.redhat.com/~rjones/virt-top
[Qemu-devel] [PATCH] Declare code_gen_ptr, code_gen_max_blocks 'static'
Both values are only used in exec.c, so there is no need to make them globally available. Signed-off-by: Stefan Weil --- exec-all.h |2 -- exec.c |4 ++-- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/exec-all.h b/exec-all.h index a775582..58b5575 100644 --- a/exec-all.h +++ b/exec-all.h @@ -191,8 +191,6 @@ void tb_link_page(TranslationBlock *tb, void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr); extern TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE]; -extern uint8_t *code_gen_ptr; -extern int code_gen_max_blocks; #if defined(USE_DIRECT_JUMP) diff --git a/exec.c b/exec.c index 4641b3e..868cd7f 100644 --- a/exec.c +++ b/exec.c @@ -80,7 +80,7 @@ #define SMC_BITMAP_USE_THRESHOLD 10 static TranslationBlock *tbs; -int code_gen_max_blocks; +static int code_gen_max_blocks; TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE]; static int nb_tbs; /* any access to the tbs or the page table must use this lock */ @@ -107,7 +107,7 @@ static uint8_t *code_gen_buffer; static unsigned long code_gen_buffer_size; /* threshold to flush the translated code buffer */ static unsigned long code_gen_buffer_max_size; -uint8_t *code_gen_ptr; +static uint8_t *code_gen_ptr; #if !defined(CONFIG_USER_ONLY) int phys_ram_fd; -- 1.7.1
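As a small illustration of what the patch above relies on: a file-scope object declared static has internal linkage, so other translation units cannot name it and need no extern declaration in a shared header. The sketch below is generic C, not QEMU code; demo_buffer, demo_ptr, demo_alloc() and demo_used() are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* File-local state: invisible to other translation units. */
static unsigned char demo_buffer[64];
static unsigned char *demo_ptr = demo_buffer;   /* next free byte */

/* Other files reach the state only through these two functions. */
void *demo_alloc(size_t len)
{
    void *p;

    assert(len > 0);
    if (len > (size_t)(demo_buffer + sizeof(demo_buffer) - demo_ptr)) {
        return NULL;    /* not enough room left in the buffer */
    }
    p = demo_ptr;
    demo_ptr += len;
    return p;
}

size_t demo_used(void)
{
    return (size_t)(demo_ptr - demo_buffer);
}
```

Dropping the extern declarations from the header, as the patch does, lets the compiler reject any later attempt to use the variables from another file.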
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 10:54:03AM -0500, Anthony Liguori wrote: > On 07/19/2010 09:53 AM, Gleb Natapov wrote: > >On Mon, Jul 19, 2010 at 09:45:58AM -0500, Anthony Liguori wrote: > >>On 07/19/2010 02:33 AM, Gleb Natapov wrote: > >>>On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote: > On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote: > >That what I am warring about too. If we are adding device we have to be > >sure such device can actually exist on real hw too otherwise we may have > >problems later. > I don't understand why the constraints of real h/w have anything to do > with this. Can you explain? > > >>>Each time we do something not architectural it cause us troubles later. > >>>So constraints of real h/w is our constrains to. > >>Your constraints are purely artificial. > >> > >What is artificial about it? Each time we break them we safer. > > Just because something doesn't fit as an ISA or PCI device doesn't > mean it can't exist in real life. There are plenty of one-off > devices with odd interfaces. And there are such that cause cpu to stall for 6.5 seconds when you do io to them? I never said that we should implement ISA or PCI device, I don't know why you bring them here. > > >>There are plenty of places that something like fw_cfg could live and > >>still do DMA. It can directly hang off of the Southbridge. It > >>doesn't necessary need to be connected to the ISA/LPC buses. > >Examples of real HW? > > The IBM IMM, HP ILO, or Intel iAMT modules. They basically play an > identical role to fw_cfg. > So what are their interfaces? May be we should emulate one. > > And I am not against something that does DMA, > >but that is not what proposed patch does. It provides magic io > >instruction that CPU calls and when instruction completes memory is > >updated. This is nothing like DMA. > > Isn't this exactly what the interface for PCI DMA looks like since > there's no standard DMA implementation? 
> Every DMA that I know about supports polling for completion, or it can issue an interrupt at the end of the transaction. I am not even sure you can design HW that will stall the cpu in an IO instruction until some operation is completed. > > Of course it is possible to add > >proper DMA interface to fw_cfg, but should we do it for such a small > >gain? > > I think an ad-hoc DMA interface is perfectly reasonable to do. I > agree that adding a more generic DMA interface is overkill. > It should look like real DMA at least. The justification for it should be better than "In our project we don't want to do this and we don't want to do that so our initrd is 100M now, so why not add hack to qemu to load it 1 second faster so we can grow it some more". -- Gleb.
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On 07/19/2010 09:53 AM, Gleb Natapov wrote: On Mon, Jul 19, 2010 at 09:45:58AM -0500, Anthony Liguori wrote: On 07/19/2010 02:33 AM, Gleb Natapov wrote: On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote: On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote: That what I am warring about too. If we are adding device we have to be sure such device can actually exist on real hw too otherwise we may have problems later. I don't understand why the constraints of real h/w have anything to do with this. Can you explain? Each time we do something not architectural it cause us troubles later. So constraints of real h/w is our constrains to. Your constraints are purely artificial. What is artificial about it? Each time we break them we safer. Just because something doesn't fit as an ISA or PCI device doesn't mean it can't exist in real life. There are plenty of one-off devices with odd interfaces. There are plenty of places that something like fw_cfg could live and still do DMA. It can directly hang off of the Southbridge. It doesn't necessary need to be connected to the ISA/LPC buses. Examples of real HW? The IBM IMM, HP ILO, or Intel iAMT modules. They basically play an identical role to fw_cfg. And I am not against something that does DMA, but that is not what proposed patch does. It provides magic io instruction that CPU calls and when instruction completes memory is updated. This is nothing like DMA. Isn't this exactly what the interface for PCI DMA looks like since there's no standard DMA implementation? Of course it is possible to add proper DMA interface to fw_cfg, but should we do it for such a small gain? I think an ad-hoc DMA interface is perfectly reasonable to do. I agree that adding a more generic DMA interface is overkill. Regards, Anthony Liguori Buses exist to multiplex I/O devices because of limited wiring space on motherboards. There's no reason we need to constrain ourselves to minimize the number of virtual wires we emulate. 
Regards, Anthony Liguori -- Gleb.
[Qemu-devel] Re: [PATCH] ide/atapi: add support for GET EVENT STATUS NOTIFICATION
On 19.07.2010 17:36, Aurelien Jarno wrote: >> Have you tested some more OSes to ensure that they don't start to expect >> events to actually work now the command "works"? I didn't see any >> problems in a quick test with Linux, but you never know. >> > > Besides FreeBSD, I have tested without problem Linux, NetBSD and > OpenBSD, though I haven't tested them more than booting and mounting a > CD-ROM. I guess the interesting thing is changing media. It did work for me on Linux, though, and I just tried Windows 7 and it seems to be fine, too (though I tried Windows on qemu-kvm because I only get a BSOD with upstream qemu) Kevin
[Qemu-devel] Re: [PATCH] ide/atapi: add support for GET EVENT STATUS NOTIFICATION
On Mon, Jul 19, 2010 at 05:28:40PM +0200, Kevin Wolf wrote: > On 19.07.2010 15:53, Aurelien Jarno wrote: > > The GET EVENT STATUS NOTIFICATION is a mandatory command according > > to MMC-3, even if event status notification is not supported. > > > > This patch adds support for this command. It returns NEA ("No Event > > Available") with an empty "Supported Event Classes" to show that it > > doesn't support event status notification. If asynchronous operation is > > requested, which requires NCQ support, it returns an error according > > to the specifications. > > > > This fixes HAL support on FreeBSD and derivatives, which fill up the > > logs every second with: > > > > acd0: FAILURE - unknown CMD (0x03) ILLEGAL REQUEST asc=0x20 ascq=0x00 > > > > Signed-off-by: Aurelien Jarno > > Looks good to me. Thanks for the review. > Would you prefer me to take this into the block branch (actually, I have > already done this) or are you going to commit directly? This might I am fine with it going through the block branch. > actually be something that should be in 0.13. Agreed. > Have you tested some more OSes to ensure that they don't start to expect > events to actually work now the command "works"? I didn't see any > problems in a quick test with Linux, but you never know. > Besides FreeBSD, I have tested Linux, NetBSD and OpenBSD without problems, though I haven't tested more than booting and mounting a CD-ROM. -- Aurelien Jarno GPG: 1024D/F1BCDB73 aurel...@aurel32.net http://www.aurel32.net
[Qemu-devel] Re: [PATCH] ide/atapi: add support for GET EVENT STATUS NOTIFICATION
On 19.07.2010 15:53, Aurelien Jarno wrote: > The GET EVENT STATUS NOTIFICATION is a mandatory command according > to MMC-3, even if event status notification is not supported. > > This patch adds support for this command. It returns NEA ("No Event > Available") with an empty "Supported Event Classes" to show that it > doesn't support event status notification. If asynchronous operation is > requested, which requires NCQ support, it returns an error according > to the specifications. > > This fixes HAL support on FreeBSD and derivatives, which fill up the > logs every second with: > > acd0: FAILURE - unknown CMD (0x03) ILLEGAL REQUEST asc=0x20 ascq=0x00 > > Signed-off-by: Aurelien Jarno Looks good to me. Would you prefer me to take this into the block branch (actually, I have already done this) or are you going to commit directly? This might actually be something that should be in 0.13. Have you tested some more OSes to ensure that they don't start to expect events to actually work now the command "works"? I didn't see any problems in a quick test with Linux, but you never know. Kevin
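For readers who have not seen the patch itself, the behaviour in the quoted description could look roughly like the sketch below. It is an illustration only, not the code under review: gesn_reply() and the GESN_* names are hypothetical, and the field positions (polled bit in byte 1 of the packet, allocation length in bytes 7-8, NEA in bit 7 of byte 2 of the 4-byte event header) are my reading of the MMC command layout and should be checked against the specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GESN_OPCODE      0x4a   /* GET EVENT STATUS NOTIFICATION */
#define GESN_POLLED      0x01   /* command packet byte 1, bit 0 */
#define GESN_NEA         0x80   /* event header byte 2, bit 7 */
#define GESN_HEADER_LEN  4

/*
 * Fills buf with the event header for a device that supports no event
 * class. Returns the number of bytes to transfer, or -1 when the request
 * has to be rejected (asynchronous mode was requested).
 */
static int gesn_reply(const uint8_t *packet, uint8_t *buf, size_t buf_len)
{
    size_t max_len;

    assert(packet != NULL && buf != NULL && buf_len >= GESN_HEADER_LEN);
    assert(packet[0] == GESN_OPCODE);

    if (!(packet[1] & GESN_POLLED)) {
        return -1;              /* only polled operation is handled */
    }

    buf[0] = 0x00;              /* event descriptor length, high byte */
    buf[1] = 0x00;              /* event descriptor length, low byte */
    buf[2] = GESN_NEA;          /* No Event Available */
    buf[3] = 0x00;              /* supported event classes: none */

    /* Never send more than the allocation length the host asked for. */
    max_len = ((size_t)packet[7] << 8) | packet[8];
    return (int)(max_len < GESN_HEADER_LEN ? max_len : GESN_HEADER_LEN);
}
```

A caller would translate the -1 case into whatever error reporting the emulated device uses; the description above only says that an error is returned according to the specifications.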
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 09:52:23AM -0500, Anthony Liguori wrote: > On 07/19/2010 04:15 AM, Gleb Natapov wrote: > >On Mon, Jul 19, 2010 at 11:09:13AM +0200, Alexander Graf wrote: > >>On 19.07.2010, at 11:06, Gleb Natapov wrote: > >> > >>>On Mon, Jul 19, 2010 at 10:00:04AM +0100, Richard W.M. Jones wrote: > virt-install is another program that uses explicit -initrd. > > >>>Installation takes a lot of time. Saving 1 second there will not be > >>>noticeable. And during lifetime of installed VM initrd will be loaded > >>>from its disk. > >>Guys, please. It shouldn't be one or the other. Let's make sure both ways > >>of doing things are fast. That's what users want: fast. > >> > >That what we are talking about, no? We are trying to find faster way to > >load kernel/initrd and stay architectural. > > Modern platforms are not nearly as "architectural" as you would think. > > It's not unusual to hang a custom chip off of the Southbridge that > implements platform specific services along with an array of > "legacy" devices that are implemented mostly in software to cost. > > Other buses (like PS/2) are largely implemented in SMM today by the BIOS. > I don't get your point. What is not "architectural" about all that? -- Gleb.
[Qemu-devel] Automatic generation of code-generator components (RETRY)
Dear developers -- I've seen no responses yet. My proposal is due in early August, so if anyone has feelings on this or comments about it, please respond soon :-) ... Thank you for your patience -- Eliot moss Original Message Subject: [Qemu-devel] Automatic generation of code-generator components Date: Tue, 13 Jul 2010 14:09:56 -0400 From: Eliot Moss Reply-To: m...@cs.umass.edu To: qemu-devel@nongnu.org Dear QEMU developers -- I have had some email conversation with a few active developers, and with their encouragement, want to open it up for the whole list to comment. For several years my research group at UMass has been developing generic code-generator generator (CGG) technology. Historic CGGs have always been tied to a particular code-generation framework, that is, to a particular intermediate representation (IR) and compiler. Our tool, called GIST (for Generator of Instruction Selectors Tool), is designed to work from any reasonable IR and to connect to any reasonable framework. More technical details below, but what we are hoping for is to be able to say that if we make this industrial-strength with some funding from the National Science Foundation, the QEMU community will be interested in using it. No commitment -- just that you think it *might* be a good idea if we can make it go. We would use QEMU as one of our "demo" environments. Ok, more details. We have an architecture description language called CISL (CoGenT Instruction Set Language; CoGenT is our overall project's name). It is somewhat like C or Java in appearance. You define the various memories and registers, and the instructions. To generate an instruction selector from input ISA A (generally a compiler IR, but not necessarily) to output ISA B (generally, but not always, a hardware architecture), you start with descriptions of A and B in CISL -- some of which may already be around. 
You also write what we call a *mapping* from A to B, which simply indicates where on B each memory/register of A should go. The tool then finds instruction selector patterns, at least one for each instruction of the A machine. For any given retargetable *framework* (compiler, interpreter, emulator), we write one *adapter*, that knows how to take GIST patterns in their internal form and write them out in the way that the framework needs them. Here's an example. Suppose we are going from A = QEMU IR to B = MIPS, that is, the same as the TCG back end for an emulator running on the MIPS processor. We have written a CISL description for the QEMU IR (yes, already), and suppose we have one for the MIPS, sufficient for code generation anyway. [Side note: Compilers do not generally use every instruction of their target, e.g., not the privileged mode ones, etc. Also, in the presence of register allocation, they generally target a slightly virtualized machine -- one with a huge number of registers, which register allocation then resolves to real registers and occasional spilled locations.] The mapping would talk about how to find QEMU memory on the MIPS (perhaps a dedicated base register), etc., and would also capture the conventions for calling helper routines, and so on. The adapter for QEMU TCG back ends would generate something like a C switch statement with one case for each QEMU IR instruction. Each case might have some additional case analysis. This is because (as you see in QEMU), a given IR instruction can have special cases depending on values of constants, whether something is in a register, etc. GIST will have found different patterns for each of these, and with each one there would be a *constraint*, indicating when it applies. For example, patterns for adding a constant value on the MIPS would likely have a special case for constants that fit in 16 bits, since then you can use one immediate instruction. 
Likewise, the constant 0 is a special case since it can just be a move. In addition to constraints, patterns have costs, which one can develop for any given target, but would typically be based on number of instructions, number of instruction bytes, number of memory references, etc. Thus the case analysis for a given instruction would check for the lowest cost patterns first, and would conclude with the most general pattern (but which may be the most expensive). The adapter would also need to generate the information needed by the QEMU TCG register allocator. Now, here are some things of additional interest: - While QEMU IR -> emulation host code-generation is maybe the most obvious case, we can also handle the "front end" emulation target -> QEMU IR generation. This probably requires a slightly different description of machine A than when A is the emulation host -- after all, we must handle *all* instructions, including privileged ones, etc. But it is possible to make the descriptions modular in such a way that instructions used in both cases are not repeated. - I noticed that someone is looki
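To make the constraint-and-cost case analysis from the message above concrete, here is a hand-written sketch of what one generated case might look like for "add a constant" on a MIPS-like host. It is not GIST output and not TCG code: emit_add_const() is a hypothetical name, instructions are emitted as assembly text rather than machine code, and the use of $1 as a scratch register is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Emits assembly text for "rd = rs + imm", trying the cheapest pattern
 * whose constraint holds first and falling back to the general one.
 * Returns the number of instructions emitted (the pattern's cost).
 * Assumes rs is not the scratch register $1.
 */
static int emit_add_const(char *out, size_t out_len, int rd, int rs, int32_t imm)
{
    assert(out != NULL && out_len > 0);

    /* Constraint: imm == 0. The add degenerates to a register move. */
    if (imm == 0) {
        snprintf(out, out_len, "move $%d, $%d\n", rd, rs);
        return 1;
    }

    /* Constraint: imm fits in a signed 16-bit immediate field. */
    if (imm >= -32768 && imm <= 32767) {
        snprintf(out, out_len, "addiu $%d, $%d, %d\n", rd, rs, (int)imm);
        return 1;
    }

    /* General pattern: build the constant in $1, then add registers. */
    snprintf(out, out_len,
             "lui $1, 0x%x\nori $1, $1, 0x%x\naddu $%d, $%d, $1\n",
             (unsigned)(((uint32_t)imm >> 16) & 0xffff),
             (unsigned)((uint32_t)imm & 0xffff),
             rd, rs);
    return 3;
}
```

The two one-instruction cases are tested before the three-instruction fallback, matching the idea of checking the lowest-cost patterns first and ending with the most general one.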
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 09:45:58AM -0500, Anthony Liguori wrote: > On 07/19/2010 02:33 AM, Gleb Natapov wrote: > >On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote: > >>On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote: > >>>That is what I am worried about too. If we are adding a device we have to be > >>>sure such a device can actually exist on real hw too, otherwise we may have > >>>problems later. > >>I don't understand why the constraints of real h/w have anything to do > >>with this. Can you explain? > >> > >Each time we do something not architectural it causes us trouble later. > >So the constraints of real h/w are our constraints too. > > Your constraints are purely artificial. > What is artificial about it? Each time we break them we suffer. > There are plenty of places that something like fw_cfg could live and > still do DMA. It can directly hang off of the Southbridge. It > doesn't necessarily need to be connected to the ISA/LPC buses. Examples of real HW? And I am not against something that does DMA, but that is not what the proposed patch does. It provides a magic io instruction that the CPU calls, and when the instruction completes memory is updated. This is nothing like DMA. Of course it is possible to add a proper DMA interface to fw_cfg, but should we do it for such a small gain? > > Buses exist to multiplex I/O devices because of limited wiring space > on motherboards. There's no reason we need to constrain ourselves > to minimize the number of virtual wires we emulate. > > Regards, > > Anthony Liguori -- Gleb.
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On 07/19/2010 04:15 AM, Gleb Natapov wrote: On Mon, Jul 19, 2010 at 11:09:13AM +0200, Alexander Graf wrote: On 19.07.2010, at 11:06, Gleb Natapov wrote: On Mon, Jul 19, 2010 at 10:00:04AM +0100, Richard W.M. Jones wrote: virt-install is another program that uses explicit -initrd. Installation takes a lot of time. Saving 1 second there will not be noticeable. And during lifetime of installed VM initrd will be loaded from its disk. Guys, please. It shouldn't be one or the other. Let's make sure both ways of doing things are fast. That's what users want: fast. That what we are talking about, no? We are trying to find faster way to load kernel/initrd and stay architectural. Modern platforms are not nearly as "architectural" as you would think. It's not unusual to hang a custom chip off of the Southbridge that implements platform specific services along with an array of "legacy" devices that are implemented mostly in software to cost. Other buses (like PS/2) are largely implemented in SMM today by the BIOS. Regards, Anthony Liguori Honestly I would expect much greater speedup from Richard's approach like 2 seconds vs 8 seconds. It is hard to justify code complication just for 1 second speedup. -- Gleb.
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On 07/19/2010 02:33 AM, Gleb Natapov wrote: > On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote: >> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote: >>> That is what I am worrying about too. If we are adding a device we have to be >>> sure such a device can actually exist on real hw too, otherwise we may have >>> problems later. >> I don't understand why the constraints of real h/w have anything to do >> with this. Can you explain? > Each time we do something not architectural it causes us trouble later. > So the constraints of real h/w are our constraints too. Your constraints are purely artificial. There are plenty of places that something like fw_cfg could live and still do DMA. It can directly hang off of the Southbridge. It doesn't necessarily need to be connected to the ISA/LPC buses. Buses exist to multiplex I/O devices because of limited wiring space on motherboards. There's no reason we need to constrain ourselves to minimize the number of virtual wires we emulate. Regards, Anthony Liguori
[Qemu-devel] Re: [PATCH] loadvm: improve tests before bdrv_snapshot_goto()
Am 14.07.2010 20:27, schrieb Miguel Di Ciurcio Filho: > This patch improves the resilience of the load_vmstate() function, doing > further and better ordered tests. > > In load_vmstate(), if there is any error on bdrv_snapshot_goto(), except if > the > error is on VM state device, load_vmstate() will return zero and the VM will > be > started with major corruption chances. > > The current process: > - test if there is any writable device without snapshot support > - if exists return -error > - get the device that saves the VM state, possible return -error but unlikely > because it was tested earlier > - flush I/O > - run bdrv_snapshot_goto() on devices > - if fails, give an warning and goes to the next (not good!) > - if fails on the VM state device, return zero (not good!) > - check if the requested snapshot exists on the device that saves the VM state > and the state is not zero > - if fails return -error > - open the file with the VM state > - if fails return -error > - load the VM state > - if fails return -error > - return zero > > New behavior: > - get the device that saves the VM state > - if fails return -error > - check if the requested snapshot exists on the device that saves the VM state > and the state is not zero > - if fails return -error > - test if there is any writable device without snapshot support > - if exists return -error > - test if the devices with snapshot support have the requested snapshot > - if anyone fails, return -error > - flush I/O > - run snapshot_goto() on devices > - if anyone fails, return -error > - open the file with the VM state > - if fails return -error > - load the VM state > - if fails return -error > - return zero > > do_loadvm must not call vm_start if any error has occurred in load_vmstate. 
> > Signed-off-by: Miguel Di Ciurcio Filho > --- > monitor.c |3 +- > savevm.c | 71 > 2 files changed, 35 insertions(+), 39 deletions(-) > > diff --git a/monitor.c b/monitor.c > index 45fd482..aa60cfa 100644 > --- a/monitor.c > +++ b/monitor.c > @@ -2270,8 +2270,9 @@ static void do_loadvm(Monitor *mon, const QDict *qdict) > > vm_stop(0); > > -if (load_vmstate(name) >= 0 && saved_vm_running) > +if (load_vmstate(name) == 0 && saved_vm_running) { > vm_start(); > +} > } > > int monitor_get_fd(Monitor *mon, const char *fdname) > diff --git a/savevm.c b/savevm.c > index ee27989..9f29cb0 100644 > --- a/savevm.c > +++ b/savevm.c > @@ -1804,12 +1804,25 @@ void do_savevm(Monitor *mon, const QDict *qdict) > > int load_vmstate(const char *name) > { > -BlockDriverState *bs, *bs1; > +BlockDriverState *bs, *bs_vm_state; > QEMUSnapshotInfo sn; > QEMUFile *f; > int ret; > > -/* Verify if there is a device that doesn't support snapshots and is > writable */ > +bs_vm_state = bdrv_snapshots(); > +if (!bs_vm_state) { > +error_report("No block device supports snapshots"); > +return -EINVAL; -ENOTSUP? > +} > + > +/* Don't even try to load empty VM states */ > +ret = bdrv_snapshot_find(bs_vm_state, &sn, name); > +if ((ret >= 0) && (sn.vm_state_size == 0)) { > +return -EINVAL; > +} You can probably stop here already if ret < 0: ret = ... if (ret < 0) { return ret; } else if (sn.vm_state_size == 0) { return -EINVAL; } I'm not sure about EINVAL here either, but I don't really have a better suggestion. > + > +/* Verify if there is any device that doesn't support snapshots and is > +writable and check if the requested snapshot is available too. 
*/ > bs = NULL; > while ((bs = bdrv_next(bs))) { > > @@ -1821,64 +1834,46 @@ int load_vmstate(const char *name) > error_report("Device '%s' is writable but does not support > snapshots.", > bdrv_get_device_name(bs)); > return -ENOTSUP; > +} else { The then branch has a return, so you don't need the else here and can have the following code nested one level less. Looks good otherwise. Kevin
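For reference, the early-exit ordering Kevin suggests for the snapshot lookup can be tried out in isolation. This is only a minimal sketch and not QEMU code: `SnapshotInfo`, `find_snapshot()` and `check_vm_state()` are hypothetical stand-ins for `QEMUSnapshotInfo`, `bdrv_snapshot_find()` and the first checks in `load_vmstate()`, and the choice of `-ENOENT` for a missing snapshot is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for QEMUSnapshotInfo; only the fields used here. */
typedef struct {
    const char *name;
    unsigned long vm_state_size;
} SnapshotInfo;

/* Hypothetical stand-in for bdrv_snapshot_find(): returns 0 and fills *out
 * on success, or -ENOENT (an assumption) if no snapshot has that name. */
static int find_snapshot(const SnapshotInfo *table, size_t n,
                         const char *name, SnapshotInfo *out)
{
    size_t i;

    assert(name != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(table[i].name, name) == 0) {
            *out = table[i];
            return 0;
        }
    }
    return -ENOENT;
}

/* Fail early: propagate the lookup error first, then reject empty VM state. */
static int check_vm_state(const SnapshotInfo *table, size_t n,
                          const char *name)
{
    SnapshotInfo sn;
    int ret = find_snapshot(table, n, name, &sn);

    if (ret < 0) {
        return ret;      /* snapshot not found: stop before touching devices */
    } else if (sn.vm_state_size == 0) {
        return -EINVAL;  /* snapshot exists but carries no VM state */
    }
    return 0;
}
```

With this ordering a missing snapshot and an empty VM state produce different error codes, and neither case reaches the per-device loop.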
[Qemu-devel] [PATCH] ide/atapi: add support for GET EVENT STATUS NOTIFICATION
GET EVENT STATUS NOTIFICATION is a mandatory command according to MMC-3, even if event status notification is not supported. This patch adds support for this command. It returns NEA ("No Event Available") with an empty "Supported Event Classes" field to show that it doesn't support event status notification. If asynchronous operation is requested, which requires NCQ support, it returns an error according to the specifications. This fixes HAL support on FreeBSD and derivatives, which fill up the logs every second with: acd0: FAILURE - unknown CMD (0x03) ILLEGAL REQUEST asc=0x20 ascq=0x00 Signed-off-by: Aurelien Jarno --- hw/ide/core.c | 15 +++ 1 files changed, 15 insertions(+), 0 deletions(-) diff --git a/hw/ide/core.c b/hw/ide/core.c index e20f2e7..9e1bdd5 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -1643,6 +1643,21 @@ static void ide_atapi_cmd(IDEState *s) ide_atapi_cmd_reply(s, len, max_len); break; } +case GPCMD_GET_EVENT_STATUS_NOTIFICATION: +max_len = ube16_to_cpu(packet + 7); + +if (packet[1] & 0x01) { /* polling */ +/* We don't support any event class (yet). */ +cpu_to_ube16(buf, 0x00); /* No event descriptor returned */ +buf[2] = 0x80; /* No Event Available (NEA) */ +buf[3] = 0x00; /* Empty supported event classes */ +ide_atapi_cmd_reply(s, 4, max_len); +} else { /* asynchronous mode */ +/* Only polling is supported, asynchronous mode is not. */ +ide_atapi_cmd_error(s, SENSE_ILLEGAL_REQUEST, +ASC_INV_FIELD_IN_CMD_PACKET); +} +break; default: ide_atapi_cmd_error(s, SENSE_ILLEGAL_REQUEST, ASC_ILLEGAL_OPCODE); -- 1.7.1
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 02:06:27PM +0100, Richard W.M. Jones wrote: > On Mon, Jul 19, 2010 at 12:15:43PM +0300, Gleb Natapov wrote: > > That what we are talking about, no? We are trying to find faster way to > > load kernel/initrd and stay architectural. Honestly I would expect much > > greater speedup from Richard's approach like 2 seconds vs 8 seconds. It > > is hard to justify code complication just for 1 second speedup. > > I've no idea where this "8 seconds" comes from. Total boot time That was a number generated by my random number generator. I was just trying to say that I would have expected much more gain from copying kernel/initrd directly into memory, considering how much is going on during pio string emulation. > currently is < 8 seconds even without my patch. My patch takes it > from 7.5 seconds to 6.5 seconds. > It shows that we are not so bad at emulating pio string operations. -- Gleb.
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 12:15:43PM +0300, Gleb Natapov wrote: > That what we are talking about, no? We are trying to find faster way to > load kernel/initrd and stay architectural. Honestly I would expect much > greater speedup from Richard's approach like 2 seconds vs 8 seconds. It > is hard to justify code complication just for 1 second speedup. I've no idea where this "8 seconds" comes from. Total boot time currently is < 8 seconds even without my patch. My patch takes it from 7.5 seconds to 6.5 seconds. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones New in Fedora 11: Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 70 libraries supprt'd http://fedoraproject.org/wiki/MinGW http://www.annexia.org/fedora_mingw
Re: [Qemu-devel] TCG vs Dyngen
> In the old version there is a file ops_mem.h that has proto types for memory > functions. In the new version this file does not exist anymore which is due to > the inclusion of tcg instead of dyngen. I'm not very familiar with the old dyngen system, but I believe the tcg equivalent to ops_mem.h is softmmu_template.h. -- -Eli signature.asc Description: Digital signature
[Qemu-devel] [Bug 607204] Re: New qemu instances often cannot be started if host system is under load
Forgot to mention: The bug is still present in the latest git-version as of this writing. -- New qemu instances often cannot be started if host system is under load https://bugs.launchpad.net/bugs/607204 You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. Status in QEMU: New Bug description: I've got a problem where I cannot start any new VMs with qemu-kvm if the host machine is under high CPU load. The problem is not 100% reproducible (it works sometimes), but under load conditions, it happens most of the time - roughly 95%. I'm usually using libvirt to start and stop KVM VMs. When using virsh to start a new VM under those conditions, the output looks like this: virsh # start testserver-a error: Failed to start domain testserver-a error: monitor socket did not show up.: Connection refused (There is a very long wait after the command has been sent until the error message shows up.) This is (an example of) the command line that libvirtd uses to start up qemu: - snip - LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin HOME=/root USER=root LOGNAME=root QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.12 -enable-kvm -m 256 -smp 1,sockets=1,cores=1,threads=1 -name testserver-a -uuid 7cbb3665-4d58-86b8-ce8f-20541995a99c -nodefaults -chardev socket,id=monitor,path=/usr/local/var/lib/libvirt/qemu/testserver-a.monitor,server,nowait -mon chardev=monitor,mode=readline -rtc base=utc -no-acpi -boot c -device lsi,id=scsi0,bus=pci.0,addr=0x7 -drive file=/data/testserver-a-system.img,if=none,id=drive-scsi0-0-1,boot=on -device scsi-disk,bus=scsi0.0,scsi-id=1,drive=drive-scsi0-0-1,id=scsi0-0-1 -drive file=/data/testserver-a-data1.img,if=none,id=drive-virtio-disk1 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/data/testserver-a-data2.img,if=none,id=drive-virtio-disk2 -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk2,id=virtio-disk2 -drive 
file=/data/gentoo-install-amd64-minimal-20100408.iso,if=none,media=cdrom,id=drive-ide0-0-0,readonly=on -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive file=/data/testserver-a_configfloppy.img,if=none,id=drive-fdc0-0-0 -global isa-fdc.driveA=drive-fdc0-0-0 -device e1000,vlan=0,id=net0,mac=52:54:00:84:6d:69,bus=pci.0,addr=0x6 -net tap,fd=24,vlan=0,name=hostnet0 -usb -vnc 127.0.0.1:1,password -k de -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 - snip - Copy-pasting this to a commandline on the host to start qemu manually leads to a non-functional qemu process that "just sits there" with nothing happening. The monitor socket /usr/local/var/lib/libvirt/qemu/testserver-a.monitor will, indeed, not show up. I've tried starting qemu with the same commandline but without the parameters for redirecting the monitor to a socket, without the fd parameter for the network interface and without the vnc parameter. This resulted in a black window with the title "QEMU (testserver-a) [Stopped]". I could not access the monitor console in graphical mode either. When I press Ctrl-Alt-2 in graphical mode to access the monitor console, qemu will sometimes (but not always) crash with a segfault about 2 seconds after. Some experimentation I've done suggests that this problem only happens if the high cpu load is caused by another qemu process, not if it is caused by something else running on the machine. The bug appears much less often if I leave off the -nodefaults parameter. The bug will still appear if I start qemu as qemu-system-x86_64 instead of qemu-kvm and replace the -enable-kvm parameter with -no-kvm. The host machine I'm running this on has got 16 cores in total. It looks like it is sufficient for this bug to surface if at least one of these cores is brought to near 100% use by a qemu process. The version of qemu I'm using is qemu-kvm 0.12.4, built from source. Libvirt is version 0.8.1, built from source as well. 
The host OS is Fedora 12. The Kernel version is 2.6.32.12-115.fc12.x86_64. Attached is an strace of attempting to start qemu which I hope will help someone with a better understanding of qemu's internals see what's actually going on there.
[Qemu-devel] [Bug 607204] [NEW] New qemu instances often cannot be started if host system is under load
Public bug reported: I've got a problem where I cannot start any new VMs with qemu-kvm if the host machine is under high CPU load. The problem is not 100% reproducible (it works sometimes), but under load conditions, it happens most of the time - roughly 95%. I'm usually using libvirt to start and stop KVM VMs. When using virsh to start a new VM under those conditions, the output looks like this: virsh # start testserver-a error: Failed to start domain testserver-a error: monitor socket did not show up.: Connection refused (There is a very long wait after the command has been sent until the error message shows up.) This is (an example of) the command line that libvirtd uses to start up qemu: - snip - LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin HOME=/root USER=root LOGNAME=root QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.12 -enable-kvm -m 256 -smp 1,sockets=1,cores=1,threads=1 -name testserver-a -uuid 7cbb3665-4d58-86b8-ce8f-20541995a99c -nodefaults -chardev socket,id=monitor,path=/usr/local/var/lib/libvirt/qemu/testserver-a.monitor,server,nowait -mon chardev=monitor,mode=readline -rtc base=utc -no-acpi -boot c -device lsi,id=scsi0,bus=pci.0,addr=0x7 -drive file=/data/testserver-a-system.img,if=none,id=drive-scsi0-0-1,boot=on -device scsi-disk,bus=scsi0.0,scsi-id=1,drive=drive-scsi0-0-1,id=scsi0-0-1 -drive file=/data/testserver-a-data1.img,if=none,id=drive-virtio-disk1 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/data/testserver-a-data2.img,if=none,id=drive-virtio-disk2 -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk2,id=virtio-disk2 -drive file=/data/gentoo-install-amd64-minimal-20100408.iso,if=none,media=cdrom,id=drive-ide0-0-0,readonly=on -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive file=/data/testserver-a_configfloppy.img,if=none,id=drive-fdc0-0-0 -global isa-fdc.driveA=drive-fdc0-0-0 -device e1000,vlan=0,id=net0,mac=52:54:00:84:6d:69,bus=pci.0,addr=0x6 -net 
tap,fd=24,vlan=0,name=hostnet0 -usb -vnc 127.0.0.1:1,password -k de -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 - snip - Copy-pasting this to a commandline on the host to start qemu manually leads to a non-functional qemu process that "just sits there" with nothing happening. The monitor socket /usr/local/var/lib/libvirt/qemu/testserver-a.monitor will, indeed, not show up. I've tried starting qemu with the same commandline but without the parameters for redirecting the monitor to a socket, without the fd parameter for the network interface and without the vnc parameter. This resulted in a black window with the title "QEMU (testserver-a) [Stopped]". I could not access the monitor console in graphical mode either. When I press Ctrl-Alt-2 in graphical mode to access the monitor console, qemu will sometimes (but not always) crash with a segfault about 2 seconds after. Some experimentation I've done suggests that this problem only happens if the high cpu load is caused by another qemu process, not if it is caused by something else running on the machine. The bug appears much less often if I leave off the -nodefaults parameter. The bug will still appear if I start qemu as qemu-system-x86_64 instead of qemu-kvm and replace the -enable-kvm parameter with -no-kvm. The host machine I'm running this on has got 16 cores in total. It looks like it is sufficient for this bug to surface if at least one of these cores is brought to near 100% use by a qemu process. The version of qemu I'm using is qemu-kvm 0.12.4, built from source. Libvirt is version 0.8.1, built from source as well. The host OS is Fedora 12. The Kernel version is 2.6.32.12-115.fc12.x86_64. Attached is an strace of attempting to start qemu which I hope will help someone with a better understanding of qemu's internals see what's actually going on there. 
** Affects: qemu Importance: Undecided Status: New -- New qemu instances often cannot be started if host system is under load https://bugs.launchpad.net/bugs/607204 You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU. Status in QEMU: New
[Qemu-devel] [Bug 607204] Re: New qemu instances often cannot be started if host system is under load
** Attachment added: "strace output" http://launchpadlibrarian.net/52160914/qemu-kvm-bug-strace -- New qemu instances often cannot be started if host system is under load https://bugs.launchpad.net/bugs/607204 You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU. Status in QEMU: New
[Qemu-devel] Re: [PATCH] Create USB buses and devices based on USB version.
On 07/14/10 11:56, David Ahern wrote: > Create USB buses and devices based on USB version. > > Signed-off-by: David Ahern ping. Relative to current code this addresses a number of FIXME's by assigning USB devices to a specific bus. It is also groundwork for adding ehci. David > --- > hw/usb-bus.c| 70 > --- > hw/usb-msd.c|2 +- > hw/usb-net.c|2 +- > hw/usb-ohci.c |2 +- > hw/usb-serial.c |4 +- > hw/usb-uhci.c |2 +- > hw/usb.h|7 +++-- > usb-bsd.c |2 +- > usb-linux.c |2 +- > 9 files changed, 63 insertions(+), 30 deletions(-) > > diff --git a/hw/usb-bus.c b/hw/usb-bus.c > index b692503..f4849f8 100644 > --- a/hw/usb-bus.c > +++ b/hw/usb-bus.c > @@ -4,6 +4,12 @@ > #include "sysemu.h" > #include "monitor.h" > > +enum { > +USB_VERSION_NONE, > +USB_VERSION_1_1, > +}; > + > + > static void usb_bus_dev_print(Monitor *mon, DeviceState *qdev, int indent); > > static struct BusInfo usb_bus_info = { > @@ -14,27 +20,46 @@ static struct BusInfo usb_bus_info = { > static int next_usb_bus = 0; > static QTAILQ_HEAD(, USBBus) busses = QTAILQ_HEAD_INITIALIZER(busses); > > -void usb_bus_new(USBBus *bus, DeviceState *host) > +static void usb_bus_new(USBBus *bus, int version, DeviceState *host) > { > qbus_create_inplace(&bus->qbus, &usb_bus_info, host, NULL); > bus->busnr = next_usb_bus++; > bus->qbus.allow_hotplug = 1; /* Yes, we can */ > +bus->version = version; > QTAILQ_INIT(&bus->free); > QTAILQ_INIT(&bus->used); > QTAILQ_INSERT_TAIL(&busses, bus, next); > } > > -USBBus *usb_bus_find(int busnr) > +void usb_bus_new_v1(USBBus *bus, DeviceState *host) > +{ > +usb_bus_new(bus, USB_VERSION_1_1, host); > +} > + > +static USBBus *usb_bus_find(int busnr) > +{ > +USBBus *bus; > + > +QTAILQ_FOREACH(bus, &busses, next) { > +if (bus->busnr == busnr) { > +break; > +} > +} > + > +return bus; > +} > + > +/* device creation should be using this one */ > +USBBus *usb_bus_find_version(int version) > { > USBBus *bus; > > -if (-1 == busnr) > -return QTAILQ_FIRST(&busses); > QTAILQ_FOREACH(bus, &busses, 
next) { > -if (bus->busnr == busnr) > -return bus; > +if (bus->version == version) { > +break; > +} > } > -return NULL; > +return bus; > } > > static int usb_qdev_init(DeviceState *qdev, DeviceInfo *base) > @@ -80,20 +105,15 @@ void usb_qdev_register_many(USBDeviceInfo *info) > } > } > > -USBDevice *usb_create(USBBus *bus, const char *name) > +static USBDevice *usb_create(USBBus *bus, const char *name) > { > DeviceState *dev; > > -#if 1 > -/* temporary stopgap until all usb is properly qdev-ified */ > if (!bus) { > -bus = usb_bus_find(-1); > -if (!bus) > -return NULL; > -fprintf(stderr, "%s: no bus specified, using \"%s\" for \"%s\"\n", > -__FUNCTION__, bus->qbus.name, name); > +fprintf(stderr, "%s: no bus specified for \"%s\"\n", > +__FUNCTION__, name); > +return NULL; > } > -#endif > > dev = qdev_create(&bus->qbus, name); > return DO_UPCAST(USBDevice, qdev, dev); > @@ -101,7 +121,13 @@ USBDevice *usb_create(USBBus *bus, const char *name) > > USBDevice *usb_create_simple(USBBus *bus, const char *name) > { > -USBDevice *dev = usb_create(bus, name); > +USBDevice *dev; > + > +/* if bus not given default to USB 1.1 */ > +if (!bus) > +bus = usb_bus_find_version(USB_VERSION_1_1); > + > +dev = usb_create(bus, name); > if (!dev) { > hw_error("Failed to create USB device '%s'\n", name); > } > @@ -109,6 +135,13 @@ USBDevice *usb_create_simple(USBBus *bus, const char > *name) > return dev; > } > > +/* create USB device attached to USB 1.1 controller */ > +USBDevice *usb_create_v1(const char *name) > +{ > +USBBus *bus = usb_bus_find_version(USB_VERSION_1_1); > +return usb_create(bus, name); > +} > + > void usb_register_port(USBBus *bus, USBPort *port, void *opaque, int index, > usb_attachfn attach) > { > @@ -260,7 +293,6 @@ void usb_info(Monitor *mon) > /* handle legacy -usbdevice cmd line option */ > USBDevice *usbdevice_create(const char *cmdline) > { > -USBBus *bus = usb_bus_find(-1 /* any */); > DeviceInfo *info; > USBDeviceInfo *usb; > char driver[32]; > @@ -302,7 +334,7 
@@ USBDevice *usbdevice_create(const char *cmdline) > error_report("usbdevice %s accepts no params", driver); > return NULL; > } > -return usb_create_simple(bus, usb->qdev.name); > +return usb_create_simple(NULL, usb->qdev.name); > } > return usb->usbdevice_init(params); > } > diff --git a/hw/usb-msd.c b/hw/usb-msd.c > index 65e9624..d8b68f7 100644 > --- a/hw/usb
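The lookup-by-version idea in this patch, with its fallback to USB 1.1 when no bus is given, can be illustrated without the qdev machinery. The following is a simplified sketch, not the QEMU implementation: `Bus`, `bus_find_version()` and `pick_bus()` are hypothetical names, and a plain `next` pointer replaces the `QTAILQ` list used in `hw/usb-bus.c`.

```c
#include <assert.h>
#include <stddef.h>

enum {
    USB_VERSION_NONE,
    USB_VERSION_1_1,
};

/* Hypothetical, simplified bus record. QEMU's USBBus has more fields and
 * is linked through QTAILQ rather than a bare next pointer. */
typedef struct Bus {
    int busnr;
    int version;
    struct Bus *next;
} Bus;

/* Return the first bus with the requested version, or NULL if none. */
static Bus *bus_find_version(Bus *head, int version)
{
    Bus *bus;

    /* Sketch-only precondition: callers ask for a real version. */
    assert(version != USB_VERSION_NONE);
    for (bus = head; bus != NULL; bus = bus->next) {
        if (bus->version == version) {
            break;
        }
    }
    return bus;   /* NULL when the loop ran off the end of the list */
}

/* Mirror of the fallback in usb_create_simple(): an explicit bus wins,
 * otherwise default to the first USB 1.1 bus. */
static Bus *pick_bus(Bus *head, Bus *requested)
{
    if (requested != NULL) {
        return requested;
    }
    return bus_find_version(head, USB_VERSION_1_1);
}
```

Keeping the default in one place like `pick_bus()` means a later controller type (the patch mentions ehci) would only need a new enum value and its own lookup call.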
[Qemu-devel] Re: [PATCH] block: Use error codes from lower levels for error message
Am 18.07.2010 21:42, schrieb Stefan Weil: > "No such file or directory" is a misleading error message > when a user tries to open a file with wrong permissions. > > Cc: Kevin Wolf > Signed-off-by: Stefan Weil > --- > block.c | 12 > 1 files changed, 8 insertions(+), 4 deletions(-) > > diff --git a/block.c b/block.c > index f837876..2f80540 100644 > --- a/block.c > +++ b/block.c > @@ -330,16 +330,20 @@ BlockDriver *bdrv_find_protocol(const char *filename) > return NULL; > } > > -static BlockDriver *find_image_format(const char *filename) > +static BlockDriver *find_image_format(const char *filename, int *error) Wouldn't it be a more natural interface to return a 0/-errno int and pass the BlockDriver* by reference? I think we already have some functions that work this way in the block code, but I can't remember any that get an int *error. > { > int ret, score, score_max; > BlockDriver *drv1, *drv; > uint8_t buf[2048]; > BlockDriverState *bs; > > +*error = -ENOENT; Why -ENOENT is the default would be clearer if you moved it down next to the drv = NULL before the loop that searches for the driver. Apart from these minor nitpicks it looks good. Kevin
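The interface shape suggested in this review (return 0 or -errno, hand the driver back through a pointer) might look roughly like the sketch below. Everything here is illustrative: the `Driver` type, the two entries in `drivers[]` and their magic strings are made up for the example, and `read_ret` is a stand-in for whatever the lower-level open/read call returned.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical driver record; the names and magic values are invented. */
typedef struct {
    const char *format_name;
    const char *magic;
} Driver;

static const Driver drivers[] = {
    { "fmt-a", "AAAA" },
    { "fmt-b", "BBBB" },
};

/* read_ret is the result of reading the image header: a negative errno
 * from the lower level, or the number of bytes available in buf.
 * Returns 0 and sets *pdrv on success, otherwise a negative errno. */
static int find_image_format(int read_ret, const char *buf,
                             const Driver **pdrv)
{
    size_t i;

    assert(pdrv != NULL);
    *pdrv = NULL;

    if (read_ret < 0) {
        return read_ret;   /* e.g. -EACCES, passed up unchanged */
    }

    /* The "not found" default sits right next to the search loop. */
    for (i = 0; i < sizeof(drivers) / sizeof(drivers[0]); i++) {
        size_t mlen = strlen(drivers[i].magic);

        if ((size_t)read_ret >= mlen &&
            memcmp(buf, drivers[i].magic, mlen) == 0) {
            *pdrv = &drivers[i];
            return 0;
        }
    }
    return -ENOENT;
}
```

The caller can then report `strerror(-ret)` directly, so a permission problem is no longer shown as "No such file or directory".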
[Qemu-devel] [PATCH 2/2 version 3] fw_cfg: Implement fast "DMA"-type operation for rapidly copying in kernel, initrd [etc] into the guest
This version adds polling for DMA done. Part 1/2 (the fix for e->callback == NULL) is the same as always so I won't attach it. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://et.redhat.com/~rjones/libguestfs/ See what it can do: http://et.redhat.com/~rjones/libguestfs/recipes.html >From 449eeef3dedb5612fe4f408835bbd926643d35f8 Mon Sep 17 00:00:00 2001 From: Richard Jones Date: Sat, 17 Jul 2010 14:30:46 +0100 Subject: [PATCH 2/2] fw_cfg: Allow guest to read kernel etc via fast, synchronous "DMA"-type operation. This adds a "DMA" operation for rapidly copying the kernel, initrd etc into the guest. The guest sets up a DMA address and size and then issues the usual read operation but with the FW_CFG_DMA bit set on the entry number. QEmu then just copies the whole config entry to the selected physical address synchronously. The guest should poll until this "DMA" operation is done, allowing us to write an alternate asynchronous version in future should that be necessary. This saves some time when loading large images. This change is backwards compatible. ROMs using the old method will work unchanged. Signed-off-by: Richard W.M. Jones --- hw/fw_cfg.c | 35 +- hw/fw_cfg.h | 10 +++- pc-bios/optionrom/linuxboot.S |8 +++--- pc-bios/optionrom/optionrom.h | 42 + 4 files changed, 88 insertions(+), 7 deletions(-) diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c index 37e6f1f..383586f 100644 --- a/hw/fw_cfg.c +++ b/hw/fw_cfg.c @@ -55,6 +55,13 @@ struct FWCfgState { uint32_t cur_offset; }; +/* Target address and size for DMA operations. This is only used + * during boot and across 32 and 64 bit architectures, so only writes + * to lower 4GB addresses are supported. 
+ */ +static uint32_t dma_addr = 0; +static uint32_t dma_size = 0; + static void fw_cfg_write(FWCfgState *s, uint8_t value) { int arch = !!(s->cur_entry & FW_CFG_ARCH_LOCAL); @@ -98,7 +105,22 @@ static uint8_t fw_cfg_read(FWCfgState *s) if (s->cur_entry == FW_CFG_INVALID || !e->data || s->cur_offset >= e->len) ret = 0; -else +else if (s->cur_entry & FW_CFG_DMA) { +if (dma_size > e->len - s->cur_offset) +dma_size = e->len - s->cur_offset; + +cpu_physical_memory_write ((target_phys_addr_t) dma_addr, + &e->data[s->cur_offset], + dma_size); +s->cur_offset += e->len; +/* Returns 0 if there was an error, 1 if the DMA operation + * started. Callers *must* poll for completion of the + * operation (waiting for FW_CFG_DMA_DONE == 0), even though + * in the current implementation the operation completes + * instantaneously from the p.o.v of the current guest vCPU. + */ +ret = 1; +} else ret = e->data[s->cur_offset++]; FW_CFG_DPRINTF("read %d\n", ret); @@ -352,6 +374,17 @@ FWCfgState *fw_cfg_init(uint32_t ctl_port, uint32_t data_port, fw_cfg_add_i16(s, FW_CFG_MAX_CPUS, (uint16_t)max_cpus); fw_cfg_add_i16(s, FW_CFG_BOOT_MENU, (uint16_t)boot_menu); +fw_cfg_add_bytes(s, FW_CFG_DMA_ADDR | FW_CFG_WRITE_CHANNEL, + (uint8_t *)&dma_addr, sizeof dma_addr); +fw_cfg_add_bytes(s, FW_CFG_DMA_SIZE | FW_CFG_WRITE_CHANNEL, + (uint8_t *)&dma_size, sizeof dma_size); +/* Current implementation is synchronous, so this value always reads + * as 0 (meaning "done"). In other possible implementations, this + * could return > 0 indicating that the caller should continue polling + * for completion of the operation. 
+ */ +fw_cfg_add_i32(s, FW_CFG_DMA_DONE, 0); + return s; } diff --git a/hw/fw_cfg.h b/hw/fw_cfg.h index 4d13a4f..abc41c9 100644 --- a/hw/fw_cfg.h +++ b/hw/fw_cfg.h @@ -30,11 +30,17 @@ #define FW_CFG_FILE_FIRST 0x20 #define FW_CFG_FILE_SLOTS 0x10 -#define FW_CFG_MAX_ENTRY(FW_CFG_FILE_FIRST+FW_CFG_FILE_SLOTS) +#define FW_CFG_DMA_ADDR 0x30 +#define FW_CFG_DMA_SIZE 0x31 +#define FW_CFG_DMA_DONE 0x32 + +#define FW_CFG_MAX_ENTRY(FW_CFG_DMA_DONE+1) + +#define FW_CFG_DMA 0x2000 #define FW_CFG_WRITE_CHANNEL0x4000 #define FW_CFG_ARCH_LOCAL 0x8000 -#define FW_CFG_ENTRY_MASK ~(FW_CFG_WRITE_CHANNEL | FW_CFG_ARCH_LOCAL) +#define FW_CFG_ENTRY_MASK ~(FW_CFG_DMA | FW_CFG_WRITE_CHANNEL | FW_CFG_ARCH_LOCAL) #define FW_CFG_INVALID 0x diff --git a/pc-bios/optionrom/linuxboot.S b/pc-bios/optionrom/linuxboot.S index c109363..dbf44cb 100644 --- a/pc-bios/optionrom/linuxboot.S +++ b/pc-bios/optionrom/linuxboot.S @@ -106,10 +106,10 @@ copy_kernel: /* We're now running in 16-bit CS, but 32-bit ES! */ /*
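To make the host-side behaviour of the FW_CFG_DMA read easier to follow, here is a small stand-alone model of the clamp-and-copy step. It is a sketch under several assumptions and not the patch itself: `MockFwCfg`, `Entry` and `mock_dma_read()` are hypothetical, a byte array stands in for guest physical memory, the bounds check against that array is an addition of the sketch, and the offset is advanced by the number of bytes copied (the posted patch adds `e->len` instead).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical config entry: a blob and its length. */
typedef struct {
    const uint8_t *data;
    uint32_t len;
} Entry;

/* Hypothetical device state with a mock guest RAM buffer. */
typedef struct {
    uint8_t *guest_ram;
    uint32_t ram_size;
    uint32_t dma_addr;    /* written by the guest before the read */
    uint32_t dma_size;    /* written by the guest before the read */
    uint32_t cur_offset;
} MockFwCfg;

/* Models the FW_CFG_DMA branch of fw_cfg_read():
 * returns 1 if the copy was performed, 0 on error. */
static uint8_t mock_dma_read(MockFwCfg *s, const Entry *e)
{
    assert(s != NULL && e != NULL);

    if (!e->data || s->cur_offset >= e->len) {
        return 0;
    }
    /* Clamp the request to what is left of the entry, as the patch does. */
    if (s->dma_size > e->len - s->cur_offset) {
        s->dma_size = e->len - s->cur_offset;
    }
    /* Bounds check against the mock RAM (not part of the patch). */
    if (s->dma_addr > s->ram_size ||
        s->dma_size > s->ram_size - s->dma_addr) {
        return 0;
    }
    memcpy(s->guest_ram + s->dma_addr, e->data + s->cur_offset, s->dma_size);
    s->cur_offset += s->dma_size;   /* assumption: advance by bytes copied */
    return 1;
}
```

On the guest side the sequence described in the commit message would be: write the address and size entries, select the entry with the FW_CFG_DMA bit set, read one byte to trigger the copy, then poll FW_CFG_DMA_DONE until it reads 0.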
Re: [Qemu-devel] [PATCH] block migraton: check sectors before shift operation.
2010/7/19 Kevin Wolf : > Am 19.07.2010 06:45, schrieb Yoshiaki Tamura: >> Commit d246673dcb9911218ff555bcdf28b250e38fa46c has expanded the types >> of block drive that can be initialized for block migration. Although >> bdrv_getlength() may return < 0, current code shifts it without >> checking. This makes block migration initialization invalid and >> results in abort() due to calling qemu_malloc() with 0 size at >> bdrv_set_dirty_tracking(). This patch checks the return value of >> bdrv_getlength() by masking with BDRV_SECTOR_MASK. >> >> Signed-off-by: Yoshiaki Tamura > > I applied a similar patch by Shahar Havivi to the block branch a few > days ago. Oops. Missed that discussion. Yoshi > > Kevin > >
[Qemu-devel] Re: [PATCH 0/2 version 2] fw_cfg: Implement fast "DMA"-type operation for rapidly copying in kernel, initrd [etc] into the guest
On Mon, Jul 19, 2010 at 11:49:09AM +0100, Richard W.M. Jones wrote: > On Mon, Jul 19, 2010 at 01:45:00PM +0300, Gleb Natapov wrote: > > On Mon, Jul 19, 2010 at 11:15:04AM +0100, Richard W.M. Jones wrote: > > > > > > This is the second version of the patch. > > > > > > We don't use the word "blit" any more, instead this is replaced with > > > "DMA", even though it's not quite like a DMA operation on physical > > > hardware. > > > > > You ignored the whole discussion above. Calling things DMA will not make > > them so. You haven't even implemented Alexander's suggestion to poll > > for DMA completion which will at least make the interface to the guest > > palatable. > > I read everything in the discussion. > > I can add polling however. > If copying (the call to cpu_physical_memory_write()) really takes 6 or more seconds, we should make it async from the beginning. (If we are going this way at all. I prefer to use virtio-serial for such complex guest/host communication. fw_cfg was designed to be simple and should stay so. And it is used by other arches too, so any extension should be usable there.) -- Gleb.
[Qemu-devel] Re: [PATCH 0/2 version 2] fw_cfg: Implement fast "DMA"-type operation for rapidly copying in kernel, initrd [etc] into the guest
On Mon, Jul 19, 2010 at 01:45:00PM +0300, Gleb Natapov wrote: > On Mon, Jul 19, 2010 at 11:15:04AM +0100, Richard W.M. Jones wrote: > > > > This is the second version of the patch. > > > > We don't use the word "blit" any more, instead this is replaced with > > "DMA", even though it's not quite like a DMA operation on physical > > hardware. > > > You ignored the whole discussion above. Calling things DMA will not make > them so. You haven't even implemented Alexander's suggestion to poll > for DMA completion which will at least make the interface to the guest > palatable. I read everything in the discussion. I can add polling however. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones New in Fedora 11: Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 70 libraries supprt'd http://fedoraproject.org/wiki/MinGW http://www.annexia.org/fedora_mingw
[Qemu-devel] Re: [PATCH 0/2 version 2] fw_cfg: Implement fast "DMA"-type operation for rapidly copying in kernel, initrd [etc] into the guest
On Mon, Jul 19, 2010 at 11:15:04AM +0100, Richard W.M. Jones wrote: > > This is the second version of the patch. > > We don't use the word "blit" any more, instead this is replaced with > "DMA", even though it's not quite like a DMA operation on physical > hardware. > You ignored the whole discussion above. Calling things DMA will not make them so. You haven't even implemented Alexander's suggestion to poll for DMA completion, which would at least make the interface to the guest palatable. > The guest writes the physical address and size to two 32 bit fw_cfg > variables. Then when the guest issues an ordinary read operation with > the extra FW_CFG_DMA flag set, instead of returning a single byte, > qemu "DMA"s the requested data into the guest memory. > > The guest shouldn't be able to request a dma_size larger than the > amount of data in the entry. The patch checks this and adjusts > dma_size. > > The guest might select a dma_addr which does not correspond to > physical memory (or dma_addr + dma_size). Reading the code it seems > to me that cpu_physical_memory_write catches this case and will > abort() (so the guest is only harming itself). However I'd quite like > an expert opinion on this ... > > Rich. > > -- > Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones > virt-top is 'top' for virtual machines. Tiny program with many > powerful monitoring features, net stats, disk stats, logging, etc. > http://et.redhat.com/~rjones/virt-top -- Gleb.
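The polling interface asked for here could look roughly like the following from the guest's point of view. This is a stand-alone model, not QEMU or option ROM code: `DmaDevice`, `dma_start`, `dma_tick` and `dma_wait` are hypothetical names, and the "device" is just a struct whose busy bit clears after a given number of steps. A synchronous implementation corresponds to zero steps, so the first poll already sees completion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device model: `busy` is the status bit that is set when a
 * transfer starts and cleared by the device when the transfer is done. */
typedef struct {
    uint32_t busy;
    uint32_t ticks_left;   /* device steps still needed; 0 models a synchronous device */
} DmaDevice;

/* Guest starts a transfer that needs `ticks_needed` device steps. */
void dma_start(DmaDevice *d, uint32_t ticks_needed)
{
    d->ticks_left = ticks_needed;
    d->busy = (ticks_needed != 0);
}

/* One step of device progress, as would happen between two guest polls. */
void dma_tick(DmaDevice *d)
{
    if (d->busy && --d->ticks_left == 0)
        d->busy = 0;
}

/* Guest side: poll the status bit until it clears or max_polls is reached.
 * Returns the number of polls it took, or -1 on timeout. */
int dma_wait(DmaDevice *d, int max_polls)
{
    for (int polls = 1; polls <= max_polls; polls++) {
        if (!d->busy)
            return polls;
        dma_tick(d);       /* in a real guest the device advances on its own */
    }
    return -1;
}
```

With this shape the guest code is the same whether the host completes the copy immediately or hands it to another thread; only the number of polls changes.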
[Qemu-devel] [PATCH 2/2 version 2] fw_cfg: Allow guest to read kernel etc via fast, synchronous "DMA"-type operation.
-- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://et.redhat.com/~rjones/virt-df/ >From 55d4700262253da52aa403dc1ba68da2ae91b084 Mon Sep 17 00:00:00 2001 From: Richard Jones Date: Sat, 17 Jul 2010 14:30:46 +0100 Subject: [PATCH 2/2] fw_cfg: Allow guest to read kernel etc via fast, synchronous "DMA"-type operation. This adds a "DMA" operation for rapidly copying the kernel, initrd etc into the guest. The guest sets up a DMA address and size and then issues the usual read operation but with the FW_CFG_DMA bit set on the entry number. QEmu then just copies the whole config entry to the selected physical address synchronously. This saves some time when loading large images. This change is backwards compatible. ROMs using the old method will work unchanged. Signed-off-by: Richard W.M. Jones --- hw/fw_cfg.c | 22 +- hw/fw_cfg.h |8 ++-- pc-bios/optionrom/linuxboot.S |8 pc-bios/optionrom/optionrom.h | 37 + 4 files changed, 68 insertions(+), 7 deletions(-) diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c index 37e6f1f..798a332 100644 --- a/hw/fw_cfg.c +++ b/hw/fw_cfg.c @@ -55,6 +55,13 @@ struct FWCfgState { uint32_t cur_offset; }; +/* Target address and size for DMA operations. This is only used + * during boot and across 32 and 64 bit architectures, so only writes + * to lower 4GB addresses are supported. 
+ */ +static uint32_t dma_addr = 0; +static uint32_t dma_size = 0; + static void fw_cfg_write(FWCfgState *s, uint8_t value) { int arch = !!(s->cur_entry & FW_CFG_ARCH_LOCAL); @@ -98,7 +105,16 @@ static uint8_t fw_cfg_read(FWCfgState *s) if (s->cur_entry == FW_CFG_INVALID || !e->data || s->cur_offset >= e->len) ret = 0; -else +else if (s->cur_entry & FW_CFG_DMA) { +if (dma_size > e->len - s->cur_offset) +dma_size = e->len - s->cur_offset; + +cpu_physical_memory_write ((target_phys_addr_t) dma_addr, + &e->data[s->cur_offset], + dma_size); +s->cur_offset += e->len; +ret = 1; +} else ret = e->data[s->cur_offset++]; FW_CFG_DPRINTF("read %d\n", ret); @@ -351,6 +367,10 @@ FWCfgState *fw_cfg_init(uint32_t ctl_port, uint32_t data_port, fw_cfg_add_i16(s, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); fw_cfg_add_i16(s, FW_CFG_MAX_CPUS, (uint16_t)max_cpus); fw_cfg_add_i16(s, FW_CFG_BOOT_MENU, (uint16_t)boot_menu); +fw_cfg_add_bytes(s, FW_CFG_DMA_ADDR | FW_CFG_WRITE_CHANNEL, + (uint8_t *)&dma_addr, sizeof dma_addr); +fw_cfg_add_bytes(s, FW_CFG_DMA_SIZE | FW_CFG_WRITE_CHANNEL, + (uint8_t *)&dma_size, sizeof dma_size); return s; } diff --git a/hw/fw_cfg.h b/hw/fw_cfg.h index 4d13a4f..44b2be5 100644 --- a/hw/fw_cfg.h +++ b/hw/fw_cfg.h @@ -30,11 +30,15 @@ #define FW_CFG_FILE_FIRST 0x20 #define FW_CFG_FILE_SLOTS 0x10 -#define FW_CFG_MAX_ENTRY(FW_CFG_FILE_FIRST+FW_CFG_FILE_SLOTS) +#define FW_CFG_DMA_ADDR 0x30 +#define FW_CFG_DMA_SIZE 0x31 +#define FW_CFG_MAX_ENTRY(FW_CFG_DMA_SIZE+1) + +#define FW_CFG_DMA 0x2000 #define FW_CFG_WRITE_CHANNEL0x4000 #define FW_CFG_ARCH_LOCAL 0x8000 -#define FW_CFG_ENTRY_MASK ~(FW_CFG_WRITE_CHANNEL | FW_CFG_ARCH_LOCAL) +#define FW_CFG_ENTRY_MASK ~(FW_CFG_DMA | FW_CFG_WRITE_CHANNEL | FW_CFG_ARCH_LOCAL) #define FW_CFG_INVALID 0x diff --git a/pc-bios/optionrom/linuxboot.S b/pc-bios/optionrom/linuxboot.S index c109363..dbf44cb 100644 --- a/pc-bios/optionrom/linuxboot.S +++ b/pc-bios/optionrom/linuxboot.S @@ -106,10 +106,10 @@ copy_kernel: /* We're now running in 16-bit 
CS, but 32-bit ES! */ /* Load kernel and initrd */ - read_fw_blob_addr32(FW_CFG_KERNEL) - read_fw_blob_addr32(FW_CFG_INITRD) - read_fw_blob_addr32(FW_CFG_CMDLINE) - read_fw_blob_addr32(FW_CFG_SETUP) + read_fw_blob_dma(FW_CFG_KERNEL) + read_fw_blob_dma(FW_CFG_INITRD) + read_fw_blob_dma(FW_CFG_CMDLINE) + read_fw_blob_dma(FW_CFG_SETUP) /* And now jump into Linux! */ mov $0, %eax diff --git a/pc-bios/optionrom/optionrom.h b/pc-bios/optionrom/optionrom.h index fbdd48a..7fffe2d 100644 --- a/pc-bios/optionrom/optionrom.h +++ b/pc-bios/optionrom/optionrom.h @@ -50,6 +50,27 @@ bswap %eax .endm +/* + * Write %eax to a variable in the fw_cfg device. + * In: %eax + * Clobbers: %edx + */ +.macro write_fw VAR +push%eax +mov $(\VAR|FW_CFG_WRITE_CHANNEL), %ax +mov $BIOS_CFG_IOPORT_CFG, %dx +outw%ax, (%dx) +pop %eax
[Qemu-devel] [PATCH 1/2 version 2] Don't call fw_cfg e->callback if e->callback is NULL.
-- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://et.redhat.com/~rjones/libguestfs/ See what it can do: http://et.redhat.com/~rjones/libguestfs/recipes.html >From 1fe39da6476a6ff05e9705cd7f63f94c93746053 Mon Sep 17 00:00:00 2001 From: Richard Jones Date: Sat, 17 Jul 2010 14:23:21 +0100 Subject: [PATCH 1/2] Don't call fw_cfg e->callback if e->callback is NULL. If you set up a fw_cfg writable entry without a callback, then e->callback is still called, causing qemu to segfault. Luckily since nothing in qemu uses writable entries at the moment, this is not exploitable. Signed-off-by: Richard W.M. Jones --- hw/fw_cfg.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c index 72866ae..37e6f1f 100644 --- a/hw/fw_cfg.c +++ b/hw/fw_cfg.c @@ -65,7 +65,8 @@ static void fw_cfg_write(FWCfgState *s, uint8_t value) if (s->cur_entry & FW_CFG_WRITE_CHANNEL && s->cur_offset < e->len) { e->data[s->cur_offset++] = value; if (s->cur_offset == e->len) { -e->callback(e->callback_opaque, e->data); +if (e->callback) +e->callback(e->callback_opaque, e->data); s->cur_offset = 0; } } -- 1.7.1
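The guard added by this patch can be shown in isolation. The following is a minimal stand-alone model, not the QEMU source: `Entry`, `entry_write_byte` and `count_cb` are hypothetical names, and the struct only mirrors the fields the diff touches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a writable fw_cfg entry; not the QEMU struct. */
typedef void (*EntryCallback)(void *opaque, uint8_t *data);

typedef struct {
    uint8_t *data;
    uint32_t len;
    uint32_t cur_offset;
    EntryCallback callback;    /* may be NULL */
    void *callback_opaque;
} Entry;

/* Store one byte. When the entry becomes full, invoke the callback if one is
 * registered, then rewind the offset. Returns 1 if this write completed the
 * entry, 0 otherwise. */
int entry_write_byte(Entry *e, uint8_t value)
{
    if (e->cur_offset >= e->len)
        return 0;
    e->data[e->cur_offset++] = value;
    if (e->cur_offset == e->len) {
        if (e->callback)       /* the check the patch adds */
            e->callback(e->callback_opaque, e->data);
        e->cur_offset = 0;
        return 1;
    }
    return 0;
}

/* Example callback: counts how many times it was invoked. */
void count_cb(void *opaque, uint8_t *data)
{
    (void)data;
    ++*(int *)opaque;
}
```

Without the `if (e->callback)` test, completing an entry that was registered with a NULL callback would call through a null function pointer, which is the segfault the commit message describes.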
[Qemu-devel] [PATCH 0/2 version 2] fw_cfg: Implement fast "DMA"-type operation for rapidly copying in kernel, initrd [etc] into the guest
This is the second version of the patch. We don't use the word "blit" any more, instead this is replaced with "DMA", even though it's not quite like a DMA operation on physical hardware. The guest writes the physical address and size to two 32 bit fw_cfg variables. Then when the guest issues an ordinary read operation with the extra FW_CFG_DMA flag set, instead of returning a single byte, qemu "DMA"s the requested data into the guest memory. The guest shouldn't be able to request a dma_size larger than the amount of data in the entry. The patch checks this and adjusts dma_size. The guest might select a dma_addr which does not correspond to physical memory (or dma_addr + dma_size). Reading the code it seems to me that cpu_physical_memory_write catches this case and will abort() (so the guest is only harming itself). However I'd quite like an expert opinion on this ... Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://et.redhat.com/~rjones/virt-top
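The size clamping described above can be modelled outside QEMU as follows. This is a sketch under stated assumptions: `DmaEntry` and `dma_read` are hypothetical names, guest memory is a flat byte array, and the address check is done by hand, whereas the patch relies on cpu_physical_memory_write for that. The sketch also advances the read cursor by the number of bytes copied, which is a choice made here for illustration and differs from the patch, which adds e->len.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one fw_cfg entry and its read cursor. */
typedef struct {
    const uint8_t *data;
    uint32_t len;
    uint32_t cur_offset;
} DmaEntry;

/* Copy up to dma_size bytes of the entry, starting at the cursor, to
 * guest_ram[dma_addr...]. The size is clamped to the data left in the entry
 * and, in this model only, to the end of guest RAM. Returns bytes copied. */
uint32_t dma_read(DmaEntry *e, uint8_t *guest_ram, uint32_t ram_size,
                  uint32_t dma_addr, uint32_t dma_size)
{
    if (e->data == NULL || e->cur_offset >= e->len)
        return 0;
    if (dma_size > e->len - e->cur_offset)   /* same clamp as in the patch */
        dma_size = e->len - e->cur_offset;
    if (dma_addr >= ram_size)                /* model-only address check */
        return 0;
    if (dma_size > ram_size - dma_addr)
        dma_size = ram_size - dma_addr;
    memcpy(guest_ram + dma_addr, e->data + e->cur_offset, dma_size);
    e->cur_offset += dma_size;
    return dma_size;
}
```

Writing the comparisons as `size > limit - offset` rather than `offset + size > limit` keeps the 32-bit arithmetic from wrapping when the guest supplies a very large size.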
Re: [Qemu-devel] [PATCH v2] block: Change bdrv_commit to handle multiple sectors at once
On Mon, Jul 19, 2010 at 10:59 AM, Kevin Wolf wrote: > bdrv_commit copies the image to its backing file sector by sector, which > is (surprise!) relatively slow. Let's take a larger buffer and handle more > sectors at once if possible. > > With a 1G qcow2 file, this brought the time bdrv_commit takes down from > 5:06 min to 1:14 min for me. > > Signed-off-by: Kevin Wolf > --- Looks good. Stefan
[Qemu-devel] [PATCH v2] block: Change bdrv_commit to handle multiple sectors at once
bdrv_commit copies the image to its backing file sector by sector, which is (surprise!) relatively slow. Let's take a larger buffer and handle more sectors at once if possible. With a 1G qcow2 file, this brought the time bdrv_commit takes down from 5:06 min to 1:14 min for me. Signed-off-by: Kevin Wolf --- block.c | 39 --- 1 files changed, 20 insertions(+), 19 deletions(-) diff --git a/block.c b/block.c index 65cf4dc..6e3fc49 100644 --- a/block.c +++ b/block.c @@ -724,14 +724,16 @@ int bdrv_check(BlockDriverState *bs, BdrvCheckResult *res) return bs->drv->bdrv_check(bs, res); } +#define COMMIT_BUF_SECTORS 2048 + /* commit COW file into the raw image */ int bdrv_commit(BlockDriverState *bs) { BlockDriver *drv = bs->drv; -int64_t i, total_sectors; -int n, j, ro, open_flags; +int64_t sector, total_sectors; +int n, ro, open_flags; int ret = 0, rw_ret = 0; -unsigned char sector[BDRV_SECTOR_SIZE]; +uint8_t *buf; char filename[1024]; BlockDriverState *bs_rw, *bs_ro; @@ -774,22 +776,20 @@ int bdrv_commit(BlockDriverState *bs) } total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS; -for (i = 0; i < total_sectors;) { -if (drv->bdrv_is_allocated(bs, i, 65536, &n)) { -for(j = 0; j < n; j++) { -if (bdrv_read(bs, i, sector, 1) != 0) { -ret = -EIO; -goto ro_cleanup; -} +buf = qemu_malloc(COMMIT_BUF_SECTORS * BDRV_SECTOR_SIZE); -if (bdrv_write(bs->backing_hd, i, sector, 1) != 0) { -ret = -EIO; -goto ro_cleanup; -} -i++; - } - } else { -i += n; +for (sector = 0; sector < total_sectors; sector += n) { +if (drv->bdrv_is_allocated(bs, sector, COMMIT_BUF_SECTORS, &n)) { + +if (bdrv_read(bs, sector, buf, n) != 0) { +ret = -EIO; +goto ro_cleanup; +} + +if (bdrv_write(bs->backing_hd, sector, buf, n) != 0) { +ret = -EIO; +goto ro_cleanup; +} } } @@ -806,6 +806,7 @@ int bdrv_commit(BlockDriverState *bs) bdrv_flush(bs->backing_hd); ro_cleanup: +qemu_free(buf); if (ro) { /* re-open as RO */ -- 1.7.1.1
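The loop structure of this patch (ask how long the next allocated run is, then move the whole run through one bounce buffer) can be tried on in-memory data. The model below is not the block layer: `Image`, `is_allocated` and `commit` are hypothetical stand-ins, the allocation map is a plain byte array, and the buffer is 4 sectors rather than the 2048 used in the patch so that the capping is visible in a small example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SECTOR_SIZE 512
#define COMMIT_BUF_SECTORS 4   /* small for illustration; the patch uses 2048 */

/* Hypothetical in-memory image: allocated[i] is nonzero if sector i holds data. */
typedef struct {
    uint8_t *sectors;
    const uint8_t *allocated;
    int64_t total_sectors;
} Image;

/* Return whether `sector` is allocated and store in *n the length of the run
 * of sectors sharing that state, capped at `max`. */
int is_allocated(const Image *img, int64_t sector, int max, int *n)
{
    int state = img->allocated[sector] != 0;
    int count = 0;
    while (count < max && sector + count < img->total_sectors &&
           (img->allocated[sector + count] != 0) == state)
        count++;
    *n = count;
    return state;
}

/* Copy every allocated run from `top` into `base` through one bounce buffer.
 * Returns the number of read/write rounds, or -1 if allocation fails. */
int commit(const Image *top, Image *base)
{
    uint8_t *buf = malloc(COMMIT_BUF_SECTORS * SECTOR_SIZE);
    int rounds = 0, n = 0;

    if (!buf)
        return -1;
    for (int64_t sector = 0; sector < top->total_sectors; sector += n) {
        if (is_allocated(top, sector, COMMIT_BUF_SECTORS, &n)) {
            size_t bytes = (size_t)n * SECTOR_SIZE;
            memcpy(buf, top->sectors + sector * SECTOR_SIZE, bytes);   /* "read" */
            memcpy(base->sectors + sector * SECTOR_SIZE, buf, bytes);  /* "write" */
            rounds++;
        }
    }
    free(buf);
    return rounds;
}
```

Because `n` is always at least 1 for a sector inside the image, the `sector += n` step in the for statement makes progress for both allocated and unallocated runs.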
Re: [Qemu-devel] [PATCH] block: Change bdrv_commit to handle multiple sectors at once
Am 16.07.2010 20:16, schrieb Christoph Hellwig: > On Fri, Jul 16, 2010 at 06:17:36PM +0200, Kevin Wolf wrote: >> +buf = qemu_malloc(2048 * BDRV_SECTOR_SIZE); > > Please add a COMMIT_BUF_SIZE #define instead of the hardcoded 2048 in > various places. > >> for (i = 0; i < total_sectors;) { >> +if (drv->bdrv_is_allocated(bs, i, 2048, &n)) { >> >> +if (bdrv_read(bs, i, buf, n) != 0) { >> +ret = -EIO; >> +goto ro_cleanup; >> +} >> + >> +if (bdrv_write(bs->backing_hd, i, buf, n) != 0) { >> +ret = -EIO; >> +goto ro_cleanup; >> +} >> } >> +i += n; > > Maybe it's just me, but I'd prefer n getting a more descriptive name > (e.g. sector) and moving the increment of it into the for loop, e.g. > > for (sector = 0; sector < total_sectors; sector += n) { > if (!drv->bdrv_is_allocated(bs, i, 2048, &n)) > continue; > ... > } So you mean i should be renamed, not n, right? I'll change these points in v2. Kevin
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 11:21:34AM +0200, Alexander Graf wrote: > > On 19.07.2010, at 11:19, Gleb Natapov wrote: > > > On Mon, Jul 19, 2010 at 11:13:38AM +0200, Alexander Graf wrote: > >> > >> On 19.07.2010, at 11:10, Gleb Natapov wrote: > >> > >>> On Mon, Jul 19, 2010 at 11:02:54AM +0200, Alexander Graf wrote: > > On 19.07.2010, at 11:00, Gleb Natapov wrote: > > > On Mon, Jul 19, 2010 at 10:54:43AM +0200, Alexander Graf wrote: > >> > >> On 19.07.2010, at 10:48, Gleb Natapov wrote: > >> > >>> > Were there DMA capable devices back in ISA times? There must be. If > so, we can just take a look at what they do and do it similarly. Bus > mastering was a new thing for PCI, right? > > >>> I think IDE can be considered DMA capable ISA device, no? At least > >>> it works by writing to PIO ports and getting result into memory, but > >>> with interrupts and status bits and everything that real device should > >>> have. On board DMA engine is also ISA device. > >> > >> We could define our device to be polling. So all we need is a status > >> bit that the guest sets when it starts the DMA and the device unsets > >> when the DMA is done. In our case that should be immediate, because > >> the PIO invokes the full code paths, but it would look more like a > >> real device, no? > >> > > This is better, but it shouldn't be synchronous. Kernel and initrd are > > on disk so why not setup aio and read them from io thread allowing vcpu > > thread immediately return to guest mode to process interrupts. > > That would work with the above described device model. If we're going > synchronous or asynchronous would become an implementation detail. > > >>> If vcpu thread will sleep for too much time without processing events we > >>> can see strange timeouts in a guest. > >> > >> I don't think I understand what you mean? > >> > > Vcpu executes "in %ax". Next instruction is executed 6 seconds later. 
> > All timers that should have been processed during this time fire at the > > same moment triggering all kind of timeouts. Think about watchdog that > > should be written into every two seconds otherwise it does reset. > > That's a hypervisor implementation detail! If we want to go synchronously, we > do. If something breaks, we don't. It is. And it is a bug in the interface that we knowingly introduce. Do we want to do that? > Doing it synchronously simplifies things a lot. And we're talking about a > device that's only used before the OS kicks in. There's no use in pretending > we're running a watchdog there. On sane (embedded) HW that uses a watchdog, firmware tickles it too. We do not want to get stuck in firmware. Actually, a sane watchdog can't be stopped after it is started. I see a compelling use case for watchdog support in SeaBIOS. And SeaBIOS nowadays is complicated and runs a lot of code that uses interrupts. > > > > >>> > > Or why > > not use virtio-serial while we are at it? After all virtio-serial is > > there to allow host and guest communication. > > Because virtio-serial needs us to set up the full virtio-pci stack. > That's too much to mess with in an option rom IMHO. > > >>> We already do it for virtio-blk. Read only support is very small in > >>> LOC there. Don't know about virtio-serial protocol. > >> > >> The virtio-blk model uses the whole pxe framework. For our in-tree option > >> roms we're trying to be simple. And I'd like to keep it that way. I really > >> don't want to add PCI enumeration and BAR setup to that code. > >> > > The virtio-blk is entirely in seabios and does not use pxe at all! > > So it uses even more framework :). The linuxboot stuff is completely separate > in its very own option rom. > How is this option rom loaded? -- Gleb.
Re: [Qemu-devel] [PATCH] block migraton: check sectors before shift operation.
Am 19.07.2010 06:45, schrieb Yoshiaki Tamura: > Commit d246673dcb9911218ff555bcdf28b250e38fa46c has expanded the types > of block drive that can be initialized for block migration. Although > bdrv_getlength() may return < 0, current code shifts it without > checking. This makes block migration initialization invalid and > results in abort() due to calling qemu_malloc() with 0 size at > bdrv_set_dirty_tracking(). This patch checks the return value of > bdrv_getlength() by masking with BDRV_SECTOR_MASK. > > Signed-off-by: Yoshiaki Tamura I applied a similar patch by Shahar Havivi to the block branch a few days ago. Kevin
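The failure mode described in the quoted commit message, shifting a negative length and then sizing an allocation from the result, can be avoided with a check before the shift. The sketch below uses a plain sign test, which is not the masking with BDRV_SECTOR_MASK that the patch describes; `length_to_sectors`, `can_track` and `SECTOR_BITS` are hypothetical names, and the length is passed in as an argument instead of coming from bdrv_getlength().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SECTOR_BITS 9   /* 512-byte sectors, as an assumption of this sketch */

/* Convert a byte length to a sector count. A negative input is treated as an
 * error code and passed through unchanged instead of being shifted. */
int64_t length_to_sectors(int64_t length)
{
    if (length < 0)
        return length;
    return length >> SECTOR_BITS;
}

/* A device is only worth setting up for tracking if it has at least one
 * whole sector; errors and empty devices are both rejected. */
int can_track(int64_t length)
{
    return length_to_sectors(length) > 0;
}
```

A caller would skip the device when `can_track` returns 0, so no zero-sized allocation is ever requested for it.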
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On 19.07.2010, at 11:19, Gleb Natapov wrote: > On Mon, Jul 19, 2010 at 11:13:38AM +0200, Alexander Graf wrote: >> >> On 19.07.2010, at 11:10, Gleb Natapov wrote: >> >>> On Mon, Jul 19, 2010 at 11:02:54AM +0200, Alexander Graf wrote: On 19.07.2010, at 11:00, Gleb Natapov wrote: > On Mon, Jul 19, 2010 at 10:54:43AM +0200, Alexander Graf wrote: >> >> On 19.07.2010, at 10:48, Gleb Natapov wrote: >> >>> Were there DMA capable devices back in ISA times? There must be. If so, we can just take a look at what they do and do it similarly. Bus mastering was a new thing for PCI, right? >>> I think IDE can be considered DMA capable ISA device, no? At least >>> it works by writing to PIO ports and getting result into memory, but >>> with interrupts and status bits and everything that real device should >>> have. On board DMA engine is also ISA device. >> >> We could define our device to be polling. So all we need is a status bit >> that the guest sets when it starts the DMA and the device unsets when >> the DMA is done. In our case that should be immediate, because the PIO >> invokes the full code paths, but it would look more like a real device, >> no? >> > This is better, but it shouldn't be synchronous. Kernel and initrd are > on disk so why not setup aio and read them from io thread allowing vcpu > thread immediately return to guest mode to process interrupts. That would work with the above described device model. If we're going synchronous or asynchronous would become an implementation detail. >>> If vcpu thread will sleep for too much time without processing events we >>> can see strange timeouts in a guest. >> >> I don't think I understand what you mean? >> > Vcpu executes "in %ax". Next instruction is executed 6 seconds later. > All timers that should have been processed during this time fire at the > same moment triggering all kind of timeouts. Think about watchdog that > should be written into every two seconds otherwise it does reset. 
That's a hypervisor implementation detail! If we want to go synchronously, we do. If something breaks, we don't. Doing it synchronously simplifies things a lot. And we're talking about a device that's only used before the OS kicks in. There's no use in pretending we're running a watchdog there. > >>> > Or why > not use virtio-serial while we are at it? After all virtio-serial is > there to allow host and guest communication. Because virtio-serial needs us to set up the full virtio-pci stack. That's too much to mess with in an option rom IMHO. >>> We already do it for virtio-blk. Read only support is very small in >>> LOC there. Don't know about virtio-serial protocol. >> >> The virtio-blk model uses the whole pxe framework. For our in-tree option >> roms we're trying to be simple. And I'd like to keep it that way. I really >> don't want to add PCI enumeration and BAR setup to that code. >> > The virtio-blk is entirely in seabios and does not use pxe at all! So it uses even more framework :). The linuxboot stuff is completely separate in its very own option rom. Alex
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 12:19:22PM +0300, Gleb Natapov wrote: > Vcpu executes "in %ax". Next instruction is executed 6 seconds later. > All timers that should have been processed during this time fire at the > same moment triggering all kind of timeouts. Think about watchdog that > should be written into every two seconds otherwise it does reset. This particular code runs very early in boot, and the atomic copy operation is very quick even with a 100MB initrd. But the question I think should be: If a guest maliciously (after boot) tried to use this mechanism, could it do harm to the host? Or would it just harm itself? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://et.redhat.com/~rjones/libguestfs/ See what it can do: http://et.redhat.com/~rjones/libguestfs/recipes.html
Re: [Qemu-devel] [PATCH 2/2] fw_cfg: Add blit operation for copying kernel, initrd, ..
On Mon, Jul 19, 2010 at 08:23:34AM +0100, Richard W.M. Jones wrote: > On Mon, Jul 19, 2010 at 01:59:22AM +0200, Aurelien Jarno wrote: > > OpenBIOS also uses the same firmware interface, so it would need to be > > changed if this patch is accepted. > > The patch leaves the old interface. Does it still need to be changed? > As long as the old interface is kept, that should be fine for OpenBIOS. -- Aurelien Jarno GPG: 1024D/F1BCDB73 aurel...@aurel32.net http://www.aurel32.net
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On 19.07.2010, at 11:15, Gleb Natapov wrote: > On Mon, Jul 19, 2010 at 11:09:13AM +0200, Alexander Graf wrote: >> >> On 19.07.2010, at 11:06, Gleb Natapov wrote: >> >>> On Mon, Jul 19, 2010 at 10:00:04AM +0100, Richard W.M. Jones wrote: virt-install is another program that uses explicit -initrd. >>> Installation takes a lot of time. Saving 1 second there will not be >>> noticeable. And during lifetime of installed VM initrd will be loaded >>> from its disk. >> >> Guys, please. It shouldn't be one or the other. Let's make sure both ways of >> doing things are fast. That's what users want: fast. >> > That what we are talking about, no? We are trying to find faster way to > load kernel/initrd and stay architectural. Honestly I would expect much > greater speedup from Richard's approach like 2 seconds vs 8 seconds. It > is hard to justify code complication just for 1 second speedup. I agree. Hence I'd like to keep the complication as simple as possible. Alex
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 11:13:38AM +0200, Alexander Graf wrote: > > On 19.07.2010, at 11:10, Gleb Natapov wrote: > > > On Mon, Jul 19, 2010 at 11:02:54AM +0200, Alexander Graf wrote: > >> > >> On 19.07.2010, at 11:00, Gleb Natapov wrote: > >> > >>> On Mon, Jul 19, 2010 at 10:54:43AM +0200, Alexander Graf wrote: > > On 19.07.2010, at 10:48, Gleb Natapov wrote: > > > > >> Were there DMA capable devices back in ISA times? There must be. If > >> so, we can just take a look at what they do and do it similarly. Bus > >> mastering was a new thing for PCI, right? > >> > > I think IDE can be considered DMA capable ISA device, no? At least > > it works by writing to PIO ports and getting result into memory, but > > with interrupts and status bits and everything that real device should > > have. On board DMA engine is also ISA device. > > We could define our device to be polling. So all we need is a status bit > that the guest sets when it starts the DMA and the device unsets when > the DMA is done. In our case that should be immediate, because the PIO > invokes the full code paths, but it would look more like a real device, > no? > > >>> This is better, but it shouldn't be synchronous. Kernel and initrd are > >>> on disk so why not setup aio and read them from io thread allowing vcpu > >>> thread immediately return to guest mode to process interrupts. > >> > >> That would work with the above described device model. If we're going > >> synchronous or asynchronous would become an implementation detail. > >> > > If vcpu thread will sleep for too much time without processing events we > > can see strange timeouts in a guest. > > I don't think I understand what you mean? > Vcpu executes "in %ax". Next instruction is executed 6 seconds later. All timers that should have been processed during this time fire at the same moment triggering all kind of timeouts. Think about watchdog that should be written into every two seconds otherwise it does reset. 
> > > >>> Or why > >>> not use virtio-serial while we are at it? After all virtio-serial is > >>> there to allow host and guest communication. > >> > >> Because virtio-serial needs us to set up the full virtio-pci stack. That's > >> too much to mess with in an option rom IMHO. > >> > > We already do it for virtio-blk. Read only support is very small in > > LOC there. Don't know about virtio-serial protocol. > > The virtio-blk model uses the whole pxe framework. For our in-tree option > roms we're trying to be simple. And I'd like to keep it that way. I really > don't want to add PCI enumeration and BAR setup to that code. > The virtio-blk is entirely in seabios and does not use pxe at all! -- Gleb.
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote: > Richard, what does kvm_stat tell you while loading the initrd? Are > there a lot of PIO requests or are we simply looping inside qemu > code? The two attached files were made by running kvm_stat -l > /tmp/... during a single run starting libguestfs. This use of kvm_stat is as described in Chris's blog entry here: http://clalance.blogspot.com/2009/01/kvm-performance-tools.html The first attachment ('no-patch') is without the proposed patch. The second attachment ('with-patch') is with the proposed patch. It seems some numbers such as #vmexits are lower with the proposed patch, although not by a very much. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into Xen guests. http://et.redhat.com/~rjones/virt-p2v efer_relo exits fpu_reloa halt_exit halt_wake host_stat hypercall insn_emul insn_emul invlpg io_exits irq_exits irq_injec irq_windo largepage mmio_exit mmu_cache mmu_flood mmu_pde_z mmu_pte_u mmu_pte_w mmu_recyc mmu_shado mmu_unsyn nmi_injec nmi_windo pf_fixed pf_guest remote_tl request_i signal_ex tlb_flush 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 147464 10991 0 0 75176 0 70151 0 0 75122106 8 11 0 7 49 0 0 0 56360 0 94 0 0 0390 0 0 0 39 0 0 21165 0 0 0 21159 0 21139 0 0 21151 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 0 0 108370 1024 0 0 57189 0 9936 0 8674 55697342108209 0 1014 2828610180 0776 0 2754 0 0 0 24981 0 0 0286 10996 0 78421 0 0 0 16362 0 1828 0 0 12189 2136 1073 1371 0 1827 93 0 1 0 1 0 0 2 0 0 46659 0 0 0 2041 1 0 52784611 8 0 13498 0 2722 0 1084 10562 1236769 1106 0 1317769276 1138 0 1405 0630 42 0 0 31694 4779 0 0 1227 2799 0 122866 7220 0 0 6 0 11677 0 7619 6596 1655 1137 1560 0 2035 5133 1956 7687 0 9642 0 4375472 0 0 62778 32417 0 0 1976 
19957 0 48502 19852 9 0 22262 0 3173 0 1607 19781802 1199 1700 0799 1063452 1918 0 2370 0 1125 49 0 0 14706 7666 0 0642 4795 0-579572 -39698-17 0-216762 0 -120626 0 -18984-201098 -6295 -4294 -5957 0 -6999 -9935 -3294 -10924 0 -70554 0 -8978 -565 0 0-181208 -44862 0 0 -6227 -38548 0 0 0 0 0 0 0 0 0 0 0 0
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On 19.07.2010, at 11:10, Gleb Natapov wrote: > On Mon, Jul 19, 2010 at 11:02:54AM +0200, Alexander Graf wrote: >> >> On 19.07.2010, at 11:00, Gleb Natapov wrote: >> >>> On Mon, Jul 19, 2010 at 10:54:43AM +0200, Alexander Graf wrote: On 19.07.2010, at 10:48, Gleb Natapov wrote: > >> Were there DMA capable devices back in ISA times? There must be. If so, >> we can just take a look at what they do and do it similarly. Bus >> mastering was a new thing for PCI, right? >> > I think IDE can be considered DMA capable ISA device, no? At least > it works by writing to PIO ports and getting result into memory, but > with interrupts and status bits and everything that real device should > have. On board DMA engine is also ISA device. We could define our device to be polling. So all we need is a status bit that the guest sets when it starts the DMA and the device unsets when the DMA is done. In our case that should be immediate, because the PIO invokes the full code paths, but it would look more like a real device, no? >>> This is better, but it shouldn't be synchronous. Kernel and initrd are >>> on disk so why not setup aio and read them from io thread allowing vcpu >>> thread immediately return to guest mode to process interrupts. >> >> That would work with the above described device model. If we're going >> synchronous or asynchronous would become an implementation detail. >> > If vcpu thread will sleep for too much time without processing events we can > see strange timeouts in a guest. I don't think I understand what you mean? > >>> Or why >>> not use virtio-serial while we are at it? After all virtio-serial is >>> there to allow host and guest communication. >> >> Because virtio-serial needs us to set up the full virtio-pci stack. That's >> too much to mess with in an option rom IMHO. >> > We already do it for virtio-blk. Read only support is very small in > LOC there. Don't know about virtio-serial protocol. The virtio-blk model uses the whole pxe framework. 
For our in-tree option roms we're trying to be simple. And I'd like to keep it that way. I really don't want to add PCI enumeration and BAR setup to that code. Alex
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 11:09:13AM +0200, Alexander Graf wrote: > > On 19.07.2010, at 11:06, Gleb Natapov wrote: > > > On Mon, Jul 19, 2010 at 10:00:04AM +0100, Richard W.M. Jones wrote: > >> > >> virt-install is another program that uses explicit -initrd. > >> > > Installation takes a lot of time. Saving 1 second there will not be > > noticeable. And during lifetime of installed VM initrd will be loaded > > from its disk. > > Guys, please. It shouldn't be one or the other. Let's make sure both ways of > doing things are fast. That's what users want: fast. > That's what we are talking about, no? We are trying to find a faster way to load the kernel/initrd and stay architectural. Honestly, I would expect a much greater speedup from Richard's approach, like 2 seconds vs 8 seconds. It is hard to justify the code complication for just a 1 second speedup. -- Gleb.
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 11:02:54AM +0200, Alexander Graf wrote: > > On 19.07.2010, at 11:00, Gleb Natapov wrote: > > > On Mon, Jul 19, 2010 at 10:54:43AM +0200, Alexander Graf wrote: > >> > >> On 19.07.2010, at 10:48, Gleb Natapov wrote: > >> > >>> > Were there DMA capable devices back in ISA times? There must be. If so, > we can just take a look at what they do and do it similarly. Bus > mastering was a new thing for PCI, right? > > >>> I think IDE can be considered DMA capable ISA device, no? At least > >>> it works by writing to PIO ports and getting result into memory, but > >>> with interrupts and status bits and everything that real device should > >>> have. On board DMA engine is also ISA device. > >> > >> We could define our device to be polling. So all we need is a status bit > >> that the guest sets when it starts the DMA and the device unsets when the > >> DMA is done. In our case that should be immediate, because the PIO invokes > >> the full code paths, but it would look more like a real device, no? > >> > > This is better, but it shouldn't be synchronous. Kernel and initrd are > > on disk so why not setup aio and read them from io thread allowing vcpu > > thread immediately return to guest mode to process interrupts. > > That would work with the above described device model. If we're going > synchronous or asynchronous would become an implementation detail. > If vcpu thread will sleep for too much time without processing events we can see strange timeouts in a guest. > > Or why > > not use virtio-serial while we are at it? After all virtio-serial is > > there to allow host and guest communication. > > Because virtio-serial needs us to set up the full virtio-pci stack. That's > too much to mess with in an option rom IMHO. > We already do it for virtio-blk. Read only support is very small in LOC there. Don't know about virtio-serial protocol. -- Gleb.
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On 19.07.2010, at 11:06, Gleb Natapov wrote: > On Mon, Jul 19, 2010 at 10:00:04AM +0100, Richard W.M. Jones wrote: >> >> virt-install is another program that uses explicit -initrd. >> > Installation takes a lot of time. Saving 1 second there will not be > noticeable. And during lifetime of installed VM initrd will be loaded > from its disk. Guys, please. It shouldn't be one or the other. Let's make sure both ways of doing things are fast. That's what users want: fast. Alex
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 10:00:04AM +0100, Richard W.M. Jones wrote: > On Mon, Jul 19, 2010 at 11:40:41AM +0300, Gleb Natapov wrote: > > On Mon, Jul 19, 2010 at 09:34:11AM +0100, Richard W.M. Jones wrote: > > > On Mon, Jul 19, 2010 at 10:55:33AM +0300, Gleb Natapov wrote: > > > > Why not put then on cdrom or disk? > > > > > > It simplifies device and mountpoint enumeration not to have a separate > > > disk. It would also mean we couldn't use standard Fedora paths, or > > > we'd have to have bind-mount /bin etc on to the disk mount point, > > > which again complicates things. > > > > > Can't help you here, but if it's doable you can speedup your startup > > time much more then by a second. > > This isn't true. > > The most we could save is 0.8 seconds [time taken to load the 100MB > initrd by the kernel] less the time taken to probe and mount a CD ISO But you do not need all 100MB of application, so with disk approach you load things you need on demand. > [0.2 seconds - measured using guestfish] less the time taken to load > programs from this CD. So the most we could save would be 0.6 > seconds, and in reality it'd be less than this if we actually loaded > and ran any programs from the CD at all. > > My patch saves 1 second, and all the programs are in RAM. > And it will take 100M of a host ram. > > Most users load initrd from a disk not by -initrd option. > > It's unusual, but on my production webserver I use -kernel and -initrd > options explicitly. That's because I want all my VMs to share a > single kernel. > How often you restart them? > virt-install is another program that uses explicit -initrd. > Installation takes a lot of time. Saving 1 second there will not be noticeable. And during lifetime of installed VM initrd will be loaded from its disk. -- Gleb.
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 10:00:04AM +0100, Richard W.M. Jones wrote: [...] OK, it's early in the morning and I can't do maths. But we're still asking a big increase in complexity versus optimizing something which is just slow in qemu at the moment. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://et.redhat.com/~rjones/virt-top
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On 19.07.2010, at 11:00, Gleb Natapov wrote: > On Mon, Jul 19, 2010 at 10:54:43AM +0200, Alexander Graf wrote: >> >> On 19.07.2010, at 10:48, Gleb Natapov wrote: >> >>> Were there DMA capable devices back in ISA times? There must be. If so, we can just take a look at what they do and do it similarly. Bus mastering was a new thing for PCI, right? >>> I think IDE can be considered DMA capable ISA device, no? At least >>> it works by writing to PIO ports and getting result into memory, but >>> with interrupts and status bits and everything that real device should >>> have. On board DMA engine is also ISA device. >> >> We could define our device to be polling. So all we need is a status bit >> that the guest sets when it starts the DMA and the device unsets when the >> DMA is done. In our case that should be immediate, because the PIO invokes >> the full code paths, but it would look more like a real device, no? >> > This is better, but it shouldn't be synchronous. Kernel and initrd are > on disk so why not setup aio and read them from io thread allowing vcpu > thread immediately return to guest mode to process interrupts. That would work with the above described device model. If we're going synchronous or asynchronous would become an implementation detail. > Or why > not use virtio-serial while we are at it? After all virtio-serial is > there to allow host and guest communication. Because virtio-serial needs us to set up the full virtio-pci stack. That's too much to mess with in an option rom IMHO. Alex
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 10:54:43AM +0200, Alexander Graf wrote: > > On 19.07.2010, at 10:48, Gleb Natapov wrote: > > > On Mon, Jul 19, 2010 at 10:41:48AM +0200, Alexander Graf wrote: > >> > >> On 19.07.2010, at 10:30, Gleb Natapov wrote: > >> > >>> On Mon, Jul 19, 2010 at 10:24:46AM +0200, Alexander Graf wrote: > > On 19.07.2010, at 10:19, Gleb Natapov wrote: > > Yes and no. It sounds nice at first, but doesn't quite fit. There are > two issues: > > 1) We need a new PCI ID > >>> We have our range. We can allocate from there. > >>> > 2) There can be a lot of initrd binaries with multiboot. We only have a > limited amount of BARs > > >>> Is it supported now with fw_cfg interface? My main concern with this > >>> approach is huge BAR size that may take a lot of space from PCI MMIO range > >>> if guest OS decide to configure it. > >> > >> Oh, right. I think I combined all the modules into the INITRD blob. Yeah, > >> that would work. Is coalesced MMIO more efficient than coalesced PIO? Or > >> do we have to do some RAM mapping for those special BAR regions? > >> > > I think we will have to do RAM mapping. Otherwise it may be slow to. > > Coalesced MMIO is for write not read IIRC. > > Oh, right. Makes sense. > > > > >> Were there DMA capable devices back in ISA times? There must be. If so, we > >> can just take a look at what they do and do it similarly. Bus mastering > >> was a new thing for PCI, right? > >> > > I think IDE can be considered DMA capable ISA device, no? At least > > it works by writing to PIO ports and getting result into memory, but > > with interrupts and status bits and everything that real device should > > have. On board DMA engine is also ISA device. > > We could define our device to be polling. So all we need is a status bit that > the guest sets when it starts the DMA and the device unsets when the DMA is > done. In our case that should be immediate, because the PIO invokes the full > code paths, but it would look more like a real device, no? 
>
This is better, but it shouldn't be synchronous. The kernel and initrd are on disk, so why not set up aio and read them from the io thread, allowing the vcpu thread to return to guest mode immediately to process interrupts? Or why not use virtio-serial while we are at it? After all, virtio-serial is there to allow host and guest communication.
> outb(PORT_DMA_CTL, FWCFG_DMA_ENABLE);
> while (inb(PORT_DMA_CTL) & FWCFG_DMA_ENABLE) {
>     /* DMA going on */
> }
> /* DMA done */
>
>
> Alex
-- Gleb.
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 11:40:41AM +0300, Gleb Natapov wrote: > On Mon, Jul 19, 2010 at 09:34:11AM +0100, Richard W.M. Jones wrote: > > On Mon, Jul 19, 2010 at 10:55:33AM +0300, Gleb Natapov wrote: > > > Why not put then on cdrom or disk? > > > > It simplifies device and mountpoint enumeration not to have a separate > > disk. It would also mean we couldn't use standard Fedora paths, or > > we'd have to have bind-mount /bin etc on to the disk mount point, > > which again complicates things. > > > Can't help you here, but if it's doable you can speedup your startup > time much more then by a second. This isn't true. The most we could save is 0.8 seconds [time taken to load the 100MB initrd by the kernel] less the time taken to probe and mount a CD ISO [0.2 seconds - measured using guestfish] less the time taken to load programs from this CD. So the most we could save would be 0.6 seconds, and in reality it'd be less than this if we actually loaded and ran any programs from the CD at all. My patch saves 1 second, and all the programs are in RAM. > Most users load initrd from a disk not by -initrd option. It's unusual, but on my production webserver I use -kernel and -initrd options explicitly. That's because I want all my VMs to share a single kernel. virt-install is another program that uses explicit -initrd. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://et.redhat.com/~rjones/virt-df/
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On 19.07.2010, at 10:48, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 10:41:48AM +0200, Alexander Graf wrote:
>>
>> On 19.07.2010, at 10:30, Gleb Natapov wrote:
>>
>>> On Mon, Jul 19, 2010 at 10:24:46AM +0200, Alexander Graf wrote:
>>>>
>>>> On 19.07.2010, at 10:19, Gleb Natapov wrote:
>>>>
>>>> Yes and no. It sounds nice at first, but doesn't quite fit. There are two issues:
>>>>
>>>> 1) We need a new PCI ID
>>> We have our range. We can allocate from there.
>>>
>>>> 2) There can be a lot of initrd binaries with multiboot. We only have a limited amount of BARs
>>>>
>>> Is it supported now with fw_cfg interface? My main concern with this
>>> approach is huge BAR size that may take a lot of space from PCI MMIO range
>>> if guest OS decide to configure it.
>>
>> Oh, right. I think I combined all the modules into the INITRD blob. Yeah,
>> that would work. Is coalesced MMIO more efficient than coalesced PIO? Or do
>> we have to do some RAM mapping for those special BAR regions?
>>
> I think we will have to do RAM mapping. Otherwise it may be slow to.
> Coalesced MMIO is for write not read IIRC.

Oh, right. Makes sense.

>
>> Were there DMA capable devices back in ISA times? There must be. If so, we
>> can just take a look at what they do and do it similarly. Bus mastering was
>> a new thing for PCI, right?
>>
> I think IDE can be considered DMA capable ISA device, no? At least
> it works by writing to PIO ports and getting result into memory, but
> with interrupts and status bits and everything that real device should
> have. On board DMA engine is also ISA device.

We could define our device to be polling. So all we need is a status bit that the guest sets when it starts the DMA and the device unsets when the DMA is done. In our case that should be immediate, because the PIO invokes the full code paths, but it would look more like a real device, no?

    outb(PORT_DMA_CTL, FWCFG_DMA_ENABLE);
    while (inb(PORT_DMA_CTL) & FWCFG_DMA_ENABLE) {
        /* DMA going on */
    }
    /* DMA done */

Alex
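For readers who want to experiment with the polled handshake Alex describes, here is a minimal user-space sketch in C. It is only a simulation under stated assumptions: the port number `0x514`, the bit value, and the helpers `sim_outb`, `sim_inb` and `fwcfg_dma_poll` are hypothetical names invented for this example, and the device is modelled as finishing after three status reads rather than immediately. The helpers take the port first, as in the snippet in the mail; the Linux `outb()` helper takes the value first, so real code would need to swap the arguments.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical port number and bit value, chosen only for this sketch. */
#define PORT_DMA_CTL     0x514u
#define FWCFG_DMA_ENABLE 0x01u

static uint8_t dma_ctl;                    /* simulated control/status register */
static unsigned polls_left;                /* simulated transfer latency */
static const char host_blob[] = "initrd";  /* stands in for the kernel/initrd data */
static char guest_ram[sizeof(host_blob)];  /* stands in for guest memory */

/* Simulated port write; argument order (port, value) follows the mail. */
static void sim_outb(uint16_t port, uint8_t value)
{
    if (port == PORT_DMA_CTL && (value & FWCFG_DMA_ENABLE)) {
        dma_ctl |= FWCFG_DMA_ENABLE;       /* guest sets the bit: DMA starts */
        polls_left = 3;                    /* assume the third status read sees completion */
    }
}

/* Simulated port read; the device clears the bit once the copy is done. */
static uint8_t sim_inb(uint16_t port)
{
    if (port != PORT_DMA_CTL)
        return 0xff;
    if ((dma_ctl & FWCFG_DMA_ENABLE) && --polls_left == 0) {
        memcpy(guest_ram, host_blob, sizeof(host_blob));
        dma_ctl &= (uint8_t)~FWCFG_DMA_ENABLE;
    }
    return dma_ctl;
}

/* Guest side: start the transfer, then spin until the device clears the bit.
 * Returns how many status reads still saw the transfer in flight. */
static unsigned fwcfg_dma_poll(void)
{
    unsigned busy_polls = 0;

    sim_outb(PORT_DMA_CTL, FWCFG_DMA_ENABLE);
    while (sim_inb(PORT_DMA_CTL) & FWCFG_DMA_ENABLE)
        busy_polls++;                      /* DMA going on */

    assert(!(dma_ctl & FWCFG_DMA_ENABLE)); /* DMA done */
    return busy_polls;
}
```

Calling `fwcfg_dma_poll()` once leaves the blob in `guest_ram`. Because the guest only ever observes the status bit, whether the host fills memory synchronously or from an io thread stays an implementation detail, which is the point made later in the thread.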
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 10:41:48AM +0200, Alexander Graf wrote: > > On 19.07.2010, at 10:30, Gleb Natapov wrote: > > > On Mon, Jul 19, 2010 at 10:24:46AM +0200, Alexander Graf wrote: > >> > >> On 19.07.2010, at 10:19, Gleb Natapov wrote: > >> > >> Yes and no. It sounds nice at first, but doesn't quite fit. There are two > >> issues: > >> > >> 1) We need a new PCI ID > > We have our range. We can allocate from there. > > > >> 2) There can be a lot of initrd binaries with multiboot. We only have a > >> limited amount of BARs > >> > > Is it supported now with fw_cfg interface? My main concern with this > > approach is huge BAR size that may take a lot of space from PCI MMIO range > > if guest OS decide to configure it. > > Oh, right. I think I combined all the modules into the INITRD blob. Yeah, > that would work. Is coalesced MMIO more efficient than coalesced PIO? Or do > we have to do some RAM mapping for those special BAR regions? > I think we will have to do RAM mapping. Otherwise it may be slow to. Coalesced MMIO is for write not read IIRC. > Were there DMA capable devices back in ISA times? There must be. If so, we > can just take a look at what they do and do it similarly. Bus mastering was a > new thing for PCI, right? > I think IDE can be considered DMA capable ISA device, no? At least it works by writing to PIO ports and getting result into memory, but with interrupts and status bits and everything that real device should have. On board DMA engine is also ISA device. -- Gleb.
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On 19.07.2010, at 10:30, Gleb Natapov wrote: > On Mon, Jul 19, 2010 at 10:24:46AM +0200, Alexander Graf wrote: >> >> On 19.07.2010, at 10:19, Gleb Natapov wrote: >> >> Yes and no. It sounds nice at first, but doesn't quite fit. There are two >> issues: >> >> 1) We need a new PCI ID > We have our range. We can allocate from there. > >> 2) There can be a lot of initrd binaries with multiboot. We only have a >> limited amount of BARs >> > Is it supported now with fw_cfg interface? My main concern with this > approach is huge BAR size that may take a lot of space from PCI MMIO range > if guest OS decide to configure it. Oh, right. I think I combined all the modules into the INITRD blob. Yeah, that would work. Is coalesced MMIO more efficient than coalesced PIO? Or do we have to do some RAM mapping for those special BAR regions? Were there DMA capable devices back in ISA times? There must be. If so, we can just take a look at what they do and do it similarly. Bus mastering was a new thing for PCI, right? Alex
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 09:34:11AM +0100, Richard W.M. Jones wrote: > On Mon, Jul 19, 2010 at 10:55:33AM +0300, Gleb Natapov wrote: > > Why not put then on cdrom or disk? > > It simplifies device and mountpoint enumeration not to have a separate > disk. It would also mean we couldn't use standard Fedora paths, or > we'd have to have bind-mount /bin etc on to the disk mount point, > which again complicates things. > Can't help you here, but if it's doable you can speed up your startup time by much more than a second. > Anyway, what we're talking about here is a problem in qemu. How is The problem is that you want to speed up your application. There is more than one solution to the problem. If you come up with a reasonable solution in qemu, that is OK. > making initrd loading faster not a benefit for everyone? Every boot > has to load an initrd of some size, so making that operation faster > benefits every user, even if individually only by a small amount. > Most users load the initrd from a disk, not with the -initrd option. -- Gleb.
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 10:55:33AM +0300, Gleb Natapov wrote: > Why not put then on cdrom or disk? It simplifies device and mountpoint enumeration not to have a separate disk. It would also mean we couldn't use standard Fedora paths, or we'd have to have bind-mount /bin etc on to the disk mount point, which again complicates things. Anyway, what we're talking about here is a problem in qemu. How is making initrd loading faster not a benefit for everyone? Every boot has to load an initrd of some size, so making that operation faster benefits every user, even if individually only by a small amount. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into Xen guests. http://et.redhat.com/~rjones/virt-p2v
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 10:24:46AM +0200, Alexander Graf wrote: > > On 19.07.2010, at 10:19, Gleb Natapov wrote: > > > On Mon, Jul 19, 2010 at 10:08:57AM +0200, Alexander Graf wrote: > >> > >> On 19.07.2010, at 10:01, Gleb Natapov wrote: > >> > >>> On Mon, Jul 19, 2010 at 09:57:02AM +0200, Alexander Graf wrote: > > On 19.07.2010, at 09:51, Gleb Natapov wrote: > > > On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote: > >> > >> On 19.07.2010, at 09:33, Gleb Natapov wrote: > >> > >>> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote: > On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote: > > That what I am warring about too. If we are adding device we have > > to be > > sure such device can actually exist on real hw too otherwise we may > > have > > problems later. > > I don't understand why the constraints of real h/w have anything to > do > with this. Can you explain? > > >>> Each time we do something not architectural it cause us troubles > >>> later. > >>> So constraints of real h/w is our constrains to. > >>> > > Also 1 second on 100M file does not look like huge gain to me. > > Every second counts. We're trying to get libguestfs boot times down > from 8-12 seconds to 4-5 seconds. For many cases it's an interactive > program. > > >>> So what about making initrd smaller? I remember managing two > >>> distribution in 64M flash in embedded project. > >> > >> Having a huge initrd basically helps in reusing a lot of existing > >> code. We do the same - in general the initrd is just a subset of the > >> applications of the host OS. And if you start putting perl or the > >> likes into it, it becomes big. > >> > > Why not provide small disk/cdrom with all those utilities installed? > > Because - if the loading is done fast - this way everything's in RAM > instantly. And you still have all devices available for use inside the > system - that makes enumeration a lot easier. 
There are several reasons > why and I don't think we should force different ways on people just > because one component of our system is ineffective. > > >>> Loading huge initrd on real HW takes noticeably longer time that small > >>> one, so I would say that it is your design that is to blame here, not > >>> KVM. > >> > >> I disagree. Virtualization enables new use cases. The -initrd parameter is > >> a very good example for that. It's something that you simply couldn't do > >> on real hw. > >> > > How is it different from starting kernel/initrd from usb flash drive? > > The kernel and initrd are read directly from the host fs. It's more like a 9p > grub boot. > There is no "host" on real HW :) But conceptually it's almost the same. 9p grub boot would be also nice. Hmm, I think PXE is closest to -kernel/-initrd option on real HW. > > > >>> > > > >> I guess the best thing for now really is to try and see which code > >> paths insb goes along. It should really be coalesced. > >> > > It is coalesced to a certain extent (reenter guest every 1024 bytes, > > read from userspace page at a time). You need to continue injecting > > interrupt into a guest during long string operation and checking > > exception condition on a page boundaries. > > That still sounds slow. So yeah, adding DMA is probably the right way to > go. But then again - if we model it after real hw it would be > asynchronous, giving us an interrupt, causing even more headache. Ugh. > > Can't we just ignore real hw constraints here and have it available in > guest ram once one particular PIO is done? No bus master, no interrupts, > but full speed and simplicity/atomicity which also helps migration. > > >>> We shouldn't add devices that work not like real HW to speed up some > >>> pathological cases (and are slow on real HW too). > >> > >> Just because you don't use them doesn't mean they're pathological, really. 
> >> We simply chose a bad interface for transferring reasonable big chunks of > >> data and we need to fix that. If you want to look at it from a different > >> perspective, it's a regression. Older qemu versions did map the kernel and > >> initrd directly into guest ram, so now we're slower than back then. > >> > > I use them hundred time each day (at least -kernel part). If the > > interface is slow for your use case I have no problem with introducing > > new one, but the one that make sense in x86 architecture. I do not agree > > this is regression BTW. You can't compare buggy way of doing things and > > non-buggy way and say that bug fixing is a regression. > > > > What about adding new PCI card that holds kernel initrd in ROM bar? > > Y
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On 19.07.2010, at 10:19, Gleb Natapov wrote: > On Mon, Jul 19, 2010 at 10:08:57AM +0200, Alexander Graf wrote: >> >> On 19.07.2010, at 10:01, Gleb Natapov wrote: >> >>> On Mon, Jul 19, 2010 at 09:57:02AM +0200, Alexander Graf wrote: On 19.07.2010, at 09:51, Gleb Natapov wrote: > On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote: >> >> On 19.07.2010, at 09:33, Gleb Natapov wrote: >> >>> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote: On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote: > That what I am warring about too. If we are adding device we have to > be > sure such device can actually exist on real hw too otherwise we may > have > problems later. I don't understand why the constraints of real h/w have anything to do with this. Can you explain? >>> Each time we do something not architectural it cause us troubles later. >>> So constraints of real h/w is our constrains to. >>> > Also 1 second on 100M file does not look like huge gain to me. Every second counts. We're trying to get libguestfs boot times down from 8-12 seconds to 4-5 seconds. For many cases it's an interactive program. >>> So what about making initrd smaller? I remember managing two >>> distribution in 64M flash in embedded project. >> >> Having a huge initrd basically helps in reusing a lot of existing code. >> We do the same - in general the initrd is just a subset of the >> applications of the host OS. And if you start putting perl or the likes >> into it, it becomes big. >> > Why not provide small disk/cdrom with all those utilities installed? Because - if the loading is done fast - this way everything's in RAM instantly. And you still have all devices available for use inside the system - that makes enumeration a lot easier. There are several reasons why and I don't think we should force different ways on people just because one component of our system is ineffective. 
>>> Loading huge initrd on real HW takes noticeably longer time that small >>> one, so I would say that it is your design that is to blame here, not >>> KVM. >> >> I disagree. Virtualization enables new use cases. The -initrd parameter is a >> very good example for that. It's something that you simply couldn't do on >> real hw. >> > How is it different from starting kernel/initrd from usb flash drive? The kernel and initrd are read directly from the host fs. It's more like a 9p grub boot. > >>> > >> I guess the best thing for now really is to try and see which code paths >> insb goes along. It should really be coalesced. >> > It is coalesced to a certain extent (reenter guest every 1024 bytes, > read from userspace page at a time). You need to continue injecting > interrupt into a guest during long string operation and checking > exception condition on a page boundaries. That still sounds slow. So yeah, adding DMA is probably the right way to go. But then again - if we model it after real hw it would be asynchronous, giving us an interrupt, causing even more headache. Ugh. Can't we just ignore real hw constraints here and have it available in guest ram once one particular PIO is done? No bus master, no interrupts, but full speed and simplicity/atomicity which also helps migration. >>> We shouldn't add devices that work not like real HW to speed up some >>> pathological cases (and are slow on real HW too). >> >> Just because you don't use them doesn't mean they're pathological, really. >> We simply chose a bad interface for transferring reasonable big chunks of >> data and we need to fix that. If you want to look at it from a different >> perspective, it's a regression. Older qemu versions did map the kernel and >> initrd directly into guest ram, so now we're slower than back then. >> > I use them hundred time each day (at least -kernel part). 
If the > interface is slow for your use case I have no problem with introducing > new one, but the one that make sense in x86 architecture. I do not agree > this is regression BTW. You can't compare buggy way of doing things and > non-buggy way and say that bug fixing is a regression. > > What about adding new PCI card that holds kernel initrd in ROM bar? Yes and no. It sounds nice at first, but doesn't quite fit. There are two issues: 1) We need a new PCI ID 2) There can be a lot of initrd binaries with multiboot. We only have a limited amount of BARs Alex
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 10:08:57AM +0200, Alexander Graf wrote: > > On 19.07.2010, at 10:01, Gleb Natapov wrote: > > > On Mon, Jul 19, 2010 at 09:57:02AM +0200, Alexander Graf wrote: > >> > >> On 19.07.2010, at 09:51, Gleb Natapov wrote: > >> > >>> On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote: > > On 19.07.2010, at 09:33, Gleb Natapov wrote: > > > On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote: > >> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote: > >>> That what I am warring about too. If we are adding device we have to > >>> be > >>> sure such device can actually exist on real hw too otherwise we may > >>> have > >>> problems later. > >> > >> I don't understand why the constraints of real h/w have anything to do > >> with this. Can you explain? > >> > > Each time we do something not architectural it cause us troubles later. > > So constraints of real h/w is our constrains to. > > > >>> Also 1 second on 100M file does not look like huge gain to me. > >> > >> Every second counts. We're trying to get libguestfs boot times down > >> from 8-12 seconds to 4-5 seconds. For many cases it's an interactive > >> program. > >> > > So what about making initrd smaller? I remember managing two > > distribution in 64M flash in embedded project. > > Having a huge initrd basically helps in reusing a lot of existing code. > We do the same - in general the initrd is just a subset of the > applications of the host OS. And if you start putting perl or the likes > into it, it becomes big. > > >>> Why not provide small disk/cdrom with all those utilities installed? > >> > >> Because - if the loading is done fast - this way everything's in RAM > >> instantly. And you still have all devices available for use inside the > >> system - that makes enumeration a lot easier. There are several reasons > >> why and I don't think we should force different ways on people just > >> because one component of our system is ineffective. 
> >> > > Loading huge initrd on real HW takes noticeably longer time that small > > one, so I would say that it is your design that is to blame here, not > > KVM. > > I disagree. Virtualization enables new use cases. The -initrd parameter is a > very good example for that. It's something that you simply couldn't do on > real hw. > How is it different from starting kernel/initrd from usb flash drive? > > > >>> > I guess the best thing for now really is to try and see which code paths > insb goes along. It should really be coalesced. > > >>> It is coalesced to a certain extent (reenter guest every 1024 bytes, > >>> read from userspace page at a time). You need to continue injecting > >>> interrupt into a guest during long string operation and checking > >>> exception condition on a page boundaries. > >> > >> That still sounds slow. So yeah, adding DMA is probably the right way to > >> go. But then again - if we model it after real hw it would be > >> asynchronous, giving us an interrupt, causing even more headache. Ugh. > >> > >> Can't we just ignore real hw constraints here and have it available in > >> guest ram once one particular PIO is done? No bus master, no interrupts, > >> but full speed and simplicity/atomicity which also helps migration. > >> > > We shouldn't add devices that work not like real HW to speed up some > > pathological cases (and are slow on real HW too). > > Just because you don't use them doesn't mean they're pathological, really. We > simply chose a bad interface for transferring reasonable big chunks of data > and we need to fix that. If you want to look at it from a different > perspective, it's a regression. Older qemu versions did map the kernel and > initrd directly into guest ram, so now we're slower than back then. > I use them hundred time each day (at least -kernel part). If the interface is slow for your use case I have no problem with introducing new one, but the one that make sense in x86 architecture. 
I do not agree this is regression BTW. You can't compare buggy way of doing things and non-buggy way and say that bug fixing is a regression. What about adding new PCI card that holds kernel initrd in ROM bar? -- Gleb.
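To put rough numbers on the coalescing figures quoted above (the guest is re-entered every 1024 bytes, and data is read from userspace a page at a time), here is a small C sketch of the arithmetic. It assumes that "100M" means 100 MiB and that a page is 4096 bytes; neither is stated in the thread, and `chunks_needed` is a hypothetical helper written for this example.

```c
#include <stdint.h>

/* Number of fixed-size chunks needed to move total_bytes, rounding up.
 * chunk_bytes must be non-zero. */
static uint64_t chunks_needed(uint64_t total_bytes, uint64_t chunk_bytes)
{
    return (total_bytes + chunk_bytes - 1) / chunk_bytes;
}
```

Under those assumptions, `chunks_needed(100ull << 20, 1024)` gives 102400 guest re-entries and `chunks_needed(100ull << 20, 4096)` gives 25600 page-sized reads for a single initrd load, which gives a sense of why the per-chunk overhead matters for large blobs.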
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On 19.07.2010, at 10:01, Gleb Natapov wrote: > On Mon, Jul 19, 2010 at 09:57:02AM +0200, Alexander Graf wrote: >> >> On 19.07.2010, at 09:51, Gleb Natapov wrote: >> >>> On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote: On 19.07.2010, at 09:33, Gleb Natapov wrote: > On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote: >> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote: >>> That what I am warring about too. If we are adding device we have to be >>> sure such device can actually exist on real hw too otherwise we may have >>> problems later. >> >> I don't understand why the constraints of real h/w have anything to do >> with this. Can you explain? >> > Each time we do something not architectural it cause us troubles later. > So constraints of real h/w is our constrains to. > >>> Also 1 second on 100M file does not look like huge gain to me. >> >> Every second counts. We're trying to get libguestfs boot times down >> from 8-12 seconds to 4-5 seconds. For many cases it's an interactive >> program. >> > So what about making initrd smaller? I remember managing two > distribution in 64M flash in embedded project. Having a huge initrd basically helps in reusing a lot of existing code. We do the same - in general the initrd is just a subset of the applications of the host OS. And if you start putting perl or the likes into it, it becomes big. >>> Why not provide small disk/cdrom with all those utilities installed? >> >> Because - if the loading is done fast - this way everything's in RAM >> instantly. And you still have all devices available for use inside the >> system - that makes enumeration a lot easier. There are several reasons why >> and I don't think we should force different ways on people just because one >> component of our system is ineffective. >> > Loading huge initrd on real HW takes noticeably longer time that small > one, so I would say that it is your design that is to blame here, not > KVM. I disagree. 
Virtualization enables new use cases. The -initrd parameter is a very good example for that. It's something that you simply couldn't do on real hw. > >>> I guess the best thing for now really is to try and see which code paths insb goes along. It should really be coalesced. >>> It is coalesced to a certain extent (reenter guest every 1024 bytes, >>> read from userspace page at a time). You need to continue injecting >>> interrupt into a guest during long string operation and checking >>> exception condition on a page boundaries. >> >> That still sounds slow. So yeah, adding DMA is probably the right way to go. >> But then again - if we model it after real hw it would be asynchronous, >> giving us an interrupt, causing even more headache. Ugh. >> >> Can't we just ignore real hw constraints here and have it available in guest >> ram once one particular PIO is done? No bus master, no interrupts, but full >> speed and simplicity/atomicity which also helps migration. >> > We shouldn't add devices that work not like real HW to speed up some > pathological cases (and are slow on real HW too). Just because you don't use them doesn't mean they're pathological, really. We simply chose a bad interface for transferring reasonable big chunks of data and we need to fix that. If you want to look at it from a different perspective, it's a regression. Older qemu versions did map the kernel and initrd directly into guest ram, so now we're slower than back then. Alex
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 09:57:02AM +0200, Alexander Graf wrote: > > On 19.07.2010, at 09:51, Gleb Natapov wrote: > > > On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote: > >> > >> On 19.07.2010, at 09:33, Gleb Natapov wrote: > >> > >>> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote: > On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote: > > That what I am warring about too. If we are adding device we have to be > > sure such device can actually exist on real hw too otherwise we may have > > problems later. > > I don't understand why the constraints of real h/w have anything to do > with this. Can you explain? > > >>> Each time we do something not architectural it cause us troubles later. > >>> So constraints of real h/w is our constrains to. > >>> > > Also 1 second on 100M file does not look like huge gain to me. > > Every second counts. We're trying to get libguestfs boot times down > from 8-12 seconds to 4-5 seconds. For many cases it's an interactive > program. > > >>> So what about making initrd smaller? I remember managing two > >>> distribution in 64M flash in embedded project. > >> > >> Having a huge initrd basically helps in reusing a lot of existing code. We > >> do the same - in general the initrd is just a subset of the applications > >> of the host OS. And if you start putting perl or the likes into it, it > >> becomes big. > >> > > Why not provide small disk/cdrom with all those utilities installed? > > Because - if the loading is done fast - this way everything's in RAM > instantly. And you still have all devices available for use inside the system > - that makes enumeration a lot easier. There are several reasons why and I > don't think we should force different ways on people just because one > component of our system is ineffective. > Loading huge initrd on real HW takes noticeably longer time that small one, so I would say that it is your design that is to blame here, not KVM. 
> > > >> I guess the best thing for now really is to try and see which code paths > >> insb goes along. It should really be coalesced. > >> > > It is coalesced to a certain extent (reenter guest every 1024 bytes, > > read from userspace page at a time). You need to continue injecting > > interrupt into a guest during long string operation and checking > > exception condition on a page boundaries. > > That still sounds slow. So yeah, adding DMA is probably the right way to go. > But then again - if we model it after real hw it would be asynchronous, > giving us an interrupt, causing even more headache. Ugh. > > Can't we just ignore real hw constraints here and have it available in guest > ram once one particular PIO is done? No bus master, no interrupts, but full > speed and simplicity/atomicity which also helps migration. > We shouldn't add devices that work not like real HW to speed up some pathological cases (and are slow on real HW too). -- Gleb.
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On 19.07.2010, at 09:51, Gleb Natapov wrote: > On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote: >> >> On 19.07.2010, at 09:33, Gleb Natapov wrote: >> >>> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote: On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote: > That what I am warring about too. If we are adding device we have to be > sure such device can actually exist on real hw too otherwise we may have > problems later. I don't understand why the constraints of real h/w have anything to do with this. Can you explain? >>> Each time we do something not architectural it cause us troubles later. >>> So constraints of real h/w is our constrains to. >>> > Also 1 second on 100M file does not look like huge gain to me. Every second counts. We're trying to get libguestfs boot times down from 8-12 seconds to 4-5 seconds. For many cases it's an interactive program. >>> So what about making initrd smaller? I remember managing two >>> distribution in 64M flash in embedded project. >> >> Having a huge initrd basically helps in reusing a lot of existing code. We >> do the same - in general the initrd is just a subset of the applications of >> the host OS. And if you start putting perl or the likes into it, it becomes >> big. >> > Why not provide small disk/cdrom with all those utilities installed? Because - if the loading is done fast - this way everything's in RAM instantly. And you still have all devices available for use inside the system - that makes enumeration a lot easier. There are several reasons why and I don't think we should force different ways on people just because one component of our system is ineffective. > >> I guess the best thing for now really is to try and see which code paths >> insb goes along. It should really be coalesced. >> > It is coalesced to a certain extent (reenter guest every 1024 bytes, > read from userspace page at a time). 
You need to continue injecting > interrupt into a guest during long string operation and checking > exception condition on a page boundaries. That still sounds slow. So yeah, adding DMA is probably the right way to go. But then again - if we model it after real hw it would be asynchronous, giving us an interrupt, causing even more headache. Ugh. Can't we just ignore real hw constraints here and have it available in guest ram once one particular PIO is done? No bus master, no interrupts, but full speed and simplicity/atomicity which also helps migration. Alex
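To make the trade-off concrete, here is a toy model of the two transfer styles being compared: reading a blob through a data port one byte at a time, versus a single port write after which the blob is already in guest RAM. This is only a sketch under stated assumptions. The class, method names, and "ports" are hypothetical and are not QEMU's fw_cfg interface, and the byte-wise path ignores the string-PIO coalescing described earlier in the thread.

```python
class ToyFwCfg:
    """Toy model only: names and behaviour are made up for illustration."""

    def __init__(self, blob: bytes, ram_size: int):
        self.blob = blob
        self.ram = bytearray(ram_size)   # stands in for guest RAM
        self.offset = 0                  # read cursor for the data port
        self.accesses = 0                # device accesses handled by the host

    def read_data_port(self) -> int:
        """Byte-wise path: one device access per byte transferred."""
        self.accesses += 1
        byte = self.blob[self.offset]
        self.offset += 1
        return byte

    def write_copy_port(self, dest: int) -> None:
        """Synchronous copy: one access, blob is in RAM when it returns."""
        self.accesses += 1
        if dest < 0 or dest + len(self.blob) > len(self.ram):
            raise ValueError("destination outside guest RAM")
        self.ram[dest:dest + len(self.blob)] = self.blob


def load_bytewise(dev: ToyFwCfg, dest: int) -> None:
    """Guest-side loop that pulls the blob through the data port."""
    for i in range(len(dev.blob)):
        dev.ram[dest + i] = dev.read_data_port()


def load_with_copy(dev: ToyFwCfg, dest: int) -> None:
    """Guest-side use of the hypothetical copy port."""
    dev.write_copy_port(dest)


if __name__ == "__main__":
    blob = bytes(range(256)) * 4
    slow, fast = ToyFwCfg(blob, 4096), ToyFwCfg(blob, 4096)
    load_bytewise(slow, 0x100)
    load_with_copy(fast, 0x100)
    print(slow.accesses, fast.accesses, slow.ram == fast.ram)
```

Because the copy completes before the port write returns, there is no in-flight state to track, which is the simplicity/atomicity point made above. Whether such a device is acceptable architecturally is exactly what the thread is arguing about.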
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 08:44:16AM +0100, Richard W.M. Jones wrote: > On Mon, Jul 19, 2010 at 10:33:12AM +0300, Gleb Natapov wrote: > > On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote: > > > On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote: > > > > That what I am warring about too. If we are adding device we have to be > > > > sure such device can actually exist on real hw too otherwise we may have > > > > problems later. > > > > > > I don't understand why the constraints of real h/w have anything to do > > > with this. Can you explain? > > > > > Each time we do something not architectural it cause us troubles later. > > Can you explain more or point to some examples? I really don't > understand what these troubles could be. But I'm prepared to be > enlightened. > There are many. Look at vmware backdoor interface for instance. Such beast can't exist on real HW, so now we have to have hacks in emulator since io operation can change cpu registers. And I am not saying that what you are proposing can't exist on real HW. If such device can exist we can do it that way too. The gain is too small though. > > So what about making initrd smaller? I remember managing two > > distribution in 64M flash in embedded project. > > The distribution is the size that it is, because (a) it has to be > based on Fedora and because (b) it has to include a certain number of > programs. Why not put then on cdrom or disk? > > The reason for (a) is so that we don't need to compile our own tools > and we can benefit from bug fixes from Fedora (and contribute bug > fixes back). The reason for (b) is that we want to implement a rich > API[1], and having a rich API means we simply have to include many > binaries. > > We're already doing a lot of minimization on the image[2], deleting > man pages, language files, etc., so the image mainly just contains > binaries and libraries and kernel modules, which we cannot get rid of > because of (b). 
The original pre-minimization image is 600MB or so. > -- Gleb.
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote: > > On 19.07.2010, at 09:33, Gleb Natapov wrote: > > > On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote: > >> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote: > >>> That what I am warring about too. If we are adding device we have to be > >>> sure such device can actually exist on real hw too otherwise we may have > >>> problems later. > >> > >> I don't understand why the constraints of real h/w have anything to do > >> with this. Can you explain? > >> > > Each time we do something not architectural it cause us troubles later. > > So constraints of real h/w is our constrains to. > > > >>> Also 1 second on 100M file does not look like huge gain to me. > >> > >> Every second counts. We're trying to get libguestfs boot times down > >> from 8-12 seconds to 4-5 seconds. For many cases it's an interactive > >> program. > >> > > So what about making initrd smaller? I remember managing two > > distribution in 64M flash in embedded project. > > Having a huge initrd basically helps in reusing a lot of existing code. We do > the same - in general the initrd is just a subset of the applications of the > host OS. And if you start putting perl or the likes into it, it becomes big. > Why not provide small disk/cdrom with all those utilities installed? > I guess the best thing for now really is to try and see which code paths insb > goes along. It should really be coalesced. > It is coalesced to a certain extent (reenter guest every 1024 bytes, read from userspace page at a time). You need to continue injecting interrupt into a guest during long string operation and checking exception condition on a page boundaries. > Richard, what does kvm_stat tell you while loading the initrd? Are there a > lot of PIO requests or are we simply looping inside qemu code? > > > Alex -- Gleb.
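For a sense of scale, the figures quoted above (guest re-entered every 1024 bytes, data read from userspace a page at a time) can be turned into a rough count of round trips. This is a back-of-the-envelope sketch, not a measurement: it assumes 4 KiB pages and reads "100M" as 100 MiB, neither of which is stated in the thread. Under those assumptions a 100 MiB initrd needs 102400 guest re-entries and 25600 page-sized reads.

```python
# Rough count of round trips for a coalesced string-PIO (insb) transfer.
# From the thread: re-enter the guest every 1024 bytes, read from
# userspace one page at a time.
# ASSUMPTIONS (not from the thread): 4 KiB pages, "100M" == 100 MiB.

REENTER_BYTES = 1024   # figure quoted in the thread
PAGE_BYTES = 4096      # assumed page size


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def pio_round_trips(total_bytes: int) -> dict:
    """Return how many re-entries and page reads a transfer needs."""
    return {
        "guest_reentries": ceil_div(total_bytes, REENTER_BYTES),
        "userspace_page_reads": ceil_div(total_bytes, PAGE_BYTES),
    }


if __name__ == "__main__":
    initrd_bytes = 100 * 1024 * 1024
    print(pio_round_trips(initrd_bytes))
```

The counts say nothing about the cost of each round trip; kvm_stat output, as asked for above, would be needed to see where the time actually goes.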
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 10:33:12AM +0300, Gleb Natapov wrote:
> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
> > On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> > > That what I am warring about too. If we are adding device we have to be
> > > sure such device can actually exist on real hw too otherwise we may have
> > > problems later.
> >
> > I don't understand why the constraints of real h/w have anything to do
> > with this. Can you explain?
> >
> Each time we do something not architectural it cause us troubles later.

Can you explain more or point to some examples? I really don't
understand what these troubles could be. But I'm prepared to be
enlightened.

> So what about making initrd smaller? I remember managing two
> distribution in 64M flash in embedded project.

The distribution is the size that it is, because (a) it has to be
based on Fedora and because (b) it has to include a certain number of
programs.

The reason for (a) is so that we don't need to compile our own tools
and we can benefit from bug fixes from Fedora (and contribute bug
fixes back). The reason for (b) is that we want to implement a rich
API[1], and having a rich API means we simply have to include many
binaries.

We're already doing a lot of minimization on the image[2], deleting
man pages, language files, etc., so the image mainly just contains
binaries and libraries and kernel modules, which we cannot get rid of
because of (b). The original pre-minimization image is 600MB or so.

Rich.

[1] http://libguestfs.org/guestfs.3.html
[2] http://manpages.ubuntu.com/manpages/lucid/man8/febootstrap-minimize.8.html

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines. Supports shell scripting,
bindings from many languages. http://et.redhat.com/~rjones/libguestfs/
See what it can do: http://et.redhat.com/~rjones/libguestfs/recipes.html
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On 19.07.2010, at 09:33, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
>> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
>>> That what I am warring about too. If we are adding device we have to be
>>> sure such device can actually exist on real hw too otherwise we may have
>>> problems later.
>>
>> I don't understand why the constraints of real h/w have anything to do
>> with this. Can you explain?
>>
> Each time we do something not architectural it cause us troubles later.
> So constraints of real h/w is our constrains to.
>
>>> Also 1 second on 100M file does not look like huge gain to me.
>>
>> Every second counts. We're trying to get libguestfs boot times down
>> from 8-12 seconds to 4-5 seconds. For many cases it's an interactive
>> program.
>>
> So what about making initrd smaller? I remember managing two
> distribution in 64M flash in embedded project.

Having a huge initrd basically helps in reusing a lot of existing code. We do
the same - in general the initrd is just a subset of the applications of the
host OS. And if you start putting perl or the likes into it, it becomes big.

I guess the best thing for now really is to try and see which code paths insb
goes along. It should really be coalesced.

Richard, what does kvm_stat tell you while loading the initrd? Are there a
lot of PIO requests or are we simply looping inside qemu code?

Alex
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> > That what I am warring about too. If we are adding device we have to be
> > sure such device can actually exist on real hw too otherwise we may have
> > problems later.
>
> I don't understand why the constraints of real h/w have anything to do
> with this. Can you explain?
>
Each time we do something non-architectural it causes us trouble later.
So the constraints of real h/w are our constraints too.

> > Also 1 second on 100M file does not look like huge gain to me.
>
> Every second counts. We're trying to get libguestfs boot times down
> from 8-12 seconds to 4-5 seconds. For many cases it's an interactive
> program.
>
So what about making the initrd smaller? I remember managing two
distributions in 64M of flash in an embedded project.

--
Gleb.
Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> That what I am warring about too. If we are adding device we have to be
> sure such device can actually exist on real hw too otherwise we may have
> problems later.

I don't understand why the constraints of real h/w have anything to do
with this. Can you explain?

> Also 1 second on 100M file does not look like huge gain to me.

Every second counts. We're trying to get libguestfs boot times down
from 8-12 seconds to 4-5 seconds. For many cases it's an interactive
program.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v
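The disagreement over whether one second matters is easier to judge as a proportion. The sketch below takes the figures quoted in this message (a 1 second saving, boot times of 8-12 seconds today and a 4-5 second target) and computes what share of the boot a fixed saving represents. The function name is made up for illustration, and the numbers are only the ones mentioned in the thread, not measurements.

```python
# Share of total boot time that a fixed saving represents.
# Inputs are the figures quoted in the thread; nothing here is measured.


def saving_share(saving_s: float, boot_s: float) -> float:
    """Fraction of a boot of boot_s seconds covered by saving_s seconds."""
    if boot_s <= 0:
        raise ValueError("boot time must be positive")
    return saving_s / boot_s


if __name__ == "__main__":
    for boot_s in (8, 12, 4, 5):
        share = saving_share(1, boot_s)
        print(f"{boot_s:>2} s boot: a 1 s saving is {share:.1%} of it")
```

The same second weighs more heavily against the 4-5 second target than against the current 8-12 seconds.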
Re: [Qemu-devel] [PATCH 2/2] fw_cfg: Add blit operation for copying kernel, initrd, ..
On Mon, Jul 19, 2010 at 01:59:22AM +0200, Aurelien Jarno wrote:
> OpenBIOS also uses the same firmware interface, so it would need to be
> changed if this patch is accepted.

The patch leaves the old interface. Does it still need to be changed?

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://et.redhat.com/~rjones/virt-df/