date:20100719

[Qemu-devel] Re: KVM call agenda for July 20

2010-07-19 Thread Avi Kivity


On 07/20/2010 12:46 AM, Chris Wright wrote:

Please send in any agenda items you are interested in covering.

   

 Last week's agenda, minus the item that we started to discuss.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

[Qemu-devel] Re: [SeaBIOS] [PATCH 2/7] seabios: shadow: make device finding more generic.

2010-07-19 Thread Kevin O'Connor

On Tue, Jul 13, 2010 at 06:45:00PM +0900, Isaku Yamahata wrote:
> On Mon, Jul 12, 2010 at 08:59:14PM -0400, Kevin O'Connor wrote:
> > On Mon, Jul 12, 2010 at 08:47:47PM +0900, Isaku Yamahata wrote:
> > > pam register offset is north bridge specific.
> > > So determine the offset based on found north bridge.
> > 
> > Is it really just the offset that is north bridge specific?  I thought
> > the entire process was very north bridge specific.
> >
> > If so, I don't think it makes sense to pass back the pam0 register -
> > instead the north bridge specific code should do the necessary work
> > (using helper functions if possible).
> > 
> > I have the same concern with part 3 and 4 of this series.
> 
> I440fx and Q35 (all Intel chipset?) are similar in registers
> which seabios programs, so I choice to abstract it at register offset level.
> I don't expect that other vendor's chipset support is wanted.

Although it isn't currently used, the memory locking support is useful
on real machines too.  I'd prefer a solution that would work on both
qemu and real machines.

It's minor for part 2 of the series, but I found part 3/4 to be hard
to follow due to the way the flow of code jumps between machine
specific code in dev-i440fx.c and the smm code in smm.c.

> If you want more high level abstract, I'll respin the patch set.

I've been meaning to look through the full series of changes in your
repo, but have not yet had a chance to do so.  I hope to get to that
in the next few days.  Sorry for the delay.

-Kevin

[Qemu-devel] [RFC PATCH 12/14] KVM-test: Add a subtest of netperf

2010-07-19 Thread Amos Kong

Add network load by netperf, server is launched on guest, execute netperf
client with different protocols on host. if all clients execute successfully,
case will be pass. Test result will be record into result.txt.
Now this case only tests with "TCP_RR TCP_CRR UDP_RR TCP_STREAM TCP_MAERTS
TCP_SENDFILE UDP_STREAM". DLPI only supported by Unix, unix domain test is
not necessary, so drop test of DLPI and unix domain.

Signed-off-by: Amos Kong 
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/netperf.py 
b/client/tests/kvm/tests/netperf.py
new file mode 100644
index 000..00a91f0
--- /dev/null
+++ b/client/tests/kvm/tests/netperf.py
@@ -0,0 +1,56 @@
+import logging, commands, os
+from autotest_lib.client.common_lib import error
+import kvm_subprocess, kvm_test_utils, kvm_utils
+
+def run_netperf(test, params, env):
+"""
+Network stress test with netperf
+
+1) Boot up a virtual machine
+2) Launch netserver on guest
+3) Execute netperf client on host with different protocols
+4) Outout the test result
+
+@param test: Kvm test object
+@param params: Dictionary with the test parameters.
+@param env: Dictionary with test environment.
+"""
+vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+session = kvm_test_utils.wait_for_login(vm,
+   timeout=int(params.get("login_timeout", 360)))
+netperf_dir = os.path.join(os.environ['AUTODIR'], "tests/netperf2")
+setup_cmd = params.get("setup_cmd")
+guest_ip = vm.get_address()
+result_file = os.path.join(test.debugdir, "result.txt")
+
+session.get_command_output("service iptables stop")
+for i in params.get("netperf_files").split():
+if not vm.copy_files_to(os.path.join(netperf_dir, i), "/tmp"):
+raise error.TestError("Could not copy files to guest")
+if session.get_command_status(setup_cmd % "/tmp", timeout=100) != 0:
+raise error.TestFail("Fail to setup netperf on guest")
+if session.get_command_status(params.get("netserver_cmd") % "/tmp") != 0:
+raise error.TestFail("Fail to start netperf server on guest")
+
+try:
+logging.info("Setup and run netperf client on host")
+s, o = commands.getstatusoutput(setup_cmd % netperf_dir)
+if s != 0:
+raise error.TestFail("Fail to setup netperf on host, o: %s" % o)
+success = True
+file(result_file, "w").write("Netperf Test Result\n")
+for i in params.get("protocols").split():
+cmd = params.get("netperf_cmd") % (netperf_dir, i, guest_ip)
+logging.debug("Execute netperf client test: %s" % cmd)
+s, o = commands.getstatusoutput(cmd)
+if s != 0:
+logging.error("Fail to execute netperf test, protocol:%s" % i)
+success = False
+else:
+logging.info(o)
+file(result_file, "a+").write("%s\n" % o)
+if not success:
+raise error.TestFail("Not all the test passed")
+finally:
+session.get_command_output("killall netserver")
+session.close()
diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index 7716d48..dec988e 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -398,6 +398,16 @@ variants:
 type = mac_change
 kill_vm = yes
 
+- netperf: install setup unattended_install.cdrom
+type = netperf
+nic_mode = tap
+netperf_files = netperf-2.4.5.tar.bz2 wait_before_data.patch
+setup_cmd = "cd %s && tar xvfj netperf-2.4.5.tar.bz2 && cd 
netperf-2.4.5 && patch -p0 < ../wait_before_data.patch && ./configure && make"
+netserver_cmd =  %s/netperf-2.4.5/src/netserver
+# test time is 60 seconds, set the buffer size to 1 for more hardware 
interrupt
+netperf_cmd = %s/netperf-2.4.5/src/netperf -t %s -H %s -l 60 -- -m 1
+protocols = "TCP_STREAM TCP_MAERTS TCP_RR TCP_CRR UDP_RR TCP_SENDFILE 
UDP_STREAM"
+
 - physical_resources_check: install setup unattended_install.cdrom
 type = physical_resources_check
 catch_uuid_cmd = dmidecode | awk -F: '/UUID/ {print $2}'

[Qemu-devel] [RFC PATCH 02/14] KVM Test: Add a function get_interface_name() to kvm_net_utils.py

2010-07-19 Thread Amos Kong

The function get_interface_name is used to get the interface name of linux
guest through the macaddress of specified macaddress.

Signed-off-by: Jason Wang 
Signed-off-by: Amos Kong 
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/kvm_net_utils.py 
b/client/tests/kvm/kvm_net_utils.py
new file mode 100644
index 000..ede4965
--- /dev/null
+++ b/client/tests/kvm/kvm_net_utils.py
@@ -0,0 +1,18 @@
+import re
+
+def get_linux_ifname(session, mac_address):
+"""
+Get the interface name through the mac address.
+
+@param session: session to the virtual machine
+@mac_address: the macaddress of nic
+"""
+
+output = session.get_command_output("ifconfig -a")
+
+try:
+ethname = re.findall("(\w+)\s+Link.*%s" % mac_address, output,
+ re.IGNORECASE)[0]
+return ethname
+except:
+return None

[Qemu-devel] [RFC PATCH 14/14] KVM-test: Add subtest of testing offload by ethtool

2010-07-19 Thread Amos Kong

The latest case contains TX/RX/SG/TSO/GSO/GRO/LRO test. RTL8139 NIC doesn't
support TSO, LRO, it's too old, so drop offload test from rtl8139. LRO, GRO
are only supported by latest kernel, virtio nic doesn't support receive
offloading function.
Initialize the callbacks first and execute all the sub tests one by one, all
the result will be check at the end.
When execute this test, vhost should be enabled, then most of new feature can
be used. Vhost doestn't support VIRTIO_NET_F_MRG_RXBUF, so do not check large
packets in received offload test.
Transfer files by scp between host and guest, match new opened TCP port by
netstat. Capture the packages info by tcpdump, it contains package length.

Signed-off-by: Amos Kong 
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/ethtool.py 
b/client/tests/kvm/tests/ethtool.py
new file mode 100644
index 000..7274eae
--- /dev/null
+++ b/client/tests/kvm/tests/ethtool.py
@@ -0,0 +1,205 @@
+import time, os, logging, commands, re
+from autotest_lib.client.common_lib import error
+from autotest_lib.client.bin import utils
+import kvm_test_utils, kvm_utils, kvm_net_utils
+
+def run_ethtool(test, params, env):
+"""
+Test offload functions of ethernet device by ethtool
+
+1) Log into a guest
+2) Initialize the callback of sub functions
+3) Enable/disable sub function of NIC
+4) Execute callback function
+5) Check the return value
+6) Restore original configuration
+
+@param test: Kvm test object
+@param params: Dictionary with the test parameters.
+@param env: Dictionary with test environment.
+"""
+def ethtool_get(type):
+feature_pattern = {
+'tx':  'tx.*checksumming',
+'rx':  'rx.*checksumming',
+'sg':  'scatter.*gather',
+'tso': 'tcp.*segmentation.*offload',
+'gso': 'generic.*segmentation.*offload',
+'gro': 'generic.*receive.*offload',
+'lro': 'large.*receive.*offload',
+}
+s, o = session.get_command_status_output("ethtool -k %s" % ethname)
+try:
+return re.findall("%s: (.*)" % feature_pattern.get(type), o)[0]
+except IndexError:
+logging.debug("Could not get %s status" % type)
+
+def ethtool_set(type, status):
+"""
+Set ethernet device offload status
+
+@param type: Offload type name
+@param status: New status will be changed to
+"""
+logging.info("Try to set %s %s" % (type, status))
+if status not in ["off", "on"]:
+return False
+cmd = "ethtool -K %s %s %s" % (ethname, type, status)
+if ethtool_get(type) != status:
+return session.get_command_status(cmd) == 0
+if ethtool_get(type) != status:
+logging.error("Fail to set %s %s" % (type, status))
+return False
+return True
+
+def ethtool_save_params():
+logging.info("Save ethtool configuration")
+for i in supported_features:
+feature_status[i] = ethtool_get(i)
+
+def ethtool_restore_params():
+logging.info("Restore ethtool configuration")
+for i in supported_features:
+ethtool_set(i, feature_status[i])
+
+def compare_md5sum(name):
+logging.info("Compare md5sum of the files on guest and host")
+host_result = utils.hash_file(name, method="md5")
+try:
+o = session.get_command_output("md5sum %s" % name)
+guest_result = re.findall("\w+", o)[0]
+except IndexError:
+logging.error("Could not get file md5sum in guest")
+return False
+logging.debug("md5sum: guest(%s), host(%s)" % (guest_result,
+   host_result))
+return guest_result == host_result
+
+def transfer_file(src="guest"):
+"""
+Transfer file by scp, use tcpdump to capture packets, then check the
+return string.
+
+@param src: Source host of transfer file
+@return: Tuple (status, error msg/tcpdump result)
+"""
+session2.get_command_status("rm -rf %s" % filename)
+dd_cmd = "dd if=/dev/urandom of=%s bs=1M count=%s" % (filename,
+   params.get("filesize"))
+logging.info("Creat file in source host, cmd: %s" % dd_cmd)
+tcpdump_cmd = "tcpdump -lep -s 0 tcp -vv port ssh"
+if src == "guest":
+s = session.get_command_status(dd_cmd, timeout=360)
+tcpdump_cmd += " and src %s" % guest_ip
+copy_files_fun = vm.copy_files_from
+else:
+s, o = commands.getstatusoutput(dd_cmd)
+tcpdump_cmd += " and dst %s" % guest_ip
+copy_files_fun = vm.copy_files_to
+if s != 0:
+return (False, "Fail to create file by dd, cmd: %s" % dd_cmd)
+
+# only capture the new tcp port aft

[Qemu-devel] [RFC PATCH 11/14] KVM-test: Add a subtest of changing mac address

2010-07-19 Thread Amos Kong

Mainly test steps:
1. get a new mac from pool, and the old mac addr of guest.
2. execute the mac_change.sh in guest.
3. relogin to guest and query the interfaces info by `ifconfig`

Signed-off-by: Cao, Chen 
Signed-off-by: Amos Kong 
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/mac_change.py 
b/client/tests/kvm/tests/mac_change.py
new file mode 100644
index 000..dc93377
--- /dev/null
+++ b/client/tests/kvm/tests/mac_change.py
@@ -0,0 +1,66 @@
+import logging
+from autotest_lib.client.common_lib import error
+import kvm_utils, kvm_test_utils, kvm_net_utils
+
+
+def run_mac_change(test, params, env):
+"""
+Change MAC Address of Guest.
+
+1. get a new mac from pool, and the old mac addr of guest.
+2. set new mac in guest and regain new IP.
+3. re-log into guest with new mac
+
+@param test: kvm test object
+@param params: Dictionary with the test parameters
+@param env: Dictionary with test environment.
+"""
+timeout = int(params.get("login_timeout", 360))
+vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+logging.info("Trying to log into guest '%s' by serial", vm.name)
+session = kvm_utils.wait_for(lambda: vm.serial_login(),
+  timeout, 0, step=2)
+if not session:
+raise error.TestFail("Could not log into guest '%s'" % vm.name)
+
+old_mac = vm.get_macaddr(0)
+kvm_utils.put_mac_to_pool(vm.root_dir, old_mac, vm.instance)
+new_mac = kvm_utils.get_mac_from_pool(vm.root_dir,
+  vm=vm.instance,
+  nic_index=0,
+  prefix=vm.mac_prefix)
+logging.info("The initial MAC address is %s" % old_mac)
+interface = kvm_net_utils.get_linux_ifname(session, old_mac)
+
+# Start change mac address
+logging.info("Changing mac address to %s" % new_mac)
+change_cmd = "ifconfig %s down && ifconfig %s hw ether %s && ifconfig %s 
up"\
+ % (interface, interface, new_mac, interface)
+if session.get_command_status(change_cmd) != 0:
+raise error.TestFail("Fail to send mac_change command")
+
+# Verify whether mac address is changed to new one
+logging.info("Verifying the new mac address")
+if session.get_command_status("ifconfig | grep -i %s" % new_mac) != 0:
+raise error.TestFail("Fail to change mac address")
+
+# Restart `dhclient' to regain IP for new mac address
+logging.info("Re-start the network to gain new ip")
+dhclient_cmd = "dhclient -r && dhclient %s" % interface
+session.sendline(dhclient_cmd)
+
+# Re-log into the guest after changing mac address
+if kvm_utils.wait_for(session.is_responsive, 120, 20, 3):
+# Just warning when failed to see the session become dead,
+# because there is a little chance the ip does not change.
+logging.warn("The session is still responsive, settings may fail.")
+session.close()
+
+# Re-log into guest and check if session is responsive
+logging.info("Re-log into the guest")
+session = kvm_test_utils.wait_for_login(vm,
+  timeout=int(params.get("login_timeout", 360)))
+if not session.is_responsive():
+raise error.TestFail("The new session is not responsive.")
+
+session.close()
diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index 5515601..7716d48 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -394,6 +394,10 @@ variants:
 restart_vm = yes
 pxe_timeout = 60
 
+- mac_change: install setup unattended_install.cdrom
+type = mac_change
+kill_vm = yes
+
 - physical_resources_check: install setup unattended_install.cdrom
 type = physical_resources_check
 catch_uuid_cmd = dmidecode | awk -F: '/UUID/ {print $2}'
@@ -1070,7 +1074,7 @@ variants:
 
 # Windows section
 - @Windows:
-no autotest linux_s3 vlan_tag ioquit 
unattended_install.(url|nfs|remote_ks) jumbo file_transfer nicdriver_unload 
nic_promisc multicast
+no autotest linux_s3 vlan_tag ioquit 
unattended_install.(url|nfs|remote_ks) jumbo file_transfer nicdriver_unload 
nic_promisc multicast mac_change
 shutdown_command = shutdown /s /f /t 0
 reboot_command = shutdown /r /f /t 0
 status_test_command = echo %errorlevel%

[Qemu-devel] [RFC PATCH 10/14] KVM-test: Add a subtest of pxe

2010-07-19 Thread Amos Kong

This case just snoop tftp packet through tcpdump, it depends on public dhcp
server, better to test it through dnsmasq.

Signed-off-by: Jason Wang 
Signed-off-by: Amos Kong 
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/pxe.py b/client/tests/kvm/tests/pxe.py
new file mode 100644
index 000..8859aaa
--- /dev/null
+++ b/client/tests/kvm/tests/pxe.py
@@ -0,0 +1,30 @@
+import logging
+from autotest_lib.client.common_lib import error
+import kvm_subprocess, kvm_test_utils, kvm_utils
+
+
+def run_pxe(test, params, env):
+"""
+PXE test:
+
+1) Snoop the tftp packet in the tap device
+2) Wait for some seconds
+3) Check whether capture tftp packets
+
+@param test: kvm test object
+@param params: Dictionary with the test parameters
+@param env: Dictionary with test environment.
+"""
+vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+timeout = int(params.get("pxe_timeout", 60))
+
+logging.info("Try to boot from pxe")
+status, output = kvm_subprocess.run_fg("tcpdump -nli %s" % vm.get_ifname(),
+   logging.debug,
+   "(pxe) ",
+   timeout)
+
+logging.info("Analysing the tcpdump result...")
+if not "tftp" in output:
+raise error.TestFail("Couldn't find tftp packet in %s seconds" % 
timeout)
+logging.info("Found tftp packet")
diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index 9594a38..5515601 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -381,6 +381,19 @@ variants:
 mgroup_count = 20
 flood_minutes = 1
 
+- pxe:
+type = pxe
+images = pxe
+image_name_pxe = pxe-test
+image_size_pxe = 1G
+force_create_image_pxe = yes
+remove_image_pxe = yes
+extra_params += ' -boot n'
+kill_vm_on_error = yes
+network = bridge
+restart_vm = yes
+pxe_timeout = 60
+
 - physical_resources_check: install setup unattended_install.cdrom
 type = physical_resources_check
 catch_uuid_cmd = dmidecode | awk -F: '/UUID/ {print $2}'

[Qemu-devel] [RFC PATCH 05/14] KVM-test: Add a subtest jumbo

2010-07-19 Thread Amos Kong

According to different nic model set different MTU for it. And ping from guest
to host, to see whether tested size can be received by host.

Signed-off-by: Jason Wang 
Signed-off-by: Amos Kong 
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/jumbo.py b/client/tests/kvm/tests/jumbo.py
new file mode 100644
index 000..9f56a87
--- /dev/null
+++ b/client/tests/kvm/tests/jumbo.py
@@ -0,0 +1,133 @@
+import os, re, logging, commands, time, random
+from autotest_lib.client.common_lib import error
+import kvm_subprocess, kvm_test_utils, kvm_utils, kvm_net_utils
+
+def run_jumbo(test, params, env):
+"""
+Test the RX jumbo frame function of vnics:
+1) boot the vm
+2) change the MTU of guest nics and host taps depends on the nic model
+3) add the static arp entry for guest nic
+4) wait for the MTU ok
+5) verify the patch mtu using ping
+6) ping the guest with large frames
+7) increament size ping
+8) flood ping the guest with large frames
+9) verify the path mtu
+10) revocer the mtu
+
+@param test: kvm test object
+@param params: Dictionary with the test parameters
+@param env: Dictionary with test environment.
+"""
+
+vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+session = kvm_test_utils.wait_for_login(vm)
+mtu = params.get("mtu", "1500")
+flood_time = params.get("flood_time", "300")
+max_icmp_pkt_size = int(mtu) - 28
+
+ifname = vm.get_ifname(0)
+ip = vm.get_address(0)
+if ip is None:
+raise error.TestError("Could not get the ip address")
+
+try:
+# Environment preparartion
+ethname = kvm_net_utils.get_linux_ifname(session, vm.get_macaddr(0))
+
+logging.info("Changing the mtu of guest ...")
+guest_mtu_cmd = "ifconfig %s mtu %s" % (ethname , mtu)
+s, o = session.get_command_status_output(guest_mtu_cmd)
+if s != 0:
+logging.error(o)
+raise error.TestError("Fail to set the mtu of guest nic: %s"
+  % ethname)
+
+logging.info("Chaning the mtu of host tap ...")
+host_mtu_cmd = "ifconfig %s mtu %s" % (ifname, mtu)
+s, o = commands.getstatusoutput(host_mtu_cmd)
+if s != 0:
+raise error.TestError("Fail to set the mtu of %s" % ifname)
+
+logging.info("Add a temporary static arp entry ...")
+arp_add_cmd = "arp -s %s %s -i %s" % (ip, vm.get_macaddr(0), ifname)
+s, o = commands.getstatusoutput(arp_add_cmd)
+if s != 0 :
+raise error.TestError("Fail to add temporary arp entry")
+
+def is_mtu_ok():
+s, o = kvm_net_utils.ping(ip, 1, interface = ifname,
+  packetsize = max_icmp_pkt_size,
+  hint = "do", timeout = 2)
+if s != 0:
+return False
+else:
+return True
+
+def verify_mtu():
+logging.info("Verify the path mtu")
+s, o = kvm_net_utils.ping(ip, 10, interface = ifname,
+  packetsize = max_icmp_pkt_size,
+  hint = "do", timeout = 15)
+if s != 0 :
+logging.error(o)
+raise error.TestFail("Path MTU is not as expected")
+if kvm_net_utils.get_loss_ratio(o) != 0:
+logging.error(o)
+raise error.TestFail("Packet loss ratio during mtu 
verification"
+ " is not zero")
+
+def flood_ping():
+logging.info("Flood with large frames")
+kvm_net_utils.ping(ip, interface = ifname,
+   packetsize = max_icmp_pkt_size,
+   flood = True, timeout = float(flood_time))
+
+def large_frame_ping(count = 100):
+logging.info("Large frame ping")
+s, o = kvm_net_utils.ping(ip, count, interface = ifname,
+  packetsize = max_icmp_pkt_size,
+  timeout = float(count) * 2)
+ratio = kvm_net_utils.get_loss_ratio(o)
+if ratio != 0:
+raise error.TestFail("Loss ratio of large frame ping is %s" \
+ % ratio)
+
+def size_increase_ping(step = random.randrange(90, 110)):
+logging.info("Size increase ping")
+for size in range(0, max_icmp_pkt_size + 1, step):
+logging.info("Ping %s with size %s" % (ip, size))
+s, o = kvm_net_utils.ping(ip, 1, interface = ifname,
+  packetsize = size,
+  hint = "do", timeout = 1)
+if s != 0:
+s, o = kvm_net_utils.ping(ip, 10, interface = ifname,
+

[Qemu-devel] [RFC PATCH 06/14] KVM-test: Add basic file transfer test

2010-07-19 Thread Amos Kong

This test is the basic test of transfering file between host and guest. Try to
transfer a large file from host to guest, and transfer it back to host, then
compare the files by diff command.
The default file size is 4000M, scp timeout is 1000s. It means if the average
speed is less than 4M/s, this test will be fail.
We can extend this test by using another disk later, then we can transfer larger
files without the limit of first disk size.

Signed-off-by: Amos Kong 
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/file_transfer.py 
b/client/tests/kvm/tests/file_transfer.py
new file mode 100644
index 000..a20e62e
--- /dev/null
+++ b/client/tests/kvm/tests/file_transfer.py
@@ -0,0 +1,54 @@
+import logging, commands
+from autotest_lib.client.common_lib import error
+import kvm_utils, kvm_test_utils
+
+def run_file_transfer(test, params, env):
+"""
+Test ethrnet device function by ethtool
+
+1) Boot up a virtual machine
+2) Create a large file by dd on host
+3) Copy this file from host to guest
+4) Copy this file from guest to host
+5) Check if file transfers good
+
+@param test: Kvm test object
+@param params: Dictionary with the test parameters.
+@param env: Dictionary with test environment.
+"""
+vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+timeout=int(params.get("login_timeout", 360))
+logging.info("Trying to log into guest '%s' by serial", vm.name)
+session = kvm_utils.wait_for(lambda: vm.serial_login(),
+ timeout, 0, step=2)
+if not session:
+raise error.TestFail("Could not log into guest '%s'" % vm.name)
+
+dir = test.tmpdir
+scp_timeout = int(params.get("scp_timeout"))
+cmd = "dd if=/dev/urandom of=%s/a.out bs=1M count=%d" %  (dir, int(
+ params.get("filesize", 4000)))
+try:
+logging.info("Create file by dd command on host, cmd: %s" % cmd)
+s, o = commands.getstatusoutput(cmd)
+if s != 0:
+raise error.TestError("Fail to create file, output:%s" % o)
+
+logging.info("Transfer file from host to guest")
+if not vm.copy_files_to("%s/a.out" % dir, "/tmp/b.out",
+timeout=scp_timeout):
+raise error.TestFail("Fail to transfer file from host to guest")
+
+logging.info("Transfer file from guest to host")
+if not vm.copy_files_from("/tmp/b.out", "%s/c.out" % dir,
+timeout=scp_timeout):
+raise error.TestFail("Fail to transfer file from guest to host")
+
+logging.debug(commands.getoutput("ls -l %s/[ac].out" % dir))
+s, o = commands.getstatusoutput("diff %s/a.out %s/c.out" % (dir, dir))
+if s != 0:
+raise error.TestFail("File changed after transfer. Output:%s" % o)
+finally:
+session.get_command_status("rm -f /tmp/b.out")
+commands.getoutput("rm -f %s/[ac].out" % dir)
+session.close()
diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index 7f7b56a..872674e 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -357,6 +357,11 @@ variants:
 - jumbo: install setup unattended_install.cdrom
 type = jumbo
 
+- file_transfer: install setup unattended_install.cdrom
+type = file_transfer
+filesize = 4000
+scp_timeout = 1000
+
 - physical_resources_check: install setup unattended_install.cdrom
 type = physical_resources_check
 catch_uuid_cmd = dmidecode | awk -F: '/UUID/ {print $2}'
@@ -1033,7 +1038,7 @@ variants:
 
 # Windows section
 - @Windows:
-no autotest linux_s3 vlan_tag ioquit 
unattended_install.(url|nfs|remote_ks) jumbo
+no autotest linux_s3 vlan_tag ioquit 
unattended_install.(url|nfs|remote_ks) jumbo file_transfer
 shutdown_command = shutdown /s /f /t 0
 reboot_command = shutdown /r /f /t 0
 status_test_command = echo %errorlevel%

[Qemu-devel] [RFC PATCH 09/14] KVM-test: Add a subtest of multicast

2010-07-19 Thread Amos Kong

Use 'ping' to test send/recive multicat packets. Flood ping test is also added.
Limit guest network as 'bridge' mode, because multicast packets could not be
transmitted to guest when using 'user' network.
Add join_mcast.py for joining machine into multicast groups.

Signed-off-by: Amos Kong 
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/scripts/join_mcast.py 
b/client/tests/kvm/scripts/join_mcast.py
new file mode 100755
index 000..0d90e5c
--- /dev/null
+++ b/client/tests/kvm/scripts/join_mcast.py
@@ -0,0 +1,29 @@
+import socket, struct, os, signal, sys
+# this script is used to join machine into multicast groups
+# author: Amos Kong 
+
+if len(sys.argv) < 4:
+print """%s [mgroup_count] [prefix] [suffix]
+mgroup_count: count of multicast addresses
+prefix: multicast address prefix
+suffix: multicast address suffix""" % sys.argv[0]
+sys.exit()
+
+mgroup_count = int(sys.argv[1])
+prefix = sys.argv[2]
+suffix = int(sys.argv[3])
+
+s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+for i in range(mgroup_count):
+mcast = prefix + "." + str(suffix + i)
+try:
+mreq = struct.pack("4sl", socket.inet_aton(mcast), socket.INADDR_ANY)
+s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
+except:
+s.close()
+print "Could not join multicast: %s" % mcast
+raise
+
+print "join_mcast_pid:%s" % os.getpid()
+os.kill(os.getpid(), signal.SIGSTOP)
+s.close()
diff --git a/client/tests/kvm/tests/multicast.py 
b/client/tests/kvm/tests/multicast.py
new file mode 100644
index 000..6b0e106
--- /dev/null
+++ b/client/tests/kvm/tests/multicast.py
@@ -0,0 +1,67 @@
+import logging, commands, os, re
+from autotest_lib.client.common_lib import error
+import kvm_test_utils, kvm_net_utils
+
+
+def run_multicast(test, params, env):
+"""
+Test multicast function of nic (rtl8139/e1000/virtio)
+
+1) Create a VM
+2) Join guest into multicast groups
+3) Ping multicast addresses on host
+4) Flood ping test with different size of packets
+5) Final ping test and check if lose packet
+
+@param test: Kvm test object
+@param params: Dictionary with the test parameters.
+@param env: Dictionary with test environment.
+"""
+vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+session = kvm_test_utils.wait_for_login(vm,
+  timeout=int(params.get("login_timeout", 360)))
+
+# stop iptable/selinux on guest/host
+cmd = "/etc/init.d/iptables stop && echo 0 > /selinux/enforce"
+session.get_command_status(cmd)
+commands.getoutput(cmd)
+# make sure guest replies to broadcasts
+session.get_command_status("echo 0 > /proc/sys/net/ipv4/icmp_echo_ignore"
+"_broadcasts && echo 0 > /proc/sys/net/ipv4/icmp_echo_ignore_all")
+
+# base multicast address
+mcast = params.get("mcast", "225.0.0.1")
+# count of multicast addresses, less than 20
+mgroup_count = int(params.get("mgroup_count", 5))
+flood_minutes = float(params.get("flood_minutes", 10))
+ifname = vm.get_ifname()
+prefix = re.findall("\d+.\d+.\d+", mcast)[0]
+suffix = int(re.findall("\d+", mcast)[-1])
+# copy python script to guest for joining guest to multicast groups
+mcast_path = os.path.join(test.bindir, "scripts/join_mcast.py")
+vm.copy_files_to(mcast_path, "/tmp")
+output = session.get_command_output("python /tmp/join_mcast.py %d %s %d" %
+(mgroup_count, prefix, suffix))
+# if success to join multicast the process will be paused, and return pid.
+if not re.findall("join_mcast_pid:(\d+)", output):
+raise error.TestFail("Can't join multicast groups,output:%s" % output)
+pid = output.split()[0]
+
+try:
+for i in range(mgroup_count):
+new_suffix = suffix + i
+mcast = "%s.%d" % (prefix, new_suffix)
+logging.info("Initial ping test, mcast: %s", mcast)
+s, o = kvm_net_utils.ping(mcast, 10, interface=ifname, timeout=20)
+if s != 0:
+raise error.TestFail(" Ping return non-zero value %s" % o)
+logging.info("Flood ping test, mcast: %s", mcast)
+kvm_net_utils.ping(mcast, None, interface=ifname, flood=True,
+   output_func=None, timeout=flood_minutes*60)
+logging.info("Final ping test, mcast: %s", mcast)
+s, o = kvm_net_utils.ping(mcast, 10, interface=ifname, timeout=20)
+if s != 0:
+raise error.TestFail(" Ping return non-zero value %s" % o)
+finally:
+session.get_command_output("kill -s SIGCONT %s" % pid)
+session.close()
diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index 9e2b9a0..9594a38 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -374,6 +374,13 @@ variants:

[Qemu-devel] [RFC PATCH 13/14] KVM-test: Improve vlan subtest

2010-07-19 Thread Amos Kong

This is an enhancement of existed vlan test. Rename the vlan_tag.py to vlan.py,
it is more reasonable.
. Setup arp from "/proc/sys/net/ipv4/conf/all/arp_ignore"
. Multiple vlans exist simultaneously
. Test ping between same and different vlans
. Test by TCP data transfer, floop ping between same vlan
. Maximal plumb/unplumb vlans

Signed-off-by: Amos Kong 
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/vlan.py b/client/tests/kvm/tests/vlan.py
new file mode 100644
index 000..dc7611b
--- /dev/null
+++ b/client/tests/kvm/tests/vlan.py
@@ -0,0 +1,181 @@
+import logging, time, re
+from autotest_lib.client.common_lib import error
+import kvm_subprocess, kvm_test_utils, kvm_utils, kvm_net_utils
+
+def run_vlan(test, params, env):
+"""
+Test 802.1Q vlan of NIC, config it by vconfig command.
+
+1) Create two VMs
+2) Setup guests in 10 different vlans by vconfig and using hard-coded
+   ip address
+3) Test by ping between same and different vlans of two VMs
+4) Test by TCP data transfer, floop ping between same vlan of two VMs
+5) Test maximal plumb/unplumb vlans
+6) Recover the vlan config
+
+@param test: KVM test object.
+@param params: Dictionary with the test parameters.
+@param env: Dictionary with test environment.
+"""
+
+vm = []
+session = []
+vm_ip = []
+digest_origin = []
+vlan_ip = ['', '']
+ip_unit = ['1', '2']
+subnet = params.get("subnet")
+vlan_num = int(params.get("vlan_num"))
+maximal = int(params.get("maximal"))
+file_size = params.get("file_size")
+
+vm.append(kvm_test_utils.get_living_vm(env, params.get("main_vm")))
+vm.append(kvm_test_utils.get_living_vm(env, "vm2"))
+
+def add_vlan(session, id, iface="eth0"):
+if session.get_command_status("vconfig add %s %s" % (iface, id)) != 0:
+raise error.TestError("Fail to add %s.%s" % (iface, id))
+
+def set_ip_vlan(session, id, ip, iface="eth0"):
+iface = "%s.%s" % (iface, id)
+if session.get_command_status("ifconfig %s %s" % (iface, ip)) != 0:
+raise error.TestError("Fail to configure ip for %s" % iface)
+
+def set_arp_ignore(session, iface="eth0"):
+ignore_cmd = "echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore"
+if session.get_command_status(ignore_cmd) != 0:
+raise error.TestError("Fail to set arp_ignore of %s" % session)
+
+def rem_vlan(session, id, iface="eth0"):
+rem_vlan_cmd = "if [[ -e /proc/net/vlan/%s ]];then vconfig rem %s;fi"
+iface = "%s.%s" % (iface, id)
+s = session.get_command_status(rem_vlan_cmd % (iface, iface))
+return s
+
+def nc_transfer(src, dst):
+nc_port = kvm_utils.find_free_port(1025, 5334, vm_ip[dst])
+listen_cmd = params.get("listen_cmd")
+send_cmd = params.get("send_cmd")
+
+#listen in dst
+listen_cmd = listen_cmd % (nc_port, "receive")
+session[dst].sendline(listen_cmd)
+time.sleep(2)
+#send file from src to dst
+send_cmd = send_cmd % (vlan_ip[dst], str(nc_port), "file")
+if session[src].get_command_status(send_cmd, timeout = 60) != 0:
+raise error.TestFail ("Fail to send file"
+" from vm%s to vm%s" % (src+1, dst+1))
+s, o = session[dst].read_up_to_prompt(timeout=60)
+if s != True:
+raise error.TestFail ("Fail to receive file"
+" from vm%s to vm%s" % (src+1, dst+1))
+#check MD5 message digest of receive file in dst
+output = session[dst].get_command_output("md5sum receive").strip()
+digest_receive = re.findall(r'(\w+)', output)[0]
+if digest_receive == digest_origin[src]:
+logging.info("file succeed received in vm %s" % vlan_ip[dst])
+else:
+logging.info("digest_origin is  %s" % digest_origin[src])
+logging.info("digest_receive is %s" % digest_receive)
+raise error.TestFail("File transfered differ from origin")
+session[dst].get_command_status("rm -f receive")
+
+for i in range(2):
+session.append(kvm_test_utils.wait_for_login(vm[i],
+   timeout=int(params.get("login_timeout", 360
+if not session[i] :
+raise error.TestError("Could not log into guest(vm%d)" % i)
+logging.info("Logged in")
+
+#get guest ip
+vm_ip.append(vm[i].get_address())
+
+#produce sized file in vm
+dd_cmd = "dd if=/dev/urandom of=file bs=1024k count=%s"
+if session[i].get_command_status(dd_cmd % file_size) != 0:
+raise error.TestFail("File producing failed")
+#record MD5 message digest of file
+s, output =session[i].get_command_status_output("md5sum file",
+timeout=60)
+if s != 0:
+raise error.Test

[Qemu-devel] [RFC PATCH 04/14] KVM-test: Add a new subtest ping

2010-07-19 Thread Amos Kong

This test use ping to check the virtual nics, it contains two kinds of test:
1. Packet loss ratio test, ping the guest with different size of packets.
2. Stress test, flood ping guest then use ordinary ping to test the network.

The interval and packet size could be configurated through tests_base.cfg

Signed-off-by: Jason Wang 
Signed-off-by: Amos Kong 
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/ping.py b/client/tests/kvm/tests/ping.py
new file mode 100644
index 000..cfccda4
--- /dev/null
+++ b/client/tests/kvm/tests/ping.py
@@ -0,0 +1,71 @@
+import logging, time, re, commands
+from autotest_lib.client.common_lib import error
+import kvm_subprocess, kvm_test_utils, kvm_utils, kvm_net_utils
+
+
+def run_ping(test, params, env):
+"""
+Ping the guest with different size of packets.
+
+Packet Loss Test:
+1) Ping the guest with different size/interval of packets.
+Stress Test:
+1) Flood ping the guest.
+2) Check if the network is still usable.
+
+@param test: Kvm test object
+@param params: Dictionary with the test parameters
+@param env: Dictionary with test environment.
+"""
+
+vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+session = kvm_test_utils.wait_for_login(vm)
+
+counts = params.get("ping_counts", 100)
+flood_minutes = float(params.get("flood_minutes", 10))
+nics = params.get("nics").split()
+strict_check = params.get("strict_check", "no") == "yes"
+
+packet_size = [0, 1, 4, 48, 512, 1440, 1500, 1505, 4054, 4055, 4096, 4192,
+   8878, 9000, 32767, 65507]
+
+try:
+for i, nic in enumerate(nics):
+ip = vm.get_address(i)
+if not ip:
+logging.error("Could not get the ip of nic index %d" % i)
+continue
+
+for size in packet_size:
+logging.info("Ping with packet size %s" % size)
+status, output = kvm_net_utils.ping(ip, 10,
+packetsize = size,
+timeout = 20)
+if strict_check:
+ratio = kvm_net_utils.get_loss_ratio(output)
+if ratio != 0:
+raise error.TestFail(" Loss ratio is %s for packet 
size"
+ " %s" % (ratio, size))
+else:
+if status != 0:
+raise error.TestFail(" Ping returns non-zero value %s" 
%
+ output)
+
+logging.info("Flood ping test")
+kvm_net_utils.ping(ip, None, flood = True, output_func= None,
+   timeout = flood_minutes * 60)
+
+logging.info("Final ping test")
+status, output = kvm_net_utils.ping(ip, counts,
+timeout = float(counts) * 1.5)
+if strict_check:
+ratio = kvm_net_utils.get_loss_ratio(output)
+if ratio != 0:
+raise error.TestFail("Packet loss ratio is %s after flood"
+ % ratio)
+else:
+if status != 0:
+raise error.TestFail(" Ping returns non-zero value %s" %
+ output)
+finally:
+session.close()
diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index 6710c00..4f58dc0 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -349,6 +349,11 @@ variants:
 kill_vm_gracefully_vm2 = no
 address_index_vm2 = 1
 
+- ping: install setup unattended_install.cdrom
+type = ping
+counts = 100
+flood_minutes = 10
+
 - physical_resources_check: install setup unattended_install.cdrom
 type = physical_resources_check
 catch_uuid_cmd = dmidecode | awk -F: '/UUID/ {print $2}'

[Qemu-devel] [RFC PATCH 01/14] KVM-test: Add a new macaddress pool algorithm

2010-07-19 Thread Amos Kong

Old method uses the mac address in the configuration files which could
lead serious problem when multiple tests running in different hosts.

This patch adds a new macaddress pool algorithm, it generates the mac prefix
based on mac address of the host which could eliminate the duplicated mac
addresses between machines.

When user have set the mac_prefix in the configuration file, we should use it
in stead of the dynamic generated mac prefix.

Other change:
. Fix randomly generating mac address so that it correspond to IEEE802.
. Update clone function to decide clone mac address or not.
. Update get_macaddr function.
. Add set_mac_address function.

New auto mac address pool algorithm:
If address_index is defined, VM will get mac from config file then record mac
in to address_pool. If address_index is not defined, VM will call
get_mac_from_pool to auto create mac then recored mac to address_pool in
following format:
{'macpool': {'AE:9D:94:6A:9b:f9': ['20100310-165222-Wt7l:0']}}

  AE:9D:94:6A:9b:f9: mac address
  20100310-165222-Wt7l : instance attribute of VM
  0: index of NIC


Signed-off-by: Jason Wang 
Signed-off-by: Feng Yang 
Signed-off-by: Amos Kong 
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index fb2d1c2..7c0946e 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -5,6 +5,7 @@ KVM test utility functions.
 """
 
 import time, string, random, socket, os, signal, re, logging, commands, cPickle
+import fcntl, shelve
 from autotest_lib.client.bin import utils
 from autotest_lib.client.common_lib import error, logging_config
 import kvm_subprocess
@@ -82,6 +83,104 @@ def get_sub_dict_names(dict, keyword):
 
 # Functions related to MAC/IP addresses
 
+def get_mac_from_pool(root_dir, vm, nic_index, prefix='00:11:22:33:'):
+"""
+random generated mac address.
+
+1) First try to generate macaddress based on the mac address prefix.
+2) And then try to use total random generated mac address.
+
+@param root_dir: Root dir for kvm
+@param vm: Here we use instance of vm
+@param nic_index: The index of nic.
+@param prefix: Prefix of mac address.
+@Return: Return mac address.
+"""
+
+lock_filename = os.path.join(root_dir, "mac_lock")
+lock_file = open(lock_filename, 'w')
+fcntl.lockf(lock_file.fileno() ,fcntl.LOCK_EX)
+mac_filename = os.path.join(root_dir, "address_pool")
+mac_shelve = shelve.open(mac_filename, writeback=False)
+
+mac_pool = mac_shelve.get("macpool")
+
+if not mac_pool:
+mac_pool = {}
+found = False
+
+val = "%s:%s" % (vm, nic_index)
+for key in mac_pool.keys():
+if val in mac_pool[key]:
+mac_pool[key].append(val)
+found = True
+mac = key
+
+while not found:
+postfix = "%02x:%02x" % (random.randint(0x00,0xfe),
+random.randint(0x00,0xfe))
+mac = prefix + postfix
+mac_list = mac.split(":")
+# Clear multicast bit
+mac_list[0] = int(mac_list[0],16) & 0xfe
+# Set local assignment bit (IEEE802)
+mac_list[0] = mac_list[0] | 0x02
+mac_list[0] = "%02x" % mac_list[0]
+mac = ":".join(mac_list)
+if mac not in mac_pool.keys() or 0 == len(mac_pool[mac]):
+mac_pool[mac] = ["%s:%s" % (vm,nic_index)]
+found = True
+mac_shelve["macpool"] = mac_pool
+logging.debug("generating mac addr %s " % mac)
+
+mac_shelve.close()
+fcntl.lockf(lock_file.fileno(), fcntl.LOCK_UN)
+lock_file.close()
+return mac
+
+
+def put_mac_to_pool(root_dir, mac, vm):
+"""
+Put the macaddress back to address pool
+
+@param root_dir: Root dir for kvm
+@param vm: Here we use instance attribute of vm
+@param mac: mac address will be put.
+@Return: mac address.
+"""
+
+lock_filename = os.path.join(root_dir, "mac_lock")
+lock_file = open(lock_filename, 'w')
+fcntl.lockf(lock_file.fileno() ,fcntl.LOCK_EX)
+mac_filename = os.path.join(root_dir, "address_pool")
+mac_shelve = shelve.open(mac_filename, writeback=False)
+
+mac_pool = mac_shelve.get("macpool")
+
+if not mac_pool or (not mac in mac_pool):
+logging.debug("Try to free a macaddress does no in pool")
+logging.debug("macaddress is %s" % mac)
+logging.debug("pool is %s" % mac_pool)
+else:
+if len(mac_pool[mac]) <= 1:
+mac_pool.pop(mac)
+else:
+for value in mac_pool[mac]:
+if vm in value:
+mac_pool[mac].remove(value)
+break
+if len(mac_pool[mac]) == 0:
+mac_pool.pop(mac)
+
+mac_shelve["macpool"] = mac_pool
+logging.debug("freeing mac addr %s " % mac)
+
+mac_shelve.close()
+fcntl.lockf(lock_file.fileno(), fcntl.LOCK_UN)
+lock_file.close()
+return

[Qemu-devel] [RFC PATCH 08/14] KVM-test: Add a subtest of nic promisc

2010-07-19 Thread Amos Kong

This test mainly covers TCP sent from host to guest and from guest to host
with repeatedly turn on/off NIC promiscuous mode.

Signed-off-by: Amos Kong 
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/nic_promisc.py 
b/client/tests/kvm/tests/nic_promisc.py
new file mode 100644
index 000..9a0c979
--- /dev/null
+++ b/client/tests/kvm/tests/nic_promisc.py
@@ -0,0 +1,87 @@
+import logging, commands
+from autotest_lib.client.common_lib import error
+import kvm_utils, kvm_test_utils, kvm_net_utils
+
+def run_nic_promisc(test, params, env):
+"""
+Test nic driver in promisc mode:
+
+1) Boot up a guest
+2) Repeatedly enable/disable promiscuous mode in guest
+3) TCP data transmission from host to guest, and from guest to host,
+   with 1/1460/65000/1 bytes payloads
+4) Clean temporary files
+5) Stop enable/disable promiscuous mode change
+
+@param test: kvm test object
+@param params: Dictionary with the test parameters
+@param env: Dictionary with test environment
+"""
+timeout = int(params.get("login_timeout", 360))
+vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+session = kvm_test_utils.wait_for_login(vm, timeout=timeout)
+logging.info("Trying to log into guest '%s' by serial", vm.name)
+session2 = kvm_utils.wait_for(lambda: vm.serial_login(),
+  timeout, 0, step=2)
+if not session2:
+raise error.TestFail("Could not log into guest '%s'" % vm.name)
+
+def compare(filename):
+cmd = "md5sum %s" % filename
+s1, ret_host = commands.getstatusoutput(cmd)
+s2, ret_guest = session.get_command_status_output(cmd)
+if s1 != 0 or s2 != 0:
+logging.debug("ret_host:%s, ret_guest:%s" % (ret_host, ret_guest))
+logging.error("Could not get md5, cmd:%s" % cmd)
+return False
+if ret_host.strip() != ret_guest.strip():
+logging.debug("ret_host :%s, ret_guest:%s" % (ret_host, ret_guest))
+logging.error("Files' md5sum mismatch" % (receiver))
+return False
+return True
+
+ethname = kvm_net_utils.get_linux_ifname(session, vm.get_macaddr(0))
+set_promisc_cmd = "ip link set %s promisc on; sleep 0.01;" % ethname
+set_promisc_cmd += "ip link set %s promisc off; sleep 0.01" % ethname
+logging.info("Set promisc change repeatedly in guest")
+session2.sendline("while true; do %s; done" % set_promisc_cmd)
+
+dd_cmd = "dd if=/dev/urandom of=%s bs=%d count=1"
+filename = "/tmp/nic_promisc_file"
+file_size = params.get("file_size", "1, 1460, 65000, 1").split(",")
+try:
+for size in file_size:
+logging.info("Create %s bytes file on host" % size)
+s, o = commands.getstatusoutput(dd_cmd % (filename, int(size)))
+if s != 0:
+logging.debug("Output: %s"% o)
+raise error.TestFail("Create file on host failed")
+
+logging.info("Transfer file from host to guest")
+if not vm.copy_files_to(filename, filename):
+raise error.TestFail("File transfer failed")
+if not compare(filename):
+raise error.TestFail("Compare file failed")
+
+logging.info("Create %s bytes file on guest" % size)
+if session.get_command_status(dd_cmd % (filename, int(size)),
+timeout=100) != 0:
+raise error.TestFail("Create file on guest failed")
+
+logging.info("Transfer file from guest to host")
+if not vm.copy_files_from(filename, filename):
+raise error.TestFail("File transfer failed")
+if not compare(filename):
+raise error.TestFail("Compare file failed")
+
+logging.info("Clean temporal files")
+cmd = "rm -f %s" % filename
+s1, o = commands.getstatusoutput(cmd)
+s2 = session.get_command_status(cmd)
+if s1 != 0 or s2 != 0:
+raise error.TestError("Fail to clean temporal files")
+finally:
+logging.info("Restore the %s to the nonpromisc mode" % ethname)
+session2.close()
+session.get_command_status("ip link set %s promisc off" % ethname)
+session.close()
diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index 03d15c0..9e2b9a0 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -370,6 +370,10 @@ variants:
 scp_timeout = 300
 thread_num = 10
 
+- nic_promisc:  install setup unattended_install.cdrom
+type = nic_promisc
+file_size = 1, 1460, 65000, 1
+
 - physical_resources_check: install setup unattended_install.cdrom
 type = physical_resources_check
 catch_uuid_cmd = dmidecode | awk -F: '/UU

[Qemu-devel] [RFC PATCH 07/14] KVM-test: Add a subtest of load/unload nic driver

2010-07-19 Thread Amos Kong

Repeatedly load/unload nic driver, try to transfer file between guest and host
by threads at the same time, and check the md5sum.

Signed-off-by: Amos Kong 
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/nicdriver_unload.py 
b/client/tests/kvm/tests/nicdriver_unload.py
new file mode 100644
index 000..22f9f44
--- /dev/null
+++ b/client/tests/kvm/tests/nicdriver_unload.py
@@ -0,0 +1,128 @@
+import logging, commands, threading, re, os
+from autotest_lib.client.common_lib import error
+import kvm_utils, kvm_test_utils, kvm_net_utils
+
+def run_nicdriver_unload(test, params, env):
+"""
+Test nic driver
+
+1) Boot a vm
+2) Get the nic driver name
+3) Repeatedly unload/load nic driver
+4) Multi-session TCP transfer on test interface
+5) Check the test interface should still work
+
+@param test: KVM test object
+@param params: Dictionary with the test parameters
+@param env: Dictionary with test environment.
+"""
+timeout = int(params.get("login_timeout", 360))
+vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+session = kvm_test_utils.wait_for_login(vm, timeout=timeout)
+logging.info("Trying to log into guest '%s' by serial", vm.name)
+session2 = kvm_utils.wait_for(lambda: vm.serial_login(),
+  timeout, 0, step=2)
+if not session2:
+raise error.TestFail("Could not log into guest '%s'" % vm.name)
+
+ethname = kvm_net_utils.get_linux_ifname(session, vm.get_macaddr(0))
+try:
+# FIXME: Try three waies to get nic driver name, because the
+# modprobe.conf is dropped in latest system, and ethtool method is not
+# supported by virtio_nic.
+
+output = session.get_command_output("cat /etc/modprobe.conf")
+driver = re.findall(r'%s (\w+)' % ethname,output)
+if not driver:
+output = session.get_command_output("ethtool -i %s" % ethname)
+driver = re.findall(r'driver: (\w+)', output)
+if not driver:
+output = session.get_command_output("lspci -k")
+driver = re.findall("Ethernet controller.*\n.*\n.*Kernel driver"
+" in use: (\w+)", output)
+driver = driver[0]
+except IndexError:
+raise error.TestError("Could not find driver name")
+
+logging.info("driver is %s" % driver)
+
+class ThreadScp(threading.Thread):
+def run(self):
+remote_file = '/tmp/' + self.getName()
+file_list.append(remote_file)
+ret = vm.copy_files_to(file_name, remote_file, timeout=scp_timeout)
+logging.debug("Copy result of %s: %s" % (remote_file, ret))
+
+def compare(origin_file, receive_file):
+cmd = "md5sum %s"
+output1 = commands.getstatusoutput(cmd % origin_file)[1].strip()
+check_sum1 = output1.split()[0]
+s, output2 = session.get_command_status_output(cmd % receive_file)
+if s != 0:
+logging.error("Could not get md5sum of receive_file")
+return False
+check_sum2 = output2.strip().split()[0]
+logging.debug("origin: %s, receive: %s" % (check_sum1, check_sum2))
+if check_sum1 != check_sum2:
+logging.error("md5sum doesn't match")
+return False
+return True
+
+#produce sized file in host
+file_size = params.get("file_size")
+file_name = "/tmp/nicdriver_unload_file"
+cmd = "dd if=/dev/urandom of=%s bs=%sM count=1"
+s, o = commands.getstatusoutput(cmd % (file_name, file_size))
+if s != 0:
+raise error.TestFail("Fail to create file by dd")
+
+connect_time = params.get("connect_time")
+scp_timeout = int(params.get("scp_timeout"))
+thread_num = int(params.get("thread_num"))
+file_list = []
+
+unload_load_cmd = "sleep %s && ifconfig %s down && modprobe -r %s && "
+unload_load_cmd += "sleep 1 && modprobe %s && ifconfig %s up"
+unload_load_cmd = unload_load_cmd % (connect_time, ethname, driver,
+ driver, ethname)
+pid = os.fork()
+if pid != 0:
+logging.info("unload/load nic driver repeatedly in guest...")
+while True:
+logging.debug("Try to unload/load nic drive once")
+if session2.get_command_status(unload_load_cmd, timeout=120) != 0:
+session.get_command_output("rm -rf /tmp/Thread-*")
+raise error.TestFail("Unload/load nic driver failed")
+pid, s = os.waitpid(pid, os.WNOHANG)
+status = os.WEXITSTATUS(s)
+if (pid, status) != (0, 0):
+logging.debug("Child process ending")
+break
+else:
+logging.info("Multi-session tcp data transfer")
+threads = []
+for i in range(thread_num):
+t = ThreadScp()
+t.start()
+threads.append(t)
+for t in threads:
+

[Qemu-devel] [RFC PATCH 03/14] KVM Test: Add a common ping module for network related tests

2010-07-19 Thread Amos Kong

The kvm_net_utils.py is a just a place that wraps common network
related commands which is used to do the network-related tests.
Use -1 as the packet ratio for loss analysis.
Use quiet mode when doing the flood ping.

Signed-off-by: Jason Wang 
Signed-off-by: Amos Kong 
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/kvm_net_utils.py 
b/client/tests/kvm/kvm_net_utils.py
index ede4965..8a71858 100644
--- a/client/tests/kvm/kvm_net_utils.py
+++ b/client/tests/kvm/kvm_net_utils.py
@@ -1,4 +1,114 @@
-import re
+import logging, re, signal
+from autotest_lib.client.common_lib import error
+import kvm_subprocess, kvm_utils
+
+def get_loss_ratio(output):
+"""
+Get the packet loss ratio from the output of ping
+
+@param output
+"""
+try:
+return int(re.findall('(\d+)% packet loss', output)[0])
+except IndexError:
+logging.debug(output)
+return -1
+
+def raw_ping(command, timeout, session, output_func):
+"""
+Low-level ping command execution.
+
+@param command: ping command
+@param timeout: timeout of the ping command
+@param session: local executon hint or session to execute the ping command
+"""
+if session == "localhost":
+process = kvm_subprocess.run_bg(command, output_func=output_func,
+timeout=timeout)
+
+# Send SIGINT singal to notify the timeout of running ping process,
+# Because ping have the ability to catch the SIGINT signal so we can
+# always get the packet loss ratio even if timeout.
+if process.is_alive():
+kvm_utils.kill_process_tree(process.get_pid(), signal.SIGINT)
+
+status = process.get_status()
+output = process.get_output()
+
+process.close()
+return status, output
+else:
+session.sendline(command)
+status, output = session.read_up_to_prompt(timeout=timeout,
+   print_func=output_func)
+if status is False:
+# Send ctrl+c (SIGINT) through ssh session
+session.sendline("\003")
+status, output2 = session.read_up_to_prompt(print_func=output_func)
+output += output2
+if status is False:
+# We also need to use this session to query the return value
+session.sendline("\003")
+
+session.sendline(session.status_test_command)
+s2, o2 = session.read_up_to_prompt()
+if s2 is False:
+status = -1
+else:
+try:
+status = int(re.findall("\d+", o2)[0])
+except:
+status = -1
+
+return status, output
+
+def ping(dest = "localhost", count = None, interval = None, interface = None,
+ packetsize = None, ttl = None, hint = None, adaptive = False,
+ broadcast = False, flood = False, timeout = 0,
+ output_func = logging.debug, session = "localhost"):
+"""
+Wrapper of ping.
+
+@param dest: destination address
+@param count: count of icmp packet
+@param interval: interval of two icmp echo request
+@param interface: specified interface of the source address
+@param packetsize: packet size of icmp
+@param ttl: ip time to live
+@param hint: path mtu discovery hint
+@param adaptive: adaptive ping flag
+@param broadcast: broadcast ping flag
+@param flood: flood ping flag
+@param timeout: timeout for the ping command
+@param output_func: function used to log the result of ping
+@param session: local executon hint or session to execute the ping command
+"""
+
+command = "ping %s " % dest
+
+if count is not None:
+command += " -c %s" % count
+if interval is not None:
+command += " -i %s" % interval
+if interface is not None:
+command += " -I %s" % interface
+if packetsize is not None:
+command += " -s %s" % packetsize
+if ttl is not None:
+command += " -t %s" % ttl
+if hint is not None:
+command += " -M %s" % hint
+if adaptive is True:
+command += " -A"
+if broadcast is True:
+command += " -b"
+if flood is True:
+# temporary workaround as the kvm_subprocess may not properly handle
+# the timeout for the output of flood ping
+command += " -f -q"
+output_func = None
+
+return raw_ping(command, timeout, session, output_func)
 
 def get_linux_ifname(session, mac_address):
 """

[Qemu-devel] [Autotest][RFC PATCH 00/14] Patchset of network related subtests

2010-07-19 Thread Amos Kong

The following series contain 11 network related subtests, welcome to give me
some suggestions about correctness, design, enhancement.

Thank you so much!

---

Amos Kong (14):
  KVM-test: Add a new macaddress pool algorithm
  KVM Test: Add a function get_interface_name() to kvm_net_utils.py
  KVM Test: Add a common ping module for network related tests
  KVM-test: Add a new subtest ping
  KVM-test: Add a subtest jumbo
  KVM-test: Add basic file transfer test
  KVM-test: Add a subtest of load/unload nic driver
  KVM-test: Add a subtest of nic promisc
  KVM-test: Add a subtest of multicast
  KVM-test: Add a subtest of pxe
  KVM-test: Add a subtest of changing mac address
  KVM-test: Add a subtest of netperf
  KVM-test: Improve vlan subtest
  KVM-test: Add subtest of testing offload by ethtool


 0 files changed, 0 insertions(+), 0 deletions(-)

-- 
Amos Kong

[Qemu-devel] Bonjour..

2010-07-19 Thread Newsletter

Bonjours a vous,

Bonjours a vous, Peut-on gagner sa vie avec internet ?  De nombreuses personnes 
se posent cette question. Les moyens gratuits accessibles par tous ne 
permettent pas de vivre d'internet (grand maxi 500 euros par mois).

Ces moyens permettent de se faire un peu d'argent de poche, ce qui est déjà 
très bien. De nombreux systèmes à paliers promettent des très grandes sommes 
sous forme de rentes si on investit au départ : méfiez vous, la plupart du 
temps ces sites ne payent jamais.

Mais c'est bel et bien possible de gagner sa vie sur internet. Si vous voulez 
en savoir plus ..

venez sur http://gagnezargentnet.awardspace.biz afin d'en savoir plus sur 
comment proceder

pour faire de l'argent sur internet en toute légalité bien entendu. Si vous 
etes donc interessé vous n'avez aucune raison de ne pas venir apprendre .. 
Beaucoup l'on deja fait et d'autres le feront.

Ce n est pas de l'argent facile. c'est juste une autre facon d'en gagner ..

Si vous avez des questions, je vous invite a me contacter par e-mail a 
l'adresse ci dessous. garcia_bast...@hotmail.com

cordialement,

Garcia Bastien
Specialiste  Dans le business et Marketing sur le Net .



 
--extPart_001_1386_695C700A.4A1B1D8B--

[Qemu-devel] [Autotest PATCH 00/14] Patchset of network related subtests

2010-07-19 Thread Amos Kong

The following series contain 11 network related subtests, welcome to give me
some suggestions about correctness, design, enhancement.

Thank you so much!

---

Amos Kong (14):
  KVM-test: Add a new macaddress pool algorithm
  KVM Test: Add a function get_interface_name() to kvm_net_utils.py
  KVM Test: Add a common ping module for network related tests
  KVM-test: Add a new subtest ping
  KVM-test: Add a subtest jumbo
  KVM-test: Add basic file transfer test
  KVM-test: Add a subtest of load/unload nic driver
  KVM-test: Add a subtest of nic promisc
  KVM-test: Add a subtest of multicast
  KVM-test: Add a subtest of pxe
  KVM-test: Add a subtest of changing mac address
  KVM-test: Add a subtest of netperf
  KVM-test: Improve vlan subtest
  KVM-test: Add subtest of testing offload by ethtool


 0 files changed, 0 insertions(+), 0 deletions(-)

-- 
Amos Kong

[Qemu-devel] [PULL] vhost, e1000

2010-07-19 Thread Michael S. Tsirkin

The following changes since commit 488243b0e9126daa5f1e7fb2e97717b66a977517:

  target-ppc: fix power mode checking on 7400/7410 (2010-07-19 00:33:29 +0200)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu.git for_anthony

Michael S. Tsirkin (4):
  Merge branch 'master' into pci
  e1000: fix access 4 bytes beyond buffer end
  e1000: secrc support
  vhost: fix miration during device start

 hw/e1000.c   |   12 ++--
 hw/pci-hotplug.c |2 +-
 hw/pci.c |   34 +-
 hw/pcnet.c   |   16 ++--
 hw/rtl8139.c |3 ---
 hw/vhost.c   |   21 +++--
 hw/virtio-net.c  |   41 -
 hw/vmware_vga.c  |3 ---
 sysemu.h |1 -
 9 files changed, 77 insertions(+), 56 deletions(-)

[Qemu-devel] KVM call agenda for July 20

2010-07-19 Thread Chris Wright

Please send in any agenda items you are interested in covering.

thanks,
-chris

[Qemu-devel] Re: [PATCH] block: Use error codes from lower levels for error message

2010-07-19 Thread Stefan Weil


Am 19.07.2010 14:26, schrieb Kevin Wolf:

Am 18.07.2010 21:42, schrieb Stefan Weil:
   

"No such file or directory" is a misleading error message
when a user tries to open a file with wrong permissions.

Cc: Kevin Wolf
Signed-off-by: Stefan Weil
---
  block.c |   12 
  1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index f837876..2f80540 100644
--- a/block.c
+++ b/block.c
@@ -330,16 +330,20 @@ BlockDriver *bdrv_find_protocol(const char *filename)
  return NULL;
  }

-static BlockDriver *find_image_format(const char *filename)
+static BlockDriver *find_image_format(const char *filename, int *error)
 

Wouldn't it be a more natural interface to return an 0/-errno int and
pass the BlockDriver* by reference? I think we already have some
function that work this way in the block code, but I can't remember any
that get an int *error.
   


... nor did I find a function which takes a BlockDriver**.
But if you prefer it like that, I can send a new version of the patch.

   

  {
  int ret, score, score_max;
  BlockDriver *drv1, *drv;
  uint8_t buf[2048];
  BlockDriverState *bs;

+*error = -ENOENT;
 

Why -ENOENT is the default would be clearer if you moved it down next to
the drv = NULL before the loop that searches for the driver.

   


What about the "return bdrv_find_format" lines?
They need a default value, too.

And I did not want to change too much because I cannot
run a complete test for all cases.

So setting *error at the beginning should be the safest
modification.


Apart from these minor nitpicks it looks good.

Kevin

   

Thanks.

Stefan

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Anthony Liguori


On 07/19/2010 11:11 AM, Gleb Natapov wrote:

On Mon, Jul 19, 2010 at 10:54:03AM -0500, Anthony Liguori wrote:
   

On 07/19/2010 09:53 AM, Gleb Natapov wrote:
 

On Mon, Jul 19, 2010 at 09:45:58AM -0500, Anthony Liguori wrote:
   

On 07/19/2010 02:33 AM, Gleb Natapov wrote:
 

On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
   

On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
 

That what I am warring about too. If we are adding device we have to be
sure such device can actually exist on real hw too otherwise we may have
problems later.
   

I don't understand why the constraints of real h/w have anything to do
with this.  Can you explain?

 

Each time we do something not architectural it cause us troubles later.
So constraints of real h/w is our constrains to.
   

Your constraints are purely artificial.

 

What is artificial about it? Each time we break them we safer.
   

Just because something doesn't fit as an ISA or PCI device doesn't
mean it can't exist in real life.  There are plenty of one-off
devices with odd interfaces.
 

And there are such that cause cpu to stall for 6.5 seconds when you do
io to them?


That would certainly be a poorly designed interface.  I can appreciate 
your point and I think suggesting that we should implement an ad-hoc 
completion interface is reasonable.  For instance,


outl(FW_CFG_SET_INITRD_ADDR, addr)
while !inb(FW_CFG_INITRD_READY):
 // spin


There are plenty of places that something like fw_cfg could live and
still do DMA.  It can directly hang off of the Southbridge.  It
doesn't necessary need to be connected to the ISA/LPC buses.
 

Examples of real HW?
   

The IBM IMM, HP ILO, or Intel iAMT modules.  They basically play an
identical role to fw_cfg.

 

So what are their interfaces?  May be we should emulate one.
   


The interface to firmware is private and changes from platform to 
platform.  The IMM exposes various interfaces to the OS as it implements 
a number of legacy devices.  It also exposes a side-channel (very 
similar to virtio-console) as a USB RNDIS driver.  I believe it 
implements IPMI over a private ethernet type although I'd have to double 
check.  It may actually use TCP/IP. Of course it is possible to add



proper DMA interface to fw_cfg, but should we do it for such a small
gain?
   

I think an ad-hoc DMA interface is perfectly reasonable to do.  I
agree that adding a more generic DMA interface is overkill.

 

It should look like real DMA at least. The justification for it should
be better than "In our project we don't what to do this and we don't
what to do that so our initrd is 100M now, so why not add hack to qemu
to load it 1 second faster so we can grow it some more".
   


I certainly agree that adding a polling interface for DMA completion is 
a reasonable requirement.


Regards,

Anthony Liguori


--
Gleb.

[Qemu-devel] [PATCH v2] loadvm: improve tests before bdrv_snapshot_goto()

2010-07-19 Thread Miguel Di Ciurcio Filho

This patch improves the resilience of the load_vmstate() function, doing
further and better ordered tests.

In load_vmstate(), if there is any error on bdrv_snapshot_goto(), except if the
error is on VM state device, load_vmstate() will return zero and the VM will be
started with major corruption chances.

The current process:
- test if there is any writable device without snapshot support
- if exists return -error
- get the device that saves the VM state, possible return -error but unlikely
because it was tested earlier
- flush I/O
- run bdrv_snapshot_goto() on devices
- if fails, give an warning and goes to the next (not good!)
- if fails on the VM state device, return zero (not good!)
- check if the requested snapshot exists on the device that saves the VM state
and the state is not zero
- if fails return -error
- open the file with the VM state
- if fails return -error
- load the VM state
- if fails return -error
- return zero

New behavior:
- get the device that saves the VM state
- if fails return -error
- check if the requested snapshot exists on the device that saves the VM state
and the state is not zero
- if fails return -error
- test if there is any writable device without snapshot support
- if exists return -error
- test if the devices with snapshot support have the requested snapshot
- if anyone fails, return -error
- flush I/O
- run snapshot_goto() on devices
- if anyone fails, return -error
- open the file with the VM state
- if fails return -error
- load the VM state
- if fails return -error
- return zero

do_loadvm must not call vm_start if any error has occurred in load_vmstate.

Changelog from v1
---
- Use -ENOTSUP instead of -EINVAL when no device supports snapshots
- Split the verification of the existance of an snapshot on the VM state device
  and the verification of the size of the saved VM state

Signed-off-by: Miguel Di Ciurcio Filho 
---
 monitor.c |3 +-
 savevm.c  |   71 +---
 2 files changed, 36 insertions(+), 38 deletions(-)

diff --git a/monitor.c b/monitor.c
index 45fd482..aa60cfa 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2270,8 +2270,9 @@ static void do_loadvm(Monitor *mon, const QDict *qdict)
 
 vm_stop(0);
 
-if (load_vmstate(name) >= 0 && saved_vm_running)
+if (load_vmstate(name) == 0 && saved_vm_running) {
 vm_start();
+}
 }
 
 int monitor_get_fd(Monitor *mon, const char *fdname)
diff --git a/savevm.c b/savevm.c
index ee27989..f21873e 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1804,12 +1804,27 @@ void do_savevm(Monitor *mon, const QDict *qdict)
 
 int load_vmstate(const char *name)
 {
-BlockDriverState *bs, *bs1;
+BlockDriverState *bs, *bs_vm_state;
 QEMUSnapshotInfo sn;
 QEMUFile *f;
 int ret;
 
-/* Verify if there is a device that doesn't support snapshots and is 
writable */
+bs_vm_state = bdrv_snapshots();
+if (!bs_vm_state) {
+error_report("No block device supports snapshots");
+return -ENOTSUP;
+}
+
+/* Don't even try to load empty VM states */
+ret = bdrv_snapshot_find(bs_vm_state, &sn, name);
+if (ret < 0) {
+return ret;
+} else if (sn.vm_state_size == 0) {
+return -EINVAL;
+}
+
+/* Verify if there is any device that doesn't support snapshots and is
+writable and check if the requested snapshot is available too. */
 bs = NULL;
 while ((bs = bdrv_next(bs))) {
 
@@ -1822,63 +1837,45 @@ int load_vmstate(const char *name)
bdrv_get_device_name(bs));
 return -ENOTSUP;
 }
-}
 
-bs = bdrv_snapshots();
-if (!bs) {
-error_report("No block device supports snapshots");
-return -EINVAL;
+ret = bdrv_snapshot_find(bs, &sn, name);
+if (ret < 0) {
+error_report("Device '%s' does not have the requested snapshot 
'%s'",
+   bdrv_get_device_name(bs), name);
+return ret;
+}
 }
 
 /* Flush all IO requests so they don't interfere with the new state.  */
 qemu_aio_flush();
 
-bs1 = NULL;
-while ((bs1 = bdrv_next(bs1))) {
-if (bdrv_can_snapshot(bs1)) {
-ret = bdrv_snapshot_goto(bs1, name);
+bs = NULL;
+while ((bs = bdrv_next(bs))) {
+if (bdrv_can_snapshot(bs)) {
+ret = bdrv_snapshot_goto(bs, name);
 if (ret < 0) {
-switch(ret) {
-case -ENOTSUP:
-error_report("%sSnapshots not supported on device '%s'",
- bs != bs1 ? "Warning: " : "",
- bdrv_get_device_name(bs1));
-break;
-case -ENOENT:
-error_report("%sCould not find snapshot '%s' on device 
'%s'",
- bs != bs1 ? "Warning: " : "",
-

[Qemu-devel] Re: [PATCH] loadvm: improve tests before bdrv_snapshot_goto()

2010-07-19 Thread Miguel Di Ciurcio Filho

On Mon, Jul 19, 2010 at 11:22 AM, Kevin Wolf  wrote:
>>
>> -    /* Verify if there is a device that doesn't support snapshots and is 
>> writable */
>> +    bs_vm_state = bdrv_snapshots();
>> +    if (!bs_vm_state) {
>> +        error_report("No block device supports snapshots");
>> +        return -EINVAL;
>
> -ENOTSUP?

It was -EINVAL before, just kept it. But -ENOTSUP make more sense.


>> +    /* Don't even try to load empty VM states */
>> +    ret = bdrv_snapshot_find(bs_vm_state, &sn, name);
>> +    if ((ret >= 0) && (sn.vm_state_size == 0)) {
>> +        return -EINVAL;
>> +    }
>
> You can probably stop here already if ret < 0:
>
> ret = ...
> if (ret < 0) {
>    return ret;
> } else if (sn.vm_state_size == 0) {
>    return -EINVAL;
> }
>

Better indeed.

>>
>> @@ -1821,64 +1834,46 @@ int load_vmstate(const char *name)
>>              error_report("Device '%s' is writable but does not support 
>> snapshots.",
>>                                 bdrv_get_device_name(bs));
>>              return -ENOTSUP;
>> +        } else {
>
> The then branch has a return, so you don't need the else here and can
> have the following code nested one level less.
>

Ack.

I will send v2 shortly.

Thanks,

Miguel

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Gleb Natapov

On Mon, Jul 19, 2010 at 05:47:40PM +0100, Richard W.M. Jones wrote:
> On Mon, Jul 19, 2010 at 07:11:37PM +0300, Gleb Natapov wrote:
> > And there are such that cause cpu to stall for 6.5 seconds when you do
> > io to them? I never said that we should implement ISA or PCI device, I
> > don't know why you bring them here.
> 
> Where is "6.5 seconds" coming from?  That is the *total boot time*
> of the libguestfs appliance, and includes far far more than the
> time taken to do the memcpy.
> 
> I timed the call to cpu_physical_memory_write, and it takes 115
> milliseconds with my patch (for an initrd which is 113 MB).
> 
And how much time it takes to load it using string PIO? 1 second 115
millisecond? I thought 6.5 and 7.5 was image loading time, not total
boot time. Stalling vcpu execution for 115 millisecond may be unfortunate
but not as catastrophic as 6.5 seconds. But interface will be there for
everyone to use, so it may be eventually abused even more.

> > It should look like real DMA at least. The justification for it should
> > be better than "In our project we don't what to do this and we don't
> > what to do that so our initrd is 100M now, so why not add hack to qemu
> > to load it 1 second faster so we can grow it some more".
> 
> Please don't make stuff up.  We have a large initrd for perfectly good
> reasons which I have outlined in a previous email.

Those reasons does not look good for me at all. Cleaning up existing
distro is not much less work that creating basic distro with only things
you need, but result is much better. When I worked on embedded project
almost 10 years ago we tried to cleanup generic Red Hat Linux and result
was still huge. By building our own distro we were able to squeeze two
root partitions in 64M compressed. You also do not want to consider
putting things into cdrom because of some issues that should be
solvable.

--
Gleb.

[Qemu-devel] [PATCH v3 1/2] QMP: Introduce the documentation for query-qdm

2010-07-19 Thread Miguel Di Ciurcio Filho

---
 qemu-monitor.hx |   71 +++
 1 files changed, 71 insertions(+), 0 deletions(-)

diff --git a/qemu-monitor.hx b/qemu-monitor.hx
index 2af3de6..4e6062b 100644
--- a/qemu-monitor.hx
+++ b/qemu-monitor.hx
@@ -2490,6 +2490,77 @@ STEXI
 show device tree
 @item info qdm
 show qdev device model list
+ETEXI
+SQMP
+query-qdm
+-
+
+Describe the capabilities of all devices registered with qdev.
+
+The returned output is a json-array, each element is a json-object describing
+a single device type.
+
+Each json-object contains the following:
+
+- "name": name of the device (json-string)
+- "bus": the name of the bus type for the device (json-string)
+- Possible values: PCI, SCSI, I2C, ISA, SSI, USB, virtio-serial-bus, 
System,
+IDE, s390-virtio
+- "alias": an alias by which the device is also known (json-string, optional)
+- "description": description of the device  (json-string, optional)
+- "creatable": whether this device can be created by the user (json-boolean)
+- "properties": a json-array where each item is a json-object that describes a
+  property of the device. If the device has no property to be setup, this item
+  will not be present. Each json-object contains the following:
+ - "name": the name of the property (json-string)
+ - "type": the json type of the property (json-string)
+- Possible values: integer, string, boolean
+
+Example:
+
+-> { "execute": "query-qdm" }
+<- {
+  "return": [
+{
+   "name": "virtio-blk-pci",
+   "creatable": true,
+   "bus": "PCI",
+   "properties": [
+ {
+ "name": "indirect_desc",
+ "type": "boolean" 
+ },
+ {
+ "name": "logical_block_size",
+ "type": "integer"
+ },
+ {
+ "name": "opt_io_size",
+ "type": "integer"
+ },
+ {
+ "name": "drive",
+ "type": "string"
+ }
+   ]
+},
+{
+   "name": "virtio-balloon-pci",
+   "creatable": true,
+   "bus": "PCI",
+   "properties": [
+ {
+ "name": "indirect_desc",
+ "type": "boolean"
+ }
+   ]
+},
+  
+]
+
+EQMP
+
+STEXI
 @item info roms
 show roms
 @end table
-- 
1.7.1

[Qemu-devel] [PATCH v3 2/2] monitor: Convert 'info qdm' to QMP

2010-07-19 Thread Miguel Di Ciurcio Filho

Converts the 'info qdm' command to QMP, allowing the discovery of all devices
known to the QEMU binary without relying on command line paramaters like
-device ? and -device devtype,?

This change does not modify the output of the 'info qdm' monitor command.

Signed-off-by: Miguel Di Ciurcio Filho 
---
 hw/qdev.c |  110 +++-
 hw/qdev.h |3 +-
 monitor.c |3 +-
 3 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/hw/qdev.c b/hw/qdev.c
index e99c73f..d24d42a 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -29,6 +29,7 @@
 #include "qdev.h"
 #include "sysemu.h"
 #include "monitor.h"
+#include "qjson.h"
 
 static int qdev_hotplug = 0;
 
@@ -779,13 +780,118 @@ void do_info_qtree(Monitor *mon)
 qbus_print(mon, main_system_bus, 0);
 }
 
-void do_info_qdm(Monitor *mon)
+static void qdm_list_iter(QObject *obj, void *opaque)
+{
+
+Monitor *mon = opaque;
+QDict *dev = qobject_to_qdict(obj);
+
+monitor_printf(mon, "name \"%s\", bus %s", qdict_get_str(dev, "name"),
+qdict_get_str(dev, "bus"));
+
+if (qdict_haskey(dev, "alias")) {
+monitor_printf(mon, ", alias \"%s\"", qdict_get_str(dev, "alias"));
+}
+
+if (qdict_haskey(dev, "description")) {
+monitor_printf(mon, ", desc \"%s\"", qdict_get_str(dev, 
"description"));
+}
+
+if (!qdict_get_bool(dev, "creatable")) {
+monitor_printf(mon, ", no-user");
+}
+
+monitor_printf(mon, "\n");
+}
+
+void do_info_qdm_print(Monitor *mon, const QObject *ret_data)
+{
+QList *devs;
+
+devs = qobject_to_qlist(ret_data);
+qlist_iter(devs, qdm_list_iter, mon);
+}
+
+static const char *qdev_property_type_to_string(int type)
+{
+switch (type) {
+case PROP_TYPE_UINT8:
+case PROP_TYPE_UINT16:
+case PROP_TYPE_UINT32:
+case PROP_TYPE_INT32:
+case PROP_TYPE_UINT64:
+return "integer";
+case PROP_TYPE_TADDR:
+case PROP_TYPE_MACADDR:
+case PROP_TYPE_DRIVE:
+case PROP_TYPE_CHR:
+case PROP_TYPE_STRING:
+case PROP_TYPE_NETDEV:
+return "string";
+case PROP_TYPE_BIT:
+   return "boolean";
+case PROP_TYPE_UNSPEC:
+case PROP_TYPE_VLAN:
+case PROP_TYPE_PTR:
+return NULL;
+}
+
+return NULL;
+}
+
+void do_info_qdm(Monitor *mon, QObject **ret_data)
 {
 DeviceInfo *info;
+QList *devs = qlist_new();
 
 for (info = device_info_list; info != NULL; info = info->next) {
-qdev_print_devinfo(info);
+QObject *obj;
+QDict *dev;
+QList *props = qlist_new();
+Property *prop;
+
+for (prop = info->props; prop && prop->name; prop++) {
+QObject *entry;
+/*
+ * TODO: skip old and hackish stuff, they will be removed some day.
+ */
+if (!prop->info->parse || prop->info->type == PROP_TYPE_VLAN
+|| prop->info->type == PROP_TYPE_PTR
+|| prop->info->type == PROP_TYPE_UNSPEC) {
+continue;
+}
+
+const char *type = qdev_property_type_to_string(prop->info->type);
+
+entry = qobject_from_jsonf("{ 'name': %s, 'type': %s }",
+   prop->name, type);
+
+qlist_append_obj(props, entry);
+}
+
+obj = qobject_from_jsonf("{ 'name': %s, 'bus': %s, 'creatable': %i }",
+ info->name,
+ info->bus_info->name,
+ info->no_user ? 0 : 1);
+
+dev = qobject_to_qdict(obj);
+
+if (!qlist_empty(props)) {
+qdict_put(dev, "properties", props);
+}
+
+if (info->alias) {
+qdict_put(dev, "alias", qstring_from_str(info->alias));
+}
+
+if (info->desc) {
+qdict_put(dev, "description", qstring_from_str(info->desc));
+}
+
+qlist_append(devs, dev);
 }
+
+*ret_data = QOBJECT(devs);
 }
 
 int do_device_add(Monitor *mon, const QDict *qdict, QObject **ret_data)
diff --git a/hw/qdev.h b/hw/qdev.h
index 678f8b7..3b0382b 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -184,7 +184,8 @@ void qbus_free(BusState *bus);
 /*** monitor commands ***/
 
 void do_info_qtree(Monitor *mon);
-void do_info_qdm(Monitor *mon);
+void do_info_qdm_print(Monitor *mon, const QObject *ret_data);
+void do_info_qdm(Monitor *mon, QObject **ret_data);
 int do_device_add(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_device_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
diff --git a/monitor.c b/monitor.c
index 45fd482..66810f2 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2565,7 +2565,8 @@ static const mon_cmd_t info_cmds[] = {
 .args_type  = "",
 .params = "",
 .help   = "show qdev device model list",
-.mhandler.info = do_info_qdm,
+.user_print = do_info_qdm_print,
+.mhandler.info_new = do_info_qdm,
 },
 {
 .name   = "roms",
-- 
1.7.1

[Qemu-devel] [PATCH v3 0/2] QMP: Introduce query-qdm

2010-07-19 Thread Miguel Di Ciurcio Filho

This series introduces the documentation for the query-qdm command and the
conversion of the monitor command 'info qdm' to QMP.

The documentation and code are based on a patch previously sent to qemu-devel
by Daniel P. Berrange:

http://lists.gnu.org/archive/html/qemu-devel/2010-06/msg00931.html

Changelog from v2
-
- added IDE and s390-virtio as possible values for "bus"
- specify that the "properties" list is optional, in case the device doesn't
have anything to be setup
- reworded the explanation of "creatable"
- reverted the qdev/qmp split, just use "type" and its json equivalent
representation
- do not list legacy stuff: PROP_TYPE_(VLAN|PTR|UNSPEC)

Changelog from v1
-
- renamed "props" to "properties"
- updated the examples
- reworded the explanations of "name" and "description"
- split "type" into a json-object, adding "qmp" and "qdev"
- list all possible values for "bus"
- list all possible values for "qdev" on "type"
- list all possible values for "qmp" on "type"

Changes from the Daniel's original patch:
- Split the patch in two, taking out the documentation from the code
- Reworded some parts of the documentation and added data types
- Small cleanups and renamed do_info_devices() to do_info_qdm()
- Added do_info_qdm_print() to be used in the monitor

Regards,

Miguel

---

Miguel Di Ciurcio Filho (2):
  QMP: Introduce the documentation for query-qdm
  monitor: Convert 'info qdm' to QMP

 hw/qdev.c   |  110 ++-
 hw/qdev.h   |3 +-
 monitor.c   |3 +-
 qemu-monitor.hx |   71 +++
 4 files changed, 183 insertions(+), 4 deletions(-)

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Richard W.M. Jones

On Mon, Jul 19, 2010 at 07:11:37PM +0300, Gleb Natapov wrote:
> And there are such that cause cpu to stall for 6.5 seconds when you do
> io to them? I never said that we should implement ISA or PCI device, I
> don't know why you bring them here.

Where is "6.5 seconds" coming from?  That is the *total boot time*
of the libguestfs appliance, and includes far far more than the
time taken to do the memcpy.

I timed the call to cpu_physical_memory_write, and it takes 115
milliseconds with my patch (for an initrd which is 113 MB).

> It should look like real DMA at least. The justification for it should
> be better than "In our project we don't what to do this and we don't
> what to do that so our initrd is 100M now, so why not add hack to qemu
> to load it 1 second faster so we can grow it some more".

Please don't make stuff up.  We have a large initrd for perfectly good
reasons which I have outlined in a previous email.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://et.redhat.com/~rjones/virt-top

[Qemu-devel] [PATCH] Declare code_gen_ptr, code_gen_max_blocks 'static'

2010-07-19 Thread Stefan Weil

Both values are only used in exec.c, so there is no need
to make them globally available.

Signed-off-by: Stefan Weil 
---
 exec-all.h |2 --
 exec.c |4 ++--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/exec-all.h b/exec-all.h
index a775582..58b5575 100644
--- a/exec-all.h
+++ b/exec-all.h
@@ -191,8 +191,6 @@ void tb_link_page(TranslationBlock *tb,
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 
 extern TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE];
-extern uint8_t *code_gen_ptr;
-extern int code_gen_max_blocks;
 
 #if defined(USE_DIRECT_JUMP)
 
diff --git a/exec.c b/exec.c
index 4641b3e..868cd7f 100644
--- a/exec.c
+++ b/exec.c
@@ -80,7 +80,7 @@
 #define SMC_BITMAP_USE_THRESHOLD 10
 
 static TranslationBlock *tbs;
-int code_gen_max_blocks;
+static int code_gen_max_blocks;
 TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE];
 static int nb_tbs;
 /* any access to the tbs or the page table must use this lock */
@@ -107,7 +107,7 @@ static uint8_t *code_gen_buffer;
 static unsigned long code_gen_buffer_size;
 /* threshold to flush the translated code buffer */
 static unsigned long code_gen_buffer_max_size;
-uint8_t *code_gen_ptr;
+static uint8_t *code_gen_ptr;
 
 #if !defined(CONFIG_USER_ONLY)
 int phys_ram_fd;
-- 
1.7.1

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Gleb Natapov

On Mon, Jul 19, 2010 at 10:54:03AM -0500, Anthony Liguori wrote:
> On 07/19/2010 09:53 AM, Gleb Natapov wrote:
> >On Mon, Jul 19, 2010 at 09:45:58AM -0500, Anthony Liguori wrote:
> >>On 07/19/2010 02:33 AM, Gleb Natapov wrote:
> >>>On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> >That what I am warring about too. If we are adding device we have to be
> >sure such device can actually exist on real hw too otherwise we may have
> >problems later.
> I don't understand why the constraints of real h/w have anything to do
> with this.  Can you explain?
> 
> >>>Each time we do something not architectural it cause us troubles later.
> >>>So constraints of real h/w is our constrains to.
> >>Your constraints are purely artificial.
> >>
> >What is artificial about it? Each time we break them we safer.
> 
> Just because something doesn't fit as an ISA or PCI device doesn't
> mean it can't exist in real life.  There are plenty of one-off
> devices with odd interfaces.
And there are such that cause cpu to stall for 6.5 seconds when you do
io to them? I never said that we should implement ISA or PCI device, I
don't know why you bring them here.

> 
> >>There are plenty of places that something like fw_cfg could live and
> >>still do DMA.  It can directly hang off of the Southbridge.  It
> >>doesn't necessary need to be connected to the ISA/LPC buses.
> >Examples of real HW?
> 
> The IBM IMM, HP ILO, or Intel iAMT modules.  They basically play an
> identical role to fw_cfg.
> 
So what are their interfaces?  May be we should emulate one.

> >  And I am not against something that does DMA,
> >but that is not what proposed patch does. It provides magic io
> >instruction that CPU calls and when instruction completes memory is
> >updated. This is nothing like DMA.
> 
> Isn't this exactly what the interface for PCI DMA looks like since
> there's no standard DMA implementation?
> 
Every DMA that I know about support polling for completion or they can
issue interrupt at the end of transaction.  I am not even sure you can
design such HW that will stall cpu in IO instruction till some operation
is completed.

> >  Of course it is possible to add
> >proper DMA interface to fw_cfg, but should we do it for such a small
> >gain?
> 
> I think an ad-hoc DMA interface is perfectly reasonable to do.  I
> agree that adding a more generic DMA interface is overkill.
> 
It should look like real DMA at least. The justification for it should
be better than "In our project we don't what to do this and we don't
what to do that so our initrd is 100M now, so why not add hack to qemu
to load it 1 second faster so we can grow it some more".

--
Gleb.

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Anthony Liguori


On 07/19/2010 09:53 AM, Gleb Natapov wrote:

On Mon, Jul 19, 2010 at 09:45:58AM -0500, Anthony Liguori wrote:
   

On 07/19/2010 02:33 AM, Gleb Natapov wrote:
 

On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
   

On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
 

That what I am warring about too. If we are adding device we have to be
sure such device can actually exist on real hw too otherwise we may have
problems later.
   

I don't understand why the constraints of real h/w have anything to do
with this.  Can you explain?

 

Each time we do something not architectural it cause us troubles later.
So constraints of real h/w is our constrains to.
   

Your constraints are purely artificial.

 

What is artificial about it? Each time we break them we safer.
   


Just because something doesn't fit as an ISA or PCI device doesn't mean 
it can't exist in real life.  There are plenty of one-off devices with 
odd interfaces.



There are plenty of places that something like fw_cfg could live and
still do DMA.  It can directly hang off of the Southbridge.  It
doesn't necessary need to be connected to the ISA/LPC buses.
 

Examples of real HW?


The IBM IMM, HP ILO, or Intel iAMT modules.  They basically play an 
identical role to fw_cfg.



  And I am not against something that does DMA,
but that is not what proposed patch does. It provides magic io
instruction that CPU calls and when instruction completes memory is
updated. This is nothing like DMA.


Isn't this exactly what the interface for PCI DMA looks like since 
there's no standard DMA implementation?



  Of course it is possible to add
proper DMA interface to fw_cfg, but should we do it for such a small
gain?
   


I think an ad-hoc DMA interface is perfectly reasonable to do.  I agree 
that adding a more generic DMA interface is overkill.


Regards,

Anthony Liguori


Buses exist to multiplex I/O devices because of limited wiring space
on motherboards.  There's no reason we need to constrain ourselves
to minimize the number of virtual wires we emulate.

Regards,

Anthony Liguori
 

--
Gleb.

[Qemu-devel] Re: [PATCH] ide/atapi: add support for GET EVENT STATUS NOTIFICATION

2010-07-19 Thread Kevin Wolf

Am 19.07.2010 17:36, schrieb Aurelien Jarno:
>> Have you tested some more OSes to ensure that they don't start to expect
>> events to actually work now the command "works"? I didn't see any
>> problems in a quick test with Linux, but you never know.
>>
> 
> Besides FreeBSD, I have tested without problem Linux, NetBSD and
> OpenBSD, though I haven't tested them more then booting and mounting a
> CD-ROM.

I guess the interesting thing is changing media. It did work for me on
Linux, though, and I just tried Windows 7 and it seems to be fine, too
(though I tried Windows on qemu-kvm because I only get a BSOD with
upstream qemu)

Kevin

[Qemu-devel] Re: [PATCH] ide/atapi: add support for GET EVENT STATUS NOTIFICATION

2010-07-19 Thread Aurelien Jarno

On Mon, Jul 19, 2010 at 05:28:40PM +0200, Kevin Wolf wrote:
> Am 19.07.2010 15:53, schrieb Aurelien Jarno:
> > The GET EVENT STATUS NOTIFICATION is a mandatory command according
> > to MMC-3, even if event status notification is not supported.
> > 
> > This patch adds support for this command. It returns NEA ("No Event
> > Available") with an empty "Supported Event Classes" to show that it
> > doesn't event support status notification. If asychronous operation is
> > requested, which requires NCQ support, it returns an error according
> > to the specifications.
> > 
> > This fixes HAL support on FreeBSD and derivatives, which fill up the
> > logs every second with:
> > 
> >   acd0: FAILURE - unknown CMD (0x03) ILLEGAL REQUEST asc=0x20 ascq=0x00
> > 
> > Signed-off-by: Aurelien Jarno 
> 
> Looks good to me.

Thanks for the review.

> Would you prefer me to take this into the block branch (actually, I have
> already done this) or are you going to commit directly? This might

I am fine to get it through the block branch.

> actually be something that should be in 0.13.

agreed.


> Have you tested some more OSes to ensure that they don't start to expect
> events to actually work now the command "works"? I didn't see any
> problems in a quick test with Linux, but you never know.
> 

Besides FreeBSD, I have tested without problem Linux, NetBSD and
OpenBSD, though I haven't tested them more then booting and mounting a
CD-ROM.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

[Qemu-devel] Re: [PATCH] ide/atapi: add support for GET EVENT STATUS NOTIFICATION

2010-07-19 Thread Kevin Wolf

Am 19.07.2010 15:53, schrieb Aurelien Jarno:
> The GET EVENT STATUS NOTIFICATION is a mandatory command according
> to MMC-3, even if event status notification is not supported.
> 
> This patch adds support for this command. It returns NEA ("No Event
> Available") with an empty "Supported Event Classes" to show that it
> doesn't event support status notification. If asychronous operation is
> requested, which requires NCQ support, it returns an error according
> to the specifications.
> 
> This fixes HAL support on FreeBSD and derivatives, which fill up the
> logs every second with:
> 
>   acd0: FAILURE - unknown CMD (0x03) ILLEGAL REQUEST asc=0x20 ascq=0x00
> 
> Signed-off-by: Aurelien Jarno 

Looks good to me.

Would you prefer me to take this into the block branch (actually, I have
already done this) or are you going to commit directly? This might
actually be something that should be in 0.13.

Have you tested some more OSes to ensure that they don't start to expect
events to actually work now the command "works"? I didn't see any
problems in a quick test with Linux, but you never know.

Kevin

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Gleb Natapov

On Mon, Jul 19, 2010 at 09:52:23AM -0500, Anthony Liguori wrote:
> On 07/19/2010 04:15 AM, Gleb Natapov wrote:
> >On Mon, Jul 19, 2010 at 11:09:13AM +0200, Alexander Graf wrote:
> >>On 19.07.2010, at 11:06, Gleb Natapov wrote:
> >>
> >>>On Mon, Jul 19, 2010 at 10:00:04AM +0100, Richard W.M. Jones wrote:
> virt-install is another program that uses explicit -initrd.
> 
> >>>Installation takes a lot of time. Saving 1 second there will not be
> >>>noticeable. And during lifetime of installed VM initrd will be loaded
> >>>from its disk.
> >>Guys, please. It shouldn't be one or the other. Let's make sure both ways 
> >>of doing things are fast. That's what users want: fast.
> >>
> >That what we are talking about, no? We are trying to find faster way to
> >load kernel/initrd and stay architectural.
> 
> Modern platforms are not nearly as "architectural" as you would think.
> 
> It's not unusual to hang a custom chip off of the Southbridge that
> implements platform specific services along with an array of
> "legacy" devices that are implemented mostly in software to cost.
> 
> Other buses (like PS/2) are largely implemented in SMM today by the BIOS.
> 
I don't get your point. What is not "architectural" about all that?

--
Gleb.

[Qemu-devel] Automatic generation of code-generator components (RETRY)

2010-07-19 Thread Eliot Moss

Dear developers -- I've seen no responses yet.  My proposal
is due in early August, so if anyone has feelings on this or
comments about it, please respond soon :-) ...

Thank you for your patience -- Eliot moss

 Original Message 
Subject: [Qemu-devel] Automatic generation of code-generator components
Date: Tue, 13 Jul 2010 14:09:56 -0400
From: Eliot Moss 
Reply-To: m...@cs.umass.edu
To: qemu-devel@nongnu.org

Dear QEMU developers --

I have had some email conversation with a few active developers,
and with their encouragement, want to open it up for the whole list
to comment.

For several years my research group at UMass has been developing
generic code-generator generator (CGG) technology. Historic CGGs
have always been tied to a particular code-generation framework,
that is, to a particular intermediate representation (IR) and
compiler.  Our tool, called GIST (for Generator of Instruction
Selectors Tool), is designed to work from any reasonable IR and
to connect to any reasonable framework.  More technical details
below, but what we are hoping for is to be able to say that if
we make this industrial-strength with some funding from the
National Science Foundation, the QEMU community will be interested
in using it.  No commitment -- just that you think it *might*
be a good idea if we can make it go.  We would use QEMU as one
of our "demo" environments.

Ok, more details.  We have an architecture description language
called CISL (CoGenT Instruction Set Language; CoGenT is our overall
project's name).  It is somewhat like C or Java in appearance. You
define the various memories and registers, and the instructions.
To generate an instruction selector from input ISA A (generally
a compiler IR, but not necessarily) to output ISA B (generally,
but not always, a hardware architecture), you start with descriptions
of A and B in CISL -- some of which may already be around.  You
also write what we call a *mapping* from A to B, which simply
indicates where on B each memory/register of A should go.  The
tool then finds instruction selector patterns, at least one for
each instruction of the A machine.

For any given retargetable *framework* (compiler, interpreter, emulator),
we write one *adapter*, that knows how to take GIST patterns in their
internal form and write them out in the way that the framework
needs them.

Here's an example.  Suppose we are going from A = QEMU IR to
B = MIPS, that is, the same as the TCG back end for an emulator
running on the MIPS processor.  We have written a CISL
description for the QEMU IR (yes, already), and suppose we have
one for the MIPS, sufficient for code generation anyway.
[Side note: Compilers do not generally use every instruction
of their target, e.g., not the privileged mode ones, etc.
Also, in the presence of register allocation, they generally
target a slightly virtualized machine -- one with a huge
number of registers, which register allocation then resolves
to real registers and occasional spilled locations.]

The mapping would talk about how to find QEMU memory on the
MIPS (perhaps a dedicated base register), etc., and would
also capture the conventions for calling helper routines,
and so on.

The adapter for QEMU TCG back ends would generate something
like a C switch statement with one case for each QEMU IR
instruction. Each case might have some additional case
analysis. This is because (as you see in QEMU), a given
IR instruction can have special cases depending on values
of constants, whether something is in a register, etc.
GIST will have found different patterns for each of these,
and with each one there would be a *constraint*, indicating
when it applies.  For example, patterns for adding a constant
value on the MIPS would likely have a special case for
constants that fit in 16 bits, since then you can use one
immediate instruction.  Likewise, the constant 0 is a
special case since it can just be a move.  In addition
to constraints, patterns have costs, which one can develop
for any given target, but would typically be based on
number of instructions, number of instruction bytes,
number of memory references, etc. Thus the case analysis
for a given instruction would check for the lowest cost
patterns first, and would conclude with the most general
pattern (but which may be the most expensive).

The adapter would also need to generate the information
needed by the QEMU TCG register allocator.

Now, here are some things of additional interest:

- While QEMU IR -> emulation host code-generation is maybe
  the most obvious case, we can also handle the "front end"
  emulation target -> QEMU IR generation.  This probably
  requires a slightly different description of machine A
  than when A is the emulation host -- after all, we must
  handle *all* instructions, including privileged ones,
  etc.  But it is possible to make the descriptions
  modular in such a way that instructions used in both
  cases are not repeated.

- I noticed that someone is looki

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Gleb Natapov

On Mon, Jul 19, 2010 at 09:45:58AM -0500, Anthony Liguori wrote:
> On 07/19/2010 02:33 AM, Gleb Natapov wrote:
> >On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
> >>On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> >>>That what I am warring about too. If we are adding device we have to be
> >>>sure such device can actually exist on real hw too otherwise we may have
> >>>problems later.
> >>I don't understand why the constraints of real h/w have anything to do
> >>with this.  Can you explain?
> >>
> >Each time we do something not architectural it cause us troubles later.
> >So constraints of real h/w is our constrains to.
> 
> Your constraints are purely artificial.
> 
What is artificial about it? Each time we break them we safer.

> There are plenty of places that something like fw_cfg could live and
> still do DMA.  It can directly hang off of the Southbridge.  It
> doesn't necessary need to be connected to the ISA/LPC buses.
Examples of real HW? And I am not against something that does DMA,
but that is not what proposed patch does. It provides magic io
instruction that CPU calls and when instruction completes memory is
updated. This is nothing like DMA. Of course it is possible to add
proper DMA interface to fw_cfg, but should we do it for such a small
gain?

> 
> Buses exist to multiplex I/O devices because of limited wiring space
> on motherboards.  There's no reason we need to constrain ourselves
> to minimize the number of virtual wires we emulate.
> 
> Regards,
> 
> Anthony Liguori

--
Gleb.

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Anthony Liguori


On 07/19/2010 04:15 AM, Gleb Natapov wrote:

On Mon, Jul 19, 2010 at 11:09:13AM +0200, Alexander Graf wrote:
   

On 19.07.2010, at 11:06, Gleb Natapov wrote:

 

On Mon, Jul 19, 2010 at 10:00:04AM +0100, Richard W.M. Jones wrote:
   

virt-install is another program that uses explicit -initrd.

 

Installation takes a lot of time. Saving 1 second there will not be
noticeable. And during lifetime of installed VM initrd will be loaded
from its disk.
   

Guys, please. It shouldn't be one or the other. Let's make sure both ways of 
doing things are fast. That's what users want: fast.

 

That what we are talking about, no? We are trying to find faster way to
load kernel/initrd and stay architectural.


Modern platforms are not nearly as "architectural" as you would think.

It's not unusual to hang a custom chip off of the Southbridge that 
implements platform specific services along with an array of "legacy" 
devices that are implemented mostly in software to cost.


Other buses (like PS/2) are largely implemented in SMM today by the BIOS.

Regards,

Anthony Liguori


  Honestly I would expect much
greater speedup from Richard's approach like 2 seconds vs 8 seconds. It
is hard to justify code complication just for 1 second speedup.

--
Gleb.

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Anthony Liguori


On 07/19/2010 02:33 AM, Gleb Natapov wrote:

On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
   

On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
 

That what I am warring about too. If we are adding device we have to be
sure such device can actually exist on real hw too otherwise we may have
problems later.
   

I don't understand why the constraints of real h/w have anything to do
with this.  Can you explain?

 

Each time we do something not architectural it cause us troubles later.
So constraints of real h/w is our constrains to.
   


Your constraints are purely artificial.

There are plenty of places that something like fw_cfg could live and 
still do DMA.  It can directly hang off of the Southbridge.  It doesn't 
necessary need to be connected to the ISA/LPC buses.


Buses exist to multiplex I/O devices because of limited wiring space on 
motherboards.  There's no reason we need to constrain ourselves to 
minimize the number of virtual wires we emulate.


Regards,

Anthony Liguori

[Qemu-devel] Re: [PATCH] loadvm: improve tests before bdrv_snapshot_goto()

2010-07-19 Thread Kevin Wolf

Am 14.07.2010 20:27, schrieb Miguel Di Ciurcio Filho:
> This patch improves the resilience of the load_vmstate() function, doing
> further and better ordered tests.
> 
> In load_vmstate(), if there is any error on bdrv_snapshot_goto(), except if 
> the
> error is on VM state device, load_vmstate() will return zero and the VM will 
> be
> started with major corruption chances.
> 
> The current process:
> - test if there is any writable device without snapshot support
> - if exists return -error
> - get the device that saves the VM state, possible return -error but unlikely
> because it was tested earlier
> - flush I/O
> - run bdrv_snapshot_goto() on devices
> - if fails, give an warning and goes to the next (not good!)
> - if fails on the VM state device, return zero (not good!)
> - check if the requested snapshot exists on the device that saves the VM state
> and the state is not zero
> - if fails return -error
> - open the file with the VM state
> - if fails return -error
> - load the VM state
> - if fails return -error
> - return zero
> 
> New behavior:
> - get the device that saves the VM state
> - if fails return -error
> - check if the requested snapshot exists on the device that saves the VM state
> and the state is not zero
> - if fails return -error
> - test if there is any writable device without snapshot support
> - if exists return -error
> - test if the devices with snapshot support have the requested snapshot
> - if anyone fails, return -error
> - flush I/O
> - run snapshot_goto() on devices
> - if anyone fails, return -error
> - open the file with the VM state
> - if fails return -error
> - load the VM state
> - if fails return -error
> - return zero
> 
> do_loadvm must not call vm_start if any error has occurred in load_vmstate.
> 
> Signed-off-by: Miguel Di Ciurcio Filho 
> ---
>  monitor.c |3 +-
>  savevm.c  |   71 
>  2 files changed, 35 insertions(+), 39 deletions(-)
> 
> diff --git a/monitor.c b/monitor.c
> index 45fd482..aa60cfa 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -2270,8 +2270,9 @@ static void do_loadvm(Monitor *mon, const QDict *qdict)
>  
>  vm_stop(0);
>  
> -if (load_vmstate(name) >= 0 && saved_vm_running)
> +if (load_vmstate(name) == 0 && saved_vm_running) {
>  vm_start();
> +}
>  }
>  
>  int monitor_get_fd(Monitor *mon, const char *fdname)
> diff --git a/savevm.c b/savevm.c
> index ee27989..9f29cb0 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -1804,12 +1804,25 @@ void do_savevm(Monitor *mon, const QDict *qdict)
>  
>  int load_vmstate(const char *name)
>  {
> -BlockDriverState *bs, *bs1;
> +BlockDriverState *bs, *bs_vm_state;
>  QEMUSnapshotInfo sn;
>  QEMUFile *f;
>  int ret;
>  
> -/* Verify if there is a device that doesn't support snapshots and is 
> writable */
> +bs_vm_state = bdrv_snapshots();
> +if (!bs_vm_state) {
> +error_report("No block device supports snapshots");
> +return -EINVAL;

-ENOTSUP?

> +}
> +
> +/* Don't even try to load empty VM states */
> +ret = bdrv_snapshot_find(bs_vm_state, &sn, name);
> +if ((ret >= 0) && (sn.vm_state_size == 0)) {
> +return -EINVAL;
> +}

You can probably stop here already if ret < 0:

ret = ...
if (ret < 0) {
return ret;
} else if (sn.vm_state_size == 0) {
return -EINVAL;
}

I'm not sure about EINVAL here either, but I don't really have a better
suggestion.

> +
> +/* Verify if there is any device that doesn't support snapshots and is
> +writable and check if the requested snapshot is available too. */
>  bs = NULL;
>  while ((bs = bdrv_next(bs))) {
>  
> @@ -1821,64 +1834,46 @@ int load_vmstate(const char *name)
>  error_report("Device '%s' is writable but does not support 
> snapshots.",
> bdrv_get_device_name(bs));
>  return -ENOTSUP;
> +} else {

The then branch has a return, so you don't need the else here and can
have the following code nested one level less.

Looks good otherwise.

Kevin

[Qemu-devel] [PATCH] ide/atapi: add support for GET EVENT STATUS NOTIFICATION

2010-07-19 Thread Aurelien Jarno

The GET EVENT STATUS NOTIFICATION is a mandatory command according
to MMC-3, even if event status notification is not supported.

This patch adds support for this command. It returns NEA ("No Event
Available") with an empty "Supported Event Classes" to show that it
doesn't event support status notification. If asychronous operation is
requested, which requires NCQ support, it returns an error according
to the specifications.

This fixes HAL support on FreeBSD and derivatives, which fill up the
logs every second with:

  acd0: FAILURE - unknown CMD (0x03) ILLEGAL REQUEST asc=0x20 ascq=0x00

Signed-off-by: Aurelien Jarno 
---
 hw/ide/core.c |   15 +++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index e20f2e7..9e1bdd5 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -1643,6 +1643,21 @@ static void ide_atapi_cmd(IDEState *s)
 ide_atapi_cmd_reply(s, len, max_len);
 break;
 }
+case GPCMD_GET_EVENT_STATUS_NOTIFICATION:
+max_len = ube16_to_cpu(packet + 7);
+
+if (packet[1] & 0x01) { /* polling */
+/* We don't support any event class (yet). */
+cpu_to_ube16(buf, 0x00); /* No event descriptor returned */
+buf[2] = 0x80;   /* No Event Available (NEA) */
+buf[3] = 0x00;   /* Empty supported event classes */
+ide_atapi_cmd_reply(s, 4, max_len);
+} else { /* asynchronous mode */
+/* Only polling is supported, asynchronous mode is not. */
+ide_atapi_cmd_error(s, SENSE_ILLEGAL_REQUEST,
+ASC_INV_FIELD_IN_CMD_PACKET);
+}
+break;
 default:
 ide_atapi_cmd_error(s, SENSE_ILLEGAL_REQUEST,
 ASC_ILLEGAL_OPCODE);
-- 
1.7.1

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Gleb Natapov

On Mon, Jul 19, 2010 at 02:06:27PM +0100, Richard W.M. Jones wrote:
> On Mon, Jul 19, 2010 at 12:15:43PM +0300, Gleb Natapov wrote:
> > That what we are talking about, no? We are trying to find faster way to
> > load kernel/initrd and stay architectural. Honestly I would expect much
> > greater speedup from Richard's approach like 2 seconds vs 8 seconds. It
> > is hard to justify code complication just for 1 second speedup.
> 
> I've no idea where this "8 seconds" comes from.  Total boot time
That was number generated by may random number generator. I was just
trying to say that I would have expected much more gain from copying
kernel/initrd directly into the memory considering how much is going
on during pio string emulation.

> currently is < 8 seconds even without my patch.  My patch takes it
> from 7.5 seconds to 6.5 seconds.
> 
It shows that we are not so bad at emulating pio string operations.

--
Gleb.

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Richard W.M. Jones

On Mon, Jul 19, 2010 at 12:15:43PM +0300, Gleb Natapov wrote:
> That what we are talking about, no? We are trying to find faster way to
> load kernel/initrd and stay architectural. Honestly I would expect much
> greater speedup from Richard's approach like 2 seconds vs 8 seconds. It
> is hard to justify code complication just for 1 second speedup.

I've no idea where this "8 seconds" comes from.  Total boot time
currently is < 8 seconds even without my patch.  My patch takes it
from 7.5 seconds to 6.5 seconds.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
New in Fedora 11: Fedora Windows cross-compiler. Compile Windows
programs, test, and build Windows installers. Over 70 libraries supprt'd
http://fedoraproject.org/wiki/MinGW http://www.annexia.org/fedora_mingw

Re: [Qemu-devel] TCG vs Dyngen

2010-07-19 Thread Eli L.

> In the old version there is a file ops_mem.h that has proto types for memory
> functions. In the new version this file does not exist anymore which is due to
> the inclusion of tcg instead of dyngen.

I'm not very familiar with the old dyngen system, but I believe the tcg
equivalent to ops_mem.h is softmmu_template.h.

-- 
-Eli


signature.asc
Description: Digital signature

[Qemu-devel] [Bug 607204] Re: New qemu instances often cannot be started if host system is under load

2010-07-19 Thread Guido Winkelmann

Forgot to mention:

The bug is still present in the latest git-version as of this writing.

-- 
New qemu instances often cannot be started if host system is under load
https://bugs.launchpad.net/bugs/607204
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New

Bug description:
I've got a problem where I cannot start any new VMs with qemu-kvm if the host 
machine is under high CPU load. The problem is not 100% reproducible (it works 
sometimes), but under load conditions, it happens most of the time - roughly 
95%.

I'm usually using libvirt to start and stop KVM VMs. When using virsh to start 
a new VM under those conditions, the output looks like this:

virsh # start testserver-a
error: Failed to start domain testserver-a
error: monitor socket did not show up.: Connection refused

(There is a very long wait after the command has been sent until the error 
message shows up.)

This is (an example of) the command line that libvirtd uses to start up qemu:

- snip -
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin HOME=/root USER=root LOGNAME=root 
QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.12 -enable-kvm -m 256 -smp 
1,sockets=1,cores=1,threads=1 -name testserver-a -uuid 
7cbb3665-4d58-86b8-ce8f-20541995a99c -nodefaults -chardev 
socket,id=monitor,path=/usr/local/var/lib/libvirt/qemu/testserver-a.monitor,server,nowait
 -mon chardev=monitor,mode=readline -rtc base=utc -no-acpi -boot c -device 
lsi,id=scsi0,bus=pci.0,addr=0x7 -drive 
file=/data/testserver-a-system.img,if=none,id=drive-scsi0-0-1,boot=on -device 
scsi-disk,bus=scsi0.0,scsi-id=1,drive=drive-scsi0-0-1,id=scsi0-0-1 -drive 
file=/data/testserver-a-data1.img,if=none,id=drive-virtio-disk1 -device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 
-drive file=/data/testserver-a-data2.img,if=none,id=drive-virtio-disk2 -device 
virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk2,id=virtio-disk2 
-drive 
file=/data/gentoo-install-amd64-minimal-20100408.iso,if=none,media=cdrom,id=drive-ide0-0-0,readonly=on
 -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive 
file=/data/testserver-a_configfloppy.img,if=none,id=drive-fdc0-0-0 -global 
isa-fdc.driveA=drive-fdc0-0-0 -device 
e1000,vlan=0,id=net0,mac=52:54:00:84:6d:69,bus=pci.0,addr=0x6 -net 
tap,fd=24,vlan=0,name=hostnet0 -usb -vnc 127.0.0.1:1,password -k de -vga cirrus 
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
- snip -

Copy-pasting this to a commandline on the host to start qemu manually leads to 
a non-functional qemu process that "just sits there" with nothing happening. 
The monitor socket /usr/local/var/lib/libvirt/qemu/testserver-a.monitor will, 
indeed, not show up.

I've tried starting qemu with the same commandline but without the parameters 
for redirecting the monitor to a socket, without the fd parameter for the 
network interface and without the vnc parameter. This resulted in a black 
window with the title "QEMU (testserver-a) [Stopped]". I could not access the 
monitor console in graphical mode either. When I press Ctrl-Alt-2 in graphical 
mode to access the monitor console, qemu will sometimes (but not always) crash 
with a segfault about 2 seconds after.

Some experimentation I've done suggests that this problem only happens if the 
high cpu load is caused by another qemu process, not if it is caused by 
something else running on the machine.

The bug appears much less often if I leave off the -nodefaults parameter.

The bug will still appear if I start qemu as qemu-system-x86_64 instead of 
qemu-kvm and replace the -enable-kvm parameter with -no-kvm.

The host machine I'm running this on has got 16 cores in total. It looks like 
it is sufficient for this bug to surface if at least one of these cores is 
brought to near 100% use by a qemu process.

The version of qemu I'm using is qemu-kvm 0.12.4, built from source. Libvirt is 
version 0.8.1, built from source as well. The host OS is Fedora 12. The Kernel 
version is 2.6.32.12-115.fc12.x86_64.

Attached is an strace of attempting to start qemu which I hope will help 
someone with a better understanding of qemu's internals see what's actually 
going on there.

[Qemu-devel] [Bug 607204] [NEW] New qemu instances often cannot be started if host system is under load

2010-07-19 Thread Guido Winkelmann

Public bug reported:

I've got a problem where I cannot start any new VMs with qemu-kvm if the
host machine is under high CPU load. The problem is not 100%
reproducible (it works sometimes), but under load conditions, it happens
most of the time - roughly 95%.

I'm usually using libvirt to start and stop KVM VMs. When using virsh to
start a new VM under those conditions, the output looks like this:

virsh # start testserver-a
error: Failed to start domain testserver-a
error: monitor socket did not show up.: Connection refused

(There is a very long wait after the command has been sent until the
error message shows up.)

This is (an example of) the command line that libvirtd uses to start up
qemu:

- snip -
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin HOME=/root USER=root LOGNAME=root 
QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.12 -enable-kvm -m 256 -smp 
1,sockets=1,cores=1,threads=1 -name testserver-a -uuid 
7cbb3665-4d58-86b8-ce8f-20541995a99c -nodefaults -chardev 
socket,id=monitor,path=/usr/local/var/lib/libvirt/qemu/testserver-a.monitor,server,nowait
 -mon chardev=monitor,mode=readline -rtc base=utc -no-acpi -boot c -device 
lsi,id=scsi0,bus=pci.0,addr=0x7 -drive 
file=/data/testserver-a-system.img,if=none,id=drive-scsi0-0-1,boot=on -device 
scsi-disk,bus=scsi0.0,scsi-id=1,drive=drive-scsi0-0-1,id=scsi0-0-1 -drive 
file=/data/testserver-a-data1.img,if=none,id=drive-virtio-disk1 -device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 
-drive file=/data/testserver-a-data2.img,if=none,id=drive-virtio-disk2 -device 
virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk2,id=virtio-disk2 
-drive 
file=/data/gentoo-install-amd64-minimal-20100408.iso,if=none,media=cdrom,id=drive-ide0-0-0,readonly=on
 -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive 
file=/data/testserver-a_configfloppy.img,if=none,id=drive-fdc0-0-0 -global 
isa-fdc.driveA=drive-fdc0-0-0 -device 
e1000,vlan=0,id=net0,mac=52:54:00:84:6d:69,bus=pci.0,addr=0x6 -net 
tap,fd=24,vlan=0,name=hostnet0 -usb -vnc 127.0.0.1:1,password -k de -vga cirrus 
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
- snip -

Copy-pasting this to a commandline on the host to start qemu manually
leads to a non-functional qemu process that "just sits there" with
nothing happening. The monitor socket
/usr/local/var/lib/libvirt/qemu/testserver-a.monitor will, indeed, not
show up.

I've tried starting qemu with the same commandline but without the
parameters for redirecting the monitor to a socket, without the fd
parameter for the network interface and without the vnc parameter. This
resulted in a black window with the title "QEMU (testserver-a)
[Stopped]". I could not access the monitor console in graphical mode
either. When I press Ctrl-Alt-2 in graphical mode to access the monitor
console, qemu will sometimes (but not always) crash with a segfault
about 2 seconds after.

Some experimentation I've done suggests that this problem only happens
if the high cpu load is caused by another qemu process, not if it is
caused by something else running on the machine.

The bug appears much less often if I leave off the -nodefaults
parameter.

The bug will still appear if I start qemu as qemu-system-x86_64 instead
of qemu-kvm and replace the -enable-kvm parameter with -no-kvm.

The host machine I'm running this on has got 16 cores in total. It looks
like it is sufficient for this bug to surface if at least one of these
cores is brought to near 100% use by a qemu process.

The version of qemu I'm using is qemu-kvm 0.12.4, built from source.
Libvirt is version 0.8.1, built from source as well. The host OS is
Fedora 12. The Kernel version is 2.6.32.12-115.fc12.x86_64.

Attached is an strace of attempting to start qemu which I hope will help
someone with a better understanding of qemu's internals see what's
actually going on there.

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
New qemu instances often cannot be started if host system is under load
https://bugs.launchpad.net/bugs/607204
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New

Bug description:
I've got a problem where I cannot start any new VMs with qemu-kvm if the host 
machine is under high CPU load. The problem is not 100% reproducible (it works 
sometimes), but under load conditions, it happens most of the time - roughly 
95%.

I'm usually using libvirt to start and stop KVM VMs. When using virsh to start 
a new VM under those conditions, the output looks like this:

virsh # start testserver-a
error: Failed to start domain testserver-a
error: monitor socket did not show up.: Connection refused

(There is a very long wait after the command has been sent until the error 
message shows up.)

This is (an example of) the command line that libvirtd uses to start up qemu:

- snip -
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin HOME=/root

[Qemu-devel] [Bug 607204] Re: New qemu instances often cannot be started if host system is under load

2010-07-19 Thread Guido Winkelmann


** Attachment added: "strace output"
   http://launchpadlibrarian.net/52160914/qemu-kvm-bug-strace

-- 
New qemu instances often cannot be started if host system is under load
https://bugs.launchpad.net/bugs/607204
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New

Bug description:
I've got a problem where I cannot start any new VMs with qemu-kvm if the host 
machine is under high CPU load. The problem is not 100% reproducible (it works 
sometimes), but under load conditions, it happens most of the time - roughly 
95%.

I'm usually using libvirt to start and stop KVM VMs. When using virsh to start 
a new VM under those conditions, the output looks like this:

virsh # start testserver-a
error: Failed to start domain testserver-a
error: monitor socket did not show up.: Connection refused

(There is a very long wait after the command has been sent until the error 
message shows up.)

This is (an example of) the command line that libvirtd uses to start up qemu:

- snip -
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin HOME=/root USER=root LOGNAME=root 
QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.12 -enable-kvm -m 256 -smp 
1,sockets=1,cores=1,threads=1 -name testserver-a -uuid 
7cbb3665-4d58-86b8-ce8f-20541995a99c -nodefaults -chardev 
socket,id=monitor,path=/usr/local/var/lib/libvirt/qemu/testserver-a.monitor,server,nowait
 -mon chardev=monitor,mode=readline -rtc base=utc -no-acpi -boot c -device 
lsi,id=scsi0,bus=pci.0,addr=0x7 -drive 
file=/data/testserver-a-system.img,if=none,id=drive-scsi0-0-1,boot=on -device 
scsi-disk,bus=scsi0.0,scsi-id=1,drive=drive-scsi0-0-1,id=scsi0-0-1 -drive 
file=/data/testserver-a-data1.img,if=none,id=drive-virtio-disk1 -device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 
-drive file=/data/testserver-a-data2.img,if=none,id=drive-virtio-disk2 -device 
virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk2,id=virtio-disk2 
-drive 
file=/data/gentoo-install-amd64-minimal-20100408.iso,if=none,media=cdrom,id=drive-ide0-0-0,readonly=on
 -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive 
file=/data/testserver-a_configfloppy.img,if=none,id=drive-fdc0-0-0 -global 
isa-fdc.driveA=drive-fdc0-0-0 -device 
e1000,vlan=0,id=net0,mac=52:54:00:84:6d:69,bus=pci.0,addr=0x6 -net 
tap,fd=24,vlan=0,name=hostnet0 -usb -vnc 127.0.0.1:1,password -k de -vga cirrus 
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
- snip -

Copy-pasting this to a commandline on the host to start qemu manually leads to 
a non-functional qemu process that "just sits there" with nothing happening. 
The monitor socket /usr/local/var/lib/libvirt/qemu/testserver-a.monitor will, 
indeed, not show up.

I've tried starting qemu with the same commandline but without the parameters 
for redirecting the monitor to a socket, without the fd parameter for the 
network interface and without the vnc parameter. This resulted in a black 
window with the title "QEMU (testserver-a) [Stopped]". I could not access the 
monitor console in graphical mode either. When I press Ctrl-Alt-2 in graphical 
mode to access the monitor console, qemu will sometimes (but not always) crash 
with a segfault about 2 seconds after.

Some experimentation I've done suggests that this problem only happens if the 
high cpu load is caused by another qemu process, not if it is caused by 
something else running on the machine.

The bug appears much less often if I leave off the -nodefaults parameter.

The bug will still appear if I start qemu as qemu-system-x86_64 instead of 
qemu-kvm and replace the -enable-kvm parameter with -no-kvm.

The host machine I'm running this on has got 16 cores in total. It looks like 
it is sufficient for this bug to surface if at least one of these cores is 
brought to near 100% use by a qemu process.

The version of qemu I'm using is qemu-kvm 0.12.4, built from source. Libvirt is 
version 0.8.1, built from source as well. The host OS is Fedora 12. The Kernel 
version is 2.6.32.12-115.fc12.x86_64.

Attached is an strace of attempting to start qemu which I hope will help 
someone with a better understanding of qemu's internals see what's actually 
going on there.

[Qemu-devel] Re: [PATCH] Create USB buses and devices based on USB version.

2010-07-19 Thread David S. Ahern

On 07/14/10 11:56, David Ahern wrote:
> Create USB buses and devices based on USB version.
> 
> Signed-off-by: David Ahern 

ping.

Relative to current code this addresses a number of FIXME's by assigning
USB devices to a specific bus. It is also groundwork for adding ehci.

David

> ---
>  hw/usb-bus.c|   70 
> ---
>  hw/usb-msd.c|2 +-
>  hw/usb-net.c|2 +-
>  hw/usb-ohci.c   |2 +-
>  hw/usb-serial.c |4 +-
>  hw/usb-uhci.c   |2 +-
>  hw/usb.h|7 +++--
>  usb-bsd.c   |2 +-
>  usb-linux.c |2 +-
>  9 files changed, 63 insertions(+), 30 deletions(-)
> 
> diff --git a/hw/usb-bus.c b/hw/usb-bus.c
> index b692503..f4849f8 100644
> --- a/hw/usb-bus.c
> +++ b/hw/usb-bus.c
> @@ -4,6 +4,12 @@
>  #include "sysemu.h"
>  #include "monitor.h"
>  
> +enum {
> +USB_VERSION_NONE,
> +USB_VERSION_1_1,
> +};
> +
> +
>  static void usb_bus_dev_print(Monitor *mon, DeviceState *qdev, int indent);
>  
>  static struct BusInfo usb_bus_info = {
> @@ -14,27 +20,46 @@ static struct BusInfo usb_bus_info = {
>  static int next_usb_bus = 0;
>  static QTAILQ_HEAD(, USBBus) busses = QTAILQ_HEAD_INITIALIZER(busses);
>  
> -void usb_bus_new(USBBus *bus, DeviceState *host)
> +static void usb_bus_new(USBBus *bus, int version, DeviceState *host)
>  {
>  qbus_create_inplace(&bus->qbus, &usb_bus_info, host, NULL);
>  bus->busnr = next_usb_bus++;
>  bus->qbus.allow_hotplug = 1; /* Yes, we can */
> +bus->version = version;
>  QTAILQ_INIT(&bus->free);
>  QTAILQ_INIT(&bus->used);
>  QTAILQ_INSERT_TAIL(&busses, bus, next);
>  }
>  
> -USBBus *usb_bus_find(int busnr)
> +void usb_bus_new_v1(USBBus *bus, DeviceState *host)
> +{
> +usb_bus_new(bus, USB_VERSION_1_1, host);
> +}
> +
> +static USBBus *usb_bus_find(int busnr)
> +{
> +USBBus *bus;
> +
> +QTAILQ_FOREACH(bus, &busses, next) {
> +if (bus->busnr == busnr) {
> +break;
> +}
> +}
> +
> +return bus;
> +}
> +
> +/* device creation should be using this one */
> +USBBus *usb_bus_find_version(int version)
>  {
>  USBBus *bus;
>  
> -if (-1 == busnr)
> -return QTAILQ_FIRST(&busses);
>  QTAILQ_FOREACH(bus, &busses, next) {
> -if (bus->busnr == busnr)
> -return bus;
> +if (bus->version == version) {
> +break;
> +}
>  }
> -return NULL;
> +return bus;
>  }
>  
>  static int usb_qdev_init(DeviceState *qdev, DeviceInfo *base)
> @@ -80,20 +105,15 @@ void usb_qdev_register_many(USBDeviceInfo *info)
>  }
>  }
>  
> -USBDevice *usb_create(USBBus *bus, const char *name)
> +static USBDevice *usb_create(USBBus *bus, const char *name)
>  {
>  DeviceState *dev;
>  
> -#if 1
> -/* temporary stopgap until all usb is properly qdev-ified */
>  if (!bus) {
> -bus = usb_bus_find(-1);
> -if (!bus)
> -return NULL;
> -fprintf(stderr, "%s: no bus specified, using \"%s\" for \"%s\"\n",
> -__FUNCTION__, bus->qbus.name, name);
> +fprintf(stderr, "%s: no bus specified for \"%s\"\n",
> +__FUNCTION__, name);
> +return NULL;
>  }
> -#endif
>  
>  dev = qdev_create(&bus->qbus, name);
>  return DO_UPCAST(USBDevice, qdev, dev);
> @@ -101,7 +121,13 @@ USBDevice *usb_create(USBBus *bus, const char *name)
>  
>  USBDevice *usb_create_simple(USBBus *bus, const char *name)
>  {
> -USBDevice *dev = usb_create(bus, name);
> +USBDevice *dev;
> +
> +/* if bus not given default to USB 1.1 */
> +if (!bus)
> +bus = usb_bus_find_version(USB_VERSION_1_1);
> +
> +dev = usb_create(bus, name);
>  if (!dev) {
>  hw_error("Failed to create USB device '%s'\n", name);
>  }
> @@ -109,6 +135,13 @@ USBDevice *usb_create_simple(USBBus *bus, const char 
> *name)
>  return dev;
>  }
>  
> +/* create USB device attached to USB 1.1 controller */
> +USBDevice *usb_create_v1(const char *name)
> +{
> +USBBus *bus = usb_bus_find_version(USB_VERSION_1_1);
> +return usb_create(bus, name);
> +}
> +
>  void usb_register_port(USBBus *bus, USBPort *port, void *opaque, int index,
> usb_attachfn attach)
>  {
> @@ -260,7 +293,6 @@ void usb_info(Monitor *mon)
>  /* handle legacy -usbdevice cmd line option */
>  USBDevice *usbdevice_create(const char *cmdline)
>  {
> -USBBus *bus = usb_bus_find(-1 /* any */);
>  DeviceInfo *info;
>  USBDeviceInfo *usb;
>  char driver[32];
> @@ -302,7 +334,7 @@ USBDevice *usbdevice_create(const char *cmdline)
>  error_report("usbdevice %s accepts no params", driver);
>  return NULL;
>  }
> -return usb_create_simple(bus, usb->qdev.name);
> +return usb_create_simple(NULL, usb->qdev.name);
>  }
>  return usb->usbdevice_init(params);
>  }
> diff --git a/hw/usb-msd.c b/hw/usb-msd.c
> index 65e9624..d8b68f7 100644
> --- a/hw/usb

[Qemu-devel] Re: [PATCH] block: Use error codes from lower levels for error message

2010-07-19 Thread Kevin Wolf

Am 18.07.2010 21:42, schrieb Stefan Weil:
> "No such file or directory" is a misleading error message
> when a user tries to open a file with wrong permissions.
> 
> Cc: Kevin Wolf 
> Signed-off-by: Stefan Weil 
> ---
>  block.c |   12 
>  1 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/block.c b/block.c
> index f837876..2f80540 100644
> --- a/block.c
> +++ b/block.c
> @@ -330,16 +330,20 @@ BlockDriver *bdrv_find_protocol(const char *filename)
>  return NULL;
>  }
>  
> -static BlockDriver *find_image_format(const char *filename)
> +static BlockDriver *find_image_format(const char *filename, int *error)

Wouldn't it be a more natural interface to return an 0/-errno int and
pass the BlockDriver* by reference? I think we already have some
function that work this way in the block code, but I can't remember any
that get an int *error.

>  {
>  int ret, score, score_max;
>  BlockDriver *drv1, *drv;
>  uint8_t buf[2048];
>  BlockDriverState *bs;
>  
> +*error = -ENOENT;

Why -ENOENT is the default would be clearer if you moved it down next to
the drv = NULL before the loop that searches for the driver.

Apart from these minor nitpicks it looks good.

Kevin

[Qemu-devel] [PATCH 2/2 version 3] fw_cfg: Implement fast "DMA"-type operation for rapidly copying in kernel, initrd [etc] into the guest

2010-07-19 Thread Richard W.M. Jones


This version adds polling for DMA done.

Part 1/2 (the fix for e->callback == NULL) is the same as always so
I won't attach it.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://et.redhat.com/~rjones/libguestfs/
See what it can do: http://et.redhat.com/~rjones/libguestfs/recipes.html
>From 449eeef3dedb5612fe4f408835bbd926643d35f8 Mon Sep 17 00:00:00 2001
From: Richard Jones 
Date: Sat, 17 Jul 2010 14:30:46 +0100
Subject: [PATCH 2/2] fw_cfg: Allow guest to read kernel etc via fast, 
synchronous "DMA"-type operation.

This adds a "DMA" operation for rapidly copying the kernel, initrd
etc into the guest.

The guest sets up a DMA address and size and then issues the usual
read operation but with the FW_CFG_DMA bit set on the entry number.
QEmu then just copies the whole config entry to the selected physical
address synchronously.  The guest should poll until this "DMA"
operation is done, allowing us to write an alternate asynchronous
version in future should that be necessary.

This saves some time when loading large images.

This change is backwards compatible.  ROMs using the old method will
work unchanged.

Signed-off-by: Richard W.M. Jones 
---
 hw/fw_cfg.c   |   35 +-
 hw/fw_cfg.h   |   10 +++-
 pc-bios/optionrom/linuxboot.S |8 +++---
 pc-bios/optionrom/optionrom.h |   42 +
 4 files changed, 88 insertions(+), 7 deletions(-)

diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c
index 37e6f1f..383586f 100644
--- a/hw/fw_cfg.c
+++ b/hw/fw_cfg.c
@@ -55,6 +55,13 @@ struct FWCfgState {
 uint32_t cur_offset;
 };
 
+/* Target address and size for DMA operations.  This is only used
+ * during boot and across 32 and 64 bit architectures, so only writes
+ * to lower 4GB addresses are supported.
+ */
+static uint32_t dma_addr = 0;
+static uint32_t dma_size = 0;
+
 static void fw_cfg_write(FWCfgState *s, uint8_t value)
 {
 int arch = !!(s->cur_entry & FW_CFG_ARCH_LOCAL);
@@ -98,7 +105,22 @@ static uint8_t fw_cfg_read(FWCfgState *s)
 
 if (s->cur_entry == FW_CFG_INVALID || !e->data || s->cur_offset >= e->len)
 ret = 0;
-else
+else if (s->cur_entry & FW_CFG_DMA) {
+if (dma_size > e->len - s->cur_offset)
+dma_size = e->len - s->cur_offset;
+
+cpu_physical_memory_write ((target_phys_addr_t) dma_addr,
+   &e->data[s->cur_offset],
+   dma_size);
+s->cur_offset += e->len;
+/* Returns 0 if there was an error, 1 if the DMA operation
+ * started.  Callers *must* poll for completion of the
+ * operation (waiting for FW_CFG_DMA_DONE == 0), even though
+ * in the current implementation the operation completes
+ * instantaneously from the p.o.v of the current guest vCPU.
+ */
+ret = 1;
+} else
 ret = e->data[s->cur_offset++];
 
 FW_CFG_DPRINTF("read %d\n", ret);
@@ -352,6 +374,17 @@ FWCfgState *fw_cfg_init(uint32_t ctl_port, uint32_t 
data_port,
 fw_cfg_add_i16(s, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
 fw_cfg_add_i16(s, FW_CFG_BOOT_MENU, (uint16_t)boot_menu);
 
+fw_cfg_add_bytes(s, FW_CFG_DMA_ADDR | FW_CFG_WRITE_CHANNEL,
+ (uint8_t *)&dma_addr, sizeof dma_addr);
+fw_cfg_add_bytes(s, FW_CFG_DMA_SIZE | FW_CFG_WRITE_CHANNEL,
+ (uint8_t *)&dma_size, sizeof dma_size);
+/* Current implementation is synchronous, so this value always reads
+ * as 0 (meaning "done").  In other possible implementations, this
+ * could return > 0 indicating that the caller should continue polling
+ * for completion of the operation.
+ */
+fw_cfg_add_i32(s, FW_CFG_DMA_DONE, 0);
+
 return s;
 }
 
diff --git a/hw/fw_cfg.h b/hw/fw_cfg.h
index 4d13a4f..abc41c9 100644
--- a/hw/fw_cfg.h
+++ b/hw/fw_cfg.h
@@ -30,11 +30,17 @@
 
 #define FW_CFG_FILE_FIRST   0x20
 #define FW_CFG_FILE_SLOTS   0x10
-#define FW_CFG_MAX_ENTRY(FW_CFG_FILE_FIRST+FW_CFG_FILE_SLOTS)
 
+#define FW_CFG_DMA_ADDR 0x30
+#define FW_CFG_DMA_SIZE 0x31
+#define FW_CFG_DMA_DONE 0x32
+
+#define FW_CFG_MAX_ENTRY(FW_CFG_DMA_DONE+1)
+
+#define FW_CFG_DMA  0x2000
 #define FW_CFG_WRITE_CHANNEL0x4000
 #define FW_CFG_ARCH_LOCAL   0x8000
-#define FW_CFG_ENTRY_MASK   ~(FW_CFG_WRITE_CHANNEL | FW_CFG_ARCH_LOCAL)
+#define FW_CFG_ENTRY_MASK   ~(FW_CFG_DMA | FW_CFG_WRITE_CHANNEL | 
FW_CFG_ARCH_LOCAL)
 
 #define FW_CFG_INVALID  0x
 
diff --git a/pc-bios/optionrom/linuxboot.S b/pc-bios/optionrom/linuxboot.S
index c109363..dbf44cb 100644
--- a/pc-bios/optionrom/linuxboot.S
+++ b/pc-bios/optionrom/linuxboot.S
@@ -106,10 +106,10 @@ copy_kernel:
/* We're now running in 16-bit CS, but 32-bit ES! */
 
/*

Re: [Qemu-devel] [PATCH] block migraton: check sectors before shift operation.

2010-07-19 Thread Yoshiaki Tamura

2010/7/19 Kevin Wolf :
> Am 19.07.2010 06:45, schrieb Yoshiaki Tamura:
>> Commit d246673dcb9911218ff555bcdf28b250e38fa46c has expanded the types
>> of block drive that can be initialized for block migration.  Although
>> bdrv_getlength() may return < 0, current code shifts it without
>> checking.  This makes block migration initialization invalid and
>> results in abort() due to calling qemu_malloc() with 0 size at
>> bdrv_set_dirty_tracking().  This patch checks the return value of
>> bdrv_getlength() by masking with BDRV_SECTOR_MASK.
>>
>> Signed-off-by: Yoshiaki Tamura 
>
> I applied a similar patch by Shahar Havivi to the block branch a few
> days ago.

Oops.  Missed that discussion.

Yoshi

>
> Kevin
>
>

[Qemu-devel] Re: [PATCH 0/2 version 2] fw_cfg: Implement fast "DMA"-type operation for rapidly copying in kernel, initrd [etc] into the guest

2010-07-19 Thread Gleb Natapov

On Mon, Jul 19, 2010 at 11:49:09AM +0100, Richard W.M. Jones wrote:
> On Mon, Jul 19, 2010 at 01:45:00PM +0300, Gleb Natapov wrote:
> > On Mon, Jul 19, 2010 at 11:15:04AM +0100, Richard W.M. Jones wrote:
> > > 
> > > This is the second version of the patch.
> > > 
> > > We don't use the word "blit" any more, instead this is replaced with
> > > "DMA", even though it's not quite like a DMA operation on physical
> > > hardware.
> > > 
> > You ignored the whole discussion above. Calling things DMA will not make
> > them so. You haven't event implemented Alexander's suggestion to poll
> > for DMA completion which will at lease make the interface to the guest
> > palatable.
> 
> I read everything in the discussion.
> 
> I can add polling however.
> 
If copying (call to cpu_physical_memory_write()) really takes 6 or more
seconds we should really make it async from the beginning. (If we are going
this way at all. I prefer to use virtio-serial for such complex gust/host
communication. fw_cfg was designed to be simple at should stay so. And
it is used by other arches too so any extension should be usable there).

--
Gleb.

[Qemu-devel] Re: [PATCH 0/2 version 2] fw_cfg: Implement fast "DMA"-type operation for rapidly copying in kernel, initrd [etc] into the guest

2010-07-19 Thread Richard W.M. Jones

On Mon, Jul 19, 2010 at 01:45:00PM +0300, Gleb Natapov wrote:
> On Mon, Jul 19, 2010 at 11:15:04AM +0100, Richard W.M. Jones wrote:
> > 
> > This is the second version of the patch.
> > 
> > We don't use the word "blit" any more, instead this is replaced with
> > "DMA", even though it's not quite like a DMA operation on physical
> > hardware.
> > 
> You ignored the whole discussion above. Calling things DMA will not make
> them so. You haven't event implemented Alexander's suggestion to poll
> for DMA completion which will at lease make the interface to the guest
> palatable.

I read everything in the discussion.

I can add polling however.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
New in Fedora 11: Fedora Windows cross-compiler. Compile Windows
programs, test, and build Windows installers. Over 70 libraries supprt'd
http://fedoraproject.org/wiki/MinGW http://www.annexia.org/fedora_mingw

[Qemu-devel] Re: [PATCH 0/2 version 2] fw_cfg: Implement fast "DMA"-type operation for rapidly copying in kernel, initrd [etc] into the guest

2010-07-19 Thread Gleb Natapov

On Mon, Jul 19, 2010 at 11:15:04AM +0100, Richard W.M. Jones wrote:
> 
> This is the second version of the patch.
> 
> We don't use the word "blit" any more, instead this is replaced with
> "DMA", even though it's not quite like a DMA operation on physical
> hardware.
> 
You ignored the whole discussion above. Calling things DMA will not make
them so. You haven't event implemented Alexander's suggestion to poll
for DMA completion which will at lease make the interface to the guest
palatable.
 
> The guest writes the physical address and size to two 32 bit fw_cfg
> variables.  Then when the guest issues an ordinary read operation with
> the extra FW_CFG_DMA flag set, instead of returning a single byte,
> qemu "DMA"s the requested data into the guest memory.
> 
> The guest shouldn't be able to request a dma_size larger than the
> amount of data in the entry.  The patch checks this and adjusts
> dma_size.
> 
> The guest might select a dma_addr which does not correspond to
> physical memory (or dma_addr + dma_size).  Reading the code it seems
> to be that cpu_physical_memory_write catches this case and will
> abort() (so the guest is only harming itself).  However I'd quite like
> an expert opinion on this ...
> 
> Rich.
> 
> -- 
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> virt-top is 'top' for virtual machines.  Tiny program with many
> powerful monitoring features, net stats, disk stats, logging, etc.
> http://et.redhat.com/~rjones/virt-top

--
Gleb.

[Qemu-devel] [PATCH 2/2 version 2] fw_cfg: Allow guest to read kernel etc via fast, synchronous "DMA"-type operation.

2010-07-19 Thread Richard W.M. Jones


-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://et.redhat.com/~rjones/virt-df/
>From 55d4700262253da52aa403dc1ba68da2ae91b084 Mon Sep 17 00:00:00 2001
From: Richard Jones 
Date: Sat, 17 Jul 2010 14:30:46 +0100
Subject: [PATCH 2/2] fw_cfg: Allow guest to read kernel etc via fast, 
synchronous "DMA"-type operation.

This adds a "DMA" operation for rapidly copying the kernel, initrd
etc into the guest.

The guest sets up a DMA address and size and then issues the usual
read operation but with the FW_CFG_DMA bit set on the entry number.
QEmu then just copies the whole config entry to the selected physical
address synchronously.

This saves some time when loading large images.

This change is backwards compatible.  ROMs using the old method will
work unchanged.

Signed-off-by: Richard W.M. Jones 
---
 hw/fw_cfg.c   |   22 +-
 hw/fw_cfg.h   |8 ++--
 pc-bios/optionrom/linuxboot.S |8 
 pc-bios/optionrom/optionrom.h |   37 +
 4 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c
index 37e6f1f..798a332 100644
--- a/hw/fw_cfg.c
+++ b/hw/fw_cfg.c
@@ -55,6 +55,13 @@ struct FWCfgState {
 uint32_t cur_offset;
 };
 
+/* Target address and size for DMA operations.  This is only used
+ * during boot and across 32 and 64 bit architectures, so only writes
+ * to lower 4GB addresses are supported.
+ */
+static uint32_t dma_addr = 0;
+static uint32_t dma_size = 0;
+
 static void fw_cfg_write(FWCfgState *s, uint8_t value)
 {
 int arch = !!(s->cur_entry & FW_CFG_ARCH_LOCAL);
@@ -98,7 +105,16 @@ static uint8_t fw_cfg_read(FWCfgState *s)
 
 if (s->cur_entry == FW_CFG_INVALID || !e->data || s->cur_offset >= e->len)
 ret = 0;
-else
+else if (s->cur_entry & FW_CFG_DMA) {
+if (dma_size > e->len - s->cur_offset)
+dma_size = e->len - s->cur_offset;
+
+cpu_physical_memory_write ((target_phys_addr_t) dma_addr,
+   &e->data[s->cur_offset],
+   dma_size);
+s->cur_offset += e->len;
+ret = 1;
+} else
 ret = e->data[s->cur_offset++];
 
 FW_CFG_DPRINTF("read %d\n", ret);
@@ -351,6 +367,10 @@ FWCfgState *fw_cfg_init(uint32_t ctl_port, uint32_t 
data_port,
 fw_cfg_add_i16(s, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
 fw_cfg_add_i16(s, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
 fw_cfg_add_i16(s, FW_CFG_BOOT_MENU, (uint16_t)boot_menu);
+fw_cfg_add_bytes(s, FW_CFG_DMA_ADDR | FW_CFG_WRITE_CHANNEL,
+ (uint8_t *)&dma_addr, sizeof dma_addr);
+fw_cfg_add_bytes(s, FW_CFG_DMA_SIZE | FW_CFG_WRITE_CHANNEL,
+ (uint8_t *)&dma_size, sizeof dma_size);
 
 return s;
 }
diff --git a/hw/fw_cfg.h b/hw/fw_cfg.h
index 4d13a4f..44b2be5 100644
--- a/hw/fw_cfg.h
+++ b/hw/fw_cfg.h
@@ -30,11 +30,15 @@
 
 #define FW_CFG_FILE_FIRST   0x20
 #define FW_CFG_FILE_SLOTS   0x10
-#define FW_CFG_MAX_ENTRY(FW_CFG_FILE_FIRST+FW_CFG_FILE_SLOTS)
 
+#define FW_CFG_DMA_ADDR 0x30
+#define FW_CFG_DMA_SIZE 0x31
+#define FW_CFG_MAX_ENTRY(FW_CFG_DMA_SIZE+1)
+
+#define FW_CFG_DMA  0x2000
 #define FW_CFG_WRITE_CHANNEL0x4000
 #define FW_CFG_ARCH_LOCAL   0x8000
-#define FW_CFG_ENTRY_MASK   ~(FW_CFG_WRITE_CHANNEL | FW_CFG_ARCH_LOCAL)
+#define FW_CFG_ENTRY_MASK   ~(FW_CFG_DMA | FW_CFG_WRITE_CHANNEL | 
FW_CFG_ARCH_LOCAL)
 
 #define FW_CFG_INVALID  0x
 
diff --git a/pc-bios/optionrom/linuxboot.S b/pc-bios/optionrom/linuxboot.S
index c109363..dbf44cb 100644
--- a/pc-bios/optionrom/linuxboot.S
+++ b/pc-bios/optionrom/linuxboot.S
@@ -106,10 +106,10 @@ copy_kernel:
/* We're now running in 16-bit CS, but 32-bit ES! */
 
/* Load kernel and initrd */
-   read_fw_blob_addr32(FW_CFG_KERNEL)
-   read_fw_blob_addr32(FW_CFG_INITRD)
-   read_fw_blob_addr32(FW_CFG_CMDLINE)
-   read_fw_blob_addr32(FW_CFG_SETUP)
+   read_fw_blob_dma(FW_CFG_KERNEL)
+   read_fw_blob_dma(FW_CFG_INITRD)
+   read_fw_blob_dma(FW_CFG_CMDLINE)
+   read_fw_blob_dma(FW_CFG_SETUP)
 
/* And now jump into Linux! */
mov $0, %eax
diff --git a/pc-bios/optionrom/optionrom.h b/pc-bios/optionrom/optionrom.h
index fbdd48a..7fffe2d 100644
--- a/pc-bios/optionrom/optionrom.h
+++ b/pc-bios/optionrom/optionrom.h
@@ -50,6 +50,27 @@
bswap   %eax
 .endm
 
+/*
+ * Write %eax to a variable in the fw_cfg device.
+ * In:  %eax
+ * Clobbers:   %edx
+ */
+.macro write_fw VAR
+push%eax
+mov $(\VAR|FW_CFG_WRITE_CHANNEL), %ax
+mov $BIOS_CFG_IOPORT_CFG, %dx
+outw%ax, (%dx)
+pop %eax

[Qemu-devel] [PATCH 1/2 version 2] Don't call fw_cfg e->callback if e->callback is NULL.

2010-07-19 Thread Richard W.M. Jones


-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://et.redhat.com/~rjones/libguestfs/
See what it can do: http://et.redhat.com/~rjones/libguestfs/recipes.html
>From 1fe39da6476a6ff05e9705cd7f63f94c93746053 Mon Sep 17 00:00:00 2001
From: Richard Jones 
Date: Sat, 17 Jul 2010 14:23:21 +0100
Subject: [PATCH 1/2] Don't call fw_cfg e->callback if e->callback is NULL.

If you set up a fw_cfg writable entry without a callback, then
e->callback is still called, causing qemu to segfault.

Luckily since nothing in qemu uses writable entries at the moment,
this is not exploitable.

Signed-off-by: Richard W.M. Jones 
---
 hw/fw_cfg.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c
index 72866ae..37e6f1f 100644
--- a/hw/fw_cfg.c
+++ b/hw/fw_cfg.c
@@ -65,7 +65,8 @@ static void fw_cfg_write(FWCfgState *s, uint8_t value)
 if (s->cur_entry & FW_CFG_WRITE_CHANNEL && s->cur_offset < e->len) {
 e->data[s->cur_offset++] = value;
 if (s->cur_offset == e->len) {
-e->callback(e->callback_opaque, e->data);
+if (e->callback)
+e->callback(e->callback_opaque, e->data);
 s->cur_offset = 0;
 }
 }
-- 
1.7.1

[Qemu-devel] [PATCH 0/2 version 2] fw_cfg: Implement fast "DMA"-type operation for rapidly copying in kernel, initrd [etc] into the guest

2010-07-19 Thread Richard W.M. Jones


This is the second version of the patch.

We don't use the word "blit" any more, instead this is replaced with
"DMA", even though it's not quite like a DMA operation on physical
hardware.

The guest writes the physical address and size to two 32 bit fw_cfg
variables.  Then when the guest issues an ordinary read operation with
the extra FW_CFG_DMA flag set, instead of returning a single byte,
qemu "DMA"s the requested data into the guest memory.

The guest shouldn't be able to request a dma_size larger than the
amount of data in the entry.  The patch checks this and adjusts
dma_size.

The guest might select a dma_addr which does not correspond to
physical memory (or dma_addr + dma_size).  Reading the code it seems
to be that cpu_physical_memory_write catches this case and will
abort() (so the guest is only harming itself).  However I'd quite like
an expert opinion on this ...

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://et.redhat.com/~rjones/virt-top

Re: [Qemu-devel] [PATCH v2] block: Change bdrv_commit to handle multiple sectors at once

2010-07-19 Thread Stefan Hajnoczi

On Mon, Jul 19, 2010 at 10:59 AM, Kevin Wolf  wrote:
> bdrv_commit copies the image to its backing file sector by sector, which
> is (surprise!) relatively slow. Let's take a larger buffer and handle more
> sectors at once if possible.
>
> With a 1G qcow2 file, this brought the time bdrv_commit takes down from
> 5:06 min to 1:14 min for me.
>
> Signed-off-by: Kevin Wolf 
> ---

Looks good.

Stefan

[Qemu-devel] [PATCH v2] block: Change bdrv_commit to handle multiple sectors at once

2010-07-19 Thread Kevin Wolf

bdrv_commit copies the image to its backing file sector by sector, which
is (surprise!) relatively slow. Let's take a larger buffer and handle more
sectors at once if possible.

With a 1G qcow2 file, this brought the time bdrv_commit takes down from
5:06 min to 1:14 min for me.

Signed-off-by: Kevin Wolf 
---
 block.c |   39 ---
 1 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/block.c b/block.c
index 65cf4dc..6e3fc49 100644
--- a/block.c
+++ b/block.c
@@ -724,14 +724,16 @@ int bdrv_check(BlockDriverState *bs, BdrvCheckResult *res)
 return bs->drv->bdrv_check(bs, res);
 }
 
+#define COMMIT_BUF_SECTORS 2048
+
 /* commit COW file into the raw image */
 int bdrv_commit(BlockDriverState *bs)
 {
 BlockDriver *drv = bs->drv;
-int64_t i, total_sectors;
-int n, j, ro, open_flags;
+int64_t sector, total_sectors;
+int n, ro, open_flags;
 int ret = 0, rw_ret = 0;
-unsigned char sector[BDRV_SECTOR_SIZE];
+uint8_t *buf;
 char filename[1024];
 BlockDriverState *bs_rw, *bs_ro;
 
@@ -774,22 +776,20 @@ int bdrv_commit(BlockDriverState *bs)
 }
 
 total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
-for (i = 0; i < total_sectors;) {
-if (drv->bdrv_is_allocated(bs, i, 65536, &n)) {
-for(j = 0; j < n; j++) {
-if (bdrv_read(bs, i, sector, 1) != 0) {
-ret = -EIO;
-goto ro_cleanup;
-}
+buf = qemu_malloc(COMMIT_BUF_SECTORS * BDRV_SECTOR_SIZE);
 
-if (bdrv_write(bs->backing_hd, i, sector, 1) != 0) {
-ret = -EIO;
-goto ro_cleanup;
-}
-i++;
-   }
-   } else {
-i += n;
+for (sector = 0; sector < total_sectors; sector += n) {
+if (drv->bdrv_is_allocated(bs, sector, COMMIT_BUF_SECTORS, &n)) {
+
+if (bdrv_read(bs, sector, buf, n) != 0) {
+ret = -EIO;
+goto ro_cleanup;
+}
+
+if (bdrv_write(bs->backing_hd, sector, buf, n) != 0) {
+ret = -EIO;
+goto ro_cleanup;
+}
 }
 }
 
@@ -806,6 +806,7 @@ int bdrv_commit(BlockDriverState *bs)
 bdrv_flush(bs->backing_hd);
 
 ro_cleanup:
+qemu_free(buf);
 
 if (ro) {
 /* re-open as RO */
-- 
1.7.1.1

Re: [Qemu-devel] [PATCH] block: Change bdrv_commit to handle multiple sectors at once

2010-07-19 Thread Kevin Wolf

Am 16.07.2010 20:16, schrieb Christoph Hellwig:
> On Fri, Jul 16, 2010 at 06:17:36PM +0200, Kevin Wolf wrote:
>> +buf = qemu_malloc(2048 * BDRV_SECTOR_SIZE);
> 
> Please add a COMMIT_BUF_SIZE #define instead of the hardcoded 2048 in
> various places.
> 
>>  for (i = 0; i < total_sectors;) {
>> +if (drv->bdrv_is_allocated(bs, i, 2048, &n)) {
>>  
>> +if (bdrv_read(bs, i, buf, n) != 0) {
>> +ret = -EIO;
>> +goto ro_cleanup;
>> +}
>> +
>> +if (bdrv_write(bs->backing_hd, i, buf, n) != 0) {
>> +ret = -EIO;
>> +goto ro_cleanup;
>> +}
>>  }
>> +i += n;
> 
> Maybe it's just me, but I'd prefer n getting a more descriptive name
> (e.g. sector) and moving the increment of it into the for loop, e.g.
> 
>   for (sector = 0; sector < total_sectors; sector += n) {
>   if (!drv->bdrv_is_allocated(bs, i, 2048, &n))
>   continue;
>   ...
>   }

So you mean i should be renamed, not n, right? I'll change these points
in v2.

Kevin

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Gleb Natapov

On Mon, Jul 19, 2010 at 11:21:34AM +0200, Alexander Graf wrote:
> 
> On 19.07.2010, at 11:19, Gleb Natapov wrote:
> 
> > On Mon, Jul 19, 2010 at 11:13:38AM +0200, Alexander Graf wrote:
> >> 
> >> On 19.07.2010, at 11:10, Gleb Natapov wrote:
> >> 
> >>> On Mon, Jul 19, 2010 at 11:02:54AM +0200, Alexander Graf wrote:
>  
>  On 19.07.2010, at 11:00, Gleb Natapov wrote:
>  
> > On Mon, Jul 19, 2010 at 10:54:43AM +0200, Alexander Graf wrote:
> >> 
> >> On 19.07.2010, at 10:48, Gleb Natapov wrote:
> >> 
> >>> 
>  Were there DMA capable devices back in ISA times? There must be. If 
>  so, we can just take a look at what they do and do it similarly. Bus 
>  mastering was a new thing for PCI, right?
>  
> >>> I think IDE can be considered DMA capable ISA device, no? At least
> >>> it works by writing to PIO ports and getting result into memory, but
> >>> with interrupts and status bits and everything that real device should
> >>> have. On board DMA engine is also ISA device. 
> >> 
> >> We could define our device to be polling. So all we need is a status 
> >> bit that the guest sets when it starts the DMA and the device unsets 
> >> when the DMA is done. In our case that should be immediate, because 
> >> the PIO invokes the full code paths, but it would look more like a 
> >> real device, no?
> >> 
> > This is better, but it shouldn't be synchronous. Kernel and initrd are
> > on disk so why not setup aio and read them from io thread allowing vcpu
> > thread immediately return to guest mode to process interrupts.
>  
>  That would work with the above described device model. If we're going 
>  synchronous or asynchronous would become an implementation detail.
>  
> >>> If vcpu thread will sleep for too much time without processing events we 
> >>> can see strange timeouts in a guest.
> >> 
> >> I don't think I understand what you mean?
> >> 
> > Vcpu executes "in %ax". Next instruction is executed 6 seconds later.
> > All timers that should have been processed during this time fire at the
> > same moment triggering all kind of timeouts. Think about watchdog that
> > should be written into every two seconds otherwise it does reset.
> 
> That's a hypervisor implementation detail! If we want to go synchronously, we 
> do. If something breaks, we don't.
It is. And it is a bug in the interface that we knowingly introduce. Do
we want to do that?

> Doing it synchronously simpllifies things a lot. And we're talking about a 
> device that's only used before the OS kicks in. There's no use in pretending 
> we're running a watchdog there.
On sane (embedded) HW that uses watchdog firmware tickle it too.
We do not want to stuck in firmware. Actually sane watchdog can't be
stopped after it is started. I see a compelling use case for watchdog
support in seabios.

And Seabios nowadays is complicated an runs a lot of code that use
interrupts.

> 
> > 
> >>> 
> > Or why
> > not use virtio-serial while we are at it? After all virtio-serial is
> > there to allow host and guest communication.
>  
>  Because virtio-serial needs us to set up the full virtio-pci stack. 
>  That's too much to mess with in an option rom IMHO.
>  
> >>> We already do it for virtio-blk. Read only support is very small in
> >>> LOC there. Don't know about virtio-serial protocol.
> >> 
> >> The virtio-blk model uses the whole pxe framework. For our in-tree option 
> >> roms we're trying to be simple. And I'd like to keep it that way. I really 
> >> don't want to add PCI enumeration and BAR setup to that code.
> >> 
> > The virtio-blk is entirely in seabios and does not use pxe at all!
> 
> So it uses even more framework :). The linuxboot stuff is completely separate 
> in its very own option rom.
> 
How this option rom is loaded?

--
Gleb.

Re: [Qemu-devel] [PATCH] block migraton: check sectors before shift operation.

2010-07-19 Thread Kevin Wolf

Am 19.07.2010 06:45, schrieb Yoshiaki Tamura:
> Commit d246673dcb9911218ff555bcdf28b250e38fa46c has expanded the types
> of block drive that can be initialized for block migration.  Although
> bdrv_getlength() may return < 0, current code shifts it without
> checking.  This makes block migration initialization invalid and
> results in abort() due to calling qemu_malloc() with 0 size at
> bdrv_set_dirty_tracking().  This patch checks the return value of
> bdrv_getlength() by masking with BDRV_SECTOR_MASK.
> 
> Signed-off-by: Yoshiaki Tamura 

I applied a similar patch by Shahar Havivi to the block branch a few
days ago.

Kevin

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Alexander Graf


On 19.07.2010, at 11:19, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 11:13:38AM +0200, Alexander Graf wrote:
>> 
>> On 19.07.2010, at 11:10, Gleb Natapov wrote:
>> 
>>> On Mon, Jul 19, 2010 at 11:02:54AM +0200, Alexander Graf wrote:
 
 On 19.07.2010, at 11:00, Gleb Natapov wrote:
 
> On Mon, Jul 19, 2010 at 10:54:43AM +0200, Alexander Graf wrote:
>> 
>> On 19.07.2010, at 10:48, Gleb Natapov wrote:
>> 
>>> 
 Were there DMA capable devices back in ISA times? There must be. If 
 so, we can just take a look at what they do and do it similarly. Bus 
 mastering was a new thing for PCI, right?
 
>>> I think IDE can be considered DMA capable ISA device, no? At least
>>> it works by writing to PIO ports and getting result into memory, but
>>> with interrupts and status bits and everything that real device should
>>> have. On board DMA engine is also ISA device. 
>> 
>> We could define our device to be polling. So all we need is a status bit 
>> that the guest sets when it starts the DMA and the device unsets when 
>> the DMA is done. In our case that should be immediate, because the PIO 
>> invokes the full code paths, but it would look more like a real device, 
>> no?
>> 
> This is better, but it shouldn't be synchronous. Kernel and initrd are
> on disk so why not setup aio and read them from io thread allowing vcpu
> thread immediately return to guest mode to process interrupts.
 
 That would work with the above described device model. If we're going 
 synchronous or asynchronous would become an implementation detail.
 
>>> If vcpu thread will sleep for too much time without processing events we 
>>> can see strange timeouts in a guest.
>> 
>> I don't think I understand what you mean?
>> 
> Vcpu executes "in %ax". Next instruction is executed 6 seconds later.
> All timers that should have been processed during this time fire at the
> same moment triggering all kind of timeouts. Think about watchdog that
> should be written into every two seconds otherwise it does reset.

That's a hypervisor implementation detail! If we want to go synchronously, we 
do. If something breaks, we don't. Doing it synchronously simpllifies things a 
lot. And we're talking about a device that's only used before the OS kicks in. 
There's no use in pretending we're running a watchdog there.

> 
>>> 
> Or why
> not use virtio-serial while we are at it? After all virtio-serial is
> there to allow host and guest communication.
 
 Because virtio-serial needs us to set up the full virtio-pci stack. That's 
 too much to mess with in an option rom IMHO.
 
>>> We already do it for virtio-blk. Read only support is very small in
>>> LOC there. Don't know about virtio-serial protocol.
>> 
>> The virtio-blk model uses the whole pxe framework. For our in-tree option 
>> roms we're trying to be simple. And I'd like to keep it that way. I really 
>> don't want to add PCI enumeration and BAR setup to that code.
>> 
> The virtio-blk is entirely in seabios and does not use pxe at all!

So it uses even more framework :). The linuxboot stuff is completely separate 
in its very own option rom.

Alex

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Richard W.M. Jones

On Mon, Jul 19, 2010 at 12:19:22PM +0300, Gleb Natapov wrote:
> Vcpu executes "in %ax". Next instruction is executed 6 seconds later.
> All timers that should have been processed during this time fire at the
> same moment triggering all kind of timeouts. Think about watchdog that
> should be written into every two seconds otherwise it does reset.

This particular code runs very early in boot, and the atomic copy
operation is very quick even with a 100MB initrd.

But the question I think should be: If a guest maliciously (after
boot) tried to use this mechanism, could it do harm to the host?  Or
would it just harm itself?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://et.redhat.com/~rjones/libguestfs/
See what it can do: http://et.redhat.com/~rjones/libguestfs/recipes.html

Re: [Qemu-devel] [PATCH 2/2] fw_cfg: Add blit operation for copying kernel, initrd, ..

2010-07-19 Thread Aurelien Jarno

On Mon, Jul 19, 2010 at 08:23:34AM +0100, Richard W.M. Jones wrote:
> On Mon, Jul 19, 2010 at 01:59:22AM +0200, Aurelien Jarno wrote:
> > OpenBIOS also uses the same firmware interface, so it would need to be
> > changed if this patch is accepted.
> 
> The patch leaves the old interface.  Does it still need to be changed?
> 

As long as the old interface is kept, that should be fine for OpenBIOS.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Alexander Graf


On 19.07.2010, at 11:15, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 11:09:13AM +0200, Alexander Graf wrote:
>> 
>> On 19.07.2010, at 11:06, Gleb Natapov wrote:
>> 
>>> On Mon, Jul 19, 2010 at 10:00:04AM +0100, Richard W.M. Jones wrote:
 
 virt-install is another program that uses explicit -initrd.
 
>>> Installation takes a lot of time. Saving 1 second there will not be
>>> noticeable. And during lifetime of installed VM initrd will be loaded
>>> from its disk.
>> 
>> Guys, please. It shouldn't be one or the other. Let's make sure both ways of 
>> doing things are fast. That's what users want: fast.
>> 
> That what we are talking about, no? We are trying to find faster way to
> load kernel/initrd and stay architectural. Honestly I would expect much
> greater speedup from Richard's approach like 2 seconds vs 8 seconds. It
> is hard to justify code complication just for 1 second speedup.

I agree. Hence I'd like to keep the complication as simple as possible.

Alex

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Gleb Natapov

On Mon, Jul 19, 2010 at 11:13:38AM +0200, Alexander Graf wrote:
> 
> On 19.07.2010, at 11:10, Gleb Natapov wrote:
> 
> > On Mon, Jul 19, 2010 at 11:02:54AM +0200, Alexander Graf wrote:
> >> 
> >> On 19.07.2010, at 11:00, Gleb Natapov wrote:
> >> 
> >>> On Mon, Jul 19, 2010 at 10:54:43AM +0200, Alexander Graf wrote:
>  
>  On 19.07.2010, at 10:48, Gleb Natapov wrote:
>  
> > 
> >> Were there DMA capable devices back in ISA times? There must be. If 
> >> so, we can just take a look at what they do and do it similarly. Bus 
> >> mastering was a new thing for PCI, right?
> >> 
> > I think IDE can be considered DMA capable ISA device, no? At least
> > it works by writing to PIO ports and getting result into memory, but
> > with interrupts and status bits and everything that real device should
> > have. On board DMA engine is also ISA device. 
>  
>  We could define our device to be polling. So all we need is a status bit 
>  that the guest sets when it starts the DMA and the device unsets when 
>  the DMA is done. In our case that should be immediate, because the PIO 
>  invokes the full code paths, but it would look more like a real device, 
>  no?
>  
> >>> This is better, but it shouldn't be synchronous. Kernel and initrd are
> >>> on disk so why not setup aio and read them from io thread allowing vcpu
> >>> thread immediately return to guest mode to process interrupts.
> >> 
> >> That would work with the above described device model. If we're going 
> >> synchronous or asynchronous would become an implementation detail.
> >> 
> > If vcpu thread will sleep for too much time without processing events we 
> > can see strange timeouts in a guest.
> 
> I don't think I understand what you mean?
> 
Vcpu executes "in %ax". Next instruction is executed 6 seconds later.
All timers that should have been processed during this time fire at the
same moment triggering all kind of timeouts. Think about watchdog that
should be written into every two seconds otherwise it does reset.

> > 
> >>> Or why
> >>> not use virtio-serial while we are at it? After all virtio-serial is
> >>> there to allow host and guest communication.
> >> 
> >> Because virtio-serial needs us to set up the full virtio-pci stack. That's 
> >> too much to mess with in an option rom IMHO.
> >> 
> > We already do it for virtio-blk. Read only support is very small in
> > LOC there. Don't know about virtio-serial protocol.
> 
> The virtio-blk model uses the whole pxe framework. For our in-tree option 
> roms we're trying to be simple. And I'd like to keep it that way. I really 
> don't want to add PCI enumeration and BAR setup to that code.
> 
The virtio-blk is entirely in seabios and does not use pxe at all!

--
Gleb.

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Richard W.M. Jones

On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote:
> Richard, what does kvm_stat tell you while loading the initrd? Are
> there a lot of PIO requests or are we simply looping inside qemu
> code?

The two attached files were made by running kvm_stat -l > /tmp/...
during a single run starting libguestfs.  This use of kvm_stat is as
described in Chris's blog entry here:
http://clalance.blogspot.com/2009/01/kvm-performance-tools.html

The first attachment ('no-patch') is without the proposed patch.

The second attachment ('with-patch') is with the proposed patch.

It seems some numbers such as #vmexits are lower with the proposed
patch, although not by a very much.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v
 efer_relo  exits  fpu_reloa  halt_exit  halt_wake  host_stat  hypercall  
insn_emul  insn_emul invlpg   io_exits  irq_exits  irq_injec  irq_windo  
largepage  mmio_exit  mmu_cache  mmu_flood  mmu_pde_z  mmu_pte_u  mmu_pte_w  
mmu_recyc  mmu_shado  mmu_unsyn  nmi_injec  nmi_windo   pf_fixed   pf_guest  
remote_tl  request_i  signal_ex  tlb_flush
 0  0  0  0  0  0  0
  0  0  0  0  0  0  0   
   0  0  0  0  0  0  0  
0  0  0  0  0  0  0  0  
0  0  0
 0  0  0  0  0  0  0
  0  0  0  0  0  0  0   
   0  0  0  0  0  0  0  
0  0  0  0  0  0  0  0  
0  0  0
 0 147464  10991  0  0  75176  0
  70151  0  0  75122106  8 11   
   0  7 49  0  0  0  56360  
0 94  0  0  0390  0  0  
0 39  0
 0  21165  0  0  0  21159  0
  21139  0  0  21151 18  0  0   
   0  0  0  0  0  0  0  
0  0  0  0  0  0  0  0  
0 16  0
 0 108370   1024  0  0  57189  0
   9936  0   8674  55697342108209   
   0   1014   2828610180  0776  
0   2754  0  0  0  24981  0  0  
0286  10996
 0  78421  0  0  0  16362  0
   1828  0  0  12189   2136   1073   1371   
   0   1827 93  0  1  0  1  
0  0  2  0  0  46659  0  0  
0   2041  1
 0  52784611  8  0  13498  0
   2722  0   1084  10562   1236769   1106   
   0   1317769276   1138  0   1405  
0630 42  0  0  31694   4779  0  
0   1227   2799
 0 122866   7220  0  0  6  0
  11677  0   7619   6596   1655   1137   1560   
   0   2035   5133   1956   7687  0   9642  
0   4375472  0  0  62778  32417  0  
0   1976  19957
 0  48502  19852  9  0  22262  0
   3173  0   1607  19781802   1199   1700   
   0799   1063452   1918  0   2370  
0   1125 49  0  0  14706   7666  0  
0642   4795
 0-579572 -39698-17  0-216762  0
-120626  0 -18984-201098  -6295  -4294  -5957   
   0  -6999  -9935  -3294 -10924  0 -70554  
0  -8978   -565  0  0-181208 -44862  0  
0  -6227 -38548
 0  0  0  0  0  0  0
  0  0  0  0  0

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Alexander Graf


On 19.07.2010, at 11:10, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 11:02:54AM +0200, Alexander Graf wrote:
>> 
>> On 19.07.2010, at 11:00, Gleb Natapov wrote:
>> 
>>> On Mon, Jul 19, 2010 at 10:54:43AM +0200, Alexander Graf wrote:
 
 On 19.07.2010, at 10:48, Gleb Natapov wrote:
 
> 
>> Were there DMA capable devices back in ISA times? There must be. If so, 
>> we can just take a look at what they do and do it similarly. Bus 
>> mastering was a new thing for PCI, right?
>> 
> I think IDE can be considered DMA capable ISA device, no? At least
> it works by writing to PIO ports and getting result into memory, but
> with interrupts and status bits and everything that real device should
> have. On board DMA engine is also ISA device. 
 
 We could define our device to be polling. So all we need is a status bit 
 that the guest sets when it starts the DMA and the device unsets when the 
 DMA is done. In our case that should be immediate, because the PIO invokes 
 the full code paths, but it would look more like a real device, no?
 
>>> This is better, but it shouldn't be synchronous. Kernel and initrd are
>>> on disk so why not setup aio and read them from io thread allowing vcpu
>>> thread immediately return to guest mode to process interrupts.
>> 
>> That would work with the above described device model. If we're going 
>> synchronous or asynchronous would become an implementation detail.
>> 
> If vcpu thread will sleep for too much time without processing events we can 
> see strange timeouts in a guest.

I don't think I understand what you mean?

> 
>>> Or why
>>> not use virtio-serial while we are at it? After all virtio-serial is
>>> there to allow host and guest communication.
>> 
>> Because virtio-serial needs us to set up the full virtio-pci stack. That's 
>> too much to mess with in an option rom IMHO.
>> 
> We already do it for virtio-blk. Read only support is very small in
> LOC there. Don't know about virtio-serial protocol.

The virtio-blk model uses the whole pxe framework. For our in-tree option roms 
we're trying to be simple. And I'd like to keep it that way. I really don't 
want to add PCI enumeration and BAR setup to that code.


Alex

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Gleb Natapov

On Mon, Jul 19, 2010 at 11:09:13AM +0200, Alexander Graf wrote:
> 
> On 19.07.2010, at 11:06, Gleb Natapov wrote:
> 
> > On Mon, Jul 19, 2010 at 10:00:04AM +0100, Richard W.M. Jones wrote:
> >> 
> >> virt-install is another program that uses explicit -initrd.
> >> 
> > Installation takes a lot of time. Saving 1 second there will not be
> > noticeable. And during lifetime of installed VM initrd will be loaded
> > from its disk.
> 
> Guys, please. It shouldn't be one or the other. Let's make sure both ways of 
> doing things are fast. That's what users want: fast.
> 
That what we are talking about, no? We are trying to find faster way to
load kernel/initrd and stay architectural. Honestly I would expect much
greater speedup from Richard's approach like 2 seconds vs 8 seconds. It
is hard to justify code complication just for 1 second speedup.

--
Gleb.

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Gleb Natapov

On Mon, Jul 19, 2010 at 11:02:54AM +0200, Alexander Graf wrote:
> 
> On 19.07.2010, at 11:00, Gleb Natapov wrote:
> 
> > On Mon, Jul 19, 2010 at 10:54:43AM +0200, Alexander Graf wrote:
> >> 
> >> On 19.07.2010, at 10:48, Gleb Natapov wrote:
> >> 
> >>> 
>  Were there DMA capable devices back in ISA times? There must be. If so, 
>  we can just take a look at what they do and do it similarly. Bus 
>  mastering was a new thing for PCI, right?
>  
> >>> I think IDE can be considered DMA capable ISA device, no? At least
> >>> it works by writing to PIO ports and getting result into memory, but
> >>> with interrupts and status bits and everything that real device should
> >>> have. On board DMA engine is also ISA device. 
> >> 
> >> We could define our device to be polling. So all we need is a status bit 
> >> that the guest sets when it starts the DMA and the device unsets when the 
> >> DMA is done. In our case that should be immediate, because the PIO invokes 
> >> the full code paths, but it would look more like a real device, no?
> >> 
> > This is better, but it shouldn't be synchronous. Kernel and initrd are
> > on disk so why not setup aio and read them from io thread allowing vcpu
> > thread immediately return to guest mode to process interrupts.
> 
> That would work with the above described device model. If we're going 
> synchronous or asynchronous would become an implementation detail.
> 
If vcpu thread will sleep for too much time without processing events we can 
see strange timeouts in a guest.

> > Or why
> > not use virtio-serial while we are at it? After all virtio-serial is
> > there to allow host and guest communication.
> 
> Because virtio-serial needs us to set up the full virtio-pci stack. That's 
> too much to mess with in an option rom IMHO.
> 
We already do it for virtio-blk. Read only support is very small in
LOC there. Don't know about virtio-serial protocol.

--
Gleb.

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Alexander Graf

On 19.07.2010, at 11:06, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 10:00:04AM +0100, Richard W.M. Jones wrote:
>> 
>> virt-install is another program that uses explicit -initrd.
>> 
> Installation takes a lot of time. Saving 1 second there will not be
> noticeable. And during lifetime of installed VM initrd will be loaded
> from its disk.

Guys, please. It shouldn't be one or the other. Let's make sure both ways of 
doing things are fast. That's what users want: fast.

Alex

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Gleb Natapov

On Mon, Jul 19, 2010 at 10:00:04AM +0100, Richard W.M. Jones wrote:
> On Mon, Jul 19, 2010 at 11:40:41AM +0300, Gleb Natapov wrote:
> > On Mon, Jul 19, 2010 at 09:34:11AM +0100, Richard W.M. Jones wrote:
> > > On Mon, Jul 19, 2010 at 10:55:33AM +0300, Gleb Natapov wrote:
> > > > Why not put then on cdrom or disk?
> > > 
> > > It simplifies device and mountpoint enumeration not to have a separate
> > > disk.  It would also mean we couldn't use standard Fedora paths, or
> > > we'd have to have bind-mount /bin etc on to the disk mount point,
> > > which again complicates things.
> > > 
> > Can't help you here, but if it's doable you can speedup your startup
> > time much more then by a second.
> 
> This isn't true.
> 
> The most we could save is 0.8 seconds [time taken to load the 100MB
> initrd by the kernel] less the time taken to probe and mount a CD ISO
But you do not need all 100MB of application, so with disk approach
you load things you need on demand.

> [0.2 seconds - measured using guestfish] less the time taken to load
> programs from this CD.  So the most we could save would be 0.6
> seconds, and in reality it'd be less than this if we actually loaded
> and ran any programs from the CD at all.
> 
> My patch saves 1 second, and all the programs are in RAM.
> 
And it will take 100M of a host ram.

> > Most users load initrd from a disk not by -initrd option.
> 
> It's unusual, but on my production webserver I use -kernel and -initrd
> options explicitly.  That's because I want all my VMs to share a
> single kernel.
> 
How often you restart them?

> virt-install is another program that uses explicit -initrd.
> 
Installation takes a lot of time. Saving 1 second there will not be
noticeable. And during lifetime of installed VM initrd will be loaded
from its disk.

--
Gleb.

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Richard W.M. Jones

On Mon, Jul 19, 2010 at 10:00:04AM +0100, Richard W.M. Jones wrote:
[...]

OK, it's early in the morning and I can't do maths.  But we're still
asking a big increase in complexity versus optimizing something which
is just slow in qemu at the moment.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://et.redhat.com/~rjones/virt-top

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Alexander Graf


On 19.07.2010, at 11:00, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 10:54:43AM +0200, Alexander Graf wrote:
>> 
>> On 19.07.2010, at 10:48, Gleb Natapov wrote:
>> 
>>> 
 Were there DMA capable devices back in ISA times? There must be. If so, we 
 can just take a look at what they do and do it similarly. Bus mastering 
 was a new thing for PCI, right?
 
>>> I think IDE can be considered DMA capable ISA device, no? At least
>>> it works by writing to PIO ports and getting result into memory, but
>>> with interrupts and status bits and everything that real device should
>>> have. On board DMA engine is also ISA device. 
>> 
>> We could define our device to be polling. So all we need is a status bit 
>> that the guest sets when it starts the DMA and the device unsets when the 
>> DMA is done. In our case that should be immediate, because the PIO invokes 
>> the full code paths, but it would look more like a real device, no?
>> 
> This is better, but it shouldn't be synchronous. Kernel and initrd are
> on disk so why not setup aio and read them from io thread allowing vcpu
> thread immediately return to guest mode to process interrupts.

That would work with the above described device model. If we're going 
synchronous or asynchronous would become an implementation detail.

> Or why
> not use virtio-serial while we are at it? After all virtio-serial is
> there to allow host and guest communication.

Because virtio-serial needs us to set up the full virtio-pci stack. That's too 
much to mess with in an option rom IMHO.

Alex

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Gleb Natapov

On Mon, Jul 19, 2010 at 10:54:43AM +0200, Alexander Graf wrote:
> 
> On 19.07.2010, at 10:48, Gleb Natapov wrote:
> 
> > On Mon, Jul 19, 2010 at 10:41:48AM +0200, Alexander Graf wrote:
> >> 
> >> On 19.07.2010, at 10:30, Gleb Natapov wrote:
> >> 
> >>> On Mon, Jul 19, 2010 at 10:24:46AM +0200, Alexander Graf wrote:
>  
>  On 19.07.2010, at 10:19, Gleb Natapov wrote:
>  
>  Yes and no. It sounds nice at first, but doesn't quite fit. There are 
>  two issues:
>  
>  1) We need a new PCI ID
> >>> We have our range. We can allocate from there.
> >>> 
>  2) There can be a lot of initrd binaries with multiboot. We only have a 
>  limited amount of BARs
>  
> >>> Is it supported now with fw_cfg interface? My main concern with this
> >>> approach is huge BAR size that may take a lot of space from PCI MMIO range
> >>> if guest OS decide to configure it.
> >> 
> >> Oh, right. I think I combined all the modules into the INITRD blob. Yeah, 
> >> that would work. Is coalesced MMIO more efficient than coalesced PIO? Or 
> >> do we have to do some RAM mapping for those special BAR regions?
> >> 
> > I think we will have to do RAM mapping. Otherwise it may be slow to.
> > Coalesced MMIO is for write not read IIRC.
> 
> Oh, right. Makes sense.
> 
> > 
> >> Were there DMA capable devices back in ISA times? There must be. If so, we 
> >> can just take a look at what they do and do it similarly. Bus mastering 
> >> was a new thing for PCI, right?
> >> 
> > I think IDE can be considered DMA capable ISA device, no? At least
> > it works by writing to PIO ports and getting result into memory, but
> > with interrupts and status bits and everything that real device should
> > have. On board DMA engine is also ISA device. 
> 
> We could define our device to be polling. So all we need is a status bit that 
> the guest sets when it starts the DMA and the device unsets when the DMA is 
> done. In our case that should be immediate, because the PIO invokes the full 
> code paths, but it would look more like a real device, no?
> 
This is better, but it shouldn't be synchronous. Kernel and initrd are
on disk so why not setup aio and read them from io thread allowing vcpu
thread immediately return to guest mode to process interrupts. Or why
not use virtio-serial while we are at it? After all virtio-serial is
there to allow host and guest communication.

> outb(PORT_DMA_CTL, FWCFG_DMA_ENABLE);
> while (inb(PORT_DMA_CTL) & FWCFG_DMA_ENABLE) {
> /* DMA going on */
> }
> /* DMA done */
> 
> 
> Alex

--
Gleb.

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Richard W.M. Jones

On Mon, Jul 19, 2010 at 11:40:41AM +0300, Gleb Natapov wrote:
> On Mon, Jul 19, 2010 at 09:34:11AM +0100, Richard W.M. Jones wrote:
> > On Mon, Jul 19, 2010 at 10:55:33AM +0300, Gleb Natapov wrote:
> > > Why not put then on cdrom or disk?
> > 
> > It simplifies device and mountpoint enumeration not to have a separate
> > disk.  It would also mean we couldn't use standard Fedora paths, or
> > we'd have to have bind-mount /bin etc on to the disk mount point,
> > which again complicates things.
> > 
> Can't help you here, but if it's doable you can speedup your startup
> time much more then by a second.

This isn't true.

The most we could save is 0.8 seconds [time taken to load the 100MB
initrd by the kernel] less the time taken to probe and mount a CD ISO
[0.2 seconds - measured using guestfish] less the time taken to load
programs from this CD.  So the most we could save would be 0.6
seconds, and in reality it'd be less than this if we actually loaded
and ran any programs from the CD at all.

My patch saves 1 second, and all the programs are in RAM.

> Most users load initrd from a disk not by -initrd option.

It's unusual, but on my production webserver I use -kernel and -initrd
options explicitly.  That's because I want all my VMs to share a
single kernel.

virt-install is another program that uses explicit -initrd.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://et.redhat.com/~rjones/virt-df/

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Alexander Graf


On 19.07.2010, at 10:48, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 10:41:48AM +0200, Alexander Graf wrote:
>> 
>> On 19.07.2010, at 10:30, Gleb Natapov wrote:
>> 
>>> On Mon, Jul 19, 2010 at 10:24:46AM +0200, Alexander Graf wrote:
 
 On 19.07.2010, at 10:19, Gleb Natapov wrote:
 
 Yes and no. It sounds nice at first, but doesn't quite fit. There are two 
 issues:
 
 1) We need a new PCI ID
>>> We have our range. We can allocate from there.
>>> 
 2) There can be a lot of initrd binaries with multiboot. We only have a 
 limited amount of BARs
 
>>> Is it supported now with fw_cfg interface? My main concern with this
>>> approach is huge BAR size that may take a lot of space from PCI MMIO range
>>> if guest OS decide to configure it.
>> 
>> Oh, right. I think I combined all the modules into the INITRD blob. Yeah, 
>> that would work. Is coalesced MMIO more efficient than coalesced PIO? Or do 
>> we have to do some RAM mapping for those special BAR regions?
>> 
> I think we will have to do RAM mapping. Otherwise it may be slow to.
> Coalesced MMIO is for write not read IIRC.

Oh, right. Makes sense.

> 
>> Were there DMA capable devices back in ISA times? There must be. If so, we 
>> can just take a look at what they do and do it similarly. Bus mastering was 
>> a new thing for PCI, right?
>> 
> I think IDE can be considered DMA capable ISA device, no? At least
> it works by writing to PIO ports and getting result into memory, but
> with interrupts and status bits and everything that real device should
> have. On board DMA engine is also ISA device. 

We could define our device to be polling. So all we need is a status bit that 
the guest sets when it starts the DMA and the device unsets when the DMA is 
done. In our case that should be immediate, because the PIO invokes the full 
code paths, but it would look more like a real device, no?

outb(PORT_DMA_CTL, FWCFG_DMA_ENABLE);
while (inb(PORT_DMA_CTL) & FWCFG_DMA_ENABLE) {
/* DMA going on */
}
/* DMA done */


Alex

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Gleb Natapov

On Mon, Jul 19, 2010 at 10:41:48AM +0200, Alexander Graf wrote:
> 
> On 19.07.2010, at 10:30, Gleb Natapov wrote:
> 
> > On Mon, Jul 19, 2010 at 10:24:46AM +0200, Alexander Graf wrote:
> >> 
> >> On 19.07.2010, at 10:19, Gleb Natapov wrote:
> >> 
> >> Yes and no. It sounds nice at first, but doesn't quite fit. There are two 
> >> issues:
> >> 
> >> 1) We need a new PCI ID
> > We have our range. We can allocate from there.
> > 
> >> 2) There can be a lot of initrd binaries with multiboot. We only have a 
> >> limited amount of BARs
> >> 
> > Is it supported now with fw_cfg interface? My main concern with this
> > approach is huge BAR size that may take a lot of space from PCI MMIO range
> > if guest OS decide to configure it.
> 
> Oh, right. I think I combined all the modules into the INITRD blob. Yeah, 
> that would work. Is coalesced MMIO more efficient than coalesced PIO? Or do 
> we have to do some RAM mapping for those special BAR regions?
> 
I think we will have to do RAM mapping. Otherwise it may be slow to.
Coalesced MMIO is for write not read IIRC.

> Were there DMA capable devices back in ISA times? There must be. If so, we 
> can just take a look at what they do and do it similarly. Bus mastering was a 
> new thing for PCI, right?
> 
I think IDE can be considered DMA capable ISA device, no? At least
it works by writing to PIO ports and getting result into memory, but
with interrupts and status bits and everything that real device should
have. On board DMA engine is also ISA device. 

--
Gleb.

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Alexander Graf

On 19.07.2010, at 10:30, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 10:24:46AM +0200, Alexander Graf wrote:
>> 
>> On 19.07.2010, at 10:19, Gleb Natapov wrote:
>> 
>> Yes and no. It sounds nice at first, but doesn't quite fit. There are two 
>> issues:
>> 
>> 1) We need a new PCI ID
> We have our range. We can allocate from there.
> 
>> 2) There can be a lot of initrd binaries with multiboot. We only have a 
>> limited amount of BARs
>> 
> Is it supported now with fw_cfg interface? My main concern with this
> approach is huge BAR size that may take a lot of space from PCI MMIO range
> if guest OS decide to configure it.

Oh, right. I think I combined all the modules into the INITRD blob. Yeah, that 
would work. Is coalesced MMIO more efficient than coalesced PIO? Or do we have 
to do some RAM mapping for those special BAR regions?

Were there DMA capable devices back in ISA times? There must be. If so, we can 
just take a look at what they do and do it similarly. Bus mastering was a new 
thing for PCI, right?

Alex

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Gleb Natapov

On Mon, Jul 19, 2010 at 09:34:11AM +0100, Richard W.M. Jones wrote:
> On Mon, Jul 19, 2010 at 10:55:33AM +0300, Gleb Natapov wrote:
> > Why not put then on cdrom or disk?
> 
> It simplifies device and mountpoint enumeration not to have a separate
> disk.  It would also mean we couldn't use standard Fedora paths, or
> we'd have to have bind-mount /bin etc on to the disk mount point,
> which again complicates things.
> 
Can't help you here, but if it's doable you can speedup your startup
time much more then by a second.

> Anyway, what we're talking about here is a problem in qemu.  How is
The problem is that you want to speed up your application. There is more
then one solution to the problem. If you come up with reasonable
solution in qemu that it OK. 

> making initrd loading faster not a benefit for everyone?  Every boot
> has to load an initrd of some size, so making that operation faster
> benefits every user, even if individually only by a small amount.
> 
Most users load initrd from a disk not by -initrd option.

--
Gleb.

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Richard W.M. Jones

On Mon, Jul 19, 2010 at 10:55:33AM +0300, Gleb Natapov wrote:
> Why not put then on cdrom or disk?

It simplifies device and mountpoint enumeration not to have a separate
disk.  It would also mean we couldn't use standard Fedora paths, or
we'd have to have bind-mount /bin etc on to the disk mount point,
which again complicates things.

Anyway, what we're talking about here is a problem in qemu.  How is
making initrd loading faster not a benefit for everyone?  Every boot
has to load an initrd of some size, so making that operation faster
benefits every user, even if individually only by a small amount.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Gleb Natapov

On Mon, Jul 19, 2010 at 10:24:46AM +0200, Alexander Graf wrote:
> 
> On 19.07.2010, at 10:19, Gleb Natapov wrote:
> 
> > On Mon, Jul 19, 2010 at 10:08:57AM +0200, Alexander Graf wrote:
> >> 
> >> On 19.07.2010, at 10:01, Gleb Natapov wrote:
> >> 
> >>> On Mon, Jul 19, 2010 at 09:57:02AM +0200, Alexander Graf wrote:
>  
>  On 19.07.2010, at 09:51, Gleb Natapov wrote:
>  
> > On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote:
> >> 
> >> On 19.07.2010, at 09:33, Gleb Natapov wrote:
> >> 
> >>> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
>  On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> > That what I am warring about too. If we are adding device we have 
> > to be
> > sure such device can actually exist on real hw too otherwise we may 
> > have
> > problems later.
>  
>  I don't understand why the constraints of real h/w have anything to 
>  do
>  with this.  Can you explain?
>  
> >>> Each time we do something not architectural it cause us troubles 
> >>> later.
> >>> So constraints of real h/w is our constrains to.
> >>> 
> > Also 1 second on 100M file does not look like huge gain to me.
>  
>  Every second counts.  We're trying to get libguestfs boot times down
>  from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
>  program.
>  
> >>> So what about making initrd smaller? I remember managing two
> >>> distribution in 64M flash in embedded project.
> >> 
> >> Having a huge initrd basically helps in reusing a lot of existing 
> >> code. We do the same - in general the initrd is just a subset of the 
> >> applications of the host OS. And if you start putting perl or the 
> >> likes into it, it becomes big.
> >> 
> > Why not provide small disk/cdrom with all those utilities installed?
>  
>  Because - if the loading is done fast - this way everything's in RAM 
>  instantly. And you still have all devices available for use inside the 
>  system - that makes enumeration a lot easier. There are several reasons 
>  why and I don't think we should force different ways on people just 
>  because one component of our system is ineffective.
>  
> >>> Loading huge initrd on real HW takes noticeably longer time that small
> >>> one, so I would say that it is your design that is to blame here, not
> >>> KVM.
> >> 
> >> I disagree. Virtualization enables new use cases. The -initrd parameter is 
> >> a very good example for that. It's something that you simply couldn't do 
> >> on real hw.
> >> 
> > How is it different from starting kernel/initrd from usb flash drive?
> 
> The kernel and initrd are read directly from the host fs. It's more like a 9p 
> grub boot.
> 
There is no "host" on real HW :) But conceptually it's almost the same.
9p grub boot would be also nice. Hmm, I think PXE is closest to
-kernel/-initrd option on real HW.

> > 
> >>> 
> > 
> >> I guess the best thing for now really is to try and see which code 
> >> paths insb goes along. It should really be coalesced.
> >> 
> > It is coalesced to a certain extent (reenter guest every 1024 bytes,
> > read from userspace page at a time). You need to continue injecting
> > interrupt into a guest during long string operation and checking
> > exception condition on a page boundaries.
>  
>  That still sounds slow. So yeah, adding DMA is probably the right way to 
>  go. But then again - if we model it after real hw it would be 
>  asynchronous, giving us an interrupt, causing even more headache. Ugh.
>  
>  Can't we just ignore real hw constraints here and have it available in 
>  guest ram once one particular PIO is done? No bus master, no interrupts, 
>  but full speed and simplicity/atomicity which also helps migration.
>  
> >>> We shouldn't add devices that work not like real HW to speed up some
> >>> pathological cases (and are slow on real HW too).
> >> 
> >> Just because you don't use them doesn't mean they're pathological, really. 
> >> We simply chose a bad interface for transferring reasonable big chunks of 
> >> data and we need to fix that. If you want to look at it from a different 
> >> perspective, it's a regression. Older qemu versions did map the kernel and 
> >> initrd directly into guest ram, so now we're slower than back then.
> >> 
> > I use them hundred time each day (at least -kernel part). If the
> > interface is slow for your use case I have no problem with introducing
> > new one, but the one that make sense in x86 architecture. I do not agree
> > this is regression BTW. You can't compare buggy way of doing things and
> > non-buggy way and say that bug fixing is a regression.
> > 
> > What about adding new PCI card that holds kernel initrd in ROM bar?
> 
> Y

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Alexander Graf

On 19.07.2010, at 10:19, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 10:08:57AM +0200, Alexander Graf wrote:
>> 
>> On 19.07.2010, at 10:01, Gleb Natapov wrote:
>> 
>>> On Mon, Jul 19, 2010 at 09:57:02AM +0200, Alexander Graf wrote:

 On 19.07.2010, at 09:51, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote:
>> 
>> On 19.07.2010, at 09:33, Gleb Natapov wrote:
>> 
>>> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
 On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> That what I am warring about too. If we are adding device we have to 
> be
> sure such device can actually exist on real hw too otherwise we may 
> have
> problems later.

 I don't understand why the constraints of real h/w have anything to do
 with this.  Can you explain?

>>> Each time we do something not architectural it cause us troubles later.
>>> So constraints of real h/w is our constrains to.
>>> 
> Also 1 second on 100M file does not look like huge gain to me.

 Every second counts.  We're trying to get libguestfs boot times down
 from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
 program.

>>> So what about making initrd smaller? I remember managing two
>>> distribution in 64M flash in embedded project.
>> 
>> Having a huge initrd basically helps in reusing a lot of existing code. 
>> We do the same - in general the initrd is just a subset of the 
>> applications of the host OS. And if you start putting perl or the likes 
>> into it, it becomes big.
>> 
> Why not provide small disk/cdrom with all those utilities installed?

 Because - if the loading is done fast - this way everything's in RAM 
 instantly. And you still have all devices available for use inside the 
 system - that makes enumeration a lot easier. There are several reasons 
 why and I don't think we should force different ways on people just 
 because one component of our system is ineffective.

>>> Loading huge initrd on real HW takes noticeably longer time that small
>>> one, so I would say that it is your design that is to blame here, not
>>> KVM.
>> 
>> I disagree. Virtualization enables new use cases. The -initrd parameter is a 
>> very good example for that. It's something that you simply couldn't do on 
>> real hw.
>> 
> How is it different from starting kernel/initrd from usb flash drive?

The kernel and initrd are read directly from the host fs. It's more like a 9p 
grub boot.

> 
>>> 
> 
>> I guess the best thing for now really is to try and see which code paths 
>> insb goes along. It should really be coalesced.
>> 
> It is coalesced to a certain extent (reenter guest every 1024 bytes,
> read from userspace page at a time). You need to continue injecting
> interrupt into a guest during long string operation and checking
> exception condition on a page boundaries.

 That still sounds slow. So yeah, adding DMA is probably the right way to 
 go. But then again - if we model it after real hw it would be 
 asynchronous, giving us an interrupt, causing even more headache. Ugh.

 Can't we just ignore real hw constraints here and have it available in 
 guest ram once one particular PIO is done? No bus master, no interrupts, 
 but full speed and simplicity/atomicity which also helps migration.

>>> We shouldn't add devices that work not like real HW to speed up some
>>> pathological cases (and are slow on real HW too).
>> 
>> Just because you don't use them doesn't mean they're pathological, really. 
>> We simply chose a bad interface for transferring reasonable big chunks of 
>> data and we need to fix that. If you want to look at it from a different 
>> perspective, it's a regression. Older qemu versions did map the kernel and 
>> initrd directly into guest ram, so now we're slower than back then.
>> 
> I use them hundred time each day (at least -kernel part). If the
> interface is slow for your use case I have no problem with introducing
> new one, but the one that make sense in x86 architecture. I do not agree
> this is regression BTW. You can't compare buggy way of doing things and
> non-buggy way and say that bug fixing is a regression.
> 
> What about adding new PCI card that holds kernel initrd in ROM bar?

Yes and no. It sounds nice at first, but doesn't quite fit. There are two 
issues:

1) We need a new PCI ID
2) There can be a lot of initrd binaries with multiboot. We only have a limited 
amount of BARs

Alex

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Gleb Natapov

On Mon, Jul 19, 2010 at 10:08:57AM +0200, Alexander Graf wrote:
> 
> On 19.07.2010, at 10:01, Gleb Natapov wrote:
> 
> > On Mon, Jul 19, 2010 at 09:57:02AM +0200, Alexander Graf wrote:
> >> 
> >> On 19.07.2010, at 09:51, Gleb Natapov wrote:
> >> 
> >>> On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote:
>  
>  On 19.07.2010, at 09:33, Gleb Natapov wrote:
>  
> > On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
> >> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> >>> That what I am warring about too. If we are adding device we have to 
> >>> be
> >>> sure such device can actually exist on real hw too otherwise we may 
> >>> have
> >>> problems later.
> >> 
> >> I don't understand why the constraints of real h/w have anything to do
> >> with this.  Can you explain?
> >> 
> > Each time we do something not architectural it cause us troubles later.
> > So constraints of real h/w is our constrains to.
> > 
> >>> Also 1 second on 100M file does not look like huge gain to me.
> >> 
> >> Every second counts.  We're trying to get libguestfs boot times down
> >> from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
> >> program.
> >> 
> > So what about making initrd smaller? I remember managing two
> > distribution in 64M flash in embedded project.
>  
>  Having a huge initrd basically helps in reusing a lot of existing code. 
>  We do the same - in general the initrd is just a subset of the 
>  applications of the host OS. And if you start putting perl or the likes 
>  into it, it becomes big.
>  
> >>> Why not provide small disk/cdrom with all those utilities installed?
> >> 
> >> Because - if the loading is done fast - this way everything's in RAM 
> >> instantly. And you still have all devices available for use inside the 
> >> system - that makes enumeration a lot easier. There are several reasons 
> >> why and I don't think we should force different ways on people just 
> >> because one component of our system is ineffective.
> >> 
> > Loading huge initrd on real HW takes noticeably longer time that small
> > one, so I would say that it is your design that is to blame here, not
> > KVM.
> 
> I disagree. Virtualization enables new use cases. The -initrd parameter is a 
> very good example for that. It's something that you simply couldn't do on 
> real hw.
> 
How is it different from starting kernel/initrd from usb flash drive?

> > 
> >>> 
>  I guess the best thing for now really is to try and see which code paths 
>  insb goes along. It should really be coalesced.
>  
> >>> It is coalesced to a certain extent (reenter guest every 1024 bytes,
> >>> read from userspace page at a time). You need to continue injecting
> >>> interrupt into a guest during long string operation and checking
> >>> exception condition on a page boundaries.
> >> 
> >> That still sounds slow. So yeah, adding DMA is probably the right way to 
> >> go. But then again - if we model it after real hw it would be 
> >> asynchronous, giving us an interrupt, causing even more headache. Ugh.
> >> 
> >> Can't we just ignore real hw constraints here and have it available in 
> >> guest ram once one particular PIO is done? No bus master, no interrupts, 
> >> but full speed and simplicity/atomicity which also helps migration.
> >> 
> > We shouldn't add devices that work not like real HW to speed up some
> > pathological cases (and are slow on real HW too).
> 
> Just because you don't use them doesn't mean they're pathological, really. We 
> simply chose a bad interface for transferring reasonable big chunks of data 
> and we need to fix that. If you want to look at it from a different 
> perspective, it's a regression. Older qemu versions did map the kernel and 
> initrd directly into guest ram, so now we're slower than back then.
> 
I use them hundred time each day (at least -kernel part). If the
interface is slow for your use case I have no problem with introducing
new one, but the one that make sense in x86 architecture. I do not agree
this is regression BTW. You can't compare buggy way of doing things and
non-buggy way and say that bug fixing is a regression.

What about adding new PCI card that holds kernel initrd in ROM bar?

--
Gleb.

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Alexander Graf


On 19.07.2010, at 10:01, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 09:57:02AM +0200, Alexander Graf wrote:
>> 
>> On 19.07.2010, at 09:51, Gleb Natapov wrote:
>> 
>>> On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote:
 
 On 19.07.2010, at 09:33, Gleb Natapov wrote:
 
> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
>> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
>>> That what I am warring about too. If we are adding device we have to be
>>> sure such device can actually exist on real hw too otherwise we may have
>>> problems later.
>> 
>> I don't understand why the constraints of real h/w have anything to do
>> with this.  Can you explain?
>> 
> Each time we do something not architectural it cause us troubles later.
> So constraints of real h/w is our constrains to.
> 
>>> Also 1 second on 100M file does not look like huge gain to me.
>> 
>> Every second counts.  We're trying to get libguestfs boot times down
>> from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
>> program.
>> 
> So what about making initrd smaller? I remember managing two
> distribution in 64M flash in embedded project.
 
 Having a huge initrd basically helps in reusing a lot of existing code. We 
 do the same - in general the initrd is just a subset of the applications 
 of the host OS. And if you start putting perl or the likes into it, it 
 becomes big.
 
>>> Why not provide small disk/cdrom with all those utilities installed?
>> 
>> Because - if the loading is done fast - this way everything's in RAM 
>> instantly. And you still have all devices available for use inside the 
>> system - that makes enumeration a lot easier. There are several reasons why 
>> and I don't think we should force different ways on people just because one 
>> component of our system is ineffective.
>> 
> Loading huge initrd on real HW takes noticeably longer time that small
> one, so I would say that it is your design that is to blame here, not
> KVM.

I disagree. Virtualization enables new use cases. The -initrd parameter is a 
very good example for that. It's something that you simply couldn't do on real 
hw.

> 
>>> 
 I guess the best thing for now really is to try and see which code paths 
 insb goes along. It should really be coalesced.
 
>>> It is coalesced to a certain extent (reenter guest every 1024 bytes,
>>> read from userspace page at a time). You need to continue injecting
>>> interrupt into a guest during long string operation and checking
>>> exception condition on a page boundaries.
>> 
>> That still sounds slow. So yeah, adding DMA is probably the right way to go. 
>> But then again - if we model it after real hw it would be asynchronous, 
>> giving us an interrupt, causing even more headache. Ugh.
>> 
>> Can't we just ignore real hw constraints here and have it available in guest 
>> ram once one particular PIO is done? No bus master, no interrupts, but full 
>> speed and simplicity/atomicity which also helps migration.
>> 
> We shouldn't add devices that work not like real HW to speed up some
> pathological cases (and are slow on real HW too).

Just because you don't use them doesn't mean they're pathological, really. We 
simply chose a bad interface for transferring reasonable big chunks of data and 
we need to fix that. If you want to look at it from a different perspective, 
it's a regression. Older qemu versions did map the kernel and initrd directly 
into guest ram, so now we're slower than back then.


Alex

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Gleb Natapov

On Mon, Jul 19, 2010 at 09:57:02AM +0200, Alexander Graf wrote:
> 
> On 19.07.2010, at 09:51, Gleb Natapov wrote:
> 
> > On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote:
> >> 
> >> On 19.07.2010, at 09:33, Gleb Natapov wrote:
> >> 
> >>> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
>  On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> > That what I am warring about too. If we are adding device we have to be
> > sure such device can actually exist on real hw too otherwise we may have
> > problems later.
>  
>  I don't understand why the constraints of real h/w have anything to do
>  with this.  Can you explain?
>  
> >>> Each time we do something not architectural it cause us troubles later.
> >>> So constraints of real h/w is our constrains to.
> >>> 
> > Also 1 second on 100M file does not look like huge gain to me.
>  
>  Every second counts.  We're trying to get libguestfs boot times down
>  from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
>  program.
>  
> >>> So what about making initrd smaller? I remember managing two
> >>> distribution in 64M flash in embedded project.
> >> 
> >> Having a huge initrd basically helps in reusing a lot of existing code. We 
> >> do the same - in general the initrd is just a subset of the applications 
> >> of the host OS. And if you start putting perl or the likes into it, it 
> >> becomes big.
> >> 
> > Why not provide small disk/cdrom with all those utilities installed?
> 
> Because - if the loading is done fast - this way everything's in RAM 
> instantly. And you still have all devices available for use inside the system 
> - that makes enumeration a lot easier. There are several reasons why and I 
> don't think we should force different ways on people just because one 
> component of our system is ineffective.
> 
Loading huge initrd on real HW takes noticeably longer time that small
one, so I would say that it is your design that is to blame here, not
KVM.

> > 
> >> I guess the best thing for now really is to try and see which code paths 
> >> insb goes along. It should really be coalesced.
> >> 
> > It is coalesced to a certain extent (reenter guest every 1024 bytes,
> > read from userspace page at a time). You need to continue injecting
> > interrupt into a guest during long string operation and checking
> > exception condition on a page boundaries.
> 
> That still sounds slow. So yeah, adding DMA is probably the right way to go. 
> But then again - if we model it after real hw it would be asynchronous, 
> giving us an interrupt, causing even more headache. Ugh.
> 
> Can't we just ignore real hw constraints here and have it available in guest 
> ram once one particular PIO is done? No bus master, no interrupts, but full 
> speed and simplicity/atomicity which also helps migration.
> 
We shouldn't add devices that work not like real HW to speed up some
pathological cases (and are slow on real HW too).

--
Gleb.

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Alexander Graf

On 19.07.2010, at 09:51, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote:
>> 
>> On 19.07.2010, at 09:33, Gleb Natapov wrote:
>> 
>>> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
 On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> That what I am warring about too. If we are adding device we have to be
> sure such device can actually exist on real hw too otherwise we may have
> problems later.

 I don't understand why the constraints of real h/w have anything to do
 with this.  Can you explain?

>>> Each time we do something not architectural it cause us troubles later.
>>> So constraints of real h/w is our constrains to.
>>> 
> Also 1 second on 100M file does not look like huge gain to me.

 Every second counts.  We're trying to get libguestfs boot times down
 from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
 program.

>>> So what about making initrd smaller? I remember managing two
>>> distribution in 64M flash in embedded project.
>> 
>> Having a huge initrd basically helps in reusing a lot of existing code. We 
>> do the same - in general the initrd is just a subset of the applications of 
>> the host OS. And if you start putting perl or the likes into it, it becomes 
>> big.
>> 
> Why not provide small disk/cdrom with all those utilities installed?

Because - if the loading is done fast - this way everything's in RAM instantly. 
And you still have all devices available for use inside the system - that makes 
enumeration a lot easier. There are several reasons why and I don't think we 
should force different ways on people just because one component of our system 
is ineffective.

> 
>> I guess the best thing for now really is to try and see which code paths 
>> insb goes along. It should really be coalesced.
>> 
> It is coalesced to a certain extent (reenter guest every 1024 bytes,
> read from userspace page at a time). You need to continue injecting
> interrupt into a guest during long string operation and checking
> exception condition on a page boundaries.

That still sounds slow. So yeah, adding DMA is probably the right way to go. 
But then again - if we model it after real hw it would be asynchronous, giving 
us an interrupt, causing even more headache. Ugh.

Can't we just ignore real hw constraints here and have it available in guest 
ram once one particular PIO is done? No bus master, no interrupts, but full 
speed and simplicity/atomicity which also helps migration.

Alex

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Gleb Natapov

On Mon, Jul 19, 2010 at 08:44:16AM +0100, Richard W.M. Jones wrote:
> On Mon, Jul 19, 2010 at 10:33:12AM +0300, Gleb Natapov wrote:
> > On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
> > > On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> > > > That what I am warring about too. If we are adding device we have to be
> > > > sure such device can actually exist on real hw too otherwise we may have
> > > > problems later.
> > > 
> > > I don't understand why the constraints of real h/w have anything to do
> > > with this.  Can you explain?
> > > 
> > Each time we do something not architectural it cause us troubles later.
> 
> Can you explain more or point to some examples?  I really don't
> understand what these troubles could be.  But I'm prepared to be
> enlightened.
> 
There are many. Look at vmware backdoor interface for instance. Such
beast can't exist on real HW, so now we have to have hacks in emulator
since io operation can change cpu registers. And I am not saying that
what you are proposing can't exist on real HW. If such device can exist
we can do it that way too. The gain is too small though.

> > So what about making initrd smaller? I remember managing two
> > distribution in 64M flash in embedded project.
> 
> The distribution is the size that it is, because (a) it has to be
> based on Fedora and because (b) it has to include a certain number of
> programs.
Why not put then on cdrom or disk?

> 
> The reason for (a) is so that we don't need to compile our own tools
> and we can benefit from bug fixes from Fedora (and contribute bug
> fixes back).  The reason for (b) is that we want to implement a rich
> API[1], and having a rich API means we simply have to include many
> binaries.
> 
> We're already doing a lot of minimization on the image[2], deleting
> man pages, language files, etc., so the image mainly just contains
> binaries and libraries and kernel modules, which we cannot get rid of
> because of (b).  The original pre-minimization image is 600MB or so.
> 

--
Gleb.

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Gleb Natapov

On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote:
> 
> On 19.07.2010, at 09:33, Gleb Natapov wrote:
> 
> > On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
> >> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> >>> That what I am warring about too. If we are adding device we have to be
> >>> sure such device can actually exist on real hw too otherwise we may have
> >>> problems later.
> >> 
> >> I don't understand why the constraints of real h/w have anything to do
> >> with this.  Can you explain?
> >> 
> > Each time we do something not architectural it cause us troubles later.
> > So constraints of real h/w is our constrains to.
> > 
> >>> Also 1 second on 100M file does not look like huge gain to me.
> >> 
> >> Every second counts.  We're trying to get libguestfs boot times down
> >> from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
> >> program.
> >> 
> > So what about making initrd smaller? I remember managing two
> > distribution in 64M flash in embedded project.
> 
> Having a huge initrd basically helps in reusing a lot of existing code. We do 
> the same - in general the initrd is just a subset of the applications of the 
> host OS. And if you start putting perl or the likes into it, it becomes big.
> 
Why not provide small disk/cdrom with all those utilities installed?

> I guess the best thing for now really is to try and see which code paths insb 
> goes along. It should really be coalesced.
> 
It is coalesced to a certain extent (reenter guest every 1024 bytes,
read from userspace page at a time). You need to continue injecting
interrupt into a guest during long string operation and checking
exception condition on a page boundaries.

> Richard, what does kvm_stat tell you while loading the initrd? Are there a 
> lot of PIO requests or are we simply looping inside qemu code?
> 
> 
> Alex

--
Gleb.

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Richard W.M. Jones

On Mon, Jul 19, 2010 at 10:33:12AM +0300, Gleb Natapov wrote:
> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
> > On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> > > That what I am warring about too. If we are adding device we have to be
> > > sure such device can actually exist on real hw too otherwise we may have
> > > problems later.
> > 
> > I don't understand why the constraints of real h/w have anything to do
> > with this.  Can you explain?
> > 
> Each time we do something not architectural it cause us troubles later.

Can you explain more or point to some examples?  I really don't
understand what these troubles could be.  But I'm prepared to be
enlightened.

> So what about making initrd smaller? I remember managing two
> distribution in 64M flash in embedded project.

The distribution is the size that it is, because (a) it has to be
based on Fedora and because (b) it has to include a certain number of
programs.

The reason for (a) is so that we don't need to compile our own tools
and we can benefit from bug fixes from Fedora (and contribute bug
fixes back).  The reason for (b) is that we want to implement a rich
API[1], and having a rich API means we simply have to include many
binaries.

We're already doing a lot of minimization on the image[2], deleting
man pages, language files, etc., so the image mainly just contains
binaries and libraries and kernel modules, which we cannot get rid of
because of (b).  The original pre-minimization image is 600MB or so.

Rich.

[1] http://libguestfs.org/guestfs.3.html
[2] http://manpages.ubuntu.com/manpages/lucid/man8/febootstrap-minimize.8.html

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://et.redhat.com/~rjones/libguestfs/
See what it can do: http://et.redhat.com/~rjones/libguestfs/recipes.html

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Alexander Graf

On 19.07.2010, at 09:33, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
>> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
>>> That what I am warring about too. If we are adding device we have to be
>>> sure such device can actually exist on real hw too otherwise we may have
>>> problems later.
>> 
>> I don't understand why the constraints of real h/w have anything to do
>> with this.  Can you explain?
>> 
> Each time we do something not architectural it cause us troubles later.
> So constraints of real h/w is our constrains to.
> 
>>> Also 1 second on 100M file does not look like huge gain to me.
>> 
>> Every second counts.  We're trying to get libguestfs boot times down
>> from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
>> program.
>> 
> So what about making initrd smaller? I remember managing two
> distribution in 64M flash in embedded project.

Having a huge initrd basically helps in reusing a lot of existing code. We do 
the same - in general the initrd is just a subset of the applications of the 
host OS. And if you start putting perl or the likes into it, it becomes big.

I guess the best thing for now really is to try and see which code paths insb 
goes along. It should really be coalesced.

Richard, what does kvm_stat tell you while loading the initrd? Are there a lot 
of PIO requests or are we simply looping inside qemu code?

Alex

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Gleb Natapov

On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> > That what I am warring about too. If we are adding device we have to be
> > sure such device can actually exist on real hw too otherwise we may have
> > problems later.
> 
> I don't understand why the constraints of real h/w have anything to do
> with this.  Can you explain?
> 
Each time we do something not architectural it cause us troubles later.
So constraints of real h/w is our constrains to.

> > Also 1 second on 100M file does not look like huge gain to me.
> 
> Every second counts.  We're trying to get libguestfs boot times down
> from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
> program.
> 
So what about making initrd smaller? I remember managing two
distribution in 64M flash in embedded project.

--
Gleb.

Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device

2010-07-19 Thread Richard W.M. Jones

On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> That what I am warring about too. If we are adding device we have to be
> sure such device can actually exist on real hw too otherwise we may have
> problems later.

I don't understand why the constraints of real h/w have anything to do
with this.  Can you explain?

> Also 1 second on 100M file does not look like huge gain to me.

Every second counts.  We're trying to get libguestfs boot times down
from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
program.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v

Re: [Qemu-devel] [PATCH 2/2] fw_cfg: Add blit operation for copying kernel, initrd, ..

2010-07-19 Thread Richard W.M. Jones

On Mon, Jul 19, 2010 at 01:59:22AM +0200, Aurelien Jarno wrote:
> OpenBIOS also uses the same firmware interface, so it would need to be
> changed if this patch is accepted.

The patch leaves the old interface.  Does it still need to be changed?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://et.redhat.com/~rjones/virt-df/

97 matches

Mail list logo