[KVM-AUTOTEST][COMMIT] Delete kvm_runtest_old test
From: Eduardo Habkost ehabk...@redhat.com Signed-off-by: Eduardo Habkost ehabk...@redhat.com *** Diff is too large (405826 bytes) -- not showing *** -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM-AUTOTEST][COMMIT] Sync from site_host import SiteHost code with upstream autotest
From: Eduardo Habkost ehabk...@redhat.com Signed-off-by: Eduardo Habkost ehabk...@redhat.com diff --git a/server/hosts/__init__.py b/server/hosts/__init__.py index 21bca1c..f8b6803 100644 --- a/server/hosts/__init__.py +++ b/server/hosts/__init__.py @@ -12,7 +12,10 @@ You should 'import hosts' instead of importing every available host module. # host abstract classes from base_classes import Host from remote import RemoteHost -#from site_host import SiteHost +try: +from site_host import SiteHost +except ImportError, e: +pass # host implementation classes from ssh_host import SSHHost -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM-AUTOTEST][COMMIT] Merge autotest SVN revision 3185
From: Eduardo Habkost ehabk...@redhat.com This will pulls some new code that is needed for the changes that will be implemented on the kvm tests. Signed-off-by: Eduardo Habkost ehabk...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM-AUTOTEST][COMMIT] Moving from kvm_log to the upstream logging system.
From: Lucas Meneghel Rodrigues l...@redhat.com kvm_log can be replaced by the upstream logging system with advantages: * Uses the standard python module, that allows us to define multiple message destinations * It implements nearly all functionalities implemented on kvm_log This patch converts all kvm_log statements to logging statements. *** Diff is too large (57356 bytes) -- not showing *** -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM-AUTOTEST][COMMIT] Fixing docstrings and lines of code wider than 80 characters.
From: Lucas Meneghel Rodrigues mrodr...@redhat.com The autotest coding standards specify the preferred way of doing docstring documentation, as well as the maximum line width in 80 characters. This change converts all non compliances. *** Diff is too large (185612 bytes) -- not showing *** -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM-AUTOTEST][COMMIT] Sanitizing strings before passing them to the logging system.
From: Lucas Meneghel Rodrigues l...@redhat.com The logging system encodes messages using the utf-8 encoding by default. So sometimes it's necessary to re-encode lines when non unicode characters are thrown up by the command outputs. This patch re-encodes lines of text that will be passed to the logging system on the track_process function on kvm_utils. diff --git a/client/tests/kvm_runtest_2/kvm_utils.py b/client/tests/kvm_runtest_2/kvm_utils.py index bcc80a1..9ef5954 100644 --- a/client/tests/kvm_runtest_2/kvm_utils.py +++ b/client/tests/kvm_runtest_2/kvm_utils.py @@ -755,6 +755,9 @@ def track_process(sub, status_output=None, term_func=None, stdout_func=None, # Call stdout_func with the returned text if stdout_func: text = prefix + text.strip() +# We need to sanitize the text before passing it to the logging +# system +text = text.decode('utf-8', 'replace') stdout_func(text) -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM-AUTOTEST][COMMIT] Trimming trailing whitespaces from all files on the source directory
From: Lucas Meneghel Rodrigues l...@redhat.com The previous patches introduces trailing whitespaces to all files. This patch removes trailing whitespaces from all lines of the files present on the kvm_runtest_2 test. *** Diff is too large (161734 bytes) -- not showing *** -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM-AUTOTEST][COMMIT] Changing kvm_runtest_2 to kvm.
From: Lucas Meneghel Rodrigues l...@redhat.com *** Diff is too large (85588 bytes) -- not showing *** -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM-AUTOTEST][COMMIT] Fixing import style on make_html_report
From: Lucas Meneghel Rodrigues l...@redhat.com diff --git a/client/tests/kvm/make_html_report.py b/client/tests/kvm/make_html_report.py index a660cff..988b2f3 100755 --- a/client/tests/kvm/make_html_report.py +++ b/client/tests/kvm/make_html_report.py @@ -6,13 +6,8 @@ Script used to parse the test results and generate an HTML report. @copyright: Red Hat 2008-2009 +import os, sys, re, getopt, time, datetime, commands -import os, sys -import re -import getopt -import time -import datetime -import commands format_css= html,body { -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM-AUTOTEST][COMMIT] Final adjustments on the KVM test
From: Lucas Meneghel Rodrigues l...@redhat.com diff --git a/client/tests/kvm/control b/client/tests/kvm/control index d11f72f..f4b2113 100644 --- a/client/tests/kvm/control +++ b/client/tests/kvm/control @@ -143,7 +143,7 @@ for dict in list: dependencies_satisfied = False break if dependencies_satisfied: -current_status = job.run_test(kvm_runtest_2, params=dict, +current_status = job.run_test(kvm, params=dict, tag=dict.get(shortname)) else: current_status = False diff --git a/client/tests/kvm/kvm.py b/client/tests/kvm/kvm.py index 83ec9be..91ec89a 100644 --- a/client/tests/kvm/kvm.py +++ b/client/tests/kvm/kvm.py @@ -61,7 +61,7 @@ class kvm(test.test): keys = params.keys() keys.sort() for key in keys: -logging.debug(%s = %s % (key, params[key])) +logging.debug(%s = %s % (key, params[key])) self.write_test_keyval({key: params[key]}) # Open the environment file -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM-AUTOTEST][COMMIT] Add Dror Russo to author lists
From: Eduardo Habkost ehabk...@redhat.com So the code will match the code on the upstream autotest SVN repository. Signed-off-by: Eduardo Habkost ehabk...@redhat.com diff --git a/client/tests/kvm/control b/client/tests/kvm/control index f4b2113..eb38686 100644 --- a/client/tests/kvm/control +++ b/client/tests/kvm/control @@ -1,5 +1,6 @@ AUTHOR = u...@redhat.com (Uri Lublin) +dru...@redhat.com (Dror Russo) mgold...@redhat.com (Michael Goldish) dh...@redhat.com (David Huff) aerom...@redhat.com (Alexey Eromenko) diff --git a/client/tests/kvm/kvm.py b/client/tests/kvm/kvm.py index 91ec89a..1b9013c 100644 --- a/client/tests/kvm/kvm.py +++ b/client/tests/kvm/kvm.py @@ -17,6 +17,7 @@ class kvm(test.test): @copyright: Red Hat 2008-2009 @author: Uri Lublin (u...@redhat.com) +@author: Dror Russo (dru...@redhat.com) @author: Michael Goldish (mgold...@redhat.com) @author: David Huff (dh...@redhat.com) @author: Alexey Eromenko (aerom...@redhat.com) diff --git a/client/tests/kvm/make_html_report.py b/client/tests/kvm/make_html_report.py index 988b2f3..6aed39e 100755 --- a/client/tests/kvm/make_html_report.py +++ b/client/tests/kvm/make_html_report.py @@ -4,6 +4,7 @@ Script used to parse the test results and generate an HTML report. @copyright: (c)2005-2007 Matt Kruse (javascripttoolbox.com) @copyright: Red Hat 2008-2009 +...@author: Dror Russo (dru...@redhat.com) import os, sys, re, getopt, time, datetime, commands -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM-AUTOTEST][COMMIT] Merge branch 'master' of git://github.com/ehabkost/autotest into kvm-merge2
From: Eduardo Habkost ehabk...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][KVM_AUTOTEST] Added functionality to the preprocessor to run scripts
- Avi Kivity a...@redhat.com wrote: David Huff wrote: This patch will run pre and post scripts defined in config file with the parameter pre_command and post_command post_command. Also exports all the prameters in preprocess for passing arguments to the script. + #execute any pre_commands +pre_command = params.get(pre_command) +if pre_command: +# export environment vars +for k in params.keys(): +kvm_log.info(Adding KVM_TEST_%s to Environment % (k)) +os.putenv(KVM_TEST_%s % (k), str(params[k])) +# execute command +kvm_log.info(Executing command '%s'... % pre_command) +timeout = int(params.get(pre_commmand_timeout, 600)) +(status, pid, output) = kvm_utils.run_bg(cd %s; %s % (test.bindir, pre_command), + None, kvm_log.debug, (pre_command) , timeout=timeout) +if status != 0: +kvm_utils.safe_kill(pid, signal.SIGTERM) +raise error.TestError, Custom processing pre_command failed kvm_utils.run_bg should throw an exception instead of returning status. - What if we're interested in the status for some reason? Its value may indicate what went wrong with the child process. - If we throw an exception we should add a parameter that controls whether an exception should be thrown (something like ignore_status) because in some cases we don't want to throw an exception. - A run_bg call has 3 possible outcomes: success (status == 0), failure (status != 0) and timeout (process still running). If we throw an exception upon failure, what do we do upon timeout? Sometimes it's good (qemu should keep running unless something went wrong) and sometimes bad (pre_command should probably not time out). So we end up needing an ignore_timeout parameter as well. - What if we want the test to fail with an informative test-specific exception such as something failed after migration? But if status != 0, will there actually be a pid to kill? If the timeout expires and the process is still running, status is None. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-RFC PATCH 0/2] irqfd: use POLLHUP notification for close()
On Tue, Jun 02, 2009 at 01:41:05PM -0400, Gregory Haskins wrote: And having close not clean up the state unless you do an ioctl first is very messy IMO - I don't think you'll find any such examples in kernel. I agree, and that is why I am advocating this POLLHUP solution. It was only this other way to begin with because the technology didn't exist until Davide showed me the light. Problem with your request is that I already looked into what is essentially a bi-directional reference problem (for a different reason) when I started the POLLHUP series. Its messy to do this in a way that doesn't negatively impact the fast path (introducing locking, etc) or make my head explode making sure it doesn't race. Afaict, we would need to solve this problem to do what you are proposing (patches welcome). If this hybrid decoupled-deassign + unified-close is indeed an important feature set, I suggest that we still consider this POLLHUP series for inclusion, and then someone can re-introduce DEASSIGN support in the future as a CAP bit extension. That way we at least get the desirable close() properties that we both seem in favor of, and get this advanced use case when we need it (and can figure out the locking design). FWIW, I took a look and yes, it is non-trivial. I concur, we can always add the deassign ioctl later. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][KVM_AUTOTEST] Added functionality to the preprocessor to run scripts
Michael Goldish wrote: - Avi Kivity a...@redhat.com wrote: David Huff wrote: This patch will run pre and post scripts defined in config file with the parameter pre_command and post_command post_command. Also exports all the prameters in preprocess for passing arguments to the script. + #execute any pre_commands +pre_command = params.get(pre_command) +if pre_command: +# export environment vars +for k in params.keys(): +kvm_log.info(Adding KVM_TEST_%s to Environment % (k)) +os.putenv(KVM_TEST_%s % (k), str(params[k])) +# execute command +kvm_log.info(Executing command '%s'... % pre_command) +timeout = int(params.get(pre_commmand_timeout, 600)) +(status, pid, output) = kvm_utils.run_bg(cd %s; %s % (test.bindir, pre_command), + None, kvm_log.debug, (pre_command) , timeout=timeout) +if status != 0: +kvm_utils.safe_kill(pid, signal.SIGTERM) +raise error.TestError, Custom processing pre_command failed kvm_utils.run_bg should throw an exception instead of returning status. - What if we're interested in the status for some reason? Its value may indicate what went wrong with the child process. Put it in the exception string. - If we throw an exception we should add a parameter that controls whether an exception should be thrown (something like ignore_status) because in some cases we don't want to throw an exception. I'll complain every time I see it. If you don't care if the command succeeds or not, why run it in the first place? - A run_bg call has 3 possible outcomes: success (status == 0), failure (status != 0) and timeout (process still running). If we throw an exception upon failure, what do we do upon timeout? Sometimes it's good (qemu should keep running unless something went wrong) and sometimes bad (pre_command should probably not time out). So we end up needing an ignore_timeout parameter as well. A run_bg() call should return an object with a .join() method, or throw an exception. At some point you must call the join() method, which throws an exception or returns None. - What if we want the test to fail with an informative test-specific exception such as something failed after migration? Chained exceptions can provide detailed information. But if status != 0, will there actually be a pid to kill? If the timeout expires and the process is still running, status is None. functions which can return with three possible outcomes are difficult to use. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm-autotest: False PASS results
- Uri Lublin u...@redhat.com wrote: On 05/10/2009 08:15 PM, sudhir kumar wrote: Hi Uri, Any comments? -- Forwarded message -- From: sudhir kumarsmalik...@gmail.com The kvm-autotest shows the following PASS results for migration, while the VM was crashed and test should have failed. Here is the sequence of test commands and results grepped from kvm-autotest output. /root/sudhir/regression/test/kvm-autotest-phx/client/tests/kvm_runtest_2/qemu -name 'vm1' -monitor unix:/tmp/monitor-20090508-055624-QSuS,server,nowait -drive file=/root/sudhir/regression/test/kvm-autotest-phx/client/tests/kvm_runtest_2/images/rhel5-32.raw,if=ide,boot=on -net nic,vlan=0 -net user,vlan=0 -m 8192 -smp 4 -redir tcp:5000::22 -vnc :1 /root/sudhir/regression/test/kvm-autotest-phx/client/tests/kvm_runtest_2/qemu -name 'dst' -monitor unix:/tmp/monitor-20090508-055625-iamW,server,nowait -drive file=/root/sudhir/regression/test/kvm-autotest-phx/client/tests/kvm_runtest_2/images/rhel5-32.raw,if=ide,boot=on -net nic,vlan=0 -net user,vlan=0 -m 8192 -smp 4 -redir tcp:5001::22 -vnc :2 -incoming tcp:0:5200 2009-05-08 05:58:43,471 Configuring logger for client level GOOD kvm_runtest_2.raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.1 END GOOD kvm_runtest_2.raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.1 GOOD kvm_runtest_2.raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2 kvm_runtest_2.raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2 timestamp=1241762371 localtime=May 08 05:59:31 completed successfully Persistent state variable __group_level now set to 1 END GOOD kvm_runtest_2.raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2 kvm_runtest_2.raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2 timestamp=1241762371 localtime=May 08 05:59:31 From the test output it looks that the test was succesful to log into the guest after migration: 20090508-055926 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: Migration finished successfully 20090508-055926 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG: send_monitor_cmd: Sending monitor command: screendump /root/sudhir/regression/test/kvm-autotest-phx/client/results/default/kvm_runtest_2.raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2/debug/migration_post.ppm 20090508-055926 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG: send_monitor_cmd: Sending monitor command: screendump /root/sudhir/regression/test/kvm-autotest-phx/client/results/default/kvm_runtest_2.raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2/debug/migration_pre.ppm 20090508-055926 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG: send_monitor_cmd: Sending monitor command: quit 20090508-055926 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG: is_sshd_running: Timeout 20090508-055926 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: Logging into guest after migration... 20090508-055926 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG: remote_login: Trying to login... 20090508-055927 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG: remote_login: Got 'Are you sure...'; sending 'yes' 20090508-055927 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG: remote_login: Got password prompt; sending '123456' 20090508-055928 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG: remote_login: Got shell prompt -- logged in 20090508-055928 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: Logged in after migration 20090508-055928 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG: get_command_status_output: Sending command: help 20090508-055930 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG: postprocess_vm: Postprocessing VM 'vm1'... 20090508-055930 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG: postprocess_vm: VM object found in environment When I did vnc to the final migrated VM was crashed with a call trace as shown in the attachment. Quite less possible that the call trace appeared after the test finished as migration with memory more than 4GB is already broken [BUG 52527]. This looks a false PASS to me. Any idea how can we handle such falso positive results? Shall we wait for sometime after migration, log into the vm, do some work or run some good test, get output and report that the vm is alive? I don't think it's a False PASS. It seems the test was able to ssh into the guest, and run a command on the guest. I wouldn't call this a false PASS either. The migration_test_command parameter gives you some control over the test. You can set it to something more informative than the default 'help'. Note that it should probably be different for Linux and Windows guests. The migration test runs the command before and after migration, compares the outputs, and fails if they don't match. Is it possible that the VNC connection caused the crash? Currently we only run migration once (round-trip). I think we should run migration more than once (using iterations). If the guest crashes due
Re: [PATCH][KVM_AUTOTEST] Added functionality to the preprocessor to run scripts
- Avi Kivity a...@redhat.com wrote: Michael Goldish wrote: - Avi Kivity a...@redhat.com wrote: David Huff wrote: This patch will run pre and post scripts defined in config file with the parameter pre_command and post_command post_command. Also exports all the prameters in preprocess for passing arguments to the script. + #execute any pre_commands +pre_command = params.get(pre_command) +if pre_command: +# export environment vars +for k in params.keys(): +kvm_log.info(Adding KVM_TEST_%s to Environment % (k)) +os.putenv(KVM_TEST_%s % (k), str(params[k])) +# execute command +kvm_log.info(Executing command '%s'... % pre_command) +timeout = int(params.get(pre_commmand_timeout, 600)) +(status, pid, output) = kvm_utils.run_bg(cd %s; %s % (test.bindir, pre_command), + None, kvm_log.debug, (pre_command) , timeout=timeout) +if status != 0: +kvm_utils.safe_kill(pid, signal.SIGTERM) +raise error.TestError, Custom processing pre_command failed kvm_utils.run_bg should throw an exception instead of returning status. - What if we're interested in the status for some reason? Its value may indicate what went wrong with the child process. Put it in the exception string. But I want its value to be examined programmatically in the code. It's less comfortable to retrieve it from a string. - If we throw an exception we should add a parameter that controls whether an exception should be thrown (something like ignore_status) because in some cases we don't want to throw an exception. I'll complain every time I see it. If you don't care if the command succeeds or not, why run it in the first place? I care, but sometimes I don't want to fail the test when a command fails. I can put every run_bg() call in a try-except statement, but that misses the point of raising exceptions in the first place. - A run_bg call has 3 possible outcomes: success (status == 0), failure (status != 0) and timeout (process still running). If we throw an exception upon failure, what do we do upon timeout? Sometimes it's good (qemu should keep running unless something went wrong) and sometimes bad (pre_command should probably not time out). So we end up needing an ignore_timeout parameter as well. A run_bg() call should return an object with a .join() method, or throw an exception. At some point you must call the join() method, which throws an exception or returns None. OK, I have queued patches to make run_bg() return such an object anyway, except for the exceptions part. - What if we want the test to fail with an informative test-specific exception such as something failed after migration? Chained exceptions can provide detailed information. Wouldn't it complicate the test code? How can I provide a detailed message such as test command failed after migration -- can you illustrate this in a small example? But if status != 0, will there actually be a pid to kill? If the timeout expires and the process is still running, status is None. functions which can return with three possible outcomes are difficult to use. I tend to see it as two possible outcomes: completion or timeout. In the former case, status is returned. In the latter, None is returned. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][KVM_AUTOTEST] Added functionality to the preprocessor to run scripts
Michael Goldish wrote: - What if we're interested in the status for some reason? Its value may indicate what went wrong with the child process. Put it in the exception string. But I want its value to be examined programmatically in the code. It's less comfortable to retrieve it from a string. Then create a custom exception object and put it in a field. - If we throw an exception we should add a parameter that controls whether an exception should be thrown (something like ignore_status) because in some cases we don't want to throw an exception. I'll complain every time I see it. If you don't care if the command succeeds or not, why run it in the first place? I care, but sometimes I don't want to fail the test when a command fails. I can put every run_bg() call in a try-except statement, but that misses the point of raising exceptions in the first place. I think it's very rare to want to let the test continue even if some command fails. Can you give examples? exception such as something failed after migration? Chained exceptions can provide detailed information. Wouldn't it complicate the test code? How can I provide a detailed message such as test command failed after migration -- can you illustrate this in a small example? try: code code code except e: raise ChainedException('exception while running migration test', e) Instead of checking each code line, you provide a wrapper for the entire test. But if status != 0, will there actually be a pid to kill? If the timeout expires and the process is still running, status is None. functions which can return with three possible outcomes are difficult to use. I tend to see it as two possible outcomes: completion or timeout. In the former case, status is returned. In the latter, None is returned. The caller has to test for three possible outcomes. Success, failure, and 'command is running in the background'. If your callsites don't check for all three, something's wrong. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ANNOUNCE] kvm-kmod-2.6.30-rc8
This is a package containing the kvm external module, based on the 2.6.30 series. kvm-kmod-2.6.30-rc8 contains the kvm code that is present in Linux 2.6.30-rc8, except that it can run on older kernels. It is a good companion to the qemu-kvm-0.10 series. Note that performance and features will vary with the host kernel used. Changes from kvm-kmod-2.6.30-rc6: - update source to Linux 2.6.30-rc6 - fix futex failures on i386 PAE guest kernels - build fix for iommu-less host kernels - build fixes on pre-2.6.18 kernels (Jan Kiszka) - makefile typo fix (Jan Kiszka) http://www.linux-kvm.org -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] Add serial number support for virtio_blk, V4
On Mon, Jun 01 2009, Rusty Russell wrote: On Fri, 29 May 2009 01:45:27 pm john cooper wrote: virtio_blk-serial-4.patch Hate to ask dumb questions, but is there a scsi equivalent of this? It'd be nice if we could avoid being ATA-specific in the long run... SCSI has mode pages, where ATA pretty much stuffs everything into the identify data. Also, why u16? The identify page is word based, so u16 makes sense. Thanks, Rusty. +/* return ATA identify data + */ +static int virtblk_identify(struct gendisk *disk, void *argp) +{ + struct virtio_blk *vblk = disk-private_data; + u16 *id; + int err = -ENOMEM; + + id = kmalloc(VIRTIO_BLK_ID_BYTES, GFP_KERNEL); + if (!id) + goto out; + + err = virtio_config_buf(vblk-vdev, VIRTIO_BLK_F_IDENTIFY, + offsetof(struct virtio_blk_config, identify), id, + VIRTIO_BLK_ID_BYTES); -- Jens Axboe -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] always halt non-bsp cpu.
On Tue, Jun 02, 2009 at 03:37:47PM -0400, Glauber Costa wrote: This is not kvm specific, and should do fine in plain qemu Signed-off-by: Glauber Costa glom...@redhat.com --- hw/apic.c |3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/hw/apic.c b/hw/apic.c index 86aa6b6..2eddba0 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -467,8 +467,7 @@ static void apic_init_ipi(APICState *s) cpu_reset(s-cpu_env); -if (!(s-apicbase MSR_IA32_APICBASE_BSP) -(!kvm_enabled() || !qemu_kvm_irqchip_in_kernel())) +if (!(s-apicbase MSR_IA32_APICBASE_BSP)) s-cpu_env-halted = 1; if (kvm_enabled() !qemu_kvm_irqchip_in_kernel()) Once upon a time there was a good reason to add this: 358bdca5beb733c983bd5e7406c7893e19e9600b What have changed? -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: [PATCH] qemu: virtio save/load bindings
Anthony Liguori wrote: Michael S. Tsirkin wrote: Implement bindings for virtio save/load. Use them in virtio pci. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Is anyone working to fill in load/save bindings so that saving virtio devices works? Here's a trivial patch to do this (this one is on top of my MSI-X patchset). Comments? hw/virtio-pci.c | 49 - hw/virtio.c | 31 ++- hw/virtio.h |4 3 files changed, 66 insertions(+), 18 deletions(-) static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev, diff --git a/hw/virtio.c b/hw/virtio.c index 63ffcff..b773dff 100644 --- a/hw/virtio.c +++ b/hw/virtio.c @@ -568,9 +568,8 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f) { int i; -/* FIXME: load/save binding. */ -//pci_device_save(vdev-pci_dev, f); -//msix_save(vdev-pci_dev, f); qdev regressed save/restore? What else is broken right now from the qdev commit? I'm beginning to think committing in the state it was in was a mistake. Paul, can you put together a TODO so that we know all of the things that have regressed so we can get things back into shape? virtio is still broken. Anthony, are you planning to commit this patch? -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] always halt non-bsp cpu.
Glauber Costa wrote: On Wed, Jun 03, 2009 at 12:32:04AM +0200, Jan Kiszka wrote: Glauber Costa wrote: On Wed, Jun 03, 2009 at 12:01:00AM +0200, Jan Kiszka wrote: Glauber Costa wrote: On Tue, Jun 02, 2009 at 10:35:47PM +0200, Jan Kiszka wrote: Glauber Costa wrote: This is not kvm specific, and should do fine in plain qemu This is fine with plain qemu already. The problem, IIUC, is that in-kernel kvm irqchip does not have a chance to remove the halted state again. Did you test the effect of this patch on that scenario? What makes it safe to be removed now? IIRC, the in kernel irqchip sets halted = 0 in the very beginning of the vcpu initialization. It is tested here with in-kernel irqchip and works, so probably not a problem, unless you can spot something. At least your patch applied alone breaks -smp 1 here. But the whole management of env-halted for the in-kernel irqchip in qemu-kvm is a bit hacky IMHO. Maybe it's time to rethink this. Would be nice to always see a consistent halted in user space, specifically for debugging purposes. out of curiosity: did you apply the whole series? Meanwhile I did, but it makes no difference. Can you try putting the following patch before this one? If it helps you to understand the issue, I will do so later. But I *really* suggest to take this chance and develop in-kernel irqchip support that does not mess with halted, rather keeps it consistent (on register sync) and maybe adds exceptions from if (!env-halted) tests where required. IMHO, this is mandatory for an upstream merge! Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] always halt non-bsp cpu.
Glauber Costa wrote: On Wed, Jun 03, 2009 at 12:32:04AM +0200, Jan Kiszka wrote: Glauber Costa wrote: On Wed, Jun 03, 2009 at 12:01:00AM +0200, Jan Kiszka wrote: Glauber Costa wrote: On Tue, Jun 02, 2009 at 10:35:47PM +0200, Jan Kiszka wrote: Glauber Costa wrote: This is not kvm specific, and should do fine in plain qemu This is fine with plain qemu already. The problem, IIUC, is that in-kernel kvm irqchip does not have a chance to remove the halted state again. Did you test the effect of this patch on that scenario? What makes it safe to be removed now? IIRC, the in kernel irqchip sets halted = 0 in the very beginning of the vcpu initialization. It is tested here with in-kernel irqchip and works, so probably not a problem, unless you can spot something. At least your patch applied alone breaks -smp 1 here. But the whole management of env-halted for the in-kernel irqchip in qemu-kvm is a bit hacky IMHO. Maybe it's time to rethink this. Would be nice to always see a consistent halted in user space, specifically for debugging purposes. out of curiosity: did you apply the whole series? Meanwhile I did, but it makes no difference. Jan, can you be more specific on why it breaks? I'm double trying it here, and it works for me just fine. It locked up in the BIOS after reset from a booted SMP guest, somewhere before selecting the boot device. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] always halt non-bsp cpu.
On Wed, Jun 03, 2009 at 01:01:29PM +0200, Jan Kiszka wrote: Glauber Costa wrote: On Wed, Jun 03, 2009 at 12:32:04AM +0200, Jan Kiszka wrote: Glauber Costa wrote: On Wed, Jun 03, 2009 at 12:01:00AM +0200, Jan Kiszka wrote: Glauber Costa wrote: On Tue, Jun 02, 2009 at 10:35:47PM +0200, Jan Kiszka wrote: Glauber Costa wrote: This is not kvm specific, and should do fine in plain qemu This is fine with plain qemu already. The problem, IIUC, is that in-kernel kvm irqchip does not have a chance to remove the halted state again. Did you test the effect of this patch on that scenario? What makes it safe to be removed now? IIRC, the in kernel irqchip sets halted = 0 in the very beginning of the vcpu initialization. It is tested here with in-kernel irqchip and works, so probably not a problem, unless you can spot something. At least your patch applied alone breaks -smp 1 here. But the whole management of env-halted for the in-kernel irqchip in qemu-kvm is a bit hacky IMHO. Maybe it's time to rethink this. Would be nice to always see a consistent halted in user space, specifically for debugging purposes. out of curiosity: did you apply the whole series? Meanwhile I did, but it makes no difference. Can you try putting the following patch before this one? If it helps you to understand the issue, I will do so later. But I *really* suggest to take this chance and develop in-kernel irqchip support that does not mess with halted, rather keeps it consistent (on register sync) and maybe adds exceptions from if (!env-halted) tests where required. IMHO, this is mandatory for an upstream merge! The difference between kernel/userspace irq chip is that in former case halted cpu sleeps in the kernel and in later case in the usespace (well also in the kernel but not in kvm code). We currently abuse halted to do different sleeps. Cpu loop should be reworked. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2800447 ] kvm fails to build against out of tree kernel builds+patch
Bugs item #2800447, was opened at 2009-06-03 11:17 Message generated for change (Tracker Item Submitted) made by tuxjay You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2800447group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: James (tuxjay) Assigned to: Nobody/Anonymous (nobody) Summary: kvm fails to build against out of tree kernel builds+patch Initial Comment: Hi, The first problem is the -- You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2800447group_id=180599 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2800447 ] kvm fails to build against out of tree kernel builds+patch
Bugs item #2800447, was opened at 2009-06-03 11:17 Message generated for change (Comment added) made by tuxjay You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2800447group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: James (tuxjay) Assigned to: Nobody/Anonymous (nobody) Summary: kvm fails to build against out of tree kernel builds+patch Initial Comment: Hi, The first problem is the -- Comment By: James (tuxjay) Date: 2009-06-03 11:19 Message: Hi, The first problem is that kvm fails to detect the kernel source directory correctly.. In configure the line kernelsourcedir=${kerneldir%/*}/source Only works if ../source is the build parent (e.g. if the source dir is /lib/modules/version/source) The first patch fixes that. The next is a build problem: -Iarch/${ARCH_DIR}/include should be: -I$(KERNELSOURCEDIR)/${ARCH_DIR}/include In the case of out of place builds. The second patch fixes that. Cheers, James -- You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2800447group_id=180599 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM_AUTOTEST] unattended installs
On Tue, 2009-06-02 at 20:25 -0500, Charles Duffy wrote: David Huff wrote: I had some issues getting the guest to read a vvfat floppy passed to the guest via qemu option -fda fat:floppy:rw:./floppy it would pass the dir as a vfat disk and the guest could mount it, however guest could not read the files on the vdisk.. I'm surprised you opted to work around this with a loop mount rather than using mtools, given the former approach's need for root. Autotest runs the tests as root, so we will have root privileges anyway... Granted, getting files off the ISO _could_ also require root -- in my own analog to these scripts, I use iso-read (from the libcdio package): # Pull kernel and initrd iso-read --image $OS_ISO \ --extract /isolinux/vmlinuz \ --output-file $TMPDIR/vmlinuz iso-read --image $OS_ISO \ --extract /isolinux/initrd.img \ --output-file $TMPDIR/initrd.img # Write floppy image mformat -C -f 1440 -i $TMPDIR/floppy.img :: mcopy $TMPDIR/floppy.d/* -i $TMPDIR/floppy.img ::/ -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Lucas Meneghel Rodrigues Software Engineer (QE) Red Hat - Emerging Technologies -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-RFC PATCH 0/2] irqfd: use POLLHUP notification for close()
Michael S. Tsirkin wrote: On Tue, Jun 02, 2009 at 01:41:05PM -0400, Gregory Haskins wrote: And having close not clean up the state unless you do an ioctl first is very messy IMO - I don't think you'll find any such examples in kernel. I agree, and that is why I am advocating this POLLHUP solution. It was only this other way to begin with because the technology didn't exist until Davide showed me the light. Problem with your request is that I already looked into what is essentially a bi-directional reference problem (for a different reason) when I started the POLLHUP series. Its messy to do this in a way that doesn't negatively impact the fast path (introducing locking, etc) or make my head explode making sure it doesn't race. Afaict, we would need to solve this problem to do what you are proposing (patches welcome). If this hybrid decoupled-deassign + unified-close is indeed an important feature set, I suggest that we still consider this POLLHUP series for inclusion, and then someone can re-introduce DEASSIGN support in the future as a CAP bit extension. That way we at least get the desirable close() properties that we both seem in favor of, and get this advanced use case when we need it (and can figure out the locking design). FWIW, I took a look and yes, it is non-trivial. I concur, we can always add the deassign ioctl later. Sounds good, Michael. Thanks! -Greg signature.asc Description: OpenPGP digital signature
[ kvm-Bugs-2800447 ] kvm fails to build against out of tree kernel builds+patch
Bugs item #2800447, was opened at 2009-06-03 13:17 Message generated for change (Comment added) made by kiszka You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2800447group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: James (tuxjay) Assigned to: Nobody/Anonymous (nobody) Summary: kvm fails to build against out of tree kernel builds+patch Initial Comment: Hi, The first problem is the -- Comment By: Jan Kiszka (kiszka) Date: 2009-06-03 13:41 Message: Please post your patches with individual changelogs and signed-off to k...@vger.kernel.org. TiA. -- Comment By: James (tuxjay) Date: 2009-06-03 13:19 Message: Hi, The first problem is that kvm fails to detect the kernel source directory correctly.. In configure the line kernelsourcedir=${kerneldir%/*}/source Only works if ../source is the build parent (e.g. if the source dir is /lib/modules/version/source) The first patch fixes that. The next is a build problem: -Iarch/${ARCH_DIR}/include should be: -I$(KERNELSOURCEDIR)/${ARCH_DIR}/include In the case of out of place builds. The second patch fixes that. Cheers, James -- You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2800447group_id=180599 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] revert part of 3db8b916e merge
kvm_*_mpstate() cannot be called from kvm_arch_*_registers() since kvm_arch_*_registers() sometimes called from io thread, but kvm_*_mpstate() can be called only by cpu thread. Signed-off-by: Gleb Natapov g...@redhat.com diff --git a/qemu-kvm.c b/qemu-kvm.c index 68d3b92..7ed1e06 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -1431,26 +1431,3 @@ void qemu_kvm_cpu_stop(CPUState *env) if (kvm_enabled()) env-kvm_cpu_state.stopped = 1; } - -void kvm_arch_get_registers(CPUState *env) -{ -kvm_save_registers(env); -kvm_save_mpstate(env); -} - -void kvm_arch_put_registers(CPUState *env) -{ -kvm_load_registers(env); -kvm_load_mpstate(env); -} - - -void cpu_synchronize_state(CPUState *env, int modified) -{ -if (kvm_enabled()) { -if (modified) -kvm_arch_put_registers(env); -else -kvm_arch_get_registers(env); -} -} diff --git a/qemu-kvm.h b/qemu-kvm.h index 725589b..5d47e88 100644 --- a/qemu-kvm.h +++ b/qemu-kvm.h @@ -216,10 +216,25 @@ int kvm_log_stop(target_phys_addr_t phys_addr, target_phys_addr_t len); static inline int kvm_sync_vcpus(void) { return 0; } -void kvm_arch_get_registers(CPUState *env); -void kvm_arch_put_registers(CPUState *env); +static inline void kvm_arch_get_registers(CPUState *env) +{ +kvm_save_registers(env); +} -void cpu_synchronize_state(CPUState *env, int modified); +static inline void kvm_arch_put_registers(CPUState *env) +{ +kvm_load_registers(env); +} + +static inline void cpu_synchronize_state(CPUState *env, int modified) +{ +if (kvm_enabled()) { +if (modified) +kvm_arch_put_registers(env); +else +kvm_arch_get_registers(env); +} +} uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function, int reg); diff --git a/target-i386/machine.c b/target-i386/machine.c index f280d3d..07df1e1 100644 --- a/target-i386/machine.c +++ b/target-i386/machine.c @@ -32,7 +32,10 @@ void cpu_save(QEMUFile *f, void *opaque) int32_t pending_irq; int i, bit; -cpu_synchronize_state(env, 0); +if (kvm_enabled()) { +kvm_save_registers(env); +kvm_save_mpstate(env); +} for(i = 0; i CPU_NB_REGS; i++) qemu_put_betls(f, env-regs[i]); -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] revert part of 3db8b916e merge
Gleb Natapov wrote: kvm_*_mpstate() cannot be called from kvm_arch_*_registers() since kvm_arch_*_registers() sometimes called from io thread, but kvm_*_mpstate() can be called only by cpu thread. I really dislike vcpu functions to be called from outside the vcpu thread. Who are the callers? Signed-off-by: Gleb Natapov g...@redhat.com diff --git a/qemu-kvm.c b/qemu-kvm.c index 68d3b92..7ed1e06 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -1431,26 +1431,3 @@ void qemu_kvm_cpu_stop(CPUState *env) if (kvm_enabled()) env-kvm_cpu_state.stopped = 1; } - -void kvm_arch_get_registers(CPUState *env) -{ -kvm_save_registers(env); -kvm_save_mpstate(env); -} - -void kvm_arch_put_registers(CPUState *env) -{ -kvm_load_registers(env); -kvm_load_mpstate(env); -} - - -void cpu_synchronize_state(CPUState *env, int modified) -{ -if (kvm_enabled()) { -if (modified) -kvm_arch_put_registers(env); -else -kvm_arch_get_registers(env); -} -} diff --git a/qemu-kvm.h b/qemu-kvm.h index 725589b..5d47e88 100644 --- a/qemu-kvm.h +++ b/qemu-kvm.h @@ -216,10 +216,25 @@ int kvm_log_stop(target_phys_addr_t phys_addr, target_phys_addr_t len); static inline int kvm_sync_vcpus(void) { return 0; } -void kvm_arch_get_registers(CPUState *env); -void kvm_arch_put_registers(CPUState *env); +static inline void kvm_arch_get_registers(CPUState *env) +{ +kvm_save_registers(env); +} -void cpu_synchronize_state(CPUState *env, int modified); +static inline void kvm_arch_put_registers(CPUState *env) +{ +kvm_load_registers(env); +} + +static inline void cpu_synchronize_state(CPUState *env, int modified) +{ +if (kvm_enabled()) { +if (modified) +kvm_arch_put_registers(env); +else +kvm_arch_get_registers(env); +} +} uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function, int reg); diff --git a/target-i386/machine.c b/target-i386/machine.c index f280d3d..07df1e1 100644 --- a/target-i386/machine.c +++ b/target-i386/machine.c @@ -32,7 +32,10 @@ void cpu_save(QEMUFile *f, void *opaque) int32_t pending_irq; int i, bit; -cpu_synchronize_state(env, 0); +if (kvm_enabled()) { +kvm_save_registers(env); +kvm_save_mpstate(env); +} for(i = 0; i CPU_NB_REGS; i++) qemu_put_betls(f, env-regs[i]); -- Gleb. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] revert part of 3db8b916e merge
On Wed, Jun 03, 2009 at 03:16:46PM +0300, Avi Kivity wrote: Gleb Natapov wrote: kvm_*_mpstate() cannot be called from kvm_arch_*_registers() since kvm_arch_*_registers() sometimes called from io thread, but kvm_*_mpstate() can be called only by cpu thread. I really dislike vcpu functions to be called from outside the vcpu thread. Who are the callers? monitor.c. May be others (git grep cpu_synchronize_state). But kvm_save_registers() does the right thing and calls another function on vcpu thread. Signed-off-by: Gleb Natapov g...@redhat.com diff --git a/qemu-kvm.c b/qemu-kvm.c index 68d3b92..7ed1e06 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -1431,26 +1431,3 @@ void qemu_kvm_cpu_stop(CPUState *env) if (kvm_enabled()) env-kvm_cpu_state.stopped = 1; } - -void kvm_arch_get_registers(CPUState *env) -{ -kvm_save_registers(env); -kvm_save_mpstate(env); -} - -void kvm_arch_put_registers(CPUState *env) -{ -kvm_load_registers(env); -kvm_load_mpstate(env); -} - - -void cpu_synchronize_state(CPUState *env, int modified) -{ -if (kvm_enabled()) { -if (modified) -kvm_arch_put_registers(env); -else -kvm_arch_get_registers(env); -} -} diff --git a/qemu-kvm.h b/qemu-kvm.h index 725589b..5d47e88 100644 --- a/qemu-kvm.h +++ b/qemu-kvm.h @@ -216,10 +216,25 @@ int kvm_log_stop(target_phys_addr_t phys_addr, target_phys_addr_t len); static inline int kvm_sync_vcpus(void) { return 0; } -void kvm_arch_get_registers(CPUState *env); -void kvm_arch_put_registers(CPUState *env); +static inline void kvm_arch_get_registers(CPUState *env) +{ +kvm_save_registers(env); +} -void cpu_synchronize_state(CPUState *env, int modified); +static inline void kvm_arch_put_registers(CPUState *env) +{ +kvm_load_registers(env); +} + +static inline void cpu_synchronize_state(CPUState *env, int modified) +{ +if (kvm_enabled()) { +if (modified) +kvm_arch_put_registers(env); +else +kvm_arch_get_registers(env); +} +} uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function, int reg); diff --git a/target-i386/machine.c b/target-i386/machine.c index f280d3d..07df1e1 100644 --- a/target-i386/machine.c +++ b/target-i386/machine.c @@ -32,7 +32,10 @@ void cpu_save(QEMUFile *f, void *opaque) int32_t pending_irq; int i, bit; -cpu_synchronize_state(env, 0); +if (kvm_enabled()) { +kvm_save_registers(env); +kvm_save_mpstate(env); +} for(i = 0; i CPU_NB_REGS; i++) qemu_put_betls(f, env-regs[i]); -- Gleb. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2800447 ] kvm fails to build against out of tree kernel builds+patch
Bugs item #2800447, was opened at 2009-06-03 13:17 Message generated for change (Comment added) made by kiszka You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2800447group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: James (tuxjay) Assigned to: Nobody/Anonymous (nobody) Summary: kvm fails to build against out of tree kernel builds+patch Initial Comment: Hi, The first problem is the -- Comment By: Jan Kiszka (kiszka) Date: 2009-06-03 14:24 Message: That's trivial, look at other patches posted to the list: add a plain Signed-off-by: YOUR NAME your-em...@address. -- Comment By: James (tuxjay) Date: 2009-06-03 14:19 Message: I've prepared the patches and changelog entries, how do I get it signed off? I've been searching the website for the patch procedure and couldn't find it, sorry. Thank you -- Comment By: Jan Kiszka (kiszka) Date: 2009-06-03 13:41 Message: Please post your patches with individual changelogs and signed-off to k...@vger.kernel.org. TiA. -- Comment By: James (tuxjay) Date: 2009-06-03 13:19 Message: Hi, The first problem is that kvm fails to detect the kernel source directory correctly.. In configure the line kernelsourcedir=${kerneldir%/*}/source Only works if ../source is the build parent (e.g. if the source dir is /lib/modules/version/source) The first patch fixes that. The next is a build problem: -Iarch/${ARCH_DIR}/include should be: -I$(KERNELSOURCEDIR)/${ARCH_DIR}/include In the case of out of place builds. The second patch fixes that. Cheers, James -- You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2800447group_id=180599 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] fix detection of kernel source directory
Fix detection of kernel source directory when it is different to build directory. Signed-off-by: James Pike ja...@chilon.net diff --git a/configure b/configure index e01ba98..c3b7a5f 100755 --- a/configure +++ b/configure @@ -91,7 +91,7 @@ arch=${arch%%-*} # see if we have split build and source directories if [ -d $kerneldir/include2 ]; then -kernelsourcedir=${kerneldir%/*}/source +kernelsourcedir=`readlink -f $kerneldir/source` fi if [ -n $no_uname -a $want_module ]; then
[PATCH] fix include paths when the kernel source and build directory are different
Use correct architecture includes when kernel source and build directory are different. Signed-off-by: James Pike ja...@chilon.net diff --git a/Makefile b/Makefile index 95e4c81..ad08c45 100644 --- a/Makefile +++ b/Makefile @@ -27,8 +27,9 @@ all:: prerequisite # include header priority 1) $LINUX 2) $KERNELDIR 3) include-compat $(MAKE) -C $(KERNELDIR) M=`pwd` \ LINUXINCLUDE=-I`pwd`/include -Iinclude \ - $(if $(KERNELSOURCEDIR),-Iinclude2 -I$(KERNELSOURCEDIR)/include) \ - -Iarch/${ARCH_DIR}/include -I`pwd`/include-compat \ + $(if $(KERNELSOURCEDIR),\ + -Iinclude2 -I$(KERNELSOURCEDIR)/include -I$(KERNELSOURCEDIR)/arch/${ARCH_DIR}/include, \ + -Iarch/${ARCH_DIR}/include) -I`pwd`/include-compat \ -include include/linux/autoconf.h \ -include `pwd`/$(ARCH_DIR)/external-module-compat.h $(module_defines) \ $$@
Re: Windows Server 2008 VM performance
Avi Kivity wrote: Andrew Theurer wrote: Is there a virtio_block driver to test? There is, but it isn't available yet. OK. Can I assume a better virtio_net driver is in the works as well? Can we find the root cause of the exits (is there a way to get stack dump or something that can show where there are coming from)? Marcelo is working on a super-duper easy to use kvm trace which can show what's going on. The old one is reasonably easy though it exports less data. If you can generate some traces, I'll have a look at them. Thanks Avi. I'll try out kvm-86 and see if I can generate some kvm trace data. -Andrew -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-RFC PATCH 2/2] kvm: use POLLHUP to close an irqfd instead of an explicit ioctl
On Tue, Jun 02, 2009 at 09:53:29PM -0400, Gregory Haskins wrote: Paul E. McKenney wrote: On Tue, Jun 02, 2009 at 02:23:14PM -0400, Gregory Haskins wrote: Paul E. McKenney wrote: On Tue, Jun 02, 2009 at 11:15:38AM -0400, Gregory Haskins wrote: Assigning an irqfd object to a kvm object creates a relationship that we currently manage by having the kvm oject acquire/hold a file* reference to the underlying eventfd. The lifetime of these objects is properly maintained by decoupling the two objects whenever the irqfd is closed or kvm is closed, whichever comes first. However, the irqfd close method is less than ideal since it requires two system calls to complete (one for ioctl(kvmfd, IRQFD_DEASSIGN), the other for close(eventfd)). This dual-call approach was utilized because there was no notification mechanism on the eventfd side at the time irqfd was implemented. Recently, Davide proposed a patch to send a POLLHUP wakeup whenever an eventfd is about to close. So we eliminate the IRQFD_DEASSIGN ioctl (*) vector in favor of sensing the desassign automatically when the fd is closed. The resulting code is slightly more complex as a result since we need to allow either side to sever the relationship independently. We utilize SRCU to guarantee stable concurrent access to the KVM pointer without adding additional atomic operations in the fast path. At minimum, this design should be acked by both Davide and Paul (cc'd). (*) The irqfd patch does not exist in any released tree, so the understanding is that we can alter the irqfd specific ABI without taking the normal precautions, such as CAP bits. A few questions and suggestions interspersed below. Thanx, Paul Thanks for the review, Paul. Some questions, clarifications, and mea culpas below. Thanx, Paul (FYI: This isn't quite what I was asking you about on IRC yesterday, but it's related...and the SRCU portion of the conversation *did* inspire me here. Just note that the stuff I was asking about is still forthcoming) ;-) Signed-off-by: Gregory Haskins ghask...@novell.com CC: Davide Libenzi davi...@xmailserver.org CC: Michael S. Tsirkin m...@redhat.com CC: Paul E. McKenney paul...@linux.vnet.ibm.com --- include/linux/kvm.h |2 - virt/kvm/eventfd.c | 177 +++ virt/kvm/kvm_main.c |3 + 3 files changed, 81 insertions(+), 101 deletions(-) diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 632a856..29b62cc 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -482,8 +482,6 @@ struct kvm_x86_mce { }; #endif -#define KVM_IRQFD_FLAG_DEASSIGN (1 0) - struct kvm_irqfd { __u32 fd; __u32 gsi; diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index f3f2ea1..6ed62e2 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -37,26 +37,63 @@ * */ struct _irqfd { +struct mutex lock; +struct srcu_structsrcu; struct kvm *kvm; int gsi; -struct file *file; struct list_head list; poll_tablept; wait_queue_head_t*wqh; wait_queue_t wait; -struct work_structwork; +struct work_structinject; }; static void irqfd_inject(struct work_struct *work) { -struct _irqfd *irqfd = container_of(work, struct _irqfd, work); -struct kvm *kvm = irqfd-kvm; +struct _irqfd *irqfd = container_of(work, struct _irqfd, inject); +struct kvm *kvm; +int idx; + +idx = srcu_read_lock(irqfd-srcu); + +kvm = rcu_dereference(irqfd-kvm); [4] +if (kvm) { +mutex_lock(kvm-lock); +kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 1); +kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 0); +mutex_unlock(kvm-lock); +} + +srcu_read_unlock(irqfd-srcu, idx); +} [1] + +static void +irqfd_disconnect(struct _irqfd *irqfd) +{ +struct kvm *kvm; + +mutex_lock(irqfd-lock); + +kvm = rcu_dereference(irqfd-kvm); +rcu_assign_pointer(irqfd-kvm, NULL); + +mutex_unlock(irqfd-lock); [2] + +if (!kvm) +return; mutex_lock(kvm-lock); -kvm_set_irq(kvm,
Re: [PATCH 2/2] Add serial number support for virtio_blk, V4
Jens Axboe wrote: On Mon, Jun 01 2009, Rusty Russell wrote: On Fri, 29 May 2009 01:45:27 pm john cooper wrote: virtio_blk-serial-4.patch Hate to ask dumb questions, but is there a scsi equivalent of this? It'd be nice if we could avoid being ATA-specific in the long run... SCSI has mode pages, where ATA pretty much stuffs everything into the identify data. Also, why u16? The identify page is word based, so u16 makes sense. It is actually an artifact left over from the previous version. In that case the driver was only pulling the serial number over, itself constructing a mocked-up identify page, and needed to be aware of the internal structure. Currently qemu fabricates the identify page which the driver treats as opaque data passed onto the user. So the u16 * structure has no meaning. Sorry to be late in responding here. A patch to address this wrinkle and other feedback to follow shortly. -john +/* return ATA identify data + */ +static int virtblk_identify(struct gendisk *disk, void *argp) +{ + struct virtio_blk *vblk = disk-private_data; + u16 *id; + int err = -ENOMEM; + + id = kmalloc(VIRTIO_BLK_ID_BYTES, GFP_KERNEL); + if (!id) + goto out; + + err = virtio_config_buf(vblk-vdev, VIRTIO_BLK_F_IDENTIFY, + offsetof(struct virtio_blk_config, identify), id, + VIRTIO_BLK_ID_BYTES); -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-RFC PATCH 2/2] kvm: use POLLHUP to close an irqfd instead of an explicit ioctl
Thanks again for the review, Paul. IIUC, you think the design is ok as it is. Davide, In light of this, would you like to submit patch 1/2 formally with your SOB at your earliest convenience? Or would you prefer that I submit it and you can simply ack it? Either is fine with me. -Greg signature.asc Description: OpenPGP digital signature
Re: [KVM-RFC PATCH 2/2] kvm: use POLLHUP to close an irqfd instead of an explicit ioctl
On Wed, 3 Jun 2009, Gregory Haskins wrote: Thanks again for the review, Paul. IIUC, you think the design is ok as it is. Davide, In light of this, would you like to submit patch 1/2 formally with your SOB at your earliest convenience? Or would you prefer that I submit it and you can simply ack it? Either is fine with me. Will do later today... - Davide -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Windows Server 2008 VM performance
On Wed, Jun 03, 2009 at 09:21:16AM -0500, Andrew Theurer wrote: Avi Kivity wrote: Andrew Theurer wrote: Is there a virtio_block driver to test? There is, but it isn't available yet. OK. Can I assume a better virtio_net driver is in the works as well? Can we find the root cause of the exits (is there a way to get stack dump or something that can show where there are coming from)? Marcelo is working on a super-duper easy to use kvm trace which can show what's going on. The old one is reasonably easy though it exports less data. If you can generate some traces, I'll have a look at them. Thanks Avi. I'll try out kvm-86 and see if I can generate some kvm trace data. clone git://git.kernel.org/pub/scm/linux/kernel/git/marcelo/linux-2.6-x86-kvmtrace.git, checkout (remote) branch kvmtrace, _build with KVM=y,KVM_{INTEL,AMD}=y_ (there's a missing export symbol, will pull the upstream fix ASAP). Then once running the benchmark with this kernel, on a phase thats indicative of the workload, do: echo kvm_exit /debugfs/tracing/set_event sleep n cat /debugfs/tracing/trace /tmp/trace-save.txt Make n relatively small, 1 or 2 seconds. Would be nice to see UP vs SMP Win2008 guests. echo /debugfs/tracing/set_event to stop tracing echo /debugfs/tracing/trace to zero the trace buffer With some post processing you can the get the exit reason percentages. Alternatively you can use systemtap (see attached script, need some adjustment in the entry/exit line number probes) so it calculates the exit percentages for you. global exit_names[70] global latency[70] global vmexit global vmentry global vmx_exit_reason // run stap -t kvm.stp to find out the overhead of which probe // and change it here. should be automatic! global exit_probe_overhead = 645 global entry_probe_overhead = 882 global handlexit_probe_overhead = 405 global overhead probe begin (1) { overhead = 882 + 645 + 405 exit_names[0] = EXCEPTION exit_names[1] = EXTERNAL INT exit_names[7] = PENDING INTERRUPT exit_names[10] = CPUID exit_names[18] = HYPERCALL exit_names[28] = CR ACCESS exit_names[29] = DR ACCESS exit_names[30] = IO INSTRUCTION (LIGHT) exit_names[31] = MSR READ exit_names[32] = MSR WRITE exit_names[44] = APIC ACCESS exit_names[60] = IO INSTRUCTION (HEAVY) } probe module(kvm_intel).statement(kvm_handle_e...@arch/x86/kvm/vmx.c:3011) { vmx_exit_reason = $exit_reason } // exit probe module(kvm_intel).statement(vmx_vcpu_...@arch/x86/kvm/vmx.c:3226) { vmexit = get_cycles() } // entry probe module(kvm_intel).function(vmx_vcpu_run) { vmentry = get_cycles() if (vmx_exit_reason != 12) latency[vmx_exit_reason] vmentry - vmexit - overhead } //heavy-exit probe module(kvm).function(kvm_arch_vcpu_ioctl_run) { vmx_exit_reason = 60 } probe end { foreach (x in latency-) { printf(%d: %s\n, x, exit_names[x]) printf (avg %d = sum %d / count %d\n, @avg(latency[x]), @sum(latency[x]), @count(latency[x])) printf (min %d max %d\n, @min(latency[x]), @max(latency[x])) print(@hist_linear(latency[x], 1000, 5000, 1000)) } print(\n) }
Re: [git pull request] kvm-autotest: sync with upstream Autotest SVN
On 06/01/2009 11:41 PM, Eduardo Habkost wrote: Now that kvm-autotest was included on upstream Autotest SVN repository[1], the following pull request is a proposal to sync kvm-autotest.git Autotest SVN trunk, while keeping the kvm-autotest.git history. I have set up a git-svn mirror of the Autotest SVN repository[2], and the upstream Autotest changes included on this pull request come from this mirror. The changes available on my branch do the following, to try to make the resulting history bisectable: 1. Apply some trivial changes to avoid conflicts when merging with Autotest SVN 2. Merge with Autotest SVN revision 3185 (revision immediately before kvm-autotest was included) 3. Apply a patch series from Lucas, that converts the kvm-autotest code to the code that went to Autotest SVN, including a rename of tests/kvm_runtest_2 to tests/kvm. 4. Merge with Autotest SVN trunk (revision 3189) The result is a tree that is almost the same that is on upstream Autotest SVN, except for a few differences that need to be either removed from kvm-autotest, or sent upstream. The remaining differences are the following: $ git diff origin/master github/to-kvm-autotest-merge1 | diffstat KVM_REGRESSION_README |4 ++ bin/base_sysinfo.py |2 - bin/simple_ssh.py | 70 tests/dbench/control.60 | 20 + 4 files changed, 95 insertions(+), 1 deletion(-) ('origin' is Autotest SVN, 'github' is the tree where the merge results are available) References: [1] http://autotest.kernel.org/changeset/3187 [2] http://github.com/ehabkost/autotest/tree/master The mirror is synchronized every 30 minutes. The following changes since commit 2a0a76645f72dcbd495ea37b566d017796eca24e: Uri Lublin (1): kvm_tests.cfg: adding some missing iso image md5sum values are available in the git repository at: git://github.com/ehabkost/autotest.git to-kvm-autotest-merge1 Pulled, Thanks. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][KVM_AUTOTEST] Added functionality to the preprocessor to run scripts
- Avi Kivity a...@redhat.com wrote: Michael Goldish wrote: - What if we're interested in the status for some reason? Its value may indicate what went wrong with the child process. Put it in the exception string. But I want its value to be examined programmatically in the code. It's less comfortable to retrieve it from a string. Then create a custom exception object and put it in a field. - If we throw an exception we should add a parameter that controls whether an exception should be thrown (something like ignore_status) because in some cases we don't want to throw an exception. I'll complain every time I see it. If you don't care if the command succeeds or not, why run it in the first place? I care, but sometimes I don't want to fail the test when a command fails. I can put every run_bg() call in a try-except statement, but that misses the point of raising exceptions in the first place. I think it's very rare to want to let the test continue even if some command fails. Can you give examples? Some commands are not critical, like one that converts the screendumps from PPM to PNG format. We don't want to fail the test if you don't have ImageMagick installed. A similar function used for remote commands is sometimes used for checking if a file exists on the guest and getting its size at the same time, using ls. If the file doesn't exist we should SCP it into the guest, not fail the test. exception such as something failed after migration? Chained exceptions can provide detailed information. Wouldn't it complicate the test code? How can I provide a detailed message such as test command failed after migration -- can you illustrate this in a small example? try: code code code except e: raise ChainedException('exception while running migration test', e) Instead of checking each code line, you provide a wrapper for the entire test. It's not informative to add it for the entire test, because we already know what test failed -- we don't need the exception string for that. Instead, we'd have to wrap small sections of the test separately and provide a different exception string for each. The migration test may be split to four sections: setup, pre-migration, the migration itself and post-migration. This may not be so bad, but do you still think it's a good solution? There's also another issue: the utility functions that should raise exceptions don't know what they're being used for, so there can be exception string collisions. For example, both run_bg(), used for local processes, and kvm_spawn.get_command_status_output(), used for SSH commands, would say something like command %s failed. It would be incorrect to hardcode a more specific message into any of them (in my opinion). But if status != 0, will there actually be a pid to kill? If the timeout expires and the process is still running, status is None. functions which can return with three possible outcomes are difficult to use. I tend to see it as two possible outcomes: completion or timeout. In the former case, status is returned. In the latter, None is returned. The caller has to test for three possible outcomes. Success, failure, and 'command is running in the background'. If your callsites don't check for all three, something's wrong. When we want the command to succeed on time, we just test for status == 0. If it's a nonzero integer or None, something went wrong, and we should kill the process just in case it's still running. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] VMX Unrestricted mode support
Hi Avi, I find that the qemu processor reset state is not per the IA32 processor specifications. (Sections 8.1.1 of http://www.intel.com/Assets/PDF/manual/253668.pdf) In qemu-kvm.git in file target-i386/helper.c in function cpu_reset the segment registers are initialized as follows: cpu_x86_load_seg_cache(env, R_CS, 0xf000, 0x, 0x, DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK | DESC_R_MASK); cpu_x86_load_seg_cache(env, R_DS, 0, 0, 0x, DESC_P_MASK | DESC_S_MASK | DESC_W_MASK); cpu_x86_load_seg_cache(env, R_ES, 0, 0, 0x, DESC_P_MASK | DESC_S_MASK | DESC_W_MASK); cpu_x86_load_seg_cache(env, R_SS, 0, 0, 0x, DESC_P_MASK | DESC_S_MASK | DESC_W_MASK); cpu_x86_load_seg_cache(env, R_FS, 0, 0, 0x, DESC_P_MASK | DESC_S_MASK | DESC_W_MASK); cpu_x86_load_seg_cache(env, R_GS, 0, 0, 0x, DESC_P_MASK | DESC_S_MASK | DESC_W_MASK); While the IA32 cpu reset state specification says that Segment Accessed bit is also 1 at the time of cpu reset. so the above code should look like this: cpu_x86_load_seg_cache(env, R_CS, 0xf000, 0x, 0x, DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK | DESC_R_MASK | DESC_A_MASK); cpu_x86_load_seg_cache(env, R_DS, 0, 0, 0x, DESC_P_MASK | DESC_S_MASK | DESC_W_MASK | DESC_A_MASK); cpu_x86_load_seg_cache(env, R_ES, 0, 0, 0x, DESC_P_MASK | DESC_S_MASK | DESC_W_MASK| DESC_A_MASK); cpu_x86_load_seg_cache(env, R_SS, 0, 0, 0x, DESC_P_MASK | DESC_S_MASK | DESC_W_MASK |DESC_A_MASK); cpu_x86_load_seg_cache(env, R_FS, 0, 0, 0x, DESC_P_MASK | DESC_S_MASK | DESC_W_MASK); cpu_x86_load_seg_cache(env, R_GS, 0, 0, 0x, DESC_P_MASK | DESC_S_MASK | DESC_W_MASK); This discrepancy is adding the need of the following function in the unrestricted guest patch. +static inline u32 get_segment_ar(int seg) +{ + if (!enable_unrestricted_guest) + return 0xf3; + + switch (seg) { + case VCPU_SREG_CS: + return 0x9b; + case VCPU_SREG_TR: + return 0x8b; + case VCPU_SREG_LDTR: + return 0x82; + default: + return 0x93; + } +} + For the unrestricted guest support either we can fix this discrepancy in the qemu code, or have a functionality like get_segment_ar() in the kvm vmx code. what do you suggest ? Thanks Regards, Nitin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][KVM_AUTOTEST] Added functionality to the preprocessor to run scripts
Michael Goldish wrote: I think it's very rare to want to let the test continue even if some command fails. Can you give examples? Some commands are not critical, like one that converts the screendumps from PPM to PNG format. We don't want to fail the test if you don't have ImageMagick installed. I think you don't want to run the test at all in that case. But if you do, you shouldn't invoke ImageMagick if you don't have it installed. Otherwise you can't distinguish between failure due to a corrupted source image, out of disk space, or ImageMagick not installed. Ignoring errors is usually a bug. A similar function used for remote commands is sometimes used for checking if a file exists on the guest and getting its size at the same time, using ls. If the file doesn't exist we should SCP it into the guest, not fail the test. Again it doesn't distinguish between real errors and 'file does not exist'. Use something like 'if test -f file; then ls -l file; else echo missing; fi'. You can use rsync to implement 'copy unless exists'. try: code code code except e: raise ChainedException('exception while running migration test', e) Instead of checking each code line, you provide a wrapper for the entire test. It's not informative to add it for the entire test, because we already know what test failed -- we don't need the exception string for that. Instead, we'd have to wrap small sections of the test separately and provide a different exception string for each. The migration test may be split to four sections: setup, pre-migration, the migration itself and post-migration. This may not be so bad, but do you still think it's a good solution? You can have a 'stage' variable and assign it your current stage, and use it when reporting errors. Anything is better than a maze of ifs. There's also another issue: the utility functions that should raise exceptions don't know what they're being used for, so there can be exception string collisions. For example, both run_bg(), used for local processes, and kvm_spawn.get_command_status_output(), used for SSH commands, would say something like command %s failed. It would be incorrect to hardcode a more specific message into any of them (in my opinion). The message should explain what happened, no more, no less. The caller has to test for three possible outcomes. Success, failure, and 'command is running in the background'. If your callsites don't check for all three, something's wrong. When we want the command to succeed on time, we just test for status == 0. If it's a nonzero integer or None, something went wrong, and we should kill the process just in case it's still running. But if it's nonzero it has exited already! However, I feel I'm not making any progress, so I'll stop. I just wish I could use exceptions in the code I write. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] VMX Unrestricted mode support
Nitin A Kamble wrote: Hi Avi, I find that the qemu processor reset state is not per the IA32 processor specifications. (Sections 8.1.1 of http://www.intel.com/Assets/PDF/manual/253668.pdf) In qemu-kvm.git in file target-i386/helper.c in function cpu_reset the segment registers are initialized as follows: cpu_x86_load_seg_cache(env, R_CS, 0xf000, 0x, 0x, DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK | DESC_R_MASK); cpu_x86_load_seg_cache(env, R_DS, 0, 0, 0x, DESC_P_MASK | DESC_S_MASK | DESC_W_MASK); cpu_x86_load_seg_cache(env, R_ES, 0, 0, 0x, DESC_P_MASK | DESC_S_MASK | DESC_W_MASK); cpu_x86_load_seg_cache(env, R_SS, 0, 0, 0x, DESC_P_MASK | DESC_S_MASK | DESC_W_MASK); cpu_x86_load_seg_cache(env, R_FS, 0, 0, 0x, DESC_P_MASK | DESC_S_MASK | DESC_W_MASK); cpu_x86_load_seg_cache(env, R_GS, 0, 0, 0x, DESC_P_MASK | DESC_S_MASK | DESC_W_MASK); While the IA32 cpu reset state specification says that Segment Accessed bit is also 1 at the time of cpu reset. so the above code should look like this: cpu_x86_load_seg_cache(env, R_CS, 0xf000, 0x, 0x, DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK | DESC_R_MASK | DESC_A_MASK); cpu_x86_load_seg_cache(env, R_DS, 0, 0, 0x, DESC_P_MASK | DESC_S_MASK | DESC_W_MASK | DESC_A_MASK); cpu_x86_load_seg_cache(env, R_ES, 0, 0, 0x, DESC_P_MASK | DESC_S_MASK | DESC_W_MASK| DESC_A_MASK); cpu_x86_load_seg_cache(env, R_SS, 0, 0, 0x, DESC_P_MASK | DESC_S_MASK | DESC_W_MASK |DESC_A_MASK); cpu_x86_load_seg_cache(env, R_FS, 0, 0, 0x, DESC_P_MASK | DESC_S_MASK | DESC_W_MASK); cpu_x86_load_seg_cache(env, R_GS, 0, 0, 0x, DESC_P_MASK | DESC_S_MASK | DESC_W_MASK); This discrepancy is adding the need of the following function in the unrestricted guest patch. +static inline u32 get_segment_ar(int seg) +{ + if (!enable_unrestricted_guest) + return 0xf3; + + switch (seg) { + case VCPU_SREG_CS: + return 0x9b; + case VCPU_SREG_TR: + return 0x8b; + case VCPU_SREG_LDTR: + return 0x82; + default: + return 0x93; + } +} + For the unrestricted guest support either we can fix this discrepancy in the qemu code, or have a functionality like get_segment_ar() in the kvm vmx code. what do you suggest ? Qemu should be fixed of course, but we want kvm to keep working with older qemu. So please also have KVM_SET_SREGS set the A bit on segments which are not unusable. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] avoid halted state for in kernel irqchip
This patch is part of a series that tries to provide a kvm-free apic implementation. In the last interation, Jan pointed out that halted state management with in kernel irqchip gets quite messy. I don't disagree. It broke this series specifically, as init IPIs had the halted state set. The solution would be to rework halted state management. But I'm too lazy. Besides, gleb told me he would do it, which makes it his problem, not mine. ;-) This patch can be used to bypass the problem entirely: if kvm apic do not call apic_init_ipi, but a version of it that does everything but mangling around with cpu states, the problem becomes a non issue, and my glorious series can be applied. Signed-off-by: Glauber Costa glom...@redhat.com --- hw/apic.c |8 ++-- 1 files changed, 6 insertions(+), 2 deletions(-) diff --git a/hw/apic.c b/hw/apic.c index 86aa6b6..862289d 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -443,8 +443,7 @@ static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask, } } - -static void apic_init_ipi(APICState *s) +static void apic_init_ipi_state(APICState *s) { int i; @@ -466,6 +465,11 @@ static void apic_init_ipi(APICState *s) s-next_time = 0; cpu_reset(s-cpu_env); +} + +static void apic_init_ipi(APICState *s) +{ +apic_init_ipi_state(s); if (!(s-apicbase MSR_IA32_APICBASE_BSP) (!kvm_enabled() || !qemu_kvm_irqchip_in_kernel())) -- 1.5.6.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] sipi and init: move common code
provide functions to query and reset the state of sipi and init in cpu's apic. This way we can move the kvm specific functions out of the apic path. Signed-off-by: Glauber Costa glom...@redhat.com --- cpu-defs.h |2 -- hw/apic.c | 49 - qemu-kvm.c | 26 ++ qemu-kvm.h |7 +-- 4 files changed, 55 insertions(+), 29 deletions(-) diff --git a/cpu-defs.h b/cpu-defs.h index 1e071e7..e66e928 100644 --- a/cpu-defs.h +++ b/cpu-defs.h @@ -140,8 +140,6 @@ typedef struct CPUWatchpoint { struct qemu_work_item; struct KVMCPUState { -int sipi_needed; -int init; pthread_t thread; int signalled; int stop; diff --git a/hw/apic.c b/hw/apic.c index 862289d..39e1675 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -86,6 +86,8 @@ typedef struct APICState { uint32_t initial_count; int64_t initial_count_load_time, next_time; QEMUTimer *timer; +int sipi_needed; +int init; } APICState; static int apic_io_memory; @@ -93,6 +95,45 @@ static APICState *local_apics[MAX_APICS + 1]; static int last_apic_id = 0; static int apic_irq_delivered; +int apic_init_received(CPUState *env) +{ +if (!env) +return 0; +if (!env-apic_state) +return 0; + +return env-apic_state-init; +} + +int apic_sipi_needed(CPUState *env) +{ +if (!env) +return 0; +if (!env-apic_state) +return 0; + +return env-apic_state-sipi_needed; +} + +void apic_reset_sipi(CPUState *env) +{ +if (!env) +return; +if (!env-apic_state) +return; + +env-apic_state-sipi_needed = 0; +} + +void apic_reset_init(CPUState *env) +{ +if (!env) +return; +if (!env-apic_state) +return; + +env-apic_state-init = 0; +} static void apic_init_ipi(APICState *s); static void apic_set_irq(APICState *s, int vector_num, int trigger_mode); @@ -475,9 +516,8 @@ static void apic_init_ipi(APICState *s) (!kvm_enabled() || !qemu_kvm_irqchip_in_kernel())) s-cpu_env-halted = 1; -if (kvm_enabled() !qemu_kvm_irqchip_in_kernel()) - if (s-cpu_env) - kvm_apic_init(s-cpu_env); +if (s-cpu_env s-cpu_env-cpu_index != 0) + s-init = 1; } /* send a SIPI message to the CPU to start it */ @@ -490,8 +530,7 @@ static void apic_startup(APICState *s, int vector_num) cpu_x86_load_seg_cache(env, R_CS, vector_num 8, vector_num 12, 0x, 0); env-halted = 0; -if (kvm_enabled() !qemu_kvm_irqchip_in_kernel()) - kvm_update_after_sipi(env); +s-sipi_needed = 1; } static void apic_deliver(APICState *s, uint8_t dest, uint8_t dest_mode, diff --git a/qemu-kvm.c b/qemu-kvm.c index 68d3b92..1441cef 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -134,19 +134,6 @@ void kvm_update_interrupt_request(CPUState *env) } } -void kvm_update_after_sipi(CPUState *env) -{ -env-kvm_cpu_state.sipi_needed = 1; -kvm_update_interrupt_request(env); -} - -void kvm_apic_init(CPUState *env) -{ -if (env-cpu_index != 0) - env-kvm_cpu_state.init = 1; -kvm_update_interrupt_request(env); -} - #include signal.h static int try_push_interrupts(void *opaque) @@ -332,7 +319,7 @@ static void kvm_vm_state_change_handler(void *context, int running, int reason) static void update_regs_for_sipi(CPUState *env) { kvm_arch_update_regs_for_sipi(env); -env-kvm_cpu_state.sipi_needed = 0; +apic_reset_sipi(env); } static void update_regs_for_init(CPUState *env) @@ -345,11 +332,10 @@ static void update_regs_for_init(CPUState *env) #ifdef TARGET_I386 /* restore SIPI vector */ -if(env-kvm_cpu_state.sipi_needed) +if (apic_sipi_needed(env)) env-segs[R_CS] = cs; #endif - -env-kvm_cpu_state.init = 0; +apic_reset_init(env); kvm_arch_load_regs(env); } @@ -407,12 +393,12 @@ static int kvm_main_loop_cpu(CPUState *env) if (env-interrupt_request (CPU_INTERRUPT_HARD | CPU_INTERRUPT_NMI)) env-halted = 0; if (!kvm_irqchip_in_kernel(kvm_context)) { - if (env-kvm_cpu_state.init) + if (apic_init_received(env)) update_regs_for_init(env); - if (env-kvm_cpu_state.sipi_needed) + if (apic_sipi_needed(env)) update_regs_for_sipi(env); } - if (!env-halted !env-kvm_cpu_state.init) + if (!env-halted) kvm_cpu_exec(env); env-exit_request = 0; env-exception_index = EXCP_INTERRUPT; diff --git a/qemu-kvm.h b/qemu-kvm.h index 725589b..eb7dc29 100644 --- a/qemu-kvm.h +++ b/qemu-kvm.h @@ -31,9 +31,13 @@ void kvm_remove_all_breakpoints(CPUState *current_env); int kvm_update_guest_debug(CPUState *env, unsigned long reinject_trap); int kvm_qemu_init_env(CPUState *env); int kvm_qemu_check_extension(int ext); -void kvm_apic_init(CPUState *env); +/* FIXME: there should be an apic.h file */ /* called from vcpu initialization */
[PATCH 4/4] provide a kvm-free implementation of ioapic
Also, provide a kvm_ioapic that does not depend highly on common code. Signed-off-by: Glauber Costa glom...@redhat.com --- hw/ioapic.c | 162 +- hw/pc.c |7 ++- hw/pc.h |1 + 3 files changed, 110 insertions(+), 60 deletions(-) diff --git a/hw/ioapic.c b/hw/ioapic.c index 6c178c7..3eb5d5f 100644 --- a/hw/ioapic.c +++ b/hw/ioapic.c @@ -198,58 +198,11 @@ static void ioapic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t va } } -static void kvm_kernel_ioapic_save_to_user(IOAPICState *s) -{ -#if defined(KVM_CAP_IRQCHIP) defined(TARGET_I386) -struct kvm_irqchip chip; -struct kvm_ioapic_state *kioapic; -int i; - -chip.chip_id = KVM_IRQCHIP_IOAPIC; -kvm_get_irqchip(kvm_context, chip); -kioapic = chip.chip.ioapic; - -s-id = kioapic-id; -s-ioregsel = kioapic-ioregsel; -s-base_address = kioapic-base_address; -s-irr = kioapic-irr; -for (i = 0; i IOAPIC_NUM_PINS; i++) { -s-ioredtbl[i] = kioapic-redirtbl[i].bits; -} -#endif -} - -static void kvm_kernel_ioapic_load_from_user(IOAPICState *s) -{ -#if defined(KVM_CAP_IRQCHIP) defined(TARGET_I386) -struct kvm_irqchip chip; -struct kvm_ioapic_state *kioapic; -int i; - -chip.chip_id = KVM_IRQCHIP_IOAPIC; -kioapic = chip.chip.ioapic; - -kioapic-id = s-id; -kioapic-ioregsel = s-ioregsel; -kioapic-base_address = s-base_address; -kioapic-irr = s-irr; -for (i = 0; i IOAPIC_NUM_PINS; i++) { -kioapic-redirtbl[i].bits = s-ioredtbl[i]; -} - -kvm_set_irqchip(kvm_context, chip); -#endif -} - -static void ioapic_save(QEMUFile *f, void *opaque) +static void ioapic_save_common(QEMUFile *f, void *opaque) { IOAPICState *s = opaque; int i; -if (kvm_enabled() qemu_kvm_irqchip_in_kernel()) { -kvm_kernel_ioapic_save_to_user(s); -} - qemu_put_8s(f, s-id); qemu_put_8s(f, s-ioregsel); qemu_put_be64s(f, s-base_address); @@ -259,7 +212,12 @@ static void ioapic_save(QEMUFile *f, void *opaque) } } -static int ioapic_load(QEMUFile *f, void *opaque, int version_id) +static void ioapic_save(QEMUFile *f, void *opaque) +{ +ioapic_save_common(f, opaque); +} + +static int ioapic_load_common(QEMUFile *f, void *opaque, int version_id) { IOAPICState *s = opaque; int i; @@ -283,14 +241,15 @@ static int ioapic_load(QEMUFile *f, void *opaque, int version_id) qemu_get_be64s(f, s-ioredtbl[i]); } -if (kvm_enabled() qemu_kvm_irqchip_in_kernel()) { -kvm_kernel_ioapic_load_from_user(s); -} - return 0; } -static void ioapic_reset(void *opaque) +static int ioapic_load(QEMUFile *f, void *opaque, int version_id) +{ +return ioapic_load_common(f, opaque, version_id); +} + +static void ioapic_reset_common(void *opaque) { IOAPICState *s = opaque; int i; @@ -299,11 +258,11 @@ static void ioapic_reset(void *opaque) s-base_address = IOAPIC_DEFAULT_BASE_ADDRESS; for(i = 0; i IOAPIC_NUM_PINS; i++) s-ioredtbl[i] = 1 16; /* mask LVT */ -#ifdef KVM_CAP_IRQCHIP -if (kvm_enabled() qemu_kvm_irqchip_in_kernel()) { -kvm_kernel_ioapic_load_from_user(s); -} -#endif +} + +static void ioapic_reset(void *opaque) +{ +ioapic_reset_common(opaque); } static CPUReadMemoryFunc *ioapic_mem_read[3] = { @@ -335,3 +294,88 @@ IOAPICState *ioapic_init(void) return s; } + +#ifdef KVM_CAP_IRQCHIP +static void kvm_kernel_ioapic_save_to_user(IOAPICState *s) +{ +#if defined(TARGET_I386) +struct kvm_irqchip chip; +struct kvm_ioapic_state *kioapic; +int i; + +chip.chip_id = KVM_IRQCHIP_IOAPIC; +kvm_get_irqchip(kvm_context, chip); +kioapic = chip.chip.ioapic; + +s-id = kioapic-id; +s-ioregsel = kioapic-ioregsel; +s-base_address = kioapic-base_address; +s-irr = kioapic-irr; +for (i = 0; i IOAPIC_NUM_PINS; i++) { +s-ioredtbl[i] = kioapic-redirtbl[i].bits; +} +#endif +} + +static void kvm_kernel_ioapic_load_from_user(IOAPICState *s) +{ +#if defined(TARGET_I386) +struct kvm_irqchip chip; +struct kvm_ioapic_state *kioapic; +int i; + +chip.chip_id = KVM_IRQCHIP_IOAPIC; +kioapic = chip.chip.ioapic; + +kioapic-id = s-id; +kioapic-ioregsel = s-ioregsel; +kioapic-base_address = s-base_address; +kioapic-irr = s-irr; +for (i = 0; i IOAPIC_NUM_PINS; i++) { +kioapic-redirtbl[i].bits = s-ioredtbl[i]; +} + +kvm_set_irqchip(kvm_context, chip); +#endif +} + +static void kvm_ioapic_save(QEMUFile *f, void *opaque) +{ +IOAPICState *s = opaque; +kvm_kernel_ioapic_save_to_user(s); + +ioapic_save_common(f, opaque); +} + +static int kvm_ioapic_load(QEMUFile *f, void *opaque, int version_id) +{ +IOAPICState *s = opaque; +int r = ioapic_load_common(f, opaque, version_id); + +if (r == 0) +kvm_kernel_ioapic_load_from_user(s); +return r; +} + + +static void
[PATCH 3/4] provide a kvm-free implementation of apic
Also, provide a kvm_apic that does not depend highly on common code. Signed-off-by: Glauber Costa glom...@redhat.com --- hw/apic.c | 241 ++-- hw/pc.c|7 ++- hw/pc.h|1 + qemu-kvm-x86.c |5 +- 4 files changed, 157 insertions(+), 97 deletions(-) diff --git a/hw/apic.c b/hw/apic.c index 39e1675..c20f4a9 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -857,103 +857,11 @@ static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val) } } -#ifdef KVM_CAP_IRQCHIP - -static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id) -{ -return *((uint32_t *) (kapic-regs + (reg_id 4))); -} - -static inline void kapic_set_reg(struct kvm_lapic_state *kapic, - int reg_id, uint32_t val) -{ -*((uint32_t *) (kapic-regs + (reg_id 4))) = val; -} - -static void kvm_kernel_lapic_save_to_user(APICState *s) -{ -struct kvm_lapic_state apic; -struct kvm_lapic_state *kapic = apic; -int i, v; - -kvm_get_lapic(kvm_context, s-cpu_env-cpu_index, kapic); - -s-id = kapic_reg(kapic, 0x2) 24; -s-tpr = kapic_reg(kapic, 0x8); -s-arb_id = kapic_reg(kapic, 0x9); -s-log_dest = kapic_reg(kapic, 0xd) 24; -s-dest_mode = kapic_reg(kapic, 0xe) 28; -s-spurious_vec = kapic_reg(kapic, 0xf); -for (i = 0; i 8; i++) { -s-isr[i] = kapic_reg(kapic, 0x10 + i); -s-tmr[i] = kapic_reg(kapic, 0x18 + i); -s-irr[i] = kapic_reg(kapic, 0x20 + i); -} -s-esr = kapic_reg(kapic, 0x28); -s-icr[0] = kapic_reg(kapic, 0x30); -s-icr[1] = kapic_reg(kapic, 0x31); -for (i = 0; i APIC_LVT_NB; i++) - s-lvt[i] = kapic_reg(kapic, 0x32 + i); -s-initial_count = kapic_reg(kapic, 0x38); -s-divide_conf = kapic_reg(kapic, 0x3e); - -v = (s-divide_conf 3) | ((s-divide_conf 1) 4); -s-count_shift = (v + 1) 7; - -s-initial_count_load_time = qemu_get_clock(vm_clock); -apic_timer_update(s, s-initial_count_load_time); -} - -static void kvm_kernel_lapic_load_from_user(APICState *s) -{ -struct kvm_lapic_state apic; -struct kvm_lapic_state *klapic = apic; -int i; - -memset(klapic, 0, sizeof apic); -kapic_set_reg(klapic, 0x2, s-id 24); -kapic_set_reg(klapic, 0x8, s-tpr); -kapic_set_reg(klapic, 0xd, s-log_dest 24); -kapic_set_reg(klapic, 0xe, s-dest_mode 28 | 0x0fff); -kapic_set_reg(klapic, 0xf, s-spurious_vec); -for (i = 0; i 8; i++) { -kapic_set_reg(klapic, 0x10 + i, s-isr[i]); -kapic_set_reg(klapic, 0x18 + i, s-tmr[i]); -kapic_set_reg(klapic, 0x20 + i, s-irr[i]); -} -kapic_set_reg(klapic, 0x28, s-esr); -kapic_set_reg(klapic, 0x30, s-icr[0]); -kapic_set_reg(klapic, 0x31, s-icr[1]); -for (i = 0; i APIC_LVT_NB; i++) -kapic_set_reg(klapic, 0x32 + i, s-lvt[i]); -kapic_set_reg(klapic, 0x38, s-initial_count); -kapic_set_reg(klapic, 0x3e, s-divide_conf); - -kvm_set_lapic(kvm_context, s-cpu_env-cpu_index, klapic); -} - -#endif - -void qemu_kvm_load_lapic(CPUState *env) -{ -#ifdef KVM_CAP_IRQCHIP -if (kvm_enabled() kvm_vcpu_inited(env) qemu_kvm_irqchip_in_kernel()) { -kvm_kernel_lapic_load_from_user(env-apic_state); -} -#endif -} - static void apic_save(QEMUFile *f, void *opaque) { APICState *s = opaque; int i; -#ifdef KVM_CAP_IRQCHIP -if (kvm_enabled() qemu_kvm_irqchip_in_kernel()) { -kvm_kernel_lapic_save_to_user(s); -} -#endif - qemu_put_be32s(f, s-apicbase); qemu_put_8s(f, s-id); qemu_put_8s(f, s-arb_id); @@ -1017,8 +925,6 @@ static int apic_load(QEMUFile *f, void *opaque, int version_id) if (version_id = 2) qemu_get_timer(f, s-timer); -qemu_kvm_load_lapic(s-cpu_env); - return 0; } @@ -1039,7 +945,6 @@ static void apic_reset(void *opaque) */ s-lvt[APIC_LVT_LINT0] = 0x700; } -qemu_kvm_load_lapic(s-cpu_env); } static CPUReadMemoryFunc *apic_mem_read[3] = { @@ -1086,3 +991,149 @@ int apic_init(CPUState *env) return 0; } +#ifdef KVM_CAP_IRQCHIP + +static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id) +{ +return *((uint32_t *) (kapic-regs + (reg_id 4))); +} + +static inline void kapic_set_reg(struct kvm_lapic_state *kapic, + int reg_id, uint32_t val) +{ +*((uint32_t *) (kapic-regs + (reg_id 4))) = val; +} + +static void kvm_kernel_lapic_save_to_user(APICState *s) +{ +struct kvm_lapic_state apic; +struct kvm_lapic_state *kapic = apic; +int i, v; + +kvm_get_lapic(kvm_context, s-cpu_env-cpu_index, kapic); + +s-id = kapic_reg(kapic, 0x2) 24; +s-tpr = kapic_reg(kapic, 0x8); +s-arb_id = kapic_reg(kapic, 0x9); +s-log_dest = kapic_reg(kapic, 0xd) 24; +s-dest_mode = kapic_reg(kapic, 0xe) 28; +s-spurious_vec = kapic_reg(kapic, 0xf); +for (i = 0; i 8; i++) { +s-isr[i] =
Re: [PATCH][KVM_AUTOTEST] Added functionality to the preprocessor to run scripts
- Avi Kivity a...@redhat.com wrote: Michael Goldish wrote: I think it's very rare to want to let the test continue even if some command fails. Can you give examples? Some commands are not critical, like one that converts the screendumps from PPM to PNG format. We don't want to fail the test if you don't have ImageMagick installed. I think you don't want to run the test at all in that case. I do -- the conversion is just intended to save some disk space after the test completes. But if you do, you shouldn't invoke ImageMagick if you don't have it installed. Otherwise you can't distinguish between failure due to a corrupted source image, out of disk space, or ImageMagick not installed. You can, if you look at the output and/or status that run_bg() returns (e.g. command not found and 127). But you don't need to because you shouldn't do anything special if the command fails, except inform the user that it failed. This command is really not critical (we're currently not using it anyway -- it's just planned for the near future). Ignoring errors is usually a bug. A similar function used for remote commands is sometimes used for checking if a file exists on the guest and getting its size at the same time, using ls. If the file doesn't exist we should SCP it into the guest, not fail the test. Again it doesn't distinguish between real errors and 'file does not exist'. Use something like 'if test -f file; then ls -l file; else echo missing; fi'. I don't see what's wrong with 'ls filename'. It does exactly what the line you suggested does -- if the file exists it displays some info, and if it doesn't it says No such file In the test we explicitly check for the No such file message, so it's being handled properly. You can use rsync to implement 'copy unless exists'. The guest may not like rsync. We may have to work with FTP in some cases (Windows). try: code code code except e: raise ChainedException('exception while running migration test', e) Instead of checking each code line, you provide a wrapper for the entire test. It's not informative to add it for the entire test, because we already know what test failed -- we don't need the exception string for that. Instead, we'd have to wrap small sections of the test separately and provide a different exception string for each. The migration test may be split to four sections: setup, pre-migration, the migration itself and post-migration. This may not be so bad, but do you still think it's a good solution? You can have a 'stage' variable and assign it your current stage, and use it when reporting errors. Anything is better than a maze of ifs. There's also another issue: the utility functions that should raise exceptions don't know what they're being used for, so there can be exception string collisions. For example, both run_bg(), used for local processes, and kvm_spawn.get_command_status_output(), used for SSH commands, would say something like command %s failed. It would be incorrect to hardcode a more specific message into any of them (in my opinion). The message should explain what happened, no more, no less. What happened depends on the point of view. It doesn't really help the user to print something like read_until_output_matches: Timeout elapsed instead of Could not log into guest after migration. It would also not be useful to see get_command_status_output: command 'tar xfj autotest.tar.bz2' failed with status 1 without knowing if it's a remote or a local command (the function itself doesn't know). The caller has to test for three possible outcomes. Success, failure, and 'command is running in the background'. If your callsites don't check for all three, something's wrong. When we want the command to succeed on time, we just test for status == 0. If it's a nonzero integer or None, something went wrong, and we should kill the process just in case it's still running. But if it's nonzero it has exited already! That's why we use safe_kill(), which kills only if the process is still running. However, I feel I'm not making any progress, so I'll stop. I just wish I could use exceptions in the code I write. We are making progress: I'll try to think of a simple way to start working with exceptions without breaking too many things. In the worst case, we can just keep two sets of functions: low-level ones that don't raise anything, and high level ones that raise an exception whenever anything seems to have gone wrong. I'm sorry for being a bit too stubborn. I prefer to understand things before going ahead and implementing them, and I still feel that some of my questions remain unanswered. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this
Re: [PATCH 0/4] kvm free implementation of apic/ioapic
Glauber Costa wrote: Same series already sent. Jan spotted a problem, and my lazyness found a way to bypass it, so it does not exist. A guest survived the following actions after this series is applied: * smp boot * smp reboot * migrate * reboot migrated guest. I can confirm that this series no longer suffers from the halted regression. Jan signature.asc Description: OpenPGP digital signature
Re: [PATCH 1/4] avoid halted state for in kernel irqchip
On Wed, Jun 03, 2009 at 02:25:41PM -0400, Glauber Costa wrote: This patch is part of a series that tries to provide a kvm-free apic implementation. In the last interation, Jan pointed out that halted state management with in kernel irqchip gets quite messy. I don't disagree. It broke this series specifically, as init IPIs had the halted state set. The solution would be to rework halted state management. But I'm too lazy. Besides, gleb told me he would do it, which makes it his problem, not mine. ;-) This patch can be used to bypass the problem entirely: if kvm apic do not call apic_init_ipi, but a version of it that does everything but mangling around with cpu states, the problem becomes a non issue, and my glorious series can be applied. May be I miss something, but I don't see apic_init_ipi_state() called outside of apic_init_ipi() in this patch series, so what is the point of this patch? Signed-off-by: Glauber Costa glom...@redhat.com --- hw/apic.c |8 ++-- 1 files changed, 6 insertions(+), 2 deletions(-) diff --git a/hw/apic.c b/hw/apic.c index 86aa6b6..862289d 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -443,8 +443,7 @@ static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask, } } - -static void apic_init_ipi(APICState *s) +static void apic_init_ipi_state(APICState *s) { int i; @@ -466,6 +465,11 @@ static void apic_init_ipi(APICState *s) s-next_time = 0; cpu_reset(s-cpu_env); +} + +static void apic_init_ipi(APICState *s) +{ +apic_init_ipi_state(s); if (!(s-apicbase MSR_IA32_APICBASE_BSP) (!kvm_enabled() || !qemu_kvm_irqchip_in_kernel())) -- 1.5.6.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] sipi and init: move common code
On Wed, Jun 03, 2009 at 02:25:42PM -0400, Glauber Costa wrote: @@ -407,12 +393,12 @@ static int kvm_main_loop_cpu(CPUState *env) if (env-interrupt_request (CPU_INTERRUPT_HARD | CPU_INTERRUPT_NMI)) env-halted = 0; if (!kvm_irqchip_in_kernel(kvm_context)) { - if (env-kvm_cpu_state.init) + if (apic_init_received(env)) update_regs_for_init(env); - if (env-kvm_cpu_state.sipi_needed) + if (apic_sipi_needed(env)) update_regs_for_sipi(env); } - if (!env-halted !env-kvm_cpu_state.init) + if (!env-halted) You remove checking of init here. Why? It looks like it can be safely removed, but can you do it as a separate patch with explanation why it is not needed here. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM PATCH v5 0/2] iosignalfd
(Applies to kvm.git/master:25deed73) This is v5 of the series. For more details, please see the header to patch 2/2. This series has been tested against the kvm-eventfd unit test, and appears to be functioning properly. You can download this test here: ftp://ftp.novell.com/dev/ghaskins/kvm-eventfd.tar.bz2 This series is ready to be considered for inclusion, pending any further review comments. [ Changelog: v5: *) Removed cookie field, which was a misunderstanding on my part on what Avi wanted for a data-match feature *) Added a new trigger data-match feature which I think is much closer to what we need. *) We retain the dev_count field in the io_bus infrastructure and instead back-fill the array on removal. *) Various minor cleanups *) Rebased to kvm.git/master:25deed73 v4: *) Fixed a bug in the original 2/4 where the PIT failure case would potentially leave the io_bus components registered. *) Condensed the v3 2/4 and 3/4 into one patch (2/2) since the patches became interdependent with the fix described above *) Rebased to kvm.git/master:74dfca0a v3: *) fixed patch 2/4 to handle error cases instead of BUG_ON *) implemented same HAVE_EVENTFD protection mechanism as irqfd to prevent compilation errors on unsupported arches *) completed testing *) rebased to kvm.git/master:7391a6d5 v2: *) added optional data-matching capability (via cookie field) *) changed name from iofd to iosignalfd *) added io_bus unregister function *) implemented deassign feature v1: *) original release (integrated into irqfd v7 series as iofd) ] --- Gregory Haskins (2): kvm: add iosignalfd support kvm: make io_bus interface more robust arch/x86/kvm/i8254.c | 22 +++ arch/x86/kvm/i8259.c |9 + arch/x86/kvm/x86.c|1 include/linux/kvm.h | 15 ++ include/linux/kvm_host.h | 16 ++ virt/kvm/coalesced_mmio.c |8 + virt/kvm/eventfd.c| 356 + virt/kvm/ioapic.c |9 + virt/kvm/kvm_main.c | 41 + 9 files changed, 462 insertions(+), 15 deletions(-) -- Signature -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM PATCH v5 1/2] kvm: make io_bus interface more robust
Today kvm_io_bus_regsiter_dev() returns void and will internally BUG_ON if it fails. We want to create dynamic MMIO/PIO entries driven from userspace later in the series, so we need to enhance the code to be more robust with the following changes: 1) Add a return value to the registration function 2) Fix up all the callsites to check the return code, handle any failures, and percolate the error up to the caller. 3) Add an unregister function that collapses holes in the array Signed-off-by: Gregory Haskins ghask...@novell.com --- arch/x86/kvm/i8254.c | 22 -- arch/x86/kvm/i8259.c |9 - include/linux/kvm_host.h |6 -- virt/kvm/coalesced_mmio.c |8 ++-- virt/kvm/ioapic.c |9 +++-- virt/kvm/kvm_main.c | 30 -- 6 files changed, 73 insertions(+), 11 deletions(-) diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index f91b0e3..f01eabe 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -586,6 +586,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags) { struct kvm_pit *pit; struct kvm_kpit_state *pit_state; + int ret; pit = kzalloc(sizeof(struct kvm_pit), GFP_KERNEL); if (!pit) @@ -620,14 +621,31 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags) kvm_register_irq_mask_notifier(kvm, 0, pit-mask_notifier); kvm_iodevice_init(pit-dev, pit_dev_ops); - kvm_io_bus_register_dev(kvm-pio_bus, pit-dev); + ret = kvm_io_bus_register_dev(kvm-pio_bus, pit-dev); + if (ret 0) + goto fail; if (flags KVM_PIT_SPEAKER_DUMMY) { kvm_iodevice_init(pit-speaker_dev, speaker_dev_ops); - kvm_io_bus_register_dev(kvm-pio_bus, pit-speaker_dev); + ret = kvm_io_bus_register_dev(kvm-pio_bus, + pit-speaker_dev); + if (ret 0) + goto fail; } return pit; + +fail: + if (flags KVM_PIT_SPEAKER_DUMMY) + kvm_io_bus_unregister_dev(kvm-pio_bus, pit-speaker_dev); + + kvm_io_bus_unregister_dev(kvm-pio_bus, pit-dev); + + if (pit-irq_source_id = 0) + kvm_free_irq_source_id(kvm, pit-irq_source_id); + + kfree(pit); + return NULL; } void kvm_free_pit(struct kvm *kvm) diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c index 2520922..af9b8b4 100644 --- a/arch/x86/kvm/i8259.c +++ b/arch/x86/kvm/i8259.c @@ -530,6 +530,8 @@ static const struct kvm_io_device_ops picdev_ops = { struct kvm_pic *kvm_create_pic(struct kvm *kvm) { struct kvm_pic *s; + int ret; + s = kzalloc(sizeof(struct kvm_pic), GFP_KERNEL); if (!s) return NULL; @@ -546,6 +548,11 @@ struct kvm_pic *kvm_create_pic(struct kvm *kvm) * Initialize PIO device */ kvm_iodevice_init(s-dev, picdev_ops); - kvm_io_bus_register_dev(kvm-pio_bus, s-dev); + ret = kvm_io_bus_register_dev(kvm-pio_bus, s-dev); + if (ret 0) { + kfree(s); + return NULL; + } + return s; } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index bcb94eb..216fe07 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -61,8 +61,10 @@ void kvm_io_bus_init(struct kvm_io_bus *bus); void kvm_io_bus_destroy(struct kvm_io_bus *bus); struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus, gpa_t addr, int len, int is_write); -void kvm_io_bus_register_dev(struct kvm_io_bus *bus, -struct kvm_io_device *dev); +int kvm_io_bus_register_dev(struct kvm_io_bus *bus, + struct kvm_io_device *dev); +void kvm_io_bus_unregister_dev(struct kvm_io_bus *bus, + struct kvm_io_device *dev); struct kvm_vcpu { struct kvm *kvm; diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c index c4c7ec2..282fcde 100644 --- a/virt/kvm/coalesced_mmio.c +++ b/virt/kvm/coalesced_mmio.c @@ -97,6 +97,7 @@ static const struct kvm_io_device_ops coalesced_mmio_ops = { int kvm_coalesced_mmio_init(struct kvm *kvm) { struct kvm_coalesced_mmio_dev *dev; + int ret; dev = kzalloc(sizeof(struct kvm_coalesced_mmio_dev), GFP_KERNEL); if (!dev) @@ -104,9 +105,12 @@ int kvm_coalesced_mmio_init(struct kvm *kvm) kvm_iodevice_init(dev-dev, coalesced_mmio_ops); dev-kvm = kvm; kvm-coalesced_mmio_dev = dev; - kvm_io_bus_register_dev(kvm-mmio_bus, dev-dev); - return 0; + ret = kvm_io_bus_register_dev(kvm-mmio_bus, dev-dev); + if (ret 0) + kfree(dev); + + return ret; } int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm, diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index 6b00433..4a37ea1 100644
[KVM PATCH v5 2/2] kvm: add iosignalfd support
iosignalfd is a mechanism to register PIO/MMIO regions to trigger an eventfd signal when written to by a guest. Host userspace can register any arbitrary IO address with a corresponding eventfd and then pass the eventfd to a specific end-point of interest for handling. Normal IO requires a blocking round-trip since the operation may cause side-effects in the emulated model or may return data to the caller. Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM heavy-weight exit back to userspace, and is ultimately serviced by qemu's device model synchronously before returning control back to the vcpu. However, there is a subclass of IO which acts purely as a trigger for other IO (such as to kick off an out-of-band DMA request, etc). For these patterns, the synchronous call is particularly expensive since we really only want to simply get our notification transmitted asychronously and return as quickly as possible. All the sychronous infrastructure to ensure proper data-dependencies are met in the normal IO case are just unecessary overhead for signalling. This adds additional computational load on the system, as well as latency to the signalling path. Therefore, we provide a mechanism for registration of an in-kernel trigger point that allows the VCPU to only require a very brief, lightweight exit just long enough to signal an eventfd. This also means that any clients compatible with the eventfd interface (which includes userspace and kernelspace equally well) can now register to be notified. The end result should be a more flexible and higher performance notification API for the backend KVM hypervisor and perhipheral components. To test this theory, we built a test-harness called doorbell. This module has a function called doorbell_ring() which simply increments a counter for each time the doorbell is signaled. It supports signalling from either an eventfd, or an ioctl(). We then wired up two paths to the doorbell: One via QEMU via a registered io region and through the doorbell ioctl(). The other is direct via iosignalfd. You can download this test harness here: ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2 The measured results are as follows: qemu-mmio: 11 iops, 9.09us rtt iosignalfd-mmio: 200100 iops, 5.00us rtt iosignalfd-pio: 367300 iops, 2.72us rtt I didn't measure qemu-pio, because I have to figure out how to register a PIO region with qemu's device model, and I got lazy. However, for now we can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO, and -350ns for HC, we get: qemu-pio: 153139 iops, 6.53us rtt iosignalfd-hc: 412585 iops, 2.37us rtt these are just for fun, for now, until I can gather more data. Here is a graph for your convenience: http://developer.novell.com/wiki/images/7/76/Iofd-chart.png The conclusion to draw is that we save about 4us by skipping the userspace hop. Signed-off-by: Gregory Haskins ghask...@novell.com --- arch/x86/kvm/x86.c |1 include/linux/kvm.h | 15 ++ include/linux/kvm_host.h | 10 + virt/kvm/eventfd.c | 356 ++ virt/kvm/kvm_main.c | 11 + 5 files changed, 389 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c1ed485..c96c0e3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1097,6 +1097,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_IRQ_INJECT_STATUS: case KVM_CAP_ASSIGN_DEV_IRQ: case KVM_CAP_IRQFD: + case KVM_CAP_IOSIGNALFD: case KVM_CAP_PIT2: r = 1; break; diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 632a856..53b720d 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -300,6 +300,19 @@ struct kvm_guest_debug { struct kvm_guest_debug_arch arch; }; +#define KVM_IOSIGNALFD_FLAG_TRIGGER (1 0) +#define KVM_IOSIGNALFD_FLAG_PIO (1 1) +#define KVM_IOSIGNALFD_FLAG_DEASSIGN (1 2) + +struct kvm_iosignalfd { + __u64 trigger; + __u64 addr; + __u32 len; + __u32 fd; + __u32 flags; + __u8 pad[36]; +}; + #define KVM_TRC_SHIFT 16 /* * kvm trace categories @@ -430,6 +443,7 @@ struct kvm_trace_rec { #ifdef __KVM_HAVE_PIT #define KVM_CAP_PIT2 33 #endif +#define KVM_CAP_IOSIGNALFD 34 #ifdef KVM_CAP_IRQ_ROUTING @@ -537,6 +551,7 @@ struct kvm_irqfd { #define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq) #define KVM_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd) #define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config) +#define KVM_IOSIGNALFD _IOW(KVMIO, 0x78, struct kvm_iosignalfd) /* * ioctls for vcpu fds diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 216fe07..b705960 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -138,6 +138,7 @@
[PATCH v3] qemu-kvm: add iosignalfd support
An iosignalfd allows an eventfd to attach to a specific PIO/MMIO region in the guest. Any guest-writes to that region will trigger an eventfd signal. [ This userspace patch coorelates to the kvm.git patches, v5, which you can find here: http://lkml.org/lkml/2009/6/3/433, and are based on top of the irqfd userspace patches which have not yet been accepted upstream. ] Signed-off-by: Gregory Haskins ghask...@novell.com --- kvm/libkvm/libkvm.c | 68 +++ kvm/libkvm/libkvm.h | 42 2 files changed, 110 insertions(+), 0 deletions(-) diff --git a/kvm/libkvm/libkvm.c b/kvm/libkvm/libkvm.c index 0ada15c..cb558fc 100644 --- a/kvm/libkvm/libkvm.c +++ b/kvm/libkvm/libkvm.c @@ -1493,3 +1493,71 @@ int kvm_irqfd(kvm_context_t kvm, int gsi, int flags) } #endif /* KVM_CAP_IRQFD */ + +#ifdef KVM_CAP_IOSIGNALFD + +#include sys/eventfd.h + +int kvm_assign_iosignalfd(kvm_context_t kvm, unsigned long addr, size_t len, + int fd, void *trigger, int flags) +{ + int r; + int type = flags IOSIGNALFD_FLAG_PIO; + struct kvm_iosignalfd data = { + .trigger = (__u64)trigger, + .addr= addr, + .len = len, + .fd = fd, + }; + + data.flags |= trigger ? KVM_IOSIGNALFD_FLAG_TRIGGER : 0; + data.flags |= type ? KVM_IOSIGNALFD_FLAG_PIO : 0; + + if (!kvm_check_extension(kvm, KVM_CAP_IOSIGNALFD)) + return -ENOENT; + + r = ioctl(kvm-vm_fd, KVM_IOSIGNALFD, data); + if (r == -1) + r = -errno; + return r; +} + +int kvm_deassign_iosignalfd(kvm_context_t kvm, unsigned long addr, int fd, + int flags) +{ + int r; + int type = flags IOSIGNALFD_FLAG_PIO; + struct kvm_iosignalfd data = { + .addr= addr, + .fd = fd, + .flags = KVM_IOSIGNALFD_FLAG_DEASSIGN | + (type ? KVM_IOSIGNALFD_FLAG_PIO : 0), + }; + + if (!kvm_check_extension(kvm, KVM_CAP_IOSIGNALFD)) + return -ENOENT; + + r = ioctl(kvm-vm_fd, KVM_IOSIGNALFD, data); + if (r == -1) + r = -errno; + return r; +} + +#else /* KVM_CAP_IOSIGNALFD */ + +int kvm_assign_iosignalfd(kvm_context_t kvm, unsigned long addr, size_t len, + int fd, void *trigger, int flags) +{ + return -ENOSYS; +} + +int kvm_deassign_iosignalfd(kvm_context_t kvm, unsigned long addr, int fd, + int flags) +{ + return -ENOSYS; +} + +#endif /* KVM_CAP_IOSIGNALFD */ + + + diff --git a/kvm/libkvm/libkvm.h b/kvm/libkvm/libkvm.h index aca8ed6..65fbe2a 100644 --- a/kvm/libkvm/libkvm.h +++ b/kvm/libkvm/libkvm.h @@ -870,6 +870,48 @@ int kvm_get_irq_route_gsi(kvm_context_t kvm); */ int kvm_irqfd(kvm_context_t kvm, int gsi, int flags); +enum { + iosignalfd_option_pio, +}; + +#define IOSIGNALFD_FLAG_PIO(1 iosignalfd_option_pio) + +/*! + * \brief Assign an eventfd to an IO port (PIO or MMIO) + * + * Assigns an eventfd based file-descriptor to a specific PIO or MMIO + * address range. Any guest writes to the specified range will generate + * an eventfd signal. + * + * A data-match pointer can be optionally provided in trigger and only + * writes which match this value exactly will generate an event. The length + * of the trigger is established by the length of the overall IO range, and + * therefore must be in a natural byte-width for the IO routines of your + * particular architecture (e.g. 1, 2, 4, or 8 bytes on x86_64). + * + * \param kvm Pointer to the current kvm_context + * \param addr The IO address + * \param len The length of the IO region at the address + * \param fd The eventfd file-descriptor + * \param trigger A optional pointer providing data-match token + * \param flags FLAG_PIO: PIO, else MMIO + */ +int kvm_assign_iosignalfd(kvm_context_t kvm, unsigned long addr, size_t len, + int fd, void *trigger, int flags); + +/*! + * \brief Deassign an iosignalfd from a previously registered IO port + * + * Deassigns an iosignalfd previously registered with kvm_assign_iosignalfd() + * + * \param kvm Pointer to the current kvm_context + * \param addr The IO address to deassign + * \param fd The eventfd file-descriptor + * \param flags FLAG_PIO: PIO, else MMIO + */ +int kvm_deassign_iosignalfd(kvm_context_t kvm, unsigned long addr, + int fd, int flags); + #ifdef KVM_CAP_DEVICE_MSIX int kvm_assign_set_msix_nr(kvm_context_t kvm, struct kvm_assigned_msix_nr *msix_nr); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v5 2/2] kvm: add iosignalfd support
Hi Paul, Sorry to bug you again, but here is yet another RCU related patch. See inline Gregory Haskins wrote: iosignalfd is a mechanism to register PIO/MMIO regions to trigger an eventfd signal when written to by a guest. Host userspace can register any arbitrary IO address with a corresponding eventfd and then pass the eventfd to a specific end-point of interest for handling. Normal IO requires a blocking round-trip since the operation may cause side-effects in the emulated model or may return data to the caller. Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM heavy-weight exit back to userspace, and is ultimately serviced by qemu's device model synchronously before returning control back to the vcpu. However, there is a subclass of IO which acts purely as a trigger for other IO (such as to kick off an out-of-band DMA request, etc). For these patterns, the synchronous call is particularly expensive since we really only want to simply get our notification transmitted asychronously and return as quickly as possible. All the sychronous infrastructure to ensure proper data-dependencies are met in the normal IO case are just unecessary overhead for signalling. This adds additional computational load on the system, as well as latency to the signalling path. Therefore, we provide a mechanism for registration of an in-kernel trigger point that allows the VCPU to only require a very brief, lightweight exit just long enough to signal an eventfd. This also means that any clients compatible with the eventfd interface (which includes userspace and kernelspace equally well) can now register to be notified. The end result should be a more flexible and higher performance notification API for the backend KVM hypervisor and perhipheral components. To test this theory, we built a test-harness called doorbell. This module has a function called doorbell_ring() which simply increments a counter for each time the doorbell is signaled. It supports signalling from either an eventfd, or an ioctl(). We then wired up two paths to the doorbell: One via QEMU via a registered io region and through the doorbell ioctl(). The other is direct via iosignalfd. You can download this test harness here: ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2 The measured results are as follows: qemu-mmio: 11 iops, 9.09us rtt iosignalfd-mmio: 200100 iops, 5.00us rtt iosignalfd-pio: 367300 iops, 2.72us rtt I didn't measure qemu-pio, because I have to figure out how to register a PIO region with qemu's device model, and I got lazy. However, for now we can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO, and -350ns for HC, we get: qemu-pio: 153139 iops, 6.53us rtt iosignalfd-hc: 412585 iops, 2.37us rtt these are just for fun, for now, until I can gather more data. Here is a graph for your convenience: http://developer.novell.com/wiki/images/7/76/Iofd-chart.png The conclusion to draw is that we save about 4us by skipping the userspace hop. Signed-off-by: Gregory Haskins ghask...@novell.com --- arch/x86/kvm/x86.c |1 include/linux/kvm.h | 15 ++ include/linux/kvm_host.h | 10 + virt/kvm/eventfd.c | 356 ++ virt/kvm/kvm_main.c | 11 + 5 files changed, 389 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c1ed485..c96c0e3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1097,6 +1097,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_IRQ_INJECT_STATUS: case KVM_CAP_ASSIGN_DEV_IRQ: case KVM_CAP_IRQFD: + case KVM_CAP_IOSIGNALFD: case KVM_CAP_PIT2: r = 1; break; diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 632a856..53b720d 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -300,6 +300,19 @@ struct kvm_guest_debug { struct kvm_guest_debug_arch arch; }; +#define KVM_IOSIGNALFD_FLAG_TRIGGER (1 0) +#define KVM_IOSIGNALFD_FLAG_PIO (1 1) +#define KVM_IOSIGNALFD_FLAG_DEASSIGN (1 2) + +struct kvm_iosignalfd { + __u64 trigger; + __u64 addr; + __u32 len; + __u32 fd; + __u32 flags; + __u8 pad[36]; +}; + #define KVM_TRC_SHIFT 16 /* * kvm trace categories @@ -430,6 +443,7 @@ struct kvm_trace_rec { #ifdef __KVM_HAVE_PIT #define KVM_CAP_PIT2 33 #endif +#define KVM_CAP_IOSIGNALFD 34 #ifdef KVM_CAP_IRQ_ROUTING @@ -537,6 +551,7 @@ struct kvm_irqfd { #define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq) #define KVM_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd) #define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config) +#define KVM_IOSIGNALFD _IOW(KVMIO, 0x78, struct kvm_iosignalfd) /* *
Re: [PATCH 1/4] avoid halted state for in kernel irqchip
On Wed, Jun 03, 2009 at 10:50:59PM +0300, Gleb Natapov wrote: On Wed, Jun 03, 2009 at 02:25:41PM -0400, Glauber Costa wrote: This patch is part of a series that tries to provide a kvm-free apic implementation. In the last interation, Jan pointed out that halted state management with in kernel irqchip gets quite messy. I don't disagree. It broke this series specifically, as init IPIs had the halted state set. The solution would be to rework halted state management. But I'm too lazy. Besides, gleb told me he would do it, which makes it his problem, not mine. ;-) This patch can be used to bypass the problem entirely: if kvm apic do not call apic_init_ipi, but a version of it that does everything but mangling around with cpu states, the problem becomes a non issue, and my glorious series can be applied. May be I miss something, but I don't see apic_init_ipi_state() called outside of apic_init_ipi() in this patch series, so what is the point of this patch? it should be called from patch 3. But I made a mistake, and in fact it is not. My bad. Signed-off-by: Glauber Costa glom...@redhat.com --- hw/apic.c |8 ++-- 1 files changed, 6 insertions(+), 2 deletions(-) diff --git a/hw/apic.c b/hw/apic.c index 86aa6b6..862289d 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -443,8 +443,7 @@ static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask, } } - -static void apic_init_ipi(APICState *s) +static void apic_init_ipi_state(APICState *s) { int i; @@ -466,6 +465,11 @@ static void apic_init_ipi(APICState *s) s-next_time = 0; cpu_reset(s-cpu_env); +} + +static void apic_init_ipi(APICState *s) +{ +apic_init_ipi_state(s); if (!(s-apicbase MSR_IA32_APICBASE_BSP) (!kvm_enabled() || !qemu_kvm_irqchip_in_kernel())) -- 1.5.6.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] sipi and init: move common code
On Wed, Jun 03, 2009 at 10:53:05PM +0300, Gleb Natapov wrote: On Wed, Jun 03, 2009 at 02:25:42PM -0400, Glauber Costa wrote: @@ -407,12 +393,12 @@ static int kvm_main_loop_cpu(CPUState *env) if (env-interrupt_request (CPU_INTERRUPT_HARD | CPU_INTERRUPT_NMI)) env-halted = 0; if (!kvm_irqchip_in_kernel(kvm_context)) { - if (env-kvm_cpu_state.init) + if (apic_init_received(env)) update_regs_for_init(env); - if (env-kvm_cpu_state.sipi_needed) + if (apic_sipi_needed(env)) update_regs_for_sipi(env); } - if (!env-halted !env-kvm_cpu_state.init) + if (!env-halted) You remove checking of init here. Why? It looks like it can be safely removed, but can you do it as a separate patch with explanation why it is not needed here. I'll update it to not touch this code, since it is not the aim of this series. we can remove it later. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] Add serial number support for virtio_blk, V4a
This patch extracts the opaque data from pci i/o region 0 via the added VIRTIO_BLK_F_IDENTIFY field. By convention this data takes the form of that returned by an ATA IDENTIFY DEVICE command, however the driver (except for structure size) makes no interpretation of the data. The structure data is copied wholesale to userspace via a HDIO_GET_IDENTITY ioctl command (eg: hdparm -i dev). Signed-off-by: john cooper john.coo...@redhat.com --- drivers/block/virtio_blk.c | 41 ++--- include/linux/virtio_blk.h |6 ++ 2 files changed, 44 insertions(+), 3 deletions(-) = --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -146,12 +146,46 @@ static void do_virtblk_request(struct re vblk-vq-vq_ops-kick(vblk-vq); } +/* return ATA identify data + */ +static int virtblk_identify(struct gendisk *disk, void *argp) +{ + struct virtio_blk *vblk = disk-private_data; + void *opaque; + int err = -ENOMEM; + + opaque = kmalloc(VIRTIO_BLK_ID_BYTES, GFP_KERNEL); + if (!opaque) + goto out; + + err = virtio_config_buf(vblk-vdev, VIRTIO_BLK_F_IDENTIFY, + offsetof(struct virtio_blk_config, identify), opaque, + VIRTIO_BLK_ID_BYTES); + + if (err) + goto out_kfree; + + if (copy_to_user(argp, opaque, VIRTIO_BLK_ID_BYTES)) + err = -EFAULT; + +out_kfree: + kfree(opaque); +out: + return err; +} + static int virtblk_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd, unsigned long data) { - return scsi_cmd_ioctl(bdev-bd_disk-queue, - bdev-bd_disk, mode, cmd, - (void __user *)data); + struct gendisk *disk = bdev-bd_disk; + void __user *argp = (void __user *)data; + + switch (cmd) { + case HDIO_GET_IDENTITY: + return virtblk_identify(disk, argp); + default: + return scsi_cmd_ioctl(disk-queue, disk, mode, cmd, argp); + } } /* We provide getgeo only to please some old bootloader/partitioning tools */ @@ -356,6 +390,7 @@ static struct virtio_device_id id_table[ static unsigned int features[] = { VIRTIO_BLK_F_BARRIER, VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY, VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, + VIRTIO_BLK_F_IDENTIFY }; static struct virtio_driver virtio_blk = { = --- a/include/linux/virtio_blk.h +++ b/include/linux/virtio_blk.h @@ -15,7 +15,12 @@ #define VIRTIO_BLK_F_GEOMETRY 4 /* Legacy geometry available */ #define VIRTIO_BLK_F_RO5 /* Disk is read-only */ #define VIRTIO_BLK_F_BLK_SIZE 6 /* Block size of disk is available*/ +#define VIRTIO_BLK_F_IDENTIFY 8 /* ATA IDENTIFY supported */ +#define VIRTIO_BLK_ID_BYTES(sizeof (__u16[256])) /* IDENTIFY DATA */ + +/* mapped into pci i/o region 0 + */ struct virtio_blk_config { /* The capacity (in 512-byte sectors). */ @@ -32,6 +37,7 @@ struct virtio_blk_config } geometry; /* block size of device (if VIRTIO_BLK_F_BLK_SIZE) */ __u32 blk_size; + __u8 identify[VIRTIO_BLK_ID_BYTES]; } __attribute__((packed)); /* These two define direction. */ -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] provide a kvm-free implementation of ioapic
Also, provide a kvm_ioapic that does not depend highly on common code. Signed-off-by: Glauber Costa glom...@redhat.com --- hw/ioapic.c | 162 +- hw/pc.c |7 ++- hw/pc.h |1 + 3 files changed, 110 insertions(+), 60 deletions(-) diff --git a/hw/ioapic.c b/hw/ioapic.c index 6c178c7..3eb5d5f 100644 --- a/hw/ioapic.c +++ b/hw/ioapic.c @@ -198,58 +198,11 @@ static void ioapic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t va } } -static void kvm_kernel_ioapic_save_to_user(IOAPICState *s) -{ -#if defined(KVM_CAP_IRQCHIP) defined(TARGET_I386) -struct kvm_irqchip chip; -struct kvm_ioapic_state *kioapic; -int i; - -chip.chip_id = KVM_IRQCHIP_IOAPIC; -kvm_get_irqchip(kvm_context, chip); -kioapic = chip.chip.ioapic; - -s-id = kioapic-id; -s-ioregsel = kioapic-ioregsel; -s-base_address = kioapic-base_address; -s-irr = kioapic-irr; -for (i = 0; i IOAPIC_NUM_PINS; i++) { -s-ioredtbl[i] = kioapic-redirtbl[i].bits; -} -#endif -} - -static void kvm_kernel_ioapic_load_from_user(IOAPICState *s) -{ -#if defined(KVM_CAP_IRQCHIP) defined(TARGET_I386) -struct kvm_irqchip chip; -struct kvm_ioapic_state *kioapic; -int i; - -chip.chip_id = KVM_IRQCHIP_IOAPIC; -kioapic = chip.chip.ioapic; - -kioapic-id = s-id; -kioapic-ioregsel = s-ioregsel; -kioapic-base_address = s-base_address; -kioapic-irr = s-irr; -for (i = 0; i IOAPIC_NUM_PINS; i++) { -kioapic-redirtbl[i].bits = s-ioredtbl[i]; -} - -kvm_set_irqchip(kvm_context, chip); -#endif -} - -static void ioapic_save(QEMUFile *f, void *opaque) +static void ioapic_save_common(QEMUFile *f, void *opaque) { IOAPICState *s = opaque; int i; -if (kvm_enabled() qemu_kvm_irqchip_in_kernel()) { -kvm_kernel_ioapic_save_to_user(s); -} - qemu_put_8s(f, s-id); qemu_put_8s(f, s-ioregsel); qemu_put_be64s(f, s-base_address); @@ -259,7 +212,12 @@ static void ioapic_save(QEMUFile *f, void *opaque) } } -static int ioapic_load(QEMUFile *f, void *opaque, int version_id) +static void ioapic_save(QEMUFile *f, void *opaque) +{ +ioapic_save_common(f, opaque); +} + +static int ioapic_load_common(QEMUFile *f, void *opaque, int version_id) { IOAPICState *s = opaque; int i; @@ -283,14 +241,15 @@ static int ioapic_load(QEMUFile *f, void *opaque, int version_id) qemu_get_be64s(f, s-ioredtbl[i]); } -if (kvm_enabled() qemu_kvm_irqchip_in_kernel()) { -kvm_kernel_ioapic_load_from_user(s); -} - return 0; } -static void ioapic_reset(void *opaque) +static int ioapic_load(QEMUFile *f, void *opaque, int version_id) +{ +return ioapic_load_common(f, opaque, version_id); +} + +static void ioapic_reset_common(void *opaque) { IOAPICState *s = opaque; int i; @@ -299,11 +258,11 @@ static void ioapic_reset(void *opaque) s-base_address = IOAPIC_DEFAULT_BASE_ADDRESS; for(i = 0; i IOAPIC_NUM_PINS; i++) s-ioredtbl[i] = 1 16; /* mask LVT */ -#ifdef KVM_CAP_IRQCHIP -if (kvm_enabled() qemu_kvm_irqchip_in_kernel()) { -kvm_kernel_ioapic_load_from_user(s); -} -#endif +} + +static void ioapic_reset(void *opaque) +{ +ioapic_reset_common(opaque); } static CPUReadMemoryFunc *ioapic_mem_read[3] = { @@ -335,3 +294,88 @@ IOAPICState *ioapic_init(void) return s; } + +#ifdef KVM_CAP_IRQCHIP +static void kvm_kernel_ioapic_save_to_user(IOAPICState *s) +{ +#if defined(TARGET_I386) +struct kvm_irqchip chip; +struct kvm_ioapic_state *kioapic; +int i; + +chip.chip_id = KVM_IRQCHIP_IOAPIC; +kvm_get_irqchip(kvm_context, chip); +kioapic = chip.chip.ioapic; + +s-id = kioapic-id; +s-ioregsel = kioapic-ioregsel; +s-base_address = kioapic-base_address; +s-irr = kioapic-irr; +for (i = 0; i IOAPIC_NUM_PINS; i++) { +s-ioredtbl[i] = kioapic-redirtbl[i].bits; +} +#endif +} + +static void kvm_kernel_ioapic_load_from_user(IOAPICState *s) +{ +#if defined(TARGET_I386) +struct kvm_irqchip chip; +struct kvm_ioapic_state *kioapic; +int i; + +chip.chip_id = KVM_IRQCHIP_IOAPIC; +kioapic = chip.chip.ioapic; + +kioapic-id = s-id; +kioapic-ioregsel = s-ioregsel; +kioapic-base_address = s-base_address; +kioapic-irr = s-irr; +for (i = 0; i IOAPIC_NUM_PINS; i++) { +kioapic-redirtbl[i].bits = s-ioredtbl[i]; +} + +kvm_set_irqchip(kvm_context, chip); +#endif +} + +static void kvm_ioapic_save(QEMUFile *f, void *opaque) +{ +IOAPICState *s = opaque; +kvm_kernel_ioapic_save_to_user(s); + +ioapic_save_common(f, opaque); +} + +static int kvm_ioapic_load(QEMUFile *f, void *opaque, int version_id) +{ +IOAPICState *s = opaque; +int r = ioapic_load_common(f, opaque, version_id); + +if (r == 0) +kvm_kernel_ioapic_load_from_user(s); +return r; +} + + +static void
[PATCH 2/4] sipi and init: move common code
provide functions to query and reset the state of sipi and init in cpu's apic. This way we can move the kvm specific functions out of the apic path. Signed-off-by: Glauber Costa glom...@redhat.com --- cpu-defs.h |2 -- hw/apic.c | 49 - qemu-kvm.c | 26 ++ qemu-kvm.h |7 +-- 4 files changed, 55 insertions(+), 29 deletions(-) diff --git a/cpu-defs.h b/cpu-defs.h index 1e071e7..e66e928 100644 --- a/cpu-defs.h +++ b/cpu-defs.h @@ -140,8 +140,6 @@ typedef struct CPUWatchpoint { struct qemu_work_item; struct KVMCPUState { -int sipi_needed; -int init; pthread_t thread; int signalled; int stop; diff --git a/hw/apic.c b/hw/apic.c index 862289d..39e1675 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -86,6 +86,8 @@ typedef struct APICState { uint32_t initial_count; int64_t initial_count_load_time, next_time; QEMUTimer *timer; +int sipi_needed; +int init; } APICState; static int apic_io_memory; @@ -93,6 +95,45 @@ static APICState *local_apics[MAX_APICS + 1]; static int last_apic_id = 0; static int apic_irq_delivered; +int apic_init_received(CPUState *env) +{ +if (!env) +return 0; +if (!env-apic_state) +return 0; + +return env-apic_state-init; +} + +int apic_sipi_needed(CPUState *env) +{ +if (!env) +return 0; +if (!env-apic_state) +return 0; + +return env-apic_state-sipi_needed; +} + +void apic_reset_sipi(CPUState *env) +{ +if (!env) +return; +if (!env-apic_state) +return; + +env-apic_state-sipi_needed = 0; +} + +void apic_reset_init(CPUState *env) +{ +if (!env) +return; +if (!env-apic_state) +return; + +env-apic_state-init = 0; +} static void apic_init_ipi(APICState *s); static void apic_set_irq(APICState *s, int vector_num, int trigger_mode); @@ -475,9 +516,8 @@ static void apic_init_ipi(APICState *s) (!kvm_enabled() || !qemu_kvm_irqchip_in_kernel())) s-cpu_env-halted = 1; -if (kvm_enabled() !qemu_kvm_irqchip_in_kernel()) - if (s-cpu_env) - kvm_apic_init(s-cpu_env); +if (s-cpu_env s-cpu_env-cpu_index != 0) + s-init = 1; } /* send a SIPI message to the CPU to start it */ @@ -490,8 +530,7 @@ static void apic_startup(APICState *s, int vector_num) cpu_x86_load_seg_cache(env, R_CS, vector_num 8, vector_num 12, 0x, 0); env-halted = 0; -if (kvm_enabled() !qemu_kvm_irqchip_in_kernel()) - kvm_update_after_sipi(env); +s-sipi_needed = 1; } static void apic_deliver(APICState *s, uint8_t dest, uint8_t dest_mode, diff --git a/qemu-kvm.c b/qemu-kvm.c index 68d3b92..b5954bc 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -134,19 +134,6 @@ void kvm_update_interrupt_request(CPUState *env) } } -void kvm_update_after_sipi(CPUState *env) -{ -env-kvm_cpu_state.sipi_needed = 1; -kvm_update_interrupt_request(env); -} - -void kvm_apic_init(CPUState *env) -{ -if (env-cpu_index != 0) - env-kvm_cpu_state.init = 1; -kvm_update_interrupt_request(env); -} - #include signal.h static int try_push_interrupts(void *opaque) @@ -332,7 +319,7 @@ static void kvm_vm_state_change_handler(void *context, int running, int reason) static void update_regs_for_sipi(CPUState *env) { kvm_arch_update_regs_for_sipi(env); -env-kvm_cpu_state.sipi_needed = 0; +apic_reset_sipi(env); } static void update_regs_for_init(CPUState *env) @@ -345,11 +332,10 @@ static void update_regs_for_init(CPUState *env) #ifdef TARGET_I386 /* restore SIPI vector */ -if(env-kvm_cpu_state.sipi_needed) +if (apic_sipi_needed(env)) env-segs[R_CS] = cs; #endif - -env-kvm_cpu_state.init = 0; +apic_reset_init(env); kvm_arch_load_regs(env); } @@ -407,12 +393,12 @@ static int kvm_main_loop_cpu(CPUState *env) if (env-interrupt_request (CPU_INTERRUPT_HARD | CPU_INTERRUPT_NMI)) env-halted = 0; if (!kvm_irqchip_in_kernel(kvm_context)) { - if (env-kvm_cpu_state.init) + if (apic_init_received(env)) update_regs_for_init(env); - if (env-kvm_cpu_state.sipi_needed) + if (apic_sipi_needed(env)) update_regs_for_sipi(env); } - if (!env-halted !env-kvm_cpu_state.init) + if (!env-halted !apic_init_received(env)) kvm_cpu_exec(env); env-exit_request = 0; env-exception_index = EXCP_INTERRUPT; diff --git a/qemu-kvm.h b/qemu-kvm.h index 725589b..eb7dc29 100644 --- a/qemu-kvm.h +++ b/qemu-kvm.h @@ -31,9 +31,13 @@ void kvm_remove_all_breakpoints(CPUState *current_env); int kvm_update_guest_debug(CPUState *env, unsigned long reinject_trap); int kvm_qemu_init_env(CPUState *env); int kvm_qemu_check_extension(int ext); -void kvm_apic_init(CPUState *env); +/* FIXME: there should be an apic.h file */ /* called
Re: [KVM PATCH v5 0/2] iosignalfd
Gregory Haskins wrote: (Applies to kvm.git/master:25deed73) This is v5 of the series. For more details, please see the header to patch 2/2. This series has been tested against the kvm-eventfd unit test, and appears to be functioning properly. You can download this test here: ftp://ftp.novell.com/dev/ghaskins/kvm-eventfd.tar.bz2 This series is ready to be considered for inclusion, pending any further review comments. Sorry, Marcello. I re-used an old email when composing this. :) Marcello, Avi, and myself have previously agreed that Marcello's mmio-locking cleanup should go in first. When that happens, I will need to rebase this series because it changes how you interface to the io_bus code. I should have mentioned that here, but forgot. (Speaking of, is there an ETA when that code will be merged Avi?) That aside, after I sent this series I went to get some coffee to clear my head and I thought of an issue in the code. I will reply inline to patch 2/2. -Greg [ Changelog: v5: *) Removed cookie field, which was a misunderstanding on my part on what Avi wanted for a data-match feature *) Added a new trigger data-match feature which I think is much closer to what we need. *) We retain the dev_count field in the io_bus infrastructure and instead back-fill the array on removal. *) Various minor cleanups *) Rebased to kvm.git/master:25deed73 v4: *) Fixed a bug in the original 2/4 where the PIT failure case would potentially leave the io_bus components registered. *) Condensed the v3 2/4 and 3/4 into one patch (2/2) since the patches became interdependent with the fix described above *) Rebased to kvm.git/master:74dfca0a v3: *) fixed patch 2/4 to handle error cases instead of BUG_ON *) implemented same HAVE_EVENTFD protection mechanism as irqfd to prevent compilation errors on unsupported arches *) completed testing *) rebased to kvm.git/master:7391a6d5 v2: *) added optional data-matching capability (via cookie field) *) changed name from iofd to iosignalfd *) added io_bus unregister function *) implemented deassign feature v1: *) original release (integrated into irqfd v7 series as iofd) ] --- Gregory Haskins (2): kvm: add iosignalfd support kvm: make io_bus interface more robust arch/x86/kvm/i8254.c | 22 +++ arch/x86/kvm/i8259.c |9 + arch/x86/kvm/x86.c|1 include/linux/kvm.h | 15 ++ include/linux/kvm_host.h | 16 ++ virt/kvm/coalesced_mmio.c |8 + virt/kvm/eventfd.c| 356 + virt/kvm/ioapic.c |9 + virt/kvm/kvm_main.c | 41 + 9 files changed, 462 insertions(+), 15 deletions(-) signature.asc Description: OpenPGP digital signature
Re: [KVM PATCH v5 2/2] kvm: add iosignalfd support
Gregory Haskins wrote: iosignalfd is a mechanism to register PIO/MMIO regions to trigger an eventfd signal when written to by a guest. Host userspace can register any arbitrary IO address with a corresponding eventfd and then pass the eventfd to a specific end-point of interest for handling. Normal IO requires a blocking round-trip since the operation may cause side-effects in the emulated model or may return data to the caller. Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM heavy-weight exit back to userspace, and is ultimately serviced by qemu's device model synchronously before returning control back to the vcpu. However, there is a subclass of IO which acts purely as a trigger for other IO (such as to kick off an out-of-band DMA request, etc). For these patterns, the synchronous call is particularly expensive since we really only want to simply get our notification transmitted asychronously and return as quickly as possible. All the sychronous infrastructure to ensure proper data-dependencies are met in the normal IO case are just unecessary overhead for signalling. This adds additional computational load on the system, as well as latency to the signalling path. Therefore, we provide a mechanism for registration of an in-kernel trigger point that allows the VCPU to only require a very brief, lightweight exit just long enough to signal an eventfd. This also means that any clients compatible with the eventfd interface (which includes userspace and kernelspace equally well) can now register to be notified. The end result should be a more flexible and higher performance notification API for the backend KVM hypervisor and perhipheral components. To test this theory, we built a test-harness called doorbell. This module has a function called doorbell_ring() which simply increments a counter for each time the doorbell is signaled. It supports signalling from either an eventfd, or an ioctl(). We then wired up two paths to the doorbell: One via QEMU via a registered io region and through the doorbell ioctl(). The other is direct via iosignalfd. You can download this test harness here: ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2 The measured results are as follows: qemu-mmio: 11 iops, 9.09us rtt iosignalfd-mmio: 200100 iops, 5.00us rtt iosignalfd-pio: 367300 iops, 2.72us rtt I didn't measure qemu-pio, because I have to figure out how to register a PIO region with qemu's device model, and I got lazy. However, for now we can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO, and -350ns for HC, we get: qemu-pio: 153139 iops, 6.53us rtt iosignalfd-hc: 412585 iops, 2.37us rtt these are just for fun, for now, until I can gather more data. Here is a graph for your convenience: http://developer.novell.com/wiki/images/7/76/Iofd-chart.png The conclusion to draw is that we save about 4us by skipping the userspace hop. Signed-off-by: Gregory Haskins ghask...@novell.com --- arch/x86/kvm/x86.c |1 include/linux/kvm.h | 15 ++ include/linux/kvm_host.h | 10 + virt/kvm/eventfd.c | 356 ++ virt/kvm/kvm_main.c | 11 + 5 files changed, 389 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c1ed485..c96c0e3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1097,6 +1097,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_IRQ_INJECT_STATUS: case KVM_CAP_ASSIGN_DEV_IRQ: case KVM_CAP_IRQFD: + case KVM_CAP_IOSIGNALFD: case KVM_CAP_PIT2: r = 1; break; diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 632a856..53b720d 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -300,6 +300,19 @@ struct kvm_guest_debug { struct kvm_guest_debug_arch arch; }; +#define KVM_IOSIGNALFD_FLAG_TRIGGER (1 0) +#define KVM_IOSIGNALFD_FLAG_PIO (1 1) +#define KVM_IOSIGNALFD_FLAG_DEASSIGN (1 2) + +struct kvm_iosignalfd { + __u64 trigger; + __u64 addr; + __u32 len; + __u32 fd; + __u32 flags; + __u8 pad[36]; +}; + #define KVM_TRC_SHIFT 16 /* * kvm trace categories @@ -430,6 +443,7 @@ struct kvm_trace_rec { #ifdef __KVM_HAVE_PIT #define KVM_CAP_PIT2 33 #endif +#define KVM_CAP_IOSIGNALFD 34 #ifdef KVM_CAP_IRQ_ROUTING @@ -537,6 +551,7 @@ struct kvm_irqfd { #define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq) #define KVM_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd) #define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config) +#define KVM_IOSIGNALFD _IOW(KVMIO, 0x78, struct kvm_iosignalfd) /* * ioctls for vcpu fds diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
Re: [KVM PATCH v4 3/3] kvm: add iosignalfd support
Mark McLoughlin wrote: On Wed, 2009-05-27 at 13:40 -0400, Gregory Haskins wrote: Mark McLoughlin wrote: On Wed, 2009-05-27 at 15:11 +0300, Avi Kivity wrote: Multiple cookies on the same address are required by virtio. You can't mux since the data doesn't go anywhere. Virtio can survive by checking all rings on a notify, and we can later add a mechanism that has a distinct address for each ring, but let's see if we can cope with multiple cookies. Mark? Trying to catch up, but you're talking about replacing virtio-pci QUEUE_NOTIFY handling with iosignalfd ? For a perfect replacement, what you really need is to be able to register multiple cookies per address range, but only have them trigger if the written data matches a provided value. Hmm..thats an interesting idea. To date, the cookie has really been for identifying the proper range selected for deassignment. I never thought of using it as an actual trigger value at run-time. If the data is lost, virtio has no way of knowing which queue is being notified, so we either end up with per-device, rather than per-queue, notifications (probably not too bad for net, at least) or a different notify address per queue (limiting the number of queues per device). The addr-per-queue is how I was envisioning it, but the trigger value concept hadn't occurred to me. I could make this an option during assignment (e.g. COOKIE flag means only trigger on writes of the provided cookie, otherwise trigger on any write). Sound good? Ah, I'd been thinking of the trigger data being provided separately to the cookie. The virtio ABI is fixed, so we couldn't e.g. have the guest use a cookie to identify a queue - it's just going to continue using a per-device queue number. So, if the cookie was also the trigger, we'd need an eventfd per device. And if this was a device where the guest writes similar values to multiple addresses, you'd need an eventfd per address. Hi Mark, So with the v5 release of iosignalfd, we now have the notion of a trigger, the API of which is as follows: --- /*! * \brief Assign an eventfd to an IO port (PIO or MMIO) * * Assigns an eventfd based file-descriptor to a specific PIO or MMIO * address range. Any guest writes to the specified range will generate * an eventfd signal. * * A data-match pointer can be optionally provided in trigger and only * writes which match this value exactly will generate an event. The length * of the trigger is established by the length of the overall IO range, and * therefore must be in a natural byte-width for the IO routines of your * particular architecture (e.g. 1, 2, 4, or 8 bytes on x86_64). * * \param kvm Pointer to the current kvm_context * \param addr The IO address * \param len The length of the IO region at the address * \param fd The eventfd file-descriptor * \param trigger A optional pointer providing data-match token * \param flags FLAG_PIO: PIO, else MMIO */ int kvm_assign_iosignalfd(kvm_context_t kvm, unsigned long addr, size_t len, int fd, void *trigger, int flags); - in the kvm-eventfd test harness, I create three unique eventfd handles, and do the following: --- unsigned char matchA = 0xa5, matchB = 0x42; kvm_assign_iosignalfd(kvm_context, addr, 1, fd[0], NULL, 0); kvm_assign_iosignalfd(kvm_context, addr, 1, fd[1], matchA, 0); kvm_assign_iosignalfd(kvm_context, addr, 1, fd[2], matchB, 0); --- In otherwords, I register a NULL trigger (wildcarded) on the first fd. The second has a data-match trigger of 0xa5, and the third has 0x42. All three of these eventfd's map to the same mmio address with a width of 1 byte. I also fork a task which selects all three fds, and will print out the eventfd count value when tripped. Then, in the guest, I do: -- iowrite8(0, iosignalfd_mmio); iowrite8(0xa5, iosignalfd_mmio); iowrite8(0x42, iosignalfd_mmio); --- The result of which is: IOSIGNALFD 0: event triggered with val 3 IOSIGNALFD 1: event triggered with val 1 IOSIGNALFD 2: event triggered with val 1 on the host, which is my expected outcome. Let me know if you do not think this is sufficient to implement a solution to your virtio-pci design. -Greg signature.asc Description: OpenPGP digital signature
Re: [KVM PATCH v5 2/2] kvm: add iosignalfd support
On Wed, Jun 03, 2009 at 04:17:49PM -0400, Gregory Haskins wrote: iosignalfd is a mechanism to register PIO/MMIO regions to trigger an eventfd signal when written to by a guest. Host userspace can register any arbitrary IO address with a corresponding eventfd and then pass the eventfd to a specific end-point of interest for handling. Normal IO requires a blocking round-trip since the operation may cause side-effects in the emulated model or may return data to the caller. Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM heavy-weight exit back to userspace, and is ultimately serviced by qemu's device model synchronously before returning control back to the vcpu. However, there is a subclass of IO which acts purely as a trigger for other IO (such as to kick off an out-of-band DMA request, etc). For these patterns, the synchronous call is particularly expensive since we really only want to simply get our notification transmitted asychronously and return as quickly as possible. All the sychronous infrastructure to ensure proper data-dependencies are met in the normal IO case are just unecessary overhead for signalling. This adds additional computational load on the system, as well as latency to the signalling path. Therefore, we provide a mechanism for registration of an in-kernel trigger point that allows the VCPU to only require a very brief, lightweight exit just long enough to signal an eventfd. This also means that any clients compatible with the eventfd interface (which includes userspace and kernelspace equally well) can now register to be notified. The end result should be a more flexible and higher performance notification API for the backend KVM hypervisor and perhipheral components. To test this theory, we built a test-harness called doorbell. This module has a function called doorbell_ring() which simply increments a counter for each time the doorbell is signaled. It supports signalling from either an eventfd, or an ioctl(). We then wired up two paths to the doorbell: One via QEMU via a registered io region and through the doorbell ioctl(). The other is direct via iosignalfd. You can download this test harness here: ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2 The measured results are as follows: qemu-mmio: 11 iops, 9.09us rtt iosignalfd-mmio: 200100 iops, 5.00us rtt iosignalfd-pio: 367300 iops, 2.72us rtt I didn't measure qemu-pio, because I have to figure out how to register a PIO region with qemu's device model, and I got lazy. However, for now we can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO, and -350ns for HC, we get: qemu-pio: 153139 iops, 6.53us rtt iosignalfd-hc: 412585 iops, 2.37us rtt these are just for fun, for now, until I can gather more data. Here is a graph for your convenience: http://developer.novell.com/wiki/images/7/76/Iofd-chart.png The conclusion to draw is that we save about 4us by skipping the userspace hop. One set of decision criteria for RCU vs. SRCU below, and one question about the RCU reader running concurrently with a deletion. Looks pretty good, in general! Thanx, Paul Signed-off-by: Gregory Haskins ghask...@novell.com --- arch/x86/kvm/x86.c |1 include/linux/kvm.h | 15 ++ include/linux/kvm_host.h | 10 + virt/kvm/eventfd.c | 356 ++ virt/kvm/kvm_main.c | 11 + 5 files changed, 389 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c1ed485..c96c0e3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1097,6 +1097,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_IRQ_INJECT_STATUS: case KVM_CAP_ASSIGN_DEV_IRQ: case KVM_CAP_IRQFD: + case KVM_CAP_IOSIGNALFD: case KVM_CAP_PIT2: r = 1; break; diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 632a856..53b720d 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -300,6 +300,19 @@ struct kvm_guest_debug { struct kvm_guest_debug_arch arch; }; +#define KVM_IOSIGNALFD_FLAG_TRIGGER (1 0) +#define KVM_IOSIGNALFD_FLAG_PIO (1 1) +#define KVM_IOSIGNALFD_FLAG_DEASSIGN (1 2) + +struct kvm_iosignalfd { + __u64 trigger; + __u64 addr; + __u32 len; + __u32 fd; + __u32 flags; + __u8 pad[36]; +}; + #define KVM_TRC_SHIFT 16 /* * kvm trace categories @@ -430,6 +443,7 @@ struct kvm_trace_rec { #ifdef __KVM_HAVE_PIT #define KVM_CAP_PIT2 33 #endif +#define KVM_CAP_IOSIGNALFD 34 #ifdef KVM_CAP_IRQ_ROUTING @@ -537,6 +551,7 @@ struct kvm_irqfd { #define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq) #define KVM_IRQFD _IOW(KVMIO,
[RFC] CPU hard limits
Hi, This is an RFC about the CPU hard limits feature where I have explained the need for the feature, the proposed plan and the issues around it. Before I come up with an implementation for hard limits, I would like to know community's thoughts on this scheduler enhancement and any feedback and suggestions. Regards, Bharata. 1. CPU hard limit 2. Need for hard limiting CPU resource 3. Granularity of enforcing CPU hard limits 4. Existing solutions 5. Specifying hard limits 6. Per task group vs global bandwidth period 7. Configuring 8. Throttling of tasks 9. Group scheduler hierarchy considerations 10. SMP considerations 11. Starvation 12. Hard limit and fairness 1. CPU hard limit - CFS is a proportional share scheduler which tries to divide the CPU time proportionately between tasks or groups of tasks (task group/cgroup) depending on the priority/weight of the task or shares assigned to groups of tasks. In CFS, a task/task group can get more than its share of CPU if there are enough idle CPU cycles available in the system, due to the work conserving nature of the scheduler. However there are scenarios (Sec 2) where giving more than the desired CPU share to a task/task group is not acceptable. In those scenarios, the scheduler needs to put a hard stop on the CPU resource consumption of task/task group if it exceeds a preset limit. This is usually achieved by throttling the task/task group when it fully consumes its allocated CPU time. 2. Need for hard limiting CPU resource -- - Pay-per-use: In enterprise systems that cater to multiple clients/customers where a customer demands a certain share of CPU resources and pays only that, CPU hard limits will be useful to hard limit the customer's job to consume only the specified amount of CPU resource. - In container based virtualization environments running multiple containers, hard limits will be useful to ensure a container doesn't exceed its CPU entitlement. - Hard limits can be used to provide guarantees. 3. Granularity of enforcing CPU hard limits --- Conceptually, hard limits can either be enforced for individual tasks or groups of tasks. However enforcing limits per task would be too fine grained and would be a lot of work on the part of the system administrator in terms of setting limits for every task. Based on the current understanding of the users of this feature, it is felt that hard limiting is more useful at task group level than the individual tasks level. Hence in the subsequent paragraphs, the concept of hard limit as applicable to task group/cgroup is discussed. 4. Existing solutions - - Both Linux-VServer and OpenVZ virtualization solutions support CPU hard limiting. - Per task limit can be enforced using rlimits, but it is not rate based. 5. Specifying hard limits - CPU time consumed by a task group is generally measured over a time period (called bandwidth period) and the task group gets throttled when its CPU time reaches a limit (hard limit) within a bandwidth period. The task group remains throttled until the bandwidth period gets renewed at which time additional CPU time becomes available to the tasks in the system. When a task group's hard limit is specified as a ratio X/Y, it means that the group will get throttled if its CPU time consumption exceeds X seconds in a bandwidth period of Y seconds. Specifying the hard limit as X/Y requires us to specify the bandwidth period also. Is having a uniform/same bandwidth period for all the groups an option ? If so, we could even specify the hard limit as a percentage, like 30% of a uniform bandwidth period. 6. Per task group vs global bandwidth period The bandwidth period can either be per task group or global. With global bandwidth period, the runtimes of all the task groups need to be replenished when the period ends. Though this appears conceptually simple, the implementation might not scale. Instead if every task group maintains its bandwidth period separately, the refresh cycles of each group will happen independent of each other. Moreover different groups might prefer different bandwidth periods. Hence the first implementation will have per task group bandwidth period. Timers can be used to trigger bandwidth refresh cycles. (similar to rt group sched) 7. Configuring -- - User could set the hard limit (X and/or Y) through the cgroup fs. - When the scheduler supports hard limiting, should it be enabled for all tasks groups of the system ? Or should user have an option to enable hard limiting per group ? - When hard limiting is enabled for a group, should the limit be set to a default to start with ? Or should the user set the limit and the bandwidth before enabling the hard limiting ? - What should be a sane default value for the bandwidth period ? 8. Throttling of