[KVM-AUTOTEST][COMMIT] Delete kvm_runtest_old test

2009-06-03 Thread Uri Lublin
From: Eduardo Habkost ehabk...@redhat.com

Signed-off-by: Eduardo Habkost ehabk...@redhat.com


 *** Diff is too large (405826 bytes) -- not showing *** 
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[KVM-AUTOTEST][COMMIT] Sync from site_host import SiteHost code with upstream autotest

2009-06-03 Thread Uri Lublin
From: Eduardo Habkost ehabk...@redhat.com

Signed-off-by: Eduardo Habkost ehabk...@redhat.com

diff --git a/server/hosts/__init__.py b/server/hosts/__init__.py
index 21bca1c..f8b6803 100644
--- a/server/hosts/__init__.py
+++ b/server/hosts/__init__.py
@@ -12,7 +12,10 @@ You should 'import hosts' instead of importing every 
available host module.
 # host abstract classes
 from base_classes import Host
 from remote import RemoteHost
-#from site_host import SiteHost
+try:
+from site_host import SiteHost
+except ImportError, e:
+pass

 # host implementation classes
 from ssh_host import SSHHost
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[KVM-AUTOTEST][COMMIT] Merge autotest SVN revision 3185

2009-06-03 Thread Uri Lublin
From: Eduardo Habkost ehabk...@redhat.com

This will pulls some new code that is needed for the changes that will be
implemented on the kvm tests.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[KVM-AUTOTEST][COMMIT] Moving from kvm_log to the upstream logging system.

2009-06-03 Thread Uri Lublin
From: Lucas Meneghel Rodrigues l...@redhat.com

kvm_log can be replaced by the upstream logging system with advantages:
 * Uses the standard python module, that allows us to define multiple message 
destinations
 * It implements nearly all functionalities implemented on kvm_log

This patch converts all kvm_log statements to logging statements.


 *** Diff is too large (57356 bytes) -- not showing *** 
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[KVM-AUTOTEST][COMMIT] Fixing docstrings and lines of code wider than 80 characters.

2009-06-03 Thread Uri Lublin
From: Lucas Meneghel Rodrigues mrodr...@redhat.com

The autotest coding standards specify the preferred way of doing
docstring documentation, as well as the maximum line width in
80 characters. This change converts all non compliances.


 *** Diff is too large (185612 bytes) -- not showing *** 
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[KVM-AUTOTEST][COMMIT] Sanitizing strings before passing them to the logging system.

2009-06-03 Thread Uri Lublin
From: Lucas Meneghel Rodrigues l...@redhat.com

The logging system encodes messages using the utf-8 encoding by
default. So sometimes it's necessary to re-encode lines when
non unicode characters are thrown up by the command outputs.

This patch re-encodes lines of text that will be passed to the
logging system on the track_process function on kvm_utils.

diff --git a/client/tests/kvm_runtest_2/kvm_utils.py 
b/client/tests/kvm_runtest_2/kvm_utils.py
index bcc80a1..9ef5954 100644
--- a/client/tests/kvm_runtest_2/kvm_utils.py
+++ b/client/tests/kvm_runtest_2/kvm_utils.py
@@ -755,6 +755,9 @@ def track_process(sub, status_output=None, term_func=None, 
stdout_func=None,
 # Call stdout_func with the returned text
 if stdout_func:
 text = prefix + text.strip()
+# We need to sanitize the text before passing it to the logging
+# system
+text = text.decode('utf-8', 'replace')
 stdout_func(text)

--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[KVM-AUTOTEST][COMMIT] Trimming trailing whitespaces from all files on the source directory

2009-06-03 Thread Uri Lublin
From: Lucas Meneghel Rodrigues l...@redhat.com

The previous patches introduces trailing whitespaces to all files.
This patch removes trailing whitespaces from all lines of the
files present on the kvm_runtest_2 test.


 *** Diff is too large (161734 bytes) -- not showing *** 
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[KVM-AUTOTEST][COMMIT] Changing kvm_runtest_2 to kvm.

2009-06-03 Thread Uri Lublin
From: Lucas Meneghel Rodrigues l...@redhat.com



 *** Diff is too large (85588 bytes) -- not showing *** 
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[KVM-AUTOTEST][COMMIT] Fixing import style on make_html_report

2009-06-03 Thread Uri Lublin
From: Lucas Meneghel Rodrigues l...@redhat.com


diff --git a/client/tests/kvm/make_html_report.py 
b/client/tests/kvm/make_html_report.py
index a660cff..988b2f3 100755
--- a/client/tests/kvm/make_html_report.py
+++ b/client/tests/kvm/make_html_report.py
@@ -6,13 +6,8 @@ Script used to parse the test results and generate an HTML 
report.
 @copyright: Red Hat 2008-2009
 

+import os, sys, re, getopt, time, datetime, commands

-import os, sys
-import re
-import getopt
-import time
-import datetime
-import commands

 format_css=
 html,body {
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[KVM-AUTOTEST][COMMIT] Final adjustments on the KVM test

2009-06-03 Thread Uri Lublin
From: Lucas Meneghel Rodrigues l...@redhat.com


diff --git a/client/tests/kvm/control b/client/tests/kvm/control
index d11f72f..f4b2113 100644
--- a/client/tests/kvm/control
+++ b/client/tests/kvm/control
@@ -143,7 +143,7 @@ for dict in list:
 dependencies_satisfied = False
 break
 if dependencies_satisfied:
-current_status = job.run_test(kvm_runtest_2, params=dict,
+current_status = job.run_test(kvm, params=dict,
   tag=dict.get(shortname))
 else:
 current_status = False
diff --git a/client/tests/kvm/kvm.py b/client/tests/kvm/kvm.py
index 83ec9be..91ec89a 100644
--- a/client/tests/kvm/kvm.py
+++ b/client/tests/kvm/kvm.py
@@ -61,7 +61,7 @@ class kvm(test.test):
 keys = params.keys()
 keys.sort()
 for key in keys:
-logging.debug(%s = %s % (key, params[key]))
+logging.debug(%s = %s % (key, params[key]))
 self.write_test_keyval({key: params[key]})

 # Open the environment file
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[KVM-AUTOTEST][COMMIT] Add Dror Russo to author lists

2009-06-03 Thread Uri Lublin
From: Eduardo Habkost ehabk...@redhat.com

So the code will match the code on the upstream autotest SVN repository.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com

diff --git a/client/tests/kvm/control b/client/tests/kvm/control
index f4b2113..eb38686 100644
--- a/client/tests/kvm/control
+++ b/client/tests/kvm/control
@@ -1,5 +1,6 @@
 AUTHOR = 
 u...@redhat.com (Uri Lublin)
+dru...@redhat.com (Dror Russo)
 mgold...@redhat.com (Michael Goldish)
 dh...@redhat.com (David Huff)
 aerom...@redhat.com (Alexey Eromenko)
diff --git a/client/tests/kvm/kvm.py b/client/tests/kvm/kvm.py
index 91ec89a..1b9013c 100644
--- a/client/tests/kvm/kvm.py
+++ b/client/tests/kvm/kvm.py
@@ -17,6 +17,7 @@ class kvm(test.test):

 @copyright: Red Hat 2008-2009
 @author: Uri Lublin (u...@redhat.com)
+@author: Dror Russo (dru...@redhat.com)
 @author: Michael Goldish (mgold...@redhat.com)
 @author: David Huff (dh...@redhat.com)
 @author: Alexey Eromenko (aerom...@redhat.com)
diff --git a/client/tests/kvm/make_html_report.py 
b/client/tests/kvm/make_html_report.py
index 988b2f3..6aed39e 100755
--- a/client/tests/kvm/make_html_report.py
+++ b/client/tests/kvm/make_html_report.py
@@ -4,6 +4,7 @@ Script used to parse the test results and generate an HTML 
report.

 @copyright: (c)2005-2007 Matt Kruse (javascripttoolbox.com)
 @copyright: Red Hat 2008-2009
+...@author: Dror Russo (dru...@redhat.com)
 

 import os, sys, re, getopt, time, datetime, commands
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[KVM-AUTOTEST][COMMIT] Merge branch 'master' of git://github.com/ehabkost/autotest into kvm-merge2

2009-06-03 Thread Uri Lublin
From: Eduardo Habkost ehabk...@redhat.com

--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][KVM_AUTOTEST] Added functionality to the preprocessor to run scripts

2009-06-03 Thread Michael Goldish

- Avi Kivity a...@redhat.com wrote:

 David Huff wrote:
  This patch will run pre and post scripts
  defined in config file with the parameter pre_command
  and post_command post_command.
 
  Also exports all the prameters in preprocess for passing
  arguments to the script.
   
  +   #execute any pre_commands
  +pre_command = params.get(pre_command)
  +if pre_command:
  +# export environment vars
  +for k in params.keys():
  +kvm_log.info(Adding KVM_TEST_%s to Environment %
 (k))
  +os.putenv(KVM_TEST_%s % (k), str(params[k]))
  +# execute command
  +kvm_log.info(Executing command '%s'... % pre_command)
  +timeout = int(params.get(pre_commmand_timeout, 600))
  +(status, pid, output) = kvm_utils.run_bg(cd %s; %s %
 (test.bindir, pre_command),
  + None,
 kvm_log.debug, (pre_command) , timeout=timeout)
  +if status != 0:
  +kvm_utils.safe_kill(pid, signal.SIGTERM)
  +raise error.TestError, Custom processing pre_command
 failed

 
 kvm_utils.run_bg should throw an exception instead of returning
 status.

- What if we're interested in the status for some reason? Its value
may indicate what went wrong with the child process.
- If we throw an exception we should add a parameter that controls
whether an exception should be thrown (something like ignore_status)
because in some cases we don't want to throw an exception.
- A run_bg call has 3 possible outcomes: success (status == 0), failure
(status != 0) and timeout (process still running). If we throw an exception
upon failure, what do we do upon timeout? Sometimes it's good (qemu should
keep running unless something went wrong) and sometimes bad (pre_command
should probably not time out). So we end up needing an ignore_timeout
parameter as well.
- What if we want the test to fail with an informative test-specific
exception such as something failed after migration?

 But if status != 0, will there actually be a pid to kill?

If the timeout expires and the process is still running, status is None.

 -- 
 error compiling committee.c: too many arguments to function
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM-RFC PATCH 0/2] irqfd: use POLLHUP notification for close()

2009-06-03 Thread Michael S. Tsirkin
On Tue, Jun 02, 2009 at 01:41:05PM -0400, Gregory Haskins wrote:
  And having close not clean up the state unless you do an ioctl first is
  very messy IMO - I don't think you'll find any such examples in kernel.
 

 
 I agree, and that is why I am advocating this POLLHUP solution.  It was
 only this other way to begin with because the technology didn't exist
 until Davide showed me the light.
 
 Problem with your request is that I already looked into what is
 essentially a bi-directional reference problem (for a different reason)
 when I started the POLLHUP series.  Its messy to do this in a way that
 doesn't negatively impact the fast path (introducing locking, etc) or
 make my head explode making sure it doesn't race.  Afaict, we would need
 to solve this problem to do what you are proposing (patches welcome).
 
 If this hybrid decoupled-deassign + unified-close is indeed an important
 feature set, I suggest that we still consider this POLLHUP series for
 inclusion, and then someone can re-introduce DEASSIGN support in the
 future as a CAP bit extension.  That way we at least get the desirable
 close() properties that we both seem in favor of, and get this advanced
 use case when we need it (and can figure out the locking design).
 

FWIW, I took a look and yes, it is non-trivial.
I concur, we can always add the deassign ioctl later.



-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][KVM_AUTOTEST] Added functionality to the preprocessor to run scripts

2009-06-03 Thread Avi Kivity

Michael Goldish wrote:

- Avi Kivity a...@redhat.com wrote:

  

David Huff wrote:


This patch will run pre and post scripts
defined in config file with the parameter pre_command
and post_command post_command.

Also exports all the prameters in preprocess for passing
arguments to the script.
 
+   #execute any pre_commands

+pre_command = params.get(pre_command)
+if pre_command:
+# export environment vars
+for k in params.keys():
+kvm_log.info(Adding KVM_TEST_%s to Environment %
  

(k))


+os.putenv(KVM_TEST_%s % (k), str(params[k]))
+# execute command
+kvm_log.info(Executing command '%s'... % pre_command)
+timeout = int(params.get(pre_commmand_timeout, 600))
+(status, pid, output) = kvm_utils.run_bg(cd %s; %s %
  

(test.bindir, pre_command),


+ None,
  

kvm_log.debug, (pre_command) , timeout=timeout)


+if status != 0:
+kvm_utils.safe_kill(pid, signal.SIGTERM)
+raise error.TestError, Custom processing pre_command
  

failed

  
  

kvm_utils.run_bg should throw an exception instead of returning
status.



- What if we're interested in the status for some reason? Its value
may indicate what went wrong with the child process.
  

Put it in the exception string.


- If we throw an exception we should add a parameter that controls
whether an exception should be thrown (something like ignore_status)
because in some cases we don't want to throw an exception.
  


I'll complain every time I see it.  If you don't care if the command 
succeeds or not, why run it in the first place?



- A run_bg call has 3 possible outcomes: success (status == 0), failure
(status != 0) and timeout (process still running). If we throw an exception
upon failure, what do we do upon timeout? Sometimes it's good (qemu should
keep running unless something went wrong) and sometimes bad (pre_command
should probably not time out). So we end up needing an ignore_timeout
parameter as well.
  


A run_bg() call should return an object with a .join() method, or throw 
an exception.  At some point you must call the join() method, which 
throws an exception or returns None.



- What if we want the test to fail with an informative test-specific
exception such as something failed after migration?
  


Chained exceptions can provide detailed information.


But if status != 0, will there actually be a pid to kill?



If the timeout expires and the process is still running, status is None.
  


functions which can return with three possible outcomes are difficult to 
use.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvm-autotest: False PASS results

2009-06-03 Thread Michael Goldish

- Uri Lublin u...@redhat.com wrote:

 On 05/10/2009 08:15 PM, sudhir kumar wrote:
  Hi Uri,
  Any comments?
 
 
  -- Forwarded message --
  From: sudhir kumarsmalik...@gmail.com
 
  The kvm-autotest shows the following PASS results for migration,
  while the VM was crashed and test should have failed.
 
  Here is the sequence of test commands and results grepped from
  kvm-autotest output.
 
 
 /root/sudhir/regression/test/kvm-autotest-phx/client/tests/kvm_runtest_2/qemu
  -name 'vm1' -monitor
  unix:/tmp/monitor-20090508-055624-QSuS,server,nowait -drive
 
 file=/root/sudhir/regression/test/kvm-autotest-phx/client/tests/kvm_runtest_2/images/rhel5-32.raw,if=ide,boot=on
  -net nic,vlan=0 -net user,vlan=0 -m 8192
  -smp 4 -redir tcp:5000::22 -vnc :1
 
 
 
 /root/sudhir/regression/test/kvm-autotest-phx/client/tests/kvm_runtest_2/qemu
  -name 'dst' -monitor
  unix:/tmp/monitor-20090508-055625-iamW,server,nowait -drive
 
 file=/root/sudhir/regression/test/kvm-autotest-phx/client/tests/kvm_runtest_2/images/rhel5-32.raw,if=ide,boot=on
  -net nic,vlan=0 -net user,vlan=0 -m 8192
  -smp 4 -redir tcp:5001::22 -vnc :2 -incoming tcp:0:5200
 
 
 
  2009-05-08 05:58:43,471 Configuring logger for client level
  GOOD
  kvm_runtest_2.raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.1
  END GOOD
  kvm_runtest_2.raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.1
 
  GOOD
  kvm_runtest_2.raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2
  kvm_runtest_2.raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2
  timestamp=1241762371
  localtime=May 08 05:59:31   completed successfully
  Persistent state variable __group_level now set to 1
  END GOOD
  kvm_runtest_2.raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2
  kvm_runtest_2.raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2
  timestamp=1241762371
  localtime=May 08 05:59:31
 
   From the test output it looks that the test was succesful to
  log into the guest after migration:
 
  20090508-055926 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: Migration
  finished successfully
  20090508-055926 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG:
  send_monitor_cmd: Sending monitor command: screendump
 
 /root/sudhir/regression/test/kvm-autotest-phx/client/results/default/kvm_runtest_2.raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2/debug/migration_post.ppm
  20090508-055926 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG:
  send_monitor_cmd: Sending monitor command: screendump
 
 /root/sudhir/regression/test/kvm-autotest-phx/client/results/default/kvm_runtest_2.raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2/debug/migration_pre.ppm
  20090508-055926 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG:
  send_monitor_cmd: Sending monitor command: quit
  20090508-055926 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG:
  is_sshd_running: Timeout
  20090508-055926 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: Logging
 into
  guest after migration...
  20090508-055926 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG:
  remote_login: Trying to login...
  20090508-055927 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG:
  remote_login: Got 'Are you sure...'; sending 'yes'
  20090508-055927 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG:
  remote_login: Got password prompt; sending '123456'
  20090508-055928 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG:
  remote_login: Got shell prompt -- logged in
  20090508-055928 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: Logged in
  after migration
  20090508-055928 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG:
  get_command_status_output: Sending command: help
  20090508-055930 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG:
  postprocess_vm: Postprocessing VM 'vm1'...
  20090508-055930 raw.8gb_mem.smp4.RHEL.5.3.i386.migrate.2: DEBUG:
  postprocess_vm: VM object found in environment
 
  When I did vnc to the final migrated VM was crashed with a call
 trace
  as shown in the attachment.
  Quite less possible that the call trace appeared after the test
  finished as migration with memory
  more than 4GB is already broken [BUG 52527]. This looks a false
 PASS
  to me. Any idea how can we handle
  such falso positive results? Shall we wait for sometime after
  migration, log into the vm, do some work or run some good test,
  get output and report that the vm is alive?
 
 
 
 I don't think it's a False PASS.
 It seems the test was able to ssh into the guest, and run a command on
 the guest.

I wouldn't call this a false PASS either.

The migration_test_command parameter gives you some control over the test.
You can set it to something more informative than the default 'help'.
Note that it should probably be different for Linux and Windows guests.
The migration test runs the command before and after migration, compares
the outputs, and fails if they don't match.

Is it possible that the VNC connection caused the crash?

 Currently we only run migration once (round-trip). I think we should
 run 
 migration more than once (using iterations). If the guest crashes due
 

Re: [PATCH][KVM_AUTOTEST] Added functionality to the preprocessor to run scripts

2009-06-03 Thread Michael Goldish

- Avi Kivity a...@redhat.com wrote:

 Michael Goldish wrote:
  - Avi Kivity a...@redhat.com wrote:
 

  David Huff wrote:
  
  This patch will run pre and post scripts
  defined in config file with the parameter pre_command
  and post_command post_command.
 
  Also exports all the prameters in preprocess for passing
  arguments to the script.
   
  +   #execute any pre_commands
  +pre_command = params.get(pre_command)
  +if pre_command:
  +# export environment vars
  +for k in params.keys():
  +kvm_log.info(Adding KVM_TEST_%s to Environment %

  (k))
  
  +os.putenv(KVM_TEST_%s % (k), str(params[k]))
  +# execute command
  +kvm_log.info(Executing command '%s'... % pre_command)
  +timeout = int(params.get(pre_commmand_timeout, 600))
  +(status, pid, output) = kvm_utils.run_bg(cd %s; %s %

  (test.bindir, pre_command),
  
  + None,

  kvm_log.debug, (pre_command) , timeout=timeout)
  
  +if status != 0:
  +kvm_utils.safe_kill(pid, signal.SIGTERM)
  +raise error.TestError, Custom processing
 pre_command

  failed
  


  kvm_utils.run_bg should throw an exception instead of returning
  status.
  
 
  - What if we're interested in the status for some reason? Its value
  may indicate what went wrong with the child process.

 Put it in the exception string.

But I want its value to be examined programmatically in the code.
It's less comfortable to retrieve it from a string.

  - If we throw an exception we should add a parameter that controls
  whether an exception should be thrown (something like
 ignore_status)
  because in some cases we don't want to throw an exception.

 
 I'll complain every time I see it.  If you don't care if the command 
 succeeds or not, why run it in the first place?

I care, but sometimes I don't want to fail the test when a command fails.
I can put every run_bg() call in a try-except statement, but that misses the
point of raising exceptions in the first place.

  - A run_bg call has 3 possible outcomes: success (status == 0),
 failure
  (status != 0) and timeout (process still running). If we throw an
 exception
  upon failure, what do we do upon timeout? Sometimes it's good (qemu
 should
  keep running unless something went wrong) and sometimes bad
 (pre_command
  should probably not time out). So we end up needing an
 ignore_timeout
  parameter as well.

 
 A run_bg() call should return an object with a .join() method, or
 throw 
 an exception.  At some point you must call the join() method, which 
 throws an exception or returns None.

OK, I have queued patches to make run_bg() return such an object anyway,
except for the exceptions part.

  - What if we want the test to fail with an informative
 test-specific
  exception such as something failed after migration?

 
 Chained exceptions can provide detailed information.

Wouldn't it complicate the test code?
How can I provide a detailed message such as test command failed after 
migration --
can you illustrate this in a small example?

  But if status != 0, will there actually be a pid to kill?
  
 
  If the timeout expires and the process is still running, status is
 None.

 
 functions which can return with three possible outcomes are difficult
 to use.

I tend to see it as two possible outcomes: completion or timeout. In the former
case, status is returned. In the latter, None is returned.

 -- 
 Do not meddle in the internals of kernels, for they are subtle and
 quick to panic.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][KVM_AUTOTEST] Added functionality to the preprocessor to run scripts

2009-06-03 Thread Avi Kivity

Michael Goldish wrote:



- What if we're interested in the status for some reason? Its value
may indicate what went wrong with the child process.
  
  

Put it in the exception string.



But I want its value to be examined programmatically in the code.
It's less comfortable to retrieve it from a string.
  


Then create a custom exception object and put it in a field.

  

- If we throw an exception we should add a parameter that controls
whether an exception should be thrown (something like
  

ignore_status)


because in some cases we don't want to throw an exception.
  
  
I'll complain every time I see it.  If you don't care if the command 
succeeds or not, why run it in the first place?



I care, but sometimes I don't want to fail the test when a command fails.
I can put every run_bg() call in a try-except statement, but that misses the
point of raising exceptions in the first place.
  


I think it's very rare to want to let the test continue even if some 
command fails.


Can you give examples?


exception such as something failed after migration?
  
  

Chained exceptions can provide detailed information.



Wouldn't it complicate the test code?
How can I provide a detailed message such as test command failed after 
migration --
can you illustrate this in a small example?
  


 try:
code
code
code
 except e:
raise ChainedException('exception while running migration test', e)

Instead of checking each code line, you provide a wrapper for the entire 
test.




But if status != 0, will there actually be a pid to kill?



If the timeout expires and the process is still running, status is
  

None.

  
  

functions which can return with three possible outcomes are difficult
to use.



I tend to see it as two possible outcomes: completion or timeout. In the former
case, status is returned. In the latter, None is returned.
  


The caller has to test for three possible outcomes.  Success, failure, 
and 'command is running in the background'.  If your callsites don't 
check for all three, something's wrong.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ANNOUNCE] kvm-kmod-2.6.30-rc8

2009-06-03 Thread Avi Kivity

This is a package containing the kvm external module, based on the
2.6.30 series.  kvm-kmod-2.6.30-rc8 contains the kvm code that is
present in Linux 2.6.30-rc8, except that it can run on older kernels.
It is a good companion to the qemu-kvm-0.10 series.

Note that performance and features will vary with the host kernel used.

Changes from kvm-kmod-2.6.30-rc6:
- update source to Linux 2.6.30-rc6
  - fix futex failures on i386 PAE guest kernels
- build fix for iommu-less host kernels
- build fixes on pre-2.6.18 kernels (Jan Kiszka)
- makefile typo fix (Jan Kiszka)

http://www.linux-kvm.org

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Add serial number support for virtio_blk, V4

2009-06-03 Thread Jens Axboe
On Mon, Jun 01 2009, Rusty Russell wrote:
 On Fri, 29 May 2009 01:45:27 pm john cooper wrote:
  virtio_blk-serial-4.patch
 
 Hate to ask dumb questions, but is there a scsi equivalent of this?  It'd be 
 nice if we could avoid being ATA-specific in the long run...

SCSI has mode pages, where ATA pretty much stuffs everything into the
identify data.

 Also, why u16?

The identify page is word based, so u16 makes sense.

 
 Thanks,
 Rusty.
 
  +/* return ATA identify data
  + */
  +static int virtblk_identify(struct gendisk *disk, void *argp)
  +{
  +   struct virtio_blk *vblk = disk-private_data;
  +   u16 *id;
  +   int err = -ENOMEM;
  +
  +   id = kmalloc(VIRTIO_BLK_ID_BYTES, GFP_KERNEL);
  +   if (!id)
  +   goto out;
  +
  +   err = virtio_config_buf(vblk-vdev, VIRTIO_BLK_F_IDENTIFY,
  +   offsetof(struct virtio_blk_config, identify), id,
  +   VIRTIO_BLK_ID_BYTES);
 

-- 
Jens Axboe

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] always halt non-bsp cpu.

2009-06-03 Thread Gleb Natapov
On Tue, Jun 02, 2009 at 03:37:47PM -0400, Glauber Costa wrote:
 This is not kvm specific, and should do fine in plain qemu
 
 Signed-off-by: Glauber Costa glom...@redhat.com
 ---
  hw/apic.c |3 +--
  1 files changed, 1 insertions(+), 2 deletions(-)
 
 diff --git a/hw/apic.c b/hw/apic.c
 index 86aa6b6..2eddba0 100644
 --- a/hw/apic.c
 +++ b/hw/apic.c
 @@ -467,8 +467,7 @@ static void apic_init_ipi(APICState *s)
  
  cpu_reset(s-cpu_env);
  
 -if (!(s-apicbase  MSR_IA32_APICBASE_BSP) 
 -(!kvm_enabled() || !qemu_kvm_irqchip_in_kernel()))
 +if (!(s-apicbase  MSR_IA32_APICBASE_BSP))
  s-cpu_env-halted = 1;
  
  if (kvm_enabled()  !qemu_kvm_irqchip_in_kernel())
Once upon a time there was a good reason to add this: 
358bdca5beb733c983bd5e7406c7893e19e9600b

What have changed?

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH] qemu: virtio save/load bindings

2009-06-03 Thread Avi Kivity

Anthony Liguori wrote:

Michael S. Tsirkin wrote:

Implement bindings for virtio save/load. Use them in virtio pci.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

Is anyone working to fill in load/save bindings so that saving virtio
devices works? Here's a trivial patch to do this (this one is on top 
of my

MSI-X patchset).
Comments?

 hw/virtio-pci.c |   49 
-

 hw/virtio.c |   31 ++-
 hw/virtio.h |4 
 3 files changed, 66 insertions(+), 18 deletions(-)

 static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev,
diff --git a/hw/virtio.c b/hw/virtio.c
index 63ffcff..b773dff 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -568,9 +568,8 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f)
 {
 int i;
 
-/* FIXME: load/save binding.  */

-//pci_device_save(vdev-pci_dev, f);
-//msix_save(vdev-pci_dev, f);
  


qdev regressed save/restore?  What else is broken right now from the 
qdev commit?


I'm beginning to think committing in the state it was in was a 
mistake.  Paul, can you put together a TODO so that we know all of the 
things that have regressed so we can get things back into shape?




virtio is still broken.  Anthony, are you planning to commit this patch?

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] always halt non-bsp cpu.

2009-06-03 Thread Jan Kiszka
Glauber Costa wrote:
 On Wed, Jun 03, 2009 at 12:32:04AM +0200, Jan Kiszka wrote:
 Glauber Costa wrote:
 On Wed, Jun 03, 2009 at 12:01:00AM +0200, Jan Kiszka wrote:
 Glauber Costa wrote:
 On Tue, Jun 02, 2009 at 10:35:47PM +0200, Jan Kiszka wrote:
 Glauber Costa wrote:
 This is not kvm specific, and should do fine in plain qemu
 This is fine with plain qemu already. The problem, IIUC, is that
 in-kernel kvm irqchip does not have a chance to remove the halted state
 again. Did you test the effect of this patch on that scenario? What
 makes it safe to be removed now?
 IIRC, the in kernel irqchip sets halted = 0 in the very beginning of
 the vcpu initialization.

 It is tested here with in-kernel irqchip and works, so probably not
 a problem, unless you can spot something.
 At least your patch applied alone breaks -smp 1 here.

 But the whole management of env-halted for the in-kernel irqchip in
 qemu-kvm is a bit hacky IMHO. Maybe it's time to rethink this. Would be
 nice to always see a consistent halted in user space, specifically for
 debugging purposes.
 out of curiosity: did you apply the whole series?
 Meanwhile I did, but it makes no difference.

 
 Can you try putting the following patch before this one?

If it helps you to understand the issue, I will do so later.

But I *really* suggest to take this chance and develop in-kernel irqchip
support that does not mess with halted, rather keeps it consistent (on
register sync) and maybe adds exceptions from if (!env-halted) tests
where required. IMHO, this is mandatory for an upstream merge!

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] always halt non-bsp cpu.

2009-06-03 Thread Jan Kiszka
Glauber Costa wrote:
 On Wed, Jun 03, 2009 at 12:32:04AM +0200, Jan Kiszka wrote:
 Glauber Costa wrote:
 On Wed, Jun 03, 2009 at 12:01:00AM +0200, Jan Kiszka wrote:
 Glauber Costa wrote:
 On Tue, Jun 02, 2009 at 10:35:47PM +0200, Jan Kiszka wrote:
 Glauber Costa wrote:
 This is not kvm specific, and should do fine in plain qemu
 This is fine with plain qemu already. The problem, IIUC, is that
 in-kernel kvm irqchip does not have a chance to remove the halted state
 again. Did you test the effect of this patch on that scenario? What
 makes it safe to be removed now?
 IIRC, the in kernel irqchip sets halted = 0 in the very beginning of
 the vcpu initialization.

 It is tested here with in-kernel irqchip and works, so probably not
 a problem, unless you can spot something.
 At least your patch applied alone breaks -smp 1 here.

 But the whole management of env-halted for the in-kernel irqchip in
 qemu-kvm is a bit hacky IMHO. Maybe it's time to rethink this. Would be
 nice to always see a consistent halted in user space, specifically for
 debugging purposes.
 out of curiosity: did you apply the whole series?
 Meanwhile I did, but it makes no difference.
 Jan, can you be more specific on why it breaks? I'm double trying it
 here, and it works for me just fine.
 

It locked up in the BIOS after reset from a booted SMP guest, somewhere
before selecting the boot device.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] always halt non-bsp cpu.

2009-06-03 Thread Gleb Natapov
On Wed, Jun 03, 2009 at 01:01:29PM +0200, Jan Kiszka wrote:
 Glauber Costa wrote:
  On Wed, Jun 03, 2009 at 12:32:04AM +0200, Jan Kiszka wrote:
  Glauber Costa wrote:
  On Wed, Jun 03, 2009 at 12:01:00AM +0200, Jan Kiszka wrote:
  Glauber Costa wrote:
  On Tue, Jun 02, 2009 at 10:35:47PM +0200, Jan Kiszka wrote:
  Glauber Costa wrote:
  This is not kvm specific, and should do fine in plain qemu
  This is fine with plain qemu already. The problem, IIUC, is that
  in-kernel kvm irqchip does not have a chance to remove the halted state
  again. Did you test the effect of this patch on that scenario? What
  makes it safe to be removed now?
  IIRC, the in kernel irqchip sets halted = 0 in the very beginning of
  the vcpu initialization.
 
  It is tested here with in-kernel irqchip and works, so probably not
  a problem, unless you can spot something.
  At least your patch applied alone breaks -smp 1 here.
 
  But the whole management of env-halted for the in-kernel irqchip in
  qemu-kvm is a bit hacky IMHO. Maybe it's time to rethink this. Would be
  nice to always see a consistent halted in user space, specifically for
  debugging purposes.
  out of curiosity: did you apply the whole series?
  Meanwhile I did, but it makes no difference.
 
  
  Can you try putting the following patch before this one?
 
 If it helps you to understand the issue, I will do so later.
 
 But I *really* suggest to take this chance and develop in-kernel irqchip
 support that does not mess with halted, rather keeps it consistent (on
 register sync) and maybe adds exceptions from if (!env-halted) tests
 where required. IMHO, this is mandatory for an upstream merge!
 
The difference between kernel/userspace irq chip is that in
former case halted cpu sleeps in the kernel and in later case in the
usespace (well also in the kernel but not in kvm code). We currently
abuse halted to do different sleeps. Cpu loop should be reworked.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ kvm-Bugs-2800447 ] kvm fails to build against out of tree kernel builds+patch

2009-06-03 Thread SourceForge.net
Bugs item #2800447, was opened at 2009-06-03 11:17
Message generated for change (Tracker Item Submitted) made by tuxjay
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detailatid=893831aid=2800447group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: James (tuxjay)
Assigned to: Nobody/Anonymous (nobody)
Summary: kvm fails to build against out of tree kernel builds+patch

Initial Comment:
Hi,

The first problem is the 

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detailatid=893831aid=2800447group_id=180599
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ kvm-Bugs-2800447 ] kvm fails to build against out of tree kernel builds+patch

2009-06-03 Thread SourceForge.net
Bugs item #2800447, was opened at 2009-06-03 11:17
Message generated for change (Comment added) made by tuxjay
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detailatid=893831aid=2800447group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: James (tuxjay)
Assigned to: Nobody/Anonymous (nobody)
Summary: kvm fails to build against out of tree kernel builds+patch

Initial Comment:
Hi,

The first problem is the 

--

Comment By: James (tuxjay)
Date: 2009-06-03 11:19

Message:
Hi,

The first problem is that kvm fails to detect the kernel source directory
correctly..

In configure the line
   kernelsourcedir=${kerneldir%/*}/source

Only works if ../source is the build parent (e.g. if the source dir is
/lib/modules/version/source)

The first patch fixes that. The next is a build problem:

-Iarch/${ARCH_DIR}/include

should be: -I$(KERNELSOURCEDIR)/${ARCH_DIR}/include

In the case of out of place builds. The second patch fixes that.

Cheers, James

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detailatid=893831aid=2800447group_id=180599
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM_AUTOTEST] unattended installs

2009-06-03 Thread Lucas Meneghel Rodrigues
On Tue, 2009-06-02 at 20:25 -0500, Charles Duffy wrote:
 David Huff wrote:
  I had some issues getting the guest to read a vvfat floppy passed to the 
  guest
  via qemu option -fda fat:floppy:rw:./floppy it would pass the dir as a 
  vfat
  disk and the guest could mount it, however guest could not read the files 
  on the
  vdisk..
 
 I'm surprised you opted to work around this with a loop mount rather 
 than using mtools, given the former approach's need for root.

Autotest runs the tests as root, so we will have root privileges
anyway...

 Granted, getting files off the ISO _could_ also require root -- in my 
 own analog to these scripts, I use iso-read (from the libcdio package):
 
 # Pull kernel and initrd
 iso-read --image $OS_ISO \
   --extract /isolinux/vmlinuz \
   --output-file $TMPDIR/vmlinuz
 iso-read --image $OS_ISO \
   --extract /isolinux/initrd.img \
   --output-file $TMPDIR/initrd.img
 # Write floppy image
 mformat -C -f 1440 -i $TMPDIR/floppy.img ::
 mcopy $TMPDIR/floppy.d/* -i $TMPDIR/floppy.img ::/
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Lucas Meneghel Rodrigues
Software Engineer (QE)
Red Hat - Emerging Technologies

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM-RFC PATCH 0/2] irqfd: use POLLHUP notification for close()

2009-06-03 Thread Gregory Haskins
Michael S. Tsirkin wrote:
 On Tue, Jun 02, 2009 at 01:41:05PM -0400, Gregory Haskins wrote:
   
 And having close not clean up the state unless you do an ioctl first is
 very messy IMO - I don't think you'll find any such examples in kernel.

   
   
 I agree, and that is why I am advocating this POLLHUP solution.  It was
 only this other way to begin with because the technology didn't exist
 until Davide showed me the light.

 Problem with your request is that I already looked into what is
 essentially a bi-directional reference problem (for a different reason)
 when I started the POLLHUP series.  Its messy to do this in a way that
 doesn't negatively impact the fast path (introducing locking, etc) or
 make my head explode making sure it doesn't race.  Afaict, we would need
 to solve this problem to do what you are proposing (patches welcome).

 If this hybrid decoupled-deassign + unified-close is indeed an important
 feature set, I suggest that we still consider this POLLHUP series for
 inclusion, and then someone can re-introduce DEASSIGN support in the
 future as a CAP bit extension.  That way we at least get the desirable
 close() properties that we both seem in favor of, and get this advanced
 use case when we need it (and can figure out the locking design).

 

 FWIW, I took a look and yes, it is non-trivial.
 I concur, we can always add the deassign ioctl later.



   

Sounds good, Michael.  Thanks!

-Greg



signature.asc
Description: OpenPGP digital signature


[ kvm-Bugs-2800447 ] kvm fails to build against out of tree kernel builds+patch

2009-06-03 Thread SourceForge.net
Bugs item #2800447, was opened at 2009-06-03 13:17
Message generated for change (Comment added) made by kiszka
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detailatid=893831aid=2800447group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: James (tuxjay)
Assigned to: Nobody/Anonymous (nobody)
Summary: kvm fails to build against out of tree kernel builds+patch

Initial Comment:
Hi,

The first problem is the 

--

Comment By: Jan Kiszka (kiszka)
Date: 2009-06-03 13:41

Message:
Please post your patches with individual changelogs and signed-off to
k...@vger.kernel.org. TiA.

--

Comment By: James (tuxjay)
Date: 2009-06-03 13:19

Message:
Hi,

The first problem is that kvm fails to detect the kernel source directory
correctly..

In configure the line
   kernelsourcedir=${kerneldir%/*}/source

Only works if ../source is the build parent (e.g. if the source dir is
/lib/modules/version/source)

The first patch fixes that. The next is a build problem:

-Iarch/${ARCH_DIR}/include

should be: -I$(KERNELSOURCEDIR)/${ARCH_DIR}/include

In the case of out of place builds. The second patch fixes that.

Cheers, James

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detailatid=893831aid=2800447group_id=180599
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] revert part of 3db8b916e merge

2009-06-03 Thread Gleb Natapov
kvm_*_mpstate() cannot be called from kvm_arch_*_registers()
since kvm_arch_*_registers() sometimes called from io thread, but
kvm_*_mpstate() can be called only by cpu thread.

Signed-off-by: Gleb Natapov g...@redhat.com
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 68d3b92..7ed1e06 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1431,26 +1431,3 @@ void qemu_kvm_cpu_stop(CPUState *env)
 if (kvm_enabled())
 env-kvm_cpu_state.stopped = 1;
 }
-
-void kvm_arch_get_registers(CPUState *env)
-{
-kvm_save_registers(env);
-kvm_save_mpstate(env);
-}
-
-void kvm_arch_put_registers(CPUState *env)
-{
-kvm_load_registers(env);
-kvm_load_mpstate(env);
-}
-
-
-void cpu_synchronize_state(CPUState *env, int modified)
-{
-if (kvm_enabled()) {
-if (modified)
-kvm_arch_put_registers(env);
-else
-kvm_arch_get_registers(env);
-}
-}
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 725589b..5d47e88 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -216,10 +216,25 @@ int kvm_log_stop(target_phys_addr_t phys_addr, 
target_phys_addr_t len);
 
 static inline int kvm_sync_vcpus(void) { return 0; }
 
-void kvm_arch_get_registers(CPUState *env);
-void kvm_arch_put_registers(CPUState *env);
+static inline void kvm_arch_get_registers(CPUState *env)
+{
+kvm_save_registers(env);
+}
 
-void cpu_synchronize_state(CPUState *env, int modified);
+static inline void kvm_arch_put_registers(CPUState *env)
+{
+kvm_load_registers(env);
+}
+
+static inline void cpu_synchronize_state(CPUState *env, int modified)
+{
+if (kvm_enabled()) {
+if (modified)
+kvm_arch_put_registers(env);
+else
+kvm_arch_get_registers(env);
+}
+}
 
 uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
   int reg);
diff --git a/target-i386/machine.c b/target-i386/machine.c
index f280d3d..07df1e1 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -32,7 +32,10 @@ void cpu_save(QEMUFile *f, void *opaque)
 int32_t pending_irq;
 int i, bit;
 
-cpu_synchronize_state(env, 0);
+if (kvm_enabled()) {
+kvm_save_registers(env);
+kvm_save_mpstate(env);
+}
 
 for(i = 0; i  CPU_NB_REGS; i++)
 qemu_put_betls(f, env-regs[i]);
--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] revert part of 3db8b916e merge

2009-06-03 Thread Avi Kivity

Gleb Natapov wrote:

kvm_*_mpstate() cannot be called from kvm_arch_*_registers()
since kvm_arch_*_registers() sometimes called from io thread, but
kvm_*_mpstate() can be called only by cpu thread.

  


I really dislike vcpu functions to be called from outside the vcpu 
thread.  Who are the callers?



Signed-off-by: Gleb Natapov g...@redhat.com
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 68d3b92..7ed1e06 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1431,26 +1431,3 @@ void qemu_kvm_cpu_stop(CPUState *env)
 if (kvm_enabled())
 env-kvm_cpu_state.stopped = 1;
 }
-
-void kvm_arch_get_registers(CPUState *env)
-{
-kvm_save_registers(env);
-kvm_save_mpstate(env);
-}
-
-void kvm_arch_put_registers(CPUState *env)
-{
-kvm_load_registers(env);
-kvm_load_mpstate(env);
-}
-
-
-void cpu_synchronize_state(CPUState *env, int modified)
-{
-if (kvm_enabled()) {
-if (modified)
-kvm_arch_put_registers(env);
-else
-kvm_arch_get_registers(env);
-}
-}
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 725589b..5d47e88 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -216,10 +216,25 @@ int kvm_log_stop(target_phys_addr_t phys_addr, 
target_phys_addr_t len);
 
 static inline int kvm_sync_vcpus(void) { return 0; }
 
-void kvm_arch_get_registers(CPUState *env);

-void kvm_arch_put_registers(CPUState *env);
+static inline void kvm_arch_get_registers(CPUState *env)
+{
+kvm_save_registers(env);
+}
 
-void cpu_synchronize_state(CPUState *env, int modified);

+static inline void kvm_arch_put_registers(CPUState *env)
+{
+kvm_load_registers(env);
+}
+
+static inline void cpu_synchronize_state(CPUState *env, int modified)
+{
+if (kvm_enabled()) {
+if (modified)
+kvm_arch_put_registers(env);
+else
+kvm_arch_get_registers(env);
+}
+}
 
 uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,

   int reg);
diff --git a/target-i386/machine.c b/target-i386/machine.c
index f280d3d..07df1e1 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -32,7 +32,10 @@ void cpu_save(QEMUFile *f, void *opaque)
 int32_t pending_irq;
 int i, bit;
 
-cpu_synchronize_state(env, 0);

+if (kvm_enabled()) {
+kvm_save_registers(env);
+kvm_save_mpstate(env);
+}
 
 for(i = 0; i  CPU_NB_REGS; i++)

 qemu_put_betls(f, env-regs[i]);
--
Gleb.
  



--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] revert part of 3db8b916e merge

2009-06-03 Thread Gleb Natapov
On Wed, Jun 03, 2009 at 03:16:46PM +0300, Avi Kivity wrote:
 Gleb Natapov wrote:
 kvm_*_mpstate() cannot be called from kvm_arch_*_registers()
 since kvm_arch_*_registers() sometimes called from io thread, but
 kvm_*_mpstate() can be called only by cpu thread.

   

 I really dislike vcpu functions to be called from outside the vcpu  
 thread.  Who are the callers?

monitor.c. May be others (git grep cpu_synchronize_state). But
kvm_save_registers() does the right thing and calls another function on
vcpu thread.

 Signed-off-by: Gleb Natapov g...@redhat.com
 diff --git a/qemu-kvm.c b/qemu-kvm.c
 index 68d3b92..7ed1e06 100644
 --- a/qemu-kvm.c
 +++ b/qemu-kvm.c
 @@ -1431,26 +1431,3 @@ void qemu_kvm_cpu_stop(CPUState *env)
  if (kvm_enabled())
  env-kvm_cpu_state.stopped = 1;
  }
 -
 -void kvm_arch_get_registers(CPUState *env)
 -{
 -kvm_save_registers(env);
 -kvm_save_mpstate(env);
 -}
 -
 -void kvm_arch_put_registers(CPUState *env)
 -{
 -kvm_load_registers(env);
 -kvm_load_mpstate(env);
 -}
 -
 -
 -void cpu_synchronize_state(CPUState *env, int modified)
 -{
 -if (kvm_enabled()) {
 -if (modified)
 -kvm_arch_put_registers(env);
 -else
 -kvm_arch_get_registers(env);
 -}
 -}
 diff --git a/qemu-kvm.h b/qemu-kvm.h
 index 725589b..5d47e88 100644
 --- a/qemu-kvm.h
 +++ b/qemu-kvm.h
 @@ -216,10 +216,25 @@ int kvm_log_stop(target_phys_addr_t phys_addr, 
 target_phys_addr_t len);
   static inline int kvm_sync_vcpus(void) { return 0; }
  -void kvm_arch_get_registers(CPUState *env);
 -void kvm_arch_put_registers(CPUState *env);
 +static inline void kvm_arch_get_registers(CPUState *env)
 +{
 +kvm_save_registers(env);
 +}
  -void cpu_synchronize_state(CPUState *env, int modified);
 +static inline void kvm_arch_put_registers(CPUState *env)
 +{
 +kvm_load_registers(env);
 +}
 +
 +static inline void cpu_synchronize_state(CPUState *env, int modified)
 +{
 +if (kvm_enabled()) {
 +if (modified)
 +kvm_arch_put_registers(env);
 +else
 +kvm_arch_get_registers(env);
 +}
 +}
   uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t 
 function,
int reg);
 diff --git a/target-i386/machine.c b/target-i386/machine.c
 index f280d3d..07df1e1 100644
 --- a/target-i386/machine.c
 +++ b/target-i386/machine.c
 @@ -32,7 +32,10 @@ void cpu_save(QEMUFile *f, void *opaque)
  int32_t pending_irq;
  int i, bit;
  -cpu_synchronize_state(env, 0);
 +if (kvm_enabled()) {
 +kvm_save_registers(env);
 +kvm_save_mpstate(env);
 +}
   for(i = 0; i  CPU_NB_REGS; i++)
  qemu_put_betls(f, env-regs[i]);
 --
  Gleb.
   


 -- 
 Do not meddle in the internals of kernels, for they are subtle and quick to 
 panic.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ kvm-Bugs-2800447 ] kvm fails to build against out of tree kernel builds+patch

2009-06-03 Thread SourceForge.net
Bugs item #2800447, was opened at 2009-06-03 13:17
Message generated for change (Comment added) made by kiszka
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detailatid=893831aid=2800447group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: James (tuxjay)
Assigned to: Nobody/Anonymous (nobody)
Summary: kvm fails to build against out of tree kernel builds+patch

Initial Comment:
Hi,

The first problem is the 

--

Comment By: Jan Kiszka (kiszka)
Date: 2009-06-03 14:24

Message:
That's trivial, look at other patches posted to the list: add a plain
Signed-off-by: YOUR NAME your-em...@address.

--

Comment By: James (tuxjay)
Date: 2009-06-03 14:19

Message:
I've prepared the patches and changelog entries, how do I get it signed
off? I've been searching the website for the patch procedure and couldn't
find it, sorry.

Thank you

--

Comment By: Jan Kiszka (kiszka)
Date: 2009-06-03 13:41

Message:
Please post your patches with individual changelogs and signed-off to
k...@vger.kernel.org. TiA.

--

Comment By: James (tuxjay)
Date: 2009-06-03 13:19

Message:
Hi,

The first problem is that kvm fails to detect the kernel source directory
correctly..

In configure the line
   kernelsourcedir=${kerneldir%/*}/source

Only works if ../source is the build parent (e.g. if the source dir is
/lib/modules/version/source)

The first patch fixes that. The next is a build problem:

-Iarch/${ARCH_DIR}/include

should be: -I$(KERNELSOURCEDIR)/${ARCH_DIR}/include

In the case of out of place builds. The second patch fixes that.

Cheers, James

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detailatid=893831aid=2800447group_id=180599
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] fix detection of kernel source directory

2009-06-03 Thread James Pike
Fix detection of kernel source directory when it is different to build
directory.
Signed-off-by: James Pike ja...@chilon.net
diff --git a/configure b/configure
index e01ba98..c3b7a5f 100755
--- a/configure
+++ b/configure
@@ -91,7 +91,7 @@ arch=${arch%%-*}
 
 # see if we have split build and source directories
 if [ -d $kerneldir/include2 ]; then
-kernelsourcedir=${kerneldir%/*}/source
+kernelsourcedir=`readlink -f $kerneldir/source`
 fi
 
 if [ -n $no_uname -a $want_module ]; then


[PATCH] fix include paths when the kernel source and build directory are different

2009-06-03 Thread James Pike
Use correct architecture includes when kernel source and build directory
are different.
Signed-off-by: James Pike ja...@chilon.net
diff --git a/Makefile b/Makefile
index 95e4c81..ad08c45 100644
--- a/Makefile
+++ b/Makefile
@@ -27,8 +27,9 @@ all:: prerequisite
 #  include header priority 1) $LINUX 2) $KERNELDIR 3) include-compat
$(MAKE) -C $(KERNELDIR) M=`pwd` \
LINUXINCLUDE=-I`pwd`/include -Iinclude \
-   $(if $(KERNELSOURCEDIR),-Iinclude2 
-I$(KERNELSOURCEDIR)/include) \
-   -Iarch/${ARCH_DIR}/include -I`pwd`/include-compat \
+   $(if $(KERNELSOURCEDIR),\
+   -Iinclude2 -I$(KERNELSOURCEDIR)/include 
-I$(KERNELSOURCEDIR)/arch/${ARCH_DIR}/include, \
+   -Iarch/${ARCH_DIR}/include) -I`pwd`/include-compat \
-include include/linux/autoconf.h \
-include `pwd`/$(ARCH_DIR)/external-module-compat.h 
$(module_defines) \
$$@


Re: Windows Server 2008 VM performance

2009-06-03 Thread Andrew Theurer

Avi Kivity wrote:

Andrew Theurer wrote:


Is there a virtio_block driver to test?  


There is, but it isn't available yet.

OK.  Can I assume a better virtio_net driver is in the works as well?


Can we find the root cause of the exits (is there a way to get stack 
dump or something that can show where there are coming from)?


Marcelo is working on a super-duper easy to use kvm trace which can 
show what's going on.  The old one is reasonably easy though it 
exports less data.  If you can generate some traces, I'll have a look 
at them.


Thanks Avi.  I'll try out kvm-86 and see if I can generate some kvm 
trace data.


-Andrew

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM-RFC PATCH 2/2] kvm: use POLLHUP to close an irqfd instead of an explicit ioctl

2009-06-03 Thread Paul E. McKenney
On Tue, Jun 02, 2009 at 09:53:29PM -0400, Gregory Haskins wrote:
 Paul E. McKenney wrote:
  On Tue, Jun 02, 2009 at 02:23:14PM -0400, Gregory Haskins wrote:

  Paul E. McKenney wrote:
  
  On Tue, Jun 02, 2009 at 11:15:38AM -0400, Gregory Haskins wrote:


  Assigning an irqfd object to a kvm object creates a relationship that we
  currently manage by having the kvm oject acquire/hold a file* reference 
  to
  the underlying eventfd.  The lifetime of these objects is properly 
  maintained
  by decoupling the two objects whenever the irqfd is closed or kvm is 
  closed,
  whichever comes first.
 
  However, the irqfd close method is less than ideal since it requires 
  two
  system calls to complete (one for ioctl(kvmfd, IRQFD_DEASSIGN), the 
  other for
  close(eventfd)).  This dual-call approach was utilized because there was 
  no
  notification mechanism on the eventfd side at the time irqfd was 
  implemented.
 
  Recently, Davide proposed a patch to send a POLLHUP wakeup whenever an
  eventfd is about to close.  So we eliminate the IRQFD_DEASSIGN ioctl (*)
  vector in favor of sensing the desassign automatically when the fd is 
  closed.
  The resulting code is slightly more complex as a result since we need to
  allow either side to sever the relationship independently.  We utilize 
  SRCU
  to guarantee stable concurrent access to the KVM pointer without adding
  additional atomic operations in the fast path.
 
  At minimum, this design should be acked by both Davide and Paul (cc'd).
 
  (*) The irqfd patch does not exist in any released tree, so the 
  understanding
  is that we can alter the irqfd specific ABI without taking the normal
  precautions, such as CAP bits.
  
  
  A few questions and suggestions interspersed below.
 
Thanx, Paul


  Thanks for the review, Paul.
  
 
  Some questions, clarifications, and mea culpas below.
 
  Thanx, Paul
 

  (FYI: This isn't quite what I was asking you about on IRC yesterday, but
  it's related...and the SRCU portion of the conversation *did* inspire me
  here.  Just note that the stuff I was asking about is still forthcoming)
  
 
  ;-)
 

  Signed-off-by: Gregory Haskins ghask...@novell.com
  CC: Davide Libenzi davi...@xmailserver.org
  CC: Michael S. Tsirkin m...@redhat.com
  CC: Paul E. McKenney paul...@linux.vnet.ibm.com
  ---
 
   include/linux/kvm.h |2 -
   virt/kvm/eventfd.c  |  177 
  +++
   virt/kvm/kvm_main.c |3 +
   3 files changed, 81 insertions(+), 101 deletions(-)
 
  diff --git a/include/linux/kvm.h b/include/linux/kvm.h
  index 632a856..29b62cc 100644
  --- a/include/linux/kvm.h
  +++ b/include/linux/kvm.h
  @@ -482,8 +482,6 @@ struct kvm_x86_mce {
   };
   #endif
 
  -#define KVM_IRQFD_FLAG_DEASSIGN (1  0)
  -
   struct kvm_irqfd {
   __u32 fd;
   __u32 gsi;
  diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
  index f3f2ea1..6ed62e2 100644
  --- a/virt/kvm/eventfd.c
  +++ b/virt/kvm/eventfd.c
  @@ -37,26 +37,63 @@
* 
*/
   struct _irqfd {
  +struct mutex  lock;
  +struct srcu_structsrcu;
   struct kvm   *kvm;
   int   gsi;
  -struct file  *file;
   struct list_head  list;
   poll_tablept;
   wait_queue_head_t*wqh;
   wait_queue_t  wait;
  -struct work_structwork;
  +struct work_structinject;
   };
 
   static void
   irqfd_inject(struct work_struct *work)
   {
  -struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
  -struct kvm *kvm = irqfd-kvm;
  +struct _irqfd *irqfd = container_of(work, struct _irqfd, 
  inject);
  +struct kvm *kvm;
  +int idx;
  +
  +idx = srcu_read_lock(irqfd-srcu);
  +
  +kvm = rcu_dereference(irqfd-kvm);
  
 
 [4]
 
  +if (kvm) {
  +mutex_lock(kvm-lock);
  +kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 
  irqfd-gsi, 1);
  +kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 
  irqfd-gsi, 0);
  +mutex_unlock(kvm-lock);
  +}
  +
  +srcu_read_unlock(irqfd-srcu, idx);
  +}
  
  
  [1]
 
  
  +
  +static void
  +irqfd_disconnect(struct _irqfd *irqfd)
  +{
  +struct kvm *kvm;
  +
  +mutex_lock(irqfd-lock);
  +
  +kvm = rcu_dereference(irqfd-kvm);
  +rcu_assign_pointer(irqfd-kvm, NULL);
  +
  +mutex_unlock(irqfd-lock);
  
  
  [2]
 
  
  +
  +if (!kvm)
  +return;
 
   mutex_lock(kvm-lock);
  -kvm_set_irq(kvm, 

Re: [PATCH 2/2] Add serial number support for virtio_blk, V4

2009-06-03 Thread john cooper
Jens Axboe wrote:
 On Mon, Jun 01 2009, Rusty Russell wrote:
 On Fri, 29 May 2009 01:45:27 pm john cooper wrote:
 virtio_blk-serial-4.patch
 Hate to ask dumb questions, but is there a scsi equivalent of this?  It'd be 
 nice if we could avoid being ATA-specific in the long run...
 
 SCSI has mode pages, where ATA pretty much stuffs everything into the
 identify data.
 
 Also, why u16?
 
 The identify page is word based, so u16 makes sense.

It is actually an artifact left over from the previous
version.  In that case the driver was only pulling
the serial number over, itself constructing a mocked-up
identify page, and needed to be aware of the internal
structure.

Currently qemu fabricates the identify page which the
driver treats as opaque data passed onto the user.  So
the u16 * structure has no meaning.

Sorry to be late in responding here.  A patch to address
this wrinkle and other feedback to follow shortly.

-john


 +/* return ATA identify data
 + */
 +static int virtblk_identify(struct gendisk *disk, void *argp)
 +{
 +   struct virtio_blk *vblk = disk-private_data;
 +   u16 *id;
 +   int err = -ENOMEM;
 +
 +   id = kmalloc(VIRTIO_BLK_ID_BYTES, GFP_KERNEL);
 +   if (!id)
 +   goto out;
 +
 +   err = virtio_config_buf(vblk-vdev, VIRTIO_BLK_F_IDENTIFY,
 +   offsetof(struct virtio_blk_config, identify), id,
 +   VIRTIO_BLK_ID_BYTES);
 


-- 
john.coo...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM-RFC PATCH 2/2] kvm: use POLLHUP to close an irqfd instead of an explicit ioctl

2009-06-03 Thread Gregory Haskins

Thanks again for the review, Paul.  IIUC, you think the design is ok as
it is.

Davide,
   In light of this, would you like to submit patch 1/2 formally with
your SOB at your earliest convenience?  Or would you prefer that I
submit it and you can simply ack it?  Either is fine with me.

-Greg



signature.asc
Description: OpenPGP digital signature


Re: [KVM-RFC PATCH 2/2] kvm: use POLLHUP to close an irqfd instead of an explicit ioctl

2009-06-03 Thread Davide Libenzi
On Wed, 3 Jun 2009, Gregory Haskins wrote:

 
 Thanks again for the review, Paul.  IIUC, you think the design is ok as
 it is.
 
 Davide,
In light of this, would you like to submit patch 1/2 formally with
 your SOB at your earliest convenience?  Or would you prefer that I
 submit it and you can simply ack it?  Either is fine with me.

Will do later today...



- Davide


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Windows Server 2008 VM performance

2009-06-03 Thread Marcelo Tosatti
On Wed, Jun 03, 2009 at 09:21:16AM -0500, Andrew Theurer wrote:
 Avi Kivity wrote:
 Andrew Theurer wrote:


 Is there a virtio_block driver to test?  

 There is, but it isn't available yet.
 OK.  Can I assume a better virtio_net driver is in the works as well?

 Can we find the root cause of the exits (is there a way to get stack  
 dump or something that can show where there are coming from)?

 Marcelo is working on a super-duper easy to use kvm trace which can  
 show what's going on.  The old one is reasonably easy though it  
 exports less data.  If you can generate some traces, I'll have a look  
 at them.

 Thanks Avi.  I'll try out kvm-86 and see if I can generate some kvm  
 trace data.

clone
git://git.kernel.org/pub/scm/linux/kernel/git/marcelo/linux-2.6-x86-kvmtrace.git,
 
checkout (remote) branch kvmtrace, _build with KVM=y,KVM_{INTEL,AMD}=y_ 
(there's a missing export symbol, will pull the upstream fix ASAP).

Then once running the benchmark with this kernel, on a phase thats
indicative of the workload, do:

echo kvm_exit  /debugfs/tracing/set_event
sleep n
cat /debugfs/tracing/trace  /tmp/trace-save.txt

Make n relatively small, 1 or 2 seconds. Would be nice to see UP vs SMP
Win2008 guests.

echo  /debugfs/tracing/set_event to stop tracing
echo  /debugfs/tracing/trace to zero the trace buffer

With some post processing you can the get the exit reason percentages.

Alternatively you can use systemtap (see attached script, need some
adjustment in the entry/exit line number probes) so it calculates the
exit percentages for you.


global exit_names[70]
global latency[70]
global vmexit
global vmentry
global vmx_exit_reason

// run stap -t kvm.stp to find out the overhead of which probe
// and change it here. should be automatic!
global exit_probe_overhead = 645
global entry_probe_overhead = 882
global handlexit_probe_overhead = 405
global overhead

probe begin (1)
{
overhead = 882 + 645 + 405
exit_names[0] = EXCEPTION
exit_names[1] = EXTERNAL INT
exit_names[7] = PENDING INTERRUPT
exit_names[10] = CPUID
exit_names[18] = HYPERCALL
exit_names[28] = CR ACCESS
exit_names[29] = DR ACCESS
exit_names[30] = IO INSTRUCTION (LIGHT)
exit_names[31] = MSR READ
exit_names[32] = MSR WRITE
exit_names[44] = APIC ACCESS
exit_names[60] = IO INSTRUCTION (HEAVY)
}

probe module(kvm_intel).statement(kvm_handle_e...@arch/x86/kvm/vmx.c:3011)
{
vmx_exit_reason = $exit_reason
}

// exit
probe module(kvm_intel).statement(vmx_vcpu_...@arch/x86/kvm/vmx.c:3226) 
{
vmexit = get_cycles()
}

// entry
probe module(kvm_intel).function(vmx_vcpu_run)
{
vmentry = get_cycles()
if (vmx_exit_reason != 12)
latency[vmx_exit_reason]  vmentry - vmexit - overhead
}

//heavy-exit
probe module(kvm).function(kvm_arch_vcpu_ioctl_run)
{
vmx_exit_reason = 60
}

probe end {
   foreach (x in latency-) {
printf(%d: %s\n, x, exit_names[x])
printf (avg %d = sum %d / count %d\n,
   @avg(latency[x]), @sum(latency[x]), @count(latency[x]))
printf (min %d max %d\n, @min(latency[x]), @max(latency[x]))
print(@hist_linear(latency[x], 1000, 5000, 1000))
}

print(\n)
}





Re: [git pull request] kvm-autotest: sync with upstream Autotest SVN

2009-06-03 Thread Uri Lublin

On 06/01/2009 11:41 PM, Eduardo Habkost wrote:

Now that kvm-autotest was included on upstream Autotest SVN
repository[1], the following pull request is a proposal to sync
kvm-autotest.git Autotest SVN trunk, while keeping the kvm-autotest.git
history.

I have set up a git-svn mirror of the Autotest SVN repository[2], and
the upstream Autotest changes included on this pull request come from
this mirror.

The changes available on my branch do the following, to try to make the
resulting history bisectable:

1. Apply some trivial changes to avoid conflicts when merging with
Autotest SVN
2. Merge with Autotest SVN revision 3185 (revision immediately before
kvm-autotest was included)
3. Apply a patch series from Lucas, that converts the kvm-autotest
code to the code that went to Autotest SVN, including a rename
of tests/kvm_runtest_2 to tests/kvm.
4. Merge with Autotest SVN trunk (revision 3189)


The result is a tree that is almost the same that is on upstream
Autotest SVN, except for a few differences that need to be either
removed from kvm-autotest, or sent upstream. The remaining differences
are the following:

$ git diff origin/master github/to-kvm-autotest-merge1 | diffstat
  KVM_REGRESSION_README   |4 ++
  bin/base_sysinfo.py |2 -
  bin/simple_ssh.py   |   70 

  tests/dbench/control.60 |   20 +
  4 files changed, 95 insertions(+), 1 deletion(-)

('origin' is Autotest SVN, 'github' is the tree where the merge results
are available)


References:

  [1] http://autotest.kernel.org/changeset/3187
  [2] http://github.com/ehabkost/autotest/tree/master
  The mirror is synchronized every 30 minutes.


The following changes since commit 2a0a76645f72dcbd495ea37b566d017796eca24e:
   Uri Lublin (1):
 kvm_tests.cfg: adding some missing iso image md5sum values

are available in the git repository at:

   git://github.com/ehabkost/autotest.git to-kvm-autotest-merge1




Pulled, Thanks.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][KVM_AUTOTEST] Added functionality to the preprocessor to run scripts

2009-06-03 Thread Michael Goldish

- Avi Kivity a...@redhat.com wrote:

 Michael Goldish wrote:
  
  
  - What if we're interested in the status for some reason? Its
 value
  may indicate what went wrong with the child process.


  Put it in the exception string.
  
 
  But I want its value to be examined programmatically in the code.
  It's less comfortable to retrieve it from a string.

 
 Then create a custom exception object and put it in a field.
 

  - If we throw an exception we should add a parameter that
 controls
  whether an exception should be thrown (something like

  ignore_status)
  
  because in some cases we don't want to throw an exception.


  I'll complain every time I see it.  If you don't care if the
 command 
  succeeds or not, why run it in the first place?
  
 
  I care, but sometimes I don't want to fail the test when a command
 fails.
  I can put every run_bg() call in a try-except statement, but that
 misses the
  point of raising exceptions in the first place.

 
 I think it's very rare to want to let the test continue even if some 
 command fails.
 
 Can you give examples?

Some commands are not critical, like one that converts the screendumps
from PPM to PNG format. We don't want to fail the test if you don't have
ImageMagick installed.

A similar function used for remote commands is sometimes used for checking
if a file exists on the guest and getting its size at the same time,
using ls. If the file doesn't exist we should SCP it into the guest, not
fail the test.

  exception such as something failed after migration?


  Chained exceptions can provide detailed information.
  
 
  Wouldn't it complicate the test code?
  How can I provide a detailed message such as test command failed
 after migration --
  can you illustrate this in a small example?

 
   try:
  code
  code
  code
   except e:
  raise ChainedException('exception while running migration test',
 e)
 
 Instead of checking each code line, you provide a wrapper for the
 entire test.

It's not informative to add it for the entire test, because we already know
what test failed -- we don't need the exception string for that.

Instead, we'd have to wrap small sections of the test separately and provide
a different exception string for each. The migration test may be split to
four sections: setup, pre-migration, the migration itself and post-migration.
This may not be so bad, but do you still think it's a good solution?

There's also another issue: the utility functions that should raise exceptions
don't know what they're being used for, so there can be exception string
collisions. For example, both run_bg(), used for local processes, and
kvm_spawn.get_command_status_output(), used for SSH commands, would say
something like command %s failed. It would be incorrect to hardcode a more
specific message into any of them (in my opinion).

  But if status != 0, will there actually be a pid to kill?
  
  
  If the timeout expires and the process is still running, status
 is

  None.
  


  functions which can return with three possible outcomes are
 difficult
  to use.
  
 
  I tend to see it as two possible outcomes: completion or timeout. In
 the former
  case, status is returned. In the latter, None is returned.

 
 The caller has to test for three possible outcomes.  Success, failure,
 and 'command is running in the background'.  If your callsites don't 
 check for all three, something's wrong.

When we want the command to succeed on time, we just test for status == 0.
If it's a nonzero integer or None, something went wrong, and we should kill
the process just in case it's still running.

 -- 
 Do not meddle in the internals of kernels, for they are subtle and
 quick to panic.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] VMX Unrestricted mode support

2009-06-03 Thread Nitin A Kamble
Hi Avi,
  I find that the qemu processor reset state is not per the IA32
processor specifications. (Sections 8.1.1 of
http://www.intel.com/Assets/PDF/manual/253668.pdf)

In qemu-kvm.git in file target-i386/helper.c in function cpu_reset the
segment registers are initialized as follows:

cpu_x86_load_seg_cache(env, R_CS, 0xf000, 0x, 0x,
   DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK | 
DESC_R_MASK);
cpu_x86_load_seg_cache(env, R_DS, 0, 0, 0x,
   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
cpu_x86_load_seg_cache(env, R_ES, 0, 0, 0x,
   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
cpu_x86_load_seg_cache(env, R_SS, 0, 0, 0x,
   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
cpu_x86_load_seg_cache(env, R_FS, 0, 0, 0x,
   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
cpu_x86_load_seg_cache(env, R_GS, 0, 0, 0x,
   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);

While the IA32 cpu reset state specification says that Segment Accessed
bit is also 1 at the time of cpu reset. so the above code should look
like this:

cpu_x86_load_seg_cache(env, R_CS, 0xf000, 0x, 0x,
 DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK | 
 DESC_R_MASK | DESC_A_MASK);
cpu_x86_load_seg_cache(env, R_DS, 0, 0, 0x,
 DESC_P_MASK | DESC_S_MASK | DESC_W_MASK | DESC_A_MASK);
cpu_x86_load_seg_cache(env, R_ES, 0, 0, 0x,
 DESC_P_MASK | DESC_S_MASK | DESC_W_MASK| DESC_A_MASK);
cpu_x86_load_seg_cache(env, R_SS, 0, 0, 0x,
 DESC_P_MASK | DESC_S_MASK | DESC_W_MASK |DESC_A_MASK);
cpu_x86_load_seg_cache(env, R_FS, 0, 0, 0x,
 DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
cpu_x86_load_seg_cache(env, R_GS, 0, 0, 0x,
 DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);

This discrepancy is adding the need of the following function in the
unrestricted guest patch.

+static inline u32 get_segment_ar(int seg)
+{
+   if (!enable_unrestricted_guest)
+   return 0xf3;
+
+   switch (seg) {
+   case VCPU_SREG_CS:
+   return 0x9b;
+   case VCPU_SREG_TR:
+   return 0x8b;
+   case VCPU_SREG_LDTR:
+   return 0x82;
+   default:
+   return 0x93;
+   }
+}
+

For the unrestricted guest support either we can fix this discrepancy in
the qemu code, or have a functionality like get_segment_ar() in the kvm
vmx code. 
  what do you suggest ?

Thanks  Regards,
Nitin
   

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][KVM_AUTOTEST] Added functionality to the preprocessor to run scripts

2009-06-03 Thread Avi Kivity

Michael Goldish wrote:
I think it's very rare to want to let the test continue even if some 
command fails.


Can you give examples?



Some commands are not critical, like one that converts the screendumps
from PPM to PNG format. We don't want to fail the test if you don't have
ImageMagick installed.
  


I think you don't want to run the test at all in that case.  But if you 
do, you shouldn't invoke ImageMagick if you don't have it installed.  
Otherwise you can't distinguish between failure due to a corrupted 
source image, out of disk space, or ImageMagick not installed.


Ignoring errors is usually a bug.


A similar function used for remote commands is sometimes used for checking
if a file exists on the guest and getting its size at the same time,
using ls. If the file doesn't exist we should SCP it into the guest, not
fail the test.
  


Again it doesn't distinguish between real errors and 'file does not 
exist'.  Use something like 'if test -f file; then ls -l file; else echo 
missing; fi'.


You can use rsync to implement 'copy unless exists'.



  try:
 code
 code
 code
  except e:
 raise ChainedException('exception while running migration test',
e)

Instead of checking each code line, you provide a wrapper for the
entire test.



It's not informative to add it for the entire test, because we already know
what test failed -- we don't need the exception string for that.

Instead, we'd have to wrap small sections of the test separately and provide
a different exception string for each. The migration test may be split to
four sections: setup, pre-migration, the migration itself and post-migration.
This may not be so bad, but do you still think it's a good solution?
  


You can have a 'stage' variable and assign it your current stage, and 
use it when reporting errors.


Anything is better than a maze of ifs.


There's also another issue: the utility functions that should raise exceptions
don't know what they're being used for, so there can be exception string
collisions. For example, both run_bg(), used for local processes, and
kvm_spawn.get_command_status_output(), used for SSH commands, would say
something like command %s failed. It would be incorrect to hardcode a more
specific message into any of them (in my opinion).

  


The message should explain what happened, no more, no less.


The caller has to test for three possible outcomes.  Success, failure,
and 'command is running in the background'.  If your callsites don't 
check for all three, something's wrong.



When we want the command to succeed on time, we just test for status == 0.
If it's a nonzero integer or None, something went wrong, and we should kill
the process just in case it's still running.
  


But if it's nonzero it has exited already!

However, I feel I'm not making any progress, so I'll stop.  I just wish 
I could use exceptions in the code I write.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] VMX Unrestricted mode support

2009-06-03 Thread Avi Kivity

Nitin A Kamble wrote:

Hi Avi,
  I find that the qemu processor reset state is not per the IA32
processor specifications. (Sections 8.1.1 of
http://www.intel.com/Assets/PDF/manual/253668.pdf)

In qemu-kvm.git in file target-i386/helper.c in function cpu_reset the
segment registers are initialized as follows:

cpu_x86_load_seg_cache(env, R_CS, 0xf000, 0x, 0x,
   DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK | 
DESC_R_MASK);

cpu_x86_load_seg_cache(env, R_DS, 0, 0, 0x,
   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
cpu_x86_load_seg_cache(env, R_ES, 0, 0, 0x,
   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
cpu_x86_load_seg_cache(env, R_SS, 0, 0, 0x,
   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
cpu_x86_load_seg_cache(env, R_FS, 0, 0, 0x,
   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
cpu_x86_load_seg_cache(env, R_GS, 0, 0, 0x,
   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);

While the IA32 cpu reset state specification says that Segment Accessed
bit is also 1 at the time of cpu reset. so the above code should look
like this:

cpu_x86_load_seg_cache(env, R_CS, 0xf000, 0x, 0x,
 DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK | 
 DESC_R_MASK | DESC_A_MASK);

cpu_x86_load_seg_cache(env, R_DS, 0, 0, 0x,
 DESC_P_MASK | DESC_S_MASK | DESC_W_MASK | DESC_A_MASK);
cpu_x86_load_seg_cache(env, R_ES, 0, 0, 0x,
 DESC_P_MASK | DESC_S_MASK | DESC_W_MASK| DESC_A_MASK);
cpu_x86_load_seg_cache(env, R_SS, 0, 0, 0x,
 DESC_P_MASK | DESC_S_MASK | DESC_W_MASK |DESC_A_MASK);
cpu_x86_load_seg_cache(env, R_FS, 0, 0, 0x,
 DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
cpu_x86_load_seg_cache(env, R_GS, 0, 0, 0x,
 DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);

This discrepancy is adding the need of the following function in the
unrestricted guest patch.

+static inline u32 get_segment_ar(int seg)
+{
+   if (!enable_unrestricted_guest)
+   return 0xf3;
+
+   switch (seg) {
+   case VCPU_SREG_CS:
+   return 0x9b;
+   case VCPU_SREG_TR:
+   return 0x8b;
+   case VCPU_SREG_LDTR:
+   return 0x82;
+   default:
+   return 0x93;
+   }
+}
+

For the unrestricted guest support either we can fix this discrepancy in
the qemu code, or have a functionality like get_segment_ar() in the kvm
vmx code. 
  what do you suggest ?


  


Qemu should be fixed of course, but we want kvm to keep working with 
older qemu.  So please also have KVM_SET_SREGS set the A bit on segments 
which are not unusable.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] avoid halted state for in kernel irqchip

2009-06-03 Thread Glauber Costa
This patch is part of a series that tries to provide
a kvm-free apic implementation. In the last interation,
Jan pointed out that halted state management with in kernel
irqchip gets quite messy. I don't disagree.

It broke this series specifically, as init IPIs had the
halted state set. The solution would be to rework halted
state management. But I'm too lazy. Besides, gleb told me
he would do it, which makes it his problem, not mine. ;-)

This patch can be used to bypass the problem entirely:
if kvm apic do not call apic_init_ipi, but a version of it
that does everything but mangling around with cpu states,
the problem becomes a non issue, and my glorious series
can be applied.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 hw/apic.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 86aa6b6..862289d 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -443,8 +443,7 @@ static void apic_get_delivery_bitmask(uint32_t 
*deliver_bitmask,
 }
 }
 
-
-static void apic_init_ipi(APICState *s)
+static void apic_init_ipi_state(APICState *s)
 {
 int i;
 
@@ -466,6 +465,11 @@ static void apic_init_ipi(APICState *s)
 s-next_time = 0;
 
 cpu_reset(s-cpu_env);
+}
+
+static void apic_init_ipi(APICState *s)
+{
+apic_init_ipi_state(s);
 
 if (!(s-apicbase  MSR_IA32_APICBASE_BSP) 
 (!kvm_enabled() || !qemu_kvm_irqchip_in_kernel()))
-- 
1.5.6.6

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] sipi and init: move common code

2009-06-03 Thread Glauber Costa
provide functions to query and reset the state of sipi and
init in cpu's apic. This way we can move the kvm specific functions
out of the apic path.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 cpu-defs.h |2 --
 hw/apic.c  |   49 -
 qemu-kvm.c |   26 ++
 qemu-kvm.h |7 +--
 4 files changed, 55 insertions(+), 29 deletions(-)

diff --git a/cpu-defs.h b/cpu-defs.h
index 1e071e7..e66e928 100644
--- a/cpu-defs.h
+++ b/cpu-defs.h
@@ -140,8 +140,6 @@ typedef struct CPUWatchpoint {
 struct qemu_work_item;
 
 struct KVMCPUState {
-int sipi_needed;
-int init;
 pthread_t thread;
 int signalled;
 int stop;
diff --git a/hw/apic.c b/hw/apic.c
index 862289d..39e1675 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -86,6 +86,8 @@ typedef struct APICState {
 uint32_t initial_count;
 int64_t initial_count_load_time, next_time;
 QEMUTimer *timer;
+int sipi_needed;
+int init;
 } APICState;
 
 static int apic_io_memory;
@@ -93,6 +95,45 @@ static APICState *local_apics[MAX_APICS + 1];
 static int last_apic_id = 0;
 static int apic_irq_delivered;
 
+int apic_init_received(CPUState *env)
+{
+if (!env)
+return 0;
+if (!env-apic_state)
+return 0;
+
+return env-apic_state-init;
+}
+
+int apic_sipi_needed(CPUState *env)
+{
+if (!env)
+return 0;
+if (!env-apic_state)
+return 0;
+
+return env-apic_state-sipi_needed;
+}
+
+void apic_reset_sipi(CPUState *env)
+{
+if (!env)
+return;
+if (!env-apic_state)
+return;
+
+env-apic_state-sipi_needed = 0;
+}
+
+void apic_reset_init(CPUState *env)
+{
+if (!env)
+return;
+if (!env-apic_state)
+return;
+
+env-apic_state-init = 0;
+}
 
 static void apic_init_ipi(APICState *s);
 static void apic_set_irq(APICState *s, int vector_num, int trigger_mode);
@@ -475,9 +516,8 @@ static void apic_init_ipi(APICState *s)
 (!kvm_enabled() || !qemu_kvm_irqchip_in_kernel()))
 s-cpu_env-halted = 1;
 
-if (kvm_enabled()  !qemu_kvm_irqchip_in_kernel())
-   if (s-cpu_env)
-   kvm_apic_init(s-cpu_env);
+if (s-cpu_env  s-cpu_env-cpu_index != 0)
+   s-init = 1;
 }
 
 /* send a SIPI message to the CPU to start it */
@@ -490,8 +530,7 @@ static void apic_startup(APICState *s, int vector_num)
 cpu_x86_load_seg_cache(env, R_CS, vector_num  8, vector_num  12,
0x, 0);
 env-halted = 0;
-if (kvm_enabled()  !qemu_kvm_irqchip_in_kernel())
-   kvm_update_after_sipi(env);
+s-sipi_needed = 1;
 }
 
 static void apic_deliver(APICState *s, uint8_t dest, uint8_t dest_mode,
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 68d3b92..1441cef 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -134,19 +134,6 @@ void kvm_update_interrupt_request(CPUState *env)
 }
 }
 
-void kvm_update_after_sipi(CPUState *env)
-{
-env-kvm_cpu_state.sipi_needed = 1;
-kvm_update_interrupt_request(env);
-}
-
-void kvm_apic_init(CPUState *env)
-{
-if (env-cpu_index != 0)
-   env-kvm_cpu_state.init = 1;
-kvm_update_interrupt_request(env);
-}
-
 #include signal.h
 
 static int try_push_interrupts(void *opaque)
@@ -332,7 +319,7 @@ static void kvm_vm_state_change_handler(void *context, int 
running, int reason)
 static void update_regs_for_sipi(CPUState *env)
 {
 kvm_arch_update_regs_for_sipi(env);
-env-kvm_cpu_state.sipi_needed = 0;
+apic_reset_sipi(env);
 }
 
 static void update_regs_for_init(CPUState *env)
@@ -345,11 +332,10 @@ static void update_regs_for_init(CPUState *env)
 
 #ifdef TARGET_I386
 /* restore SIPI vector */
-if(env-kvm_cpu_state.sipi_needed)
+if (apic_sipi_needed(env))
 env-segs[R_CS] = cs;
 #endif
-
-env-kvm_cpu_state.init = 0;
+apic_reset_init(env);
 kvm_arch_load_regs(env);
 }
 
@@ -407,12 +393,12 @@ static int kvm_main_loop_cpu(CPUState *env)
if (env-interrupt_request  (CPU_INTERRUPT_HARD | CPU_INTERRUPT_NMI))
env-halted = 0;
 if (!kvm_irqchip_in_kernel(kvm_context)) {
-   if (env-kvm_cpu_state.init)
+   if (apic_init_received(env))
update_regs_for_init(env);
-   if (env-kvm_cpu_state.sipi_needed)
+   if (apic_sipi_needed(env))
update_regs_for_sipi(env);
 }
-   if (!env-halted  !env-kvm_cpu_state.init)
+   if (!env-halted)
kvm_cpu_exec(env);
env-exit_request = 0;
 env-exception_index = EXCP_INTERRUPT;
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 725589b..eb7dc29 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -31,9 +31,13 @@ void kvm_remove_all_breakpoints(CPUState *current_env);
 int kvm_update_guest_debug(CPUState *env, unsigned long reinject_trap);
 int kvm_qemu_init_env(CPUState *env);
 int kvm_qemu_check_extension(int ext);
-void kvm_apic_init(CPUState *env);
+/* FIXME: there should be an apic.h file */
 /* called from vcpu initialization */
 

[PATCH 4/4] provide a kvm-free implementation of ioapic

2009-06-03 Thread Glauber Costa
Also, provide a kvm_ioapic that does not depend highly on common code.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 hw/ioapic.c |  162 +-
 hw/pc.c |7 ++-
 hw/pc.h |1 +
 3 files changed, 110 insertions(+), 60 deletions(-)

diff --git a/hw/ioapic.c b/hw/ioapic.c
index 6c178c7..3eb5d5f 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -198,58 +198,11 @@ static void ioapic_mem_writel(void *opaque, 
target_phys_addr_t addr, uint32_t va
 }
 }
 
-static void kvm_kernel_ioapic_save_to_user(IOAPICState *s)
-{
-#if defined(KVM_CAP_IRQCHIP)  defined(TARGET_I386)
-struct kvm_irqchip chip;
-struct kvm_ioapic_state *kioapic;
-int i;
-
-chip.chip_id = KVM_IRQCHIP_IOAPIC;
-kvm_get_irqchip(kvm_context, chip);
-kioapic = chip.chip.ioapic;
-
-s-id = kioapic-id;
-s-ioregsel = kioapic-ioregsel;
-s-base_address = kioapic-base_address;
-s-irr = kioapic-irr;
-for (i = 0; i  IOAPIC_NUM_PINS; i++) {
-s-ioredtbl[i] = kioapic-redirtbl[i].bits;
-}
-#endif
-}
-
-static void kvm_kernel_ioapic_load_from_user(IOAPICState *s)
-{
-#if defined(KVM_CAP_IRQCHIP)  defined(TARGET_I386)
-struct kvm_irqchip chip;
-struct kvm_ioapic_state *kioapic;
-int i;
-
-chip.chip_id = KVM_IRQCHIP_IOAPIC;
-kioapic = chip.chip.ioapic;
-
-kioapic-id = s-id;
-kioapic-ioregsel = s-ioregsel;
-kioapic-base_address = s-base_address;
-kioapic-irr = s-irr;
-for (i = 0; i  IOAPIC_NUM_PINS; i++) {
-kioapic-redirtbl[i].bits = s-ioredtbl[i];
-}
-
-kvm_set_irqchip(kvm_context, chip);
-#endif
-}
-
-static void ioapic_save(QEMUFile *f, void *opaque)
+static void ioapic_save_common(QEMUFile *f, void *opaque)
 {
 IOAPICState *s = opaque;
 int i;
 
-if (kvm_enabled()  qemu_kvm_irqchip_in_kernel()) {
-kvm_kernel_ioapic_save_to_user(s);
-}
-
 qemu_put_8s(f, s-id);
 qemu_put_8s(f, s-ioregsel);
 qemu_put_be64s(f, s-base_address);
@@ -259,7 +212,12 @@ static void ioapic_save(QEMUFile *f, void *opaque)
 }
 }
 
-static int ioapic_load(QEMUFile *f, void *opaque, int version_id)
+static void ioapic_save(QEMUFile *f, void *opaque)
+{
+ioapic_save_common(f, opaque);
+}
+
+static int ioapic_load_common(QEMUFile *f, void *opaque, int version_id)
 {
 IOAPICState *s = opaque;
 int i;
@@ -283,14 +241,15 @@ static int ioapic_load(QEMUFile *f, void *opaque, int 
version_id)
 qemu_get_be64s(f, s-ioredtbl[i]);
 }
 
-if (kvm_enabled()  qemu_kvm_irqchip_in_kernel()) {
-kvm_kernel_ioapic_load_from_user(s);
-}
-
 return 0;
 }
 
-static void ioapic_reset(void *opaque)
+static int ioapic_load(QEMUFile *f, void *opaque, int version_id)
+{
+return ioapic_load_common(f, opaque, version_id);
+}
+
+static void ioapic_reset_common(void *opaque)
 {
 IOAPICState *s = opaque;
 int i;
@@ -299,11 +258,11 @@ static void ioapic_reset(void *opaque)
 s-base_address = IOAPIC_DEFAULT_BASE_ADDRESS;
 for(i = 0; i  IOAPIC_NUM_PINS; i++)
 s-ioredtbl[i] = 1  16; /* mask LVT */
-#ifdef KVM_CAP_IRQCHIP
-if (kvm_enabled()  qemu_kvm_irqchip_in_kernel()) {
-kvm_kernel_ioapic_load_from_user(s);
-}
-#endif
+}
+
+static void ioapic_reset(void *opaque)
+{
+ioapic_reset_common(opaque);
 }
 
 static CPUReadMemoryFunc *ioapic_mem_read[3] = {
@@ -335,3 +294,88 @@ IOAPICState *ioapic_init(void)
 
 return s;
 }
+
+#ifdef KVM_CAP_IRQCHIP
+static void kvm_kernel_ioapic_save_to_user(IOAPICState *s)
+{
+#if defined(TARGET_I386)
+struct kvm_irqchip chip;
+struct kvm_ioapic_state *kioapic;
+int i;
+
+chip.chip_id = KVM_IRQCHIP_IOAPIC;
+kvm_get_irqchip(kvm_context, chip);
+kioapic = chip.chip.ioapic;
+
+s-id = kioapic-id;
+s-ioregsel = kioapic-ioregsel;
+s-base_address = kioapic-base_address;
+s-irr = kioapic-irr;
+for (i = 0; i  IOAPIC_NUM_PINS; i++) {
+s-ioredtbl[i] = kioapic-redirtbl[i].bits;
+}
+#endif
+}
+
+static void kvm_kernel_ioapic_load_from_user(IOAPICState *s)
+{
+#if defined(TARGET_I386)
+struct kvm_irqchip chip;
+struct kvm_ioapic_state *kioapic;
+int i;
+
+chip.chip_id = KVM_IRQCHIP_IOAPIC;
+kioapic = chip.chip.ioapic;
+
+kioapic-id = s-id;
+kioapic-ioregsel = s-ioregsel;
+kioapic-base_address = s-base_address;
+kioapic-irr = s-irr;
+for (i = 0; i  IOAPIC_NUM_PINS; i++) {
+kioapic-redirtbl[i].bits = s-ioredtbl[i];
+}
+
+kvm_set_irqchip(kvm_context, chip);
+#endif
+}
+
+static void kvm_ioapic_save(QEMUFile *f, void *opaque)
+{
+IOAPICState *s = opaque;
+kvm_kernel_ioapic_save_to_user(s);
+
+ioapic_save_common(f, opaque);
+}
+
+static int kvm_ioapic_load(QEMUFile *f, void *opaque, int version_id)
+{
+IOAPICState *s = opaque;
+int r = ioapic_load_common(f, opaque, version_id);
+
+if (r == 0)
+kvm_kernel_ioapic_load_from_user(s);
+return r;
+}
+
+
+static void 

[PATCH 3/4] provide a kvm-free implementation of apic

2009-06-03 Thread Glauber Costa
Also, provide a kvm_apic that does not depend highly on common code.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 hw/apic.c  |  241 ++--
 hw/pc.c|7 ++-
 hw/pc.h|1 +
 qemu-kvm-x86.c |5 +-
 4 files changed, 157 insertions(+), 97 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 39e1675..c20f4a9 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -857,103 +857,11 @@ static void apic_mem_writel(void *opaque, 
target_phys_addr_t addr, uint32_t val)
 }
 }
 
-#ifdef KVM_CAP_IRQCHIP
-
-static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id)
-{
-return *((uint32_t *) (kapic-regs + (reg_id  4)));
-}
-
-static inline void kapic_set_reg(struct kvm_lapic_state *kapic,
- int reg_id, uint32_t val)
-{
-*((uint32_t *) (kapic-regs + (reg_id  4))) = val;
-}
-
-static void kvm_kernel_lapic_save_to_user(APICState *s)
-{
-struct kvm_lapic_state apic;
-struct kvm_lapic_state *kapic = apic;
-int i, v;
-
-kvm_get_lapic(kvm_context, s-cpu_env-cpu_index, kapic);
-
-s-id = kapic_reg(kapic, 0x2)  24;
-s-tpr = kapic_reg(kapic, 0x8);
-s-arb_id = kapic_reg(kapic, 0x9);
-s-log_dest = kapic_reg(kapic, 0xd)  24;
-s-dest_mode = kapic_reg(kapic, 0xe)  28;
-s-spurious_vec = kapic_reg(kapic, 0xf);
-for (i = 0; i  8; i++) {
-s-isr[i] = kapic_reg(kapic, 0x10 + i);
-s-tmr[i] = kapic_reg(kapic, 0x18 + i);
-s-irr[i] = kapic_reg(kapic, 0x20 + i);
-}
-s-esr = kapic_reg(kapic, 0x28);
-s-icr[0] = kapic_reg(kapic, 0x30);
-s-icr[1] = kapic_reg(kapic, 0x31);
-for (i = 0; i  APIC_LVT_NB; i++)
-   s-lvt[i] = kapic_reg(kapic, 0x32 + i);
-s-initial_count = kapic_reg(kapic, 0x38);
-s-divide_conf = kapic_reg(kapic, 0x3e);
-
-v = (s-divide_conf  3) | ((s-divide_conf  1)  4);
-s-count_shift = (v + 1)  7;
-
-s-initial_count_load_time = qemu_get_clock(vm_clock);
-apic_timer_update(s, s-initial_count_load_time);
-}
-
-static void kvm_kernel_lapic_load_from_user(APICState *s)
-{
-struct kvm_lapic_state apic;
-struct kvm_lapic_state *klapic = apic;
-int i;
-
-memset(klapic, 0, sizeof apic);
-kapic_set_reg(klapic, 0x2, s-id  24);
-kapic_set_reg(klapic, 0x8, s-tpr);
-kapic_set_reg(klapic, 0xd, s-log_dest  24);
-kapic_set_reg(klapic, 0xe, s-dest_mode  28 | 0x0fff);
-kapic_set_reg(klapic, 0xf, s-spurious_vec);
-for (i = 0; i  8; i++) {
-kapic_set_reg(klapic, 0x10 + i, s-isr[i]);
-kapic_set_reg(klapic, 0x18 + i, s-tmr[i]);
-kapic_set_reg(klapic, 0x20 + i, s-irr[i]);
-}
-kapic_set_reg(klapic, 0x28, s-esr);
-kapic_set_reg(klapic, 0x30, s-icr[0]);
-kapic_set_reg(klapic, 0x31, s-icr[1]);
-for (i = 0; i  APIC_LVT_NB; i++)
-kapic_set_reg(klapic, 0x32 + i, s-lvt[i]);
-kapic_set_reg(klapic, 0x38, s-initial_count);
-kapic_set_reg(klapic, 0x3e, s-divide_conf);
-
-kvm_set_lapic(kvm_context, s-cpu_env-cpu_index, klapic);
-}
-
-#endif
-
-void qemu_kvm_load_lapic(CPUState *env)
-{
-#ifdef KVM_CAP_IRQCHIP
-if (kvm_enabled()  kvm_vcpu_inited(env)  qemu_kvm_irqchip_in_kernel()) 
{
-kvm_kernel_lapic_load_from_user(env-apic_state);
-}
-#endif
-}
-
 static void apic_save(QEMUFile *f, void *opaque)
 {
 APICState *s = opaque;
 int i;
 
-#ifdef KVM_CAP_IRQCHIP
-if (kvm_enabled()  qemu_kvm_irqchip_in_kernel()) {
-kvm_kernel_lapic_save_to_user(s);
-}
-#endif
-
 qemu_put_be32s(f, s-apicbase);
 qemu_put_8s(f, s-id);
 qemu_put_8s(f, s-arb_id);
@@ -1017,8 +925,6 @@ static int apic_load(QEMUFile *f, void *opaque, int 
version_id)
 if (version_id = 2)
 qemu_get_timer(f, s-timer);
 
-qemu_kvm_load_lapic(s-cpu_env);
-
 return 0;
 }
 
@@ -1039,7 +945,6 @@ static void apic_reset(void *opaque)
  */
 s-lvt[APIC_LVT_LINT0] = 0x700;
 }
-qemu_kvm_load_lapic(s-cpu_env);
 }
 
 static CPUReadMemoryFunc *apic_mem_read[3] = {
@@ -1086,3 +991,149 @@ int apic_init(CPUState *env)
 return 0;
 }
 
+#ifdef KVM_CAP_IRQCHIP
+
+static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id)
+{
+return *((uint32_t *) (kapic-regs + (reg_id  4)));
+}
+
+static inline void kapic_set_reg(struct kvm_lapic_state *kapic,
+ int reg_id, uint32_t val)
+{
+*((uint32_t *) (kapic-regs + (reg_id  4))) = val;
+}
+
+static void kvm_kernel_lapic_save_to_user(APICState *s)
+{
+struct kvm_lapic_state apic;
+struct kvm_lapic_state *kapic = apic;
+int i, v;
+
+kvm_get_lapic(kvm_context, s-cpu_env-cpu_index, kapic);
+
+s-id = kapic_reg(kapic, 0x2)  24;
+s-tpr = kapic_reg(kapic, 0x8);
+s-arb_id = kapic_reg(kapic, 0x9);
+s-log_dest = kapic_reg(kapic, 0xd)  24;
+s-dest_mode = kapic_reg(kapic, 0xe)  28;
+s-spurious_vec = kapic_reg(kapic, 0xf);
+for (i = 0; i  8; i++) {
+s-isr[i] = 

Re: [PATCH][KVM_AUTOTEST] Added functionality to the preprocessor to run scripts

2009-06-03 Thread Michael Goldish

- Avi Kivity a...@redhat.com wrote:

 Michael Goldish wrote:
  I think it's very rare to want to let the test continue even if
 some 
  command fails.
 
  Can you give examples?
  
 
  Some commands are not critical, like one that converts the
 screendumps
  from PPM to PNG format. We don't want to fail the test if you don't
 have
  ImageMagick installed.

 
 I think you don't want to run the test at all in that case.

I do -- the conversion is just intended to save some disk space after the
test completes.

 But if you do, you shouldn't invoke ImageMagick if you don't have it 
 installed. 
 
 Otherwise you can't distinguish between failure due to a corrupted 
 source image, out of disk space, or ImageMagick not installed.

You can, if you look at the output and/or status that run_bg() returns (e.g.
command not found and 127). But you don't need to because you shouldn't
do anything special if the command fails, except inform the user that it
failed. This command is really not critical (we're currently not using it
anyway -- it's just planned for the near future).

 Ignoring errors is usually a bug.

  A similar function used for remote commands is sometimes used for
 checking
  if a file exists on the guest and getting its size at the same
 time,
  using ls. If the file doesn't exist we should SCP it into the guest,
 not
  fail the test.

 
 Again it doesn't distinguish between real errors and 'file does not 
 exist'.  Use something like 'if test -f file; then ls -l file; else
 echo 
 missing; fi'.

I don't see what's wrong with 'ls filename'. It does exactly what the line
you suggested does -- if the file exists it displays some info, and if it
doesn't it says No such file 
In the test we explicitly check for the No such file message, so it's
being handled properly.

 You can use rsync to implement 'copy unless exists'.

The guest may not like rsync. We may have to work with FTP in some cases
(Windows).

 
try:
   code
   code
   code
except e:
   raise ChainedException('exception while running migration
 test',
  e)
 
  Instead of checking each code line, you provide a wrapper for the
  entire test.
  
 
  It's not informative to add it for the entire test, because we
 already know
  what test failed -- we don't need the exception string for that.
 
  Instead, we'd have to wrap small sections of the test separately and
 provide
  a different exception string for each. The migration test may be
 split to
  four sections: setup, pre-migration, the migration itself and
 post-migration.
  This may not be so bad, but do you still think it's a good
 solution?

 
 You can have a 'stage' variable and assign it your current stage, and
 use it when reporting errors.
 
 Anything is better than a maze of ifs.
 
  There's also another issue: the utility functions that should raise
 exceptions
  don't know what they're being used for, so there can be exception
 string
  collisions. For example, both run_bg(), used for local processes,
 and
  kvm_spawn.get_command_status_output(), used for SSH commands, would
 say
  something like command %s failed. It would be incorrect to
 hardcode a more
  specific message into any of them (in my opinion).
 

 
 The message should explain what happened, no more, no less.

What happened depends on the point of view.

It doesn't really help the user to print something like
read_until_output_matches: Timeout elapsed
instead of
Could not log into guest after migration.

It would also not be useful to see
get_command_status_output: command 'tar xfj autotest.tar.bz2' failed with 
status 1
without knowing if it's a remote or a local command (the function itself 
doesn't know).

  The caller has to test for three possible outcomes.  Success,
 failure,
  and 'command is running in the background'.  If your callsites
 don't 
  check for all three, something's wrong.
  
 
  When we want the command to succeed on time, we just test for status
 == 0.
  If it's a nonzero integer or None, something went wrong, and we
 should kill
  the process just in case it's still running.

 
 But if it's nonzero it has exited already!

That's why we use safe_kill(), which kills only if the process is still running.

 However, I feel I'm not making any progress, so I'll stop.  I just
 wish I could use exceptions in the code I write.

We are making progress: I'll try to think of a simple way to start working
with exceptions without breaking too many things. In the worst case, we can
just keep two sets of functions: low-level ones that don't raise anything,
and high level ones that raise an exception whenever anything seems to have
gone wrong.

I'm sorry for being a bit too stubborn. I prefer to understand things
before going ahead and implementing them, and I still feel that some of my
questions remain unanswered.

 
 -- 
 I have a truly marvellous patch that fixes the bug which this
 signature is too narrow to contain.
--
To unsubscribe from this 

Re: [PATCH 0/4] kvm free implementation of apic/ioapic

2009-06-03 Thread Jan Kiszka
Glauber Costa wrote:
 Same series already sent.
 Jan spotted a problem, and my lazyness found a way to bypass it,
 so it does not exist.
 
 A guest survived the following actions after this series is applied:
 * smp boot
 * smp reboot
 * migrate
 * reboot migrated guest.
 

I can confirm that this series no longer suffers from the halted regression.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/4] avoid halted state for in kernel irqchip

2009-06-03 Thread Gleb Natapov
On Wed, Jun 03, 2009 at 02:25:41PM -0400, Glauber Costa wrote:
 This patch is part of a series that tries to provide
 a kvm-free apic implementation. In the last interation,
 Jan pointed out that halted state management with in kernel
 irqchip gets quite messy. I don't disagree.
 
 It broke this series specifically, as init IPIs had the
 halted state set. The solution would be to rework halted
 state management. But I'm too lazy. Besides, gleb told me
 he would do it, which makes it his problem, not mine. ;-)
 
 This patch can be used to bypass the problem entirely:
 if kvm apic do not call apic_init_ipi, but a version of it
 that does everything but mangling around with cpu states,
 the problem becomes a non issue, and my glorious series
 can be applied.
 
May be I miss something, but I don't see apic_init_ipi_state() called
outside of apic_init_ipi() in this patch series, so what is the point of
this patch?

 Signed-off-by: Glauber Costa glom...@redhat.com
 ---
  hw/apic.c |8 ++--
  1 files changed, 6 insertions(+), 2 deletions(-)
 
 diff --git a/hw/apic.c b/hw/apic.c
 index 86aa6b6..862289d 100644
 --- a/hw/apic.c
 +++ b/hw/apic.c
 @@ -443,8 +443,7 @@ static void apic_get_delivery_bitmask(uint32_t 
 *deliver_bitmask,
  }
  }
  
 -
 -static void apic_init_ipi(APICState *s)
 +static void apic_init_ipi_state(APICState *s)
  {
  int i;
  
 @@ -466,6 +465,11 @@ static void apic_init_ipi(APICState *s)
  s-next_time = 0;
  
  cpu_reset(s-cpu_env);
 +}
 +
 +static void apic_init_ipi(APICState *s)
 +{
 +apic_init_ipi_state(s);
  
  if (!(s-apicbase  MSR_IA32_APICBASE_BSP) 
  (!kvm_enabled() || !qemu_kvm_irqchip_in_kernel()))
 -- 
 1.5.6.6
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] sipi and init: move common code

2009-06-03 Thread Gleb Natapov
On Wed, Jun 03, 2009 at 02:25:42PM -0400, Glauber Costa wrote:
 @@ -407,12 +393,12 @@ static int kvm_main_loop_cpu(CPUState *env)
   if (env-interrupt_request  (CPU_INTERRUPT_HARD | CPU_INTERRUPT_NMI))
   env-halted = 0;
  if (!kvm_irqchip_in_kernel(kvm_context)) {
 - if (env-kvm_cpu_state.init)
 + if (apic_init_received(env))
   update_regs_for_init(env);
 - if (env-kvm_cpu_state.sipi_needed)
 + if (apic_sipi_needed(env))
   update_regs_for_sipi(env);
  }
 - if (!env-halted  !env-kvm_cpu_state.init)
 + if (!env-halted)
You remove checking of init here. Why? It looks like it can be safely
removed, but can you do it as a separate patch with explanation why it
is not needed here.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[KVM PATCH v5 0/2] iosignalfd

2009-06-03 Thread Gregory Haskins
(Applies to kvm.git/master:25deed73)

This is v5 of the series.  For more details, please see the header to
patch 2/2.

This series has been tested against the kvm-eventfd unit test, and
appears to be functioning properly.  You can download this test here:

ftp://ftp.novell.com/dev/ghaskins/kvm-eventfd.tar.bz2

This series is ready to be considered for inclusion, pending any further
review comments.

[
   Changelog:

  v5:
   *) Removed cookie field, which was a misunderstanding on my
  part on what Avi wanted for a data-match feature
   *) Added a new trigger data-match feature which I think is
  much closer to what we need.
   *) We retain the dev_count field in the io_bus infrastructure
  and instead back-fill the array on removal.
   *) Various minor cleanups
   *) Rebased to kvm.git/master:25deed73

  v4:
   *) Fixed a bug in the original 2/4 where the PIT failure case
  would potentially leave the io_bus components registered.
   *) Condensed the v3 2/4 and 3/4 into one patch (2/2) since
  the patches became interdependent with the fix described above
   *) Rebased to kvm.git/master:74dfca0a

  v3:
   *) fixed patch 2/4 to handle error cases instead of BUG_ON
   *) implemented same HAVE_EVENTFD protection mechanism as
  irqfd to prevent compilation errors on unsupported arches
   *) completed testing
   *) rebased to kvm.git/master:7391a6d5

  v2:
   *) added optional data-matching capability (via cookie field)
   *) changed name from iofd to iosignalfd
   *) added io_bus unregister function
   *) implemented deassign feature

  v1:
   *) original release (integrated into irqfd v7 series as iofd)
]

---

Gregory Haskins (2):
  kvm: add iosignalfd support
  kvm: make io_bus interface more robust


 arch/x86/kvm/i8254.c  |   22 +++
 arch/x86/kvm/i8259.c  |9 +
 arch/x86/kvm/x86.c|1 
 include/linux/kvm.h   |   15 ++
 include/linux/kvm_host.h  |   16 ++
 virt/kvm/coalesced_mmio.c |8 +
 virt/kvm/eventfd.c|  356 +
 virt/kvm/ioapic.c |9 +
 virt/kvm/kvm_main.c   |   41 +
 9 files changed, 462 insertions(+), 15 deletions(-)

-- 
Signature
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[KVM PATCH v5 1/2] kvm: make io_bus interface more robust

2009-06-03 Thread Gregory Haskins
Today kvm_io_bus_regsiter_dev() returns void and will internally BUG_ON if it
fails.  We want to create dynamic MMIO/PIO entries driven from userspace later
in the series, so we need to enhance the code to be more robust with the
following changes:

   1) Add a return value to the registration function
   2) Fix up all the callsites to check the return code, handle any
  failures, and percolate the error up to the caller.
   3) Add an unregister function that collapses holes in the array

Signed-off-by: Gregory Haskins ghask...@novell.com
---

 arch/x86/kvm/i8254.c  |   22 --
 arch/x86/kvm/i8259.c  |9 -
 include/linux/kvm_host.h  |6 --
 virt/kvm/coalesced_mmio.c |8 ++--
 virt/kvm/ioapic.c |9 +++--
 virt/kvm/kvm_main.c   |   30 --
 6 files changed, 73 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index f91b0e3..f01eabe 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -586,6 +586,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 {
struct kvm_pit *pit;
struct kvm_kpit_state *pit_state;
+   int ret;
 
pit = kzalloc(sizeof(struct kvm_pit), GFP_KERNEL);
if (!pit)
@@ -620,14 +621,31 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
kvm_register_irq_mask_notifier(kvm, 0, pit-mask_notifier);
 
kvm_iodevice_init(pit-dev, pit_dev_ops);
-   kvm_io_bus_register_dev(kvm-pio_bus, pit-dev);
+   ret = kvm_io_bus_register_dev(kvm-pio_bus, pit-dev);
+   if (ret  0)
+   goto fail;
 
if (flags  KVM_PIT_SPEAKER_DUMMY) {
kvm_iodevice_init(pit-speaker_dev, speaker_dev_ops);
-   kvm_io_bus_register_dev(kvm-pio_bus, pit-speaker_dev);
+   ret = kvm_io_bus_register_dev(kvm-pio_bus,
+ pit-speaker_dev);
+   if (ret  0)
+   goto fail;
}
 
return pit;
+
+fail:
+   if (flags  KVM_PIT_SPEAKER_DUMMY)
+   kvm_io_bus_unregister_dev(kvm-pio_bus, pit-speaker_dev);
+
+   kvm_io_bus_unregister_dev(kvm-pio_bus, pit-dev);
+
+   if (pit-irq_source_id = 0)
+   kvm_free_irq_source_id(kvm, pit-irq_source_id);
+
+   kfree(pit);
+   return NULL;
 }
 
 void kvm_free_pit(struct kvm *kvm)
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 2520922..af9b8b4 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -530,6 +530,8 @@ static const struct kvm_io_device_ops picdev_ops = {
 struct kvm_pic *kvm_create_pic(struct kvm *kvm)
 {
struct kvm_pic *s;
+   int ret;
+
s = kzalloc(sizeof(struct kvm_pic), GFP_KERNEL);
if (!s)
return NULL;
@@ -546,6 +548,11 @@ struct kvm_pic *kvm_create_pic(struct kvm *kvm)
 * Initialize PIO device
 */
kvm_iodevice_init(s-dev, picdev_ops);
-   kvm_io_bus_register_dev(kvm-pio_bus, s-dev);
+   ret = kvm_io_bus_register_dev(kvm-pio_bus, s-dev);
+   if (ret  0) {
+   kfree(s);
+   return NULL;
+   }
+
return s;
 }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index bcb94eb..216fe07 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -61,8 +61,10 @@ void kvm_io_bus_init(struct kvm_io_bus *bus);
 void kvm_io_bus_destroy(struct kvm_io_bus *bus);
 struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus,
  gpa_t addr, int len, int is_write);
-void kvm_io_bus_register_dev(struct kvm_io_bus *bus,
-struct kvm_io_device *dev);
+int kvm_io_bus_register_dev(struct kvm_io_bus *bus,
+   struct kvm_io_device *dev);
+void kvm_io_bus_unregister_dev(struct kvm_io_bus *bus,
+   struct kvm_io_device *dev);
 
 struct kvm_vcpu {
struct kvm *kvm;
diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index c4c7ec2..282fcde 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -97,6 +97,7 @@ static const struct kvm_io_device_ops coalesced_mmio_ops = {
 int kvm_coalesced_mmio_init(struct kvm *kvm)
 {
struct kvm_coalesced_mmio_dev *dev;
+   int ret;
 
dev = kzalloc(sizeof(struct kvm_coalesced_mmio_dev), GFP_KERNEL);
if (!dev)
@@ -104,9 +105,12 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
kvm_iodevice_init(dev-dev, coalesced_mmio_ops);
dev-kvm = kvm;
kvm-coalesced_mmio_dev = dev;
-   kvm_io_bus_register_dev(kvm-mmio_bus, dev-dev);
 
-   return 0;
+   ret = kvm_io_bus_register_dev(kvm-mmio_bus, dev-dev);
+   if (ret  0)
+   kfree(dev);
+
+   return ret;
 }
 
 int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 6b00433..4a37ea1 100644

[KVM PATCH v5 2/2] kvm: add iosignalfd support

2009-06-03 Thread Gregory Haskins
iosignalfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
signal when written to by a guest.  Host userspace can register any arbitrary
IO address with a corresponding eventfd and then pass the eventfd to a
specific end-point of interest for handling.

Normal IO requires a blocking round-trip since the operation may cause
side-effects in the emulated model or may return data to the caller.
Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
heavy-weight exit back to userspace, and is ultimately serviced by qemu's
device model synchronously before returning control back to the vcpu.

However, there is a subclass of IO which acts purely as a trigger for
other IO (such as to kick off an out-of-band DMA request, etc).  For these
patterns, the synchronous call is particularly expensive since we really
only want to simply get our notification transmitted asychronously and
return as quickly as possible.  All the sychronous infrastructure to ensure
proper data-dependencies are met in the normal IO case are just unecessary
overhead for signalling.  This adds additional computational load on the
system, as well as latency to the signalling path.

Therefore, we provide a mechanism for registration of an in-kernel trigger
point that allows the VCPU to only require a very brief, lightweight
exit just long enough to signal an eventfd.  This also means that any
clients compatible with the eventfd interface (which includes userspace
and kernelspace equally well) can now register to be notified. The end
result should be a more flexible and higher performance notification API
for the backend KVM hypervisor and perhipheral components.

To test this theory, we built a test-harness called doorbell.  This
module has a function called doorbell_ring() which simply increments a
counter for each time the doorbell is signaled.  It supports signalling
from either an eventfd, or an ioctl().

We then wired up two paths to the doorbell: One via QEMU via a registered
io region and through the doorbell ioctl().  The other is direct via
iosignalfd.

You can download this test harness here:

ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2

The measured results are as follows:

qemu-mmio:   11 iops, 9.09us rtt
iosignalfd-mmio: 200100 iops, 5.00us rtt
iosignalfd-pio:  367300 iops, 2.72us rtt

I didn't measure qemu-pio, because I have to figure out how to register a
PIO region with qemu's device model, and I got lazy.  However, for now we
can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO,
and -350ns for HC, we get:

qemu-pio:  153139 iops, 6.53us rtt
iosignalfd-hc: 412585 iops, 2.37us rtt

these are just for fun, for now, until I can gather more data.

Here is a graph for your convenience:

http://developer.novell.com/wiki/images/7/76/Iofd-chart.png

The conclusion to draw is that we save about 4us by skipping the userspace
hop.



Signed-off-by: Gregory Haskins ghask...@novell.com
---

 arch/x86/kvm/x86.c   |1 
 include/linux/kvm.h  |   15 ++
 include/linux/kvm_host.h |   10 +
 virt/kvm/eventfd.c   |  356 ++
 virt/kvm/kvm_main.c  |   11 +
 5 files changed, 389 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c1ed485..c96c0e3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1097,6 +1097,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_IRQ_INJECT_STATUS:
case KVM_CAP_ASSIGN_DEV_IRQ:
case KVM_CAP_IRQFD:
+   case KVM_CAP_IOSIGNALFD:
case KVM_CAP_PIT2:
r = 1;
break;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 632a856..53b720d 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -300,6 +300,19 @@ struct kvm_guest_debug {
struct kvm_guest_debug_arch arch;
 };
 
+#define KVM_IOSIGNALFD_FLAG_TRIGGER   (1  0)
+#define KVM_IOSIGNALFD_FLAG_PIO   (1  1)
+#define KVM_IOSIGNALFD_FLAG_DEASSIGN  (1  2)
+
+struct kvm_iosignalfd {
+   __u64 trigger;
+   __u64 addr;
+   __u32 len;
+   __u32 fd;
+   __u32 flags;
+   __u8  pad[36];
+};
+
 #define KVM_TRC_SHIFT   16
 /*
  * kvm trace categories
@@ -430,6 +443,7 @@ struct kvm_trace_rec {
 #ifdef __KVM_HAVE_PIT
 #define KVM_CAP_PIT2 33
 #endif
+#define KVM_CAP_IOSIGNALFD 34
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -537,6 +551,7 @@ struct kvm_irqfd {
 #define KVM_DEASSIGN_DEV_IRQ   _IOW(KVMIO, 0x75, struct kvm_assigned_irq)
 #define KVM_IRQFD  _IOW(KVMIO, 0x76, struct kvm_irqfd)
 #define KVM_CREATE_PIT2   _IOW(KVMIO, 0x77, struct 
kvm_pit_config)
+#define KVM_IOSIGNALFD _IOW(KVMIO, 0x78, struct kvm_iosignalfd)
 
 /*
  * ioctls for vcpu fds
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 216fe07..b705960 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -138,6 +138,7 @@ 

[PATCH v3] qemu-kvm: add iosignalfd support

2009-06-03 Thread Gregory Haskins
An iosignalfd allows an eventfd to attach to a specific PIO/MMIO region in the
guest.  Any guest-writes to that region will trigger an eventfd signal.

[
  This userspace patch coorelates to the kvm.git patches, v5, which you
  can find here: http://lkml.org/lkml/2009/6/3/433, and are based on top
  of the irqfd userspace patches which have not yet been accepted upstream.
]

Signed-off-by: Gregory Haskins ghask...@novell.com
---

 kvm/libkvm/libkvm.c |   68 +++
 kvm/libkvm/libkvm.h |   42 
 2 files changed, 110 insertions(+), 0 deletions(-)

diff --git a/kvm/libkvm/libkvm.c b/kvm/libkvm/libkvm.c
index 0ada15c..cb558fc 100644
--- a/kvm/libkvm/libkvm.c
+++ b/kvm/libkvm/libkvm.c
@@ -1493,3 +1493,71 @@ int kvm_irqfd(kvm_context_t kvm, int gsi, int flags)
 }
 
 #endif /* KVM_CAP_IRQFD */
+
+#ifdef KVM_CAP_IOSIGNALFD
+
+#include sys/eventfd.h
+
+int kvm_assign_iosignalfd(kvm_context_t kvm, unsigned long addr, size_t len,
+ int fd, void *trigger, int flags)
+{
+   int r;
+   int type = flags  IOSIGNALFD_FLAG_PIO; 
+   struct kvm_iosignalfd data = {
+   .trigger = (__u64)trigger,
+   .addr= addr,
+   .len = len,
+   .fd  = fd,
+   };
+
+   data.flags |= trigger ? KVM_IOSIGNALFD_FLAG_TRIGGER : 0;
+   data.flags |= type ? KVM_IOSIGNALFD_FLAG_PIO : 0;
+
+   if (!kvm_check_extension(kvm, KVM_CAP_IOSIGNALFD))
+   return -ENOENT;
+
+   r = ioctl(kvm-vm_fd, KVM_IOSIGNALFD, data);
+   if (r == -1)
+   r = -errno;
+   return r;
+}
+
+int kvm_deassign_iosignalfd(kvm_context_t kvm, unsigned long addr, int fd,
+   int flags)
+{
+   int r;
+   int type = flags  IOSIGNALFD_FLAG_PIO; 
+   struct kvm_iosignalfd data = {
+   .addr= addr,
+   .fd  = fd,
+   .flags   = KVM_IOSIGNALFD_FLAG_DEASSIGN |
+   (type ? KVM_IOSIGNALFD_FLAG_PIO : 0),
+   };
+
+   if (!kvm_check_extension(kvm, KVM_CAP_IOSIGNALFD))
+   return -ENOENT;
+
+   r = ioctl(kvm-vm_fd, KVM_IOSIGNALFD, data);
+   if (r == -1)
+   r = -errno;
+   return r;
+}
+
+#else /* KVM_CAP_IOSIGNALFD */
+
+int kvm_assign_iosignalfd(kvm_context_t kvm, unsigned long addr, size_t len,
+ int fd, void *trigger, int flags)
+{
+   return -ENOSYS;
+}
+
+int kvm_deassign_iosignalfd(kvm_context_t kvm, unsigned long addr, int fd,
+   int flags)
+{
+   return -ENOSYS;
+}
+
+#endif /* KVM_CAP_IOSIGNALFD */
+
+
+
diff --git a/kvm/libkvm/libkvm.h b/kvm/libkvm/libkvm.h
index aca8ed6..65fbe2a 100644
--- a/kvm/libkvm/libkvm.h
+++ b/kvm/libkvm/libkvm.h
@@ -870,6 +870,48 @@ int kvm_get_irq_route_gsi(kvm_context_t kvm);
  */
 int kvm_irqfd(kvm_context_t kvm, int gsi, int flags);
 
+enum {
+   iosignalfd_option_pio,
+};
+
+#define IOSIGNALFD_FLAG_PIO(1  iosignalfd_option_pio)
+
+/*!
+ * \brief Assign an eventfd to an IO port (PIO or MMIO)
+ *
+ * Assigns an eventfd based file-descriptor to a specific PIO or MMIO
+ * address range.  Any guest writes to the specified range will generate
+ * an eventfd signal.
+ *
+ * A data-match pointer can be optionally provided in trigger and only
+ * writes which match this value exactly will generate an event.  The length
+ * of the trigger is established by the length of the overall IO range, and
+ * therefore must be in a natural byte-width for the IO routines of your
+ * particular architecture (e.g. 1, 2, 4, or 8 bytes on x86_64).
+ *
+ * \param kvm Pointer to the current kvm_context
+ * \param addr The IO address
+ * \param len The length of the IO region at the address
+ * \param fd The eventfd file-descriptor
+ * \param trigger A optional pointer providing data-match token
+ * \param flags FLAG_PIO: PIO, else MMIO
+ */
+int kvm_assign_iosignalfd(kvm_context_t kvm, unsigned long addr, size_t len,
+ int fd, void *trigger, int flags);
+
+/*!
+ * \brief Deassign an iosignalfd from a previously registered IO port
+ *
+ * Deassigns an iosignalfd previously registered with kvm_assign_iosignalfd()
+ *
+ * \param kvm Pointer to the current kvm_context
+ * \param addr The IO address to deassign
+ * \param fd The eventfd file-descriptor
+ * \param flags FLAG_PIO: PIO, else MMIO
+ */
+int kvm_deassign_iosignalfd(kvm_context_t kvm, unsigned long addr,
+   int fd, int flags);
+
 #ifdef KVM_CAP_DEVICE_MSIX
 int kvm_assign_set_msix_nr(kvm_context_t kvm,
   struct kvm_assigned_msix_nr *msix_nr);

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v5 2/2] kvm: add iosignalfd support

2009-06-03 Thread Gregory Haskins
Hi Paul,
  Sorry to bug you again, but here is yet another RCU related patch. 
See inline

Gregory Haskins wrote:
 iosignalfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
 signal when written to by a guest.  Host userspace can register any arbitrary
 IO address with a corresponding eventfd and then pass the eventfd to a
 specific end-point of interest for handling.

 Normal IO requires a blocking round-trip since the operation may cause
 side-effects in the emulated model or may return data to the caller.
 Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
 heavy-weight exit back to userspace, and is ultimately serviced by qemu's
 device model synchronously before returning control back to the vcpu.

 However, there is a subclass of IO which acts purely as a trigger for
 other IO (such as to kick off an out-of-band DMA request, etc).  For these
 patterns, the synchronous call is particularly expensive since we really
 only want to simply get our notification transmitted asychronously and
 return as quickly as possible.  All the sychronous infrastructure to ensure
 proper data-dependencies are met in the normal IO case are just unecessary
 overhead for signalling.  This adds additional computational load on the
 system, as well as latency to the signalling path.

 Therefore, we provide a mechanism for registration of an in-kernel trigger
 point that allows the VCPU to only require a very brief, lightweight
 exit just long enough to signal an eventfd.  This also means that any
 clients compatible with the eventfd interface (which includes userspace
 and kernelspace equally well) can now register to be notified. The end
 result should be a more flexible and higher performance notification API
 for the backend KVM hypervisor and perhipheral components.

 To test this theory, we built a test-harness called doorbell.  This
 module has a function called doorbell_ring() which simply increments a
 counter for each time the doorbell is signaled.  It supports signalling
 from either an eventfd, or an ioctl().

 We then wired up two paths to the doorbell: One via QEMU via a registered
 io region and through the doorbell ioctl().  The other is direct via
 iosignalfd.

 You can download this test harness here:

 ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2

 The measured results are as follows:

 qemu-mmio:   11 iops, 9.09us rtt
 iosignalfd-mmio: 200100 iops, 5.00us rtt
 iosignalfd-pio:  367300 iops, 2.72us rtt

 I didn't measure qemu-pio, because I have to figure out how to register a
 PIO region with qemu's device model, and I got lazy.  However, for now we
 can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO,
 and -350ns for HC, we get:

 qemu-pio:  153139 iops, 6.53us rtt
 iosignalfd-hc: 412585 iops, 2.37us rtt

 these are just for fun, for now, until I can gather more data.

 Here is a graph for your convenience:

 http://developer.novell.com/wiki/images/7/76/Iofd-chart.png

 The conclusion to draw is that we save about 4us by skipping the userspace
 hop.

 

 Signed-off-by: Gregory Haskins ghask...@novell.com
 ---

  arch/x86/kvm/x86.c   |1 
  include/linux/kvm.h  |   15 ++
  include/linux/kvm_host.h |   10 +
  virt/kvm/eventfd.c   |  356 
 ++
  virt/kvm/kvm_main.c  |   11 +
  5 files changed, 389 insertions(+), 4 deletions(-)

 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index c1ed485..c96c0e3 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -1097,6 +1097,7 @@ int kvm_dev_ioctl_check_extension(long ext)
   case KVM_CAP_IRQ_INJECT_STATUS:
   case KVM_CAP_ASSIGN_DEV_IRQ:
   case KVM_CAP_IRQFD:
 + case KVM_CAP_IOSIGNALFD:
   case KVM_CAP_PIT2:
   r = 1;
   break;
 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index 632a856..53b720d 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -300,6 +300,19 @@ struct kvm_guest_debug {
   struct kvm_guest_debug_arch arch;
  };
  
 +#define KVM_IOSIGNALFD_FLAG_TRIGGER   (1  0)
 +#define KVM_IOSIGNALFD_FLAG_PIO   (1  1)
 +#define KVM_IOSIGNALFD_FLAG_DEASSIGN  (1  2)
 +
 +struct kvm_iosignalfd {
 + __u64 trigger;
 + __u64 addr;
 + __u32 len;
 + __u32 fd;
 + __u32 flags;
 + __u8  pad[36];
 +};
 +
  #define KVM_TRC_SHIFT   16
  /*
   * kvm trace categories
 @@ -430,6 +443,7 @@ struct kvm_trace_rec {
  #ifdef __KVM_HAVE_PIT
  #define KVM_CAP_PIT2 33
  #endif
 +#define KVM_CAP_IOSIGNALFD 34
  
  #ifdef KVM_CAP_IRQ_ROUTING
  
 @@ -537,6 +551,7 @@ struct kvm_irqfd {
  #define KVM_DEASSIGN_DEV_IRQ   _IOW(KVMIO, 0x75, struct kvm_assigned_irq)
  #define KVM_IRQFD  _IOW(KVMIO, 0x76, struct kvm_irqfd)
  #define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct 
 kvm_pit_config)
 +#define KVM_IOSIGNALFD _IOW(KVMIO, 0x78, struct kvm_iosignalfd)
  
  /*
   * 

Re: [PATCH 1/4] avoid halted state for in kernel irqchip

2009-06-03 Thread Glauber Costa
On Wed, Jun 03, 2009 at 10:50:59PM +0300, Gleb Natapov wrote:
 On Wed, Jun 03, 2009 at 02:25:41PM -0400, Glauber Costa wrote:
  This patch is part of a series that tries to provide
  a kvm-free apic implementation. In the last interation,
  Jan pointed out that halted state management with in kernel
  irqchip gets quite messy. I don't disagree.
  
  It broke this series specifically, as init IPIs had the
  halted state set. The solution would be to rework halted
  state management. But I'm too lazy. Besides, gleb told me
  he would do it, which makes it his problem, not mine. ;-)
  
  This patch can be used to bypass the problem entirely:
  if kvm apic do not call apic_init_ipi, but a version of it
  that does everything but mangling around with cpu states,
  the problem becomes a non issue, and my glorious series
  can be applied.
  
 May be I miss something, but I don't see apic_init_ipi_state() called
 outside of apic_init_ipi() in this patch series, so what is the point of
 this patch?
it should be called from patch 3.

But I made a mistake, and in fact it is not. My bad.

 
  Signed-off-by: Glauber Costa glom...@redhat.com
  ---
   hw/apic.c |8 ++--
   1 files changed, 6 insertions(+), 2 deletions(-)
  
  diff --git a/hw/apic.c b/hw/apic.c
  index 86aa6b6..862289d 100644
  --- a/hw/apic.c
  +++ b/hw/apic.c
  @@ -443,8 +443,7 @@ static void apic_get_delivery_bitmask(uint32_t 
  *deliver_bitmask,
   }
   }
   
  -
  -static void apic_init_ipi(APICState *s)
  +static void apic_init_ipi_state(APICState *s)
   {
   int i;
   
  @@ -466,6 +465,11 @@ static void apic_init_ipi(APICState *s)
   s-next_time = 0;
   
   cpu_reset(s-cpu_env);
  +}
  +
  +static void apic_init_ipi(APICState *s)
  +{
  +apic_init_ipi_state(s);
   
   if (!(s-apicbase  MSR_IA32_APICBASE_BSP) 
   (!kvm_enabled() || !qemu_kvm_irqchip_in_kernel()))
  -- 
  1.5.6.6
  
  --
  To unsubscribe from this list: send the line unsubscribe kvm in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 --
   Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] sipi and init: move common code

2009-06-03 Thread Glauber Costa
On Wed, Jun 03, 2009 at 10:53:05PM +0300, Gleb Natapov wrote:
 On Wed, Jun 03, 2009 at 02:25:42PM -0400, Glauber Costa wrote:
  @@ -407,12 +393,12 @@ static int kvm_main_loop_cpu(CPUState *env)
  if (env-interrupt_request  (CPU_INTERRUPT_HARD | CPU_INTERRUPT_NMI))
  env-halted = 0;
   if (!kvm_irqchip_in_kernel(kvm_context)) {
  -   if (env-kvm_cpu_state.init)
  +   if (apic_init_received(env))
  update_regs_for_init(env);
  -   if (env-kvm_cpu_state.sipi_needed)
  +   if (apic_sipi_needed(env))
  update_regs_for_sipi(env);
   }
  -   if (!env-halted  !env-kvm_cpu_state.init)
  +   if (!env-halted)
 You remove checking of init here. Why? It looks like it can be safely
 removed, but can you do it as a separate patch with explanation why it
 is not needed here.
I'll update it to not touch this code, since it is not the aim of this series.
we can remove it later.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Add serial number support for virtio_blk, V4a

2009-06-03 Thread john cooper
This patch extracts the opaque data from pci i/o
region 0 via the added VIRTIO_BLK_F_IDENTIFY
field.  By convention this data takes the form of
that returned by an ATA IDENTIFY DEVICE command,
however the driver (except for structure size)
makes no interpretation of the data.  The structure
data is copied wholesale to userspace via a
HDIO_GET_IDENTITY ioctl command (eg: hdparm -i dev).

Signed-off-by: john cooper john.coo...@redhat.com
---

 drivers/block/virtio_blk.c |   41 ++---
 include/linux/virtio_blk.h |6 ++
 2 files changed, 44 insertions(+), 3 deletions(-)
=
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -146,12 +146,46 @@ static void do_virtblk_request(struct re
vblk-vq-vq_ops-kick(vblk-vq);
 }
 
+/* return ATA identify data
+ */
+static int virtblk_identify(struct gendisk *disk, void *argp)
+{
+   struct virtio_blk *vblk = disk-private_data;
+   void *opaque;
+   int err = -ENOMEM;
+
+   opaque = kmalloc(VIRTIO_BLK_ID_BYTES, GFP_KERNEL);
+   if (!opaque)
+   goto out;
+
+   err = virtio_config_buf(vblk-vdev, VIRTIO_BLK_F_IDENTIFY,
+   offsetof(struct virtio_blk_config, identify), opaque,
+   VIRTIO_BLK_ID_BYTES);
+
+   if (err)
+   goto out_kfree;
+
+   if (copy_to_user(argp, opaque, VIRTIO_BLK_ID_BYTES))
+   err = -EFAULT;
+
+out_kfree:
+   kfree(opaque);
+out:
+   return err;
+}
+
 static int virtblk_ioctl(struct block_device *bdev, fmode_t mode,
 unsigned cmd, unsigned long data)
 {
-   return scsi_cmd_ioctl(bdev-bd_disk-queue,
- bdev-bd_disk, mode, cmd,
- (void __user *)data);
+   struct gendisk *disk = bdev-bd_disk;
+   void __user *argp = (void __user *)data;
+
+   switch (cmd) {
+   case HDIO_GET_IDENTITY:
+   return virtblk_identify(disk, argp);
+   default:
+   return scsi_cmd_ioctl(disk-queue, disk, mode, cmd, argp);
+   }
 }
 
 /* We provide getgeo only to please some old bootloader/partitioning tools */
@@ -356,6 +390,7 @@ static struct virtio_device_id id_table[
 static unsigned int features[] = {
VIRTIO_BLK_F_BARRIER, VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX,
VIRTIO_BLK_F_GEOMETRY, VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE,
+   VIRTIO_BLK_F_IDENTIFY
 };
 
 static struct virtio_driver virtio_blk = {
=
--- a/include/linux/virtio_blk.h
+++ b/include/linux/virtio_blk.h
@@ -15,7 +15,12 @@
 #define VIRTIO_BLK_F_GEOMETRY  4   /* Legacy geometry available  */
 #define VIRTIO_BLK_F_RO5   /* Disk is read-only */
 #define VIRTIO_BLK_F_BLK_SIZE  6   /* Block size of disk is available*/
+#define VIRTIO_BLK_F_IDENTIFY  8   /* ATA IDENTIFY supported */
 
+#define VIRTIO_BLK_ID_BYTES(sizeof (__u16[256]))   /* IDENTIFY DATA */
+
+/* mapped into pci i/o region 0
+ */
 struct virtio_blk_config
 {
/* The capacity (in 512-byte sectors). */
@@ -32,6 +37,7 @@ struct virtio_blk_config
} geometry;
/* block size of device (if VIRTIO_BLK_F_BLK_SIZE) */
__u32 blk_size;
+   __u8 identify[VIRTIO_BLK_ID_BYTES];
 } __attribute__((packed));
 
 /* These two define direction. */

-- 
john.coo...@redhat.com

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] provide a kvm-free implementation of ioapic

2009-06-03 Thread Glauber Costa
Also, provide a kvm_ioapic that does not depend highly on common code.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 hw/ioapic.c |  162 +-
 hw/pc.c |7 ++-
 hw/pc.h |1 +
 3 files changed, 110 insertions(+), 60 deletions(-)

diff --git a/hw/ioapic.c b/hw/ioapic.c
index 6c178c7..3eb5d5f 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -198,58 +198,11 @@ static void ioapic_mem_writel(void *opaque, 
target_phys_addr_t addr, uint32_t va
 }
 }
 
-static void kvm_kernel_ioapic_save_to_user(IOAPICState *s)
-{
-#if defined(KVM_CAP_IRQCHIP)  defined(TARGET_I386)
-struct kvm_irqchip chip;
-struct kvm_ioapic_state *kioapic;
-int i;
-
-chip.chip_id = KVM_IRQCHIP_IOAPIC;
-kvm_get_irqchip(kvm_context, chip);
-kioapic = chip.chip.ioapic;
-
-s-id = kioapic-id;
-s-ioregsel = kioapic-ioregsel;
-s-base_address = kioapic-base_address;
-s-irr = kioapic-irr;
-for (i = 0; i  IOAPIC_NUM_PINS; i++) {
-s-ioredtbl[i] = kioapic-redirtbl[i].bits;
-}
-#endif
-}
-
-static void kvm_kernel_ioapic_load_from_user(IOAPICState *s)
-{
-#if defined(KVM_CAP_IRQCHIP)  defined(TARGET_I386)
-struct kvm_irqchip chip;
-struct kvm_ioapic_state *kioapic;
-int i;
-
-chip.chip_id = KVM_IRQCHIP_IOAPIC;
-kioapic = chip.chip.ioapic;
-
-kioapic-id = s-id;
-kioapic-ioregsel = s-ioregsel;
-kioapic-base_address = s-base_address;
-kioapic-irr = s-irr;
-for (i = 0; i  IOAPIC_NUM_PINS; i++) {
-kioapic-redirtbl[i].bits = s-ioredtbl[i];
-}
-
-kvm_set_irqchip(kvm_context, chip);
-#endif
-}
-
-static void ioapic_save(QEMUFile *f, void *opaque)
+static void ioapic_save_common(QEMUFile *f, void *opaque)
 {
 IOAPICState *s = opaque;
 int i;
 
-if (kvm_enabled()  qemu_kvm_irqchip_in_kernel()) {
-kvm_kernel_ioapic_save_to_user(s);
-}
-
 qemu_put_8s(f, s-id);
 qemu_put_8s(f, s-ioregsel);
 qemu_put_be64s(f, s-base_address);
@@ -259,7 +212,12 @@ static void ioapic_save(QEMUFile *f, void *opaque)
 }
 }
 
-static int ioapic_load(QEMUFile *f, void *opaque, int version_id)
+static void ioapic_save(QEMUFile *f, void *opaque)
+{
+ioapic_save_common(f, opaque);
+}
+
+static int ioapic_load_common(QEMUFile *f, void *opaque, int version_id)
 {
 IOAPICState *s = opaque;
 int i;
@@ -283,14 +241,15 @@ static int ioapic_load(QEMUFile *f, void *opaque, int 
version_id)
 qemu_get_be64s(f, s-ioredtbl[i]);
 }
 
-if (kvm_enabled()  qemu_kvm_irqchip_in_kernel()) {
-kvm_kernel_ioapic_load_from_user(s);
-}
-
 return 0;
 }
 
-static void ioapic_reset(void *opaque)
+static int ioapic_load(QEMUFile *f, void *opaque, int version_id)
+{
+return ioapic_load_common(f, opaque, version_id);
+}
+
+static void ioapic_reset_common(void *opaque)
 {
 IOAPICState *s = opaque;
 int i;
@@ -299,11 +258,11 @@ static void ioapic_reset(void *opaque)
 s-base_address = IOAPIC_DEFAULT_BASE_ADDRESS;
 for(i = 0; i  IOAPIC_NUM_PINS; i++)
 s-ioredtbl[i] = 1  16; /* mask LVT */
-#ifdef KVM_CAP_IRQCHIP
-if (kvm_enabled()  qemu_kvm_irqchip_in_kernel()) {
-kvm_kernel_ioapic_load_from_user(s);
-}
-#endif
+}
+
+static void ioapic_reset(void *opaque)
+{
+ioapic_reset_common(opaque);
 }
 
 static CPUReadMemoryFunc *ioapic_mem_read[3] = {
@@ -335,3 +294,88 @@ IOAPICState *ioapic_init(void)
 
 return s;
 }
+
+#ifdef KVM_CAP_IRQCHIP
+static void kvm_kernel_ioapic_save_to_user(IOAPICState *s)
+{
+#if defined(TARGET_I386)
+struct kvm_irqchip chip;
+struct kvm_ioapic_state *kioapic;
+int i;
+
+chip.chip_id = KVM_IRQCHIP_IOAPIC;
+kvm_get_irqchip(kvm_context, chip);
+kioapic = chip.chip.ioapic;
+
+s-id = kioapic-id;
+s-ioregsel = kioapic-ioregsel;
+s-base_address = kioapic-base_address;
+s-irr = kioapic-irr;
+for (i = 0; i  IOAPIC_NUM_PINS; i++) {
+s-ioredtbl[i] = kioapic-redirtbl[i].bits;
+}
+#endif
+}
+
+static void kvm_kernel_ioapic_load_from_user(IOAPICState *s)
+{
+#if defined(TARGET_I386)
+struct kvm_irqchip chip;
+struct kvm_ioapic_state *kioapic;
+int i;
+
+chip.chip_id = KVM_IRQCHIP_IOAPIC;
+kioapic = chip.chip.ioapic;
+
+kioapic-id = s-id;
+kioapic-ioregsel = s-ioregsel;
+kioapic-base_address = s-base_address;
+kioapic-irr = s-irr;
+for (i = 0; i  IOAPIC_NUM_PINS; i++) {
+kioapic-redirtbl[i].bits = s-ioredtbl[i];
+}
+
+kvm_set_irqchip(kvm_context, chip);
+#endif
+}
+
+static void kvm_ioapic_save(QEMUFile *f, void *opaque)
+{
+IOAPICState *s = opaque;
+kvm_kernel_ioapic_save_to_user(s);
+
+ioapic_save_common(f, opaque);
+}
+
+static int kvm_ioapic_load(QEMUFile *f, void *opaque, int version_id)
+{
+IOAPICState *s = opaque;
+int r = ioapic_load_common(f, opaque, version_id);
+
+if (r == 0)
+kvm_kernel_ioapic_load_from_user(s);
+return r;
+}
+
+
+static void 

[PATCH 2/4] sipi and init: move common code

2009-06-03 Thread Glauber Costa
provide functions to query and reset the state of sipi and
init in cpu's apic. This way we can move the kvm specific functions
out of the apic path.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 cpu-defs.h |2 --
 hw/apic.c  |   49 -
 qemu-kvm.c |   26 ++
 qemu-kvm.h |7 +--
 4 files changed, 55 insertions(+), 29 deletions(-)

diff --git a/cpu-defs.h b/cpu-defs.h
index 1e071e7..e66e928 100644
--- a/cpu-defs.h
+++ b/cpu-defs.h
@@ -140,8 +140,6 @@ typedef struct CPUWatchpoint {
 struct qemu_work_item;
 
 struct KVMCPUState {
-int sipi_needed;
-int init;
 pthread_t thread;
 int signalled;
 int stop;
diff --git a/hw/apic.c b/hw/apic.c
index 862289d..39e1675 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -86,6 +86,8 @@ typedef struct APICState {
 uint32_t initial_count;
 int64_t initial_count_load_time, next_time;
 QEMUTimer *timer;
+int sipi_needed;
+int init;
 } APICState;
 
 static int apic_io_memory;
@@ -93,6 +95,45 @@ static APICState *local_apics[MAX_APICS + 1];
 static int last_apic_id = 0;
 static int apic_irq_delivered;
 
+int apic_init_received(CPUState *env)
+{
+if (!env)
+return 0;
+if (!env-apic_state)
+return 0;
+
+return env-apic_state-init;
+}
+
+int apic_sipi_needed(CPUState *env)
+{
+if (!env)
+return 0;
+if (!env-apic_state)
+return 0;
+
+return env-apic_state-sipi_needed;
+}
+
+void apic_reset_sipi(CPUState *env)
+{
+if (!env)
+return;
+if (!env-apic_state)
+return;
+
+env-apic_state-sipi_needed = 0;
+}
+
+void apic_reset_init(CPUState *env)
+{
+if (!env)
+return;
+if (!env-apic_state)
+return;
+
+env-apic_state-init = 0;
+}
 
 static void apic_init_ipi(APICState *s);
 static void apic_set_irq(APICState *s, int vector_num, int trigger_mode);
@@ -475,9 +516,8 @@ static void apic_init_ipi(APICState *s)
 (!kvm_enabled() || !qemu_kvm_irqchip_in_kernel()))
 s-cpu_env-halted = 1;
 
-if (kvm_enabled()  !qemu_kvm_irqchip_in_kernel())
-   if (s-cpu_env)
-   kvm_apic_init(s-cpu_env);
+if (s-cpu_env  s-cpu_env-cpu_index != 0)
+   s-init = 1;
 }
 
 /* send a SIPI message to the CPU to start it */
@@ -490,8 +530,7 @@ static void apic_startup(APICState *s, int vector_num)
 cpu_x86_load_seg_cache(env, R_CS, vector_num  8, vector_num  12,
0x, 0);
 env-halted = 0;
-if (kvm_enabled()  !qemu_kvm_irqchip_in_kernel())
-   kvm_update_after_sipi(env);
+s-sipi_needed = 1;
 }
 
 static void apic_deliver(APICState *s, uint8_t dest, uint8_t dest_mode,
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 68d3b92..b5954bc 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -134,19 +134,6 @@ void kvm_update_interrupt_request(CPUState *env)
 }
 }
 
-void kvm_update_after_sipi(CPUState *env)
-{
-env-kvm_cpu_state.sipi_needed = 1;
-kvm_update_interrupt_request(env);
-}
-
-void kvm_apic_init(CPUState *env)
-{
-if (env-cpu_index != 0)
-   env-kvm_cpu_state.init = 1;
-kvm_update_interrupt_request(env);
-}
-
 #include signal.h
 
 static int try_push_interrupts(void *opaque)
@@ -332,7 +319,7 @@ static void kvm_vm_state_change_handler(void *context, int 
running, int reason)
 static void update_regs_for_sipi(CPUState *env)
 {
 kvm_arch_update_regs_for_sipi(env);
-env-kvm_cpu_state.sipi_needed = 0;
+apic_reset_sipi(env);
 }
 
 static void update_regs_for_init(CPUState *env)
@@ -345,11 +332,10 @@ static void update_regs_for_init(CPUState *env)
 
 #ifdef TARGET_I386
 /* restore SIPI vector */
-if(env-kvm_cpu_state.sipi_needed)
+if (apic_sipi_needed(env))
 env-segs[R_CS] = cs;
 #endif
-
-env-kvm_cpu_state.init = 0;
+apic_reset_init(env);
 kvm_arch_load_regs(env);
 }
 
@@ -407,12 +393,12 @@ static int kvm_main_loop_cpu(CPUState *env)
if (env-interrupt_request  (CPU_INTERRUPT_HARD | CPU_INTERRUPT_NMI))
env-halted = 0;
 if (!kvm_irqchip_in_kernel(kvm_context)) {
-   if (env-kvm_cpu_state.init)
+   if (apic_init_received(env))
update_regs_for_init(env);
-   if (env-kvm_cpu_state.sipi_needed)
+   if (apic_sipi_needed(env))
update_regs_for_sipi(env);
 }
-   if (!env-halted  !env-kvm_cpu_state.init)
+   if (!env-halted  !apic_init_received(env))
kvm_cpu_exec(env);
env-exit_request = 0;
 env-exception_index = EXCP_INTERRUPT;
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 725589b..eb7dc29 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -31,9 +31,13 @@ void kvm_remove_all_breakpoints(CPUState *current_env);
 int kvm_update_guest_debug(CPUState *env, unsigned long reinject_trap);
 int kvm_qemu_init_env(CPUState *env);
 int kvm_qemu_check_extension(int ext);
-void kvm_apic_init(CPUState *env);
+/* FIXME: there should be an apic.h file */
 /* called 

Re: [KVM PATCH v5 0/2] iosignalfd

2009-06-03 Thread Gregory Haskins
Gregory Haskins wrote:
 (Applies to kvm.git/master:25deed73)

 This is v5 of the series.  For more details, please see the header to
 patch 2/2.

 This series has been tested against the kvm-eventfd unit test, and
 appears to be functioning properly.  You can download this test here:

 ftp://ftp.novell.com/dev/ghaskins/kvm-eventfd.tar.bz2

 This series is ready to be considered for inclusion, pending any further
 review comments.
   

Sorry, Marcello.  I re-used an old email when composing this. :)

Marcello, Avi, and myself have previously agreed that Marcello's
mmio-locking cleanup should go in first.   When that happens, I will
need to rebase this series because it changes how you interface to the
io_bus code.  I should have mentioned that here, but forgot.  (Speaking
of, is there an ETA when that code will be merged Avi?)

That aside, after I sent this series I went to get some coffee to clear
my head and I thought of an issue in the code.  I will reply inline to
patch 2/2.

-Greg

 [
Changelog:

   v5:
*) Removed cookie field, which was a misunderstanding on my
   part on what Avi wanted for a data-match feature
  *) Added a new trigger data-match feature which I think is
   much closer to what we need.
  *) We retain the dev_count field in the io_bus infrastructure
 and instead back-fill the array on removal.
  *) Various minor cleanups
  *) Rebased to kvm.git/master:25deed73

   v4:
*) Fixed a bug in the original 2/4 where the PIT failure case
   would potentially leave the io_bus components registered.
*) Condensed the v3 2/4 and 3/4 into one patch (2/2) since
   the patches became interdependent with the fix described above
*) Rebased to kvm.git/master:74dfca0a

   v3:
*) fixed patch 2/4 to handle error cases instead of BUG_ON
*) implemented same HAVE_EVENTFD protection mechanism as
   irqfd to prevent compilation errors on unsupported arches
*) completed testing
*) rebased to kvm.git/master:7391a6d5

   v2:
*) added optional data-matching capability (via cookie field)
*) changed name from iofd to iosignalfd
*) added io_bus unregister function
*) implemented deassign feature

   v1:
*) original release (integrated into irqfd v7 series as iofd)
 ]

 ---

 Gregory Haskins (2):
   kvm: add iosignalfd support
   kvm: make io_bus interface more robust


  arch/x86/kvm/i8254.c  |   22 +++
  arch/x86/kvm/i8259.c  |9 +
  arch/x86/kvm/x86.c|1 
  include/linux/kvm.h   |   15 ++
  include/linux/kvm_host.h  |   16 ++
  virt/kvm/coalesced_mmio.c |8 +
  virt/kvm/eventfd.c|  356 
 +
  virt/kvm/ioapic.c |9 +
  virt/kvm/kvm_main.c   |   41 +
  9 files changed, 462 insertions(+), 15 deletions(-)

   




signature.asc
Description: OpenPGP digital signature


Re: [KVM PATCH v5 2/2] kvm: add iosignalfd support

2009-06-03 Thread Gregory Haskins
Gregory Haskins wrote:
 iosignalfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
 signal when written to by a guest.  Host userspace can register any arbitrary
 IO address with a corresponding eventfd and then pass the eventfd to a
 specific end-point of interest for handling.

 Normal IO requires a blocking round-trip since the operation may cause
 side-effects in the emulated model or may return data to the caller.
 Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
 heavy-weight exit back to userspace, and is ultimately serviced by qemu's
 device model synchronously before returning control back to the vcpu.

 However, there is a subclass of IO which acts purely as a trigger for
 other IO (such as to kick off an out-of-band DMA request, etc).  For these
 patterns, the synchronous call is particularly expensive since we really
 only want to simply get our notification transmitted asychronously and
 return as quickly as possible.  All the sychronous infrastructure to ensure
 proper data-dependencies are met in the normal IO case are just unecessary
 overhead for signalling.  This adds additional computational load on the
 system, as well as latency to the signalling path.

 Therefore, we provide a mechanism for registration of an in-kernel trigger
 point that allows the VCPU to only require a very brief, lightweight
 exit just long enough to signal an eventfd.  This also means that any
 clients compatible with the eventfd interface (which includes userspace
 and kernelspace equally well) can now register to be notified. The end
 result should be a more flexible and higher performance notification API
 for the backend KVM hypervisor and perhipheral components.

 To test this theory, we built a test-harness called doorbell.  This
 module has a function called doorbell_ring() which simply increments a
 counter for each time the doorbell is signaled.  It supports signalling
 from either an eventfd, or an ioctl().

 We then wired up two paths to the doorbell: One via QEMU via a registered
 io region and through the doorbell ioctl().  The other is direct via
 iosignalfd.

 You can download this test harness here:

 ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2

 The measured results are as follows:

 qemu-mmio:   11 iops, 9.09us rtt
 iosignalfd-mmio: 200100 iops, 5.00us rtt
 iosignalfd-pio:  367300 iops, 2.72us rtt

 I didn't measure qemu-pio, because I have to figure out how to register a
 PIO region with qemu's device model, and I got lazy.  However, for now we
 can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO,
 and -350ns for HC, we get:

 qemu-pio:  153139 iops, 6.53us rtt
 iosignalfd-hc: 412585 iops, 2.37us rtt

 these are just for fun, for now, until I can gather more data.

 Here is a graph for your convenience:

 http://developer.novell.com/wiki/images/7/76/Iofd-chart.png

 The conclusion to draw is that we save about 4us by skipping the userspace
 hop.

 

 Signed-off-by: Gregory Haskins ghask...@novell.com
 ---

  arch/x86/kvm/x86.c   |1 
  include/linux/kvm.h  |   15 ++
  include/linux/kvm_host.h |   10 +
  virt/kvm/eventfd.c   |  356 
 ++
  virt/kvm/kvm_main.c  |   11 +
  5 files changed, 389 insertions(+), 4 deletions(-)

 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index c1ed485..c96c0e3 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -1097,6 +1097,7 @@ int kvm_dev_ioctl_check_extension(long ext)
   case KVM_CAP_IRQ_INJECT_STATUS:
   case KVM_CAP_ASSIGN_DEV_IRQ:
   case KVM_CAP_IRQFD:
 + case KVM_CAP_IOSIGNALFD:
   case KVM_CAP_PIT2:
   r = 1;
   break;
 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index 632a856..53b720d 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -300,6 +300,19 @@ struct kvm_guest_debug {
   struct kvm_guest_debug_arch arch;
  };
  
 +#define KVM_IOSIGNALFD_FLAG_TRIGGER   (1  0)
 +#define KVM_IOSIGNALFD_FLAG_PIO   (1  1)
 +#define KVM_IOSIGNALFD_FLAG_DEASSIGN  (1  2)
 +
 +struct kvm_iosignalfd {
 + __u64 trigger;
 + __u64 addr;
 + __u32 len;
 + __u32 fd;
 + __u32 flags;
 + __u8  pad[36];
 +};
 +
  #define KVM_TRC_SHIFT   16
  /*
   * kvm trace categories
 @@ -430,6 +443,7 @@ struct kvm_trace_rec {
  #ifdef __KVM_HAVE_PIT
  #define KVM_CAP_PIT2 33
  #endif
 +#define KVM_CAP_IOSIGNALFD 34
  
  #ifdef KVM_CAP_IRQ_ROUTING
  
 @@ -537,6 +551,7 @@ struct kvm_irqfd {
  #define KVM_DEASSIGN_DEV_IRQ   _IOW(KVMIO, 0x75, struct kvm_assigned_irq)
  #define KVM_IRQFD  _IOW(KVMIO, 0x76, struct kvm_irqfd)
  #define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct 
 kvm_pit_config)
 +#define KVM_IOSIGNALFD _IOW(KVMIO, 0x78, struct kvm_iosignalfd)
  
  /*
   * ioctls for vcpu fds
 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
 

Re: [KVM PATCH v4 3/3] kvm: add iosignalfd support

2009-06-03 Thread Gregory Haskins
Mark McLoughlin wrote:
 On Wed, 2009-05-27 at 13:40 -0400, Gregory Haskins wrote:
   
 Mark McLoughlin wrote:
 
 On Wed, 2009-05-27 at 15:11 +0300, Avi Kivity wrote:

   
   
 Multiple cookies on the same address are required by virtio.  You can't 
 mux since the data doesn't go anywhere.

 Virtio can survive by checking all rings on a notify, and we can later 
 add a mechanism that has a distinct address for each ring, but let's see 
 if we can cope with multiple cookies.  Mark?
 
 
 Trying to catch up, but you're talking about replacing virtio-pci
 QUEUE_NOTIFY handling with iosignalfd ?

 For a perfect replacement, what you really need is to be able to
 register multiple cookies per address range, but only have them trigger
 if the written data matches a provided value.
   
   
 Hmm..thats an interesting idea.  To date, the cookie has really been
 for identifying the proper range selected for deassignment.  I never
 thought of using it as an actual trigger value at run-time.

 
 If the data is lost, virtio has no way of knowing which queue is being
 notified, so we either end up with per-device, rather than per-queue,
 notifications (probably not too bad for net, at least) or a different
 notify address per queue (limiting the number of queues per device).
   
   
 The addr-per-queue is how I was envisioning it, but the trigger value
 concept hadn't occurred to me.  I could make this an option during
 assignment (e.g. COOKIE flag means only trigger on writes of the
 provided cookie, otherwise trigger on any write).  Sound good?
 

 Ah, I'd been thinking of the trigger data being provided separately to
 the cookie.

 The virtio ABI is fixed, so we couldn't e.g. have the guest use a cookie
 to identify a queue - it's just going to continue using a per-device
 queue number. So, if the cookie was also the trigger, we'd need an
 eventfd per device.

 And if this was a device where the guest writes similar values to
 multiple addresses, you'd need an eventfd per address.

   

Hi Mark,
  So with the v5 release of iosignalfd, we now have the notion of a
trigger, the API of which is as follows:

---
/*!
 * \brief Assign an eventfd to an IO port (PIO or MMIO)
 *
 * Assigns an eventfd based file-descriptor to a specific PIO or MMIO
 * address range.  Any guest writes to the specified range will generate
 * an eventfd signal.
 *
 * A data-match pointer can be optionally provided in trigger and only
 * writes which match this value exactly will generate an event.  The length
 * of the trigger is established by the length of the overall IO range, and
 * therefore must be in a natural byte-width for the IO routines of your
 * particular architecture (e.g. 1, 2, 4, or 8 bytes on x86_64).
 *
 * \param kvm Pointer to the current kvm_context
 * \param addr The IO address
 * \param len The length of the IO region at the address
 * \param fd The eventfd file-descriptor
 * \param trigger A optional pointer providing data-match token
 * \param flags FLAG_PIO: PIO, else MMIO
 */
int kvm_assign_iosignalfd(kvm_context_t kvm, unsigned long addr, size_t len,
  int fd, void *trigger, int flags);
-

in the kvm-eventfd test harness, I create three unique eventfd handles,
and do the following:


---

unsigned char matchA = 0xa5, matchB = 0x42;

kvm_assign_iosignalfd(kvm_context, addr, 1, fd[0], NULL, 0);
kvm_assign_iosignalfd(kvm_context, addr, 1, fd[1], matchA, 0);
kvm_assign_iosignalfd(kvm_context, addr, 1, fd[2], matchB, 0);

---

In otherwords, I register a NULL trigger (wildcarded) on the first
fd.  The second has a data-match trigger of 0xa5, and the third has
0x42.  All three of these eventfd's map to the same mmio address with a
width of 1 byte.

I also fork a task which selects all three fds, and will print out the
eventfd count value when tripped.

Then, in the guest, I do:

--

iowrite8(0, iosignalfd_mmio);
iowrite8(0xa5, iosignalfd_mmio);
iowrite8(0x42, iosignalfd_mmio);

---

The result of which is:

IOSIGNALFD 0: event triggered with val 3
IOSIGNALFD 1: event triggered with val 1
IOSIGNALFD 2: event triggered with val 1

on the host, which is my expected outcome.  Let me know if you do not
think this is sufficient to implement a solution to your virtio-pci design.

-Greg



signature.asc
Description: OpenPGP digital signature


Re: [KVM PATCH v5 2/2] kvm: add iosignalfd support

2009-06-03 Thread Paul E. McKenney
On Wed, Jun 03, 2009 at 04:17:49PM -0400, Gregory Haskins wrote:
 iosignalfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
 signal when written to by a guest.  Host userspace can register any arbitrary
 IO address with a corresponding eventfd and then pass the eventfd to a
 specific end-point of interest for handling.
 
 Normal IO requires a blocking round-trip since the operation may cause
 side-effects in the emulated model or may return data to the caller.
 Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
 heavy-weight exit back to userspace, and is ultimately serviced by qemu's
 device model synchronously before returning control back to the vcpu.
 
 However, there is a subclass of IO which acts purely as a trigger for
 other IO (such as to kick off an out-of-band DMA request, etc).  For these
 patterns, the synchronous call is particularly expensive since we really
 only want to simply get our notification transmitted asychronously and
 return as quickly as possible.  All the sychronous infrastructure to ensure
 proper data-dependencies are met in the normal IO case are just unecessary
 overhead for signalling.  This adds additional computational load on the
 system, as well as latency to the signalling path.
 
 Therefore, we provide a mechanism for registration of an in-kernel trigger
 point that allows the VCPU to only require a very brief, lightweight
 exit just long enough to signal an eventfd.  This also means that any
 clients compatible with the eventfd interface (which includes userspace
 and kernelspace equally well) can now register to be notified. The end
 result should be a more flexible and higher performance notification API
 for the backend KVM hypervisor and perhipheral components.
 
 To test this theory, we built a test-harness called doorbell.  This
 module has a function called doorbell_ring() which simply increments a
 counter for each time the doorbell is signaled.  It supports signalling
 from either an eventfd, or an ioctl().
 
 We then wired up two paths to the doorbell: One via QEMU via a registered
 io region and through the doorbell ioctl().  The other is direct via
 iosignalfd.
 
 You can download this test harness here:
 
 ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2
 
 The measured results are as follows:
 
 qemu-mmio:   11 iops, 9.09us rtt
 iosignalfd-mmio: 200100 iops, 5.00us rtt
 iosignalfd-pio:  367300 iops, 2.72us rtt
 
 I didn't measure qemu-pio, because I have to figure out how to register a
 PIO region with qemu's device model, and I got lazy.  However, for now we
 can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO,
 and -350ns for HC, we get:
 
 qemu-pio:  153139 iops, 6.53us rtt
 iosignalfd-hc: 412585 iops, 2.37us rtt
 
 these are just for fun, for now, until I can gather more data.
 
 Here is a graph for your convenience:
 
 http://developer.novell.com/wiki/images/7/76/Iofd-chart.png
 
 The conclusion to draw is that we save about 4us by skipping the userspace
 hop.
 
 

One set of decision criteria for RCU vs. SRCU below, and one question
about the RCU reader running concurrently with a deletion.  Looks pretty
good, in general!

Thanx, Paul

 Signed-off-by: Gregory Haskins ghask...@novell.com
 ---
 
  arch/x86/kvm/x86.c   |1 
  include/linux/kvm.h  |   15 ++
  include/linux/kvm_host.h |   10 +
  virt/kvm/eventfd.c   |  356 
 ++
  virt/kvm/kvm_main.c  |   11 +
  5 files changed, 389 insertions(+), 4 deletions(-)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index c1ed485..c96c0e3 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -1097,6 +1097,7 @@ int kvm_dev_ioctl_check_extension(long ext)
   case KVM_CAP_IRQ_INJECT_STATUS:
   case KVM_CAP_ASSIGN_DEV_IRQ:
   case KVM_CAP_IRQFD:
 + case KVM_CAP_IOSIGNALFD:
   case KVM_CAP_PIT2:
   r = 1;
   break;
 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index 632a856..53b720d 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -300,6 +300,19 @@ struct kvm_guest_debug {
   struct kvm_guest_debug_arch arch;
  };
 
 +#define KVM_IOSIGNALFD_FLAG_TRIGGER   (1  0)
 +#define KVM_IOSIGNALFD_FLAG_PIO   (1  1)
 +#define KVM_IOSIGNALFD_FLAG_DEASSIGN  (1  2)
 +
 +struct kvm_iosignalfd {
 + __u64 trigger;
 + __u64 addr;
 + __u32 len;
 + __u32 fd;
 + __u32 flags;
 + __u8  pad[36];
 +};
 +
  #define KVM_TRC_SHIFT   16
  /*
   * kvm trace categories
 @@ -430,6 +443,7 @@ struct kvm_trace_rec {
  #ifdef __KVM_HAVE_PIT
  #define KVM_CAP_PIT2 33
  #endif
 +#define KVM_CAP_IOSIGNALFD 34
 
  #ifdef KVM_CAP_IRQ_ROUTING
 
 @@ -537,6 +551,7 @@ struct kvm_irqfd {
  #define KVM_DEASSIGN_DEV_IRQ   _IOW(KVMIO, 0x75, struct kvm_assigned_irq)
  #define KVM_IRQFD  _IOW(KVMIO, 

[RFC] CPU hard limits

2009-06-03 Thread Bharata B Rao
Hi,

This is an RFC about the CPU hard limits feature where I have explained
the need for the feature, the proposed plan and the issues around it.
Before I come up with an implementation for hard limits, I would like to
know community's thoughts on this scheduler enhancement and any feedback
and suggestions.

Regards,
Bharata.

1. CPU hard limit
2. Need for hard limiting CPU resource
3. Granularity of enforcing CPU hard limits
4. Existing solutions
5. Specifying hard limits
6. Per task group vs global bandwidth period
7. Configuring
8. Throttling of tasks
9. Group scheduler hierarchy considerations
10. SMP considerations
11. Starvation
12. Hard limit and fairness

1. CPU hard limit
-
CFS is a proportional share scheduler which tries to divide the CPU time
proportionately between tasks or groups of tasks (task group/cgroup) depending
on the priority/weight of the task or shares assigned to groups of tasks.
In CFS, a task/task group can get more than its share of CPU if there are
enough idle CPU cycles available in the system, due to the work conserving
nature of the scheduler.

However there are scenarios (Sec 2) where giving more than the desired
CPU share to a task/task group is not acceptable. In those scenarios, the
scheduler needs to put a hard stop on the CPU resource consumption of
task/task group if it exceeds a preset limit. This is usually achieved by
throttling the task/task group when it fully consumes its allocated CPU time.

2. Need for hard limiting CPU resource
--
- Pay-per-use: In enterprise systems that cater to multiple clients/customers
  where a customer demands a certain share of CPU resources and pays only
  that, CPU hard limits will be useful to hard limit the customer's job
  to consume only the specified amount of CPU resource.
- In container based virtualization environments running multiple containers,
  hard limits will be useful to ensure a container doesn't exceed its
  CPU entitlement.
- Hard limits can be used to provide guarantees.

3. Granularity of enforcing CPU hard limits
---
Conceptually, hard limits can either be enforced for individual tasks or
groups of tasks.  However enforcing limits per task would be too fine
grained and would be a lot of work on the part of the system administrator
in terms of setting limits for every task. Based on the current understanding
of the users of this feature,  it is felt that hard limiting is more useful
at task group level than the individual tasks level. Hence in the subsequent
paragraphs, the concept of hard limit as applicable to task group/cgroup
is discussed.

4. Existing solutions
-
- Both Linux-VServer and OpenVZ virtualization solutions support CPU hard
  limiting.
- Per task limit can be enforced using rlimits, but it is not rate based.

5. Specifying hard limits
-
CPU time consumed by a task group is generally measured over a
time period (called bandwidth period) and the task group gets throttled
when its CPU time reaches a limit (hard limit) within a bandwidth period.
The task group remains throttled until the bandwidth period gets
renewed at which time additional CPU time becomes available
to the tasks in the system.

When a task group's hard limit is specified as a ratio X/Y, it means that
the group will get throttled if its CPU time consumption exceeds X seconds
in a bandwidth period of Y seconds.

Specifying the hard limit as X/Y requires us to specify the bandwidth
period also.

Is having a uniform/same bandwidth period for all the groups an option ?
If so, we could even specify the hard limit as a percentage, like
30% of a uniform bandwidth period.

6. Per task group vs global bandwidth period

The bandwidth period can either be per task group or global. With global
bandwidth period, the runtimes of all the task groups need to be
replenished when the period ends. Though this appears conceptually simple,
the implementation might not scale. Instead if every task group maintains its
bandwidth period separately, the refresh cycles of each group will happen
independent of each other. Moreover different groups might prefer different
bandwidth periods. Hence the first implementation will have per task group
bandwidth period.

Timers can be used to trigger bandwidth refresh cycles. (similar to rt group
sched)

7. Configuring
--
- User could set the hard limit (X and/or Y) through the cgroup fs.
- When the scheduler supports hard limiting, should it be enabled
  for all tasks groups of the system ? Or should user have an option
  to enable hard limiting per group ?
- When hard limiting is enabled for a group, should the limit be
  set to a default to start with ? Or should the user set the limit
  and the bandwidth before enabling the hard limiting ?
- What should be a sane default value for the bandwidth period ?

8. Throttling of