device passthrough

2010-05-28 Thread Mu Lin
Hi, All:

Is there any method to directly assign a device to a guest OS without VT-d?

Thanks

Mu
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 00/10] Redirect and make use of the guest serial console

2010-05-28 Thread Jason Wang
Lucas Meneghel Rodrigues wrote:
 On Tue, 2010-05-11 at 17:03 +0800, Jason Wang wrote:
   
 The guest console is useful for failure troubleshooting, especially for
 guests that produce a call trace. As we plan to push the network-related
 tests in the next few weeks, we found the serial session to be more
 reliable during network testing. So this patchset logs the guest
 serial output through the redirected serial port of the guest and also
 enables logging into the guest through the serial console. I only open
 the serial console for Linux; I will do some investigation on Windows
 guests.

 Changes from v1:

 - Coding style improvements according to the suggestions from Michael Goldish
 - Improve the username sending handling in remote_login()
 - Change the login-matching regex to [Ll]ogin:\s*$
 - Check whether the VM is already dead in the dumping thread
 - Return None rather than raising an exception on an unknown shell_client
 - Keep tty0 for all Linux guests
 - Enable the serial console in unattended installation
 - Add a helper to check whether panic information has occurred
 - Keep process() at its original location in preprocess()
 

 Jason, after a long conversation I've had with Michael during the
 previous week, we reached some common points:

 1 - We believe it is possible to be able to both log in *and* log serial
 console output. That will require changes to kvm_subprocess and might
 take a little bit more time.
   
Yes, I've tried an ugly implementation of a server in serial_dump_thread
(finally dropped from my patch), so I agree that it would be much more
elegant if I could get the support from kvm_subprocess.

 2 - We know you guys are depending on this patchset to be accepted in
 order to proceed with the network related cases. However, we ask for a
 little more patience, and we'd like to get your opinions on the patches
 that we are going to roll out. This way we can get to a better solution
 for all of us.

 So, please bear with us and I'll try to see with Michael and Dor if we
 can prioritize this work to not block work items for you guys.

 Cheers,

 Lucas

   
Ok, no problem.


Re: [PATCH 2/3] KVM test: Do not use the hard-coded address during unattended installation

2010-05-28 Thread Jason Wang
Lucas Meneghel Rodrigues wrote:
 On Wed, 2010-05-19 at 17:20 +0800, Jason Wang wrote:
   
 When we do the unattended installation in tap mode, we should use
 vm.get_address() instead of 'localhost' in order to connect to
 the finish program running in the guest.

 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
  client/tests/kvm/tests/unattended_install.py |   25 +
  1 files changed, 13 insertions(+), 12 deletions(-)

 diff --git a/client/tests/kvm/tests/unattended_install.py b/client/tests/kvm/tests/unattended_install.py
 index e2cec8e..e71f993 100644
 --- a/client/tests/kvm/tests/unattended_install.py
 +++ b/client/tests/kvm/tests/unattended_install.py
 @@ -17,7 +17,6 @@ def run_unattended_install(test, params, env):
      vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
  
      port = vm.get_port(int(params.get("guest_port_unattended_install")))
 -    addr = ('localhost', port)
      if params.get("post_install_delay"):
          post_install_delay = int(params.get("post_install_delay"))
      else:
 @@ -31,17 +30,19 @@ def run_unattended_install(test, params, env):
      time_elapsed = 0
      while time_elapsed < install_timeout:
          client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
 -        try:
 -            client.connect(addr)
 -            msg = client.recv(1024)
 -            if msg == 'done':
 -                if post_install_delay:
 -                    logging.debug("Post install delay specified, "
 -                                  "waiting %ss...", post_install_delay)
 -                    time.sleep(post_install_delay)
 -                break
 -        except socket.error:
 -            pass
 +        addr = vm.get_address()
 +        if addr:
 

 ^ Per coding style, we should check for is None

 if addr is not None:

   
 +            try:
 +                client.connect((addr, port))
 +                msg = client.recv(1024)
 +                if msg == 'done':
 +                    if post_install_delay:
 +                        logging.debug("Post install delay specified, "
 +                                      "waiting %ss...", post_install_delay)
 +                        time.sleep(post_install_delay)
 +                    break
 +            except socket.error:
 +                pass
 

 ^ If vm.get_address() returns None, we'll have to fail the test; if we
 don't, we'll get a false PASS.

   
A VM may not have obtained its IP address yet during startup, and because
we have a timeout, I think it's safe here.
          time.sleep(1)
          client.close()
          end_time = time.time()



[PATCH 1/3] KVM test: Add the support of kernel and initrd option for qemu-kvm

2010-05-28 Thread Jason Wang
The -kernel option is useful for both unattended installation and the
unit tests in /kvm/user/test.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/kvm_vm.py |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)
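
A minimal sketch of the same idea outside the test framework, assuming a
plain params dictionary. The names add_boot_options() and get_path() are
illustrative only; get_path() merely stands in for kvm_utils.get_path()
and is not the autotest implementation:

```python
import os


def get_path(root_dir, path):
    """Resolve path against root_dir unless it is already absolute.

    Hypothetical stand-in for kvm_utils.get_path().
    """
    if os.path.isabs(path):
        return path
    return os.path.join(root_dir, path)


def add_boot_options(qemu_cmd, params, root_dir):
    """Append -kernel/-initrd to qemu_cmd when params defines them."""
    for key in ("kernel", "initrd"):
        value = params.get(key)
        if value:
            qemu_cmd += " -%s %s" % (key, get_path(root_dir, value))
    return qemu_cmd


# Only the kernel is configured here, so no -initrd option is added.
cmd = add_boot_options("qemu-kvm", {"kernel": "images/vmlinuz"}, "/tmp/kvm")
print(cmd)
```

Options that are missing or empty in params are skipped, so existing
configurations without kernel/initrd keep the same command line.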

diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
index bca9d15..c7eed56 100755
--- a/client/tests/kvm/kvm_vm.py
+++ b/client/tests/kvm/kvm_vm.py
@@ -360,6 +360,16 @@ class VM:
             tftp = kvm_utils.get_path(root_dir, tftp)
             qemu_cmd += add_tftp(help, tftp)
 
+        kernel = params.get("kernel")
+        if kernel:
+            kernel = kvm_utils.get_path(root_dir, kernel)
+            qemu_cmd += " -kernel %s" % kernel
+
+        initrd = params.get("initrd")
+        if initrd:
+            initrd = kvm_utils.get_path(root_dir, initrd)
+            qemu_cmd += " -initrd %s" % initrd
+
         for redir_name in kvm_utils.get_sub_dict_names(params, "redirs"):
             redir_params = kvm_utils.get_sub_dict(params, redir_name)
             guest_port = int(redir_params.get("guest_port"))



[PATCH 2/3] KVM test: Do not use the hard-coded address during unattended installation

2010-05-28 Thread Jason Wang
When we do the unattended installation in tap mode, we should use
vm.get_address() instead of 'localhost' in order to connect to
the finish program running in the guest.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/tests/unattended_install.py |   25 +
 1 files changed, 13 insertions(+), 12 deletions(-)
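
A standalone sketch of the polling pattern, under stated assumptions:
get_address is any callable that returns the guest IP or None (a stand-in
for vm.get_address()), and wait_for_done() is a hypothetical name, not
part of the test suite. It returns False on timeout so that a caller can
fail the test, which is one way to address the false-PASS concern raised
in review:

```python
import socket
import time


def wait_for_done(get_address, port, timeout, interval=1.0):
    """Poll the guest until it reports 'done' or the timeout expires.

    get_address returns the guest IP as a string, or None while the
    address is not known yet. Returns True on success, False on timeout.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        addr = get_address()
        if addr is not None:
            try:
                client = socket.create_connection((addr, port),
                                                  timeout=interval)
                try:
                    if client.recv(1024) == b"done":
                        return True
                finally:
                    client.close()
            except socket.error:
                pass  # guest not reachable yet; retry until the deadline
        time.sleep(interval)
    return False
```

A caller would raise a test failure when wait_for_done() returns False,
whether the address never appeared or the guest never answered.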

diff --git a/client/tests/kvm/tests/unattended_install.py b/client/tests/kvm/tests/unattended_install.py
index e2cec8e..8928575 100644
--- a/client/tests/kvm/tests/unattended_install.py
+++ b/client/tests/kvm/tests/unattended_install.py
@@ -17,7 +17,6 @@ def run_unattended_install(test, params, env):
     vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
 
     port = vm.get_port(int(params.get("guest_port_unattended_install")))
-    addr = ('localhost', port)
     if params.get("post_install_delay"):
         post_install_delay = int(params.get("post_install_delay"))
     else:
@@ -31,17 +30,19 @@ def run_unattended_install(test, params, env):
     time_elapsed = 0
     while time_elapsed < install_timeout:
         client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
-        try:
-            client.connect(addr)
-            msg = client.recv(1024)
-            if msg == 'done':
-                if post_install_delay:
-                    logging.debug("Post install delay specified, "
-                                  "waiting %ss...", post_install_delay)
-                    time.sleep(post_install_delay)
-                break
-        except socket.error:
-            pass
+        addr = vm.get_address()
+        if addr is not None:
+            try:
+                client.connect((addr, port))
+                msg = client.recv(1024)
+                if msg == 'done':
+                    if post_install_delay:
+                        logging.debug("Post install delay specified, "
+                                      "waiting %ss...", post_install_delay)
+                        time.sleep(post_install_delay)
+                    break
+            except socket.error:
+                pass
         time.sleep(1)
         client.close()
         end_time = time.time()



[PATCH 3/3] KVM test: Add implementation of network based unattended installation

2010-05-28 Thread Jason Wang
This patch lets the unattended installation be done through one of
the following methods:

- unattended.cdrom: the original method, which does the installation
  from cdrom
- unattended.url: installs the Linux guest from http or ftp; the tree
  url is specified through url
- unattended.nfs: installs the Linux guest from nfs; the server
  address is specified through nfs_server and the directory
  through nfs_dir
- unattended.remote_ks: installs the Linux guest through a remote
  kickstart file

For url and nfs installation, extra_params needs to be configured
to specify the location of the unattended file:

- If the unattended file in the tree is used, extra_params=" append
  ks=floppy" and the unattended_file param need to be specified in the
  configuration file.
- If an unattended file located on a remote server is used, the
  unattended_file option must be none and extra_params=" append
  ks=http://xxx" needs to be specified in the configuration file; don't
  forget to add the finish notification part.

The --kernel and --initrd options are used directly for the network
installation instead of the tftp/bootp params because user mode network
is too slow for this.

Only the unattended files for RHEL and Fedora guests are modified;
others are kept unmodified and can still do the installation from cdrom.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/scripts/unattended.py   |  107 +++-
 client/tests/kvm/tests_base.cfg.sample   |  172 +++---
 client/tests/kvm/unattended/Fedora-10.ks |2 
 client/tests/kvm/unattended/Fedora-11.ks |2 
 client/tests/kvm/unattended/Fedora-12.ks |2 
 client/tests/kvm/unattended/Fedora-8.ks  |2 
 client/tests/kvm/unattended/Fedora-9.ks  |2 
 client/tests/kvm/unattended/RHEL-3-series.ks |2 
 client/tests/kvm/unattended/RHEL-4-series.ks |2 
 client/tests/kvm/unattended/RHEL-5-series.ks |2 
 10 files changed, 206 insertions(+), 89 deletions(-)
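
A self-contained sketch of the KVM_TEST_MEDIUM substitution described
above. The function names are hypothetical and a plain ValueError is used
in place of the script's own exception; unlike the patch, the error
message here reports the medium rather than the url:

```python
import re


def install_source_line(medium, url="", nfs_server="", nfs_dir=""):
    """Return the kickstart installation-source line for a medium."""
    if medium == "cdrom":
        return "cdrom"
    if medium == "url":
        return "url --url %s" % url
    if medium == "nfs":
        return "nfs --server=%s --dir=%s" % (nfs_server, nfs_dir)
    raise ValueError("Unexpected installation medium %r" % medium)


def fill_medium(kickstart_text, line):
    """Replace the KVM_TEST_MEDIUM placeholder with the given line."""
    # A callable replacement keeps backslashes in `line` literal.
    return re.sub(r"\bKVM_TEST_MEDIUM\b", lambda m: line, kickstart_text)


ks = "install\nKVM_TEST_MEDIUM\ntext\n"
print(fill_medium(ks, install_source_line("nfs", nfs_server="10.0.0.1",
                                          nfs_dir="/export")))
```

The address 10.0.0.1 and the /export directory are example values only.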

diff --git a/client/tests/kvm/scripts/unattended.py b/client/tests/kvm/scripts/unattended.py
index fdadd03..0377d83 100755
--- a/client/tests/kvm/scripts/unattended.py
+++ b/client/tests/kvm/scripts/unattended.py
@@ -50,8 +50,9 @@ class UnattendedInstall(object):
 self.cdrom_iso = os.path.join(kvm_test_dir, cdrom_iso)
 self.floppy_mount = tempfile.mkdtemp(prefix='floppy_', dir='/tmp')
 self.cdrom_mount = tempfile.mkdtemp(prefix='cdrom_', dir='/tmp')
-flopy_name = os.environ['KVM_TEST_floppy']
-self.floppy_img = os.path.join(kvm_test_dir, flopy_name)
+self.nfs_mount = tempfile.mkdtemp(prefix='nfs_', dir='/tmp')
+floppy_name = os.environ['KVM_TEST_floppy']
+self.floppy_img = os.path.join(kvm_test_dir, floppy_name)
 floppy_dir = os.path.dirname(self.floppy_img)
 if not os.path.isdir(floppy_dir):
 os.makedirs(floppy_dir)
@@ -60,6 +61,16 @@ class UnattendedInstall(object):
 self.pxe_image = os.environ.get('KVM_TEST_pxe_image', '')
 self.pxe_initrd = os.environ.get('KVM_TEST_pxe_initrd', '')
 
+self.medium = os.environ.get('KVM_TEST_medium', '')
+self.url = os.environ.get('KVM_TEST_url', '')
+self.kernel = os.environ.get('KVM_TEST_kernel', '')
+self.initrd = os.environ.get('KVM_TEST_initrd', '')
+self.nfs_server = os.environ.get('KVM_TEST_nfs_server', '')
+self.nfs_dir = os.environ.get('KVM_TEST_nfs_dir', '')
+self.image_path = kvm_test_dir
+self.kernel_path = os.path.join(self.image_path, self.kernel)
+self.initrd_path = os.path.join(self.image_path, self.initrd)
+
 
 def create_boot_floppy(self):
 
@@ -106,7 +117,8 @@ class UnattendedInstall(object):
 dest = os.path.join(self.floppy_mount, dest_fname)
 
 # Replace KVM_TEST_CDKEY (in the unattended file) with the cdkey
-# provided for this test
+# provided for this test and replace the KVM_TEST_MEDIUM with
+# the tree url or nfs address provided for this test.
 unattended_contents = open(self.unattended_file).read()
 dummy_cdkey_re = r'\bKVM_TEST_CDKEY\b'
 real_cdkey = os.environ.get('KVM_TEST_cdkey')
@@ -117,7 +129,20 @@ class UnattendedInstall(object):
 else:
 print ("WARNING: 'cdkey' required but not specified for "
 "this unattended installation")
+
+dummy_re = r'\bKVM_TEST_MEDIUM\b'
+if self.medium == "cdrom":
+content = "cdrom"
+elif self.medium == "url":
+content = "url --url %s" % self.url
+elif self.medium == "nfs":
+content = "nfs --server=%s --dir=%s" % (self.nfs_server, self.nfs_dir)
+else:
+raise SetupError("Unexpected installation medium %s" % self.url)
+
+unattended_contents = 

Re: [PATCHv2-RFC 0/2] virtio: put last seen used index into ring itself

2010-05-28 Thread Jes Sorensen
On 05/26/10 21:50, Michael S. Tsirkin wrote:
 Here's a rewrite of the original patch with a new layout.
 I haven't tested it yet so no idea how this performs, but
 I think this addresses the cache bounce issue raised by Avi.
 Posting for early flames/comments.
 
 Generally, the Host end of the virtio ring doesn't need to see where
 Guest is up to in consuming the ring.  However, to completely understand
 what's going on from the outside, this information must be exposed.
 For example, host can reduce the number of interrupts by detecting
 that the guest is currently handling previous buffers.
 
 We add a feature bit so the guest can tell the host that it's writing
 out the current value there, if it wants to use that.
 
 This differs from original approach in that the used index
 is put after avail index (they are typically written out together).
 To avoid cache bounces on descriptor access,
 and make future extensions easier, we put the ring itself at start of
 page, and move the control after it.

Hi Michael,

It looks pretty good to me, however one thing I have been thinking of
while reading through it:

Rather than storing a pointer within the ring struct that points to a
position within the same struct, how about storing a byte offset instead
and using a cast to get to the pointer position? That would avoid the
pointer dereference, which is less efficient cache-wise and harder for
the CPU to predict.

Not sure whether it really matters performance-wise, just a thought.

Cheers,
Jes


Re: [Qemu-devel] [PATCH 0/2] Fix scsi-generic breakage in upstream qemu-kvm.git

2010-05-28 Thread Kevin Wolf
Am 27.05.2010 17:56, schrieb Nicholas A. Bellinger:
 On Thu, 2010-05-20 at 15:18 +0200, Kevin Wolf wrote:
 Am 17.05.2010 18:45, schrieb Nicholas A. Bellinger:
 From: Nicholas Bellinger n...@linux-iscsi.org

 Greetings,

 Attached are the updated patches following hch's comments to fix
 scsi-generic device breakage with find_image_format() and
 refresh_total_sectors().

 These are being resent as the last attachments were in MBOX format from
 git-format-patch.

 Signed-off-by: Nicholas A. Bellinger n...@linux-iscsi.org

 Thanks, applied all to the block branch, even though I forgot to reply here.

 Kevin
 
 Hi Kevin,
 
 Thanks for accepting the series.  There is one more piece of breakage
 that Chris Krumme found in block.c:find_image_format() in the original
 patch.  Please apply the patch to add the missing bdrv_delete() for the
 SG_IO case below.
 
 Thanks for pointing this out Chris!

Right, thanks for the fix.  I've applied it to the block branch.

Kevin


Question about IOMMU

2010-05-28 Thread James Neave
Hi,

I'm trying to use KVM virtualization at home and I've run into what I
think is a limitation of my hardware.
I'm trying to pass a PCI card (WinTV NOVA-T 500) through to a guest OS
but I get the error 'IOMMU not found'.

Now I've read that this is because my motherboard does not have an
IOMMU to perform hardware accelerated device virtualization.
I have an AMD based system and apparently they call it AMD-Vi and
Intel call it VT-d.

Now, my 790GX-based board does not have an IOMMU, but the latest chipset,
the 890FX, apparently does: it has IOMMU v1.2.

Before I go and spend lots of money on what is a very expensive board,
can somebody confirm that an AMD 890FX-based board with a Phenom II X3
processor and Ubuntu 10.04 server 64-bit using KVM/QEMU will allow me
to pass a PCI card through to the guest OS?

On another note, now that I've subscribed to this mailing list I
notice that there is an email titled "AMD IOMMU Emulation".
Is that what I think it is, and can I compile a patched version for my server?

Many Thanks,

James.


[PATCH] KVM test: Measure the timedrift after continuing a stopped vm

2010-05-28 Thread Jason Wang
This test extends the timedrift test and measures the time drift across
stopping and continuing the vm. Two helper functions are also added to
kvm_test_utils to do the stop and continue.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/tests/timedrift_with_stop.py |   97 +
 1 files changed, 97 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/kvm/tests/timedrift_with_stop.py
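
For readers skimming the diff, the per-iteration check reduces to the
arithmetic below. This is an illustrative sketch with hypothetical
function names and plain floats in place of the readings returned by
kvm_test_utils.get_time(); RuntimeError stands in for error.TestFail:

```python
def iteration_drift(host_before, guest_before, host_after, guest_after):
    """Absolute difference between host and guest elapsed time (seconds)."""
    host_delta = host_after - host_before
    guest_delta = guest_after - guest_before
    return abs(host_delta - guest_delta)


def check_drift(drift, threshold):
    """Raise if the measured drift is above the allowed threshold."""
    if drift > threshold:
        raise RuntimeError("Time drift too large: %.2f seconds" % drift)


# The host saw 60 s pass while the guest clock advanced only 58.5 s.
d = iteration_drift(1000.0, 1000.0, 1060.0, 1058.5)
check_drift(d, threshold=3.0)  # no exception: the drift is below 3 s
```

A guest whose clock does not catch up after "cont" shows a drift close
to the stop time and would trip the threshold.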

diff --git a/client/tests/kvm/kvm_test_utils.py b/client/tests/kvm/kvm_test_utils.py
index 24e2bf5..7e158a3 100644
--- a/client/tests/kvm/kvm_test_utils.py
+++ b/client/tests/kvm/kvm_test_utils.py
@@ -181,6 +181,30 @@ def migrate(vm, env=None):
         raise
 
 
+def stop(vm):
+    """
+    Stop a running vm
+    """
+    s, o = vm.send_monitor_cmd("stop")
+    if s != 0:
+        raise error.TestError("Could not send the stop command")
+    s, o = vm.send_monitor_cmd("info status")
+    if "paused" not in o:
+        raise error.TestFail("VM does not stop afer send stop command")
+
+
+def cont(vm):
+    """
+    Continue a stopped vm
+    """
+    s, o = vm.send_monitor_cmd("cont")
+    if s != 0:
+        raise error.TestError("Could not send the cont command")
+    s, o = vm.send_monitor_cmd("info status")
+    if "running" not in o:
+        raise error.TestFail("VM still in paused status after sending cont")
+
+
 def get_time(session, time_command, time_filter_re, time_format):
     """
     Return the host time and guest time.  If the guest time cannot be fetched
diff --git a/client/tests/kvm/tests/timedrift_with_stop.py b/client/tests/kvm/tests/timedrift_with_stop.py
new file mode 100644
index 000..b99dd40
--- /dev/null
+++ b/client/tests/kvm/tests/timedrift_with_stop.py
@@ -0,0 +1,97 @@
+import logging, time, commands, re
+from autotest_lib.client.common_lib import error
+import kvm_subprocess, kvm_test_utils, kvm_utils
+
+
+def run_timedrift_with_stop(test, params, env):
+    """
+    Time drift test with stop/continue the guest:
+
+    1) Log into a guest.
+    2) Take a time reading from the guest and host.
+    3) Stop the running of the guest
+    4) Sleep for a while
+    5) Continue the guest running
+    6) Take a second time reading.
+    7) If the drift (in seconds) is higher than a user specified value, fail.
+
+    @param test: KVM test object.
+    @param params: Dictionary with test parameters.
+    @param env: Dictionary with the test environment.
+    """
+    login_timeout = int(params.get("login_timeout", 360))
+    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+    session = kvm_test_utils.wait_for_login(vm, timeout=login_timeout)
+
+    # Collect test parameters:
+    # Command to run to get the current time
+    time_command = params.get("time_command")
+    # Filter which should match a string to be passed to time.strptime()
+    time_filter_re = params.get("time_filter_re")
+    # Time format for time.strptime()
+    time_format = params.get("time_format")
+    drift_threshold = float(params.get("drift_threshold", 10))
+    drift_threshold_single = float(params.get("drift_threshold_single", 3))
+    stop_iterations = int(params.get("stop_iterations", 1))
+    stop_time = int(params.get("stop_time", 60))
+
+    try:
+        # Get initial time
+        # (ht stands for host time, gt stands for guest time)
+        (ht0, gt0) = kvm_test_utils.get_time(session, time_command,
+                                             time_filter_re, time_format)
+
+        # Stop the guest
+        for i in range(stop_iterations):
+            # Get time before current iteration
+            (ht0_, gt0_) = kvm_test_utils.get_time(session, time_command,
+                                                   time_filter_re, time_format)
+            # Run current iteration
+            logging.info("Stop %s second: iteration %d of %d..." %
+                         (stop_time, i + 1, stop_iterations))
+
+            kvm_test_utils.stop(vm)
+            time.sleep(stop_time)
+            kvm_test_utils.cont(vm)
+
+            # Get time after current iteration
+            (ht1_, gt1_) = kvm_test_utils.get_time(session, time_command,
+                                                   time_filter_re, time_format)
+            # Report iteration results
+            host_delta = ht1_ - ht0_
+            guest_delta = gt1_ - gt0_
+            drift = abs(host_delta - guest_delta)
+            logging.info("Host duration (iteration %d): %.2f" %
+                         (i + 1, host_delta))
+            logging.info("Guest duration (iteration %d): %.2f" %
+                         (i + 1, guest_delta))
+            logging.info("Drift at iteration %d: %.2f seconds" %
+                         (i + 1, drift))
+            # Fail if necessary
+            if drift > drift_threshold_single:
+                raise error.TestFail("Time drift too large at iteration %d: "
+                                     "%.2f seconds" % (i + 1, drift))
+
+        # Get final time
+(ht1, gt1) = 

KVM: MMU: always invalidate and flush on spte page size change

2010-05-28 Thread Marcelo Tosatti

Always invalidate spte and flush TLBs when changing page size, to make
sure different sized translations for the same address are never cached
in a CPU's TLB.

The first case where this occurs is when a non-leaf spte pointer is
overwritten by a leaf, large spte entry. This can happen after dirty
logging is disabled on a memslot, for example.

The second case is a leaf, large spte entry is overwritten with a
non-leaf spte pointer, in __direct_map. Note this cannot happen now
because the only potential source of such overwrite is dirty logging
being enabled, which zaps all MMU pages. But this might change 
in the future, so better be robust against it.

Noticed by Andrea.

KVM-Stable-Tag
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1952,6 +1952,8 @@ static void mmu_set_spte(struct kvm_vcpu
 
 			child = page_header(pte & PT64_BASE_ADDR_MASK);
 			mmu_page_remove_parent_pte(child, sptep);
+			__set_spte(sptep, shadow_trap_nonpresent_pte);
+			kvm_flush_remote_tlbs(vcpu->kvm);
 		} else if (pfn != spte_to_pfn(*sptep)) {
 			pgprintk("hfn old %lx new %lx\n",
 				 spte_to_pfn(*sptep), pfn);
@@ -2015,6 +2017,16 @@ static int __direct_map(struct kvm_vcpu 
 			break;
 		}
 
+		if (is_shadow_present_pte(*iterator.sptep) &&
+		    !is_large_pte(*iterator.sptep))
+			continue;
+
+		if (is_large_pte(*iterator.sptep)) {
+			rmap_remove(vcpu->kvm, iterator.sptep);
+			__set_spte(iterator.sptep, shadow_trap_nonpresent_pte);
+			kvm_flush_remote_tlbs(vcpu->kvm);
+		}
+
 		if (*iterator.sptep == shadow_trap_nonpresent_pte) {
 			u64 base_addr = iterator.addr;
 
 


Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup

2010-05-28 Thread Michael S. Tsirkin
On Thu, May 27, 2010 at 11:20:22PM +0200, Tejun Heo wrote:
 Hello, Michael.
 
 On 05/27/2010 07:32 PM, Michael S. Tsirkin wrote:
  Well, this is why I proposed adding a new API for creating
  workqueue within workqueue.c, rather than exposing the task
  and attaching it to cgroups in our driver: so that workqueue
  maintainers can fix the implementation if it ever changes.
  
  And after all, it's an internal API, we can always change
  it later if we need.
 ...
  Well, yes but we are using APIs like flush_work etc. These are very
  handy.  It seems much easier than rolling our own queue on top of kthread.
 
 The thing is that this kind of one-off usage becomes problematic when
 you're trying to change implementation details.  All current
 workqueue users don't care which thread they run on and they shouldn't,
 as each work owns the context only for the duration the work is
 executing.  If this sort of fundamental guideline is followed, the
 implementation can be improved in a pretty much transparent way, but when
 you start depending on specific implementation details, things become
 messy pretty quickly.
 
 If this type of usage were more common, adding a proper way to account
 work usage according to cgroups would make sense, but that's not the
 case here, and I removed the only exception case recently while trying
 to implement cmwq.  If this is added, it would be the only
 one which makes such extra assumptions in the whole kernel.  One way
 or the other, workqueue needs to be improved and I don't really think
 adding a single exception at this point is a good idea.
 
 The thing I realized after stop_machine conversion was that there was
 no reason to use workqueue there at all.  There already are more than
 enough not-too-difficult synchronization constructs and if you're
 using a thread for dedicated purposes, code complexity isn't that
 different either way.  Plus, it would also be clearer why dedicated
 threads are required there.  So, I strongly suggest
 using a kthread.  If there are issues which are noticeably difficult
 to solve with kthread, we can definitely talk about that and think
 about things again.
 
 Thank you.

Well, we have create_singlethread_workqueue, right?
This is not very different ... is it?

Just copying structures and code from workqueue.c,
adding vhost_ in front of it will definitely work:
there is nothing magic about the workqueue library.
But this just involves cut and paste which might be best avoided.
One final idea before we go the cut and paste way: how about
'create_workqueue_from_task' that would get a thread and have workqueue
run there?

 -- 
 tejun


Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup

2010-05-28 Thread Tejun Heo
Hello,

On 05/28/2010 05:08 PM, Michael S. Tsirkin wrote:
 Well, we have create_singlethread_workqueue, right?
 This is not very different ... is it?
 
 Just copying structures and code from workqueue.c,
 adding vhost_ in front of it will definitely work:

Sure it will, but you'll probably be able to get away with much less.

 there is nothing magic about the workqueue library.
 But this just involves cut and paste which might be best avoided.

What I'm saying is that some magic needs to be added to workqueue and
if you add this single(!) exception, it will have to be backed out
pretty soon, so it would be better to do it properly now.

 One final idea before we go the cut and paste way: how about
 'create_workqueue_from_task' that would get a thread and have workqueue
 run there?

You can currently depend on that implementation detail but it's not
what the workqueue interface is meant to do.  The single threadedness is
there as execution ordering and concurrency specification and it
doesn't (or rather won't) necessarily mean that a specific single
thread is bound to certain workqueue.

Can you please point me to the code so I can have a look?  I'll be happy
to do the conversion for you.

Thanks.

-- 
tejun


Re: [Qemu-devel] [PATCH 1/1] ceph/rbd block driver for qemu-kvm (v2)

2010-05-28 Thread Kevin Wolf
Am 27.05.2010 21:11, schrieb Christian Brunner:
 This is a block driver for the distributed file system Ceph 
 (http://ceph.newdream.net/). This driver uses librados (which 
 is part of the Ceph server) for direct access to the Ceph object 
 store and is running entirely in userspace. Therefore it is 
 called rbd - rados block device.
 
 To compile the driver a recent version of ceph (unstable/testing git 
 head or 0.20.3 once it is released) is needed and you have to 
 --enable-rbd when running configure.
 
 Additional information is available on the Ceph-Wiki:
 
 http://ceph.newdream.net/wiki/Kvm-rbd
 
 The patch is based on git://repo.or.cz/qemu/kevin.git block

Signed-off-by line is missing.

 ---
  Makefile  |3 +
  Makefile.objs |1 +
  block/rbd.c   |  584 +
  block/rbd_types.h |   52 +
  configure |   27 +++
  5 files changed, 667 insertions(+), 0 deletions(-)
  create mode 100644 block/rbd.c
  create mode 100644 block/rbd_types.h
 
 diff --git a/Makefile b/Makefile
 index 7986bf6..8d09612 100644
 --- a/Makefile
 +++ b/Makefile
 @@ -27,6 +27,9 @@ configure: ;
  $(call set-vpath, $(SRC_PATH):$(SRC_PATH)/hw)
  
  LIBS+=-lz $(LIBS_TOOLS)
 +ifdef CONFIG_RBD
 +LIBS+=-lrados
 +endif

You already write the -lrados option to config-host.mak in configure, so
this looks unnecessary.

  
  ifdef BUILD_DOCS
  DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8
 diff --git a/Makefile.objs b/Makefile.objs
 index 1a942e5..08dc11f 100644
 --- a/Makefile.objs
 +++ b/Makefile.objs
 @@ -18,6 +18,7 @@ block-nested-y += parallels.o nbd.o blkdebug.o
  block-nested-$(CONFIG_WIN32) += raw-win32.o
  block-nested-$(CONFIG_POSIX) += raw-posix.o
  block-nested-$(CONFIG_CURL) += curl.o
 +block-nested-$(CONFIG_RBD) += rbd.o
  
  block-obj-y +=  $(addprefix block/, $(block-nested-y))
  
 diff --git a/block/rbd.c b/block/rbd.c
 new file mode 100644
 index 000..375ae9d
 --- /dev/null
 +++ b/block/rbd.c
 @@ -0,0 +1,584 @@
 +/*
 + * QEMU Block driver for RADOS (Ceph)
 + *
 + * Copyright (C) 2010 Christian Brunner c...@muc.de
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2.  See
 + * the COPYING file in the top-level directory.
 + *
 + */
 +
 +#include "qemu-common.h"
 +#include <sys/types.h>
 +#include <stdbool.h>
 +
 +#include <qemu-common.h>
 +
 +#include "rbd_types.h"
 +#include "module.h"
 +#include "block_int.h"
 +
 +#include <stdio.h>
 +#include <stdlib.h>
 +#include <rados/librados.h>
 +
 +#include <signal.h>
 +
 +/*
 + * When specifying the image filename use:
 + *
 + * rbd:poolname/devicename
 + *
 + * poolname must be the name of an existing rados pool
 + *
 + * devicename is the basename for all objects used to
 + * emulate the raw device.
 + *
 + * Metadata information (image size, ...) is stored in an
 + * object with the name devicename.rbd.
 + *
 + * The raw device is split into 4MB sized objects by default.
 + * The sequencenumber is encoded in a 12 byte long hex-string,
 + * and is attached to the devicename, separated by a dot.
 + * e.g. devicename.1234567890ab
 + *
 + */
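
A small sketch of the naming scheme described in the comment above, not
taken from the patch. It assumes an object order of 22 (2**22 bytes =
4 MB, matching the stated default) and uses hypothetical helper names:

```python
OBJ_ORDER = 22  # assumed default: 2**22 bytes = 4 MB per object


def object_name(devicename, offset, order=OBJ_ORDER):
    """Name of the object that holds byte `offset` of the raw device."""
    seq = offset >> order  # sequence number of the containing object
    return "%s.%012x" % (devicename, seq)  # 12 hex digits after the dot


def offset_in_object(offset, order=OBJ_ORDER):
    """Position of byte `offset` inside its object."""
    return offset & ((1 << order) - 1)


# Byte 17 of the sixth 4 MB object of a device called "disk0".
pos = 5 * (1 << OBJ_ORDER) + 17
print(object_name("disk0", pos), offset_in_object(pos))
```

The metadata object would be named separately (devicename.rbd) and is
not covered by this sketch.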
 +
 +#define OBJ_MAX_SIZE (1UL << OBJ_DEFAULT_OBJ_ORDER)
 +
 +typedef struct RBDAIOCB {
 +BlockDriverAIOCB common;
 +QEMUBH *bh;
 +int ret;
 +QEMUIOVector *qiov;
 +char *bounce;
 +int write;
 +int64_t sector_num;
 +int aiocnt;
 +int error;
 +} RBDAIOCB;
 +
 +typedef struct RADOSCB {
 +int rcbid;
 +RBDAIOCB *acb;
 +int done;
 +int64_t segsize;
 +char *buf;
 +} RADOSCB;
 +
 +typedef struct RBDRVRBDState {
 +rados_pool_t pool;
 +char name[RBD_MAX_OBJ_NAME_SIZE];
 +int name_len;

name_len looks unused.

 +uint64_t size;
 +uint64_t objsize;
 +} RBDRVRBDState;

Hm, you mean BDRVRBDState?

Maybe ceph would have been a better driver name to avoid such type
names. ;-)

 +
 +typedef struct rbd_obj_header_ondisk RbdHeader1;
 +
 +static int rbd_parsename(const char *filename, char *pool, char *name)
 +{
 +const char *rbdname;
 +char *p, *n;
 +int l;
 +
 +if (!strstart(filename, "rbd:", &rbdname)) {
 +return -EINVAL;
 +}
 +
 +pstrcpy(pool, 2 * RBD_MAX_SEG_NAME_SIZE, rbdname);

Why twice the size? The callers pass a char[RBD_MAX_SEG_NAME_SIZE], so
doesn't this allow buffer overflows?

 +p = strchr(pool, '/');
 +if (p == NULL) {
 +return -EINVAL;
 +}
 +
 +*p = '\0';
 +n = ++p;

Why introduce a new variable here? p isn't used any more afterwards.

 +
 +l = strlen(n);
 +
 +if (l > RBD_MAX_OBJ_NAME_SIZE) {
 +fprintf(stderr, "object name to long\n");

Off by one, you need to consider the trailing '\0'.

Also, please use error_report instead of fprintf(stderr, ...) for real
error messages. Directly printing to stderr is okay for debug code.

 +return -EINVAL;
 +} else if (l <= 0) {
 +fprintf(stderr, "object name to short\n");
 +return -EINVAL;
 +}
 +
 +strcpy(name, n);
 +
 +return l;
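
The two size concerns raised in this review (the destination buffer size and the off-by-one on the trailing '\0') can be illustrated with a standalone sketch. This is not the driver's code: parse_rbd_name and its explicit size parameters are hypothetical, and rejecting an empty pool name is an extra assumption that the quoted patch does not make.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patch: split "rbd:pool/name"
 * into two caller-supplied buffers. Both sizes count the trailing '\0'.
 * Returns the length of the object name, or -EINVAL on malformed input.
 */
static int parse_rbd_name(const char *filename,
                          char *pool, size_t pool_size,
                          char *name, size_t name_size)
{
    static const char prefix[] = "rbd:";
    const char *rest, *slash;
    size_t pool_len, name_len;

    assert(filename != NULL && pool != NULL && name != NULL);

    if (strncmp(filename, prefix, sizeof(prefix) - 1) != 0) {
        return -EINVAL;
    }
    rest = filename + sizeof(prefix) - 1;

    slash = strchr(rest, '/');
    if (slash == NULL) {
        return -EINVAL;
    }

    pool_len = (size_t)(slash - rest);
    name_len = strlen(slash + 1);

    /* ">=" rather than ">" keeps one byte free for the terminator. */
    if (pool_len == 0 || pool_len >= pool_size) {
        return -EINVAL;
    }
    if (name_len == 0 || name_len >= name_size) {
        return -EINVAL;
    }

    memcpy(pool, rest, pool_len);
    pool[pool_len] = '\0';
    memcpy(name, slash + 1, name_len + 1);   /* includes the '\0' */

    return (int)name_len;
}
```

Passing each buffer's real size with sizeof at the call site means the bound used for the copy cannot drift away from the size of the array the caller actually declared.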
 

Re: Perf trace event parse errors for KVM events

2010-05-28 Thread Stefan Hajnoczi
I get parse errors when using Steven Rostedt's trace-cmd tool, too.

Any ideas what is going on here?  I can provide more info (e.g. trace
files) if necessary.

Stefan


[Reminder] KVM Forum 2010: Early Bird Registration

2010-05-28 Thread KVM Forum 2010 Program Committee
Just a reminder...The early bird registration period ends on May 30th.

It's shaping up to be an excellent KVM Forum, look forward to seeing you
there.

Registration link is here:

http://events.linuxfoundation.org/component/registrationpro/?func=details&did=34

thanks,
-KVM Forum 2010 Program Committee


Re: device passthrough

2010-05-28 Thread Chris Wright
* Mu Lin (m...@juniper.net) wrote:
 Is there any method to directly assign a device to Guest OS  without VT-d?

Assuming you mean a PCI device, no, there isn't.

Without an IOMMU[1] you can't directly assign a PCI device to a guest
(nor is it safe).  There have been patches floating around to allow
this, but they don't maintain secure isolation.

thanks,
-chris

[1] VT-d is an Intel chipset feature, so you could certainly do it on an
AMD platform that has an AMD IOMMU.


Re: [Qemu-devel] Re: Another SIGFPE in display code, now in cirrus

2010-05-28 Thread Michael Tokarev

12.05.2010 22:11, Stefano Stabellini wrote:

On Wed, 12 May 2010, Jamie Lokier wrote:

Stefano Stabellini wrote:

On Wed, 12 May 2010, Avi Kivity wrote:

It's useful if you have a one-line horizontal pattern you want to
propagate all over.


It might be useful all right, but it is not entirely clear what the
hardware should do in this situation from the documentation we have, and
certainly the current state of the cirrus emulation code doesn't help.


It's quite a reasonable thing for hardware to do, even if not documented.
It would be surprising if the hardware didn't copy the one-line pattern.


All right then, you convinced me :)

This is my proposed solution, however it is untested with Windows NT.


Signed-off-by: Stefano Stabellini <stefano.stabell...@eu.citrix.com>


So.. what's the status of this, after all? :)
As far as I can tell, it has been agreed that the patch
is good, and verified that it fixes the problem.  Should
we just throw it away and start from scratch, or what? :)

Thanks!


diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
index 9f61a01..a7f0d3c 100644
--- a/hw/cirrus_vga.c
+++ b/hw/cirrus_vga.c
@@ -676,15 +676,17 @@ static void cirrus_do_copy(CirrusVGAState *s, int dst, 
int src, int w, int h)
  int sx, sy;
  int dx, dy;
  int width, height;
+uint32_t start_addr, line_offset, line_compare;
  int depth;
  int notify = 0;

  depth = s->vga.get_bpp(&s->vga) / 8;
  s->vga.get_resolution(&s->vga, &width, &height);
+s->vga.get_offsets(&s->vga, &line_offset, &start_addr, &line_compare);

  /* extra x, y */
-sx = (src % ABS(s->cirrus_blt_srcpitch)) / depth;
-sy = (src / ABS(s->cirrus_blt_srcpitch));
+sx = (src % line_offset) / depth;
+sy = (src / line_offset);
  dx = (dst % ABS(s->cirrus_blt_dstpitch)) / depth;
  dy = (dst / ABS(s->cirrus_blt_dstpitch));

@@ -725,18 +727,23 @@ static void cirrus_do_copy(CirrusVGAState *s, int dst, 
int src, int w, int h)
  s->cirrus_blt_dstpitch, s->cirrus_blt_srcpitch,
  s->cirrus_blt_width, s->cirrus_blt_height);

-if (notify)
-   qemu_console_copy(s->vga.ds,
- sx, sy, dx, dy,
- s->cirrus_blt_width / depth,
- s->cirrus_blt_height);
-
-/* we don't have to notify the display that this portion has
-   changed since qemu_console_copy implies this */
-
-cirrus_invalidate_region(s, s->cirrus_blt_dstaddr,
-   s->cirrus_blt_dstpitch, s->cirrus_blt_width,
-   s->cirrus_blt_height);
+ if (ABS(s->cirrus_blt_dstpitch) != line_offset ||
+ ABS(s->cirrus_blt_srcpitch) != line_offset) {
+ /* this is not going to happen very often */
+ vga_hw_invalidate();
+ } else {
+ if (notify)
+ /* we don't have to notify the display that this portion has
+changed since qemu_console_copy implies this */
+ qemu_console_copy(s->vga.ds,
+   sx, sy, dx, dy,
+   s->cirrus_blt_width / depth,
+   s->cirrus_blt_height);
+ else
+ cirrus_invalidate_region(s, s->cirrus_blt_dstaddr,
+  s->cirrus_blt_dstpitch, s->cirrus_blt_width,
+  s->cirrus_blt_height);
+ }
  }

  static int cirrus_bitblt_videotovideo_copy(CirrusVGAState * s)
diff --git a/hw/cirrus_vga_rop.h b/hw/cirrus_vga_rop.h
index 39a7b72..80f135b 100644
--- a/hw/cirrus_vga_rop.h
+++ b/hw/cirrus_vga_rop.h
@@ -32,10 +32,10 @@ glue(cirrus_bitblt_rop_fwd_, ROP_NAME)(CirrusVGAState *s,
  dstpitch -= bltwidth;
  srcpitch -= bltwidth;

-if (dstpitch < 0 || srcpitch < 0) {
-/* is 0 valid? srcpitch == 0 could be useful */
+if (dstpitch < 0)
  return;
-}
+if (srcpitch < 0)
+srcpitch = 0;

  for (y = 0; y < bltheight; y++) {
  for (x = 0; x < bltwidth; x++) {
@@ -57,6 +57,12 @@ glue(cirrus_bitblt_rop_bkwd_, ROP_NAME)(CirrusVGAState *s,
  int x,y;
  dstpitch += bltwidth;
  srcpitch += bltwidth;
+
+if (dstpitch < 0)
+return;
+if (srcpitch < 0)
+srcpitch = 0;
+
  for (y = 0; y < bltheight; y++) {
  for (x = 0; x < bltwidth; x++) {
  ROP_OP(*dst, *src);
@@ -78,6 +84,12 @@ glue(glue(cirrus_bitblt_rop_fwd_transp_, 
ROP_NAME),_8)(CirrusVGAState *s,
  uint8_t p;
  dstpitch -= bltwidth;
  srcpitch -= bltwidth;
+
+if (dstpitch < 0)
+return;
+if (srcpitch < 0)
+srcpitch = 0;
+
  for (y = 0; y < bltheight; y++) {
  for (x = 0; x < bltwidth; x++) {
p = *dst;
@@ -101,6 +113,12 @@ glue(glue(cirrus_bitblt_rop_bkwd_transp_, 
ROP_NAME),_8)(CirrusVGAState *s,
  uint8_t p;
  dstpitch += bltwidth;
  srcpitch += bltwidth;
+
+if (dstpitch < 0)
+return;
+if (srcpitch < 0)
+srcpitch = 0;
+
+
  for (y = 0; 

Re: [Qemu-devel] [PATCH 1/1] ceph/rbd block driver for qemu-kvm (v2)

2010-05-28 Thread Christian Brunner
Hi Kevin,

thanks for your review notes. Yehuda and I have already worked this into the git
tree on the ceph site.

I'll do some testing on Monday. After that I'll send an updated patch.

Regards,
Christian

2010/5/28 Kevin Wolf <kw...@redhat.com>:
 Am 27.05.2010 21:11, schrieb Christian Brunner:
 This is a block driver for the distributed file system Ceph
 (http://ceph.newdream.net/). This driver uses librados (which
 is part of the Ceph server) for direct access to the Ceph object
 store and is running entirely in userspace. Therefore it is
 called rbd - rados block device.

 To compile the driver a recent version of ceph (unstable/testing git
 head or 0.20.3 once it is released) is needed and you have to
 --enable-rbd when running configure.

 Additional information is available on the Ceph-Wiki:

 http://ceph.newdream.net/wiki/Kvm-rbd

 The patch is based on git://repo.or.cz/qemu/kevin.git block

 Signed-off-by line is missing.

 ---
  Makefile          |    3 +
  Makefile.objs     |    1 +
  block/rbd.c       |  584 
 +
  block/rbd_types.h |   52 +
  configure         |   27 +++
  5 files changed, 667 insertions(+), 0 deletions(-)
  create mode 100644 block/rbd.c
  create mode 100644 block/rbd_types.h

 diff --git a/Makefile b/Makefile
 index 7986bf6..8d09612 100644
 --- a/Makefile
 +++ b/Makefile
 @@ -27,6 +27,9 @@ configure: ;
  $(call set-vpath, $(SRC_PATH):$(SRC_PATH)/hw)

  LIBS+=-lz $(LIBS_TOOLS)
 +ifdef CONFIG_RBD
 +LIBS+=-lrados
 +endif

 You already write the -lrados option to config-host.mak in configure, so
 this looks unnecessary.


  ifdef BUILD_DOCS
  DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8
 diff --git a/Makefile.objs b/Makefile.objs
 index 1a942e5..08dc11f 100644
 --- a/Makefile.objs
 +++ b/Makefile.objs
 @@ -18,6 +18,7 @@ block-nested-y += parallels.o nbd.o blkdebug.o
  block-nested-$(CONFIG_WIN32) += raw-win32.o
  block-nested-$(CONFIG_POSIX) += raw-posix.o
  block-nested-$(CONFIG_CURL) += curl.o
 +block-nested-$(CONFIG_RBD) += rbd.o

  block-obj-y +=  $(addprefix block/, $(block-nested-y))

 diff --git a/block/rbd.c b/block/rbd.c
 new file mode 100644
 index 000..375ae9d
 --- /dev/null
 +++ b/block/rbd.c
 @@ -0,0 +1,584 @@
 +/*
 + * QEMU Block driver for RADOS (Ceph)
 + *
 + * Copyright (C) 2010 Christian Brunner c...@muc.de
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2.  See
 + * the COPYING file in the top-level directory.
 + *
 + */
 +
 +#include "qemu-common.h"
 +#include <sys/types.h>
 +#include <stdbool.h>
 +
 +#include <qemu-common.h>
 +
 +#include "rbd_types.h"
 +#include "module.h"
 +#include "block_int.h"
 +
 +#include <stdio.h>
 +#include <stdlib.h>
 +#include <rados/librados.h>
 +
 +#include <signal.h>
 +
 +/*
 + * When specifying the image filename use:
 + *
 + * rbd:poolname/devicename
 + *
 + * poolname must be the name of an existing rados pool
 + *
 + * devicename is the basename for all objects used to
 + * emulate the raw device.
 + *
 + * Metadata information (image size, ...) is stored in an
 + * object with the name devicename.rbd.
 + *
 + * The raw device is split into 4MB sized objects by default.
 + * The sequencenumber is encoded in a 12 byte long hex-string,
 + * and is attached to the devicename, separated by a dot.
 + * e.g. devicename.1234567890ab
 + *
 + */
 +
 +#define OBJ_MAX_SIZE (1UL << OBJ_DEFAULT_OBJ_ORDER)
 +
 +typedef struct RBDAIOCB {
 +    BlockDriverAIOCB common;
 +    QEMUBH *bh;
 +    int ret;
 +    QEMUIOVector *qiov;
 +    char *bounce;
 +    int write;
 +    int64_t sector_num;
 +    int aiocnt;
 +    int error;
 +} RBDAIOCB;
 +
 +typedef struct RADOSCB {
 +    int rcbid;
 +    RBDAIOCB *acb;
 +    int done;
 +    int64_t segsize;
 +    char *buf;
 +} RADOSCB;
 +
 +typedef struct RBDRVRBDState {
 +    rados_pool_t pool;
 +    char name[RBD_MAX_OBJ_NAME_SIZE];
 +    int name_len;

 name_len looks unused.

 +    uint64_t size;
 +    uint64_t objsize;
 +} RBDRVRBDState;

 Hm, you mean BDRVRBDState?

 Maybe ceph would have been a better driver name to avoid such type
 names. ;-)

 +
 +typedef struct rbd_obj_header_ondisk RbdHeader1;
 +
 +static int rbd_parsename(const char *filename, char *pool, char *name)
 +{
 +    const char *rbdname;
 +    char *p, *n;
 +    int l;
 +
 +    if (!strstart(filename, "rbd:", &rbdname)) {
 +        return -EINVAL;
 +    }
 +
 +    pstrcpy(pool, 2 * RBD_MAX_SEG_NAME_SIZE, rbdname);

 Why twice the size? The callers pass a char[RBD_MAX_SEG_NAME_SIZE], so
 doesn't this allow buffer overflows?

 +    p = strchr(pool, '/');
 +    if (p == NULL) {
 +        return -EINVAL;
 +    }
 +
 +    *p = '\0';
 +    n = ++p;

 Why introduce a new variable here? p isn't used any more afterwards.

 +
 +    l = strlen(n);
 +
 +    if (l > RBD_MAX_OBJ_NAME_SIZE) {
 +        fprintf(stderr, "object name to long\n");

 Off by one, you need to consider the trailing '\0'.

 Also, please use error_report instead of fprintf(stderr, ...) for real
 error 

Re: Perf trace event parse errors for KVM events

2010-05-28 Thread Marcelo Tosatti
On Fri, May 28, 2010 at 05:42:51PM +0100, Stefan Hajnoczi wrote:
 I get parse errors when using Steven Rostedt's trace-cmd tool, too.
 
 Any ideas what is going on here?  I can provide more info (e.g. trace
 files) if necessary.

Non standard print_format for the problematic entries?



Re: Perf trace event parse errors for KVM events

2010-05-28 Thread Steven Rostedt
On Fri, 2010-05-28 at 17:42 +0100, Stefan Hajnoczi wrote:
 I get parse errors when using Steven Rostedt's trace-cmd tool, too.
 
 Any ideas what is going on here?  I can provide more info (e.g. trace
 files) if necessary.

Does trace-cmd fail on the same tracepoints? Have you checked out the
latest code?

I do know it fails on some of the KVM tracepoints since the formatting
they use is obnoxious.

Could you show the print-fmt of the failing events?

Thanks,

-- Steve




RE: device passthrough

2010-05-28 Thread Mu Lin
Thanks, Chris,

Do you know where the patch is? I just need something quick and dirty for now:
my shiny new board does have VT-d, but the BIOS is not ready yet, and I want to
have something working now.

Mu 

 -Original Message-
 From: Chris Wright [mailto:chr...@sous-sol.org] 
 Sent: Friday, May 28, 2010 12:33 PM
 To: Mu Lin
 Cc: kvm@vger.kernel.org
 Subject: Re: device passthrough
 
 * Mu Lin (m...@juniper.net) wrote:
  Is there any method to directly assign a device to Guest OS 
  without VT-d?
 
 Assuming you mean a PCI device, no, there isn't.
 
 Without an IOMMU[1] you can't directly assign a PCI device to 
 a guest (nor is it safe).  There have been patches floating 
 around to allow this, but they don't maintain secure isolation.
 
 thanks,
 -chris
 
 [1] VT-d is an Intel chipset feature, so you could certainly 
 do it on an AMD platform that has an AMD IOMMU.


Re: device passthrough

2010-05-28 Thread Chris Wright
* Mu Lin (m...@juniper.net) wrote:
 Do you know where is the patch, I just need something quick and dirty
 for now, my shining new board does have VT-d but the BIOS is not ready
 yet, I want to have something working now.

Sorry, I don't have a handy pointer.  You can search for either pv dma
changes (paravirtualizing the guest's request for dma addrs so that it
gets host physical addr to program card for dma) or reserved-ram for
pci-passthrough (1:1 mapping of guest to host physical memory).  I don't
recall a recent attempt to bring them forward, so expect anything you
find to be quite stale.

thanks,
-chris
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers

2010-05-28 Thread Randy Dunlap
Hi,


On Fri, 28 May 2010 16:07:38 -0700 Tom Lyon wrote:

Missing diffstat -p1 -w 70:

 Documentation/vfio.txt         |  176
 MAINTAINERS                    |    7
 drivers/Kconfig                |    2
 drivers/Makefile               |    1
 drivers/vfio/Kconfig           |    9
 drivers/vfio/Makefile          |    5
 drivers/vfio/vfio_dma.c        |  372
 drivers/vfio/vfio_intrs.c      |  189
 drivers/vfio/vfio_main.c       |  627
 drivers/vfio/vfio_pci_config.c |  554
 drivers/vfio/vfio_rdwr.c       |  147
 drivers/vfio/vfio_sysfs.c      |  153
 include/linux/vfio.h           |  193
 13 files changed, 2435 insertions(+)


which shows that the patch is missing an update to
Documentation/ioctl/ioctl-number.txt for ioctl code ';'.  Please add that.


 diff -uprN linux-2.6.34/drivers/vfio/Kconfig 
 vfio-linux-2.6.34/drivers/vfio/Kconfig
 --- linux-2.6.34/drivers/vfio/Kconfig 1969-12-31 16:00:00.0 -0800
 +++ vfio-linux-2.6.34/drivers/vfio/Kconfig2010-05-27 17:07:25.0 
 -0700
 @@ -0,0 +1,9 @@
 +menuconfig VFIO
 + tristate Non-Priv User Space PCI drivers

  Non-privileged

 + depends on PCI
 + help
 +   Driver to allow advanced user space drivers for PCI, PCI-X,
 +   and PCIe devices.  Requires IOMMU to allow non-privilged

 non-privileged

 +   processes to directly control the PCI devices.
 +
 +   If you don't know what to do here, say N.


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers

2010-05-28 Thread Randy Dunlap
On Fri, 28 May 2010 16:07:38 -0700 Tom Lyon wrote:

 diff -uprN linux-2.6.34/Documentation/vfio.txt 
 vfio-linux-2.6.34/Documentation/vfio.txt
 --- linux-2.6.34/Documentation/vfio.txt   1969-12-31 16:00:00.0 
 -0800
 +++ vfio-linux-2.6.34/Documentation/vfio.txt  2010-05-28 14:03:05.0 
 -0700
 @@ -0,0 +1,176 @@
 +---
 +The VFIO driver is used to allow privileged AND non-privileged processes to
 +implement user-level device drivers for any well-behaved PCI, PCI-X, and PCIe
 +devices.
 +
 +Why is this interesting?  Some applications, especially in the high 
 performance
 +computing field, need access to hardware functions with as little overhead as
 +possible. Examples are in network adapters (typically non tcp/ip based) and

 non-TCP/IP-based)

 +in compute accelerators - i.e., array processors, FPGA processors, etc.
 +Previous to the VFIO drivers these apps would need either a kernel-level
 +driver (with corrsponding overheads), or else root permissions to directly

corresponding

 +access the hardware. The VFIO driver allows generic access to the hardware
 +from non-privileged apps IF the hardware is well-behaved enough for this
 +to be safe.
 +
 +While there have long been ways to implement user-level drivers using 
 specific
 +corresponding drivers in the kernel, it was not until the introduction of the
 +UIO driver framework, and the uio_pci_generic driver that one could have a
 +generic kernel component supporting many types of user level drivers. 
 However,
 +even with the uio_pci_generic driver, processes implementing the user level
 +drivers had to be trusted - they could do dangerous manipulation of DMA
 +addreses and were required to be root to write PCI configuration space
 +registers.
 +
 +Recent hardware technologies - I/O MMUs and PCI I/O Virtualization - provide
 +new hardware capabilities which the VFIO solution exploits to allow non-root
 +user level drivers. The main role of the IOMMU is to ensure that DMA accesses
 +from devices go only to the appropriate memory locations, this allows VFIO to

  locations;

 +ensure that user level drivers do not corrupt inappropriate memory.  PCI I/O
 +virtualization (SR-IOV) was defined to allow pass-through of virtual 
 devices
 +to guest virtual machines. VFIO in essence implements pass-through of devices
 +to user processes, not virtual machines.  SR-IOV devices implement a
 +traditional PCI device (the physical function) and a dynamic number of 
 special
 +PCI devices (virtual functions) whose feature set is somewhat restricted - in
 +order to allow the operating system or virtual machine monitor to ensure the
 +safe operation of the system.
 +
 +Any SR-IOV virtual function meets the VFIO definition of well-behaved, but
 +there are many other non-IOV PCI devices which also meet the defintion.
 +Elements of this definition are:
 +- The size of any memory BARs to be mmap'ed into the user process space must 
 be
 +  a multiple of the system page size.
 +- If MSI-X interrupts are used, the device driver must not attempt to mmap or
 +  write the MSI-X vector area.
 +- If the device is a PCI device (not PCI-X or PCIe), it must conform to PCI
 +  revision 2.3 to allow its interrupts to be masked in a generic way.
 +- The device must not use the PCI configuration space in any non-standard 
 way,
 +  i.e., the user level driver will be permitted only to read and write 
 standard
 +  fields of the PCI config space, and only if those fields cannot cause harm 
 to
 +  the system. In addition, some fields are virtualized, so that the user
 +  driver can read/write them like a kernel driver, but they do not affect the
 +  real device.
 +- For now, there is no support for user access to the PCIe and PCI-X extended
 +  capabilities configuration space.
 +
 +Even with these restrictions, there are bound to be devices which are unsafe
 +for user level use - it is still up to the system admin to decide whether to
 +grant access to the device.  When the vfio module is loaded, it will have
 +access to no devices until the desired PCI devices are bound to the driver.
 +First, make sure the devices are not bound to another kernel driver. You can
 +unload that driver if you wish to unbind all its devices, or else enter the
 +driver's sysfs directory, and unbind a specific device:
 + cd /sys/bus/pci/drivers/drivername
 + echo :06:02.00 > unbind
 +(The :06:02.00 is a fully qualified PCI device name - different for each
 +device).  Now, to bind to the vfio driver, go to /sys/bus/pci/drivers/vfio and
 +write the PCI device type of the target device to the new_id file:
 + echo 8086 10ca > new_id
 +(8086 10ca are the vendor and device type for the Intel 82576 virtual function
 +devices). A /dev/vfioN entry will be created for each device bound.
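
The unbind and new_id steps quoted above are ordinary writes to sysfs attribute files, so a program can do what the echo commands do. The following is a minimal sketch under stated assumptions: write_attr and format_new_id are hypothetical helpers, not part of the VFIO patch, and the only format assumed for new_id is the "vendor device" hex pair shown in the quoted text.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: write one line to a sysfs-style attribute file,
 * as "echo value > path" would. Returns 0 on success or a negative errno.
 */
static int write_attr(const char *path, const char *value)
{
    FILE *f;
    int err = 0;

    assert(path != NULL && value != NULL);

    f = fopen(path, "w");
    if (f == NULL) {
        return -errno;
    }
    if (fputs(value, f) == EOF || fputc('\n', f) == EOF) {
        err = -EIO;
    }
    /* Buffered write errors may only surface when the stream is closed. */
    if (fclose(f) != 0 && err == 0) {
        err = -errno;
    }
    return err;
}

/*
 * Hypothetical helper: format a vendor/device pair as two four-digit
 * hex numbers, e.g. "8086 10ca". Returns 0, or -EINVAL if buf is too small.
 */
static int format_new_id(char *buf, size_t size,
                         unsigned vendor, unsigned device)
{
    int n;

    assert(buf != NULL);

    n = snprintf(buf, size, "%04x %04x", vendor & 0xffffu, device & 0xffffu);
    return (n < 0 || (size_t)n >= size) ? -EINVAL : 0;
}
```

A caller running with sufficient privileges would pass the paths named in the text, for example write_attr("/sys/bus/pci/drivers/vfio/new_id", buf) after format_new_id(buf, sizeof(buf), 0x8086, 0x10ca). The vfio directory only exists once the driver from this patch is loaded.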