RE: [PATCH 3/4][VTD] vt-d hooks in generic KVM sources

2008-06-20 Thread Han, Weidong
Avi Kivity wrote:
> Kay, Allen M wrote:
>> vt-d hooks in generic KVM sources for mapping guest memory with vt-d
>> page table. 
>> 
>> Signed-off-by: Allen M. Kay <[EMAIL PROTECTED]>
>> 
>> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
>> index c97d35c..f635fb0 100644
>> --- a/arch/x86/kvm/Makefile
>> +++ b/arch/x86/kvm/Makefile
>> @@ -10,7 +10,7 @@ endif
>>  EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
>> 
>>  kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
>> -i8254.o
>> +i8254.o vtd.o
> 
> This breaks the build.

Why does it break the build? It works for me. 

> 
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index d8bc492..61052e1 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -28,6 +28,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>> 
>>  #include 
>>  #include 
>> @@ -351,6 +352,8 @@ static void kvm_free_pci_passthrough(struct kvm *kvm)
>> 
>>  list_del(&pci_pt_dev->list);
>>  }
>> +if (kvm_intel_iommu_found())
>> +kvm->arch.domain = NULL;
> 
> "domain" is much too generic.  Need something like intel_iommu_domain
> (later we can transform it to iommu_domain as we make it non-intel
> dependent; also move it out of arch so ia64 can benefit too).
> 
>>  write_unlock_irqrestore(&kvm_pci_pt_lock, flags);
>>  }
>> 
>> @@ -1958,6 +1961,11 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>  r = kvm_vm_ioctl_pci_pt_dev(kvm, &pci_pt_dev);
>>  if (r)
>>  goto out;
>> +if (kvm_intel_iommu_found()) {
>> +r = kvm_iommu_map_guest(kvm, &pci_pt_dev);
>> +if (r)
>> +goto out;
>> +}
> 
> Need to undo the effects of kvm_vm_ioctl_pci_pt_dev() on failure.
> 

kvm_vm_ioctl_pci_pt_dev() does its own cleanup when it fails.
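For reference, the unwind Avi is asking about would look roughly like this
if kvm_iommu_map_guest() fails after the device has already been registered
(kvm_free_pci_passthrough_dev() is only a placeholder name for whatever the
real cleanup entry point is):

        r = kvm_vm_ioctl_pci_pt_dev(kvm, &pci_pt_dev);
        if (r)
                goto out;
        if (kvm_intel_iommu_found()) {
                r = kvm_iommu_map_guest(kvm, &pci_pt_dev);
                if (r) {
                        /* undo the registration done just above */
                        kvm_free_pci_passthrough_dev(kvm, &pci_pt_dev);
                        goto out;
                }
        }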

Randy (Weidong)

>> 
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index e8f9fda..7211823 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -388,6 +388,11 @@ int __kvm_set_memory_region(struct kvm *kvm,   
>> } 
>> 
>>  kvm_free_physmem_slot(&old, &new);
>> +
>> +/* map the pages in iommu page table */
>> +if (kvm_intel_iommu_found())
>> +kvm_iommu_map_pages(kvm, base_gfn, npages);
>> +
>>  return 0;
> 
> This is generic code.  As this is arch specific for now, please move
> it to arch code.
> 
> Also, make sure that each patch builds cleanly.



RE: [PATCH 2/4][VTD] modifications to intel-iommu.c.

2008-06-20 Thread Han, Weidong
Avi Kivity wrote:
> Kay, Allen M wrote:
>> Modification to intel-iommu.c to support vt-d page table and context
>> table mapping in kvm.  Mods to dmar.c and iova.c are due to header
>> file moves to include/linux. 
>> 
> 
>> diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
>> index f941f60..a58a5b0 100644
>> --- a/drivers/pci/dmar.c
>> +++ b/drivers/pci/dmar.c
>> @@ -26,8 +26,8 @@
>> 
>>  #include 
>>  #include 
>> -#include "iova.h"
>> -#include "intel-iommu.h"
>> +#include 
>> +#include 
> 
> This should have been done in the file movement patch to avoid
> breaking the build.
> 
>> 
>> 
>> +void kvm_intel_iommu_domain_exit(struct dmar_domain *domain)
> 
> This should be a generic API, not a kvm specific one.
> 
>> +{
>> +u64 end;
>> +
>> +/* Domain 0 is reserved, so dont process it */
>> +if (!domain)
>> +return;
> 
> 'domain' here is a pointer, not an identifier.
> 
>> 
>> +int kvm_intel_iommu_context_mapping(
>> +struct dmar_domain *domain, struct pci_dev *pdev)
>> +{
>> +int rc;
>> +rc = domain_context_mapping(domain, pdev);
>> +return rc;
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_intel_iommu_context_mapping);
> 
> What does the return value mean?

It indicates whether the context mapping succeeded or failed. It must
succeed, or VT-d can't work for the device. I found that the return value of
kvm_intel_iommu_context_mapping() is not checked; I will add checks for
it.
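For example, the call site could fail device assignment when the context
mapping cannot be set up, roughly like this (a sketch only):

        r = kvm_intel_iommu_context_mapping(kvm->arch.domain, pdev);
        if (r) {
                printk(KERN_ERR "kvm: VT-d context mapping failed for %s\n",
                       pci_name(pdev));
                return r;
        }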

> 
>> +
>> +int kvm_intel_iommu_page_mapping(
>> +struct dmar_domain *domain, dma_addr_t iova,
>> +u64 hpa, size_t size, int prot)
>> +{
>> +int rc;
>> +rc = domain_page_mapping(domain, iova, hpa, size, prot);
>> +return rc;
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_intel_iommu_page_mapping);
> 
> The function name makes it sound like it's retrieving information.  If
> it does something, put a verb in there.

We use the same names as the kernel VT-d code to keep them consistent.

Randy (Weidong)



RE: PCI PT: irq issue

2008-06-20 Thread Han, Weidong
Amit Shah wrote:
> On Thursday 19 June 2008 10:17:29 Amit Shah wrote:
>> * On Wednesday 18 June 2008 18:26:16 Ben-Ami Yassour wrote:
>>> Amit,
>>> 
>>> With the current implementation we have an issue if the driver on
>>> the host was never loaded. 
>>> 
>>> To be able to run kvm with passthrough we have to load and then
>>> unload the driver on the host at least once. After that it works ok.
>> 
>> Yes, whenever a device issues pci_request_regions(), the IRQ may be
>> reassigned. 
>> 
>> The unloading / loading should not be necessary once I commit these
>> changes to the tree. 
>> 
>>> Note that after doing the load and unload the irq as reported by
>>> lspci -v is changed. 
>>> 
>>> The questions that I think we need to figure out are:
>>> 1. How does the loading of the driver on the host cause the irq to
>>> change?
>>> 2. What other side effects does it do that helps kvm pcipt work?
>>> 3. What do we need to add to the pcipt code that will do the same
>>> "side effect" (or bypass the problem)?
>> 
>> That already answers all these.
>> 
>>> Also note that in the current implementation the user is required to
>>> provide the irq for the device in the kvm command line.
>>> With respect to the comments above it is clear that lspci will show
>>> an irrelevant irq value. Why do we need the user to provide this
>>> information anyhow? 
>>> Why can't KVM find it out automatically?
>> 
>> I thought we discussed this several times. The next commit is going
>> to fix this. 
>> 
>>> Note, if the kernel can find this information then we can also
>>> remove it from the ioctl interface.
>> 
>> Coming soon; coming soon indeed.
> 
> I just pushed out the changes so the trickery with module loading /
> unloading, assigning irq number, etc. are not needed. The userspace
> command line still expects a number for the irq, though, and you can
> pass it any number as long as you use the in-kernel irq handler 
> (this is needed for the irqhook module, which I'm not updating as of now).
> 
> A couple of notes for the VT-d patch:
> - The pci_dev struct is now available in the pci_pt kernel structure,
> so just use that information each time you want to add a device
> instead of searching for it each time.
> - The kernel with KVM VT-d patches doesn't build on the
> kvm-userspace.git tree. Please fix that.
> 

I pulled the latest VT-d branch, and it works fine for me. 

Randy (Weidong)



RE: [PATCH 0/4][VTD] kvm vt-d support kernel changes

2008-06-20 Thread Han, Weidong
Avi Kivity wrote:
> Kay, Allen M wrote:
>> Following four patches contains changes for enabling VT-d PCI
>> passthrough.  The patches are located at:
>> 
>> git.kernel.org/pub/scm/linux/kernel/git/amit/kvm.git vtd
>> 
>> 
> 
> Please attach patches as text/plain or inline them.  It's very
> annoying to review application/octet-stream patches.
> 
>> I have incorporated most of the feedback from the last RFC
>> submission. It was tested by passing through an E1000 NIC to a Linux
>> guest using the irqhook interrupt injection mechanism.
>> 
>> 1) intel_iommu_move.patch: move intel-iommu.h/iova.h to
>> include/linux. 2) intel_iommu_mods.patch: kvm modifications to
>> intel-iommu.c. 
>> 
> 
> What's the upstream path for these (who's the subsystem maintainer)?

Mark Gross from Intel is now the kernel VT-d maintainer. I have
contacted him and asked him to review these patches.

Randy (Weidong)


[ kvm-Bugs-1998355 ] IO Performance

2008-06-20 Thread SourceForge.net
Bugs item #1998355, was opened at 2008-06-20 00:11
Message generated for change (Comment added) made by bjrosen
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1998355&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Joshua Rosen (bjrosen)
Assigned to: Nobody/Anonymous (nobody)
Summary: IO Performance 

Initial Comment:
Is there any way of mapping a host's directory into a KVM VM similar to 
VMware's Shared Folder feature?

I've been benchmarking the performance of NCVerilog under various VMs. The 
performance of KVM when using a virtual disk is excellent, in fact it's better 
than VMware Server or VMware Workstation, however if you use
an NFS mounted host directory the performance is unspeakably awful. An NFS 
mounted directory under VMware Server 2.0 (Beta 2) is also slow but it's still 
significantly better than KVM. Using a Shared Folder with VMware Workstation 
eliminates the IO bottleneck, the performance there is about the same as 
accessing a virtual disk.

The system that I did these benchmarks on is a 3GHz Core2 with 8G of RAM. 
VMware was running under CentOS5.1 with a 2.6.23.7 kernel. KVM is running on 
Fedora 9 with a 2.6.25.xx kernel. The Verilog simulation times for my test 
suite are as follows,

Native                          06:34
VM Server 2, virtual disk       08:05
VM Server 2, NFS                18:37
VM Workstation, shared folder   08:14
KVM, Virtual disk               07:42
KVM, NFS                        38:36


--

>Comment By: Joshua Rosen (bjrosen)
Date: 2008-06-20 23:29

Message:
Logged In: YES 
user_id=39829
Originator: YES

The instructions for setting up virtio are a little confusing. I have a
CentOS5 VM on Fedora 9. F9 uses the 2.6.25 kernel. 

The wiki says to edit /etc/initramfs-tools/modules however it doesn't say
whether this file is on the host or the guest. There is no
/etc/initramfs-tools directory on either. Would someone please clarify the
procedure for adding a virtio NIC on an F9 host.


--

Comment By: Dor Laor (thekozmo)
Date: 2008-06-20 21:58

Message:
Logged In: YES 
user_id=2124464
Originator: NO

Did you use virtio nic when testing kvm with NFS?
If not, do try, it should boost your performance.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1998355&group_id=180599


[ kvm-Bugs-1999184 ] Real mode guests never wake up after an HLT instruction

2008-06-20 Thread SourceForge.net
Bugs item #1999184, was opened at 2008-06-21 01:31
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1999184&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: intel
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Mohammed Gamal (mgamal)
Assigned to: Nobody/Anonymous (nobody)
Summary: Real mode guests never wake up after an HLT instruction

Initial Comment:
Real mode guests (namely Minix 3 and FreeDOS with the HIMEM XMS driver) freeze after
issuing an hlt instruction and never wake up.
The problem occurs both with and without commit 36742c5470. Problem disappears 
using -no-kvm switch.

CPU Model:  Intel(R) Core(TM)2 Duo CPU T7250  @ 2.00GHz
KVM version: kvm-69-1687-gd660add
Host Kernel: 2.6.26-rc5 (x86_64)
Guests: FreeDOS and Minix 3.1.2 32-bit
Command: qemu-system-x86_64 -hda /media/sda6/


--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1999184&group_id=180599


[ kvm-Bugs-1998355 ] IO Performance

2008-06-20 Thread SourceForge.net
Bugs item #1998355, was opened at 2008-06-20 03:11
Message generated for change (Comment added) made by thekozmo
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1998355&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Joshua Rosen (bjrosen)
Assigned to: Nobody/Anonymous (nobody)
Summary: IO Performance 

Initial Comment:
Is there any way of mapping a host's directory into a KVM VM similar to 
VMware's Shared Folder feature?

I've been benchmarking the performance of NCVerilog under various VMs. The 
performance of KVM when using a virtual disk is excellent, in fact it's better 
than VMware Server or VMware Workstation, however if you use
an NFS mounted host directory the performance is unspeakably awful. An NFS 
mounted directory under VMware Server 2.0 (Beta 2) is also slow but it's still 
significantly better than KVM. Using a Shared Folder with VMware Workstation 
eliminates the IO bottleneck, the performance there is about the same as 
accessing a virtual disk.

The system that I did these benchmarks on is a 3GHz Core2 with 8G of RAM. 
VMware was running under CentOS5.1 with a 2.6.23.7 kernel. KVM is running on 
Fedora 9 with a 2.6.25.xx kernel. The Verilog simulation times for my test 
suite are as follows,

Native                          06:34
VM Server 2, virtual disk       08:05
VM Server 2, NFS                18:37
VM Workstation, shared folder   08:14
KVM, Virtual disk               07:42
KVM, NFS                        38:36


--

Comment By: Dor Laor (thekozmo)
Date: 2008-06-21 00:58

Message:
Logged In: YES 
user_id=2124464
Originator: NO

Did you use virtio nic when testing kvm with NFS?
If not, do try, it should boost your performance.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1998355&group_id=180599


Re: RFC: cache_regs in kvm_emulate_pio

2008-06-20 Thread Marcelo Tosatti
On Fri, Jun 20, 2008 at 11:30:05PM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Hi,
>>
>> From my understanding the ->cache_regs call on kvm_emulate_pio() is
>> necessary only on AMD, where vcpu->arch.regs[RAX] is not copied during
>> exit in svm_vcpu_load().
>>
>> On both architectures, the remaining general purpose registers are saved
>> on exit.
>>
>> The following patch saves 100 cycles out of both light and heavy exits
>> on Intel (if correct, kvm_emulate_hypercall and complete_pio could also
>> benefit, thus saving 200 cycles for in-kernel devices).
>>   
>
> ISTR vmwrite as 50 cycles and vmread as much lower.

On my 1.60GHz test box ->cache_regs takes 114 cycles, measured with
rdtscll() before and after (rdtscll() takes 90 cycles by itself, due to
the barriers I guess, so the exact number was 204 cycles). Calling the
empty ->cache_rax takes 6 cycles.
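For reference, the measurement was along these lines (a sketch; names are
illustrative):

        unsigned long long t0, t1, t2;

        rdtscll(t0);
        rdtscll(t1);                    /* t1 - t0: cost of rdtscll itself */
        kvm_x86_ops->cache_regs(vcpu);  /* the callback being timed */
        rdtscll(t2);

        printk(KERN_DEBUG "cache_regs: %llu cycles\n",
               (t2 - t1) - (t1 - t0));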

>> BTW, the decache_regs(vcpu) call at the end of complete_pio() could also
>> be a noop on Intel from what I can tell ?
>>
>>   
>
> I think so.  decache_regs() is actually more important.
>
>> int *exception);
>> +void (*cache_rax)(struct kvm_vcpu *vcpu);
>>  void (*cache_regs)(struct kvm_vcpu *vcpu);
>
> ugh, another callback.  how about instead
>
> /* in vcpu structure */
> u16 regs_available;
> u16 regs_dirty;
>
> /* read from cache if possible */
> if (!test_bit(VCPU_REGS_RAX, &regs_available))
>   ->cache_regs();
> printk("%d\n", regs[VCPU_REGS_RAX]);
>
> /* write to cache, ->vcpu_run() will flush */
> regs[VCPU_REGS_RAX] = 17;
> __set_bit(VCPU_REGS_RAX, &regs_dirty);

I think that hiding whether registers are cached or not behing wrappers
makes a lot of sense, but having the ->cache_regs interface split can
also result in gains. An index argument to ->cache_regs() would do the
trick.
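Roughly, the wrappers could look like this (a sketch only; kvm_register_read/
write and the per-register cache_reg hook are made-up names here, and the
bitmaps are taken to be unsigned long so test_bit()/__set_bit() apply
directly):

        static unsigned long kvm_register_read(struct kvm_vcpu *vcpu, int reg)
        {
                if (!test_bit(reg, &vcpu->arch.regs_available))
                        kvm_x86_ops->cache_reg(vcpu, reg); /* fetch only this register */
                return vcpu->arch.regs[reg];
        }

        static void kvm_register_write(struct kvm_vcpu *vcpu, int reg,
                                       unsigned long val)
        {
                vcpu->arch.regs[reg] = val;
                __set_bit(reg, &vcpu->arch.regs_dirty); /* flushed by ->vcpu_run() */
        }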

For example, there's no need to read GUEST_RSP for
skip_emulated_instruction; that's another 50+ cycles.

Unless there's something obscure that means you need to read RSP/RIP
before accessing the now in-memory guest registers saved with "mov"
in vmx_vcpu_run(). The comment on vcpu_load_rsp_rip seems a little
ambiguous to me:

/*
 * Sync the rsp and rip registers into the vcpu structure.  This allows
 * registers to be accessed by indexing vcpu->arch.regs.
 */

But I think it just refers to the interface in general, so that nobody
would try to access RSP or RIP (and RAX in AMD's case) before calling
->cache_regs().





Re: RFC: cache_regs in kvm_emulate_pio

2008-06-20 Thread Avi Kivity

Marcelo Tosatti wrote:

Hi,

From my understanding the ->cache_regs call on kvm_emulate_pio() is
necessary only on AMD, where vcpu->arch.regs[RAX] is not copied during
exit in svm_vcpu_load().

On both architectures, the remaining general purpose registers are saved
on exit.

The following patch saves 100 cycles out of both light and heavy exits
on Intel (if correct, kvm_emulate_hypercall and complete_pio could also
benefit, thus saving 200 cycles for in-kernel devices).
  


ISTR vmwrite as 50 cycles and vmread as much lower.


BTW, the decache_regs(vcpu) call at the end of complete_pio() could also
be a noop on Intel from what I can tell ?

  


I think so.  decache_regs() is actually more important.


diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 851184d..95a0736 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -410,6 +410,7 @@ struct kvm_x86_ops {
unsigned long (*get_dr)(struct kvm_vcpu *vcpu, int dr);
void (*set_dr)(struct kvm_vcpu *vcpu, int dr, unsigned long value,
   int *exception);
+   void (*cache_rax)(struct kvm_vcpu *vcpu);
void (*cache_regs)(struct kvm_vcpu *vcpu);
void (*decache_regs)(struct kvm_vcpu *vcpu);
unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
  


ugh, another callback.  how about instead

/* in vcpu structure */
u16 regs_available;
u16 regs_dirty;

/* read from cache if possible */
if (!test_bit(VCPU_REGS_RAX, &regs_available))
  ->cache_regs();
printk("%d\n", regs[VCPU_REGS_RAX]);

/* write to cache, ->vcpu_run() will flush */
regs[VCPU_REGS_RAX] = 17;
__set_bit(VCPU_REGS_RAX, &regs_dirty);

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: Sharing disks between two kvm guests

2008-06-20 Thread Anthony Liguori

Laurent Vivier wrote:

On Friday, 20 June 2008 at 11:37 -0500, Anthony Liguori wrote:
  

Laurent Vivier wrote:


On Friday, 20 June 2008 at 09:07 -0500, Javier Guerra wrote:
  
  

On Fri, Jun 20, 2008 at 7:23 AM, carlopmart <[EMAIL PROTECTED]> wrote:



Felix Leimbach wrote:
  
  

 This is my first post to this list. I have already installed kvm-70
under rhel5.2. My intention is to share one disk image between two rhel5.2
kvm guests. Is it possible to accomplish this in kvm like xen or vmware
does? How can I do it? I didn't find any reference about this in the kvm
documentation ...
  
  

i tried this looong ago and didn't really work because there was some
userspace cache on each QEMU instance.  but the -drive option has a
'cache=off' setting that should be enough.

in theory (i haven't tested, but Avi 'blessed' it):
- create a new image with qemu-img
- add it to the command line using -drive file=xxx,cache=off on both
KVM instances
- use a cluster filesystem!



RFC:

Well, well, perhaps it is delusions of a sick mind but since the
introduction of qemu-nbd I think we can develop easily something to
share a disk between several virtual hosts:

I- in a first step, we can modify qemu-nbd to accept several connections
for one disk image, for instance:

# qemu-nbd my-disk.qcow2
# nbd-client localhost 1024 /dev/nbd0
# nbd-client localhost 1024 /dev/nbd1

and start two virtual hosts:

"qemu -hda v1.img -hdb /dev/nbd0" and "qemu -hda v2.img -hdb /dev/nbd1"

Of course the filesystem must know how to share the access to the disk
with others (-> "cluster filesystem")

II- in a second step, we can include directly the nbd protocol in qemu
(block-nbd.c, "-drive file=nbd:localhost:1024") to connect to the
server. We can also add some commands to the protocol to manage lock,
HA, "what else ?" (Hi George).
  
  

http://hg.codemonkey.ws/qemu-pq/file/25ca451f2040/block-nbd.diff



You're not fun, Anthony.

Perhaps, now, it would be better to use the functions defined in the
(new) file "nbd.c".
  


That patch is very old by now.  It needs to be updated/written to use 
the aio infrastructure which will be a little tough since there are some 
assumptions right now that all aio is posix-aio.


Regards,

Anthony Liguori


Laurent

  

Regards,

Anthony Liguori



Any comments ?

Cheers,
Laurent
  
  





Re: [PATCH] KVM: PCIPT: VT-d: fix guest unmap

2008-06-20 Thread Anthony Liguori

Avi Kivity wrote:

Anthony Liguori wrote:

I think the current VT-d code needs some reworking.

We should build the table as the shadow page table gets built.  We 
should suppress iotlb flushes unless the table is actually being 
updated.




We can't, since we need the iommu tables populated before we issue any 
dma.


Yes, as I've mentioned, the lack of a DMA window notification API can be 
handled as a special case.


Perhaps we want something like MAP_POPULATE for shadow, which would 
then affect the iommu tables.  Userspace would then do:


mlock()
... set up device assignment
ioctl(..., KVM_POPULATE)


Exactly like this :-)
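In userspace that would be roughly (KVM_ASSIGN_PCI_PT_DEV and KVM_POPULATE
are placeholder ioctl names; error handling omitted):

        /* pin guest RAM so the iommu mappings stay valid */
        mlock(guest_ram, guest_ram_size);

        /* register the assigned device with the kernel */
        ioctl(vm_fd, KVM_ASSIGN_PCI_PT_DEV, &pci_pt_dev);

        /* ask the kernel to populate shadow and iommu tables up front */
        ioctl(vm_fd, KVM_POPULATE, 0);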

Regards,

Anthony Liguori




Re: [PATCH] KVM: PCIPT: VT-d: fix guest unmap

2008-06-20 Thread Avi Kivity

Anthony Liguori wrote:

I think the current VT-d code needs some reworking.

We should build the table as the shadow page table gets built.  We 
should suppress iotlb flushes unless the table is actually being updated.




We can't, since we need the iommu tables populated before we issue any dma.

Perhaps we want something like MAP_POPULATE for shadow, which would then 
affect the iommu tables.  Userspace would then do:


mlock()
... set up device assignment
ioctl(..., KVM_POPULATE)


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: Sharing disks between two kvm guests

2008-06-20 Thread Laurent Vivier
On Friday, 20 June 2008 at 11:37 -0500, Anthony Liguori wrote:
> Laurent Vivier wrote:
> > On Friday, 20 June 2008 at 09:07 -0500, Javier Guerra wrote:
> >   
> >> On Fri, Jun 20, 2008 at 7:23 AM, carlopmart <[EMAIL PROTECTED]> wrote:
> >> 
> >>> Felix Leimbach wrote:
> >>>   
> >  This is my first post to this list. I have already installed kvm-70
> > under rhel5.2. My intention is to share on disk image betwwen two 
> > rhel5.2
> > kvm guests. Is it possible to accomplish this in kvm like xen or vmware
> > does?? How can I do?? I didn't find any reference abou this on kvm
> > documentation ...
> >   
> >> i tried this looong ago and didn't really work because there was some
> >> userspace cache on each QEMU instance.  but the -drive option has a
> >> 'cache=off' setting that should be enough.
> >>
> >> in theory (i haven't tested, but Avi 'blessed' it):
> >> - create a new image with qemu-img
> >> - add it to the command line using -drive file=xxx,cache=off on both
> >> KVM instances
> >> - use a cluster filesystem!
> >> 
> >
> > RFC:
> >
> > Well, well, perhaps it is delusions of a sick mind but since the
> > introduction of qemu-nbd I think we can develop easily something to
> > share a disk between several virtual hosts:
> >
> > I- in a first step, we can modify qemu-nbd to accept several connections
> > for one disk image, for instance:
> >
> > # qemu-nbd my-disk.qcow2
> > # nbd-client localhost 1024 /dev/nbd0
> > # nbd-client localhost 1024 /dev/nbd1
> >
> > and start two virtual hosts:
> >
> > "qemu -hda v1.img -hdb /dev/nbd0" and "qemu -hda v2.img -hdb /dev/nbd1"
> >
> > Of course the filesystem must know how to share the access to the disk
> > with others (-> "cluster filesystem")
> >
> > II- in a second step, we can include directly the nbd protocol in qemu
> > (block-nbd.c, "-drive file=nbd:localhost:1024") to connect to the
> > server. We can also add some commands to the protocol to manage lock,
> > HA, "what else ?" (Hi George).
> >   
> 
> http://hg.codemonkey.ws/qemu-pq/file/25ca451f2040/block-nbd.diff

You're not fun, Anthony.

Perhaps, now, it would be better to use the functions defined in the
(new) file "nbd.c".

Laurent

> Regards,
> 
> Anthony Liguori
> 
> > Any comments ?
> >
> > Cheers,
> > Laurent
> >   
> 
> 
-- 
- [EMAIL PROTECTED] ---
"The best way to predict the future is to invent it."
- Alan Kay



Re: [PATCH 0/3] kvm-guest-drivers-linux: Fix GSO/partial csum support on older kernels

2008-06-20 Thread Avi Kivity

Mark McLoughlin wrote:

Hi,
Here's a few patches to fix virtio_net GSO
and partial csum support under older kernels.

  


Applied all, thanks.  Sorry for the virtio-like latency in processing.

I don't much like the intense hackery involved in this.  The way I think 
it could be done is:


- hack virtio to use an API which is specific to kvm, but matches the 
current upstream API:


   s/net_func/virtio_compat_net_func/

 including data structures.

- define this API on top of the host kernel's real API.  For a recent 
enough kernel, that's a one-to-one mapping:


  virtio_compat_net_func() { return net_func(); }

 for older ones there's more trickery involved.

 For kvm, this is a fairly successful strategy, but I imagine that for
virtio-net this will be much, much more difficult (a minimal compat-shim
sketch follows the list below).


- write Documentation/stable_api_nonsense_nonsense.txt.
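
The shim boils down to this shape (net_func stands for whatever upstream
call is being wrapped, as above; the version cutoff and names are
illustrative):

        /* compat header used by the virtio drivers */
        #include <linux/version.h>
        #include <linux/netdevice.h>

        #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,25)
        /* recent kernel: one-to-one mapping onto the real API */
        #define virtio_compat_net_func(dev)   net_func(dev)
        #else
        /* older kernel: emulate the newer semantics by hand */
        static inline int virtio_compat_net_func(struct net_device *dev)
        {
                /* more involved reimplementation against the old API */
                return 0;
        }
        #endif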


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: Sharing disks between two kvm guests

2008-06-20 Thread Avi Kivity

Javier Guerra wrote:

i tried this looong ago and didn't really work because there was some
userspace cache on each QEMU instance.  but the -drive option has a
'cache=off' setting that should be enough.

in theory (i haven't tested, but Avi 'blessed' it):
- create a new image with qemu-img
- add it to the command line using -drive file=xxx,cache=off on both
KVM instances
- use a cluster filesystem!
  


This won't work with qcow images as metadata and allocation is not 
synchronized.  Use raw images.  cache=off is not strictly required, 
since the Linux pagecache will maintain coherency.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH 4/4][VTD] vt-d specific files in KVM

2008-06-20 Thread Avi Kivity

Kay, Allen M wrote:

vt-d specific files in KVM for constructing vt-d page tables and
programming vt-d context entries.

Signed-off-by: Allen M. Kay <[EMAIL PROTECTED]>
  
diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c

new file mode 100644
index 000..634802c
--- /dev/null
+++ b/arch/x86/kvm/vtd.c
@@ -0,0 +1,197 @@
+/*
+ * Copyright (c) 2006, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Copyright (C) 2006-2008 Intel Corporation
+ * Author: Allen M. Kay <[EMAIL PROTECTED]>
+ * Author: Weidong Han <[EMAIL PROTECTED]>
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "vtd.h"
+
+int kvm_iommu_map_pages(struct kvm *kvm,
+gfn_t base_gfn, unsigned long npages)
+{
+gfn_t gfn = base_gfn;
+pfn_t pfn;
+struct page *page;
+int i, rc;
+
+if (!kvm->arch.domain)
+return -EFAULT;
+
+printk(KERN_DEBUG "kvm_iommu_map_page: gpa = %lx\n",
+gfn << PAGE_SHIFT);
+printk(KERN_DEBUG "kvm_iommu_map_page: hpa = %lx\n",
+gfn_to_pfn(kvm, base_gfn) << PAGE_SHIFT);
+printk(KERN_DEBUG "kvm_iommu_map_page: size = %lx\n",
+npages*PAGE_SIZE);
+
+for (i = 0; i < npages; i++) {
+pfn = gfn_to_pfn(kvm, gfn);
+if (pfn_valid(pfn)) {
+rc = kvm_intel_iommu_page_mapping(kvm->arch.domain,
+gfn << PAGE_SHIFT, pfn << PAGE_SHIFT,
+PAGE_SIZE, DMA_PTE_READ | DMA_PTE_WRITE);
+if (rc) {
+page = gfn_to_page(kvm, gfn);
+put_page(page);


This is racy.  gfn_to_page() can return a different page each time it is 
called.  Instead iommu_map_page() should drop the refcount if it fails.
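That is, keep the pfn returned by gfn_to_pfn() and release that same
reference on failure, roughly (a sketch; kvm_release_pfn_clean() or an
equivalent would do):

        for (i = 0; i < npages; i++) {
                pfn = gfn_to_pfn(kvm, gfn);
                rc = kvm_intel_iommu_page_mapping(kvm->arch.domain,
                                gfn << PAGE_SHIFT, pfn << PAGE_SHIFT,
                                PAGE_SIZE, DMA_PTE_READ | DMA_PTE_WRITE);
                if (rc) {
                        /* drop the reference we just took, not a fresh lookup */
                        kvm_release_pfn_clean(pfn);
                        return rc;
                }
                gfn++;
        }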


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH 3/4][VTD] vt-d hooks in generic KVM sources

2008-06-20 Thread Avi Kivity

Kay, Allen M wrote:

vt-d hooks in generic KVM sources for mapping guest memory with vt-d
page table.

Signed-off-by: Allen M. Kay <[EMAIL PROTECTED]>
  
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile

index c97d35c..f635fb0 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -10,7 +10,7 @@ endif
 EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
 
 kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
-i8254.o
+i8254.o vtd.o


This breaks the build.


diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d8bc492..61052e1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 

 #include 
@@ -351,6 +352,8 @@ static void kvm_free_pci_passthrough(struct kvm *kvm)
 
 list_del(&pci_pt_dev->list);

 }
+if (kvm_intel_iommu_found())
+kvm->arch.domain = NULL;


"domain" is much too generic.  Need something like intel_iommu_domain 
(later we can transform it to iommu_domain as we make it non-intel 
dependent; also move it out of arch so ia64 can benefit too).



 write_unlock_irqrestore(&kvm_pci_pt_lock, flags);
 }
 
@@ -1958,6 +1961,11 @@ long kvm_arch_vm_ioctl(struct file *filp,

 r = kvm_vm_ioctl_pci_pt_dev(kvm, &pci_pt_dev);
 if (r)
 goto out;
+if (kvm_intel_iommu_found()) {
+r = kvm_iommu_map_guest(kvm, &pci_pt_dev);
+if (r)
+goto out;
+}


Need to undo the effects of kvm_vm_ioctl_pci_pt_dev() on failure.



diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e8f9fda..7211823 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -388,6 +388,11 @@ int __kvm_set_memory_region(struct kvm *kvm,
 }
 
 kvm_free_physmem_slot(&old, &new);

+
+/* map the pages in iommu page table */
+if (kvm_intel_iommu_found())
+kvm_iommu_map_pages(kvm, base_gfn, npages);
+
 return 0;


This is generic code.  As this is arch specific for now, please move it 
to arch code.


Also, make sure that each patch builds cleanly.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH 2/4][VTD] modifications to intel-iommu.c.

2008-06-20 Thread Avi Kivity

Kay, Allen M wrote:

Modification to intel-iommu.c to support vt-d page table and context
table mapping in kvm.  Mods to dmar.c and iova.c are due to header file
moves to include/linux.
  



diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index f941f60..a58a5b0 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -26,8 +26,8 @@
 
 #include 

 #include 
-#include "iova.h"
-#include "intel-iommu.h"
+#include 
+#include 


This should have been done in the file movement patch to avoid breaking 
the build. 



 
+void kvm_intel_iommu_domain_exit(struct dmar_domain *domain)


This should be a generic API, not a kvm specific one.


+{
+u64 end;
+
+/* Domain 0 is reserved, so dont process it */
+if (!domain)
+return;


'domain' here is a pointer, not an identifier.



+int kvm_intel_iommu_context_mapping(
+struct dmar_domain *domain, struct pci_dev *pdev)
+{
+int rc;
+rc = domain_context_mapping(domain, pdev);
+return rc;
+}
+EXPORT_SYMBOL_GPL(kvm_intel_iommu_context_mapping);


What does the return value mean?


+
+int kvm_intel_iommu_page_mapping(
+struct dmar_domain *domain, dma_addr_t iova,
+u64 hpa, size_t size, int prot)
+{
+int rc;
+rc = domain_page_mapping(domain, iova, hpa, size, prot);
+return rc;
+}
+EXPORT_SYMBOL_GPL(kvm_intel_iommu_page_mapping);


The function name makes it sound like it's retrieving information.  If 
it does something, put a verb in there.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH 0/4][VTD] kvm vt-d support kernel changes

2008-06-20 Thread Avi Kivity

Kay, Allen M wrote:

Following four patches contains changes for enabling VT-d PCI
passthrough.  The patches are located at:

git.kernel.org/pub/scm/linux/kernel/git/amit/kvm.git vtd

  


Please attach patches as text/plain or inline them.  It's very annoying 
to review application/octet-stream patches.



I have incorporated most of the feedback from the last RFC submission.
It was tested by passing through an E1000 NIC to a Linux guest using the
irqhook interrupt injection mechanism.

1) intel_iommu_move.patch: move intel-iommu.h/iova.h to include/linux.
2) intel_iommu_mods.patch: kvm modifications to intel-iommu.c.
  


What's the upstream path for these (who's the subsystem maintainer)?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [ANNOUNCE] kvm-70 release

2008-06-20 Thread Avi Kivity

Farkas Levente wrote:


kvm-70 does not compile on the centos-5 kernel 2.6.18-53.1.21.el5; there are
2 warnings (which would be nice to fix anyway), but the real problem
is here:

--
WARNING: 
/home/robot/rpm/BUILD/kvm-kmod-70/_kmod_build_/kernel/kvm-intel.o - 
Section mismatch: reference to .init.text: from .text.fixup after '' 
(at offset 0x97)


This one's harmless (though I'd like to remove it).

WARNING: "kallsyms_lookup_name" 
[/home/robot/rpm/BUILD/kvm-kmod-70/_kmod_build_/kernel/kvm.ko] undefined!


This is now fixed in kvm-userspace.git.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH 1 of 3] Remove use of bit fields in kvm trace structure

2008-06-20 Thread Hollis Blanchard
Slightly unconventional coding style, but Acked-by: Hollis Blanchard
<[EMAIL PROTECTED]>

Eric, as I mentioned previously, bitfields cannot be used in portable
binary formats.
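Concretely: a bitfield's in-memory layout is left to the compiler and ABI,
so a record written on a big-endian host decodes differently on a
little-endian reader. Explicit shifts and masks pin the layout down, along
these lines (macro and helper names here are illustrative; the patch below
simply open-codes the masks):

        /* bits 0-27: event id, bits 28-30: extra u32 count, bit 31: tsc flag */
        #define KVM_TRC_EVENT_MASK   0x0fffffffu
        #define KVM_TRC_EXTRA_SHIFT  28
        #define KVM_TRC_CYCLE_BIT    (1u << 31)

        static inline u32 kvm_trc_pack(u32 event, u32 extra, int cycle_in)
        {
                return (event & KVM_TRC_EVENT_MASK) |
                       ((extra & 0x7) << KVM_TRC_EXTRA_SHIFT) |
                       (cycle_in ? KVM_TRC_CYCLE_BIT : 0);
        }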

Avi, would you apply please?

-- 
Hollis Blanchard
IBM Linux Technology Center

On Thu, 2008-06-19 at 23:19 -0500, Jerone Young wrote:
> 2 files changed, 21 insertions(+), 11 deletions(-)
> include/linux/kvm.h  |   10 +++---
> virt/kvm/kvm_trace.c |   22 ++
> 
> 
> This patch fixes kvmtrace use on big-endian systems. When using bit fields,
> the compiler may lay the data out in an order different from what is expected
> when it is written to a file. This fixes it by using one variable instead of bit fields.
> 
> Signed-off-by: Jerone Young <[EMAIL PROTECTED]>
> 
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -311,9 +311,13 @@ struct kvm_s390_interrupt {
> 
>  /* This structure represents a single trace buffer record. */
>  struct kvm_trace_rec {
> - __u32 event:28;
> - __u32 extra_u32:3;
> - __u32 cycle_in:1;
> + /* variable rec_val
> +  * is split into:
> +  * bits 0 - 27  -> event id
> +  * bits 28 -30  -> number of extra data args of size u32
> +  * bits 31  -> binary indicator for if tsc is in record
> +  */
> + __u32 rec_val;
>   __u32 pid;
>   __u32 vcpu_id;
>   union {
> diff --git a/virt/kvm/kvm_trace.c b/virt/kvm/kvm_trace.c
> --- a/virt/kvm/kvm_trace.c
> +++ b/virt/kvm/kvm_trace.c
> @@ -54,12 +54,15 @@ static void kvm_add_trace(void *probe_pr
>   struct kvm_trace *kt = kvm_trace;
>   struct kvm_trace_rec rec;
>   struct kvm_vcpu *vcpu;
> - int i, extra, size;
> + int i, size;
> + u32 extra;
> 
>   if (unlikely(kt->trace_state != KVM_TRACE_STATE_RUNNING))
>   return;
> + 
> + /* set event id */  
> + rec.rec_val = 0x0fffffff & va_arg(*args, u32);
> 
> - rec.event   = va_arg(*args, u32);
>   vcpu= va_arg(*args, struct kvm_vcpu *);
>   rec.pid = current->tgid;
>   rec.vcpu_id = vcpu->vcpu_id;
> @@ -67,21 +70,24 @@ static void kvm_add_trace(void *probe_pr
>   extra   = va_arg(*args, u32);
>   WARN_ON(!(extra <= KVM_TRC_EXTRA_MAX));
>   extra   = min_t(u32, extra, KVM_TRC_EXTRA_MAX);
> - rec.extra_u32   = extra;
> 
> - rec.cycle_in= p->cycle_in;
> + /* set indicator for tsc record */
> + rec.rec_val |= 0x80000000 & (p->cycle_in << 31);
> + 
> + /* set extra data num */
> + rec.rec_val |= 0x70000000 & (extra << 28);
> 
> - if (rec.cycle_in) {
> + if (p->cycle_in) {
>   rec.u.cycle.cycle_u64 = get_cycles();
> 
> - for (i = 0; i < rec.extra_u32; i++)
> + for (i = 0; i < extra; i++)
>   rec.u.cycle.extra_u32[i] = va_arg(*args, u32);
>   } else {
> - for (i = 0; i < rec.extra_u32; i++)
> + for (i = 0; i < extra; i++)
>   rec.u.nocycle.extra_u32[i] = va_arg(*args, u32);
>   }
> 
> - size = calc_rec_size(rec.cycle_in, rec.extra_u32 * sizeof(u32));
> + size = calc_rec_size(p->cycle_in, extra * sizeof(u32));
>   relay_write(kt->rchan, &rec, size);
>  }
> 



Re: Sharing disks between two kvm guests

2008-06-20 Thread Anthony Liguori

Laurent Vivier wrote:

On Friday, 20 June 2008 at 09:07 -0500, Javier Guerra wrote:
  

On Fri, Jun 20, 2008 at 7:23 AM, carlopmart <[EMAIL PROTECTED]> wrote:


Felix Leimbach wrote:
  

 This is my first post to this list. I have already installed kvm-70
under rhel5.2. My intention is to share one disk image between two rhel5.2
kvm guests. Is it possible to accomplish this in kvm like xen or vmware
does? How can I do it? I didn't find any reference about this in the kvm
documentation ...
  

i tried this looong ago and didn't really work because there was some
userspace cache on each QEMU instance.  but the -drive option has a
'cache=off' setting that should be enough.

in theory (i haven't tested, but Avi 'blessed' it):
- create a new image with qemu-img
- add it to the command line using -drive file=xxx,cache=off on both
KVM instances
- use a cluster filesystem!



RFC:

Well, well, perhaps it is delusions of a sick mind but since the
introduction of qemu-nbd I think we can develop easily something to
share a disk between several virtual hosts:

I- in a first step, we can modify qemu-nbd to accept several connections
for one disk image, for instance:

# qemu-nbd my-disk.qcow2
# nbd-client localhost 1024 /dev/nbd0
# nbd-client localhost 1024 /dev/nbd1

and start two virtual hosts:

"qemu -hda v1.img -hdb /dev/nbd0" and "qemu -hda v2.img -hdb /dev/nbd1"

Of course the filesystem must know how to share the access to the disk
with others (-> "cluster filesystem")

II- in a second step, we can include directly the nbd protocol in qemu
(block-nbd.c, "-drive file=nbd:localhost:1024") to connect to the
server. We can also add some commands to the protocol to manage lock,
HA, "what else ?" (Hi George).
  


http://hg.codemonkey.ws/qemu-pq/file/25ca451f2040/block-nbd.diff

Regards,

Anthony Liguori


Any comments ?

Cheers,
Laurent
  




Re: Sharing disks between two kvm guests

2008-06-20 Thread Laurent Vivier
On Friday, 20 June 2008 at 09:07 -0500, Javier Guerra wrote:
> On Fri, Jun 20, 2008 at 7:23 AM, carlopmart <[EMAIL PROTECTED]> wrote:
> > Felix Leimbach wrote:
> >>
> >>>  This is my first post to this list. I have already installed kvm-70
> >>> under rhel5.2. My intention is to share on disk image betwwen two rhel5.2
> >>> kvm guests. Is it possible to accomplish this in kvm like xen or vmware
> >>> does?? How can I do?? I didn't find any reference abou this on kvm
> >>> documentation ...
> 
> i tried this looong ago and didn't really work because there was some
> userspace cache on each QEMU instance.  but the -drive option has a
> 'cache=off' setting that should be enough.
> 
> in theory (i haven't tested, but Avi 'blessed' it):
> - create a new image with qemu-img
> - add it to the command line using -drive file=xxx,cache=off on both
> KVM instances
> - use a cluster filesystem!

RFC:

Well, well, perhaps it is delusions of a sick mind but since the
introduction of qemu-nbd I think we can develop easily something to
share a disk between several virtual hosts:

I- in a first step, we can modify qemu-nbd to accept several connections
for one disk image, for instance:

# qemu-nbd my-disk.qcow2
# nbd-client localhost 1024 /dev/nbd0
# nbd-client localhost 1024 /dev/nbd1

and start two virtual hosts:

"qemu -hda v1.img -hdb /dev/nbd0" and "qemu -hda v2.img -hdb /dev/nbd1"

Of course the filesystem must know how to share the access to the disk
with others (-> "cluster filesystem")

II- in a second step, we can include directly the nbd protocol in qemu
(block-nbd.c, "-drive file=nbd:localhost:1024") to connect to the
server. We can also add some commands to the protocol to manage lock,
HA, "what else ?" (Hi George).

Any comments ?

Cheers,
Laurent
-- 
- [EMAIL PROTECTED]  --
 "In short: just say NO TO DRUGS and maybe you won't
   end up like the Hurd people." -- Linus Torvald



Re: [patch 0/7] force the TSC unreliable by reporting C2 state

2008-06-20 Thread Andi Kleen
Marcelo Tosatti <[EMAIL PROTECTED]> writes:
>
> Well, Linux assumes that TSC stops ticking on C2/C3.

It doesn't always and Linux is overly conservative and doesn't know
the full rules (and in some cases it's also hard to know because the
BIOS hides systems). Also a lot of systems don't have C2/C3.

But it still happens occasionally so it has to be handled. Normally
we would expect guests to detect this because they have exactly the 
same problem on real hardware, but at least older Linux didn't always
get it correct.

But in general the newer kernel already keeps an estimate of how long C2/C3 took
(needed for power management) and nobody would stop KVM from just adding
that into the TSC offset that is supported by VT. You might still have some
drift from that though.
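
Very roughly, and only as an illustration (the residency-accounting hook is
hypothetical; vmx.c does already program TSC_OFFSET through vmcs_write64()
in guest_write_tsc()):

        /* credit the cycles the physical TSC lost in C2/C3 back to the guest */
        static void vmx_compensate_cstate(struct kvm_vcpu *vcpu, u64 halted_cycles)
        {
                u64 offset = vmcs_read64(TSC_OFFSET);

                vmcs_write64(TSC_OFFSET, offset + halted_cycles);
        }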

-Andi


Re: Sharing disks between two kvm guests

2008-06-20 Thread Javier Guerra
On Fri, Jun 20, 2008 at 7:23 AM, carlopmart <[EMAIL PROTECTED]> wrote:
> Felix Leimbach wrote:
>>
>>>  This is my first post to this list. I have already installed kvm-70
>>> under rhel5.2. My intention is to share on disk image betwwen two rhel5.2
>>> kvm guests. Is it possible to accomplish this in kvm like xen or vmware
>>> does?? How can I do?? I didn't find any reference abou this on kvm
>>> documentation ...

i tried this looong ago and didn't really work because there was some
userspace cache on each QEMU instance.  but the -drive option has a
'cache=off' setting that should be enough.

in theory (i haven't tested, but Avi 'blessed' it):
- create a new image with qemu-img
- add it to the command line using -drive file=xxx,cache=off on both
KVM instances
- use a cluster filesystem!

-- 
Javier


Re: Sharing disks between two kvm guests

2008-06-20 Thread carlopmart

Felix Leimbach wrote:


 This is my first post to this list. I have already installed kvm-70
under rhel5.2. My intention is to share one disk image between two
rhel5.2 kvm guests. Is it possible to accomplish this in kvm like xen
or vmware does? How can I do it? I didn't find any reference about this
in the kvm documentation ...
Have a look at KVM/QEMU's -smb option in 
http://bellard.org/qemu/qemu-doc.html




Yes, but this option doesn't help me. I need to simulate a SAN ...

--
CL Martinez
carlopmart {at} gmail {d0t} com


Re: Sharing disks between two kvm guests

2008-06-20 Thread Felix Leimbach


 This is my first post to this list. I have already installed kvm-70
under rhel5.2. My intention is to share one disk image between two
rhel5.2 kvm guests. Is it possible to accomplish this in kvm like xen
or vmware does? How can I do it? I didn't find any reference about this
in the kvm documentation ...
Have a look at KVM/QEMU's -smb option in 
http://bellard.org/qemu/qemu-doc.html




Re: PCI PT: irq issue

2008-06-20 Thread Amit Shah
On Thursday 19 June 2008 10:17:29 Amit Shah wrote:
> * On Wednesday 18 June 2008 18:26:16 Ben-Ami Yassour wrote:
> > Amit,
> >
> > With the current implementation we have an issue if the driver on the
> > host was never loaded.
> >
> > To be able to run kvm with passthrough we have to load and then unload
> > the driver on the host at least once. After that it works ok.
>
> Yes, whenever a device issues pci_request_regions(), the IRQ may be
> reassigned.
>
> The unloading / loading should not be necessary once I commit these changes
> to the tree.
>
> > Note that after doing the load and unload the irq as reported by lspci
> > -v is changed.
> >
> > The questions that I think we need to figure out are:
> > 1. How does the loading of the driver on the host cause the irq to
> > change?
> > 2. What other side effects does it do that helps kvm pcipt work?
> > 3. What do we need to add to the pcipt code that will do the same "side
> > effect" (or bypass the problem)?
>
> That already answers all these.
>
> > Also note that in the current implementation the user is required to
> > provide the irq for the device in the kvm command line.
> > With respect to the comments above it is clear that lspci will show an
> > irrelevant irq value.
> > Why do we need the user to provide this information anyhow?
> > Why can't KVM find it out automatically?
>
> I thought we discussed this several times. The next commit is going to fix
> this.
>
> > Note, if the kernel can find this information then we can also remove it
> > from the ioctl interface.
>
> Coming soon; coming soon indeed.

I just pushed out the changes so the trickery with module loading / unloading, 
assigning irq number, etc. are not needed. The userspace command line still 
expects a number for the irq, though, and you can pass it any number as long 
as you use the in-kernel irq handler (this is needed for the irqhook module,
which I'm not updating as of now).

A couple of notes for the VT-d patch:
- The pci_dev struct is now available in the pci_pt kernel structure, so just 
use that information each time you want to add a device instead of searching 
for it each time.
- The kernel with KVM VT-d patches doesn't build on the kvm-userspace.git 
tree. Please fix that.
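
For the first note, the change is essentially this (the pci_pt_dev field
names are illustrative):

        /* before: look the device up again on every call */
        dev = pci_get_bus_and_slot(pci_pt_dev->busnr, pci_pt_dev->devfn);

        /* after: reuse the struct pci_dev saved at assignment time */
        dev = pci_pt_dev->dev;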

Amit.


Sharing disks between two kvm guests

2008-06-20 Thread carlopmart

Hi all,

 This is my first post to this list. I have already installed kvm-70 under
rhel5.2. My intention is to share one disk image between two rhel5.2 kvm guests.
Is it possible to accomplish this in kvm like xen or vmware does? How can I
do it? I didn't find any reference about this in the kvm documentation ...


Many thanks.

--
CL Martinez
carlopmart {at} gmail {d0t} com


[PATCH][REPOST]: Fake emulate Intel perfctr MSRs

2008-06-20 Thread Chris Lalancette
Respin of my previous patch to fake emulate the Intel perfctr MSRs.  As Sheng
Yang pointed out, I didn't need an additional include, and I could use other
#define's.

Signed-off-by: Chris Lalancette <[EMAIL PROTECTED]>
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6e4278d..f2feacf 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -917,6 +917,18 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
 	case MSR_IA32_TIME_STAMP_COUNTER:
 		guest_write_tsc(data);
 		break;
+	case MSR_P6_PERFCTR0:
+	case MSR_P6_PERFCTR1:
+	case MSR_P6_EVNTSEL0:
+	case MSR_P6_EVNTSEL1:
+		/*
+		 * Just discard all writes to the performance counters; this
+		 * should keep both older linux and windows 64-bit guests
+		 * happy
+		 */
+		pr_unimpl(vcpu, "unimplemented perfctr wrmsr: 0x%x data 0x%llx\n", msr_index, data);
+
+		break;
 	default:
 		msr = find_msr_entry(vmx, msr_index);
 		if (msr) {


Re: [PATCH] [REPOST]: Fake emulate Intel perfctr MSRs

2008-06-20 Thread Chris Lalancette
Yang, Sheng wrote:
> Hi, Chris
> 
> It seems you can use something like MSR_P6_EVNTSEL0 to avoid bringing in a new
> #include? :)
> 
> (BTW: these four MSRs are P6 architecture specific ones, and two of them are
> shared with the Pentium architecture)
> 

Hello,
 Oh, thanks for the pointer.  I didn't even see those additional #define's!
 I'll respin the patch like you suggest.

Thanks,
Chris Lalancette