date:20101117

Re: [PATCH v2 5/6] KVM: MMU: remove 'clear_unsync' parameter

2010-11-17 Thread Xiao Guangrong

On 11/18/2010 12:49 AM, Marcelo Tosatti wrote:
bool clear_unsync)
>> +static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>>  {
>>  int i, offset, nr_present;
>>  bool host_writable;
>> @@ -781,7 +780,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>>  u64 nonpresent;
>>  
>>  if (rsvd_bits_set || is_present_gpte(gpte) ||
>> -  !clear_unsync)
>> +  sp->unsync)
>>  nonpresent = shadow_trap_nonpresent_pte;
>>  else
>>  nonpresent = shadow_notrap_nonpresent_pte;
> 
> It's better to keep this explicit as a parameter. 
> 

But after patch 6 (KVM: MMU: cleanup update_pte, pte_prefetch and sync_page
functions), this parameter is no longer used... I don't have a strong opinion
on it :-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 1/6] KVM: MMU: fix forgot flush vcpu tlbs

2010-11-17 Thread Xiao Guangrong

On 11/18/2010 01:36 AM, Marcelo Tosatti wrote:

>> I don't think we need to flush immediately; set a "tlb dirty" bit
>> somewhere that is cleared when we flush the tlb.
>> kvm_mmu_notifier_invalidate_page() can consult the bit and force a
>> flush if set.
> 
> Yep.
> 

Great, I'll do it in v3.

Do we need a simple bug-fix patch (which flushes the TLBs immediately) for
backporting first?
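
For readers following the thread, here is a minimal userspace sketch of the
"tlb dirty" bit idea quoted above. It is only an illustration under stated
assumptions: struct vm_sketch and all three functions are hypothetical
stand-ins, not the real struct kvm, kvm_flush_remote_tlbs() or
kvm_mmu_notifier_invalidate_page(), and locking is ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-VM state; not the real struct kvm. */
struct vm_sketch {
	bool tlbs_dirty;        /* an spte changed and no flush has happened yet */
	unsigned long flushes;  /* number of simulated remote TLB flushes */
};

/* Defer the flush: only remember that stale translations may be cached. */
void mark_tlbs_dirty(struct vm_sketch *vm)
{
	vm->tlbs_dirty = true;
}

/* Simulated remote flush; performing the flush is what clears the bit. */
void flush_remote_tlbs(struct vm_sketch *vm)
{
	vm->tlbs_dirty = false;
	vm->flushes++;
}

/*
 * Simulated invalidate_page notifier: flush if this call itself changed
 * something, or if an earlier deferred flush is still pending.
 */
void notifier_invalidate_page(struct vm_sketch *vm, bool need_flush)
{
	if (need_flush || vm->tlbs_dirty)
		flush_remote_tlbs(vm);
}
```

The point of the design is that the common path only sets a flag, and the
cost of the flush is paid once, at the next place that must not observe
stale translations.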


Re: [PATCH v2 2/6] KVM: MMU: don't drop spte if overwrite it from W to RO

2010-11-17 Thread Xiao Guangrong

On 11/17/2010 11:57 PM, Avi Kivity wrote:

>>>   set_pte:
>>>   update_spte(sptep, spte);
>>> +/*
>>> + * If we overwrite a writable spte with a read-only one we
>>> + * should flush remote TLBs. Otherwise rmap_write_protect
>>> + * will find a read-only spte, even though the writable spte
>>> + * might be cached on a CPU's TLB.
>>> + */
>>> +if (is_writable_pte(entry)&&  !is_writable_pte(*sptep))
>>> +kvm_flush_remote_tlbs(vcpu->kvm);
>> There is no need to flush on sync_page path since the guest is
>> responsible for it.
>>
> 
>  If we don't, the next rmap_write_protect() will incorrectly decide that
> there's no need to flush tlbs.
> 

Maybe it's not a problem if the guest flushes all TLBs after overwriting it?
Marcelo, what's your comment on this?
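
The condition under discussion can be sketched in isolation as follows. This
is a hedged illustration, not the code from the patch: the sketch_ names are
made up, and the only fact relied on is that bit 1 is the R/W bit of an x86
page-table entry. The old writable state is sampled before the entry is
overwritten, and a flush is requested only for a writable to read-only
transition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 1 is the x86 page-table R/W bit; the macro name is local to this sketch. */
#define SKETCH_PT_WRITABLE_MASK (1ULL << 1)

bool sketch_is_writable_pte(uint64_t pte)
{
	return (pte & SKETCH_PT_WRITABLE_MASK) != 0;
}

/*
 * Overwrite *sptep with new_spte and report whether remote TLBs need a
 * flush: only when a writable entry is replaced by a read-only one, since
 * another CPU may still cache the writable translation.
 */
bool sketch_update_spte(uint64_t *sptep, uint64_t new_spte)
{
	bool was_writable = sketch_is_writable_pte(*sptep);

	*sptep = new_spte;
	return was_writable && !sketch_is_writable_pte(new_spte);
}
```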

Re: KVM with hugepages generate huge load with two guests

2010-11-17 Thread Dmitry Golubev

Hi,

Sorry to bother you again. I have more info:

> 1. router with 32MB of RAM (hugepages) and 1VCPU
...
> Is it too much to have 3 guests with hugepages?

OK, this router is now out of the equation as well: I disabled hugepages for
it. That should also leave additional pages available to the guests. I think
this should be pretty reproducible... Two identical 64-bit Linux 2.6.32
guests with 3500MB of virtual RAM and 4 VCPUs each, running on a Core2Quad
(4 real cores) machine with 8GB of RAM and 3546 2MB hugepages on a 64-bit
Linux 2.6.35 host (libvirt 0.8.3) from Ubuntu Maverick.

Still no swapping, and the effect is pretty much the same: one guest runs
well, two guests work for some minutes and then slow down by a factor of a
few hundred, showing huge load both inside (unlimited rapid growth of the
load average) and outside (the host stays responsive, but it is loaded to
the max). The load growth on the host is instant and finite (the change in
the 'r' column shows this sudden rise):

# vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  3  0 194220  30680  7671200   31928 2633 1960  6  6 67 20
 1  2  0 193776  30680  7671200 4   231 55081 78491  3 39 17 41
10  1  0 185508  30680  7671200 487 53042 34212 55 27  9  9
12  0  0 185180  30680  7671200 295 41007 21990 84 16  0  0

Thanks,
Dmitry

On Wed, Nov 17, 2010 at 4:19 AM, Dmitry Golubev  wrote:
> Hi,
>
> Maybe you remember that I wrote a few weeks ago about a KVM cpu load
> problem with hugepages. The problem was left hanging; however, I now
> have some new information. The description still applies, but I have
> decreased both the guest memory and the number of hugepages:
>
> Ram = 8GB, hugepages = 3546
>
> Total of 2 virtual machines:
> 1. router with 32MB of RAM (hugepages) and 1VCPU
> 2. linux guest with 3500MB of RAM (hugepages) and 4VCPU
>
> Everything works fine until I start the second linux guest with the
> same 3500MB of guest RAM also in hugepages and also 4VCPU. The rest of
> the description is the same as before: after a while the host shows a
> load average of about 8 (on a Core2Quad) and it seems that both big
> guests consume exactly the same amount of resources. The host seems
> responsive though. Inside the guests, however, things are not so good:
> the load skyrockets to at least 20. The guests are not responsive and
> even a 'ps' executes unreasonably slowly (it may take a few minutes;
> here, however, the load builds up and the machine seems to become slower
> with time, unlike the host, which shows the jump in resource consumption
> instantly). It also seems that the more memory the guests use, the faster
> the problem appears. Still, at least a gig of RAM is free on each guest
> and there is no swap activity inside the guest.
>
> The most important thing, and the reason I went back and quoted an older
> message than the last one, is that there is no more swap activity on the
> host, so the previous train of thought may also be wrong and I have
> returned to the beginning. There is plenty of RAM now and swap on the host
> is always at 0 as seen in 'top'. And there is 100% cpu load, equally shared
> between the two large guests. To stop the load I can destroy either large
> guest. Additionally, I have just discovered that suspending either large
> guest works as well. Moreover, after resume, the load does not come
> back for a while. Both methods stop the high load instantly (in less
> than a second). As you were asking for a 'top' inside the guest, here
> it is:
>
> top - 03:27:27 up 42 min,  1 user,  load average: 18.37, 7.68, 3.12
> Tasks: 197 total,  23 running, 174 sleeping,   0 stopped,   0 zombie
> Cpu(s):  0.0%us, 89.2%sy,  0.0%ni, 10.5%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
> Mem:   3510912k total,  1159760k used,  2351152k free,    62568k buffers
> Swap:  4194296k total,        0k used,  4194296k free,   484492k cached
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 12303 root      20   0     0    0    0 R  100  0.0   0:33.72 vpsnetclean
> 11772 99        20   0  149m  11m 2104 R   82  0.3   0:15.10 httpd
> 10906 99        20   0  149m  11m 2124 R   73  0.3   0:11.52 httpd
> 10247 99        20   0  149m  11m 2128 R   31  0.3   0:05.39 httpd
>  3916 root      20   0 86468  11m 1476 R   16  0.3   0:15.14 cpsrvd-ssl
> 10919 99        20   0  149m  11m 2124 R    8  0.3   0:03.43 httpd
> 11296 99        20   0  149m  11m 2112 R    7  0.3   0:03.26 httpd
> 12265 99        20   0  149m  11m 2088 R    7  0.3   0:08.01 httpd
> 12317 root      20   0 99.6m 1384  716 R    7  0.0   0:06.57 crond
> 12326 503       20   0  8872   96   72 R    7  0.0   0:01.13 php
>  3634 root      20   0 74804 1176  596 R    6  0.0   0:12.15 crond
> 11864 32005     20   0 87224  13m 2528 R    6  0.4   0:30.84 cpsrvd-ssl
> 12275 root      20   0 30628 9976 1364 R    6  0.3   0:24.68 cpgs_chk
> 11305 99        20

Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support

2010-11-17 Thread Sheng Yang

On Thursday 18 November 2010 14:21:40 Michael S. Tsirkin wrote:
> On Thu, Nov 18, 2010 at 09:58:55AM +0800, Sheng Yang wrote:
> > > > +static int msix_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
> > > > +                           const void *val)
> > > > +{
> > > > +   struct kvm_assigned_dev_kernel *adev =
> > > > +           container_of(this, struct kvm_assigned_dev_kernel,
> > > > +                        msix_mmio_dev);
> > > > +   int idx, r = 0;
> > > > +   unsigned long new_val = *(unsigned long *)val;
> > > 
> > > What if it's a 64-bit write on a 32-bit host?
> > 
> > In fact we don't support QWORD (64-bit) accesses yet. The reason is that we
> > haven't seen any OS use them this way so far, so I think we can leave it
> > for later.
> > 
> > Also, it seems QEmu has no way to handle 64-bit MMIO.
> 
> I think it does.  I think it simply splits these into 32-bit transactions
> and handles them as such. That seems to be spec-compliant.  I wouldn't want us
> to regress.

Yes, you're right...

I think I have to add it. :shrug:

--
regards
Yang, Sheng

Re: 2.6.37-rc2 after KVM shutdown - unregister_netdevice: waiting for vmtst01eth0 to become free. Usage count = 1

2010-11-17 Thread Nikola Ciprich

> Yep, this is a known problem, thanks !
> 
> fix is there : 
> 
> http://patchwork.ozlabs.org/patch/71354/
Thanks Eric, this indeed fixes the problem.
I noticed the fix didn't make it into 2.6.37-rc2-git3 though;
maybe it just got omitted?
Anyway, thanks for the help!
n.




-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-

Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support

2010-11-17 Thread Michael S. Tsirkin

On Thu, Nov 18, 2010 at 09:58:55AM +0800, Sheng Yang wrote:
> > > +static int msix_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
> > > +                           const void *val)
> > > +{
> > > + struct kvm_assigned_dev_kernel *adev =
> > > + container_of(this, struct kvm_assigned_dev_kernel,
> > > +  msix_mmio_dev);
> > > + int idx, r = 0;
> > > + unsigned long new_val = *(unsigned long *)val;
> > 
> > What if it's a 64-bit write on a 32-bit host?
> 
> In fact we don't support QWORD (64-bit) accesses yet. The reason is that we
> haven't seen any OS use them this way so far, so I think we can leave it for
> later.
> 
> Also, it seems QEmu has no way to handle 64-bit MMIO.

I think it does.  I think it simply splits these into 32-bit transactions
and handles them as such. That seems to be spec-compliant.  I wouldn't want us
to regress.
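
To make the concern concrete, here is a small userspace sketch of a write
handler that honours len instead of dereferencing an unsigned long. It is an
assumption-laden illustration: struct mmio_sketch and sketch_mmio_write() are
hypothetical and unrelated to the real kvm_io_device callbacks. It copies
exactly len bytes, so it never reads past what the caller supplied, and it
treats an 8-byte access as two consecutive 32-bit stores, which works the same
on 32-bit and 64-bit hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKM_NREGS 8

/* Hypothetical device: a small array of 32-bit registers. */
struct mmio_sketch {
	uint32_t regs[SKM_NREGS];
};

/*
 * Accept naturally aligned 4- or 8-byte writes at byte offset off.
 * Returns 0 on success and -1 if the access is not supported.
 */
int sketch_mmio_write(struct mmio_sketch *dev, uint64_t off, int len,
		      const void *val)
{
	uint32_t dw[2] = { 0, 0 };

	if (len != 4 && len != 8)
		return -1;
	if (off & (uint64_t)(len - 1))
		return -1;
	if (off + (uint64_t)len > sizeof(dev->regs))
		return -1;

	/* Copy only the bytes the caller actually provided. */
	memcpy(dw, val, (size_t)len);

	/* An 8-byte write becomes two dword stores at off and off + 4. */
	dev->regs[off / 4] = dw[0];
	if (len == 8)
		dev->regs[off / 4 + 1] = dw[1];
	return 0;
}
```

Splitting by byte address rather than by shifting a 64-bit value keeps the
sketch independent of host endianness.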

-- 
MST

Re: [PATCH 1/2] KVM: take kvm_lock for hardware_disable() during cpu hotplug

2010-11-17 Thread Zachary Amsden


On 11/17/2010 04:41 PM, Takuya Yoshikawa wrote:
> (2010/11/18 11:33), Zachary Amsden wrote:
>> On 11/17/2010 04:04 PM, Takuya Yoshikawa wrote:
>>> (2010/11/18 10:59), Zachary Amsden wrote:
>>>> On 11/15/2010 10:35 PM, Takuya Yoshikawa wrote:
>>>>> In kvm_cpu_hotplug(), only CPU_STARTING case is protected by kvm_lock.
>>>>>
>>>>> This patch adds missing protection for CPU_DYING case.
>>>>>
>>>>> Signed-off-by: Takuya Yoshikawa
>>>>> ---
>>>>> virt/kvm/kvm_main.c | 2 ++
>>>>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>>>> index 339dd43..0fdd911 100644
>>>>> --- a/virt/kvm/kvm_main.c
>>>>> +++ b/virt/kvm/kvm_main.c
>>>>> @@ -2148,7 +2148,9 @@ static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val,
>>>>>
>>>>> case CPU_DYING:
>>>>> printk(KERN_INFO "kvm: disabling virtualization on CPU%d\n",
>>>>> cpu);
>>>>> + spin_lock(&kvm_lock);
>>>>> hardware_disable(NULL);
>>>>> + spin_unlock(&kvm_lock);
>>>>> break;
>>>>> case CPU_STARTING:
>>>>> printk(KERN_INFO "kvm: enabling virtualization on CPU%d\n",
>>>>
>>>> I believe this is correct.
>>>
>>> You mean lock is not necessary?
>>
>> No, I believe your patch is correct and the lock should be there. Did
>> you test with spinlock debugging just to be sure?
>
> Sorry but no.
>
> I have no experience with cpu hotplug.
>
> So I thought it would take too much time to do real test by myself and
> reported like this this time.
>
> Any easy way to test?


Yes, quite easy.  Some systems may not let cpu0 go offline, but you can 
manually disable and re-enable the other processors:


[r...@mysore ~]# echo "0" > /sys/devices/system/cpu/cpu1/online
[r...@mysore ~]# echo "1" > /sys/devices/system/cpu/cpu1/online

Cheers,

Zach

Re: [PATCH v2] device-assignment: register a reset function

2010-11-17 Thread Jan Kiszka

Am 16.11.2010 15:37, Alex Williamson wrote:
> On Tue, 2010-11-16 at 15:05 +0100, Bernhard Kohl wrote:
>> This is necessary because during reboot of a VM the assigned devices
>> continue DMA transfers which causes memory corruption.
>>
>> Signed-off-by: Thomas Ostler 
>> Signed-off-by: Bernhard Kohl 
>> ---
>> Changes v1 -> v2:
>> - use defined macros, e.g. PCI_COMMAND
>> - write all zero to the command register to disconnect the device logically
>> ---
>>  hw/device-assignment.c |   12 
>>  1 files changed, 12 insertions(+), 0 deletions(-)
> 
> Looks good to me.
> 
> Acked-by: Alex Williamson 

Acked-by: Jan Kiszka 

> 
>> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
>> index 5f5bde1..8d5a609 100644
>> --- a/hw/device-assignment.c
>> +++ b/hw/device-assignment.c
>> @@ -1434,6 +1434,17 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
>>  dev->msix_table_page = NULL;
>>  }
>>  
>> +static void reset_assigned_device(DeviceState *dev)
>> +{
>> +PCIDevice *d = DO_UPCAST(PCIDevice, qdev, dev);
>> +
>> +/*
>> + * When a 0 is written to the command register, the device is logically
>> + * disconnected from the PCI bus. This avoids further DMA transfers.
>> + */
>> +assigned_dev_pci_write_config(d, PCI_COMMAND, 0, 2);
>> +}
>> +
>>  static int assigned_initfn(struct PCIDevice *pci_dev)
>>  {
>>  AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
>> @@ -1544,6 +1555,7 @@ static PCIDeviceInfo assign_info = {
>>  .qdev.name= "pci-assign",
>>  .qdev.desc= "pass through host pci devices to the guest",
>>  .qdev.size= sizeof(AssignedDevice),
>> +.qdev.reset   = reset_assigned_device,
>>  .init = assigned_initfn,
>>  .exit = assigned_exitfn,
>>  .config_read  = assigned_dev_pci_read_config,
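
The effect of the reset hook in the quoted patch can be illustrated with a
small standalone sketch. The offsets and bit values are the standard PCI ones
(command register at 0x04, bus master enable is bit 2); everything else,
including struct cfg_sketch and the sk_ helpers, is hypothetical and only
simulates a config space in memory rather than touching a device.

```c
#include <assert.h>
#include <stdint.h>

/* Standard PCI config-space offset and command bits (names local to the sketch). */
#define SK_PCI_COMMAND        0x04
#define SK_PCI_COMMAND_IO     0x0001
#define SK_PCI_COMMAND_MEMORY 0x0002
#define SK_PCI_COMMAND_MASTER 0x0004

/* Hypothetical in-memory copy of a 256-byte configuration space. */
struct cfg_sketch {
	uint8_t bytes[256];
};

/* Config space is little-endian: low byte first. */
void sk_cfg_write16(struct cfg_sketch *c, unsigned int off, uint16_t v)
{
	c->bytes[off] = (uint8_t)(v & 0xff);
	c->bytes[off + 1] = (uint8_t)(v >> 8);
}

uint16_t sk_cfg_read16(const struct cfg_sketch *c, unsigned int off)
{
	return (uint16_t)(c->bytes[off] | (c->bytes[off + 1] << 8));
}

/* Reset hook: writing 0 clears I/O, memory and bus-master enables at once. */
void sk_reset_device(struct cfg_sketch *c)
{
	sk_cfg_write16(c, SK_PCI_COMMAND, 0);
}

/* A device can only initiate DMA while bus mastering is enabled. */
int sk_can_dma(const struct cfg_sketch *c)
{
	return (sk_cfg_read16(c, SK_PCI_COMMAND) & SK_PCI_COMMAND_MASTER) != 0;
}
```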

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[RFC PATCH 2/2] KVM: selective write protection using dirty bitmap

2010-11-17 Thread Takuya Yoshikawa

Lai Jiangshan once tried to rewrite kvm_mmu_slot_remove_write_access() using
rmap: "kvm: rework remove-write-access for a slot"
  http://www.spinics.net/lists/kvm/msg35871.html

One problem pointed out there was that this approach might hurt cache locality
and slow things down.

But if we restrict the discussion to dirty logging, we notice that only a small
portion of the pages actually needs to be write protected.

For example, I have confirmed that even when we are playing with tools like
x11perf, the dirty ratio of the frame buffer bitmap is almost always less than
10%.

In the case of live migration, we will see even more sparseness under a usual
workload because the RAM size is really big.

So this patch uses his approach, with a small modification: the switched-out
dirty bitmap is used as a hint to restrict the rmap traversal.

We can also use this to selectively write protect pages to reduce unwanted page
faults in the future.

Signed-off-by: Takuya Yoshikawa 
Cc: Lai Jiangshan 
---
 arch/x86/include/asm/kvm_host.h |2 ++
 arch/x86/kvm/mmu.c  |   39 +++
 arch/x86/kvm/x86.c  |   27 ---
 3 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b04c0fa..bc170e4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -617,6 +617,8 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 
 int kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
+void kvm_mmu_slot_remove_write_access_mask(struct kvm *kvm,
+   struct kvm_memory_slot *memslot, unsigned long *dirty_bitmap);
 void kvm_mmu_zap_all(struct kvm *kvm);
 unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index bdb9fa9..3506a64 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3454,6 +3454,45 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
kvm_flush_remote_tlbs(kvm);
 }
 
+static void rmapp_remove_write_access(struct kvm *kvm, unsigned long *rmapp)
+{
+   u64 *spte = rmap_next(kvm, rmapp, NULL);
+
+   while (spte) {
+   /* avoid RMW */
+   if (is_writable_pte(*spte))
+   *spte &= ~PT_WRITABLE_MASK;
+   spte = rmap_next(kvm, rmapp, spte);
+   }
+}
+
+/*
+ * Write protect the pages set dirty in a given bitmap.
+ */
+void kvm_mmu_slot_remove_write_access_mask(struct kvm *kvm,
+  struct kvm_memory_slot *memslot,
+  unsigned long *dirty_bitmap)
+{
+   int i;
+   unsigned long gfn_offset;
+
+   for_each_set_bit(gfn_offset, dirty_bitmap, memslot->npages) {
+   rmapp_remove_write_access(kvm, &memslot->rmap[gfn_offset]);
+
+   for (i = 0; i < KVM_NR_PAGE_SIZES - 1; i++) {
+   unsigned long gfn = memslot->base_gfn + gfn_offset;
+   unsigned long huge = KVM_PAGES_PER_HPAGE(i + 2);
+   int idx = gfn / huge - memslot->base_gfn / huge;
+
+   if (!(gfn_offset || (gfn % huge)))
+   break;
+   rmapp_remove_write_access(kvm,
+   &memslot->lpage_info[i][idx].rmap_pde);
+   }
+   }
+   kvm_flush_remote_tlbs(kvm);
+}
+
 void kvm_mmu_zap_all(struct kvm *kvm)
 {
struct kvm_mmu_page *sp, *node;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 038d719..3556b4d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3194,12 +3194,27 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 }
 
 /*
+ * Check the dirty bit ratio of a given memslot.
+ *   0: clean
+ *   1: sparse
+ *   2: dense
+ */
+static int dirty_bitmap_density(struct kvm_memory_slot *memslot)
+{
+   if (!memslot->num_dirty_bits)
+   return 0;
+   if (memslot->num_dirty_bits < memslot->npages / 128)
+   return 1;
+   return 2;
+}
+
+/*
  * Get (and clear) the dirty memory log for a memory slot.
  */
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
  struct kvm_dirty_log *log)
 {
-   int r;
+   int r, density;
struct kvm_memory_slot *memslot;
unsigned long n;
 
@@ -3217,7 +3232,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
n = kvm_dirty_bitmap_bytes(memslot);
 
/* If nothing is dirty, don't bother messing with page tables. */
-   if (memslot->num_dirty_bits) {
+   density = dirty_bitmap_density(memslot);
+   if (density) {
struct kvm_memslots *slots, *old_slots;
unsigned long *dirty_bitmap;
 
@@ -3242,7 +3258,12 @@ int kvm_vm_ioctl_get_di
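
As a reading aid, the two ideas in this patch (classify a slot by dirty
density, then visit only the pages whose dirty bit is set) can be sketched in
plain userspace C. This is a simplified illustration, not the kernel code:
the sk_ names are hypothetical, the bitmap uses native bit order, and the
callback merely clears a flag in an array instead of walking an rmap. The
thresholds mirror dirty_bitmap_density() from the patch above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

enum sk_density { SK_CLEAN = 0, SK_SPARSE = 1, SK_DENSE = 2 };

/* Sparse means fewer than npages / 128 dirty bits, as in the patch. */
enum sk_density sk_density(unsigned long num_dirty, unsigned long npages)
{
	if (!num_dirty)
		return SK_CLEAN;
	if (num_dirty < npages / 128)
		return SK_SPARSE;
	return SK_DENSE;
}

/*
 * Call fn for every page whose bit is set in bitmap and return how many
 * pages were visited. Whole clean words are skipped in one step.
 */
unsigned long sk_for_each_dirty(const unsigned long *bitmap,
				unsigned long npages,
				void (*fn)(unsigned long gfn_offset, void *arg),
				void *arg)
{
	unsigned long i, visited = 0;

	for (i = 0; i < npages; i++) {
		unsigned long word = bitmap[i / SK_BITS_PER_LONG];

		if (!word) {
			/* Jump to the last bit of this word; i++ moves on. */
			i |= SK_BITS_PER_LONG - 1;
			continue;
		}
		if (word & (1UL << (i % SK_BITS_PER_LONG))) {
			fn(i, arg);
			visited++;
		}
	}
	return visited;
}

/* Example callback: drop the "writable" flag of one page in a plain array. */
void sk_write_protect(unsigned long gfn_offset, void *arg)
{
	unsigned char *writable = arg;

	writable[gfn_offset] = 0;
}
```

With a sparse bitmap the work is proportional to the number of dirty pages
plus the number of bitmap words, rather than to every shadow page in the
slot.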

[RFC PATCH 1/2] KVM: count the number of dirty bits for each memslot

2010-11-17 Thread Takuya Yoshikawa

This patch introduces a counter that holds the number of dirty bits in each
memslot. We will use this to optimize dirty logging later.

Signed-off-by: Takuya Yoshikawa 
---
 arch/x86/kvm/x86.c   |9 +++--
 include/linux/kvm_host.h |1 +
 virt/kvm/kvm_main.c  |6 +-
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 410d2d1..038d719 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3199,10 +3199,9 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
  struct kvm_dirty_log *log)
 {
-   int r, i;
+   int r;
struct kvm_memory_slot *memslot;
unsigned long n;
-   unsigned long is_dirty = 0;
 
mutex_lock(&kvm->slots_lock);
 
@@ -3217,11 +3216,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
n = kvm_dirty_bitmap_bytes(memslot);
 
-   for (i = 0; !is_dirty && i < n/sizeof(long); i++)
-   is_dirty = memslot->dirty_bitmap[i];
-
/* If nothing is dirty, don't bother messing with page tables. */
-   if (is_dirty) {
+   if (memslot->num_dirty_bits) {
struct kvm_memslots *slots, *old_slots;
unsigned long *dirty_bitmap;
 
@@ -3236,6 +3232,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
goto out;
memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
slots->memslots[log->slot].dirty_bitmap = dirty_bitmap;
+   slots->memslots[log->slot].num_dirty_bits = 0;
slots->generation++;
 
old_slots = kvm->memslots;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2d63f2c..07aebf0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -152,6 +152,7 @@ struct kvm_memory_slot {
unsigned long *rmap;
unsigned long *dirty_bitmap;
unsigned long *dirty_bitmap_head;
+   unsigned long num_dirty_bits;
struct {
unsigned long rmap_pde;
int write_count;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 339dd43..0a0521b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -565,6 +565,7 @@ static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
return -ENOMEM;
 
memslot->dirty_bitmap_head = memslot->dirty_bitmap;
+   memslot->num_dirty_bits = 0;
return 0;
 }
 
@@ -1388,7 +1389,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot,
if (memslot && memslot->dirty_bitmap) {
unsigned long rel_gfn = gfn - memslot->base_gfn;
 
-   generic___set_le_bit(rel_gfn, memslot->dirty_bitmap);
+   if (!generic_test_le_bit(rel_gfn, memslot->dirty_bitmap)) {
+   generic___set_le_bit(rel_gfn, memslot->dirty_bitmap);
+   memslot->num_dirty_bits++;
+   }
}
 }
 
-- 
1.7.1
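
The counting scheme in the patch above can be shown in isolation with a short
userspace sketch. It is an illustration only: struct slot_sketch and the skb_
functions are hypothetical, the bitmap uses native bit order rather than the
little-endian helpers the patch calls, and no attention is paid to concurrent
callers.

```c
#include <assert.h>
#include <limits.h>

#define SKB_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the two memslot fields the patch touches. */
struct slot_sketch {
	unsigned long *dirty_bitmap;
	unsigned long num_dirty_bits;
};

/* Mark a page dirty; the counter only moves on a 0 -> 1 transition. */
void skb_mark_dirty(struct slot_sketch *s, unsigned long rel_gfn)
{
	unsigned long *word = &s->dirty_bitmap[rel_gfn / SKB_BITS_PER_LONG];
	unsigned long mask = 1UL << (rel_gfn % SKB_BITS_PER_LONG);

	if (!(*word & mask)) {
		*word |= mask;
		s->num_dirty_bits++;
	}
}

/* O(1) check that replaces scanning the whole bitmap for a set bit. */
int skb_is_dirty(const struct slot_sketch *s)
{
	return s->num_dirty_bits != 0;
}
```

Testing the bit before setting it is what keeps the counter equal to the
number of set bits when the same page is dirtied repeatedly.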


[RFC PATCH 0/2] KVM: dirty logging optimization

2010-11-17 Thread Takuya Yoshikawa

Hi,

Yesterday, I tried to read the mmu code but soon came back to dirty logging!
  -- My brain's tlb flush seems to have a serious bug!

Soon after that, I arrived at this patch set.


What I thought was that I might be able to improve KVM's interactivity
in the future if I can suppress write protection for hot frame buffer
regions by using a hint from userland. This may help us solve the
interactivity problems on the TODO list in the KVM wiki.

 - Just scaling up the frame rate will soon suffer from many page faults.
 - Scaling down may result in bad response.


For live migration, it may be possible to do a fine-grained get dirty log.


I want to know about this patch's relationship with your O(1) write
protection plan. Is it possible for them to co-exist?

If so, I will do some performance tests.

 - I'm not yet sure under which conditions this hack has advantages.

Any comments will be appreciated!

  Takuya

Re: [PATCH 1/2] KVM: take kvm_lock for hardware_disable() during cpu hotplug

2010-11-17 Thread Takuya Yoshikawa


(2010/11/18 11:33), Zachary Amsden wrote:
> On 11/17/2010 04:04 PM, Takuya Yoshikawa wrote:
>> (2010/11/18 10:59), Zachary Amsden wrote:
>>> On 11/15/2010 10:35 PM, Takuya Yoshikawa wrote:
>>>> In kvm_cpu_hotplug(), only CPU_STARTING case is protected by kvm_lock.
>>>> This patch adds missing protection for CPU_DYING case.
>>>>
>>>> Signed-off-by: Takuya Yoshikawa
>>>> ---
>>>> virt/kvm/kvm_main.c | 2 ++
>>>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>>> index 339dd43..0fdd911 100644
>>>> --- a/virt/kvm/kvm_main.c
>>>> +++ b/virt/kvm/kvm_main.c
>>>> @@ -2148,7 +2148,9 @@ static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val,
>>>> case CPU_DYING:
>>>> printk(KERN_INFO "kvm: disabling virtualization on CPU%d\n",
>>>> cpu);
>>>> + spin_lock(&kvm_lock);
>>>> hardware_disable(NULL);
>>>> + spin_unlock(&kvm_lock);
>>>> break;
>>>> case CPU_STARTING:
>>>> printk(KERN_INFO "kvm: enabling virtualization on CPU%d\n",
>>>
>>> I believe this is correct.
>>
>> You mean lock is not necessary?
>
> No, I believe your patch is correct and the lock should be there. Did you test
> with spinlock debugging just to be sure?



Sorry but no.

I have no experience with cpu hotplug.

So I thought it would take too much time to do real test by myself and reported 
like this this time.

Any easy way to test?

  Takuya

Re: [PATCH 1/2] KVM: take kvm_lock for hardware_disable() during cpu hotplug

2010-11-17 Thread Zachary Amsden


On 11/17/2010 04:04 PM, Takuya Yoshikawa wrote:
> (2010/11/18 10:59), Zachary Amsden wrote:
>> On 11/15/2010 10:35 PM, Takuya Yoshikawa wrote:
>>> In kvm_cpu_hotplug(), only CPU_STARTING case is protected by kvm_lock.
>>> This patch adds missing protection for CPU_DYING case.
>>>
>>> Signed-off-by: Takuya Yoshikawa
>>> ---
>>> virt/kvm/kvm_main.c | 2 ++
>>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index 339dd43..0fdd911 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -2148,7 +2148,9 @@ static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val,
>>>
>>> case CPU_DYING:
>>> printk(KERN_INFO "kvm: disabling virtualization on CPU%d\n",
>>> cpu);
>>> + spin_lock(&kvm_lock);
>>> hardware_disable(NULL);
>>> + spin_unlock(&kvm_lock);
>>> break;
>>> case CPU_STARTING:
>>> printk(KERN_INFO "kvm: enabling virtualization on CPU%d\n",
>>
>> I believe this is correct.
>
> You mean lock is not necessary?


No, I believe your patch is correct and the lock should be there.  Did 
you test with spinlock debugging just to be sure?


Thanks,

Zach

Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func

2010-11-17 Thread Sheng Yang

On Wednesday 17 November 2010 22:01:41 Avi Kivity wrote:
> On 11/15/2010 11:15 AM, Sheng Yang wrote:
> > We need to query the entry later.
> > 
> > +int kvm_get_irq_routing_entry(struct kvm *kvm, int gsi,
> > +   struct kvm_kernel_irq_routing_entry *entry)
> > +{
> > +   int count = 0;
> > +   struct kvm_kernel_irq_routing_entry *ei = NULL;
> > +   struct kvm_irq_routing_table *irq_rt;
> > +   struct hlist_node *n;
> > +
> > +   rcu_read_lock();
> > +   irq_rt = rcu_dereference(kvm->irq_routing);
> > +   if (gsi<  irq_rt->nr_rt_entries)
> > +   hlist_for_each_entry(ei, n,&irq_rt->map[gsi], link)
> > +   count++;
> > +   if (count == 1)
> > +   *entry = *ei;
> > +   rcu_read_unlock();
> > +
> > +   return (count != 1);
> > +}
> > +
> 
> Not good form to rely on ei being valid after the loop.
> 
> I guess this is only useful for msi?  Need to document it.

It may be used for others later; it's fairly generic. Where should I document
it?
> 
> *entry may be stale after rcu_read_unlock().  Is this a problem?

I suppose not. All MSI-X MMIO accesses would be executed without delay, so no
reordering issue would happen. If the guest is reading and writing the field at
the same time (from two CPUs), it should have some kind of synchronization
method of its own, or it may not care what the read result is (like the one
after msix_mask_irq()).
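
On the remark about relying on ei after the loop: one possible shape, sketched
here in plain userspace C, is to take the copy inside the loop body so the
iterator is never used once the loop has finished. This is only an
illustration under assumptions. struct route_sketch and ske_get_unique() are
hypothetical, a simple singly linked list filtered by gsi stands in for the
per-GSI hlist, and RCU is left out completely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical routing entry; a flat list stands in for the per-GSI hlist. */
struct route_sketch {
	int gsi;
	unsigned int data;
	struct route_sketch *next;
};

/*
 * Copy the entry for gsi into *out only if exactly one such entry exists.
 * Returns 0 on success and -1 if there are zero or several matches.
 * *out is left untouched on failure.
 */
int ske_get_unique(const struct route_sketch *head, int gsi,
		   struct route_sketch *out)
{
	const struct route_sketch *e;
	struct route_sketch found = { 0, 0, NULL };
	int count = 0;

	for (e = head; e; e = e->next) {
		if (e->gsi != gsi)
			continue;
		if (++count > 1)
			return -1;	/* ambiguous: more than one entry */
		found = *e;		/* copy while e is known to be valid */
	}
	if (count != 1)
		return -1;

	*out = found;
	out->next = NULL;		/* the copy is not part of the list */
	return 0;
}
```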

--
regards
Yang, Sheng

Re: [PATCH 1/2] KVM: take kvm_lock for hardware_disable() during cpu hotplug

2010-11-17 Thread Takuya Yoshikawa


(2010/11/18 10:59), Zachary Amsden wrote:
> On 11/15/2010 10:35 PM, Takuya Yoshikawa wrote:
>> In kvm_cpu_hotplug(), only CPU_STARTING case is protected by kvm_lock.
>> This patch adds missing protection for CPU_DYING case.
>>
>> Signed-off-by: Takuya Yoshikawa
>> ---
>> virt/kvm/kvm_main.c | 2 ++
>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 339dd43..0fdd911 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -2148,7 +2148,9 @@ static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val,
>> case CPU_DYING:
>> printk(KERN_INFO "kvm: disabling virtualization on CPU%d\n",
>> cpu);
>> + spin_lock(&kvm_lock);
>> hardware_disable(NULL);
>> + spin_unlock(&kvm_lock);
>> break;
>> case CPU_STARTING:
>> printk(KERN_INFO "kvm: enabling virtualization on CPU%d\n",
>
> I believe this is correct.


You mean lock is not necessary?

Re: [PATCH 1/2] KVM: take kvm_lock for hardware_disable() during cpu hotplug

2010-11-17 Thread Zachary Amsden


On 11/15/2010 10:35 PM, Takuya Yoshikawa wrote:
> In kvm_cpu_hotplug(), only CPU_STARTING case is protected by kvm_lock.
> This patch adds missing protection for CPU_DYING case.
>
> Signed-off-by: Takuya Yoshikawa
> ---
>   virt/kvm/kvm_main.c |2 ++
>   1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 339dd43..0fdd911 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2148,7 +2148,9 @@ static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val,
> case CPU_DYING:
> printk(KERN_INFO "kvm: disabling virtualization on CPU%d\n",
>    cpu);
> +   spin_lock(&kvm_lock);
> hardware_disable(NULL);
> +   spin_unlock(&kvm_lock);
> break;
> case CPU_STARTING:
> printk(KERN_INFO "kvm: enabling virtualization on CPU%d\n",
   


I believe this is correct.

Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support

2010-11-17 Thread Sheng Yang

On Wednesday 17 November 2010 21:58:00 Avi Kivity wrote:
> On 11/15/2010 11:15 AM, Sheng Yang wrote:
> > This patch enable per-vector mask for assigned devices using MSI-X.
> > 
> > This patch provided two new APIs: one is for guest to specific device's
> > MSI-X table address in MMIO, the other is for userspace to get
> > information about mask bit.
> > 
> > All the mask bit operation are kept in kernel, in order to accelerate.
> > Userspace shouldn't access the device MMIO directly for the information,
> > instead it should uses provided API to do so.
> > 
> > Signed-off-by: Sheng Yang
> > ---
> > 
> >   arch/x86/kvm/x86.c   |1 +
> >   include/linux/kvm.h  |   32 +
> >   include/linux/kvm_host.h |5 +
> >   virt/kvm/assigned-dev.c  |  318 +-
> >   4 files changed, 355 insertions(+), 1 deletions(-)
> 
> Documentation?

Since we have kept changing the API over the last several versions, I'd like to
settle the API first. I will bring back the documentation once the API is
agreed on.
> 
> > +static int msix_mmio_read(struct kvm_io_device *this, gpa_t addr, int len,
> > +                          void *val)
> > +{
> > +   struct kvm_assigned_dev_kernel *adev =
> > +   container_of(this, struct kvm_assigned_dev_kernel,
> > +msix_mmio_dev);
> > +   int idx, r = 0;
> > +   u32 entry[4];
> > +   struct kvm_kernel_irq_routing_entry e;
> > +
> > +   /* TODO: Get big-endian machine work */
> > +   mutex_lock(&adev->kvm->lock);
> > +   if (!msix_mmio_in_range(adev, addr, len)) {
> > +   r = -EOPNOTSUPP;
> > +   goto out;
> > +   }
> > +   if ((addr&  0x3) || len != 4)
> > +   goto out;
> > +
> > +   idx = msix_get_enabled_idx(adev, addr, len);
> > +   if (idx<  0) {
> > +   idx = (addr - adev->msix_mmio_base) / PCI_MSIX_ENTRY_SIZE;
> > +   if ((addr % PCI_MSIX_ENTRY_SIZE) ==
> > +   PCI_MSIX_ENTRY_VECTOR_CTRL)
> > +   *(unsigned long *)val =
> > +   test_bit(idx, adev->msix_mask_bitmap) ?
> > +   PCI_MSIX_ENTRY_CTRL_MASKBIT : 0;
> > +   else
> > +   r = -EOPNOTSUPP;
> > +   goto out;
> > +   }
> > +
> > +   r = kvm_get_irq_routing_entry(adev->kvm,
> > +   adev->guest_msix_entries[idx].vector,&e);
> > +   if (r || e.type != KVM_IRQ_ROUTING_MSI) {
> > +   r = -EOPNOTSUPP;
> > +   goto out;
> > +   }
> > +   entry[0] = e.msi.address_lo;
> > +   entry[1] = e.msi.address_hi;
> > +   entry[2] = e.msi.data;
> > +   entry[3] = test_bit(adev->guest_msix_entries[idx].entry,
> > +   adev->msix_mask_bitmap);
> > +   memcpy(val,&entry[addr % PCI_MSIX_ENTRY_SIZE / sizeof *entry], len);
> > +
> > +out:
> > +   mutex_unlock(&adev->kvm->lock);
> > +   return r;
> > +}
> > +
> > +static int msix_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
> > +                           const void *val)
> > +{
> > +   struct kvm_assigned_dev_kernel *adev =
> > +   container_of(this, struct kvm_assigned_dev_kernel,
> > +msix_mmio_dev);
> > +   int idx, r = 0;
> > +   unsigned long new_val = *(unsigned long *)val;
> 
> What if it's a 64-bit write on a 32-bit host?

In fact, we don't support QWORD (64-bit) accesses yet. We haven't seen any OS
access the table this way, so I think we can leave it for later.

Also, QEMU doesn't seem to have a way to handle 64-bit MMIO.
> 
> Are we sure the trailing bytes of val are zero?
> 
> > +
> > +   /* TODO: Get big-endian machine work */
> 
> BUILD_BUG_ON(something)

Good idea!
> 
> > +   mutex_lock(&adev->kvm->lock);
> > +   if (!msix_mmio_in_range(adev, addr, len)) {
> > +   r = -EOPNOTSUPP;
> > +   goto out;
> > +   }
> 
> Why is this needed?  Didn't the iodev check already do this?

Well, kvm_io_device_ops() hasn't got "in_range" callback yet...
> 
> > +   if ((addr&  0x3) || len != 4)
> > +   goto out;
> 
> What if len == 8?  I think mst said it was legal.

Since we haven't seen anyone use it this way, I think we can leave it for later.
> 
> > +
> > +   idx = msix_get_enabled_idx(adev, addr, len);
> > +   if (idx<  0) {
> > +   idx = (addr - adev->msix_mmio_base) / PCI_MSIX_ENTRY_SIZE;
> > +   if (((addr % PCI_MSIX_ENTRY_SIZE) ==
> > +   PCI_MSIX_ENTRY_VECTOR_CTRL)) {
> > +   if (new_val&  ~PCI_MSIX_ENTRY_CTRL_MASKBIT)
> > +   goto out;
> > +   if (new_val&  PCI_MSIX_ENTRY_CTRL_MASKBIT)
> > +   set_bit(idx, adev->msix_mask_bitmap);
> > +   else
> > +   clear_bit(idx, adev->msix_mask_bitmap);
> > +   /* It's possible that we need re-enable MSI-X, so go
> > +* back to userspace */
> > +   }
> > +
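
A note on the `*(unsigned long *)val` question raised in the review above: the
handler only accepts len == 4, yet the cast reads sizeof(unsigned long) bytes,
which is 8 on a 64-bit host. One possible way around that, sketched here as
standalone C and not as a change to the patch, is to copy exactly the accepted
width into a fixed-size integer. The helper name sketch_read_mmio_u32 is
hypothetical; nothing like it appears in the patch.

```c
#include <stdint.h>
#include <string.h>

/*
 * Copy exactly one DWORD of an MMIO write payload into a 32-bit value.
 * Returns 0 on success, -1 if the access is not 4 bytes wide.
 * Hypothetical helper for illustration only.
 */
static int sketch_read_mmio_u32(const void *val, int len, uint32_t *out)
{
    if (len != (int)sizeof(*out)) {
        return -1;
    }
    /* memcpy reads only the 4 bytes the caller actually provided. */
    memcpy(out, val, sizeof(*out));
    return 0;
}
```

With this shape the trailing bytes of the buffer are never read, so whether
they are zero no longer matters. Byte order is still that of the host, so the
big-endian TODO in the patch remains open.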

[PATCH v4] virtio-9p: fix build on !CONFIG_UTIMENSAT

2010-11-17 Thread Hidetoshi Seto

This patch introduces a fallback mechanism for old systems that do not
support utimensat().  This fixes the build failure with the following warnings:

hw/virtio-9p-local.c: In function 'local_utimensat':
hw/virtio-9p-local.c:479: warning: implicit declaration of function 'utimensat'
hw/virtio-9p-local.c:479: warning: nested extern declaration of 'utimensat'

and:

hw/virtio-9p.c: In function 'v9fs_setattr_post_chmod':
hw/virtio-9p.c:1410: error: 'UTIME_NOW' undeclared (first use in this function)
hw/virtio-9p.c:1410: error: (Each undeclared identifier is reported only once
hw/virtio-9p.c:1410: error: for each function it appears in.)
hw/virtio-9p.c:1413: error: 'UTIME_OMIT' undeclared (first use in this function)
hw/virtio-9p.c: In function 'v9fs_wstat_post_chmod':
hw/virtio-9p.c:2905: error: 'UTIME_OMIT' undeclared (first use in this function)

v4:
  - Use tv_now.tv_usec
  - Rebased on latest qemu.git
v3:
  - Use better alternative handling for UTIME_NOW/OMIT
  - Move qemu_utimensat() to cutils.c
V2:
  - Introduce qemu_utimensat()

Acked-by: Chris Wright 
Acked-by: M. Mohan Kumar 
Signed-off-by: Hidetoshi Seto 
---
 cutils.c             |   44
 hw/virtio-9p-local.c |    4 ++--
 qemu-common.h        |   10 ++
 3 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/cutils.c b/cutils.c
index 28089aa..dbeb3f2 100644
--- a/cutils.c
+++ b/cutils.c
@@ -371,3 +371,47 @@ fail:
 
 return retval;
 }
+
+int qemu_utimensat(int dirfd, const char *path, const struct timespec *times,
+   int flags)
+{
+#ifdef CONFIG_UTIMENSAT
+return utimensat(dirfd, path, times, flags);
+#else
+/* Fallback: use utimes() instead of utimensat() */
+struct timeval tv[2], tv_now;
+struct stat st;
+int i;
+
+/* happy if special cases */
+if (times[0].tv_nsec == UTIME_OMIT && times[1].tv_nsec == UTIME_OMIT) {
+return 0;
+}
+if (times[0].tv_nsec == UTIME_NOW && times[1].tv_nsec == UTIME_NOW) {
+return utimes(path, NULL);
+}
+
+/* prepare for hard cases */
+if (times[0].tv_nsec == UTIME_NOW || times[1].tv_nsec == UTIME_NOW) {
+gettimeofday(&tv_now, NULL);
+}
+if (times[0].tv_nsec == UTIME_OMIT || times[1].tv_nsec == UTIME_OMIT) {
+stat(path, &st);
+}
+
+for (i = 0; i < 2; i++) {
+if (times[i].tv_nsec == UTIME_NOW) {
+tv[i].tv_sec = tv_now.tv_sec;
+tv[i].tv_usec = tv_now.tv_usec;
+} else if (times[i].tv_nsec == UTIME_OMIT) {
+tv[i].tv_sec = (i == 0) ? st.st_atime : st.st_mtime;
+tv[i].tv_usec = 0;
+} else {
+tv[i].tv_sec = times[i].tv_sec;
+tv[i].tv_usec = times[i].tv_nsec / 1000;
+}
+}
+
+return utimes(path, &tv[0]);
+#endif
+}
diff --git a/hw/virtio-9p-local.c b/hw/virtio-9p-local.c
index 0d52020..41603ea 100644
--- a/hw/virtio-9p-local.c
+++ b/hw/virtio-9p-local.c
@@ -480,9 +480,9 @@ static int local_chown(FsContext *fs_ctx, const char *path, FsCred *credp)
 }
 
 static int local_utimensat(FsContext *s, const char *path,
-  const struct timespec *buf)
+   const struct timespec *buf)
 {
-return utimensat(AT_FDCWD, rpath(s, path), buf, AT_SYMLINK_NOFOLLOW);
+return qemu_utimensat(AT_FDCWD, rpath(s, path), buf, AT_SYMLINK_NOFOLLOW);
 }
 
 static int local_remove(FsContext *ctx, const char *path)
diff --git a/qemu-common.h b/qemu-common.h
index b3957f1..f0b2c9d 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -150,6 +150,16 @@ int qemu_fls(int i);
 int qemu_fdatasync(int fd);
 int fcntl_setfl(int fd, int flag);
 ssize_t strtosz(const char *nptr, char **end);
+#ifndef CONFIG_UTIMENSAT
+#ifndef UTIME_NOW
+# define UTIME_NOW ((1l << 30) - 1l)
+#endif
+#ifndef UTIME_OMIT
+# define UTIME_OMIT ((1l << 30) - 2l)
+#endif
+#endif
+int qemu_utimensat(int dirfd, const char *path, const struct timespec *times,
+int flags);
 
 /* path.c */
 void init_paths(const char *prefix);
-- 
1.7.3.1
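
The per-entry conversion in the fallback path above can be looked at in
isolation. The following is a minimal standalone sketch, not the patch's code:
it takes the "current time" and the file's existing timestamp as parameters
instead of calling gettimeofday() and stat(), so the mapping from a
utimensat()-style timespec to a utimes()-style timeval can be checked without
touching the filesystem. The names SK_UTIME_NOW, SK_UTIME_OMIT and
sketch_to_timeval are hypothetical; the sentinel values mirror the fallback
defines in qemu-common.h.

```c
#include <sys/time.h>
#include <time.h>

/* Hypothetical sentinels, same values as the patch's fallback defines. */
#define SK_UTIME_NOW  ((1L << 30) - 1L)
#define SK_UTIME_OMIT ((1L << 30) - 2L)

/*
 * Convert one utimensat()-style timespec into a utimes()-style timeval.
 * 'now' is used for UTIME_NOW; 'current_sec' is the file's existing
 * timestamp in seconds and is used for UTIME_OMIT.
 */
static struct timeval sketch_to_timeval(const struct timespec *ts,
                                        const struct timeval *now,
                                        time_t current_sec)
{
    struct timeval tv;

    if (ts->tv_nsec == SK_UTIME_NOW) {
        tv = *now;
    } else if (ts->tv_nsec == SK_UTIME_OMIT) {
        /* Only whole seconds are carried over, as in the patch. */
        tv.tv_sec = current_sec;
        tv.tv_usec = 0;
    } else {
        tv.tv_sec = ts->tv_sec;
        tv.tv_usec = ts->tv_nsec / 1000; /* nanoseconds to microseconds */
    }
    return tv;
}
```

Two consequences are visible here: sub-microsecond precision is dropped on
the plain path, and an omitted timestamp is rewritten with its sub-second
part set to zero instead of being left untouched.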



RE: [PATCH]KVM: VMX: Inform user about INTEL_TXT dependency

2010-11-17 Thread Wang, Shane

After discussing with Joe, we dropped that idea (i.e. the message does not
depend on enabled-inside-smx).

Thanks.
Shane

-----Original Message-----
From: Jan Kiszka [mailto:jan.kis...@siemens.com]
Sent: November 17, 2010 15:56
To: Wang, Shane
Cc: a...@redhat.com; mtosa...@redhat.com; kvm@vger.kernel.org; Cihula, Joseph
Subject: Re: [PATCH]KVM: VMX: Inform user about INTEL_TXT dependency

On 17.11.2010 04:40, Shane Wang wrote:
> Inform user to either disable TXT in the BIOS or do TXT launch with tboot 
> before enabling KVM since some BIOSes do not set 
> FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX bit when TXT is enabled.
> 
> Signed-off-by: Shane Wang 
> ---
>  arch/x86/kvm/vmx.c |5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff -r b1a2deff4c64 arch/x86/kvm/vmx.c
> --- a/arch/x86/kvm/vmx.c  Wed Nov 17 12:47:42 2010 -0500
> +++ b/arch/x86/kvm/vmx.c  Wed Nov 17 12:49:52 2010 -0500
> @@ -1306,8 +1306,11 @@
>   && tboot_enabled())
>   return 1;
>   if (!(msr & FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX)
> - && !tboot_enabled())
> + && !tboot_enabled()) {
> + printk(KERN_WARNING "kvm: disable TXT in the BIOS or "
> + " activate TXT before enabling KVM\n");

Thought you wanted to let this message depend on ENABLED_INSIDE_SMX?
However, if it's OK for you, I'm fine with it as well.

Thanks!
Jan

>   return 1;
> + }
>   }
>  
>   return 0;

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH v2 2/2] RAM API: Make use of it for x86 PC

2010-11-17 Thread Anthony Liguori


On 11/16/2010 03:24 PM, Alex Williamson wrote:

On Tue, 2010-11-16 at 08:58 -0600, Anthony Liguori wrote:
   

On 11/01/2010 10:14 AM, Alex Williamson wrote:
 

Register the actual VM RAM using the new API

Signed-off-by: Alex Williamson
---

   hw/pc.c |   12 ++--
   1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 69b13bf..0ea6d10 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -912,14 +912,14 @@ void pc_memory_init(ram_addr_t ram_size,
   /* allocate RAM */
   ram_addr = qemu_ram_alloc(NULL, "pc.ram",
 below_4g_mem_size + above_4g_mem_size);
-cpu_register_physical_memory(0, 0xa, ram_addr);
-cpu_register_physical_memory(0x10,
- below_4g_mem_size - 0x10,
- ram_addr + 0x10);
+
+qemu_ram_register(0, 0xa, ram_addr);
+qemu_ram_register(0x10, below_4g_mem_size - 0x10,
+  ram_addr + 0x10);
   #if TARGET_PHYS_ADDR_BITS>   32
   if (above_4g_mem_size>   0) {
-cpu_register_physical_memory(0x1ULL, above_4g_mem_size,
- ram_addr + below_4g_mem_size);
+qemu_ram_register(0x1ULL, above_4g_mem_size,
+  ram_addr + below_4g_mem_size);
   }

   

Take a look at the memory shadowing in the i440fx.  The regions of
memory in the BIOS area can temporarily become RAM.

That's because there is normally RAM backing this space but the memory
controller redirects writes to the ROM space.

Not sure the best way to handle this, but the basic concept is, RAM
always exists but if a device tries to access it, it may or may not be
accessible as RAM at any given point in time.
 

Gack.  For the benefit of those that want to join the fun without
digging up the spec, these magic flippable segments the i440fx can
toggle are 12 fixed 16k segments from 0xc to 0xe and a single
64k segment from 0xf to 0xf.  There are read-enable and
write-enable bits for each, so the chipset can be configured to read
from the bios and write to memory (to setup BIOS-RAM caching), and read
from memory and write to the bios (to enable BIOS-RAM caching).  The
other bit combinations are also available.
   


Yup.  As Gleb mentions, there's the SDRAM register which controls 
whether 0xa is mapped to PCI or whether it's mapped to RAM (but KVM 
explicitly disabled SMM support).



For my purpose in using this to program the IOMMU with guest physical to
host virtual addresses for device assignment, it doesn't really matter
since there should never be a DMA in this range of memory.  But for a
general RAM API, I'm not sure either.  I'm tempted to say that while
this is in fact a use of RAM, the RAM is never presented to the guest as
usable system memory (E820_RAM for x86), and should therefore be
excluded from the RAM API if we're using it only to track regions that
are actual guest usable physical memory.

We had talked on irc that pc.c should be registering 0x0 to
below_4g_mem_size as ram, but now I tend to disagree with that.  The
memory backing 0xa-0x10 is present, but it's not presented to
the guest as usable RAM.  What's your strict definition of what the RAM
API includes?  Is it only what the guest could consider usable RAM or
does it also include quirky chipset accelerator features like this
(everything with a guest physical address)?  Thanks,
   


Today we model one flat space that's a mix of device memory, RAM, and 
ROM.  This is not how machines work, and the limitations of this model are 
holding us back.


IRL, there's a block of RAM that's connected to a memory controller.  
The CPU is also connected to the memory controller.  Devices are 
connected to another controller which is in turn connected to the memory 
controller.  There may, in fact, be more than one controller between a 
device and the memory controller.


A controller may change the way a device sees memory in arbitrary ways.  
In fact, two controllers accessing the same page might see something 
totally different.


The idea behind the RAM API is to begin to establish this hierarchy.  
RAM is not what any particular device sees--it's actual RAM.  IOW, the 
RAM API should represent what address mapping I would get if I talked 
directly to DIMMs.


This is not what RamBlock is, even though the name would suggest 
otherwise.  RamBlocks are anything that qemu represents as cache-consistent, 
directly accessible memory.  Device ROMs and areas of device 
RAM are all allocated from the RamBlock space.


So the very first task of a RAM API is to simply differentiate these 
two things.  Once we have the base RAM API, we can start adding the 
proper APIs that sit on top of it (like a PCI memory API).


Regards,

Anthony Liguori


Alex


   



HAL type for Win2003 Server on recent KVM versions?

2010-11-17 Thread Kenni Lund

Hi

I'm about to move a couple of virtual machines from a Fedora 11 system
to a new server with a more recent operating system and newer version
of KVM, etc.

One of the guests is a Windows Server 2003 Standard SP2, which is
currently running with the "ACPI Multiprocessor PC" HAL.

Considering moving to RHEL, I've been reading the virtualization
documentation for RHEL 6.0, which says that I need to set HAL to
"Standard PC" when installing a new Win2003 guest.

Since my current guest has been running perfectly fine for a long time
with its current HAL, I was wondering if the system will become
unstable, unbootable or what the disadvantage will be, if I move the
guest to for example RHEL 6.0, without reinstalling or upgrading the
guest to select another HAL mode?

On the other hand, it seems like I can "upgrade" from the current
"ACPI Multiprocessor PC" into "Standard PC", but I'm not sure if I'll
gain anything by trying this.

Thanks in advance..

Best regards
Kenni

[PATCH RFC] kvm: fast-path msi injection with irqfd

2010-11-17 Thread Michael S. Tsirkin

Store irq routing table pointer in the irqfd object,
and use that to inject MSI directly without bouncing out to
a kernel thread.

While we touch this structure, rearrange irqfd fields to make fastpath
better packed for better cache utilization.

Some notes on the design:
- Use pointer into the rt instead of copying an entry,
  to make it possible to use rcu, thus side-stepping
  locking complexities.  We also save some memory this way.
- Old workqueue code is still used for level irqs.
  I don't think we DTRT with level anyway, however,
  it seems easier to keep the code around as
  it has been thought through and debugged, and fix level later than
  rip out and re-instate it later.

Signed-off-by: Michael S. Tsirkin 
---

The below is compile tested only.  Sending out for early
flames/feedback.  Please review!

 include/linux/kvm_host.h |4 ++
 virt/kvm/eventfd.c   |   81 +++--
 virt/kvm/irq_comm.c  |6 ++-
 3 files changed, 78 insertions(+), 13 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a055742..b6f7047 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -462,6 +462,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
   unsigned long *deliver_bitmask);
 #endif
 int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
+int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm,
+   int irq_source_id, int level);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
   struct kvm_irq_ack_notifier *kian);
@@ -603,6 +605,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
 void kvm_eventfd_init(struct kvm *kvm);
 int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
 void kvm_irqfd_release(struct kvm *kvm);
+void kvm_irqfd_update(struct kvm *kvm, struct kvm_irq_routing_table *irq_rt);
 int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
 
 #else
@@ -614,6 +617,7 @@ static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
 }
 
 static inline void kvm_irqfd_release(struct kvm *kvm) {}
+static inline void kvm_irqfd_update(struct kvm *kvm) {}
 static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 {
return -ENOSYS;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index c1f1e3c..49c1864 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -44,14 +44,18 @@
  */
 
 struct _irqfd {
-   struct kvm   *kvm;
-   struct eventfd_ctx   *eventfd;
-   int   gsi;
-   struct list_head  list;
-   poll_tablept;
-   wait_queue_t  wait;
-   struct work_structinject;
-   struct work_structshutdown;
+   /* Used for MSI fast-path */
+   struct kvm *kvm;
+   wait_queue_t wait;
+   struct kvm_kernel_irq_routing_entry __rcu *irq_entry;
+   /* Used for level IRQ fast-path */
+   int gsi;
+   struct work_struct inject;
+   /* Used for setup/shutdown */
+   struct eventfd_ctx *eventfd;
+   struct list_head list;
+   poll_table pt;
+   struct work_struct shutdown;
 };
 
 static struct workqueue_struct *irqfd_cleanup_wq;
@@ -125,10 +129,18 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
 {
struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
unsigned long flags = (unsigned long)key;
+   struct kvm_kernel_irq_routing_entry *irq;
 
-   if (flags & POLLIN)
+   if (flags & POLLIN) {
+   rcu_read_lock();
+   irq = irqfd->irq_entry;
/* An event has been signaled, inject an interrupt */
-   schedule_work(&irqfd->inject);
+   if (irq)
+   kvm_set_msi(irq, irqfd->kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
+   else
+   schedule_work(&irqfd->inject);
+   rcu_read_unlock();
+   }
 
if (flags & POLLHUP) {
/* The eventfd is closing, detach from KVM */
@@ -166,6 +178,7 @@ irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
 static int
 kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
 {
+   struct kvm_irq_routing_table *irq_rt;
struct _irqfd *irqfd, *tmp;
struct file *file = NULL;
struct eventfd_ctx *eventfd = NULL;
@@ -215,6 +228,10 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
goto fail;
}
 
+   rcu_read_lock();
+   irqfd_update(kvm, irqfd, rcu_dereference(kvm->irq_routing));
+   rcu_read_unlock();
+
events = file->f_op->poll(file, &irqfd->pt);
 
list_add_tail(&irqfd->list, &kvm->irqfds.items);
@@ -271,8 +288,15 @@ kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi)

Re: [PATCHv4 15/15] Pass boot device list to firmware.

2010-11-17 Thread Blue Swirl

2010/11/16 Gleb Natapov :
> On Tue, Nov 16, 2010 at 06:30:19PM +, Blue Swirl wrote:
>> >> Perhaps the FW path should use device class names if no name is specified.
>> > What do you mean by "device class name". We can do something like this:
>> > if (dev->child_bus.lh_first)
>> >        return dev->child_bus.lh_first->info->name;
>> >
>> > i.e if there is child bus use its bus name as fw name. This will make
>> > all pci devices to have "pci" as fw name automatically. The problem is
>> > that theoretically same device can provide different buses.
>>
>> I meant PCI class name, like "display" for display controllers,
>> "network" for NICs etc.
>>
> That is what my pci bus related patch is doing already.
>
>> >> I'll try Sparc32 to see how this fits there.
>>
>> Except bootindex is not implemented for SCSI.
> Will look into adding it.

Thanks. The bootindex on Sparc32 looks like this:
bootindex /e...@7880/d...@1,0
/ether...@/ethernet-...@0

I don't think I got Lance setup right.

OF paths for the devices would be:
/io...@0,1000/s...@0,10001000/esp...@5,840/e...@5,880/s...@1,0
/io...@0,1000/s...@0,10001000/le...@5,8400010/l...@5,8c0

The logic for ESP is that ESP (registers at 0x7880, slot offset
0x88) is handled by the DMA controller (registers at 0x7840,
slot offset 0x84), they are in a SBus slot #5, and SBus (registers
at 0x10001000) is in turn handled by IOMMU (registers at 0x1000).
Lance should be handled the same way.

This hierarchy is partly known by QEMU because DMA accesses use this
flow, but not otherwise. There is no concept of SBus slots, DMA talks
to IOMMU directly. Though in this case both ESP, Lance and their DMA
controllers are on board devices in a MACIO chip. It may be possible
to add the hierarchy information at each stage.

It should also be possible for BIOS to determine the device just from
the physical address if we ignored OF compatibility.

[PATCH] ceph/rbd block driver for qemu-kvm (v8)

2010-11-17 Thread Christian Brunner

Here is another update for the ceph storage driver. It includes changes
for the annotations Stefan made last week and a bit more things Sage
discovered while looking over the driver again.

I really hope that this time we are not only close, but have reached
a quality that everyone is satisfied with. - Of course suggestions for 
further improvements are always welcome.

Regards,
Christian


RBD is a block driver for the distributed file system Ceph
(http://ceph.newdream.net/). This driver uses librados (which
is part of the Ceph server) for direct access to the Ceph object
store and runs entirely in userspace (Yehuda also
wrote a driver for the Linux kernel that can be used to access
rbd volumes as a block device).
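
The header comment in block/rbd.c describes how the raw device is split into
objects: 4MB per object by default, with the sequence number appended to the
device name as a 12-character hex string after a dot. A minimal standalone
sketch of that mapping, assuming an object order of 22 (2^22 bytes = 4MB),
could look like this. The names SKETCH_OBJ_ORDER and sketch_object_name are
hypothetical and are not taken from the driver.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed default: 2^22 bytes = 4MB per object. */
#define SKETCH_OBJ_ORDER 22

/*
 * Build the name of the object that holds byte 'offset' of the image:
 * "<devicename>.<12 hex digits of the sequence number>".
 * Returns the snprintf() result (length the full name needs).
 */
static int sketch_object_name(char *dst, size_t dst_len,
                              const char *devicename, uint64_t offset)
{
    uint64_t seq = offset >> SKETCH_OBJ_ORDER;

    return snprintf(dst, dst_len, "%s.%012" PRIx64, devicename, seq);
}
```

For an image called "img", any offset inside the fourth 4MB chunk would map
to the object name "img.000000000003" under these assumptions.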
---
 Makefile.objs |1 +
 block/rbd.c   | 1059 +
 block/rbd_types.h |   71 
 configure |   31 ++
 4 files changed, 1162 insertions(+), 0 deletions(-)
 create mode 100644 block/rbd.c
 create mode 100644 block/rbd_types.h

diff --git a/Makefile.objs b/Makefile.objs
index 6ee077c..56a13c1 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -19,6 +19,7 @@ block-nested-y += parallels.o nbd.o blkdebug.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_CURL) += curl.o
+block-nested-$(CONFIG_RBD) += rbd.o
 
 block-obj-y +=  $(addprefix block/, $(block-nested-y))
 
diff --git a/block/rbd.c b/block/rbd.c
new file mode 100644
index 000..249a590
--- /dev/null
+++ b/block/rbd.c
@@ -0,0 +1,1059 @@
+/*
+ * QEMU Block driver for RADOS (Ceph)
+ *
+ * Copyright (C) 2010 Christian Brunner 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "qemu-error.h"
+
+#include "rbd_types.h"
+#include "block_int.h"
+
+#include 
+
+
+
+/*
+ * When specifying the image filename use:
+ *
+ * rbd:poolname/devicename
+ *
+ * poolname must be the name of an existing rados pool
+ *
+ * devicename is the basename for all objects used to
+ * emulate the raw device.
+ *
+ * Metadata information (image size, ...) is stored in an
+ * object with the name "devicename.rbd".
+ *
+ * The raw device is split into 4MB sized objects by default.
+ * The sequencenumber is encoded in a 12 byte long hex-string,
+ * and is attached to the devicename, separated by a dot.
+ * e.g. "devicename.1234567890ab"
+ *
+ */
+
+#define OBJ_MAX_SIZE (1UL << OBJ_DEFAULT_OBJ_ORDER)
+
+typedef struct RBDAIOCB {
+BlockDriverAIOCB common;
+QEMUBH *bh;
+int ret;
+QEMUIOVector *qiov;
+char *bounce;
+int write;
+int64_t sector_num;
+int aiocnt;
+int error;
+struct BDRVRBDState *s;
+int cancelled;
+} RBDAIOCB;
+
+typedef struct RADOSCB {
+int rcbid;
+RBDAIOCB *acb;
+struct BDRVRBDState *s;
+int done;
+int64_t segsize;
+char *buf;
+int ret;
+} RADOSCB;
+
+#define RBD_FD_READ 0
+#define RBD_FD_WRITE 1
+
+typedef struct BDRVRBDState {
+int fds[2];
+rados_pool_t pool;
+rados_pool_t header_pool;
+char name[RBD_MAX_OBJ_NAME_SIZE];
+char block_name[RBD_MAX_BLOCK_NAME_SIZE];
+uint64_t size;
+uint64_t objsize;
+int qemu_aio_count;
+int event_reader_pos;
+RADOSCB *event_rcb;
+} BDRVRBDState;
+
+typedef struct rbd_obj_header_ondisk RbdHeader1;
+
+static void rbd_aio_bh_cb(void *opaque);
+
+static int rbd_next_tok(char *dst, int dst_len,
+char *src, char delim,
+const char *name,
+char **p)
+{
+int l;
+char *end;
+
+*p = NULL;
+
+if (delim != '\0') {
+end = strchr(src, delim);
+if (end) {
+*p = end + 1;
+*end = '\0';
+}
+}
+l = strlen(src);
+if (l >= dst_len) {
+error_report("%s too long", name);
+return -EINVAL;
+} else if (l == 0) {
+error_report("%s too short", name);
+return -EINVAL;
+}
+
+pstrcpy(dst, dst_len, src);
+
+return 0;
+}
+
+static int rbd_parsename(const char *filename,
+ char *pool, int pool_len,
+ char *snap, int snap_len,
+ char *name, int name_len)
+{
+const char *start;
+char *p, *buf;
+int ret;
+
+if (!strstart(filename, "rbd:", &start)) {
+return -EINVAL;
+}
+
+buf = qemu_strdup(start);
+p = buf;
+
+ret = rbd_next_tok(pool, pool_len, p, '/', "pool name", &p);
+if (ret < 0 || !p) {
+ret = -EINVAL;
+goto done;
+}
+ret = rbd_next_tok(name, name_len, p, '@', "object name", &p);
+if (ret < 0) {
+goto done;
+}
+if (!p) {
+*snap = '\0';
+goto done;
+}
+
+ret = rbd_next_tok(snap, snap_len, p, '\0', "snap name", &p);
+
+done:
+qemu_free(buf);
+return ret;
+}
+
+static int create_tmap_op(uint8_t op, const char *n


early kernel panic in linux guest (div-by-zero in pvclock_tsc_khz)

2010-11-17 Thread Stefan Bühler


Hi,

i get an early kernel panic with some kernels:

The physical host runs 2.6.32-5-amd64 (debian stable/testing), and uses 
qemu-kvm/0.12.5+dfsg-4 with libvirt 0.8.3-4.


The node is based on debian testing.

The host has two cores, the guest uses one.

The following tested kernel versions panic:
 - 2.6.30 (linux-image-2.6.30-2-amd64/2.6.30-8squeeze1)
 - 2.6.32 (linux-image-2.6.32-5-amd64/2.6.32-27)
 - 2.6.36 (linux-image-2.6.36-trunk-amd64/2.6.36-1~experimental.1)

The debian stable kernel does *not* panic:
 - 2.6.26 (linux-image-2.6.26-2-amd64/2.6.26-25)

Example log for 2.6.32-5-amd64 (experimental has similar backtrace); the
panic is caused by a div-by-zero in pvclock_tsc_khz:

[0.00] kvm-clock: cpu 0, msr 0:14f1701, boot clock
PANIC: early exception 00 rip 10:8102cd63 error 0 cr2 0
[0.00] Pid: 0, comm: swapper Not tainted 2.6.32-5-amd64 #1
[0.00] Call Trace:
[0.00]  [] ? early_idt_handler+0x5e/0x71
[0.00]  [] ? pvclock_tsc_khz+0x13/0x2a
[0.00]  [] ? kvmclock_init+0x133/0x18c
[0.00]  [] ? parse_crashkernel+0x46/0x23f
[0.00]  [] ? setup_arch+0x8f6/0x9cb
[0.00]  [] ? extract_entropy+0x6a/0x125
[0.00]  [] ? early_idt_handler+0x0/0x71
[0.00]  [] ? start_kernel+0xdb/0x3e8
[0.00]  [] ? x86_64_start_kernel+0xf9/0x106
[0.00] RIP pvclock_tsc_khz+0x13/0x2a


(gdb) disassemble pvclock_tsc_khz
Dump of assembler code for function pvclock_tsc_khz:
0x8102cd50 : sub    $0x8,%rsp
0x8102cd54 : mov    0x18(%rdi),%ecx
0x8102cd57 : xor    %edx,%edx
0x8102cd59 : mov    $0xf4240,%rax
0x8102cd63 : div    %rcx
0x8102cd66 : movsbl 0x1c(%rdi),%ecx
0x8102cd6a : test   %cl,%cl
0x8102cd6c : jns    0x8102cd75
0x8102cd6e : neg    %ecx
0x8102cd70 : shl    %cl,%rax
0x8102cd73 : jmp    0x8102cd78
0x8102cd75 : shr    %cl,%rax
0x8102cd78 : pop    %rdx
0x8102cd79 : retq
End of assembler dump.
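
Read as C, the dump above loads a 32-bit field at offset 0x18, divides a
constant by it, then shifts the quotient by a signed byte at offset 0x1c. The
faulting instruction is the div, so the 32-bit field must have been zero. A
userspace sketch of that computation follows. It assumes the dividend is
1000000 << 32 (the immediate looks truncated in the dump) and that the two
fields are the pvclock multiplier and shift; the struct and function names are
hypothetical, and the zero check is added here only to make the failure mode
visible.

```c
#include <stdint.h>

/* Userspace stand-in for the two fields the function reads. */
struct sketch_time_info {
    uint32_t tsc_to_system_mul; /* field at offset 0x18 in the dump */
    int8_t   tsc_shift;         /* field at offset 0x1c in the dump */
};

/*
 * Returns the TSC frequency in kHz, or 0 when the multiplier is still
 * zero. The guard is an illustration, not a proposed kernel fix.
 */
static uint64_t sketch_tsc_khz(const struct sketch_time_info *src)
{
    uint64_t khz = 1000000ULL << 32;

    if (src->tsc_to_system_mul == 0) {
        return 0; /* the unguarded division faults here */
    }
    khz /= src->tsc_to_system_mul;
    if (src->tsc_shift < 0) {
        khz <<= -src->tsc_shift;
    } else {
        khz >>= src->tsc_shift;
    }
    return khz;
}
```

If this reading is right, the guest is seeing a time-info structure whose
multiplier has not been filled in by the host at the point kvmclock_init
runs.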

Debian Bug report:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=603670

Regards,
Stefan
Loading Linux 2.6.32-5-amd64 ...
Loading initial ramdisk ...
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 2.6.32-5-amd64 (Debian 2.6.32-27) 
(m...@debian.org) (gcc version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Sat Oct 30 
14:18:21 UTC 2010
[0.00] Command line: BOOT_IMAGE=/vmlinuz-2.6.32-5-amd64 
root=/dev/mapper/vg0-stefan ro single console=tty0 console=ttyS0,38400 
earlyprintk=ttyS0
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00]   Centaur CentaurHauls
[0.00] BIOS-provided physical RAM map:
[0.00]  BIOS-e820:  - 0009f000 (usable)
[0.00]  BIOS-e820: 0009f000 - 000a (reserved)
[0.00]  BIOS-e820: 000f - 0010 (reserved)
[0.00]  BIOS-e820: 0010 - 3fffb000 (usable)
[0.00]  BIOS-e820: 3fffb000 - 4000 (reserved)
[0.00]  BIOS-e820: fffbc000 - 0001 (reserved)
[0.00] bootconsole [earlyser0] enabled
[0.00] DMI 2.4 present.
[0.00] last_pfn = 0x3fffb max_arch_pfn = 0x4
[0.00] x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
[0.00] init_memory_mapping: -3fffb000
[0.00] RAMDISK: 2f87f000 - 3003c109
[0.00] ACPI: RSDP 000f8830 00014 (v00 BOCHS )
[0.00] ACPI: RSDT 3fffde30 00034 (v01 BOCHS  BXPCRSDT 0001 
BXPC 0001)
[0.00] ACPI: FACP 3e70 00074 (v01 BOCHS  BXPCFACP 0001 
BXPC 0001)
[0.00] ACPI: DSDT 3fffdfd0 01E22 (v01   BXPC   BXDSDT 0001 
INTL 20090123)
[0.00] ACPI: FACS 3e00 00040
[0.00] ACPI: SSDT 3fffdf90 00037 (v01 BOCHS  BXPCSSDT 0001 
BXPC 0001)
[0.00] ACPI: APIC 3fffdeb0 00072 (v01 BOCHS  BXPCAPIC 0001 
BXPC 0001)
[0.00] ACPI: HPET 3fffde70 00038 (v01 BOCHS  BXPCHPET 0001 
BXPC 0001)
[0.00] No NUMA configuration found
[0.00] Faking a node at -3fffb000
[0.00] Bootmem setup node 0 -3fffb000
[0.00]   NODE_DATA [9000 - 00010fff]
[0.00]   bootmap [00011000 -  00018fff] pages 8
[0.00] (7 early reservations) ==> bootmem [00 - 003fffb000]
[0.00]   #0 [00 - 001000]   BIOS data page ==> [00 
- 001000]
[0.00]   #1 [006000 - 008000]   TRAMPOLINE ==> [006000 
- 008000]
[0.00]   #2 [000100 - 0001688414]TEXT DATA BSS ==> [000100 
- 0001688414]
[0.00]   #3 [002f87f000 - 003003c109]  RAMDISK ==> [002f87f00

[PATCH] KVM test: build subtest: Add path_to_rom_images

2010-11-17 Thread Lucas Meneghel Rodrigues

In some cases, KVM userspace source might miss the roms
needed for it to boot. In those cases, allow people to
specify a path_to_rom_images, that will be used to copy
roms to the recently compiled qemu-kvm.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/build.cfg.sample |   15 +++
 client/tests/kvm/tests/build.py   |   14 ++
 2 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/build.cfg.sample b/client/tests/kvm/build.cfg.sample
index 860192b..b6a20f7 100644
--- a/client/tests/kvm/build.cfg.sample
+++ b/client/tests/kvm/build.cfg.sample
@@ -23,6 +23,9 @@ variants:
 # release_tag = 84
 release_dir = http://downloads.sourceforge.net/project/kvm/
 release_listing = http://sourceforge.net/projects/kvm/files/
+# In some cases, you might want to provide a ROM dir, so ROM
+# files can be copied from there to your source based install
+# path_to_rom_images = /usr/share/kvm
 - snapshot:
 mode = snapshot
 ## Install from a kvm snapshot location. You can optionally
@@ -30,14 +33,23 @@ variants:
 ## yesterday's snapshot.
 # snapshot_date = 20090712
 snapshot_dir = http://foo.org/kvm-snapshots/
+# In some cases, you might want to provide a ROM dir, so ROM
+# files can be copied from there to your source based install
+# path_to_rom_images = /usr/share/kvm
 - localtar:
 mode = localtar
 ## Install from tarball located on the host's filesystem.
 tarball = /tmp/kvm-84.tar.gz
+# In some cases, you might want to provide a ROM dir, so ROM
+# files can be copied from there to your source based install
+# path_to_rom_images = /usr/share/kvm
 - localsrc:
 mode = localsrc
 ## Install from tarball located on the host's filesystem.
 srcdir = /tmp/kvm-84
+# In some cases, you might want to provide a ROM dir, so ROM
+# files can be copied from there to your source based install
+# path_to_rom_images = /usr/share/kvm
 - git:
 mode = git
 ## Install KVM from git repositories.
@@ -64,6 +76,9 @@ variants:
 # kmod_lbranch = kmod_lbranch_name
 # kmod_commit = kmod_commit_name
 # kmod_patches = ['http://foo.com/patch1', 
'http://foo.com/patch2']
+# In some cases, you might want to provide a ROM dir, so ROM
+# files can be copied from there to your source based install
+# path_to_rom_images = /usr/share/kvm
 - yum:
 mode = yum
 src_pkg = qemu
diff --git a/client/tests/kvm/tests/build.py b/client/tests/kvm/tests/build.py
index c4f0b18..bb3e2dc 100644
--- a/client/tests/kvm/tests/build.py
+++ b/client/tests/kvm/tests/build.py
@@ -154,6 +154,15 @@ def create_symlinks(test_bindir, prefix=None, 
bin_list=None, unittest=None):
 os.symlink(unittest, qemu_unittest_path)
 
 
+def install_roms(rom_dir, prefix):
+logging.debug("Path to roms specified. Copying roms to install prefix")
+rom_dst_dir = os.path.join(prefix, 'share', 'qemu')
+for rom_src in glob.glob('%s/*.bin' % rom_dir):
+rom_dst = os.path.join(rom_dst_dir, os.path.basename(rom_src))
+logging.debug("Copying rom file %s to %s", rom_src, rom_dst)
+shutil.copy(rom_src, rom_dst)
+
+
 def save_build(build_dir, dest_dir):
 logging.debug('Saving the result of the build on %s', dest_dir)
 base_name = os.path.basename(build_dir)
@@ -314,6 +323,7 @@ class SourceDirInstaller(BaseInstaller):
 
 install_mode = params["mode"]
 srcdir = params.get("srcdir", None)
+self.path_to_roms = params.get("path_to_rom_images", None)
 
 if install_mode == 'localsrc':
 if srcdir is None:
@@ -391,6 +401,8 @@ class SourceDirInstaller(BaseInstaller):
 utils.system("make -C qemu install")
 elif self.repo_type == 2:
 utils.system("make install")
+if self.path_to_roms:
+install_roms(self.path_to_roms, self.prefix)
 create_symlinks(self.test_bindir, self.prefix)
 
 
@@ -559,6 +571,8 @@ class GitInstaller(SourceDirInstaller):
 def _install(self):
 os.chdir(self.userspace_srcdir)
 utils.system('make install')
+if self.path_to_roms:
+install_roms(self.path_to_roms, self.prefix)
 create_symlinks(test_bindir=self.test_bindir, prefix=self.prefix,
 bin_list=None,
 unittest=self.unittest_prefix)
-- 
1.7.2.3
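
A minimal standalone sketch of the copy step this patch adds, for trying it outside autotest. It mirrors install_roms() from the diff above; the os.makedirs() call and the returned list are assumptions added for illustration and are not part of the patch.

```python
import glob
import logging
import os
import shutil


def install_roms(rom_dir, prefix):
    """Copy every *.bin file from rom_dir into <prefix>/share/qemu."""
    rom_dst_dir = os.path.join(prefix, "share", "qemu")
    # Assumption (not in the patch): create the destination directory
    # in case "make install" did not already do so.
    os.makedirs(rom_dst_dir, exist_ok=True)
    copied = []
    for rom_src in sorted(glob.glob(os.path.join(rom_dir, "*.bin"))):
        rom_dst = os.path.join(rom_dst_dir, os.path.basename(rom_src))
        logging.debug("Copying rom file %s to %s", rom_src, rom_dst)
        shutil.copy(rom_src, rom_dst)
        copied.append(rom_dst)
    # Returning the copied paths is an illustration-only convenience.
    return copied
```

Only files matching *.bin are picked up, so a ROM directory that ships images under another extension would need a wider glob.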


Re: [PATCH v2 1/6] KVM: MMU: fix forgot flush vcpu tlbs

2010-11-17 Thread Marcelo Tosatti

On Wed, Nov 17, 2010 at 06:26:51PM +0200, Avi Kivity wrote:
> On 11/17/2010 05:29 PM, Marcelo Tosatti wrote:
> >>  diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> >>  index ba00eef..58b4d9a 100644
> >>  --- a/arch/x86/kvm/paging_tmpl.h
> >>  +++ b/arch/x86/kvm/paging_tmpl.h
> >>  @@ -781,6 +781,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, 
> >> struct kvm_mmu_page *sp,
> >>else
> >>nonpresent = shadow_notrap_nonpresent_pte;
> >>drop_spte(vcpu->kvm,&sp->spt[i], nonpresent);
> >>  + kvm_flush_remote_tlbs(vcpu->kvm);
> >>continue;
> >>}
> >
> >This is not needed. Guest is responsible for flushing on
> >present->nonpresent change.
> 
> sync_page
> drop_spte
> kvm_mmu_notifier_invalidate_page
> kvm_unmap_rmapp
> spte doesn't exist -> no flush
> page is freed
> guest can write into freed page?

Ugh right.

> I don't think we need to flush immediately; set a "tlb dirty" bit
> somewhere that is cleared when we flush the tlb.
> kvm_mmu_notifier_invalidate_page() can consult the bit and force a
> flush if set.

Yep.
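
A toy model of the deferred-flush idea discussed above, in Python rather than kernel C. The class, the field name tlbs_dirty and the method names are hypothetical; only the behaviour (set a bit when sync_page drops an spte, clear it on flush, force a flush from the notifier if it is set) comes from the thread.

```python
class ToyKvm:
    """Toy stand-in for struct kvm; tracks only TLB flush bookkeeping."""

    def __init__(self):
        self.tlbs_dirty = False   # hypothetical name for the "tlb dirty" bit
        self.remote_flushes = 0

    def flush_remote_tlbs(self):
        self.remote_flushes += 1
        self.tlbs_dirty = False   # the bit is cleared whenever we flush

    def drop_spte_in_sync_page(self):
        # sync_page drops the spte but defers the remote flush.
        self.tlbs_dirty = True

    def mmu_notifier_invalidate_page(self, found_spte):
        # Flush if an spte was unmapped here, or if a deferred flush
        # is still pending from an earlier drop_spte.
        need_flush = found_spte or self.tlbs_dirty
        if need_flush:
            self.flush_remote_tlbs()
        return need_flush
```

In this model the page is never freed while a stale translation can survive: the notifier finds no spte, but the pending bit still forces the flush.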


Re: [PATCH v2 6/6] KVM: MMU: cleanup update_pte, pte_prefetch and sync_page functions

2010-11-17 Thread Marcelo Tosatti

On Wed, Nov 17, 2010 at 12:13:59PM +0800, Xiao Guangrong wrote:
> Factor out the common operation to clean them up
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  arch/x86/kvm/mmu.c |3 --
>  arch/x86/kvm/paging_tmpl.h |   70 ++-
>  2 files changed, 36 insertions(+), 37 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 0668f4b..c513afc 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3082,9 +3082,6 @@ static void mmu_pte_write_new_pte(struct kvm_vcpu *vcpu,
>   return;
>  }
>  
> - if (is_rsvd_bits_set(&vcpu->arch.mmu, *(u64 *)new, PT_PAGE_TABLE_LEVEL))
> - return;
> -
>   ++vcpu->kvm->stat.mmu_pte_updated;
>   if (!sp->role.cr4_pae)
>   paging32_update_pte(vcpu, sp, spte, new);
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 60f00db..01a00b0 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -299,25 +299,43 @@ static int FNAME(walk_addr_nested)(struct guest_walker 
> *walker,
>   addr, access);
>  }
>  
> +static bool FNAME(map_invalid_gpte)(struct kvm_vcpu *vcpu,
> + struct kvm_mmu_page *sp, u64 *spte,
> + pt_element_t gpte)
> +{
> + u64 nonpresent = shadow_trap_nonpresent_pte;
> +
> + if (is_rsvd_bits_set(&vcpu->arch.mmu, gpte, PT_PAGE_TABLE_LEVEL))
> + goto no_present;
> +
> + if (!is_present_gpte(gpte)) {
> + if (!sp->unsync)
> + nonpresent = shadow_notrap_nonpresent_pte;
> + goto no_present;
> + }
> +
> + if (!(gpte & PT_ACCESSED_MASK))
> + goto no_present;
> +
> + return false;
> +
> +no_present:
> + if (drop_spte(vcpu->kvm, spte, nonpresent))
> + kvm_flush_remote_tlbs(vcpu->kvm);
> + return true;
> +}

TLB flush not necessary. Looks fine otherwise, please rebase.


Re: [PATCH v2 3/6] KVM: MMU: don't mark spte notrap if reserved bit set

2010-11-17 Thread Marcelo Tosatti

On Wed, Nov 17, 2010 at 12:11:41PM +0800, Xiao Guangrong wrote:
> If a reserved bit is set, we need to inject the #PF with PFEC.RSVD=1,
> but shadow_notrap_nonpresent_pte only injects #PF with PFEC.RSVD=0
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  arch/x86/kvm/paging_tmpl.h |   17 +++--
>  1 files changed, 11 insertions(+), 6 deletions(-)

Applied, thanks.
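
A small Python sketch of the selection rule this series ends up with in sync_page, assuming the condition shown in the 5/6 diff (rsvd_bits_set || is_present_gpte(gpte) || sp->unsync). The function name and the string constants are illustrative only; the real code picks between two u64 spte values.

```python
TRAP = "shadow_trap_nonpresent_pte"
NOTRAP = "shadow_notrap_nonpresent_pte"


def choose_nonpresent(rsvd_bits_set, gpte_present, sp_unsync):
    """Pick the nonpresent spte value for a dropped shadow pte.

    The notrap value lets the fault go straight to the guest with
    PFEC.RSVD=0, so it is only safe when no reserved bit is set,
    the guest pte is not present, and the shadow page is in sync.
    """
    if rsvd_bits_set or gpte_present or sp_unsync:
        return TRAP
    return NOTRAP
```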


Re: [PATCH v2 4/6] KVM: MMU: rename 'reset_host_protection' to 'host_writable'

2010-11-17 Thread Marcelo Tosatti

On Wed, Nov 17, 2010 at 12:12:38PM +0800, Xiao Guangrong wrote:
> From: Lai Jiangshan 
> 
> Rename it to fit the sense better
> 
> Signed-off-by: Lai Jiangshan 
> Signed-off-by: Xiao Guangrong 
> ---
>  arch/x86/kvm/mmu.c |8 
>  arch/x86/kvm/paging_tmpl.h |   10 +-
>  2 files changed, 9 insertions(+), 9 deletions(-)

Agreed. Does not apply anymore, please regenerate.

Re: [PATCH v2 5/6] KVM: MMU: remove 'clear_unsync' parameter

2010-11-17 Thread Marcelo Tosatti

On Wed, Nov 17, 2010 at 12:13:17PM +0800, Xiao Guangrong wrote:
> Remove it since we can judge it by sp->unsync
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  arch/x86/include/asm/kvm_host.h |2 +-
>  arch/x86/kvm/mmu.c  |8 
>  arch/x86/kvm/paging_tmpl.h  |5 ++---
>  3 files changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index b04c0fa..ce8c1e4 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -250,7 +250,7 @@ struct kvm_mmu {
>   void (*prefetch_page)(struct kvm_vcpu *vcpu,
> struct kvm_mmu_page *page);
>   int (*sync_page)(struct kvm_vcpu *vcpu,
> -  struct kvm_mmu_page *sp, bool clear_unsync);
> +  struct kvm_mmu_page *sp);
>   void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva);
>   hpa_t root_hpa;
>   int root_level;
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index c4531a3..0668f4b 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1162,7 +1162,7 @@ static void nonpaging_prefetch_page(struct kvm_vcpu 
> *vcpu,
>  }
>  
>  static int nonpaging_sync_page(struct kvm_vcpu *vcpu,
> -struct kvm_mmu_page *sp, bool clear_unsync)
> +struct kvm_mmu_page *sp)
>  {
>   return 1;
>  }
> @@ -1292,7 +1292,7 @@ static int __kvm_sync_page(struct kvm_vcpu *vcpu, 
> struct kvm_mmu_page *sp,
>   if (clear_unsync)
>   kvm_unlink_unsync_page(vcpu->kvm, sp);
>  
> - if (vcpu->arch.mmu.sync_page(vcpu, sp, clear_unsync)) {
> + if (vcpu->arch.mmu.sync_page(vcpu, sp)) {
>   kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
>   return 1;
>   }
> @@ -1333,12 +1333,12 @@ static void kvm_sync_pages(struct kvm_vcpu *vcpu,  
> gfn_t gfn)
>   continue;
>  
>   WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
> + kvm_unlink_unsync_page(vcpu->kvm, s);
>   if ((s->role.cr4_pae != !!is_pae(vcpu)) ||
> - (vcpu->arch.mmu.sync_page(vcpu, s, true))) {
> + (vcpu->arch.mmu.sync_page(vcpu, s))) {
>   kvm_mmu_prepare_zap_page(vcpu->kvm, s, &invalid_list);
>   continue;
>   }
> - kvm_unlink_unsync_page(vcpu->kvm, s);
>   flush = true;
>   }
>  
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 57619ed..60f00db 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -740,8 +740,7 @@ static void FNAME(prefetch_page)(struct kvm_vcpu *vcpu,
>   * - The spte has a reference to the struct page, so the pfn for a given gfn
>   *   can't change unless all sptes pointing to it are nuked first.
>   */
> -static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> - bool clear_unsync)
> +static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>  {
>   int i, offset, nr_present;
>   bool host_writable;
> @@ -781,7 +780,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct 
> kvm_mmu_page *sp,
>   u64 nonpresent;
>  
>   if (rsvd_bits_set || is_present_gpte(gpte) ||
> -   !clear_unsync)
> +   sp->unsync)
>   nonpresent = shadow_trap_nonpresent_pte;
>   else
>   nonpresent = shadow_notrap_nonpresent_pte;

It's better to keep this explicit as a parameter.

[PATCH] KVM: Record instruction set in kvm_exit tracepoint

2010-11-17 Thread Avi Kivity

exit_reason's meaning depends on the instruction set; record it so a trace
taken on one machine can be interpreted on another.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/svm.c   |2 +-
 arch/x86/kvm/trace.h |   10 --
 arch/x86/kvm/vmx.c   |2 +-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1a5757a..8fe446b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2985,7 +2985,7 @@ static int handle_exit(struct kvm_vcpu *vcpu)
struct kvm_run *kvm_run = vcpu->run;
u32 exit_code = svm->vmcb->control.exit_code;
 
-   trace_kvm_exit(exit_code, vcpu);
+   trace_kvm_exit(exit_code, vcpu, KVM_ISA_SVM);
 
if (!(svm->vmcb->control.intercept_cr_write & INTERCEPT_CR0_MASK))
vcpu->arch.cr0 = svm->vmcb->save.cr0;
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index a6544b8..3a0fc74 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -178,21 +178,27 @@ TRACE_EVENT(kvm_apic,
 #define trace_kvm_apic_read(reg, val)  trace_kvm_apic(0, reg, val)
 #define trace_kvm_apic_write(reg, val) trace_kvm_apic(1, reg, val)
 
+#define KVM_ISA_VMX   1
+#define KVM_ISA_SVM   2
+
 /*
  * Tracepoint for kvm guest exit:
  */
 TRACE_EVENT(kvm_exit,
-   TP_PROTO(unsigned int exit_reason, struct kvm_vcpu *vcpu),
-   TP_ARGS(exit_reason, vcpu),
+   TP_PROTO(unsigned int exit_reason, struct kvm_vcpu *vcpu,
+u32 isa),
+   TP_ARGS(exit_reason, vcpu, isa),
 
TP_STRUCT__entry(
__field(unsigned int,   exit_reason )
__field(unsigned long,  guest_rip   )
+   __field(u32,isa )
),
 
TP_fast_assign(
__entry->exit_reason= exit_reason;
__entry->guest_rip  = kvm_rip_read(vcpu);
+   __entry->isa= isa;
),
 
TP_printk("reason %s rip 0x%lx",
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9367abc..6cbc6b2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3698,7 +3698,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
u32 exit_reason = vmx->exit_reason;
u32 vectoring_info = vmx->idt_vectoring_info;
 
-   trace_kvm_exit(exit_reason, vcpu);
+   trace_kvm_exit(exit_reason, vcpu, KVM_ISA_VMX);
 
/* If guest state is invalid, start emulating */
if (vmx->emulation_required && emulate_invalid_guest_state)
-- 
1.7.3.1
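
A sketch of how a trace consumer could use the new isa field, written in Python as a toy decoder. KVM_ISA_VMX and KVM_ISA_SVM are the values defined in the patch; decode_exit() and the two-entry tables are hypothetical and cover only CPUID and HLT for illustration.

```python
KVM_ISA_VMX = 1
KVM_ISA_SVM = 2

# Deliberately tiny, illustration-only tables. The same event has a
# different exit number on each instruction set.
EXIT_NAMES = {
    KVM_ISA_VMX: {10: "CPUID", 12: "HLT"},
    KVM_ISA_SVM: {0x072: "CPUID", 0x078: "HLT"},
}


def decode_exit(isa, exit_reason):
    """Translate (isa, exit_reason) from a kvm_exit record to a name."""
    names = EXIT_NAMES.get(isa, {})
    return names.get(exit_reason, "UNKNOWN(%#x)" % exit_reason)
```

Without the isa field, a value such as 12 recorded on an AMD host would be misread as HLT when decoded with the VMX table on another machine.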


[PATCHv6 10/16] Add get_dev_path callback for usb bus.

2010-11-17 Thread Gleb Natapov


Signed-off-by: Gleb Natapov 
---
 hw/usb-bus.c |   42 ++
 1 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/hw/usb-bus.c b/hw/usb-bus.c
index 256b881..8b4583c 100644
--- a/hw/usb-bus.c
+++ b/hw/usb-bus.c
@@ -5,11 +5,13 @@
 #include "monitor.h"
 
 static void usb_bus_dev_print(Monitor *mon, DeviceState *qdev, int indent);
+static char *usbbus_get_fw_dev_path(DeviceState *dev);
 
 static struct BusInfo usb_bus_info = {
 .name  = "USB",
 .size  = sizeof(USBBus),
 .print_dev = usb_bus_dev_print,
+.get_fw_dev_path = usbbus_get_fw_dev_path,
 };
 static int next_usb_bus = 0;
 static QTAILQ_HEAD(, USBBus) busses = QTAILQ_HEAD_INITIALIZER(busses);
@@ -307,3 +309,43 @@ USBDevice *usbdevice_create(const char *cmdline)
 }
 return usb->usbdevice_init(params);
 }
+
+static int usbbus_get_fw_dev_path_helper(USBDevice *d, USBBus *bus, char *p,
+ int len)
+{
+int l = 0;
+USBPort *port;
+
+QTAILQ_FOREACH(port, &bus->used, next) {
+if (port->dev == d) {
+if (port->pdev) {
+l = usbbus_get_fw_dev_path_helper(port->pdev, bus, p, len);
+}
+l += snprintf(p + l, len - l, "%s@%x/", qdev_fw_name(&d->qdev),
+  port->index);
+break;
+}
+}
+
+return l;
+}
+
+static char *usbbus_get_fw_dev_path(DeviceState *dev)
+{
+USBDevice *d = (USBDevice*)dev;
+USBBus *bus = usb_bus_from_device(d);
+char path[100];
+int l;
+
+assert(d->attached != 0);
+
+l = usbbus_get_fw_dev_path_helper(d, bus, path, sizeof(path));
+
+if (l == 0) {
+abort();
+}
+
+path[l-1] = '\0';
+
+return strdup(path);
+}
-- 
1.7.2.3
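
A toy Python model of the path walk done by usbbus_get_fw_dev_path_helper(): find the port a device is plugged into, recurse through the hub that owns that port, and append name@port in hex. The Port class, the fw_names mapping and the function name are hypothetical stand-ins for USBPort, qdev_fw_name() and the C helper; the C version also writes into a fixed-size buffer, which this sketch ignores.

```python
class Port:
    """Toy stand-in for USBPort."""

    def __init__(self, index, dev=None, pdev=None):
        self.index = index
        self.dev = dev     # device plugged into this port
        self.pdev = pdev   # hub that owns this port; None on a root hub


def usb_fw_dev_path(dev, used_ports, fw_names):
    """Build 'name@port/name@port' from the root hub down to dev."""
    for port in used_ports:
        if port.dev is dev:
            prefix = ""
            if port.pdev is not None:
                # The port belongs to a hub: emit the hub's path first.
                prefix = usb_fw_dev_path(port.pdev, used_ports, fw_names) + "/"
            return prefix + "%s@%x" % (fw_names[dev], port.index)
    raise ValueError("device is not attached to any used port")
```

For a network device on port 2 of a hub that sits on root port 1, this yields hub@1/network@2.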


[PATCHv6 01/16] Introduce fw_name field to DeviceInfo structure.

2010-11-17 Thread Gleb Natapov

Add "fw_name" to DeviceInfo for use in device path building. In
contrast to "name", "fw_name" should describe the functionality a
device provides rather than a particular device model.

Signed-off-by: Gleb Natapov 
---
 hw/fdc.c|1 +
 hw/ide/isa.c|1 +
 hw/ide/qdev.c   |1 +
 hw/isa-bus.c|1 +
 hw/lance.c  |1 +
 hw/piix_pci.c   |1 +
 hw/qdev.h   |6 ++
 hw/scsi-disk.c  |1 +
 hw/usb-hub.c|1 +
 hw/usb-net.c|1 +
 hw/virtio-pci.c |1 +
 11 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/hw/fdc.c b/hw/fdc.c
index c159dcb..a467c4b 100644
--- a/hw/fdc.c
+++ b/hw/fdc.c
@@ -2040,6 +2040,7 @@ static const VMStateDescription vmstate_isa_fdc ={
 static ISADeviceInfo isa_fdc_info = {
 .init = isabus_fdc_init1,
 .qdev.name  = "isa-fdc",
+.qdev.fw_name  = "fdc",
 .qdev.size  = sizeof(FDCtrlISABus),
 .qdev.no_user = 1,
 .qdev.vmsd  = &vmstate_isa_fdc,
diff --git a/hw/ide/isa.c b/hw/ide/isa.c
index 6b57e0d..9856435 100644
--- a/hw/ide/isa.c
+++ b/hw/ide/isa.c
@@ -98,6 +98,7 @@ ISADevice *isa_ide_init(int iobase, int iobase2, int isairq,
 
 static ISADeviceInfo isa_ide_info = {
 .qdev.name  = "isa-ide",
+.qdev.fw_name  = "ide",
 .qdev.size  = sizeof(ISAIDEState),
 .init   = isa_ide_initfn,
 .qdev.reset = isa_ide_reset,
diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c
index 0808760..6d27b60 100644
--- a/hw/ide/qdev.c
+++ b/hw/ide/qdev.c
@@ -134,6 +134,7 @@ static int ide_drive_initfn(IDEDevice *dev)
 
 static IDEDeviceInfo ide_drive_info = {
 .qdev.name  = "ide-drive",
+.qdev.fw_name  = "drive",
 .qdev.size  = sizeof(IDEDrive),
 .init   = ide_drive_initfn,
 .qdev.props = (Property[]) {
diff --git a/hw/isa-bus.c b/hw/isa-bus.c
index 4e306de..26036e0 100644
--- a/hw/isa-bus.c
+++ b/hw/isa-bus.c
@@ -153,6 +153,7 @@ static int isabus_bridge_init(SysBusDevice *dev)
 static SysBusDeviceInfo isabus_bridge_info = {
 .init = isabus_bridge_init,
 .qdev.name  = "isabus-bridge",
+.qdev.fw_name  = "isa",
 .qdev.size  = sizeof(SysBusDevice),
 .qdev.no_user = 1,
 };
diff --git a/hw/lance.c b/hw/lance.c
index dc12144..1a3bb1a 100644
--- a/hw/lance.c
+++ b/hw/lance.c
@@ -141,6 +141,7 @@ static void lance_reset(DeviceState *dev)
 static SysBusDeviceInfo lance_info = {
 .init   = lance_init,
 .qdev.name  = "lance",
+.qdev.fw_name  = "ethernet",
 .qdev.size  = sizeof(SysBusPCNetState),
 .qdev.reset = lance_reset,
 .qdev.vmsd  = &vmstate_lance,
diff --git a/hw/piix_pci.c b/hw/piix_pci.c
index b5589b9..38f9d9e 100644
--- a/hw/piix_pci.c
+++ b/hw/piix_pci.c
@@ -365,6 +365,7 @@ static PCIDeviceInfo i440fx_info[] = {
 static SysBusDeviceInfo i440fx_pcihost_info = {
 .init = i440fx_pcihost_initfn,
 .qdev.name= "i440FX-pcihost",
+.qdev.fw_name = "pci",
 .qdev.size= sizeof(I440FXState),
 .qdev.no_user = 1,
 };
diff --git a/hw/qdev.h b/hw/qdev.h
index 579328a..9f90efe 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -139,6 +139,7 @@ typedef void (*qdev_resetfn)(DeviceState *dev);
 
 struct DeviceInfo {
 const char *name;
+const char *fw_name;
 const char *alias;
 const char *desc;
 size_t size;
@@ -288,6 +289,11 @@ void qdev_prop_set_defaults(DeviceState *dev, Property 
*props);
 void qdev_prop_register_global_list(GlobalProperty *props);
 void qdev_prop_set_globals(DeviceState *dev);
 
+static inline const char *qdev_fw_name(DeviceState *dev)
+{
+return dev->info->fw_name ? : dev->info->alias ? : dev->info->name;
+}
+
 /* This is a nasty hack to allow passing a NULL bus to qdev_create.  */
 extern struct BusInfo system_bus_info;
 
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index dc71957..2b22777 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -1235,6 +1235,7 @@ static int scsi_disk_initfn(SCSIDevice *dev)
 
 static SCSIDeviceInfo scsi_disk_info = {
 .qdev.name= "scsi-disk",
+.qdev.fw_name = "disk",
 .qdev.desc= "virtual scsi disk or cdrom",
 .qdev.size= sizeof(SCSIDiskState),
 .qdev.reset   = scsi_disk_reset,
diff --git a/hw/usb-hub.c b/hw/usb-hub.c
index 2a1edfc..8e3a96b 100644
--- a/hw/usb-hub.c
+++ b/hw/usb-hub.c
@@ -545,6 +545,7 @@ static int usb_hub_initfn(USBDevice *dev)
 static struct USBDeviceInfo hub_info = {
 .product_desc   = "QEMU USB Hub",
 .qdev.name  = "usb-hub",
+.qdev.fw_name= "hub",
 .qdev.size  = sizeof(USBHubState),
 .init   = usb_hub_initfn,
 .handle_packet  = usb_hub_handle_packet,
diff --git a/hw/usb-net.c b/hw/usb-net.c
index 58c672f..f6bed21 100644
--- a/hw/usb-net.c
+++ b/hw/usb-net.c
@@ -1496,6 +1496,7 @@ static USBDevice *usb_net_init(const char *cmdline)
 static struct USBDeviceInfo net_info = {
 .product_desc   = "QEMU USB Network Interface",
 .qdev.name  = "usb-net",
+.qdev.fw_name= "network",
 .qdev.size  = sizeof(USBNetState),
 .init   = usb_ne
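
The fallback order of qdev_fw_name() from the hw/qdev.h hunk above can be sketched in Python as follows. The dict stands in for struct DeviceInfo and is an assumption of this sketch; note also that Python's "or" treats an empty string as missing, whereas the C "?:" only tests for NULL.

```python
def qdev_fw_name(info):
    """Return fw_name if set, else alias, else name."""
    return info.get("fw_name") or info.get("alias") or info["name"]
```

So "isa-fdc" reports "fdc" in device paths, while a device that sets neither fw_name nor alias keeps reporting its model name.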

[PATCHv6 07/16] Add get_dev_path callback for system bus.

2010-11-17 Thread Gleb Natapov

Prints out the mmio or pio address used to access the child device.

Signed-off-by: Gleb Natapov 
---
 hw/pci_host.c |2 ++
 hw/sysbus.c   |   30 ++
 hw/sysbus.h   |4 
 3 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/hw/pci_host.c b/hw/pci_host.c
index bc5b771..28d45bf 100644
--- a/hw/pci_host.c
+++ b/hw/pci_host.c
@@ -197,6 +197,7 @@ void pci_host_conf_register_ioport(pio_addr_t ioport, 
PCIHostState *s)
 {
 pci_host_init(s);
 register_ioport_simple(&s->conf_noswap_handler, ioport, 4, 4);
+sysbus_init_ioports(&s->busdev, ioport, 4);
 }
 
 int pci_host_data_register_mmio(PCIHostState *s, int swap)
@@ -215,4 +216,5 @@ void pci_host_data_register_ioport(pio_addr_t ioport, 
PCIHostState *s)
 register_ioport_simple(&s->data_noswap_handler, ioport, 4, 1);
 register_ioport_simple(&s->data_noswap_handler, ioport, 4, 2);
 register_ioport_simple(&s->data_noswap_handler, ioport, 4, 4);
+sysbus_init_ioports(&s->busdev, ioport, 4);
 }
diff --git a/hw/sysbus.c b/hw/sysbus.c
index d817721..1583bd8 100644
--- a/hw/sysbus.c
+++ b/hw/sysbus.c
@@ -22,11 +22,13 @@
 #include "monitor.h"
 
 static void sysbus_dev_print(Monitor *mon, DeviceState *dev, int indent);
+static char *sysbus_get_fw_dev_path(DeviceState *dev);
 
 struct BusInfo system_bus_info = {
 .name   = "System",
 .size   = sizeof(BusState),
 .print_dev  = sysbus_dev_print,
+.get_fw_dev_path = sysbus_get_fw_dev_path,
 };
 
 void sysbus_connect_irq(SysBusDevice *dev, int n, qemu_irq irq)
@@ -106,6 +108,16 @@ void sysbus_init_mmio_cb(SysBusDevice *dev, 
target_phys_addr_t size,
 dev->mmio[n].cb = cb;
 }
 
+void sysbus_init_ioports(SysBusDevice *dev, pio_addr_t ioport, pio_addr_t size)
+{
+pio_addr_t i;
+
+for (i = 0; i < size; i++) {
+assert(dev->num_pio < QDEV_MAX_PIO);
+dev->pio[dev->num_pio++] = ioport++;
+}
+}
+
 static int sysbus_device_init(DeviceState *dev, DeviceInfo *base)
 {
 SysBusDeviceInfo *info = container_of(base, SysBusDeviceInfo, qdev);
@@ -171,3 +183,21 @@ static void sysbus_dev_print(Monitor *mon, DeviceState 
*dev, int indent)
indent, "", s->mmio[i].addr, s->mmio[i].size);
 }
 }
+
+static char *sysbus_get_fw_dev_path(DeviceState *dev)
+{
+SysBusDevice *s = sysbus_from_qdev(dev);
+char path[40];
+int off;
+
+off = snprintf(path, sizeof(path), "%s", qdev_fw_name(dev));
+
+if (s->num_mmio) {
+snprintf(path + off, sizeof(path) - off, "@"TARGET_FMT_plx,
+ s->mmio[0].addr);
+} else if (s->num_pio) {
+snprintf(path + off, sizeof(path) - off, "@i%04x", s->pio[0]);
+}
+
+return strdup(path);
+}
diff --git a/hw/sysbus.h b/hw/sysbus.h
index 5980901..e9eb618 100644
--- a/hw/sysbus.h
+++ b/hw/sysbus.h
@@ -6,6 +6,7 @@
 #include "qdev.h"
 
 #define QDEV_MAX_MMIO 32
+#define QDEV_MAX_PIO 32
 #define QDEV_MAX_IRQ 256
 
 typedef struct SysBusDevice SysBusDevice;
@@ -23,6 +24,8 @@ struct SysBusDevice {
 mmio_mapfunc cb;
 ram_addr_t iofunc;
 } mmio[QDEV_MAX_MMIO];
+int num_pio;
+pio_addr_t pio[QDEV_MAX_PIO];
 };
 
 typedef int (*sysbus_initfn)(SysBusDevice *dev);
@@ -45,6 +48,7 @@ void sysbus_init_mmio_cb(SysBusDevice *dev, 
target_phys_addr_t size,
 mmio_mapfunc cb);
 void sysbus_init_irq(SysBusDevice *dev, qemu_irq *p);
 void sysbus_pass_irq(SysBusDevice *dev, SysBusDevice *target);
+void sysbus_init_ioports(SysBusDevice *dev, pio_addr_t ioport, pio_addr_t 
size);
 
 
 void sysbus_connect_irq(SysBusDevice *dev, int n, qemu_irq irq);
-- 
1.7.2.3
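
A Python sketch of the string sysbus_get_fw_dev_path() builds: the first mmio address if the device has one, otherwise the first pio port prefixed with "i". The function signature is hypothetical, and the width of the mmio address (TARGET_FMT_plx in the patch) depends on the target, so it is a parameter here with an assumed default of 16 hex digits.

```python
def sysbus_fw_dev_path(fw_name, mmio_addrs=(), pio_ports=(),
                       phys_addr_hex_digits=16):
    """Return fw_name@<mmio> or fw_name@i<pio>, or the bare name."""
    path = fw_name
    if mmio_addrs:
        # Assumption: zero-padded hex, width chosen by the caller.
        path += "@%0*x" % (phys_addr_hex_digits, mmio_addrs[0])
    elif pio_ports:
        path += "@i%04x" % pio_ports[0]
    return path
```

With the i440FX host bridge registering its config port through sysbus_init_ioports(), the result would look like pci@i0cf8.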


[PATCHv6 15/16] Add notifier that will be called when machine is fully created.

2010-11-17 Thread Gleb Natapov

Actions that depend on a fully initialized device model should register
with this notifier chain.

Signed-off-by: Gleb Natapov 
---
 sysemu.h |2 ++
 vl.c |   15 +++
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/sysemu.h b/sysemu.h
index 48f8eee..c42f33a 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -60,6 +60,8 @@ void qemu_system_reset(void);
 void qemu_add_exit_notifier(Notifier *notify);
 void qemu_remove_exit_notifier(Notifier *notify);
 
+void qemu_add_machine_init_done_notifier(Notifier *notify);
+
 void do_savevm(Monitor *mon, const QDict *qdict);
 int load_vmstate(const char *name);
 void do_delvm(Monitor *mon, const QDict *qdict);
diff --git a/vl.c b/vl.c
index f02aaec..d6564d4 100644
--- a/vl.c
+++ b/vl.c
@@ -253,6 +253,9 @@ static void *boot_set_opaque;
 static NotifierList exit_notifiers =
 NOTIFIER_LIST_INITIALIZER(exit_notifiers);
 
+static NotifierList machine_init_done_notifiers =
+NOTIFIER_LIST_INITIALIZER(machine_init_done_notifiers);
+
 int kvm_allowed = 0;
 uint32_t xen_domid;
 enum xen_mode xen_mode = XEN_EMULATE;
@@ -1779,6 +1782,16 @@ static void qemu_run_exit_notifiers(void)
 notifier_list_notify(&exit_notifiers);
 }
 
+void qemu_add_machine_init_done_notifier(Notifier *notify)
+{
+notifier_list_add(&machine_init_done_notifiers, notify);
+}
+
+static void qemu_run_machine_init_done_notifiers(void)
+{
+notifier_list_notify(&machine_init_done_notifiers);
+}
+
 static const QEMUOption *lookup_opt(int argc, char **argv,
 const char **poptarg, int *poptind)
 {
@@ -3024,6 +3037,8 @@ int main(int argc, char **argv, char **envp)
 exit(1);
 }
 
+qemu_run_machine_init_done_notifiers();
+
 qemu_system_reset();
 if (loadvm) {
 if (load_vmstate(loadvm) < 0) {
-- 
1.7.2.3
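
A toy Python model of the registration pattern this patch introduces. NotifierList here is a simplified stand-in for QEMU's notifier list, using plain callables instead of Notifier structs; the call order within the list is an assumption of the sketch and should not be relied on.

```python
class NotifierList:
    """Simplified stand-in for QEMU's NotifierList."""

    def __init__(self):
        self._notifiers = []

    def add(self, callback):
        self._notifiers.append(callback)

    def notify(self):
        # Iterate over a copy so a callback may register more notifiers.
        for callback in list(self._notifiers):
            callback()


machine_init_done_notifiers = NotifierList()


def qemu_add_machine_init_done_notifier(callback):
    machine_init_done_notifiers.add(callback)


def qemu_run_machine_init_done_notifiers():
    # Called once, after machine init and before the first system reset.
    machine_init_done_notifiers.notify()
```

Code that needs the complete device tree registers early and does its work only when the chain runs.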


[PATCHv6 09/16] Record which USBDevice USBPort belongs to.

2010-11-17 Thread Gleb Natapov

Ports on a root hub will have NULL here. This is needed to reconstruct
the path from a device to its root hub when building the device path.

Signed-off-by: Gleb Natapov 
---
 hw/usb-bus.c  |3 ++-
 hw/usb-hub.c  |2 +-
 hw/usb-musb.c |2 +-
 hw/usb-ohci.c |2 +-
 hw/usb-uhci.c |2 +-
 hw/usb.h  |3 ++-
 6 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/hw/usb-bus.c b/hw/usb-bus.c
index b692503..256b881 100644
--- a/hw/usb-bus.c
+++ b/hw/usb-bus.c
@@ -110,11 +110,12 @@ USBDevice *usb_create_simple(USBBus *bus, const char 
*name)
 }
 
 void usb_register_port(USBBus *bus, USBPort *port, void *opaque, int index,
-   usb_attachfn attach)
+   USBDevice *pdev, usb_attachfn attach)
 {
 port->opaque = opaque;
 port->index = index;
 port->attach = attach;
+port->pdev = pdev;
 QTAILQ_INSERT_TAIL(&bus->free, port, next);
 bus->nfree++;
 }
diff --git a/hw/usb-hub.c b/hw/usb-hub.c
index 8e3a96b..8a3f829 100644
--- a/hw/usb-hub.c
+++ b/hw/usb-hub.c
@@ -535,7 +535,7 @@ static int usb_hub_initfn(USBDevice *dev)
 for (i = 0; i < s->nb_ports; i++) {
 port = &s->ports[i];
 usb_register_port(usb_bus_from_device(dev),
-  &port->port, s, i, usb_hub_attach);
+  &port->port, s, i, &s->dev, usb_hub_attach);
 port->wPortStatus = PORT_STAT_POWER;
 port->wPortChange = 0;
 }
diff --git a/hw/usb-musb.c b/hw/usb-musb.c
index 7f15842..9efe7a6 100644
--- a/hw/usb-musb.c
+++ b/hw/usb-musb.c
@@ -343,7 +343,7 @@ struct MUSBState {
 }
 
 usb_bus_new(&s->bus, NULL /* FIXME */);
-usb_register_port(&s->bus, &s->port, s, 0, musb_attach);
+usb_register_port(&s->bus, &s->port, s, 0, NULL, musb_attach);
 
 return s;
 }
diff --git a/hw/usb-ohci.c b/hw/usb-ohci.c
index c60fd8d..59604cf 100644
--- a/hw/usb-ohci.c
+++ b/hw/usb-ohci.c
@@ -1705,7 +1705,7 @@ static void usb_ohci_init(OHCIState *ohci, DeviceState 
*dev,
 usb_bus_new(&ohci->bus, dev);
 ohci->num_ports = num_ports;
 for (i = 0; i < num_ports; i++) {
-usb_register_port(&ohci->bus, &ohci->rhport[i].port, ohci, i, 
ohci_attach);
+usb_register_port(&ohci->bus, &ohci->rhport[i].port, ohci, i, NULL, 
ohci_attach);
 }
 
 ohci->async_td = 0;
diff --git a/hw/usb-uhci.c b/hw/usb-uhci.c
index 1d83400..b9b822f 100644
--- a/hw/usb-uhci.c
+++ b/hw/usb-uhci.c
@@ -1115,7 +1115,7 @@ static int usb_uhci_common_initfn(UHCIState *s)
 
 usb_bus_new(&s->bus, &s->dev.qdev);
 for(i = 0; i < NB_PORTS; i++) {
-usb_register_port(&s->bus, &s->ports[i].port, s, i, uhci_attach);
+usb_register_port(&s->bus, &s->ports[i].port, s, i, NULL, uhci_attach);
 }
 s->frame_timer = qemu_new_timer(vm_clock, uhci_frame_timer, s);
 s->expire_time = qemu_get_clock(vm_clock) +
diff --git a/hw/usb.h b/hw/usb.h
index 00d2802..0b32d77 100644
--- a/hw/usb.h
+++ b/hw/usb.h
@@ -203,6 +203,7 @@ struct USBPort {
 USBDevice *dev;
 usb_attachfn attach;
 void *opaque;
+USBDevice *pdev;
 int index; /* internal port index, may be used with the opaque */
 QTAILQ_ENTRY(USBPort) next;
 };
@@ -312,7 +313,7 @@ USBDevice *usb_create(USBBus *bus, const char *name);
 USBDevice *usb_create_simple(USBBus *bus, const char *name);
 USBDevice *usbdevice_create(const char *cmdline);
 void usb_register_port(USBBus *bus, USBPort *port, void *opaque, int index,
-   usb_attachfn attach);
+   USBDevice *pdev, usb_attachfn attach);
 void usb_unregister_port(USBBus *bus, USBPort *port);
 int usb_device_attach(USBDevice *dev);
 int usb_device_detach(USBDevice *dev);
-- 
1.7.2.3


[PATCHv6 08/16] Add get_fw_dev_path callback for pci bus.

2010-11-17 Thread Gleb Natapov


Signed-off-by: Gleb Natapov 
---
 hw/pci.c |  108 -
 1 files changed, 85 insertions(+), 23 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 438c0d1..8514e15 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -43,12 +43,14 @@
 
 static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent);
 static char *pcibus_get_dev_path(DeviceState *dev);
+static char *pcibus_get_fw_dev_path(DeviceState *dev);
 
 struct BusInfo pci_bus_info = {
 .name   = "PCI",
 .size   = sizeof(PCIBus),
 .print_dev  = pcibus_dev_print,
 .get_dev_path = pcibus_get_dev_path,
+.get_fw_dev_path = pcibus_get_fw_dev_path,
 .props  = (Property[]) {
 DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
 DEFINE_PROP_STRING("romfile", PCIDevice, romfile),
@@ -1061,45 +1063,63 @@ void pci_msi_notify(PCIDevice *dev, unsigned int vector)
 typedef struct {
 uint16_t class;
 const char *desc;
+const char *fw_name;
+uint16_t fw_ign_bits;
 } pci_class_desc;
 
 static const pci_class_desc pci_class_descriptions[] =
 {
-{ 0x0100, "SCSI controller"},
-{ 0x0101, "IDE controller"},
-{ 0x0102, "Floppy controller"},
-{ 0x0103, "IPI controller"},
-{ 0x0104, "RAID controller"},
+{ 0x0001, "VGA controller", "display"},
+{ 0x0100, "SCSI controller", "scsi"},
+{ 0x0101, "IDE controller", "ide"},
+{ 0x0102, "Floppy controller", "fdc"},
+{ 0x0103, "IPI controller", "ipi"},
+{ 0x0104, "RAID controller", "raid"},
 { 0x0106, "SATA controller"},
 { 0x0107, "SAS controller"},
 { 0x0180, "Storage controller"},
-{ 0x0200, "Ethernet controller"},
-{ 0x0201, "Token Ring controller"},
-{ 0x0202, "FDDI controller"},
-{ 0x0203, "ATM controller"},
+{ 0x0200, "Ethernet controller", "ethernet"},
+{ 0x0201, "Token Ring controller", "token-ring"},
+{ 0x0202, "FDDI controller", "fddi"},
+{ 0x0203, "ATM controller", "atm"},
 { 0x0280, "Network controller"},
-{ 0x0300, "VGA controller"},
+{ 0x0300, "VGA controller", "display", 0x00ff},
 { 0x0301, "XGA controller"},
 { 0x0302, "3D controller"},
 { 0x0380, "Display controller"},
-{ 0x0400, "Video controller"},
-{ 0x0401, "Audio controller"},
+{ 0x0400, "Video controller", "video"},
+{ 0x0401, "Audio controller", "sound"},
 { 0x0402, "Phone"},
 { 0x0480, "Multimedia controller"},
-{ 0x0500, "RAM controller"},
-{ 0x0501, "Flash controller"},
+{ 0x0500, "RAM controller", "memory"},
+{ 0x0501, "Flash controller", "flash"},
 { 0x0580, "Memory controller"},
-{ 0x0600, "Host bridge"},
-{ 0x0601, "ISA bridge"},
-{ 0x0602, "EISA bridge"},
-{ 0x0603, "MC bridge"},
-{ 0x0604, "PCI bridge"},
-{ 0x0605, "PCMCIA bridge"},
-{ 0x0606, "NUBUS bridge"},
-{ 0x0607, "CARDBUS bridge"},
+{ 0x0600, "Host bridge", "host"},
+{ 0x0601, "ISA bridge", "isa"},
+{ 0x0602, "EISA bridge", "eisa"},
+{ 0x0603, "MC bridge", "mca"},
+{ 0x0604, "PCI bridge", "pci"},
+{ 0x0605, "PCMCIA bridge", "pcmcia"},
+{ 0x0606, "NUBUS bridge", "nubus"},
+{ 0x0607, "CARDBUS bridge", "cardbus"},
 { 0x0608, "RACEWAY bridge"},
 { 0x0680, "Bridge"},
-{ 0x0c03, "USB controller"},
+{ 0x0700, "Serial port", "serial"},
+{ 0x0701, "Parallel port", "parallel"},
+{ 0x0800, "Interrupt controller", "interrupt-controller"},
+{ 0x0801, "DMA controller", "dma-controller"},
+{ 0x0802, "Timer", "timer"},
+{ 0x0803, "RTC", "rtc"},
+{ 0x0900, "Keyboard", "keyboard"},
+{ 0x0901, "Pen", "pen"},
+{ 0x0902, "Mouse", "mouse"},
+{ 0x0A00, "Dock station", "dock", 0x00ff},
+{ 0x0B00, "i386 cpu", "cpu", 0x00ff},
+{ 0x0c00, "Firewire controller", "firewire"},
+{ 0x0c01, "Access bus controller", "access-bus"},
+{ 0x0c02, "SSA controller", "ssa"},
+{ 0x0c03, "USB controller", "usb"},
+{ 0x0c04, "Fibre channel controller", "fibre-channel"},
 { 0, NULL}
 };
 
@@ -1828,6 +1848,48 @@ static void pcibus_dev_print(Monitor *mon, DeviceState 
*dev, int indent)
 }
 }
 
+static char *pci_dev_fw_name(DeviceState *dev, char *buf, int len)
+{
+PCIDevice *d = (PCIDevice *)dev;
+const char *name = NULL;
+const pci_class_desc *desc =  pci_class_descriptions;
+int class = pci_get_word(d->config + PCI_CLASS_DEVICE);
+
+while (desc->desc &&
+  (class & ~desc->fw_ign_bits) !=
+  (desc->class & ~desc->fw_ign_bits)) {
+desc++;
+}
+
+if (desc->desc) {
+name = desc->fw_name;
+}
+
+if (name) {
+pstrcpy(buf, len, name);
+} else {
+snprintf(buf, len, "pci%04x,%04x",
+ pci_get_word(d->config + PCI_VENDOR_ID),
+ pci_get_word(d->config + PCI_DEVICE_ID));
+}
+
+return buf;
+}
+
+static char *pcibus_get_fw_dev_path(DeviceState *dev)
+{
+PCIDevice *d = (PCIDevice *)dev;
+char path[50], na
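
The lookup in pci_dev_fw_name() above compares class codes only after masking out the bits that each table entry asks to ignore. A minimal standalone sketch of that matching rule follows. The table is cut down, and the names class_desc, class_table and lookup_fw_name are made up for illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down version of the class table from the patch. */
typedef struct {
    uint16_t class_code;   /* base class in the high byte, subclass in the low byte */
    const char *desc;      /* NULL terminates the table */
    const char *fw_name;   /* NULL when no firmware name is defined */
    uint16_t fw_ign_bits;  /* bits to ignore when matching */
} class_desc;

static const class_desc class_table[] = {
    { 0x0101, "IDE controller",  "ide",     0 },
    { 0x0106, "SATA controller", NULL,      0 },
    { 0x0300, "VGA controller",  "display", 0x00ff },
    { 0x0c03, "USB controller",  "usb",     0 },
    { 0, NULL, NULL, 0 }
};

/* Return the firmware name for a class code, or NULL if none is known. */
static const char *lookup_fw_name(uint16_t class_code)
{
    const class_desc *d;

    for (d = class_table; d->desc; d++) {
        uint16_t mask = (uint16_t)~d->fw_ign_bits;

        if ((class_code & mask) == (d->class_code & mask)) {
            return d->fw_name;
        }
    }
    return NULL;
}
```

With the 0x00ff mask on the 0x0300 entry, any display subclass resolves to "display". When the lookup yields no name, the patch falls back to a "pci<vendor>,<device>" string.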

[PATCHv6 06/16] Add get_fw_dev_path callback to IDE bus.

2010-11-17 Thread Gleb Natapov


Signed-off-by: Gleb Natapov 
---
 hw/ide/qdev.c |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c
index 88ff657..01a181b 100644
--- a/hw/ide/qdev.c
+++ b/hw/ide/qdev.c
@@ -24,9 +24,12 @@
 
 /* - */
 
+static char *idebus_get_fw_dev_path(DeviceState *dev);
+
 static struct BusInfo ide_bus_info = {
 .name  = "IDE",
 .size  = sizeof(IDEBus),
+.get_fw_dev_path = idebus_get_fw_dev_path,
 };
 
 void ide_bus_new(IDEBus *idebus, DeviceState *dev, int bus_id)
@@ -35,6 +38,16 @@ void ide_bus_new(IDEBus *idebus, DeviceState *dev, int 
bus_id)
 idebus->bus_id = bus_id;
 }
 
+static char *idebus_get_fw_dev_path(DeviceState *dev)
+{
+char path[30];
+
+snprintf(path, sizeof(path), "%s@%d", qdev_fw_name(dev),
+ ((IDEBus*)dev->parent_bus)->bus_id);
+
+return strdup(path);
+}
+
 static int ide_qdev_init(DeviceState *qdev, DeviceInfo *base)
 {
 IDEDevice *dev = DO_UPCAST(IDEDevice, qdev, qdev);
-- 
1.7.2.3


[PATCHv6 13/16] Change fw_cfg_add_file() to get full file path as a parameter.

2010-11-17 Thread Gleb Natapov

Change fw_cfg_add_file() to take the full file path as a parameter instead
of building one internally, for two reasons. First, the caller may need
to know how the file is named. Second, this moves the file naming policy
out of fw_cfg; a platform may want to use more than two levels of
directories, for instance.

Signed-off-by: Gleb Natapov 
---
 hw/fw_cfg.c |   16 
 hw/fw_cfg.h |4 ++--
 hw/loader.c |   16 ++--
 3 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c
index 72866ae..7b9434f 100644
--- a/hw/fw_cfg.c
+++ b/hw/fw_cfg.c
@@ -277,10 +277,9 @@ int fw_cfg_add_callback(FWCfgState *s, uint16_t key, 
FWCfgCallback callback,
 return 1;
 }
 
-int fw_cfg_add_file(FWCfgState *s,  const char *dir, const char *filename,
-uint8_t *data, uint32_t len)
+int fw_cfg_add_file(FWCfgState *s,  const char *filename, uint8_t *data,
+uint32_t len)
 {
-const char *basename;
 int i, index;
 
 if (!s->files) {
@@ -297,15 +296,8 @@ int fw_cfg_add_file(FWCfgState *s,  const char *dir, const 
char *filename,
 
 fw_cfg_add_bytes(s, FW_CFG_FILE_FIRST + index, data, len);
 
-basename = strrchr(filename, '/');
-if (basename) {
-basename++;
-} else {
-basename = filename;
-}
-
-snprintf(s->files->f[index].name, sizeof(s->files->f[index].name),
- "%s/%s", dir, basename);
+pstrcpy(s->files->f[index].name, sizeof(s->files->f[index].name),
+filename);
 for (i = 0; i < index; i++) {
 if (strcmp(s->files->f[index].name, s->files->f[i].name) == 0) {
 FW_CFG_DPRINTF("%s: skip duplicate: %s\n", __FUNCTION__,
diff --git a/hw/fw_cfg.h b/hw/fw_cfg.h
index 4d13a4f..856bf91 100644
--- a/hw/fw_cfg.h
+++ b/hw/fw_cfg.h
@@ -60,8 +60,8 @@ int fw_cfg_add_i32(FWCfgState *s, uint16_t key, uint32_t 
value);
 int fw_cfg_add_i64(FWCfgState *s, uint16_t key, uint64_t value);
 int fw_cfg_add_callback(FWCfgState *s, uint16_t key, FWCfgCallback callback,
 void *callback_opaque, uint8_t *data, size_t len);
-int fw_cfg_add_file(FWCfgState *s, const char *dir, const char *filename,
-uint8_t *data, uint32_t len);
+int fw_cfg_add_file(FWCfgState *s, const char *filename, uint8_t *data,
+uint32_t len);
 FWCfgState *fw_cfg_init(uint32_t ctl_port, uint32_t data_port,
 target_phys_addr_t crl_addr, target_phys_addr_t 
data_addr);
 
diff --git a/hw/loader.c b/hw/loader.c
index 49ac1fa..1e98326 100644
--- a/hw/loader.c
+++ b/hw/loader.c
@@ -592,8 +592,20 @@ int rom_add_file(const char *file, const char *fw_dir,
 }
 close(fd);
 rom_insert(rom);
-if (rom->fw_file && fw_cfg)
-fw_cfg_add_file(fw_cfg, rom->fw_dir, rom->fw_file, rom->data, 
rom->romsize);
+if (rom->fw_file && fw_cfg) {
+const char *basename;
+char fw_file_name[56];
+
+basename = strrchr(rom->fw_file, '/');
+if (basename) {
+basename++;
+} else {
+basename = rom->fw_file;
+}
+snprintf(fw_file_name, sizeof(fw_file_name), "%s/%s", rom->fw_dir,
+ basename);
+fw_cfg_add_file(fw_cfg, fw_file_name, rom->data, rom->romsize);
+}
 return 0;
 
 err:
-- 
1.7.2.3
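
After this change the caller, not fw_cfg, decides how the file is named. The name construction that moved into rom_add_file() can be sketched on its own as below. make_fw_file_name is a hypothetical helper introduced only for this example; the patch does the same work inline.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "<dir>/<basename of path>" into buf. */
static void make_fw_file_name(char *buf, size_t len,
                              const char *dir, const char *path)
{
    const char *base = strrchr(path, '/');

    /* Skip past the last '/' if there is one, else use the whole string. */
    base = base ? base + 1 : path;
    snprintf(buf, len, "%s/%s", dir, base);
}
```

A platform that wants a deeper directory layout could build a different string here and hand it to fw_cfg_add_file() unchanged.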


[PATCHv6 05/16] Store IDE bus id in IDEBus structure for easy access.

2010-11-17 Thread Gleb Natapov


Signed-off-by: Gleb Natapov 
---
 hw/ide/cmd646.c   |4 ++--
 hw/ide/internal.h |3 ++-
 hw/ide/isa.c  |2 +-
 hw/ide/piix.c |4 ++--
 hw/ide/qdev.c |3 ++-
 hw/ide/via.c  |4 ++--
 6 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index ff80dd5..b2cbdbc 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -257,8 +257,8 @@ static int pci_cmd646_ide_initfn(PCIDevice *dev)
 pci_conf[PCI_INTERRUPT_PIN] = 0x01; // interrupt on pin 1
 
 irq = qemu_allocate_irqs(cmd646_set_irq, d, 2);
-ide_bus_new(&d->bus[0], &d->dev.qdev);
-ide_bus_new(&d->bus[1], &d->dev.qdev);
+ide_bus_new(&d->bus[0], &d->dev.qdev, 0);
+ide_bus_new(&d->bus[1], &d->dev.qdev, 1);
 ide_init2(&d->bus[0], irq[0]);
 ide_init2(&d->bus[1], irq[1]);
 
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index d652e06..c0a1abc 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -448,6 +448,7 @@ struct IDEBus {
 IDEDevice *slave;
 BMDMAState *bmdma;
 IDEState ifs[2];
+int bus_id;
 uint8_t unit;
 uint8_t cmd;
 qemu_irq irq;
@@ -565,7 +566,7 @@ void ide_init2_with_non_qdev_drives(IDEBus *bus, DriveInfo 
*hd0,
 void ide_init_ioport(IDEBus *bus, int iobase, int iobase2);
 
 /* hw/ide/qdev.c */
-void ide_bus_new(IDEBus *idebus, DeviceState *dev);
+void ide_bus_new(IDEBus *idebus, DeviceState *dev, int bus_id);
 IDEDevice *ide_create_drive(IDEBus *bus, int unit, DriveInfo *drive);
 
 #endif /* HW_IDE_INTERNAL_H */
diff --git a/hw/ide/isa.c b/hw/ide/isa.c
index 4206afd..8c59c5a 100644
--- a/hw/ide/isa.c
+++ b/hw/ide/isa.c
@@ -67,7 +67,7 @@ static int isa_ide_initfn(ISADevice *dev)
 {
 ISAIDEState *s = DO_UPCAST(ISAIDEState, dev, dev);
 
-ide_bus_new(&s->bus, &s->dev.qdev);
+ide_bus_new(&s->bus, &s->dev.qdev, 0);
 ide_init_ioport(&s->bus, s->iobase, s->iobase2);
 isa_init_irq(dev, &s->irq, s->isairq);
 isa_init_ioport_range(dev, s->iobase, 8);
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 07483e8..d0b04a3 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -129,8 +129,8 @@ static int pci_piix_ide_initfn(PCIIDEState *d)
 
 vmstate_register(&d->dev.qdev, 0, &vmstate_ide_pci, d);
 
-ide_bus_new(&d->bus[0], &d->dev.qdev);
-ide_bus_new(&d->bus[1], &d->dev.qdev);
+ide_bus_new(&d->bus[0], &d->dev.qdev, 0);
+ide_bus_new(&d->bus[1], &d->dev.qdev, 1);
 ide_init_ioport(&d->bus[0], 0x1f0, 0x3f6);
 ide_init_ioport(&d->bus[1], 0x170, 0x376);
 
diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c
index 6d27b60..88ff657 100644
--- a/hw/ide/qdev.c
+++ b/hw/ide/qdev.c
@@ -29,9 +29,10 @@ static struct BusInfo ide_bus_info = {
 .size  = sizeof(IDEBus),
 };
 
-void ide_bus_new(IDEBus *idebus, DeviceState *dev)
+void ide_bus_new(IDEBus *idebus, DeviceState *dev, int bus_id)
 {
 qbus_create_inplace(&idebus->qbus, &ide_bus_info, dev, NULL);
+idebus->bus_id = bus_id;
 }
 
 static int ide_qdev_init(DeviceState *qdev, DeviceInfo *base)
diff --git a/hw/ide/via.c b/hw/ide/via.c
index b2c7cad..cc48b2b 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -158,8 +158,8 @@ static int vt82c686b_ide_initfn(PCIDevice *dev)
 
 vmstate_register(&dev->qdev, 0, &vmstate_ide_pci, d);
 
-ide_bus_new(&d->bus[0], &d->dev.qdev);
-ide_bus_new(&d->bus[1], &d->dev.qdev);
+ide_bus_new(&d->bus[0], &d->dev.qdev, 0);
+ide_bus_new(&d->bus[1], &d->dev.qdev, 1);
 ide_init2(&d->bus[0], isa_reserve_irq(14));
 ide_init2(&d->bus[1], isa_reserve_irq(15));
 ide_init_ioport(&d->bus[0], 0x1f0, 0x3f6);
-- 
1.7.2.3


[PATCHv6 16/16] Pass boot device list to firmware.

2010-11-17 Thread Gleb Natapov


Signed-off-by: Gleb Natapov 
---
 hw/fw_cfg.c |   14 ++
 sysemu.h|1 +
 vl.c|   48 
 3 files changed, 63 insertions(+), 0 deletions(-)

diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c
index 7b9434f..20a816f 100644
--- a/hw/fw_cfg.c
+++ b/hw/fw_cfg.c
@@ -53,6 +53,7 @@ struct FWCfgState {
 FWCfgFiles *files;
 uint16_t cur_entry;
 uint32_t cur_offset;
+Notifier machine_ready;
 };
 
 static void fw_cfg_write(FWCfgState *s, uint8_t value)
@@ -315,6 +316,15 @@ int fw_cfg_add_file(FWCfgState *s,  const char *filename, 
uint8_t *data,
 return 1;
 }
 
+static void fw_cfg_machine_ready(struct Notifier* n)
+{
+uint32_t len;
+FWCfgState *s = container_of(n, FWCfgState, machine_ready);
+char *bootindex = get_boot_devices_list(&len);
+
+fw_cfg_add_file(s, "bootorder", (uint8_t*)bootindex, len);
+}
+
 FWCfgState *fw_cfg_init(uint32_t ctl_port, uint32_t data_port,
 target_phys_addr_t ctl_addr, target_phys_addr_t 
data_addr)
 {
@@ -343,6 +353,10 @@ FWCfgState *fw_cfg_init(uint32_t ctl_port, uint32_t 
data_port,
 fw_cfg_add_i16(s, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
 fw_cfg_add_i16(s, FW_CFG_BOOT_MENU, (uint16_t)boot_menu);
 
+
+s->machine_ready.notify = fw_cfg_machine_ready;
+qemu_add_machine_init_done_notifier(&s->machine_ready);
+
 return s;
 }
 
diff --git a/sysemu.h b/sysemu.h
index c42f33a..38a20a3 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -196,4 +196,5 @@ void register_devices(void);
 
 void add_boot_device_path(int32_t bootindex, DeviceState *dev,
   const char *suffix);
+char *get_boot_devices_list(uint32_t *size);
 #endif
diff --git a/vl.c b/vl.c
index d6564d4..da025c1 100644
--- a/vl.c
+++ b/vl.c
@@ -735,6 +735,54 @@ void add_boot_device_path(int32_t bootindex, DeviceState 
*dev,
 QTAILQ_INSERT_TAIL(&fw_boot_order, node, link);
 }
 
+/*
+ * This function returns a null-terminated string that consists of
+ * newline-separated device paths.
+ *
+ * The memory pointed to by "size" is assigned the total length of the
+ * array in bytes.
+ */
+char *get_boot_devices_list(uint32_t *size)
+{
+FWBootEntry *i;
+uint32_t total = 0;
+char *list = NULL;
+
+QTAILQ_FOREACH(i, &fw_boot_order, link) {
+char *devpath = NULL, *bootpath;
+int len;
+
+if (i->dev) {
+devpath = qdev_get_fw_dev_path(i->dev);
+assert(devpath);
+}
+
+if (i->suffix && devpath) {
+bootpath = qemu_malloc(strlen(devpath) + strlen(i->suffix) + 1);
+sprintf(bootpath, "%s%s", devpath, i->suffix);
+qemu_free(devpath);
+} else if (devpath) {
+bootpath = devpath;
+} else {
+bootpath = strdup(i->suffix);
+assert(bootpath);
+}
+
+if (total) {
+list[total-1] = '\n';
+}
+len = strlen(bootpath) + 1;
+list = qemu_realloc(list, total + len);
+memcpy(&list[total], bootpath, len);
+total += len;
+qemu_free(bootpath);
+}
+
+*size = total;
+
+return list;
+}
+
 static void numa_add(const char *optarg)
 {
 char option[128];
-- 
1.7.2.3
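
The buffer layout produced by get_boot_devices_list() (entries separated by newlines, one trailing NUL, size counting that NUL) can be tried outside qemu with the sketch below. It uses plain realloc() instead of qemu_realloc(), and join_boot_paths is a hypothetical name, so treat it as an illustration of the layout rather than the patch's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Join n paths into one heap buffer: entries are separated by '\n' and the
 * buffer ends with a single '\0'. *size receives the byte count including
 * that terminator. Returns NULL (and *size == 0) when n == 0.
 */
static char *join_boot_paths(const char *const *paths, size_t n, uint32_t *size)
{
    uint32_t total = 0;
    char *list = NULL;
    size_t k;

    for (k = 0; k < n; k++) {
        size_t len = strlen(paths[k]) + 1;      /* include the '\0' */
        char *grown = realloc(list, total + len);

        assert(grown);                          /* sketch: no error recovery */
        list = grown;
        if (total) {
            list[total - 1] = '\n';             /* old terminator becomes a separator */
        }
        memcpy(&list[total], paths[k], len);
        total += (uint32_t)len;
    }

    *size = total;
    return list;
}
```

Because only the last entry keeps its terminator, firmware can read the file as one C string and split it on '\n'.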


[PATCHv6 12/16] Add bootindex parameter to net/block/fd device

2010-11-17 Thread Gleb Natapov

If bootindex is specified on the command line, a string that describes the
device in a firmware-readable way is added to a sorted list. Later this list
will be passed to the firmware to control the boot order.

Signed-off-by: Gleb Natapov 
---
 block_int.h |4 +++-
 hw/e1000.c  |4 
 hw/eepro100.c   |3 +++
 hw/fdc.c|8 
 hw/ide/qdev.c   |5 +
 hw/ne2000.c |3 +++
 hw/pcnet.c  |4 
 hw/qdev.c   |   32 
 hw/qdev.h   |1 +
 hw/rtl8139.c|4 
 hw/scsi-disk.c  |1 +
 hw/usb-net.c|2 ++
 hw/virtio-blk.c |2 ++
 hw/virtio-net.c |2 ++
 net.h   |4 +++-
 sysemu.h|2 ++
 vl.c|   40 
 17 files changed, 119 insertions(+), 2 deletions(-)

diff --git a/block_int.h b/block_int.h
index 3c3adb5..0a0e47d 100644
--- a/block_int.h
+++ b/block_int.h
@@ -227,6 +227,7 @@ typedef struct BlockConf {
 uint16_t logical_block_size;
 uint16_t min_io_size;
 uint32_t opt_io_size;
+int32_t bootindex;
 } BlockConf;
 
 static inline unsigned int get_physical_block_exp(BlockConf *conf)
@@ -249,6 +250,7 @@ static inline unsigned int get_physical_block_exp(BlockConf 
*conf)
 DEFINE_PROP_UINT16("physical_block_size", _state,   \
_conf.physical_block_size, 512), \
 DEFINE_PROP_UINT16("min_io_size", _state, _conf.min_io_size, 0),  \
-DEFINE_PROP_UINT32("opt_io_size", _state, _conf.opt_io_size, 0)
+DEFINE_PROP_UINT32("opt_io_size", _state, _conf.opt_io_size, 0),\
+DEFINE_PROP_INT32("bootindex", _state, _conf.bootindex, -1) \
 
 #endif /* BLOCK_INT_H */
diff --git a/hw/e1000.c b/hw/e1000.c
index 7811699..34ad136 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -30,6 +30,7 @@
 #include "net.h"
 #include "net/checksum.h"
 #include "loader.h"
+#include "sysemu.h"
 
 #include "e1000_hw.h"
 
@@ -1154,6 +1155,9 @@ static int pci_e1000_init(PCIDevice *pci_dev)
   d->dev.qdev.info->name, d->dev.qdev.id, d);
 
 qemu_format_nic_info_str(&d->nic->nc, macaddr);
+
+add_boot_device_path(d->conf.bootindex, &pci_dev->qdev, "/ethernet-...@0");
+
 return 0;
 }
 
diff --git a/hw/eepro100.c b/hw/eepro100.c
index 41d792a..ae96204 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -46,6 +46,7 @@
 #include "pci.h"
 #include "net.h"
 #include "eeprom93xx.h"
+#include "sysemu.h"
 
 #define KiB 1024
 
@@ -1907,6 +1908,8 @@ static int e100_nic_init(PCIDevice *pci_dev)
 s->vmstate->name = s->nic->nc.model;
 vmstate_register(&pci_dev->qdev, -1, s->vmstate, s);
 
+add_boot_device_path(s->conf.bootindex, &pci_dev->qdev, "/ethernet-...@0");
+
 return 0;
 }
 
diff --git a/hw/fdc.c b/hw/fdc.c
index 5ab754b..543aa68 100644
--- a/hw/fdc.c
+++ b/hw/fdc.c
@@ -35,6 +35,7 @@
 #include "sysbus.h"
 #include "qdev-addr.h"
 #include "blockdev.h"
+#include "sysemu.h"
 
 //
 /* debug Floppy devices */
@@ -523,6 +524,8 @@ typedef struct FDCtrlSysBus {
 typedef struct FDCtrlISABus {
 ISADevice busdev;
 struct FDCtrl state;
+int32_t bootindexA;
+int32_t bootindexB;
 } FDCtrlISABus;
 
 static uint32_t fdctrl_read (void *opaque, uint32_t reg)
@@ -1992,6 +1995,9 @@ static int isabus_fdc_init1(ISADevice *dev)
 qdev_set_legacy_instance_id(&dev->qdev, iobase, 2);
 ret = fdctrl_init_common(fdctrl);
 
+add_boot_device_path(isa->bootindexA, &dev->qdev, "/flo...@0");
+add_boot_device_path(isa->bootindexB, &dev->qdev, "/flo...@1");
+
 return ret;
 }
 
@@ -2051,6 +2057,8 @@ static ISADeviceInfo isa_fdc_info = {
 .qdev.props = (Property[]) {
 DEFINE_PROP_DRIVE("driveA", FDCtrlISABus, state.drives[0].bs),
 DEFINE_PROP_DRIVE("driveB", FDCtrlISABus, state.drives[1].bs),
+DEFINE_PROP_INT32("bootindexA", FDCtrlISABus, bootindexA, -1),
+DEFINE_PROP_INT32("bootindexB", FDCtrlISABus, bootindexB, -1),
 DEFINE_PROP_END_OF_LIST(),
 },
 };
diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c
index 01a181b..69a00e2 100644
--- a/hw/ide/qdev.c
+++ b/hw/ide/qdev.c
@@ -21,6 +21,7 @@
 #include "qemu-error.h"
 #include 
 #include "blockdev.h"
+#include "sysemu.h"
 
 /* - */
 
@@ -143,6 +144,10 @@ static int ide_drive_initfn(IDEDevice *dev)
 if (!dev->serial) {
 dev->serial = qemu_strdup(s->drive_serial_str);
 }
+
+add_boot_device_path(dev->conf.bootindex, &dev->qdev,
+ dev->unit ? "d...@1" : "d...@0");
+
 return 0;
 }
 
diff --git a/hw/ne2000.c b/hw/ne2000.c
index 126e7cf..a030106 100644
--- a/hw/ne2000.c
+++ b/hw/ne2000.c
@@ -26,6 +26,7 @@
 #include "net.h"
 #include "ne2000.h"
 #include "loader.h"
+#include "sysemu.h"
 
 /* debug NE2000 card */
 //#define DEBUG_NE2000
@@ -746,6 +747,8 @@ static int pci_ne2000_init(PCIDevice *pci_dev)
 }
 }
 
+add_boot_device_path(s->
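
The vl.c hunk that maintains the sorted list is not visible in this excerpt, so the following is only a sketch of the idea stated in the commit message: entries are kept ordered by bootindex, and the default of -1 means "not requested". BootEntry and boot_list_add are hypothetical names, and a plain singly linked list stands in for whatever list type the patch uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct BootEntry {
    int32_t bootindex;
    const char *path;
    struct BootEntry *next;
} BootEntry;

/*
 * Insert an entry so the list stays ordered by ascending bootindex.
 * A negative bootindex means "not requested" and is ignored.
 * Returns the (possibly new) head of the list.
 */
static BootEntry *boot_list_add(BootEntry *head, int32_t bootindex,
                                const char *path)
{
    BootEntry **link = &head;
    BootEntry *node;

    if (bootindex < 0) {
        return head;
    }
    node = malloc(sizeof(*node));
    assert(node);                       /* sketch: no error recovery */
    node->bootindex = bootindex;
    node->path = path;

    /* Walk to the first entry with a larger index and splice in before it. */
    while (*link && (*link)->bootindex <= bootindex) {
        link = &(*link)->next;
    }
    node->next = *link;
    *link = node;
    return head;
}
```

Devices can then call the add function unconditionally from their init routines, as the hunks above do, and only those with an explicit bootindex end up in the list.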

[PATCHv6 03/16] Keep track of ISA ports ISA device is using in qdev.

2010-11-17 Thread Gleb Natapov

Store all io ports used by device in ISADevice structure.

Signed-off-by: Gleb Natapov 
---
 hw/cs4231a.c |1 +
 hw/fdc.c |3 +++
 hw/gus.c |4 
 hw/ide/isa.c |2 ++
 hw/isa-bus.c |   25 +
 hw/isa.h |4 
 hw/m48t59.c  |1 +
 hw/mc146818rtc.c |1 +
 hw/ne2000-isa.c  |3 +++
 hw/parallel.c|5 +
 hw/pckbd.c   |3 +++
 hw/sb16.c|4 
 hw/serial.c  |1 +
 13 files changed, 57 insertions(+), 0 deletions(-)

diff --git a/hw/cs4231a.c b/hw/cs4231a.c
index 4d5ce5c..598f032 100644
--- a/hw/cs4231a.c
+++ b/hw/cs4231a.c
@@ -645,6 +645,7 @@ static int cs4231a_initfn (ISADevice *dev)
 isa_init_irq (dev, &s->pic, s->irq);
 
 for (i = 0; i < 4; i++) {
+isa_init_ioport(dev, s->port + i);
 register_ioport_write (s->port + i, 1, 1, cs_write, s);
 register_ioport_read (s->port + i, 1, 1, cs_read, s);
 }
diff --git a/hw/fdc.c b/hw/fdc.c
index a467c4b..5ab754b 100644
--- a/hw/fdc.c
+++ b/hw/fdc.c
@@ -1983,6 +1983,9 @@ static int isabus_fdc_init1(ISADevice *dev)
   &fdctrl_write_port, fdctrl);
 register_ioport_write(iobase + 0x07, 1, 1,
   &fdctrl_write_port, fdctrl);
+isa_init_ioport_range(dev, iobase + 1, 5);
+isa_init_ioport(dev, iobase + 7);
+
 isa_init_irq(&isa->busdev, &fdctrl->irq, isairq);
 fdctrl->dma_chann = dma_chann;
 
diff --git a/hw/gus.c b/hw/gus.c
index e9016d8..ff9e7c7 100644
--- a/hw/gus.c
+++ b/hw/gus.c
@@ -264,20 +264,24 @@ static int gus_initfn (ISADevice *dev)
 
 register_ioport_write (s->port, 1, 1, gus_writeb, s);
 register_ioport_write (s->port, 1, 2, gus_writew, s);
+isa_init_ioport_range(dev, s->port, 2);
 
 register_ioport_read ((s->port + 0x100) & 0xf00, 1, 1, gus_readb, s);
 register_ioport_read ((s->port + 0x100) & 0xf00, 1, 2, gus_readw, s);
+isa_init_ioport_range(dev, (s->port + 0x100) & 0xf00, 2);
 
 register_ioport_write (s->port + 6, 10, 1, gus_writeb, s);
 register_ioport_write (s->port + 6, 10, 2, gus_writew, s);
 register_ioport_read (s->port + 6, 10, 1, gus_readb, s);
 register_ioport_read (s->port + 6, 10, 2, gus_readw, s);
+isa_init_ioport_range(dev, s->port + 6, 10);
 
 
 register_ioport_write (s->port + 0x100, 8, 1, gus_writeb, s);
 register_ioport_write (s->port + 0x100, 8, 2, gus_writew, s);
 register_ioport_read (s->port + 0x100, 8, 1, gus_readb, s);
 register_ioport_read (s->port + 0x100, 8, 2, gus_readw, s);
+isa_init_ioport_range(dev, s->port + 0x100, 8);
 
 DMA_register_channel (s->emu.gusdma, GUS_read_DMA, s);
 s->emu.himemaddr = s->himem;
diff --git a/hw/ide/isa.c b/hw/ide/isa.c
index 9856435..4206afd 100644
--- a/hw/ide/isa.c
+++ b/hw/ide/isa.c
@@ -70,6 +70,8 @@ static int isa_ide_initfn(ISADevice *dev)
 ide_bus_new(&s->bus, &s->dev.qdev);
 ide_init_ioport(&s->bus, s->iobase, s->iobase2);
 isa_init_irq(dev, &s->irq, s->isairq);
+isa_init_ioport_range(dev, s->iobase, 8);
+isa_init_ioport(dev, s->iobase2);
 ide_init2(&s->bus, s->irq);
 vmstate_register(&dev->qdev, 0, &vmstate_ide_isa, s);
 return 0;
diff --git a/hw/isa-bus.c b/hw/isa-bus.c
index 26036e0..c0ac7e9 100644
--- a/hw/isa-bus.c
+++ b/hw/isa-bus.c
@@ -92,6 +92,31 @@ void isa_init_irq(ISADevice *dev, qemu_irq *p, int isairq)
 dev->nirqs++;
 }
 
+static void isa_init_ioport_one(ISADevice *dev, uint16_t ioport)
+{
+assert(dev->nioports < ARRAY_SIZE(dev->ioports));
+dev->ioports[dev->nioports++] = ioport;
+}
+
+static int isa_cmp_ports(const void *p1, const void *p2)
+{
+return *(uint16_t*)p1 - *(uint16_t*)p2;
+}
+
+void isa_init_ioport_range(ISADevice *dev, uint16_t start, uint16_t length)
+{
+int i;
+for (i = start; i < start + length; i++) {
+isa_init_ioport_one(dev, i);
+}
+qsort(dev->ioports, dev->nioports, sizeof(dev->ioports[0]), isa_cmp_ports);
+}
+
+void isa_init_ioport(ISADevice *dev, uint16_t ioport)
+{
+isa_init_ioport_range(dev, ioport, 1);
+}
+
 static int isa_qdev_init(DeviceState *qdev, DeviceInfo *base)
 {
 ISADevice *dev = DO_UPCAST(ISADevice, qdev, qdev);
diff --git a/hw/isa.h b/hw/isa.h
index aaf0272..4794b76 100644
--- a/hw/isa.h
+++ b/hw/isa.h
@@ -14,6 +14,8 @@ struct ISADevice {
 DeviceState qdev;
 uint32_t isairq[2];
 int nirqs;
+uint16_t ioports[32];
+int nioports;
 };
 
 typedef int (*isa_qdev_initfn)(ISADevice *dev);
@@ -26,6 +28,8 @@ ISABus *isa_bus_new(DeviceState *dev);
 void isa_bus_irqs(qemu_irq *irqs);
 qemu_irq isa_reserve_irq(int isairq);
 void isa_init_irq(ISADevice *dev, qemu_irq *p, int isairq);
+void isa_init_ioport(ISADevice *dev, uint16_t ioport);
+void isa_init_ioport_range(ISADevice *dev, uint16_t start, uint16_t length);
 void isa_qdev_register(ISADeviceInfo *info);
 ISADevice *isa_create(const char *name);
 ISADevice *isa_create_simple(const char *name);
diff --git a/hw/m48t59.c b/hw/m
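
The bookkeeping added to hw/isa-bus.c above amounts to appending ports to a fixed array and re-sorting it, so that the lowest port always sits in slot 0. A standalone sketch under that reading is shown below. PortSet, MAX_IOPORTS and portset_add_range are hypothetical stand-ins for the ISADevice fields and isa_init_ioport_range().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_IOPORTS 32

/* Hypothetical stand-in for the two fields added to ISADevice. */
typedef struct {
    uint16_t ioports[MAX_IOPORTS];
    int nioports;
} PortSet;

static int cmp_ports(const void *p1, const void *p2)
{
    /* Both values fit in 16 bits, so the int difference cannot overflow. */
    return (int)*(const uint16_t *)p1 - (int)*(const uint16_t *)p2;
}

/* Record ports start .. start + length - 1 and keep the array sorted. */
static void portset_add_range(PortSet *ps, uint16_t start, uint16_t length)
{
    int i;

    for (i = start; i < start + length; i++) {
        assert(ps->nioports < MAX_IOPORTS);
        ps->ioports[ps->nioports++] = (uint16_t)i;
    }
    qsort(ps->ioports, (size_t)ps->nioports, sizeof(ps->ioports[0]),
          cmp_ports);
}
```

Keeping the array sorted matters for patch 04, which uses ioports[0] as the unit address in the firmware path regardless of the order in which a device registered its ranges.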

[PATCHv6 11/16] Add get_dev_path callback to scsi bus.

2010-11-17 Thread Gleb Natapov


Signed-off-by: Gleb Natapov 
---
 hw/scsi-bus.c |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 5a3fd4b..db7482a 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -5,9 +5,12 @@
 #include "qdev.h"
 #include "blockdev.h"
 
+static char *scsibus_get_fw_dev_path(DeviceState *dev);
+
 static struct BusInfo scsi_bus_info = {
 .name  = "SCSI",
 .size  = sizeof(SCSIBus),
+.get_fw_dev_path = scsibus_get_fw_dev_path,
 .props = (Property[]) {
 DEFINE_PROP_UINT32("scsi-id", SCSIDevice, id, -1),
 DEFINE_PROP_END_OF_LIST(),
@@ -528,3 +531,23 @@ void scsi_req_complete(SCSIRequest *req)
req->tag,
req->status);
 }
+
+static char *scsibus_get_fw_dev_path(DeviceState *dev)
+{
+SCSIDevice *d = (SCSIDevice*)dev;
+SCSIBus *bus = scsi_bus_from_device(d);
+char path[100];
+int i;
+
+for (i = 0; i < bus->ndev; i++) {
+if (bus->devs[i] == d) {
+break;
+}
+}
+
+assert(i != bus->ndev);
+
+snprintf(path, sizeof(path), "%s@%x", qdev_fw_name(dev), i);
+
+return strdup(path);
+}
-- 
1.7.2.3


[PATCHv6 14/16] Add bootindex for option roms.

2010-11-17 Thread Gleb Natapov

Extend the -option-rom option with an additional ,bootindex= parameter.

Signed-off-by: Gleb Natapov 
---
 hw/loader.c|   16 +++-
 hw/loader.h|8 
 hw/multiboot.c |3 ++-
 hw/ne2000.c|2 +-
 hw/nseries.c   |4 ++--
 hw/palm.c  |6 +++---
 hw/pc.c|7 ---
 hw/pci.c   |2 +-
 hw/pcnet.c |2 +-
 qemu-config.c  |   17 +
 sysemu.h   |6 +-
 vl.c   |   11 +--
 12 files changed, 60 insertions(+), 24 deletions(-)

diff --git a/hw/loader.c b/hw/loader.c
index 1e98326..eb198f6 100644
--- a/hw/loader.c
+++ b/hw/loader.c
@@ -107,7 +107,7 @@ int load_image_targphys(const char *filename,
 
 size = get_image_size(filename);
 if (size > 0)
-rom_add_file_fixed(filename, addr);
+rom_add_file_fixed(filename, addr, -1);
 return size;
 }
 
@@ -557,10 +557,11 @@ static void rom_insert(Rom *rom)
 }
 
 int rom_add_file(const char *file, const char *fw_dir,
- target_phys_addr_t addr)
+ target_phys_addr_t addr, int32_t bootindex)
 {
 Rom *rom;
 int rc, fd = -1;
+char devpath[100];
 
 rom = qemu_mallocz(sizeof(*rom));
 rom->name = qemu_strdup(file);
@@ -605,7 +606,12 @@ int rom_add_file(const char *file, const char *fw_dir,
 snprintf(fw_file_name, sizeof(fw_file_name), "%s/%s", rom->fw_dir,
  basename);
 fw_cfg_add_file(fw_cfg, fw_file_name, rom->data, rom->romsize);
+snprintf(devpath, sizeof(devpath), "/rom@%s", fw_file_name);
+} else {
+snprintf(devpath, sizeof(devpath), "/rom@" TARGET_FMT_plx, addr);
 }
+
+add_boot_device_path(bootindex, NULL, devpath);
 return 0;
 
 err:
@@ -635,12 +641,12 @@ int rom_add_blob(const char *name, const void *blob, 
size_t len,
 
 int rom_add_vga(const char *file)
 {
-return rom_add_file(file, "vgaroms", 0);
+return rom_add_file(file, "vgaroms", 0, -1);
 }
 
-int rom_add_option(const char *file)
+int rom_add_option(const char *file, int32_t bootindex)
 {
-return rom_add_file(file, "genroms", 0);
+return rom_add_file(file, "genroms", 0, bootindex);
 }
 
 static void rom_reset(void *unused)
diff --git a/hw/loader.h b/hw/loader.h
index 1f82fc5..fc6bdff 100644
--- a/hw/loader.h
+++ b/hw/loader.h
@@ -22,7 +22,7 @@ void pstrcpy_targphys(const char *name,
 
 
 int rom_add_file(const char *file, const char *fw_dir,
- target_phys_addr_t addr);
+ target_phys_addr_t addr, int32_t bootindex);
 int rom_add_blob(const char *name, const void *blob, size_t len,
  target_phys_addr_t addr);
 int rom_load_all(void);
@@ -31,8 +31,8 @@ int rom_copy(uint8_t *dest, target_phys_addr_t addr, size_t 
size);
 void *rom_ptr(target_phys_addr_t addr);
 void do_info_roms(Monitor *mon);
 
-#define rom_add_file_fixed(_f, _a)  \
-rom_add_file(_f, NULL, _a)
+#define rom_add_file_fixed(_f, _a, _i)  \
+rom_add_file(_f, NULL, _a, _i)
 #define rom_add_blob_fixed(_f, _b, _l, _a)  \
 rom_add_blob(_f, _b, _l, _a)
 
@@ -43,6 +43,6 @@ void do_info_roms(Monitor *mon);
 #define PC_ROM_SIZE(PC_ROM_MAX - PC_ROM_MIN_VGA)
 
 int rom_add_vga(const char *file);
-int rom_add_option(const char *file);
+int rom_add_option(const char *file, int32_t bootindex);
 
 #endif
diff --git a/hw/multiboot.c b/hw/multiboot.c
index e710bbb..7cc3055 100644
--- a/hw/multiboot.c
+++ b/hw/multiboot.c
@@ -331,7 +331,8 @@ int load_multiboot(void *fw_cfg,
 fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, mb_bootinfo_data,
  sizeof(bootinfo));
 
-option_rom[nb_option_roms] = "multiboot.bin";
+option_rom[nb_option_roms].name = "multiboot.bin";
+option_rom[nb_option_roms].bootindex = 0;
 nb_option_roms++;
 
 return 1; /* yes, we are multiboot */
diff --git a/hw/ne2000.c b/hw/ne2000.c
index a030106..5966359 100644
--- a/hw/ne2000.c
+++ b/hw/ne2000.c
@@ -742,7 +742,7 @@ static int pci_ne2000_init(PCIDevice *pci_dev)
 if (!pci_dev->qdev.hotplugged) {
 static int loaded = 0;
 if (!loaded) {
-rom_add_option("pxe-ne2k_pci.bin");
+rom_add_option("pxe-ne2k_pci.bin", -1);
 loaded = 1;
 }
 }
diff --git a/hw/nseries.c b/hw/nseries.c
index 04a028d..2f6f473 100644
--- a/hw/nseries.c
+++ b/hw/nseries.c
@@ -1326,7 +1326,7 @@ static void n8x0_init(ram_addr_t ram_size, const char 
*boot_device,
 qemu_register_reset(n8x0_boot_init, s);
 }
 
-if (option_rom[0] && (boot_device[0] == 'n' || !kernel_filename)) {
+if (option_rom[0].name && (boot_device[0] == 'n' || !kernel_filename)) {
 int rom_size;
 uint8_t nolo_tags[0x1];
 /* No, wait, better start at the ROM.  */
@@ -1341,7 +1341,7 @@ static void n8x0_init(ram_addr_t ram_size, const char 
*boot_device,
  *
  * The code above is for loading the `zImage' file from Nokia
  * images.  */
-rom_size =

[PATCHv6 02/16] Introduce new BusInfo callback get_fw_dev_path.

2010-11-17 Thread Gleb Natapov

The new get_fw_dev_path callback will be used to build a device path usable
by firmware, in contrast to the qemu-internal qdev device path.

Signed-off-by: Gleb Natapov 
---
 hw/qdev.h |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/hw/qdev.h b/hw/qdev.h
index 9f90efe..dc669b3 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -49,12 +49,14 @@ struct DeviceState {
 
 typedef void (*bus_dev_printfn)(Monitor *mon, DeviceState *dev, int indent);
 typedef char *(*bus_get_dev_path)(DeviceState *dev);
+typedef char *(*bus_get_fw_dev_path)(DeviceState *dev);
 
 struct BusInfo {
 const char *name;
 size_t size;
 bus_dev_printfn print_dev;
 bus_get_dev_path get_dev_path;
+bus_get_fw_dev_path get_fw_dev_path;
 Property *props;
 };
 
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCHv6 04/16] Add get_fw_dev_path callback to ISA bus in qdev.

2010-11-17 Thread Gleb Natapov

Use device ioports to create unique device path.

Signed-off-by: Gleb Natapov 
---
 hw/isa-bus.c |   16 
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/hw/isa-bus.c b/hw/isa-bus.c
index c0ac7e9..c423c1b 100644
--- a/hw/isa-bus.c
+++ b/hw/isa-bus.c
@@ -31,11 +31,13 @@ static ISABus *isabus;
 target_phys_addr_t isa_mem_base = 0;
 
 static void isabus_dev_print(Monitor *mon, DeviceState *dev, int indent);
+static char *isabus_get_fw_dev_path(DeviceState *dev);
 
 static struct BusInfo isa_bus_info = {
 .name  = "ISA",
 .size  = sizeof(ISABus),
 .print_dev = isabus_dev_print,
+.get_fw_dev_path = isabus_get_fw_dev_path,
 };
 
 ISABus *isa_bus_new(DeviceState *dev)
@@ -188,4 +190,18 @@ static void isabus_register_devices(void)
 sysbus_register_withprop(&isabus_bridge_info);
 }
 
+static char *isabus_get_fw_dev_path(DeviceState *dev)
+{
+ISADevice *d = (ISADevice*)dev;
+char path[40];
+int off;
+
+off = snprintf(path, sizeof(path), "%s", qdev_fw_name(dev));
+if (d->nioports) {
+snprintf(path + off, sizeof(path) - off, "@%04x", d->ioports[0]);
+}
+
+return strdup(path);
+}
+
 device_init(isabus_register_devices)
-- 
1.7.2.3
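
The path element built by isabus_get_fw_dev_path() is the firmware name followed by "@" and the lowest I/O port as four hex digits, or just the name when the device has no ports. A small sketch of that formatting is given below. isa_fw_path is a hypothetical helper, and unlike the patch it also checks that the first snprintf() did not truncate before appending.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring the formatting in isabus_get_fw_dev_path(). */
static void isa_fw_path(char *buf, size_t len, const char *fw_name,
                        const uint16_t *ioports, int nioports)
{
    int off = snprintf(buf, len, "%s", fw_name);

    /* Append the unit address only if the name fit and a port is known. */
    if (nioports > 0 && off >= 0 && (size_t)off < len) {
        snprintf(buf + off, len - (size_t)off, "@%04x",
                 (unsigned)ioports[0]);
    }
}
```

Since the port array is kept sorted by patch 03, ioports[0] is the lowest port the device claimed, which gives a stable unit address.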


[PATCHv6 00/16] boot order specification

2010-11-17 Thread Gleb Natapov

I am using the Open Firmware naming scheme to specify device path names.
In this version: added SCSI bus support, and the boot order list is now
passed to firmware as a file.

Names look like this on pci machine:
/p...@i0cf8/i...@1,1/dr...@1/d...@0
/p...@i0cf8/i...@1/f...@03f1/flo...@1
/p...@i0cf8/i...@1/f...@03f1/flo...@0
/p...@i0cf8/i...@1,1/dr...@1/d...@1
/p...@i0cf8/i...@1,1/dr...@0/d...@0
/p...@i0cf8/s...@3/d...@0,0
/p...@i0cf8/ether...@4/ethernet-...@0
/p...@i0cf8/ether...@5/ethernet-...@0
/p...@i0cf8/i...@1,1/dr...@0/d...@1
/p...@i0cf8/i...@1/i...@01e8/dr...@0/d...@0
/p...@i0cf8/u...@1,2/netw...@0/ether...@0
/p...@i0cf8/u...@1,2/h...@1/netw...@0/ether...@0
/r...@genroms/linuxboot.bin

and on isa machine:
/isa/i...@0170/dr...@0/d...@0
/isa/f...@03f1/flo...@1
/isa/f...@03f1/flo...@0
/isa/i...@0170/dr...@0/d...@1

Instead of using the get_dev_path() callback I introduce another one,
get_fw_dev_path. Unfortunately the way the get_dev_path() callback is used
in the migration code makes it hard to reuse for other purposes. First
of all, it is not called recursively, so the caller expects it to provide
a unique name by itself. A device path, though, is inherently recursive:
each individual element may not be unique, but the whole path will be. On
the other hand, to call get_dev_path() recursively in the migration code we
would first have to implement it for all possible buses. The other problem
is compatibility: if we change the get_dev_path() output format now, we will
not be able to migrate from an old qemu to a new one without some additional
compatibility layer.
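
To illustrate the point about recursion, here is a toy sketch in which each node contributes only its own "name@unit" element and the full path is assembled by walking up to the root first. Node and build_fw_path are hypothetical; the series does this through qdev and the per-bus get_fw_dev_path callbacks, not through a structure like this.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a qdev device: a name, a unit address, a parent. */
typedef struct Node {
    const char *name;
    const char *unit;           /* NULL when the element has no "@unit" part */
    const struct Node *parent;  /* NULL for the root */
} Node;

/*
 * Write the path of dev into buf, parents first. len must be at least 1.
 * Returns the number of characters in buf so far (at most len - 1).
 */
static size_t build_fw_path(const Node *dev, char *buf, size_t len)
{
    size_t off = 0;
    int n;

    if (dev->parent) {
        off = build_fw_path(dev->parent, buf, len);
    }
    if (dev->unit) {
        n = snprintf(buf + off, len - off, "/%s@%s", dev->name, dev->unit);
    } else {
        n = snprintf(buf + off, len - off, "/%s", dev->name);
    }
    assert(n >= 0);
    off += (size_t)n;
    return off < len ? off : len - 1;   /* clamp if the output was truncated */
}
```

Two disks that both report "disk@0" still get distinct paths as long as their controllers differ, which is why no single element has to be globally unique.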

Gleb Natapov (16):
  Introduce fw_name field to DeviceInfo structure.
  Introduce new BusInfo callback get_fw_dev_path.
  Keep track of ISA ports ISA device is using in qdev.
  Add get_fw_dev_path callback to ISA bus in qdev.
  Store IDE bus id in IDEBus structure for easy access.
  Add get_fw_dev_path callback to IDE bus.
  Add get_dev_path callback for system bus.
  Add get_fw_dev_path callback for pci bus.
  Record which USBDevice USBPort belongs too.
  Add get_dev_path callback for usb bus.
  Add get_dev_path callback to scsi bus.
  Add bootindex parameter to net/block/fd device
  Change fw_cfg_add_file() to get full file path as a parameter.
  Add bootindex for option roms.
  Add notifier that will be called when machine is fully created.
  Pass boot device list to firmware.

 block_int.h   |4 +-
 hw/cs4231a.c  |1 +
 hw/e1000.c|4 ++
 hw/eepro100.c |3 +
 hw/fdc.c  |   12 ++
 hw/fw_cfg.c   |   30 --
 hw/fw_cfg.h   |4 +-
 hw/gus.c  |4 ++
 hw/ide/cmd646.c   |4 +-
 hw/ide/internal.h |3 +-
 hw/ide/isa.c  |5 ++-
 hw/ide/piix.c |4 +-
 hw/ide/qdev.c |   22 ++-
 hw/ide/via.c  |4 +-
 hw/isa-bus.c  |   42 +++
 hw/isa.h  |4 ++
 hw/lance.c|1 +
 hw/loader.c   |   32 ---
 hw/loader.h   |8 ++--
 hw/m48t59.c   |1 +
 hw/mc146818rtc.c  |1 +
 hw/multiboot.c|3 +-
 hw/ne2000-isa.c   |3 +
 hw/ne2000.c   |5 ++-
 hw/nseries.c  |4 +-
 hw/palm.c |6 +-
 hw/parallel.c |5 ++
 hw/pc.c   |7 ++-
 hw/pci.c  |  110 ---
 hw/pci_host.c |2 +
 hw/pckbd.c|3 +
 hw/pcnet.c|6 ++-
 hw/piix_pci.c |1 +
 hw/qdev.c |   32 +++
 hw/qdev.h |9 
 hw/rtl8139.c  |4 ++
 hw/sb16.c |4 ++
 hw/scsi-bus.c |   23 +++
 hw/scsi-disk.c|2 +
 hw/serial.c   |1 +
 hw/sysbus.c   |   30 ++
 hw/sysbus.h   |4 ++
 hw/usb-bus.c  |   45 -
 hw/usb-hub.c  |3 +-
 hw/usb-musb.c |2 +-
 hw/usb-net.c  |3 +
 hw/usb-ohci.c |2 +-
 hw/usb-uhci.c |2 +-
 hw/usb.h  |3 +-
 hw/virtio-blk.c   |2 +
 hw/virtio-net.c   |2 +
 hw/virtio-pci.c   |1 +
 net.h |4 +-
 qemu-config.c |   17 
 sysemu.h  |   11 +-
 vl.c  |  114 -
 56 files changed, 588 insertions(+), 80 deletions(-)

-- 
1.7.2.3


Re: [PATCH v2 1/6] KVM: MMU: fix forgot flush vcpu tlbs

2010-11-17 Thread Avi Kivity

On 11/17/2010 05:29 PM, Marcelo Tosatti wrote:

>  diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>  index ba00eef..58b4d9a 100644
>  --- a/arch/x86/kvm/paging_tmpl.h
>  +++ b/arch/x86/kvm/paging_tmpl.h
>  @@ -781,6 +781,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>else
>nonpresent = shadow_notrap_nonpresent_pte;
>drop_spte(vcpu->kvm, &sp->spt[i], nonpresent);
>  + kvm_flush_remote_tlbs(vcpu->kvm);
>continue;
>}

>  This is not needed. Guest is responsible for flushing on
>  present->nonpresent change.

sync_page
  drop_spte
kvm_mmu_notifier_invalidate_page
  kvm_unmap_rmapp
    spte doesn't exist -> no flush
page is freed
guest can write into freed page?

I don't think we need to flush immediately; set a "tlb dirty" bit 
somewhere that is cleared when we flush the tlb.  
kvm_mmu_notifier_invalidate_page() can consult the bit and force a flush 
if set.
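
A toy Python model of that idea, just to show the intended control flow. The Mmu class and its method names are hypothetical; it counts flushes instead of touching real TLBs and says nothing about locking or where the bit would actually live.

```python
class Mmu:
    """Toy model of a deferred flush tracked by a 'tlb dirty' bit."""

    def __init__(self):
        self.tlb_dirty = False
        self.flushes = 0

    def flush_remote_tlbs(self):
        self.flushes += 1
        self.tlb_dirty = False  # any flush clears the pending state

    def drop_spte(self):
        # No immediate flush; only remember that stale entries may exist.
        self.tlb_dirty = True

    def notifier_invalidate_page(self, spte_found):
        # Flush if an spte was unmapped now, or if an earlier drop
        # left the TLB possibly stale.
        if spte_found or self.tlb_dirty:
            self.flush_remote_tlbs()


mmu = Mmu()
mmu.drop_spte()
mmu.drop_spte()
mmu.notifier_invalidate_page(spte_found=False)
print(mmu.flushes, mmu.tlb_dirty)
```

Two drops cost a single flush here, and the flush still happens before the notifier returns even though the spte itself is already gone.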

--
error compiling committee.c: too many arguments to function


Re: [PATCH v2 2/6] KVM: MMU: don't drop spte if overwrite it from W to RO

2010-11-17 Thread Avi Kivity


On 11/17/2010 05:42 PM, Marcelo Tosatti wrote:

On Wed, Nov 17, 2010 at 12:10:50PM +0800, Xiao Guangrong wrote:

We only need to flush the TLB if we overwrite a writable spte with a
read-only one.

And we should move this operation to set_spte() for the sync_page path.

Signed-off-by: Xiao Guangrong
---
  arch/x86/kvm/mmu.c |   20 +---
  1 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e008ae7..9bad960 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1966,7 +1966,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
gfn_t gfn, pfn_t pfn, bool speculative,
bool can_unsync, bool reset_host_protection)
  {
-   u64 spte;
+   u64 spte, entry = *sptep;
int ret = 0;

/*
@@ -2039,6 +2039,14 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,

  set_pte:
update_spte(sptep, spte);
+   /*
+* If we overwrite a writable spte with a read-only one we
+* should flush remote TLBs. Otherwise rmap_write_protect
+* will find a read-only spte, even though the writable spte
+* might be cached on a CPU's TLB.
+*/
+   if (is_writable_pte(entry) && !is_writable_pte(*sptep))
+   kvm_flush_remote_tlbs(vcpu->kvm);

There is no need to flush on sync_page path since the guest is
responsible for it.



 If we don't, the next rmap_write_protect() will incorrectly decide 
that there's no need to flush tlbs.



Re: [PATCH v2 1/6] KVM: MMU: fix forgot flush vcpu tlbs

2010-11-17 Thread Marcelo Tosatti

On Wed, Nov 17, 2010 at 12:10:08PM +0800, Xiao Guangrong wrote:
> Some paths forget to flush vcpu TLBs after removing the rmap; this
> patch fixes it.
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  arch/x86/kvm/mmu.c |   14 +++---
>  arch/x86/kvm/paging_tmpl.h |1 +
>  2 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index bdb9fa9..e008ae7 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -736,10 +736,16 @@ static int set_spte_track_bits(u64 *sptep, u64 new_spte)
>   return 1;
>  }
>  
> -static void drop_spte(struct kvm *kvm, u64 *sptep, u64 new_spte)
> +static bool drop_spte(struct kvm *kvm, u64 *sptep, u64 new_spte)
>  {
> - if (set_spte_track_bits(sptep, new_spte))
> + bool ret = false;
> +
> + if (set_spte_track_bits(sptep, new_spte)) {
>   rmap_remove(kvm, sptep);
> + ret = true;
> + }
> +
> + return ret;
>  }
>  
>  static u64 *rmap_next(struct kvm *kvm, unsigned long *rmapp, u64 *spte)
> @@ -1997,7 +2003,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>   if (level > PT_PAGE_TABLE_LEVEL &&
>   has_wrprotected_page(vcpu->kvm, gfn, level)) {
>   ret = 1;
> - drop_spte(vcpu->kvm, sptep, shadow_trap_nonpresent_pte);
> + if (drop_spte(vcpu->kvm, sptep,
> +   shadow_trap_nonpresent_pte))
> + kvm_flush_remote_tlbs(vcpu->kvm);
>   goto done;

The spte should not be present before (this condition can happen if the
has_wrprotected_page check from mapping_level races, which is possible
since it runs without mmu_lock protection).

> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index ba00eef..58b4d9a 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -781,6 +781,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>   else
>   nonpresent = shadow_notrap_nonpresent_pte;
>   drop_spte(vcpu->kvm, &sp->spt[i], nonpresent);
> + kvm_flush_remote_tlbs(vcpu->kvm);
>   continue;
>   }

This is not needed. Guest is responsible for flushing on
present->nonpresent change.

Re: [PATCH v2 2/6] KVM: MMU: don't drop spte if overwrite it from W to RO

2010-11-17 Thread Marcelo Tosatti

On Wed, Nov 17, 2010 at 12:10:50PM +0800, Xiao Guangrong wrote:
> We only need to flush the TLB if we overwrite a writable spte with a
> read-only one.
> 
> And we should move this operation to set_spte() for the sync_page path.
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  arch/x86/kvm/mmu.c |   20 +---
>  1 files changed, 9 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index e008ae7..9bad960 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1966,7 +1966,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>   gfn_t gfn, pfn_t pfn, bool speculative,
>   bool can_unsync, bool reset_host_protection)
>  {
> - u64 spte;
> + u64 spte, entry = *sptep;
>   int ret = 0;
>  
>   /*
> @@ -2039,6 +2039,14 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  
>  set_pte:
>   update_spte(sptep, spte);
> + /*
> +  * If we overwrite a writable spte with a read-only one we
> +  * should flush remote TLBs. Otherwise rmap_write_protect
> +  * will find a read-only spte, even though the writable spte
> +  * might be cached on a CPU's TLB.
> +  */
> + if (is_writable_pte(entry) && !is_writable_pte(*sptep))
> + kvm_flush_remote_tlbs(vcpu->kvm);

There is no need to flush on sync_page path since the guest is
responsible for it.


Re: [PATCH v5-light 0/6] KVM: Improve IRQ assignment for device passthrough

2010-11-17 Thread Marcelo Tosatti

On Tue, Nov 16, 2010 at 10:30:01PM +0100, Jan Kiszka wrote:
> This is the rebased light version of the previous series, i.e. without
> PCI-2.3-based IRQ masking or any SRCU conversion. PCI-2.3 support is
> under rework to explore options for automatic mode switches.
> 
> Jan Kiszka (6):
>   KVM: Clear assigned guest IRQ on release
>   KVM: Switch assigned device IRQ forwarding to threaded handler
>   KVM: Refactor IRQ names of assigned devices
>   KVM: Save/restore state of assigned PCI device
>   KVM: Clean up kvm_vm_ioctl_assigned_device
>   KVM: Document device assigment API
> 
>  Documentation/kvm/api.txt |  178 
> +
>  include/linux/kvm_host.h  |   13 +---
>  virt/kvm/assigned-dev.c   |  125 
>  3 files changed, 227 insertions(+), 89 deletions(-)

Applied, thanks.


Re: [Qemu-devel] [PATCH v2 1/2] Minimal RAM API support

2010-11-17 Thread Anthony Liguori


On 11/01/2010 10:14 AM, Alex Williamson wrote:

This adds a minimum chunk of Anthony's RAM API support so that we
can identify actual VM RAM versus all the other things that make
use of qemu_ram_alloc.

Signed-off-by: Alex Williamson
---

  Makefile.objs |1 +
  cpu-common.h  |2 +
  memory.c  |  109 +
  memory.h  |   18 +
  4 files changed, 130 insertions(+), 0 deletions(-)
  create mode 100644 memory.c
  create mode 100644 memory.h

diff --git a/Makefile.objs b/Makefile.objs
index f07fb01..33fae0b 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -154,6 +154,7 @@ hw-obj-y += vl.o loader.o
  hw-obj-y += virtio.o virtio-console.o
  hw-obj-y += fw_cfg.o pci.o pci_host.o pcie_host.o
  hw-obj-y += watchdog.o
+hw-obj-y += memory.o
  hw-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
  hw-obj-$(CONFIG_ECC) += ecc.o
  hw-obj-$(CONFIG_NAND) += nand.o
diff --git a/cpu-common.h b/cpu-common.h
index a543b5d..6aa2738 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -23,6 +23,8 @@
  /* address in the RAM (different from a physical address) */
  typedef unsigned long ram_addr_t;

+#include "memory.h"
+
  /* memory API */

  typedef void CPUWriteMemoryFunc(void *opaque, target_phys_addr_t addr, uint32_t value);
diff --git a/memory.c b/memory.c
new file mode 100644
index 000..2895082
--- /dev/null
+++ b/memory.c
@@ -0,0 +1,109 @@
+/*
+ * RAM API
+ *
+ *  Copyright Red Hat, Inc. 2010
+ *
+ * Authors:
+ *  Alex Williamson
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see.
+ */
+#include "memory.h"
+#include "range.h"
+
+typedef struct QemuRamSlot {
+target_phys_addr_t start_addr;
+ram_addr_t size;
+ram_addr_t offset;
+QLIST_ENTRY(QemuRamSlot) next;
+} QemuRamSlot;
+
+typedef struct QemuRamSlots {
+QLIST_HEAD(slots, QemuRamSlot) slots;
+} QemuRamSlots;
   


No need for all of the 'Qemu' prefixes.


+
+static QemuRamSlots ram_slots = { .slots = QLIST_HEAD_INITIALIZER(ram_slots) };
   


Might be nicer to just typedef the extra struct away.


+static QemuRamSlot *qemu_ram_find_slot(target_phys_addr_t start_addr,
+   ram_addr_t size)
+{
+QemuRamSlot *slot;
+
+QLIST_FOREACH(slot, &ram_slots.slots, next) {
+if (slot->start_addr == start_addr && slot->size == size) {
+return slot;
+}
+
+if (ranges_overlap(start_addr, size, slot->start_addr, slot->size)) {
+abort();
   


Should display a message before aborting.


+}
+}
+
+return NULL;
+}
+
+int qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size,
+   ram_addr_t phys_offset)
+{
+QemuRamSlot *slot;
+
+if (!size) {
+return -EINVAL;
+}
+
+assert(!qemu_ram_find_slot(start_addr, size));
+
+slot = qemu_mallocz(sizeof(QemuRamSlot));
+
+slot->start_addr = start_addr;
+slot->size = size;
+slot->offset = phys_offset;
+
+QLIST_INSERT_HEAD(&ram_slots.slots, slot, next);
+
+cpu_register_physical_memory(slot->start_addr, slot->size, slot->offset);
+
+return 0;
+}
+
+void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size)
+{
+QemuRamSlot *slot;
+
+if (!size) {
+return;
+}
+
+slot = qemu_ram_find_slot(start_addr, size);
+assert(slot != NULL);
+
+QLIST_REMOVE(slot, next);
+qemu_free(slot);
+cpu_register_physical_memory(start_addr, size, IO_MEM_UNASSIGNED);
+
+return;
+}
+
+int qemu_ram_for_each_slot(void *opaque, qemu_ram_for_each_slot_fn fn)
+{
+QemuRamSlot *slot;
+
+QLIST_FOREACH(slot, &ram_slots.slots, next) {
+int ret = fn(opaque, slot->start_addr, slot->size, slot->offset);
+if (ret) {
+return ret;
+}
+}
+return 0;
+}
diff --git a/memory.h b/memory.h
new file mode 100644
index 000..0c17ff9
--- /dev/null
+++ b/memory.h
@@ -0,0 +1,18 @@
+#ifndef QEMU_MEMORY_H
+#define QEMU_MEMORY_H
+
+#include "qemu-common.h"
+#include "cpu-common.h"
   


Header needs copyright and would be nice to have some comments 
explaining these functions.


Regards,

Anthony Liguori


+int qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size,
+  ram_addr_t phys_offset);
+
+void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size);
+
+typedef int (*qemu_ram_for_e

Re: [PATCH v2] device-assignment: register a reset function

2010-11-17 Thread Marcelo Tosatti

On Tue, Nov 16, 2010 at 03:05:29PM +0100, Bernhard Kohl wrote:
> This is necessary because during reboot of a VM the assigned devices
> continue DMA transfers which causes memory corruption.
> 
> Signed-off-by: Thomas Ostler 
> Signed-off-by: Bernhard Kohl 
> ---
> Changes v1 -> v2:
> - use defined macros, e.g. PCI_COMMAND
> - write all zero to the command register to disconnect the device logically
> ---
>  hw/device-assignment.c |   12 
>  1 files changed, 12 insertions(+), 0 deletions(-)

Applied, thanks.


Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func

2010-11-17 Thread Avi Kivity


On 11/15/2010 11:15 AM, Sheng Yang wrote:

We need to query the entry later.

+int kvm_get_irq_routing_entry(struct kvm *kvm, int gsi,
+   struct kvm_kernel_irq_routing_entry *entry)
+{
+   int count = 0;
+   struct kvm_kernel_irq_routing_entry *ei = NULL;
+   struct kvm_irq_routing_table *irq_rt;
+   struct hlist_node *n;
+
+   rcu_read_lock();
+   irq_rt = rcu_dereference(kvm->irq_routing);
+   if (gsi < irq_rt->nr_rt_entries)
+   hlist_for_each_entry(ei, n, &irq_rt->map[gsi], link)
+   count++;
+   if (count == 1)
+   *entry = *ei;
+   rcu_read_unlock();
+
+   return (count != 1);
+}
+


Not good form to rely on ei being valid after the loop.

I guess this is only useful for msi?  Need to document it.

*entry may be stale after rcu_read_unlock().  Is this a problem?

--
error compiling committee.c: too many arguments to function


Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support

2010-11-17 Thread Avi Kivity


On 11/15/2010 11:15 AM, Sheng Yang wrote:

This patch enables per-vector masking for assigned devices using MSI-X.

This patch provides two new APIs: one lets the guest specify the device's MSI-X
table address in MMIO, the other lets userspace get information about the mask
bit.

All mask bit operations are kept in the kernel, in order to accelerate them.
Userspace shouldn't access the device MMIO directly for the information;
instead it should use the provided API to do so.

Signed-off-by: Sheng Yang
---
  arch/x86/kvm/x86.c   |1 +
  include/linux/kvm.h  |   32 +
  include/linux/kvm_host.h |5 +
  virt/kvm/assigned-dev.c  |  318 +-
  4 files changed, 355 insertions(+), 1 deletions(-)



Documentation?


+static int msix_mmio_read(struct kvm_io_device *this, gpa_t addr, int len,
+ void *val)
+{
+   struct kvm_assigned_dev_kernel *adev =
+   container_of(this, struct kvm_assigned_dev_kernel,
+msix_mmio_dev);
+   int idx, r = 0;
+   u32 entry[4];
+   struct kvm_kernel_irq_routing_entry e;
+
+   /* TODO: Get big-endian machine work */
+   mutex_lock(&adev->kvm->lock);
+   if (!msix_mmio_in_range(adev, addr, len)) {
+   r = -EOPNOTSUPP;
+   goto out;
+   }
+   if ((addr & 0x3) || len != 4)
+   goto out;
+
+   idx = msix_get_enabled_idx(adev, addr, len);
+   if (idx < 0) {
+   idx = (addr - adev->msix_mmio_base) / PCI_MSIX_ENTRY_SIZE;
+   if ((addr % PCI_MSIX_ENTRY_SIZE) ==
+   PCI_MSIX_ENTRY_VECTOR_CTRL)
+   *(unsigned long *)val =
+   test_bit(idx, adev->msix_mask_bitmap) ?
+   PCI_MSIX_ENTRY_CTRL_MASKBIT : 0;
+   else
+   r = -EOPNOTSUPP;
+   goto out;
+   }
+
+   r = kvm_get_irq_routing_entry(adev->kvm,
+   adev->guest_msix_entries[idx].vector, &e);
+   if (r || e.type != KVM_IRQ_ROUTING_MSI) {
+   r = -EOPNOTSUPP;
+   goto out;
+   }
+   entry[0] = e.msi.address_lo;
+   entry[1] = e.msi.address_hi;
+   entry[2] = e.msi.data;
+   entry[3] = test_bit(adev->guest_msix_entries[idx].entry,
+   adev->msix_mask_bitmap);
+   memcpy(val, &entry[addr % PCI_MSIX_ENTRY_SIZE / sizeof *entry], len);
+
+out:
+   mutex_unlock(&adev->kvm->lock);
+   return r;
+}
+
+static int msix_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
+  const void *val)
+{
+   struct kvm_assigned_dev_kernel *adev =
+   container_of(this, struct kvm_assigned_dev_kernel,
+msix_mmio_dev);
+   int idx, r = 0;
+   unsigned long new_val = *(unsigned long *)val;


What if it's a 64-bit write on a 32-bit host?

Are we sure the trailing bytes of val are zero?
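
A small Python sketch of the second concern, as an illustration only. The buffer contents are made up; it models reading an 8-byte unsigned long from a buffer where only the first 4 bytes were written by the guest, assuming a little-endian 64-bit host.

```python
import struct

PCI_MSIX_ENTRY_CTRL_MASKBIT = 1

# A 4-byte write of the value 1, followed by unrelated bytes that
# happen to sit after it in memory (made-up contents).
buf = struct.pack("<I", 1) + b"\xaa\xbb\xcc\xdd"

narrow = struct.unpack_from("<I", buf)[0]  # what a u32 read sees
wide = struct.unpack_from("<Q", buf)[0]    # what an 8-byte read sees

print(hex(narrow), hex(wide))

# The "reserved bits set" check from the patch, applied to both reads.
print(bool(narrow & ~PCI_MSIX_ENTRY_CTRL_MASKBIT))
print(bool(wide & ~PCI_MSIX_ENTRY_CTRL_MASKBIT))
```

With the wide read the reserved-bits check trips on bytes the guest never wrote, so a valid mask write would be silently ignored.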


+
+   /* TODO: Get big-endian machine work */


BUILD_BUG_ON(something)


+   mutex_lock(&adev->kvm->lock);
+   if (!msix_mmio_in_range(adev, addr, len)) {
+   r = -EOPNOTSUPP;
+   goto out;
+   }


Why is this needed?  Didn't the iodev check already do this?


+   if ((addr & 0x3) || len != 4)
+   goto out;


What if len == 8?  I think mst said it was legal.


+
+   idx = msix_get_enabled_idx(adev, addr, len);
+   if (idx < 0) {
+   idx = (addr - adev->msix_mmio_base) / PCI_MSIX_ENTRY_SIZE;
+   if (((addr % PCI_MSIX_ENTRY_SIZE) ==
+   PCI_MSIX_ENTRY_VECTOR_CTRL)) {
+   if (new_val & ~PCI_MSIX_ENTRY_CTRL_MASKBIT)
+   goto out;
+   if (new_val & PCI_MSIX_ENTRY_CTRL_MASKBIT)
+   set_bit(idx, adev->msix_mask_bitmap);
+   else
+   clear_bit(idx, adev->msix_mask_bitmap);
+   /* It's possible that we need re-enable MSI-X, so go
+* back to userspace */
+   }
+   /* Userspace would handle other MMIO writing */
+   r = -EOPNOTSUPP;


That's not very good.  We should do the entire thing in the kernel or in 
userspace.  We can have a new EXIT_REASON to let userspace know an msix 
entry changed, and it should read it from the kernel.



+   goto out;
+   }
+   if (addr % PCI_MSIX_ENTRY_SIZE != PCI_MSIX_ENTRY_VECTOR_CTRL) {
+   r = -EOPNOTSUPP;
+   goto out;
+   }
+   if (new_val & ~PCI_MSIX_ENTRY_CTRL_MASKBIT)
+   goto out;
+   update_msix_mask(adev, idx, !!(new_val & PCI_MSIX_ENTRY_CTRL_MASKBIT));
+out:
+   mutex_unlock(&adev->kvm->lock);
+
+   return r;
+}
+
+static int kvm_vm_ioctl_update_msix_mmio(struct kvm *kvm,
+

Re: [PATCH v3] device-assignment: Register as un-migratable

2010-11-17 Thread Marcelo Tosatti

On Mon, Nov 15, 2010 at 04:11:19PM -0700, Alex Williamson wrote:
> Use register_device_unmigratable() to declare ourselves as
> non-migratable.
> 
> Signed-off-by: Alex Williamson 
> ---
> 
>  v3: Use .name instead of repeating "pci-assign"
>  v2: Use dummy vmsd instead of dummy save_state
> 
>  hw/device-assignment.c |   11 +++
>  1 files changed, 11 insertions(+), 0 deletions(-)

Applied, thanks.


Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support

2010-11-17 Thread Marcelo Tosatti

On Wed, Nov 17, 2010 at 09:29:22AM +0800, Sheng Yang wrote:
> > > + adev->msix_mask_bitmap);
> > > + memcpy(val, &entry[addr % PCI_MSIX_ENTRY_SIZE / sizeof *entry], len);
> > 
> > Division by zero?
> 
> I don't quite understand. Do you mean sizeof *entry or PCI_MSIX_ENTRY_SIZE?
> Both of them should be positive integers, I think... Maybe I should use
> sizeof(u32) here?

You're right, nevermind.
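
For reference, the index arithmetic in that memcpy can be worked through in Python. The constants match the MSI-X table layout (16-byte entries, vector control at offset 12); the table base address is a made-up value, and SIZEOF_U32 stands in for sizeof *entry.

```python
PCI_MSIX_ENTRY_SIZE = 16         # bytes per MSI-X table entry
PCI_MSIX_ENTRY_VECTOR_CTRL = 12  # offset of the vector control dword
SIZEOF_U32 = 4                   # sizeof *entry for a u32 array

BASE = 0x1000                    # hypothetical table base address


def entry_index(addr):
    # Same grouping as the C expression:
    # (addr % PCI_MSIX_ENTRY_SIZE) / sizeof *entry
    return addr % PCI_MSIX_ENTRY_SIZE // SIZEOF_U32


# Walk the four dwords of table entry number 3.
for off in (0, 4, 8, PCI_MSIX_ENTRY_VECTOR_CTRL):
    addr = BASE + 3 * PCI_MSIX_ENTRY_SIZE + off
    print(hex(addr), entry_index(addr))
```

Both divisors are non-zero compile-time constants, and the result always lands in 0..3, i.e. inside the four-element entry[] array.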


[PATCH v2 1/2] KVM: x86 emulator: preserve an operand's segment identity

2010-11-17 Thread Avi Kivity

Currently the x86 emulator converts the segment register associated with
an operand into a segment base which is added into the operand address.
This loss of information results in us not doing segment limit checks properly.

Replace struct operand's addr.mem field by a segmented_address structure
which holds both the effective address and segment.  This will allow us to
do the limit check at the point of access.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_emulate.h |5 ++-
 arch/x86/kvm/emulate.c |  106 +++-
 2 files changed, 59 insertions(+), 52 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index b36c6b3..b48c133 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -159,7 +159,10 @@ struct operand {
};
union {
unsigned long *reg;
-   unsigned long mem;
+   struct segmented_address {
+   ulong ea;
+   unsigned seg;
+   } mem;
} addr;
union {
unsigned long val;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 3325b47..e967055 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -410,9 +410,9 @@ address_mask(struct decode_cache *c, unsigned long reg)
 }
 
 static inline unsigned long
-register_address(struct decode_cache *c, unsigned long base, unsigned long reg)
+register_address(struct decode_cache *c, unsigned long reg)
 {
-   return base + address_mask(c, reg);
+   return address_mask(c, reg);
 }
 
 static inline void
@@ -444,26 +444,26 @@ static unsigned long seg_base(struct x86_emulate_ctxt *ctxt,
return ops->get_cached_segment_base(seg, ctxt->vcpu);
 }
 
-static unsigned long seg_override_base(struct x86_emulate_ctxt *ctxt,
-  struct x86_emulate_ops *ops,
-  struct decode_cache *c)
+static unsigned seg_override(struct x86_emulate_ctxt *ctxt,
+struct x86_emulate_ops *ops,
+struct decode_cache *c)
 {
if (!c->has_seg_override)
return 0;
 
-   return seg_base(ctxt, ops, c->seg_override);
+   return c->seg_override;
 }
 
-static unsigned long es_base(struct x86_emulate_ctxt *ctxt,
-struct x86_emulate_ops *ops)
+static ulong linear(struct x86_emulate_ctxt *ctxt,
+   struct segmented_address addr)
 {
-   return seg_base(ctxt, ops, VCPU_SREG_ES);
-}
+   struct decode_cache *c = &ctxt->decode;
+   ulong la;
 
-static unsigned long ss_base(struct x86_emulate_ctxt *ctxt,
-struct x86_emulate_ops *ops)
-{
-   return seg_base(ctxt, ops, VCPU_SREG_SS);
+   la = seg_base(ctxt, ctxt->ops, addr.seg) + addr.ea;
+   if (c->ad_bytes != 8)
+   la &= (u32)-1;
+   return la;
 }
 
 static void emulate_exception(struct x86_emulate_ctxt *ctxt, int vec,
@@ -556,7 +556,7 @@ static void *decode_register(u8 modrm_reg, unsigned long *regs,
 
 static int read_descriptor(struct x86_emulate_ctxt *ctxt,
   struct x86_emulate_ops *ops,
-  ulong addr,
+  struct segmented_address addr,
   u16 *size, unsigned long *address, int op_bytes)
 {
int rc;
@@ -564,10 +564,12 @@ static int read_descriptor(struct x86_emulate_ctxt *ctxt,
if (op_bytes == 2)
op_bytes = 3;
*address = 0;
-   rc = ops->read_std(addr, (unsigned long *)size, 2, ctxt->vcpu, NULL);
+   rc = ops->read_std(linear(ctxt, addr), (unsigned long *)size, 2,
+  ctxt->vcpu, NULL);
if (rc != X86EMUL_CONTINUE)
return rc;
-   rc = ops->read_std(addr + 2, address, op_bytes, ctxt->vcpu, NULL);
+   rc = ops->read_std(linear(ctxt, addr) + 2, address, op_bytes,
+  ctxt->vcpu, NULL);
return rc;
 }
 
@@ -760,7 +762,7 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
break;
}
}
-   op->addr.mem = modrm_ea;
+   op->addr.mem.ea = modrm_ea;
 done:
return rc;
 }
@@ -775,13 +777,13 @@ static int decode_abs(struct x86_emulate_ctxt *ctxt,
op->type = OP_MEM;
switch (c->ad_bytes) {
case 2:
-   op->addr.mem = insn_fetch(u16, 2, c->eip);
+   op->addr.mem.ea = insn_fetch(u16, 2, c->eip);
break;
case 4:
-   op->addr.mem = insn_fetch(u32, 4, c->eip);
+   op->addr.mem.ea = insn_fetch(u32, 4, c->eip);
break;
case 8:
-   op->addr.mem = insn_fetch(u64, 8, c->eip);
+   op->addr.mem.ea = insn_fetch(u64, 8, c->eip);
break;
}
 done:
@@ -800,7 +802,7 @

[PATCH v2 2/2] KVM: x86 emulator: do not perform address calculations on linear addresses

2010-11-17 Thread Avi Kivity

Linear addresses are supposed to already have segment checks performed on them;
if we play with these addresses the checks become invalid.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/emulate.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e967055..bdbbb18 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -568,7 +568,8 @@ static int read_descriptor(struct x86_emulate_ctxt *ctxt,
   ctxt->vcpu, NULL);
if (rc != X86EMUL_CONTINUE)
return rc;
-   rc = ops->read_std(linear(ctxt, addr) + 2, address, op_bytes,
+   addr.ea += 2;
+   rc = ops->read_std(linear(ctxt, addr), address, op_bytes,
   ctxt->vcpu, NULL);
return rc;
 }
-- 
1.7.3.1
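
One visible consequence of the ordering can be shown with a small Python model of linear(). This is only a sketch: the segment base is passed in directly instead of being looked up, and the example values are made up.

```python
def linear(seg_base, ea, ad_bytes):
    """Model of linear(): base plus offset, truncated outside long mode."""
    la = seg_base + ea
    if ad_bytes != 8:
        la &= 0xFFFFFFFF
    return la


seg_base = 0
ea = 0xFFFFFFFF   # offset right at the top of a 32-bit address space
ad_bytes = 4

# Arithmetic on the linear address escapes the 32-bit truncation.
after = linear(seg_base, ea, ad_bytes) + 2

# Arithmetic on the effective address goes through linear() again.
before = linear(seg_base, ea + 2, ad_bytes)

print(hex(after), hex(before))
```

Adding to addr.ea first keeps every access on the path where truncation is applied now and where a limit check could be added later.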


[PATCH v2 0/2] Introduce segmented addresses to the emulator

2010-11-17 Thread Avi Kivity

Currently we lose segment information associated with memory operands.  This
prevents us from doing proper segment checks.

This patchset prepares the way by remembering which segment is associated
with a memory operand.

Avi Kivity (2):
  KVM: x86 emulator: preserve an operand's segment identity
  v2: truncate linear address to 32 bits if not in long mode (thanks Gleb)
  KVM: x86 emulator: do not perform address calculations on linear
addresses
  v2: fix typo

 arch/x86/include/asm/kvm_emulate.h |5 +-
 arch/x86/kvm/emulate.c |  107 +++-
 2 files changed, 60 insertions(+), 52 deletions(-)

-- 
1.7.3.1


[PATCH v2] device-assignment: register a reset function

2010-11-17 Thread Bernhard Kohl

This is necessary because during reboot of a VM the assigned devices
continue DMA transfers which causes memory corruption.

Signed-off-by: Thomas Ostler 
Signed-off-by: Bernhard Kohl 
---
Changes v1 -> v2:
- use defined macros, e.g. PCI_COMMAND
- write all zero to the command register to disconnect the device logically
---
 hw/device-assignment.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 5f5bde1..8d5a609 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1434,6 +1434,17 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
 dev->msix_table_page = NULL;
 }
 
+static void reset_assigned_device(DeviceState *dev)
+{
+PCIDevice *d = DO_UPCAST(PCIDevice, qdev, dev);
+
+/*
+ * When a 0 is written to the command register, the device is logically
+ * disconnected from the PCI bus. This avoids further DMA transfers.
+ */
+assigned_dev_pci_write_config(d, PCI_COMMAND, 0, 2);
+}
+
 static int assigned_initfn(struct PCIDevice *pci_dev)
 {
 AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
@@ -1544,6 +1555,7 @@ static PCIDeviceInfo assign_info = {
 .qdev.name= "pci-assign",
 .qdev.desc= "pass through host pci devices to the guest",
 .qdev.size= sizeof(AssignedDevice),
+.qdev.reset   = reset_assigned_device,
 .init = assigned_initfn,
 .exit = assigned_exitfn,
 .config_read  = assigned_dev_pci_read_config,
-- 
1.7.2.3


[RFC PATCH] KVM test: Introduce a command-line wrapper

2010-11-17 Thread Jason Wang

This patch adds a command line wrapper to make it easier for users
to run dedicated tests with specified params. The idea is simple:
the user specifies the test params on the command line, then the wrapper
modifies the configuration file automatically. This is possible because the
variant limitations are located at the end of the file.

The params are categorized into two kinds:
- The params used to limit the variants, whose default configuration
is stored as "key->value" in tests_cli.cfg and can be changed by the
user. The cli just adds 'only xxx' at the end of the configuration file.

  The following options are used to limit the variants:
  --diskformat= qcow2/raw
  --nicmodel= virtio_net/rtl8139/e1000
  --driveformat= virtio_blk/ide/scsi
  --vcpu= up/smp2
  --pciassign= no_pci_assignable/pf_assignable/vf_assignable
  --pagesize= smallpages/hugepages
  --guest= Guests supported by autotest; can be read from tests_base.cfg
  --testcase= Test cases you want to run; can be read from
   tests_base.cfg

- The test specific params such as "nic_mode": when the cli gets
"key=value" from the cmdline, it just adds it to the end of the file.
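
The two kinds of params can be sketched in a few lines of Python 3. This is a simplified illustration and not the script from the patch: build_cfg_lines and the sample defaults are hypothetical names, and the function returns the lines instead of writing a file.

```python
import re


def build_cfg_lines(argv, defaults):
    """Return the lines a wrapper like this would append to the config."""
    limits = dict(defaults)  # variant limiters, overridable by the user
    extra = {}               # test specific "key = value" params
    for arg in argv:
        m = re.match(r"--([^=]+)=(.*)", arg)
        if not m:
            continue         # ignore anything that is not --key=value
        key, value = m.groups()
        if key in limits:
            limits[key] = value
        else:
            extra[key] = value
    lines = ["only %s" % value for value in limits.values()]
    lines += ["%s = %s" % (key, value) for key, value in extra.items()]
    return lines


defaults = {"diskformat": "qcow2", "nicmodel": "virtio_net"}
for line in build_cfg_lines(["--nicmodel=e1000", "--nic_mode=tap"], defaults):
    print(line)
```

Known keys only replace the value of an 'only' line, while unknown keys pass through untouched as test params.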

Problem:
Users still need to know the configuration file (tests_base.cfg) to
understand the names and test params.

TODO:
Call autotest cli to submit jobs to autotest server?

Example of usage:

1:./run-test.py --diskformat=qcow2 --nicmodel=virtio_net
--driveformat=virtio_blk --vcpu=smp2 --pciassign=no_pci_assignable
--pagesize=smallpages --guest=Fedora.14.64
--testcase="unattended_install.cdrom"

This would test installation for Fedora 14 64 bit guests with
qcow2 as its image format, virtio-net as its nic model, virtio_blk as
its drive format, 2 vcpus and with small pages and no assigned devices.

2: There's no need to specify all params on the cmd line; you can
specify only the options you are interested in.
./run-test.py
This would use default tests configuration just as the above.


3: Run an nfs based installation with 4 vcpu and 2G memory
./run-test.py --guest=RHEL.5.5.x86_64
--testcase="unattend_install.nfs" --nfs_server=$server
--nfs_dir=$directory --smp=4 --mem=2048

4: Use test specific params - run pxe using tap mode network
./run-test.py --nic_mode=tap --testcase=pxe
---
 client/tests/kvm/run-test.py  |   47 +
 client/tests/kvm/tests_cli.cfg.sample |   24 +
 2 files changed, 71 insertions(+), 0 deletions(-)
 create mode 100755 client/tests/kvm/run-test.py
 create mode 100644 client/tests/kvm/tests_cli.cfg.sample

diff --git a/client/tests/kvm/run-test.py b/client/tests/kvm/run-test.py
new file mode 100755
index 0000000..1ef171e
--- /dev/null
+++ b/client/tests/kvm/run-test.py
@@ -0,0 +1,47 @@
+#!/usr/bin/python
+"""
+Program to run dedicated test from command line.
+
+@copyright: Red Hat 2010
+"""
+
+import sys, re, shutil, os
+
+def_params = {}
+help = {}
+extra_params = {}
+verbose = False
+
+def help(params):
+    print "%s: kvm-autotest client test cli" % sys.argv[0]
+    print "available options: "
+    for key in params.keys():
+        print "--%s=" % key
+
+if __name__ == "__main__":
+
+    # Read default configuration
+    for (key, value) in re.findall("#\s+(.*)->(.*)",
+                                   file("tests_cli.cfg").read()):
+        def_params[key] = value
+
+    for argv in sys.argv[1:]:
+        if "help" in argv:
+            help(def_params)
+            sys.exit(0)
+        try:
+            (key, value) = re.findall("--(.*)=(.*)", argv)[0]
+            if key in def_params.keys():
+                def_params[key] = value
+            else:
+                extra_params[key] = value
+        except IndexError:
+            pass
+
+    shutil.copy("tests_cli.cfg", "tests.cfg")
+    for key in def_params.keys():
+        file("tests.cfg","a+").write("only %s\n" % def_params[key])
+    for (key, value) in extra_params.items():
+        file("tests.cfg","a+").write("%s = %s\n" % (key, value))
+
+    os.system("../../bin/autotest control --verbose")
diff --git a/client/tests/kvm/tests_cli.cfg.sample b/client/tests/kvm/tests_cli.cfg.sample
new file mode 100644
index 0000000..f5074bd
--- /dev/null
+++ b/client/tests/kvm/tests_cli.cfg.sample
@@ -0,0 +1,24 @@
+# Do not edit this file directly, it is used by run-test.py
+#
+include tests_base.cfg
+include cdkeys.cfg
+
+# As for the defaults:
+# * qemu and qemu-img are expected to be found under /usr/bin/qemu-kvm and
+#   /usr/bin/qemu-img respectively.
+# * All image files are expected under /tmp/kvm_autotest_root/images/
+# * All iso files are expected under /tmp/kvm_autotest_root/isos/
+qemu_img_binary = /usr/bin/qemu-img
+qemu_binary = /usr/bin/qemu-kvm
+image_name(_.*)? ?<= /tmp/kvm_autotest_root/images/
+cdrom(_.*)? ?<= /tmp/kvm_autotest_root/isos/
+
+# you can change default configuration here:
+# diskformat->qcow2
+# nicmodel->virtio_net
+# driveformat->virtio_blk
+# vcpu->smp2
+# pciassign->no_pci_assignable
+# pagesize->smallpages
+# guest->Fedora.14.64
+# testc

[PATCH 2/2] KVM: x86 emulator: do not perform address calculations on linear addresses

2010-11-17 Thread Avi Kivity

Linear addresses are supposed to already have segment checks performed on them;
if we play with these addresses the checks become invalid.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/emulate.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index cd13def..0fdaeeb 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -562,7 +562,8 @@ static int read_descriptor(struct x86_emulate_ctxt *ctxt,
   ctxt->vcpu, NULL);
if (rc != X86EMUL_CONTINUE)
return rc;
-   rc = ops->read_std(linear(ctxt, addr) + 2, address, op_bytes,
+   addr.ea += 2;
+   rc = ops->read_std(linear(ctxt, addr), address, op_bytes,
   ctxt->vcpu, NULL);
return rc;
 }
-- 
1.7.3.1


[PATCH 1/2] KVM: x86 emulator: preserve an operand's segment identity

2010-11-17 Thread Avi Kivity

Currently the x86 emulator converts the segment register associated with
an operand into a segment base which is added into the operand address.
This loss of information results in us not doing segment limit checks properly.

Replace struct operand's addr.mem field by a segmented_address structure
which holds both the effective address and the segment.  This will allow us to
do the limit check at the point of access.
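
To illustrate why keeping the segment matters, here is a small Python model
(not kernel code). SegmentedAddress mirrors the new struct; Segment and
check_limit are hypothetical stand-ins, and the limit check is deliberately
simplified (no expand-down segments, no 64-bit mode). This patch itself does
not add such a check; it only makes one possible.

```python
from collections import namedtuple

# Mirrors struct segmented_address { ulong ea; unsigned seg; }
SegmentedAddress = namedtuple("SegmentedAddress", ["ea", "seg"])
# Hypothetical cached segment state: base address and highest valid offset
Segment = namedtuple("Segment", ["base", "limit"])


def linear(segments, addr):
    """Segment base plus effective address, as linear() does in the patch."""
    return segments[addr.seg].base + addr.ea


def check_limit(segments, addr, size):
    """Simplified check: the last byte accessed must not exceed the limit."""
    return addr.ea + size - 1 <= segments[addr.seg].limit


segments = {0: Segment(base=0x1000, limit=0xFFFF)}
addr = SegmentedAddress(ea=0xFFFE, seg=0)

first_read_ok = check_limit(segments, addr, 2)

# Advance the effective address first, then check and convert again.
# Adding 2 to an already-linear address would skip this check entirely.
advanced = addr._replace(ea=addr.ea + 2)
second_read_ok = check_limit(segments, advanced, 3)
```

Here the 2-byte read at the original offset fits inside the segment, while
the follow-up read two bytes later does not. That difference is only visible
while the address is still segment-relative.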

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_emulate.h |5 ++-
 arch/x86/kvm/emulate.c |  102 +--
 2 files changed, 54 insertions(+), 53 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index b36c6b3..b48c133 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -159,7 +159,10 @@ struct operand {
};
union {
unsigned long *reg;
-   unsigned long mem;
+   struct segmented_address {
+   ulong ea;
+   unsigned seg;
+   } mem;
} addr;
union {
unsigned long val;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 3325b47..cd13def 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -410,9 +410,9 @@ address_mask(struct decode_cache *c, unsigned long reg)
 }
 
 static inline unsigned long
-register_address(struct decode_cache *c, unsigned long base, unsigned long reg)
+register_address(struct decode_cache *c, unsigned long reg)
 {
-   return base + address_mask(c, reg);
+   return address_mask(c, reg);
 }
 
 static inline void
@@ -444,26 +444,20 @@ static unsigned long seg_base(struct x86_emulate_ctxt *ctxt,
return ops->get_cached_segment_base(seg, ctxt->vcpu);
 }
 
-static unsigned long seg_override_base(struct x86_emulate_ctxt *ctxt,
-  struct x86_emulate_ops *ops,
-  struct decode_cache *c)
+static unsigned seg_override(struct x86_emulate_ctxt *ctxt,
+struct x86_emulate_ops *ops,
+struct decode_cache *c)
 {
if (!c->has_seg_override)
return 0;
 
-   return seg_base(ctxt, ops, c->seg_override);
+   return c->seg_override;
 }
 
-static unsigned long es_base(struct x86_emulate_ctxt *ctxt,
-struct x86_emulate_ops *ops)
-{
-   return seg_base(ctxt, ops, VCPU_SREG_ES);
-}
-
-static unsigned long ss_base(struct x86_emulate_ctxt *ctxt,
-struct x86_emulate_ops *ops)
+static ulong linear(struct x86_emulate_ctxt *ctxt,
+   struct segmented_address addr)
 {
-   return seg_base(ctxt, ops, VCPU_SREG_SS);
+   return seg_base(ctxt, ctxt->ops, addr.seg) + addr.ea;
 }
 
 static void emulate_exception(struct x86_emulate_ctxt *ctxt, int vec,
@@ -556,7 +550,7 @@ static void *decode_register(u8 modrm_reg, unsigned long *regs,
 
 static int read_descriptor(struct x86_emulate_ctxt *ctxt,
   struct x86_emulate_ops *ops,
-  ulong addr,
+  struct segmented_address addr,
   u16 *size, unsigned long *address, int op_bytes)
 {
int rc;
@@ -564,10 +558,12 @@ static int read_descriptor(struct x86_emulate_ctxt *ctxt,
if (op_bytes == 2)
op_bytes = 3;
*address = 0;
-   rc = ops->read_std(addr, (unsigned long *)size, 2, ctxt->vcpu, NULL);
+   rc = ops->read_std(linear(ctxt, addr), (unsigned long *)size, 2,
+  ctxt->vcpu, NULL);
if (rc != X86EMUL_CONTINUE)
return rc;
-   rc = ops->read_std(addr + 2, address, op_bytes, ctxt->vcpu, NULL);
+   rc = ops->read_std(linear(ctxt, addr) + 2, address, op_bytes,
+  ctxt->vcpu, NULL);
return rc;
 }
 
@@ -760,7 +756,7 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
break;
}
}
-   op->addr.mem = modrm_ea;
+   op->addr.mem.ea = modrm_ea;
 done:
return rc;
 }
@@ -775,13 +771,13 @@ static int decode_abs(struct x86_emulate_ctxt *ctxt,
op->type = OP_MEM;
switch (c->ad_bytes) {
case 2:
-   op->addr.mem = insn_fetch(u16, 2, c->eip);
+   op->addr.mem.ea = insn_fetch(u16, 2, c->eip);
break;
case 4:
-   op->addr.mem = insn_fetch(u32, 4, c->eip);
+   op->addr.mem.ea = insn_fetch(u32, 4, c->eip);
break;
case 8:
-   op->addr.mem = insn_fetch(u64, 8, c->eip);
+   op->addr.mem.ea = insn_fetch(u64, 8, c->eip);
break;
}
 done:
@@ -800,7 +796,7 @@ static void fetch_bit_operand(struct decode_cache *c)
else if (c->src.bytes == 4)
sv = (s32)c->src.val

[PATCH 0/2] Minor emulator cleanups

2010-11-17 Thread Avi Kivity

A couple of trivial patches that clean up a bit of cruft from the emulator.

Avi Kivity (2):
  KVM: x86 emulator: drop unused #ifndef __KERNEL__
  KVM: x86 emulator: drop DPRINTF()

 arch/x86/kvm/emulate.c |   14 +-
 1 files changed, 1 insertions(+), 13 deletions(-)

-- 
1.7.3.1


[PATCH 2/2] KVM: x86 emulator: drop DPRINTF()

2010-11-17 Thread Avi Kivity

Failed emulation is reported via a tracepoint; the cmps printk is pointless.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/emulate.c |7 +--
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ffd6e01..3325b47 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -22,7 +22,6 @@
 
 #include 
 #include "kvm_cache_regs.h"
-#define DPRINTF(x...) do {} while (0)
 #include 
 #include 
 
@@ -2796,10 +2795,8 @@ done_prefixes:
c->execute = opcode.u.execute;
 
/* Unrecognised? */
-   if (c->d == 0 || (c->d & Undefined)) {
-   DPRINTF("Cannot emulate %02x\n", c->b);
+   if (c->d == 0 || (c->d & Undefined))
return -1;
-   }
 
if (mode == X86EMUL_MODE_PROT64 && (c->d & Stack))
c->op_bytes = 8;
@@ -3261,7 +3258,6 @@ special_insn:
break;
case 0xa6 ... 0xa7: /* cmps */
c->dst.type = OP_NONE; /* Disable writeback. */
-   DPRINTF("cmps: mem1=0x%p mem2=0x%p\n", c->src.addr.mem, 
c->dst.addr.mem);
goto cmp;
case 0xa8 ... 0xa9: /* test ax, imm */
goto test;
@@ -3778,6 +3774,5 @@ twobyte_insn:
goto writeback;
 
 cannot_emulate:
-   DPRINTF("Cannot emulate %02x\n", c->b);
return -1;
 }
-- 
1.7.3.1


[PATCH 0/2] Introduce segmented addresses to the emulator

2010-11-17 Thread Avi Kivity

Currently we lose segment information associated with memory operands.  This
prevents us from doing proper segment checks.

This patchset prepares the way by remembering which segment is associated
with a memory operand.

Avi Kivity (2):
  KVM: x86 emulator: preserve an operand's segment identity
  KVM: x86 emulator: do not perform address calculations on linear
addresses

 arch/x86/include/asm/kvm_emulate.h |5 ++-
 arch/x86/kvm/emulate.c |  103 ++--
 2 files changed, 55 insertions(+), 53 deletions(-)

-- 
1.7.3.1


[PATCH 1/2] KVM: x86 emulator: drop unused #ifndef __KERNEL__

2010-11-17 Thread Avi Kivity

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/emulate.c |7 ---
 1 files changed, 0 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 38b6e8d..ffd6e01 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -20,16 +20,9 @@
  * From: xen-unstable 10676:af9809f51f81a3c43f276f00c81a52ef558afda4
  */
 
-#ifndef __KERNEL__
-#include 
-#include 
-#include 
-#define DPRINTF(_f, _a ...) printf(_f , ## _a)
-#else
 #include 
 #include "kvm_cache_regs.h"
 #define DPRINTF(x...) do {} while (0)
-#endif
 #include 
 #include 
 
-- 
1.7.3.1


Re: [PATCH v5-light 0/6] KVM: Improve IRQ assignment for device passthrough

2010-11-17 Thread Michael S. Tsirkin

On Tue, Nov 16, 2010 at 10:30:01PM +0100, Jan Kiszka wrote:
> This is the rebased light version of the previous series, i.e. without
> PCI-2.3-based IRQ masking or any SRCU conversion. PCI-2.3 support is
> under rework to explore options for automatic mode switches.
> 
> Jan Kiszka (6):
>   KVM: Clear assigned guest IRQ on release
>   KVM: Switch assigned device IRQ forwarding to threaded handler
>   KVM: Refactor IRQ names of assigned devices
>   KVM: Save/restore state of assigned PCI device
>   KVM: Clean up kvm_vm_ioctl_assigned_device
>   KVM: Document device assigment API
> 
>  Documentation/kvm/api.txt |  178 
> +
>  include/linux/kvm_host.h  |   13 +---
>  virt/kvm/assigned-dev.c   |  125 
>  3 files changed, 227 insertions(+), 89 deletions(-)

Acked-by: Michael S. Tsirkin 


[PATCH 2/2] KVM test: Let vhost use netdev_extra_params

2010-11-17 Thread Jason Wang

As we now have more generic support for netdev configuration, this patch
enables vhost through netdev_extra_params.

Signed-off-by: Jason Wang 
---
 client/tests/kvm/tests_base.cfg.sample |8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 2ae7f78..17fd7ba 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -716,9 +716,11 @@ variants:
 nic_model = virtio
 # You can add advanced attributes on nic_extra_params such as mrg_rxbuf
 #nic_extra_params =
-# You can set vhost = yes to enable the vhost kernel backend
-# (This only works if nic_mode=tap)
-vhost = no
+# You can add advanced attributes through netdev_extra_params
+# such as sndbuf, as an example, you can uncomment the
+# following lines to enable the vhost support ( only available
+# for tap )
+#netdev_extra_params = vhost=on,
 jumbo:
 mtu = 65520
 ethtool:


[PATCH 1/2] KVM test: Support params for netdev

2010-11-17 Thread Jason Wang

Qemu-kvm can configure the netdev through its command line parameters.
This patch adds support for that, so the user can pass params like
sndbuf to qemu through netdev_extra_params in the config file.
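
As a rough illustration of the resulting command line, the -netdev branch of
add_net() can be pulled out into a standalone function. The name
netdev_option and the sample values are only for this sketch; the string
formatting follows the patch below.

```python
def netdev_option(mode, netdev_id, netdev_extra_params=None):
    """Build the -netdev part of the qemu command line."""
    cmd = " -netdev %s,id=%s" % (mode, netdev_id)
    if netdev_extra_params:
        # Extra params are appended verbatim, so several options can be
        # passed as one comma-separated string.
        cmd += ",%s" % netdev_extra_params
    return cmd


plain = netdev_option("tap", "net0")
tuned = netdev_option("tap", "net0", "vhost=on,sndbuf=0")
```

With netdev_extra_params unset the option is unchanged from before; with
"vhost=on,sndbuf=0" both settings end up on the same -netdev option.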

Signed-off-by: Jason Wang 
---
 client/tests/kvm/kvm_vm.py |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
index 72f5296..fdbaa90 100755
--- a/client/tests/kvm/kvm_vm.py
+++ b/client/tests/kvm/kvm_vm.py
@@ -260,11 +260,11 @@ class VM:
 
 def add_net(help, vlan, mode, ifname=None, script=None,
 downscript=None, tftp=None, bootfile=None, hostfwd=[],
-netdev_id=None, vhost=False):
+netdev_id=None, netdev_extra_params=None):
 if has_option(help, "netdev"):
 cmd = " -netdev %s,id=%s" % (mode, netdev_id)
-if vhost:
-cmd += ",vhost=on"
+if netdev_extra_params:
+cmd += ",%s" % netdev_extra_params
 else:
 cmd = " -net %s,vlan=%d" % (mode, vlan)
 if mode == "tap":
@@ -420,7 +420,7 @@ class VM:
 script, downscript, tftp,
 nic_params.get("bootp"), redirs,
 self.netdev_id[vlan],
-nic_params.get("vhost")=="yes")
+nic_params.get("netdev_extra_params"))
 # Proceed to next NIC
 vlan += 1
 


[PATCH] KVM test: Minor enhancement for kdump

2010-11-17 Thread Jason Wang

Make the crash kernel probe command configurable.
Use crashkernel=128M instead of crashkernel=1...@64m.

Signed-off-by: Jason Wang 
---
 client/tests/kvm/tests/kdump.py|   10 ++
 client/tests/kvm/tests_base.cfg.sample |4 +++-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/client/tests/kvm/tests/kdump.py b/client/tests/kvm/tests/kdump.py
index a5843c7..90c8cd2 100644
--- a/client/tests/kvm/tests/kdump.py
+++ b/client/tests/kvm/tests/kdump.py
@@ -18,11 +18,14 @@ def run_kdump(test, params, env):
 timeout = float(params.get("login_timeout", 240))
 crash_timeout = float(params.get("crash_timeout", 360))
 session = kvm_test_utils.wait_for_login(vm, 0, timeout, 0, 2)
-def_kernel_param_cmd = ("grubby --update-kernel=`grubby --default-kernel`"
-" --args=crashkernel=1...@64m")
+def_kernel_param_cmd = "grubby --update-kernel=`grubby --default-kernel`" \
+   " --args=crashkernel=128M"
 kernel_param_cmd = params.get("kernel_param_cmd", def_kernel_param_cmd)
 def_kdump_enable_cmd = "chkconfig kdump on && service kdump start"
 kdump_enable_cmd = params.get("kdump_enable_cmd", def_kdump_enable_cmd)
+def_crash_kernel_prob_cmd = "grep -q 1 /sys/kernel/kexec_crash_loaded"
+crash_kernel_prob_cmd = params.get("crash_kernel_prob_cmd",
+   def_crash_kernel_prob_cmd)
 
 def crash_test(vcpu):
 """
@@ -55,8 +58,7 @@ def run_kdump(test, params, env):
 
 try:
 logging.info("Checking the existence of crash kernel...")
-prob_cmd = "grep -q 1 /sys/kernel/kexec_crash_loaded"
-s = session.get_command_status(prob_cmd)
+s = session.get_command_status(crash_kernel_prob_cmd)
 if s != 0:
 logging.info("Crash kernel is not loaded. Trying to load it")
 # We need to setup the kernel params
diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 2ae7f78..3c248e2 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -667,7 +667,7 @@ variants:
 - clock_getres: install setup unattended_install.cdrom
 type = clock_getres
 
-- kdump:
+- kdump: unattended_install.cdrom
 type = kdump
 # time waited for the completion of crash dump
 # crash_timeout = 360
@@ -675,6 +675,8 @@ variants:
 # kernel_param_cmd = "grubby --update-kernel=`grubby --default-kernel` 
--args=crashkernel=1...@64m"
 # command to enable kdump service
 # kdump_enable_cmd = chkconfig kdump on && service kdump start
+# command to probe the crash kernel
+# crash_kernel_prob_cmd = "grep -q 1 /sys/kernel/kexec_crash_loaded"
 
 # system_powerdown, system_reset and shutdown *must* be the last ones
 # defined (in this order), since the effect of such tests can leave


[PATCH v3 1/2] Type-safe ioport callbacks

2010-11-17 Thread Avi Kivity

The current ioport callbacks are not type-safe, in that they accept an "opaque"
pointer as an argument whose type must match the argument to the registration
function; this is not checked by the compiler.

This patch adds an alternative that is type-safe.  Instead of an opaque
argument, both registration and the callback use a new IORange type.  The
callback then uses container_of() to access its main structures.

Currently the old and new methods exist side by side; once the old way is gone,
we can also save a bunch of memory since the new method requires one pointer
per ioport instead of 6.
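
The shape of the new interface can be modelled loosely in Python. This is
only an analogy: Python has no container_of(), so subclassing stands in for
embedding an IORange in the device state, and ScratchRegs and
legacy_read_thunk are invented for the sketch. What it does show is the thunk
layer, where an old-style callback taking an absolute address is mapped onto
one read op that takes an offset and a width.

```python
class IORange:
    """Python stand-in for the patch's IORange: a base, a length and two ops."""

    def __init__(self, base, length):
        self.base = base
        self.len = length

    def read(self, offset, width):
        raise NotImplementedError

    def write(self, offset, width, data):
        raise NotImplementedError


def legacy_read_thunk(iorange, width):
    """Build an old-style read callback (absolute address in, value out)."""
    def thunk(addr):
        return iorange.read(addr - iorange.base, width)
    return thunk


class ScratchRegs(IORange):
    """Hypothetical device: a few registers keyed by offset."""

    def __init__(self, base):
        super().__init__(base, 64)
        self.regs = {}

    def read(self, offset, width):
        mask = (1 << (8 * width)) - 1
        return self.regs.get(offset, 0) & mask

    def write(self, offset, width, data):
        mask = (1 << (8 * width)) - 1
        self.regs[offset] = data & mask


dev = ScratchRegs(base=0xB000)
dev.write(0x04, 2, 0x12345)          # truncated to 16 bits
readw = legacy_read_thunk(dev, 2)
readb = legacy_read_thunk(dev, 1)
```

One read/write pair serves every access size, and the device code only ever
sees offsets relative to its own base.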

Acked-by: Anthony Liguori 
Signed-off-by: Avi Kivity 
---
 ioport.c  |   64 +
 ioport.h  |2 +
 iorange.h |   30 
 3 files changed, 96 insertions(+), 0 deletions(-)
 create mode 100644 iorange.h

diff --git a/ioport.c b/ioport.c
index ec3dc65..aa4188a 100644
--- a/ioport.c
+++ b/ioport.c
@@ -174,6 +174,70 @@ int register_ioport_write(pio_addr_t start, int length, int size,
 return 0;
 }
 
+static uint32_t ioport_readb_thunk(void *opaque, uint32_t addr)
+{
+IORange *ioport = opaque;
+uint64_t data;
+
+ioport->ops->read(ioport, addr - ioport->base, 1, &data);
+return data;
+}
+
+static uint32_t ioport_readw_thunk(void *opaque, uint32_t addr)
+{
+IORange *ioport = opaque;
+uint64_t data;
+
+ioport->ops->read(ioport, addr - ioport->base, 2, &data);
+return data;
+}
+
+static uint32_t ioport_readl_thunk(void *opaque, uint32_t addr)
+{
+IORange *ioport = opaque;
+uint64_t data;
+
+ioport->ops->read(ioport, addr - ioport->base, 4, &data);
+return data;
+}
+
+static void ioport_writeb_thunk(void *opaque, uint32_t addr, uint32_t data)
+{
+IORange *ioport = opaque;
+
+ioport->ops->write(ioport, addr - ioport->base, 1, data);
+}
+
+static void ioport_writew_thunk(void *opaque, uint32_t addr, uint32_t data)
+{
+IORange *ioport = opaque;
+
+ioport->ops->write(ioport, addr - ioport->base, 2, data);
+}
+
+static void ioport_writel_thunk(void *opaque, uint32_t addr, uint32_t data)
+{
+IORange *ioport = opaque;
+
+ioport->ops->write(ioport, addr - ioport->base, 4, data);
+}
+
+void ioport_register(IORange *ioport)
+{
+register_ioport_read(ioport->base, ioport->len, 1,
+ ioport_readb_thunk, ioport);
+register_ioport_read(ioport->base, ioport->len, 2,
+ ioport_readw_thunk, ioport);
+register_ioport_read(ioport->base, ioport->len, 4,
+ ioport_readl_thunk, ioport);
+register_ioport_write(ioport->base, ioport->len, 1,
+  ioport_writeb_thunk, ioport);
+register_ioport_write(ioport->base, ioport->len, 2,
+  ioport_writew_thunk, ioport);
+register_ioport_write(ioport->base, ioport->len, 4,
+  ioport_writel_thunk, ioport);
+}
+
 void isa_unassign_ioport(pio_addr_t start, int length)
 {
 int i;
diff --git a/ioport.h b/ioport.h
index 3d3c8a3..5ae62a3 100644
--- a/ioport.h
+++ b/ioport.h
@@ -25,6 +25,7 @@
 #define IOPORT_H
 
 #include "qemu-common.h"
+#include "iorange.h"
 
 typedef uint32_t pio_addr_t;
 #define FMT_pioaddr PRIx32
@@ -36,6 +37,7 @@ typedef uint32_t pio_addr_t;
 typedef void (IOPortWriteFunc)(void *opaque, uint32_t address, uint32_t data);
 typedef uint32_t (IOPortReadFunc)(void *opaque, uint32_t address);
 
+void ioport_register(IORange *iorange);
 int register_ioport_read(pio_addr_t start, int length, int size,
  IOPortReadFunc *func, void *opaque);
 int register_ioport_write(pio_addr_t start, int length, int size,
diff --git a/iorange.h b/iorange.h
new file mode 100644
index 0000000..9783168
--- /dev/null
+++ b/iorange.h
@@ -0,0 +1,30 @@
+#ifndef IORANGE_H
+#define IORANGE_H
+
+#include 
+
+typedef struct IORange IORange;
+typedef struct IORangeOps IORangeOps;
+
+struct IORangeOps {
+void (*read)(IORange *iorange, uint64_t offset, unsigned width,
+ uint64_t *data);
+void (*write)(IORange *iorange, uint64_t offset, unsigned width,
+  uint64_t data);
+};
+
+struct IORange {
+const IORangeOps *ops;
+uint64_t base;
+uint64_t len;
+};
+
+static inline void iorange_init(IORange *iorange, const IORangeOps *ops,
+uint64_t base, uint64_t len)
+{
+iorange->ops = ops;
+iorange->base = base;
+iorange->len = len;
+}
+
+#endif
-- 
1.7.3.1


[PATCH v3 0/2] Type-safe ioport callbacks

2010-11-17 Thread Avi Kivity

A not-so-recent qemu -> qemu-kvm merge broke cpu hotplug without the compiler
complaining because of the type-unsafeness of the ioport callbacks.  This
patchset adds a type-safe variant of ioport callbacks and converts a sample
ioport.  Converting the other 300-odd registrations is left as an exercise
to the community.

v3:
 - define a common IORange that can also be used for mmio
 - move start/length into IORange
 - make access width a parameter of the access functions instead of
   having a callback per access size

v2:
 - const correctness
 - avoid return void

Avi Kivity (2):
  Type-safe ioport callbacks
  piix4 acpi: convert io BAR to type-safe ioport callbacks

 hw/acpi_piix4.c |   55 +++
 ioport.c|   64 +++
 ioport.h|2 +
 iorange.h   |   30 +
 4 files changed, 118 insertions(+), 33 deletions(-)
 create mode 100644 iorange.h

-- 
1.7.3.1


[PATCH v3 2/2] piix4 acpi: convert io BAR to type-safe ioport callbacks

2010-11-17 Thread Avi Kivity

Acked-by: Anthony Liguori 
Signed-off-by: Avi Kivity 
---
 hw/acpi_piix4.c |   55 ++-
 1 files changed, 22 insertions(+), 33 deletions(-)

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index f549089..173d781 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -52,6 +52,7 @@ struct pci_status {
 
 typedef struct PIIX4PMState {
 PCIDevice dev;
+IORange ioport;
 uint16_t pmsts;
 uint16_t pmen;
 uint16_t pmcntrl;
@@ -128,10 +129,16 @@ static void pm_tmr_timer(void *opaque)
 pm_update_sci(s);
 }
 
-static void pm_ioport_writew(void *opaque, uint32_t addr, uint32_t val)
+static void pm_ioport_write(IORange *ioport, uint64_t addr, unsigned width,
+uint64_t val)
 {
-PIIX4PMState *s = opaque;
-addr &= 0x3f;
+PIIX4PMState *s = container_of(ioport, PIIX4PMState, ioport);
+
+if (width != 2) {
+PIIX4_DPRINTF("PM write port=0x%04x width=%d val=0x%08x\n",
+  (unsigned)addr, width, (unsigned)val);
+}
+
 switch(addr) {
 case 0x00:
 {
@@ -184,12 +191,12 @@ static void pm_ioport_writew(void *opaque, uint32_t addr, uint32_t val)
 PIIX4_DPRINTF("PM writew port=0x%04x val=0x%04x\n", addr, val);
 }
 
-static uint32_t pm_ioport_readw(void *opaque, uint32_t addr)
+static void pm_ioport_read(IORange *ioport, uint64_t addr, unsigned width,
+uint64_t *data)
 {
-PIIX4PMState *s = opaque;
+PIIX4PMState *s = container_of(ioport, PIIX4PMState, ioport);
 uint32_t val;
 
-addr &= 0x3f;
 switch(addr) {
 case 0x00:
 val = get_pmsts(s);
@@ -200,27 +207,6 @@ static uint32_t pm_ioport_readw(void *opaque, uint32_t addr)
 case 0x04:
 val = s->pmcntrl;
 break;
-default:
-val = 0;
-break;
-}
-PIIX4_DPRINTF("PM readw port=0x%04x val=0x%04x\n", addr, val);
-return val;
-}
-
-static void pm_ioport_writel(void *opaque, uint32_t addr, uint32_t val)
-{
-//PIIX4PMState *s = opaque;
-PIIX4_DPRINTF("PM writel port=0x%04x val=0x%08x\n", addr & 0x3f, val);
-}
-
-static uint32_t pm_ioport_readl(void *opaque, uint32_t addr)
-{
-PIIX4PMState *s = opaque;
-uint32_t val;
-
-addr &= 0x3f;
-switch(addr) {
 case 0x08:
 val = get_pmtmr(s);
 break;
@@ -228,10 +214,15 @@ static uint32_t pm_ioport_readl(void *opaque, uint32_t addr)
 val = 0;
 break;
 }
-PIIX4_DPRINTF("PM readl port=0x%04x val=0x%08x\n", addr, val);
-return val;
+PIIX4_DPRINTF("PM readw port=0x%04x val=0x%04x\n", addr, val);
+*data = val;
 }
 
+static const IORangeOps pm_iorange_ops = {
+.read = pm_ioport_read,
+.write = pm_ioport_write,
+};
+
 static void apm_ctrl_changed(uint32_t val, void *arg)
 {
 PIIX4PMState *s = arg;
@@ -265,10 +256,8 @@ static void pm_io_space_update(PIIX4PMState *s)
 
 /* XXX: need to improve memory and ioport allocation */
 PIIX4_DPRINTF("PM: mapping to 0x%x\n", pm_io_base);
-register_ioport_write(pm_io_base, 64, 2, pm_ioport_writew, s);
-register_ioport_read(pm_io_base, 64, 2, pm_ioport_readw, s);
-register_ioport_write(pm_io_base, 64, 4, pm_ioport_writel, s);
-register_ioport_read(pm_io_base, 64, 4, pm_ioport_readl, s);
+iorange_init(&s->ioport, &pm_iorange_ops, pm_io_base, 64);
+ioport_register(&s->ioport);
 }
 }
 
-- 
1.7.3.1


Re: [Qemu-devel] [PATCH v2 2/2] RAM API: Make use of it for x86 PC

2010-11-17 Thread Gleb Natapov

On Tue, Nov 16, 2010 at 02:24:06PM -0700, Alex Williamson wrote:
> On Tue, 2010-11-16 at 08:58 -0600, Anthony Liguori wrote:
> > On 11/01/2010 10:14 AM, Alex Williamson wrote:
> > > Register the actual VM RAM using the new API
> > >
> > > Signed-off-by: Alex Williamson
> > > ---
> > >
> > >   hw/pc.c |   12 ++--
> > >   1 files changed, 6 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/hw/pc.c b/hw/pc.c
> > > index 69b13bf..0ea6d10 100644
> > > --- a/hw/pc.c
> > > +++ b/hw/pc.c
> > > @@ -912,14 +912,14 @@ void pc_memory_init(ram_addr_t ram_size,
> > >   /* allocate RAM */
> > >   ram_addr = qemu_ram_alloc(NULL, "pc.ram",
> > > below_4g_mem_size + above_4g_mem_size);
> > > -cpu_register_physical_memory(0, 0xa0000, ram_addr);
> > > -cpu_register_physical_memory(0x100000,
> > > - below_4g_mem_size - 0x100000,
> > > - ram_addr + 0x100000);
> > > +
> > > +qemu_ram_register(0, 0xa0000, ram_addr);
> > > +qemu_ram_register(0x100000, below_4g_mem_size - 0x100000,
> > > +  ram_addr + 0x100000);
> > >   #if TARGET_PHYS_ADDR_BITS>  32
> > >   if (above_4g_mem_size>  0) {
> > > -cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
> > > - ram_addr + below_4g_mem_size);
> > > +qemu_ram_register(0x100000000ULL, above_4g_mem_size,
> > > +  ram_addr + below_4g_mem_size);
> > >   }
> > >
> > 
> > Take a look at the memory shadowing in the i440fx.  The regions of 
> > memory in the BIOS area can temporarily become RAM.
> > 
> > That's because there is normally RAM backing this space but the memory 
> > controller redirects writes to the ROM space.
> > 
> > Not sure the best way to handle this, but the basic concept is, RAM 
> > always exists but if a device tries to access it, it may or may not be 
> > accessible as RAM at any given point in time.
> 
> Gack.  For the benefit of those that want to join the fun without
> digging up the spec, these magic flippable segments the i440fx can
> toggle are 12 fixed 16k segments from 0xc0000 to 0xeffff and a single
> 64k segment from 0xf0000 to 0xfffff.  There are read-enable and
> write-enable bits for each, so the chipset can be configured to read
> from the bios and write to memory (to setup BIOS-RAM caching), and read
> from memory and write to the bios (to enable BIOS-RAM caching).  The
> other bit combinations are also available.
> 
There is also 0xa0000-0xbffff, which is usually part of the framebuffer, but
the chipset can be configured to access this memory as RAM when the CPU is in
SMM mode.
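
For reference, the ranges under discussion can be enumerated with a short
Python sketch. The addresses assume the usual legacy PC layout (twelve 16k
segments starting at 0xc0000, one 64k segment at 0xf0000, and the
0xa0000-0xbffff window used for SMM); the function and constant names are
invented for this illustration.

```python
KIB = 1024


def pam_segments():
    """Return (start, end) pairs, inclusive, for the flippable legacy segments."""
    segments = []
    start = 0xC0000
    for _ in range(12):                  # twelve 16k segments
        segments.append((start, start + 16 * KIB - 1))
        start += 16 * KIB
    segments.append((0xF0000, 0xFFFFF))  # one 64k BIOS segment
    return segments


# Legacy framebuffer window that can be mapped as RAM while in SMM
SMRAM_WINDOW = (0xA0000, 0xBFFFF)

segs = pam_segments()
```

The twelve small segments end exactly where the 64k segment begins, so
together with the SMM window they cover everything between 0xa0000 and 1M.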

> For my purpose in using this to program the IOMMU with guest physical to
> host virtual addresses for device assignment, it doesn't really matter
> since there should never be a DMA in this range of memory.  But for a
IIRC the spec defines, for each range of memory, whether it is accessible from the PCI bus.

> general RAM API, I'm not sure either.  I'm tempted to say that while
> this is in fact a use of RAM, the RAM is never presented to the guest as
> usable system memory (E820_RAM for x86), and should therefore be
> excluded from the RAM API if we're using it only to track regions that
> are actual guest usable physical memory.
A guest is not only the OS (like Windows or Linux); the BIOS code is also part
of the guest, and it can access all of this memory.

> 
> We had talked on irc that pc.c should be registering 0x0 to
> below_4g_mem_size as ram, but now I tend to disagree with that.  The
> memory backing 0xa0000-0x100000 is present, but it's not presented to
> the guest as usable RAM.
It is, during SMM, if the BIOS configured the chipset to do so.
 
>  What's your strict definition of what the RAM
> API includes?  Is it only what the guest could consider usable RAM or
> does it also include quirky chipset accelerator features like this
> (everything with a guest physical address)?  Thanks,
> 

--
Gleb.

Re: [PATCH v5-light 0/6] KVM: Improve IRQ assignment for device passthrough

2010-11-17 Thread Avi Kivity


On 11/16/2010 11:30 PM, Jan Kiszka wrote:

This is the rebased light version of the previous series, i.e. without
PCI-2.3-based IRQ masking or any SRCU conversion. PCI-2.3 support is
under rework to explore options for automatic mode switches.



Looks good, especially the threaded irq - a real winner.

--
error compiling committee.c: too many arguments to function


Re: [PATCH v4 0/9] KVM: Improve IRQ assignment for device passthrough

2010-11-17 Thread Avi Kivity


On 11/16/2010 08:26 PM, Jan Kiszka wrote:

On 16.11.2010 17:55, Marcelo Tosatti wrote:
>  On Mon, Nov 08, 2010 at 12:21:44PM +0100, Jan Kiszka wrote:
>>  Nine patches (yeah, it's getting more and more) to improve "classic"
>>  device assignment w.r.t. IRQs. Highlight is the last one that resolves the
>>  host IRQ sharing issue for all PCI 2.3 devices. Quite essential when
>>  passing non-MSI-ready devices like many USB host controllers.
>>
>>  As there were concerns regarding the overhead of IRQ masking via the PCI
>>  config space, I did some micro-benchmarks. Well, the concerns are valid:
>>
>>  disable_irq_nosync:            ~600 cycles
>>  pci_2_3_irq_check_and_mask:   ~6000 cycles (EHCI)
>>                               ~22000 cycles (AR9287, with peaks>10)
>>
>>  Specifically the varying impact of the device like in the Atheros case
>>  is worrying (this device is actually known to cause horrible latencies
>>  to the host, but who knows what other devices do). So I decided to go
>>  with PCI-2.3 masking as default off in the to-be-sent qemu-kvm patch.
>>  Maybe something to consider for VFIO as well.
>
>  Looks fine to me. Michael, Alex?
>
>  Also needs a rebase.

I think Avi wanted to explore possibilities to avoid host_pci_2_3=on|off
by switching between both modes on demand. I've some ideas, but still
need to look into design details. Given that this would have impact on
the ABI, I guess we better wait with merging 9/9.


Indeed.


Moreover, Avi preferred to skip the srcu conversion and run with a
simple lock across kvm_set_irq (just like we do now).


Yes.  It was predicated on agreeing that deferring broadcast/multicast 
IPIs to a workqueue is possible.


--
error compiling committee.c: too many arguments to function


RE: [PATCH v15 00/17] Provide a zero-copy method on KVM virtio-net.

2010-11-17 Thread Xin, Xiaohui

>-----Original Message-----
>From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Xin, Xiaohui
>Sent: Thursday, November 11, 2010 4:28 PM
>To: David Miller
>Cc: net...@vger.kernel.org; kvm@vger.kernel.org; linux-ker...@vger.kernel.org;
>m...@redhat.com; mi...@elte.hu; herb...@gondor.apana.org.au; 
>jd...@linux.intel.com
>Subject: RE: [PATCH v15 00/17] Provide a zero-copy method on KVM virtio-net.
>
>>-----Original Message-----
>>From: David Miller [mailto:da...@davemloft.net]
>>Sent: Thursday, November 11, 2010 1:47 AM
>>To: Xin, Xiaohui
>>Cc: net...@vger.kernel.org; kvm@vger.kernel.org; linux-ker...@vger.kernel.org;
>>m...@redhat.com; mi...@elte.hu; herb...@gondor.apana.org.au; 
>>jd...@linux.intel.com
>>Subject: Re: [PATCH v15 00/17] Provide a zero-copy method on KVM virtio-net.
>>
>>From: xiaohui@intel.com
>>Date: Wed, 10 Nov 2010 17:23:28 +0800
>>
>>> From: Xin Xiaohui 
>>>
>>>> 2) The idea to key off of skb->dev in skb_release_data() is
>>>>    fundamentally flawed since many actions can change skb->dev on you,
>>>>    which will end up causing a leak of your external data areas.
>>>
>>> How about this one? If the destructor_arg is not a good candidate,
>>> then I have to add an apparent field in shinfo.
>>
>>If destructor_arg is actually a net_device pointer or similar,
>>you will need to take a reference count on it or similar.
>>
>Do you mean destructor_arg will be consumed by another user?
>If that is the case, may I add a new structure member in shinfo?
>Then only zero-copy will use it, and there is no need for the reference count.
>
How about this? We really need somewhere to track the external data area,
so that if something goes wrong with it, we can still release the data area.
We think skb_release_data() is the right place to deal with it. If I
understood correctly, destructor_arg may be used by someone else, and that is
why a reference count is needed. In that case, how about adding a new
structure member to shinfo?
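
For illustration only, here is a minimal user-space sketch of that idea. It is
not code from the patch series and not the kernel's sk_buff API: toy_skb,
toy_shinfo, ext_info, ext_page_info and toy_release_data are all hypothetical
names. The point is that the release path keys off a dedicated member of the
shared info instead of skb->dev, so a change of dev should not leak the
external data area.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for an externally owned (zero-copy) data area. */
struct ext_page_info {
    void *buf;                                   /* external buffer */
    void (*release)(struct ext_page_info *info); /* hands the buffer back */
    int released;                                /* counts release calls */
};

/* Toy model of the shared info; 'ext_info' is the proposed new member. */
struct toy_shinfo {
    void *destructor_arg;           /* left untouched for other users */
    struct ext_page_info *ext_info; /* NULL for ordinary buffers */
};

/* Toy model of a socket buffer. */
struct toy_skb {
    void *dev;                      /* may change during the lifetime */
    struct toy_shinfo shinfo;
};

/*
 * Stand-in for the release hook in skb_release_data(): it never looks
 * at skb->dev, only at the dedicated member, and clears the member so
 * the external area is released at most once.
 */
void toy_release_data(struct toy_skb *skb)
{
    struct ext_page_info *info = skb->shinfo.ext_info;

    if (info) {
        skb->shinfo.ext_info = NULL;
        info->release(info);
    }
}

/* Example release callback that only records that it ran. */
void mark_released(struct ext_page_info *info)
{
    info->released++;
}
```

In this toy model no reference count is taken because only the zero-copy path
sets ext_info; whether that holds in the real stack (for example when buffers
are cloned) is an assumption that would still need checking.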

Thanks
Xiaohui 

>>Which means --> good bye performance especially on SMP.
>>
>>You're going to be adding new serialization points and at
>>least two new atomics per packet.
