Re: [PATCH RFC] virtio-net: remove useless disable on freeze

2012-05-30 Thread Rusty Russell
On Mon, 28 May 2012 15:53:25 +0300, "Michael S. Tsirkin"  
wrote:
> On Wed, Apr 04, 2012 at 12:19:54PM +0300, Michael S. Tsirkin wrote:
> > disable_cb is just an optimization: it
> > can not guarantee that there are no callbacks.
> > 
> > I didn't yet figure out whether a callback
> > in freeze will trigger a bug, but disable_cb
> > won't address it in any case. So let's remove
> > the useless calls as a first step.
> > 
> > Signed-off-by: Michael S. Tsirkin 
> 
> Looks like this isn't in the 3.5 pull request -
> just lost in the shuffle?
> disable_cb is advisory so can't be relied upon.

I always (try to?) reply as I accept patches.

This one did slip by, but it's harmless so no need to push AFAICT.

Applied.

Thanks!
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] spapr: Add "memop" hypercall

2012-05-30 Thread Benjamin Herrenschmidt
On Mon, 2012-05-28 at 13:40 +0300, Avi Kivity wrote:
> Depends.  How do you detect it exists?  Are you detecting kvm, or qemu,
> or the hypercall itself?
> 
> I'd hate us to find ourselves in a maze of disconnected documentation
> with no clear guidelines on when a feature is available and when it is not.

At the moment SLOF just "uses it" when using the frame buffer. We could
advertise its presence via the device-tree, there's already stuff there
to expose what hypercalls or set of hypecalls are implemented for PAPR,
we could add qemu specific extensions.

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/1] Enable LTR/OBFF before device is used by driver

2012-05-30 Thread Hao, Xudong
> -Original Message-
> From: Don Dutile [mailto:ddut...@redhat.com]
> Sent: Thursday, May 31, 2012 12:13 AM
> To: Xudong Hao
> Cc: bhelg...@google.com; linux-...@vger.kernel.org;
> linux-ker...@vger.kernel.org; kvm@vger.kernel.org; a...@redhat.com;
> alex.william...@redhat.com; Zhang, Xiantao; Hao, Xudong
> Subject: Re: [PATCH 1/1] Enable LTR/OBFF before device is used by driver
> 
> While you are making the other recommended fixes, could
> you add/create a pci_obff_supported() function, like the pci_ltr_supported()
> function, and more importantly, add it to the pci_disable_obff() function?
> The latter does not check that DEVCAP2 is supported, and thus, could be
> writing to non-existent (potentially private) cap space (i.e., a V1 CAP 
> device,
> which do exist in the marketplace).
> 
> The [enable,disable]_ltr functions do a good job of doing pci_ltr_supported()
> checks, but the obff enable,disable functions should have similar calls.
> 

Yeah, it's a good suggestion, I'll add pci_obff_supported() function.

> 
> On 05/13/2012 10:48 PM, Xudong Hao wrote:
> > Enable LTR(Latency tolerance reporting) and OBFF(optimized buffer 
> > flush/fill)
> in
> >   pci_enable_device(), so that they are enabled before the device is used by
> driver.
> >
> > Signed-off-by: Xudong Hao
> >
> > ---
> >   drivers/pci/pci.c |   29 +
> >   1 files changed, 29 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 111569c..2369883 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -1134,6 +1134,31 @@ int pci_load_and_free_saved_state(struct
> pci_dev *dev,
> >   }
> >   EXPORT_SYMBOL_GPL(pci_load_and_free_saved_state);
> >
> > +static void pci_enable_dev_caps(struct pci_dev *dev)
> > +{
> > +   /* set default value */
> > +   unsigned long type = PCI_EXP_OBFF_SIGNAL_ALWAYS;
> > +
> > +   /* LTR(Latency tolerance reporting) allows devices to send
> > +* messages to the root complex indicating their latency
> > +* tolerance for snooped&  unsnooped memory transactions.
> > +*/
> > +   pci_enable_ltr(dev);
> > +
> > +   /* OBFF (optimized buffer flush/fill), where supported,
> > +* can help improve energy efficiency by giving devices
> > +* information about when interrupts and other activity
> > +* will have a reduced power impact.
> > +*/
> > +   pci_enable_obff(dev, type);
> > +}
> > +
> > +static void pci_disable_dev_caps(struct pci_dev *dev)
> > +{
> > +   pci_disable_obff(dev);
> > +   pci_disable_ltr(dev);
> > +}
> > +
> >   static int do_pci_enable_device(struct pci_dev *dev, int bars)
> >   {
> >  int err;
> > @@ -1146,6 +1171,9 @@ static int do_pci_enable_device(struct pci_dev
> *dev, int bars)
> >  return err;
> >  pci_fixup_device(pci_fixup_enable, dev);
> >
> > +   /* Enable some device capibility before it's used by driver. */
> > +   pci_enable_dev_caps(dev);
> > +
> >  return 0;
> >   }
> >
> > @@ -1361,6 +1389,7 @@ static void do_pci_disable_device(struct pci_dev
> *dev)
> >  }
> >
> >  pcibios_disable_device(dev);
> > +   pci_disable_dev_caps(dev);
> >   }
> >
> >   /**
> > --
> > 1.6.0.rc1
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] KVM updates for v3.5

2012-05-30 Thread Marcelo Tosatti

Linus,

Please pull from

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git master

to receive a build fix for non-kvm archs and a transparent hugepage
refcount bugfix on hosts with 4M pages.


Avi Kivity (1):
  KVM: Export asm-generic/kvm_para.h

Xiao Guangrong (1):
  KVM: MMU: fix huge page adapted on non-PAE host

 arch/x86/kvm/mmu.c |3 +--
 include/asm-generic/Kbuild |1 +
 2 files changed, 2 insertions(+), 2 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Autotest] [PATCH] KVM test: use new functions in cdrom_test

2012-05-30 Thread Lucas Meneghel Rodrigues
By the way, looks good, applied to next.

On Wed, May 30, 2012 at 11:51 AM, Lukáš Doktor  wrote:
> I forgot add pull request link:
> https://github.com/autotest/autotest/pull/368
>
> Dne 30.5.2012 16:43, Lukáš Doktor napsal(a):
>> Use get_block and other framework functions in cdrom test. Also
>> don't fail the whole test when tray-status reporting is not supported
>> by qemu and other cleanups.
>>
>> Signed-off-by: Lukáš Doktor
>> ---
>>   client/tests/kvm/tests/cdrom.py |  118 
>> ---
>>   client/virt/subtests.cfg.sample |    2 +
>>   2 files changed, 50 insertions(+), 70 deletions(-)
>>
>> diff --git a/client/tests/kvm/tests/cdrom.py 
>> b/client/tests/kvm/tests/cdrom.py
>> index 089150b..4390796 100644
>> --- a/client/tests/kvm/tests/cdrom.py
>> +++ b/client/tests/kvm/tests/cdrom.py
>> @@ -21,7 +21,7 @@ def run_cdrom(test, params, env):
>>       3) * If cdrom_test_autounlock is set, verifies that device is unlocked
>>          <300s after boot
>>       4) Eject cdrom using monitor and change with another iso several times.
>> -    5) Eject cdrom in guest and check tray status reporting.
>> +    5) * If cdrom_test_tray_status = yes, tests tray reporting.
>>       6) Try to format cdrom and check the return string.
>>       7) Mount cdrom device.
>>       8) Copy file from cdrom and compare files using diff.
>> @@ -35,6 +35,10 @@ def run_cdrom(test, params, env):
>>                                           eject CDROM directly after insert
>>       @param cfg: cdrom_test_autounlock - Test whether guest OS unlocks cdrom
>>                                           after boot (<300s after VM is 
>> booted)
>> +    @param cfg: cdrom_test_tray_status - Test tray reporting (eject and 
>> insert
>> +                                         CD couple of times in guest).
>> +
>> +    @warning: Check dmesg for block device failures
>>       """
>>       def master_cdroms(params):
>>           """ Creates 'new' cdrom with one file on it """
>> @@ -43,7 +47,7 @@ def run_cdrom(test, params, env):
>>           cdrom_cd1 = params.get("cdrom_cd1")
>>           if not os.path.isabs(cdrom_cd1):
>>               cdrom_cd1 = os.path.join(test.bindir, cdrom_cd1)
>> -        cdrom_dir = os.path.realpath(os.path.dirname(cdrom_cd1))
>> +        cdrom_dir = os.path.dirname(cdrom_cd1)
>>           utils.run("dd if=/dev/urandom of=orig bs=10M count=1")
>>           utils.run("dd if=/dev/urandom of=new bs=10M count=1")
>>           utils.run("mkisofs -o %s/orig.iso orig" % cdrom_dir)
>> @@ -55,57 +59,27 @@ def run_cdrom(test, params, env):
>>           error.context("cleaning up temp cdrom images")
>>           os.remove("%s/new.iso" % cdrom_dir)
>>
>> -    def get_block_info(re_device='[^\n][^:]+'):
>> -        """ Gets device string and file from kvm-monitor """
>> -        blocks = vm.monitor.info("block")
>> -        devices = []
>> -        files = []
>> -        if isinstance(blocks, str):
>> -            devices = re.findall('(%s): .*' % re_device, blocks)
>> -            if devices:
>> -                for dev in devices:
>> -                    cdfile = re.findall('%s: .*file=(\S*) ' % dev, blocks)
>> -                    if cdfile:
>> -                        cdfile = os.path.realpath(cdfile[0])
>> -                    else:
>> -                        cdfile = None
>> -                    files.append(cdfile)
>> -        else:
>> -            for block in blocks:
>> -                if re.match(re_device, block['device']):
>> -                    devices.append(block['device'])
>> -                    try:
>> -                        cdfile = block['inserted']['file']
>> -                        if cdfile:
>> -                            cdfile = os.path.realpath(cdfile)
>> -                    except KeyError:
>> -                        cdfile = None
>> -                    files.append(cdfile)
>> -        return (devices, files)
>> -
>> -    def get_cdrom_info(device):
>> +    def get_cdrom_file(device):
>>           """
>>           @param device: qemu monitor device
>>           @return: file associated with $device device
>>           """
>> -        (_, cdfile) = get_block_info(device)
>> -        logging.debug("Device name: %s, ISO: %s", device, cdfile[0])
>> -        return cdfile[0]
>> -
>> -    def check_cdrom_locked(cdrom):
>> -        """ Checks whether the cdrom is locked """
>>           blocks = vm.monitor.info("block")
>> +        cdfile = None
>>           if isinstance(blocks, str):
>> -            lock_str = "locked=1"
>> -            for block in blocks.splitlines():
>> -                if cdrom in block and lock_str in block:
>> -                    return True
>> +            cdfile = re.findall('%s: .*file=(\S*) ' % device, blocks)
>> +            if not cdfile:
>> +                return None
>> +            else:
>> +                cdfile = cdfile[0]
>>           else:
>>               for block in blocks:
>> -                if ('inserted' in block.keys() 

[PATCH] virt: Add Fedora 17 to the list of guests

2012-05-30 Thread Lucas Meneghel Rodrigues
Also, make it the default guest for KVM autotest.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/tests.cfg.sample |   18 -
 client/tests/libvirt/tests.cfg.sample |   21 ++--
 client/virt/guest-os.cfg.sample   |   28 ++
 client/virt/unattended/Fedora-17.ks   |   35 +
 client/virt/virt_utils.py |6 +++---
 5 files changed, 86 insertions(+), 22 deletions(-)
 create mode 100644 client/virt/unattended/Fedora-17.ks

diff --git a/client/tests/kvm/tests.cfg.sample 
b/client/tests/kvm/tests.cfg.sample
index 9329a05..912232e 100644
--- a/client/tests/kvm/tests.cfg.sample
+++ b/client/tests/kvm/tests.cfg.sample
@@ -57,8 +57,8 @@ variants:
 # Subtest choice. You can modify that line to add more subtests
 only unattended_install.cdrom, boot, shutdown
 
-# Runs qemu, f16 64 bit guest OS, install, boot, shutdown
-- @qemu_f16_quick:
+# Runs qemu, f17 64 bit guest OS, install, boot, shutdown
+- @qemu_f17_quick:
 # We want qemu for this run
 qemu_binary = /usr/bin/qemu
 qemu_img_binary = /usr/bin/qemu-img
@@ -72,13 +72,13 @@ variants:
 only no_9p_export
 only no_pci_assignable
 only smallpages
-only Fedora.16.64
+only Fedora.17.64
 only unattended_install.cdrom.extra_cdrom_ks, boot, shutdown
 # qemu needs -enable-kvm on the cmdline
 extra_params += ' -enable-kvm'
 
-# Runs qemu-kvm, f16 64 bit guest OS, install, boot, shutdown
-- @qemu_kvm_f16_quick:
+# Runs qemu-kvm, f17 64 bit guest OS, install, boot, shutdown
+- @qemu_kvm_f17_quick:
 # We want qemu-kvm for this run
 qemu_binary = /usr/bin/qemu-kvm
 qemu_img_binary = /usr/bin/qemu-img
@@ -90,10 +90,10 @@ variants:
 only no_9p_export
 only no_pci_assignable
 only smallpages
-only Fedora.16.64
+only Fedora.17.64
 only unattended_install.cdrom.extra_cdrom_ks, boot, shutdown
 
-# Runs qemu-kvm, f16 64 bit guest OS, install, starts qemu-kvm
+# Runs qemu-kvm, f17 64 bit guest OS, install, starts qemu-kvm
 # with 9P support and runs 9P CI tests
 - @qemu_kvm_9p_export:
 qemu_binary = /usr/bin/qemu-kvm
@@ -106,7 +106,7 @@ variants:
 only no_pci_assignable
 only smallpages
 only 9p_export
-only Fedora.16.64
+only Fedora.17.64
 only unattended_install.cdrom.extra_cdrom_ks, boot, 9p.9p_ci, shutdown
 
 # Runs your own guest image (qcow2, can be adjusted), all migration tests
@@ -129,4 +129,4 @@ variants:
 only migrate
 
 # Choose your test list from the testsets defined
-only qemu_kvm_f16_quick
+only qemu_kvm_f17_quick
diff --git a/client/tests/libvirt/tests.cfg.sample 
b/client/tests/libvirt/tests.cfg.sample
index 548caf8..2b17d50 100644
--- a/client/tests/libvirt/tests.cfg.sample
+++ b/client/tests/libvirt/tests.cfg.sample
@@ -6,8 +6,8 @@
 include tests-shared.cfg
 
 variants:
-# Runs virt-install, f16 64 bit guest OS, install, boot, shutdown
-- @libvirt_f16_quick:
+# Runs virt-install, f17 64 bit guest OS, install, boot, shutdown
+- @libvirt_f17_quick:
 virt_install_binary = /usr/bin/virt-install
 qemu_img_binary = /usr/bin/qemu-img
 hvm_or_pv = hvm
@@ -18,11 +18,12 @@ variants:
 only no_9p_export
 only no_pci_assignable
 only smallpages
-only Fedora.16.64
+only Fedora.17.64
 only unattended_install.cdrom.extra_cdrom_ks, boot, reboot, shutdown, 
remove_guest.with_disk
 
-# Runs virt-install, f16 64 as a 64 bit PV guest OS, install, boot, 
shutdown
-- @libvirt_xenpv_f16_quick:
+
+# Runs virt-install, f17 64 as a 64 bit PV guest OS, install, boot, 
shutdown
+- @libvirt_xenpv_f17_quick:
 virt_install_binary = /usr/bin/virt-install
 qemu_img_binary = /usr/bin/qemu-img
 hvm_or_pv = pv
@@ -34,12 +35,12 @@ variants:
 only no_9p_export
 only no_pci_assignable
 only smallpages
-only Fedora.16.64
+only Fedora.17.64
 only unattended_install.cdrom.http_ks, boot, reboot, shutdown, 
remove_guest.with_disk
 
-# Runs virt-install, f16 64 as a 64 bit HVM (full virt) guest OS,
+# Runs virt-install, f17 64 as a 64 bit HVM (full virt) guest OS,
 # install, boot, shutdown
-- @libvirt_xenhvm_f16_quick:
+- @libvirt_xenhvm_f17_quick:
 virt_install_binary = /usr/bin/virt-install
 qemu_img_binary = /usr/bin/qemu-img
 hvm_or_pv = hvm
@@ -52,7 +53,7 @@ variants:
 only no_9p_export
 only no_pci_assignable
 only smallpages
-only Fedora.16.64
+only Fedora.17.64
 only unattended_install.cdrom.in_cdrom_ks, boot, reboot, shutdown, 
remove_guest.with_disk
 
 # Runs virt-install, RHEL 6.0 64 bit guest OS, install, boot, shutdown
@@ -90,4 

Re: [PATCHv2] kvm: optimize ISR lookups

2012-05-30 Thread Marcelo Tosatti
On Tue, May 22, 2012 at 03:54:56PM +0300, Michael S. Tsirkin wrote:
> We perform ISR lookups twice: during interrupt
> injection and on EOI. Typical workloads only have
> a single bit set there. So we can avoid ISR scans by
> 1. counting bits as we set/clear them in ISR
> 2. if count is 1, caching the vector number
> 3. if count != 1, invalidating the cache
> 
> The real purpose of this is enabling PV EOI
> which needs to quickly validate the vector.
> But non PV guests might also benefit.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
> 
> I am well aware of Thomas and Peter's suggestion of reworking APIC
> register handling in kvm instead of adding a cache like this patch does.
> 
> This revision does *not* address that comment yet: it only corrects a
> bug in the original patch.
> 
> Posting in this form for ease of testing.
> 
> Changes from v1:
>   replace ASSERT by BUG_ON, correcting inverted logic
> 
>  arch/x86/kvm/lapic.c |   51 -
>  arch/x86/kvm/lapic.h |2 +
>  2 files changed, 51 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 93c1574..0d2985d 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -107,6 +107,16 @@ static inline void apic_clear_vector(int vec, void 
> *bitmap)
>   clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
>  }
>  
> +static inline int __apic_test_and_set_vector(int vec, void *bitmap)
> +{
> + return __test_and_set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
> +}
> +
> +static inline int __apic_test_and_clear_vector(int vec, void *bitmap)
> +{
> + return __test_and_clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
> +}
> +
>  static inline int apic_hw_enabled(struct kvm_lapic *apic)
>  {
>   return (apic)->vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE;
> @@ -210,6 +220,16 @@ static int find_highest_vector(void *bitmap)
>   return fls(word[word_offset << 2]) - 1 + (word_offset << 5);
>  }
>  
> +static u8 count_vectors(void *bitmap)
> +{
> + u32 *word = bitmap;
> + int word_offset;
> + u8 count = 0;
> + for (word_offset = 0; word_offset < MAX_APIC_VECTOR >> 5; ++word_offset)
> + count += hweight32(word[word_offset << 2]);
> + return count;
> +}
> +
>  static inline int apic_test_and_set_irr(int vec, struct kvm_lapic *apic)
>  {
>   apic->irr_pending = true;
> @@ -242,6 +262,25 @@ static inline void apic_clear_irr(int vec, struct 
> kvm_lapic *apic)
>   apic->irr_pending = true;
>  }
>  
> +static inline void apic_set_isr(int vec, struct kvm_lapic *apic)
> +{
> + if (!__apic_test_and_set_vector(vec, apic->regs + APIC_ISR))
> + ++apic->isr_count;
> + BUG_ON(apic->isr_count > MAX_APIC_VECTOR);
> + if (likely(apic->isr_count == 1))
> + apic->isr_cache = vec;
> + else
> + apic->isr_cache = -1;
> +}
> +
> +static inline void apic_clear_isr(int vec, struct kvm_lapic *apic)
> +{
> + if (__apic_test_and_clear_vector(vec, apic->regs + APIC_ISR))
> + --apic->isr_count;
> + BUG_ON(apic->isr_count < 0);
> + apic->isr_cache = -1;
> +}
> +
>  int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
>  {
>   struct kvm_lapic *apic = vcpu->arch.apic;
> @@ -273,6 +312,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct 
> kvm_lapic_irq *irq)
>  static inline int apic_find_highest_isr(struct kvm_lapic *apic)
>  {
>   int result;
> + if (!apic->isr_count)
> + return -1;
> + if (likely(apic->isr_cache != -1))

assert(isr_count == 1).

Looks fine otherwise. Gleb can you review please?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Biweekly KVM Test report, kernel 51bfd299... qemu a1fce560...

2012-05-30 Thread Marcelo Tosatti
On Tue, May 22, 2012 at 07:40:29AM +, Ren, Yongjie wrote:
> > >>
> > > Latest commit 3fd9fedb in qemu-kvm master tree still has this issue.
> > > And, the commit ID provided in Launchpad is correct.
> > 
> > Can you please check if the bug exists in upstream qemu.git as well?
> > 
> This bug doesn't exist on upstream qemu.git with latest commit: fd4567d9.
> So, it should only exists on qemu-kvm tree.

What do you mean by "If you press some key to continue, after being
automatically repaired, the guest can boot up." exactly?

That the error is seen once and after pressing a key it disappears?

> > In any case, it would be helpful to bisect the problem.
> > 
> > Kevin
> N?r??yb?X??ǧv?^?)޺{.n?+h????ܨ}???Ơz?&j:+v???zZ+??+zf???h???~i???z??w?&?)ߢf
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Biweekly KVM Test report, kernel 51bfd299... qemu a1fce560...

2012-05-30 Thread Marcelo Tosatti
On Tue, May 22, 2012 at 07:40:29AM +, Ren, Yongjie wrote:
> > -Original Message-
> > From: Kevin Wolf [mailto:kw...@redhat.com]
> > Sent: Monday, May 21, 2012 11:30 PM
> > To: Ren, Yongjie
> > Cc: Avi Kivity; kvm@vger.kernel.org; Liu, RongrongX
> > Subject: Re: Biweekly KVM Test report, kernel 51bfd299... qemu
> > a1fce560...
> > 
> > Am 21.05.2012 11:45, schrieb Ren, Yongjie:
> > >> -Original Message-
> > >> From: Kevin Wolf [mailto:kw...@redhat.com]
> > >> Sent: Monday, May 21, 2012 5:05 PM
> > >> To: Avi Kivity
> > >> Cc: Ren, Yongjie; kvm@vger.kernel.org
> > >> Subject: Re: Biweekly KVM Test report, kernel 51bfd299... qemu
> > >> a1fce560...
> > >>
> > >> Am 21.05.2012 10:27, schrieb Avi Kivity:
> > >>> On 05/21/2012 06:34 AM, Ren, Yongjie wrote:
> >  Hi All,
> > 
> >  This is KVM upstream test result against kvm.git
> > >> 51bfd2998113e1f8ce8dcf853407b76a04b5f2a0 based on kernel
> > 3.4.0-rc7,
> > >> and qemu-kvm.git a1fce560c0e5f287ed65d2aaadb3e59578aaa983.
> > 
> >  We found 1 new bug and 1 bug got fixed in the past two weeks.
> > 
> >  New issue (1):
> >  1. disk error when guest boot up via qcow2 image
> >    https://bugs.launchpad.net/qemu/+bug/1002121
> >    -- Should be a regression on qemu-kvm.
> > 
> > >>>
> > >>> Kevin, is this the known regression in qcow2 or something new?
> > >>
> > >> If the commit ID is right, it must be something new. The regression that
> > >> Marcelo found was fixed in 54e68143.
> > >>
> > > Yes, it's right. This should be a new regression.
> > > I looked at the comment of 54e68143, and found it was not related the
> > issue I reported.
> > >
> > >> The Launchpad bug refers to commit e54f008ef, which doesn't include
> > this
> > >> fix indeed. So was the test repeated with a more current qemu-kvm
> > >> version after filing the bug in Launchpad, or is the commit ID in this
> > >> mail wrong?
> > >>
> > > Latest commit 3fd9fedb in qemu-kvm master tree still has this issue.
> > > And, the commit ID provided in Launchpad is correct.
> > 
> > Can you please check if the bug exists in upstream qemu.git as well?
> > 
> This bug doesn't exist on upstream qemu.git with latest commit: fd4567d9.
> So, it should only exists on qemu-kvm tree.

Please bisect manually (not using git bisect), with the attached list of 
commits. These are the qemu -> qemu-kvm merge commits in the range
described as bad/good.

4a808cd6ff25fa3d7f019dc56f9369c48c415645
87e51f4e6a5c07f336709d824fe7fe9e60c8f730
aa7159eea0f1f45c56235e77f40229d0d02c0384
9d81ede5d772c95c503f834a6c7ddc78c827209b
c63457f3ebfb494d5a3968dfbe84971f3a3fc7b6
69bbfba89575ee0665dee605901184c69ab8e6db
d0151e10d7e0d784fab1b7c895d0704ae0dee7d7
968fcfb5edc564ae5736251bb984a02caf2cfbf2
6c3937ec0b2ce7fbfd5c762c9aeb7aefd54c75ae
171d2f2249a360d7d623130d3aa991418c53716d
fd453a24166e36a3d376c9bc221e520e3ee425af
abc551bd456cf0407fa798395d83dc5aa35f6dbb
823ccf41509baa197dd6a3bef63837a6cf101ad8
be833bc6c2dc40929d1c815b8cbe26c2d9e6dc03
f2b8514f43036f948658cdcddd3c3428245adcac




Re: [PATCH 0/5] pci-assign: More small cleanups

2012-05-30 Thread Alex Williamson
On Wed, 2012-05-30 at 11:05 +0200, Jan Kiszka wrote:
> Besides an update for hiding unsupported ioport BARs and tiny code
> massages, this removes the obsolete iommu property from the pci-assign
> device.
> 
> Jan Kiszka (5):
>   pci-assign: Hide ioport regions on lacking sysfs support
>   pci-assign: Drop iommu property
>   pci-assign: Privatize type definitions
>   pci-assign: Drop write-only AssignedDevRegion::num
>   pci-assign: Drop kvm_assigned_irq::host_irq initialization
> 
>  hw/device-assignment.c |  120 +++
>  hw/device-assignment.h |   98 ---
>  2 files changed, 99 insertions(+), 119 deletions(-)
> 

Looks good.  For series:

Acked-by: Alex Williamson 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/3] KVM: Optimize vcpu->requests slow path slightly

2012-05-30 Thread Marcelo Tosatti
On Sun, May 20, 2012 at 04:49:27PM +0300, Avi Kivity wrote:
> Instead of using a atomic operation per active request, use just one
> to get all requests at once, then check them with local ops.  This
> probably isn't any faster, since simultaneous requests are rare, but
> it does reduce code size.
> 
> Signed-off-by: Avi Kivity 
> ---
>  arch/x86/kvm/x86.c |   33 ++---
>  1 file changed, 18 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 953e692..c0209eb 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5232,55 +5232,58 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>   bool req_int_win = !irqchip_in_kernel(vcpu->kvm) &&
>   vcpu->run->request_interrupt_window;
>   bool req_immediate_exit = 0;
> + ulong reqs;
>  
>   if (unlikely(req_int_win))
>   kvm_make_request(KVM_REQ_EVENT, vcpu);
>  
>   if (vcpu->requests) {
> - if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
> + reqs = xchg(&vcpu->requests, 0UL);
> +
> + if (test_bit(KVM_REQ_MMU_RELOAD, &reqs))
>   kvm_mmu_unload(vcpu);
> - if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
> + if (test_bit(KVM_REQ_MIGRATE_TIMER, &reqs))
>   __kvm_migrate_timers(vcpu);
> - if (kvm_check_request(KVM_REQ_CLOCK_UPDATE, vcpu)) {
> + if (test_bit(KVM_REQ_CLOCK_UPDATE, &reqs)) {
>   r = kvm_guest_time_update(vcpu);
>   if (unlikely(r))
>   goto out;
>   }

Bailing out loses requests in "reqs". 

Caching the requests makes the following type of sequence behave strangely

req = xchg(&vcpu->requests);
if request is set
request handler
...
set REQ_EVENT
...

prepare for guest entry
vcpu->requests set
bail

The code is more straightforward as it is.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 3/3] deal with guest panicked event

2012-05-30 Thread Luiz Capitulino
On Mon, 21 May 2012 14:50:51 +0800
Wen Congyang  wrote:

> When the guest is panicked, it will write 0x1 to the port 0x505. So if
> qemu reads 0x1 from this port, we can do the folloing three things
> according to the parameter -onpanic:
> 1. emit QEVENT_GUEST_PANICKED only
> 2. emit QEVENT_GUEST_PANICKED and pause VM
> 3. emit QEVENT_GUEST_PANICKED and quit VM
> 
> Note: if we emit QEVENT_GUEST_PANICKED only, and the management
> application does not receive this event(the management may not
> run when the event is emitted), the management won't know the
> guest is panicked.

It will if it checks the vm status.

Btw, please, split this further into a patch adding the event, another one
adding the new runstate and then the rest. One more comment below.

> 
> Signed-off-by: Wen Congyang 
> ---
>  kvm-all.c|   84 
> ++
>  kvm-stub.c   |9 ++
>  kvm.h|3 ++
>  monitor.c|3 ++
>  monitor.h|1 +
>  qapi-schema.json |6 +++-
>  qemu-options.hx  |   14 +
>  qmp.c|3 +-
>  vl.c |   17 ++-
>  9 files changed, 137 insertions(+), 3 deletions(-)
> 
> diff --git a/kvm-all.c b/kvm-all.c
> index 9b73ccf..b5b0531 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -19,6 +19,7 @@
>  #include 
>  
>  #include 
> +#include 
>  
>  #include "qemu-common.h"
>  #include "qemu-barrier.h"
> @@ -29,6 +30,8 @@
>  #include "bswap.h"
>  #include "memory.h"
>  #include "exec-memory.h"
> +#include "iorange.h"
> +#include "qemu-objects.h"
>  
>  /* This check must be after config-host.h is included */
>  #ifdef CONFIG_EVENTFD
> @@ -1707,3 +1710,84 @@ int kvm_on_sigbus(int code, void *addr)
>  {
>  return kvm_arch_on_sigbus(code, addr);
>  }
> +
> +/* Possible values for action parameter. */
> +#define PANICKED_REPORT 1   /* emit QEVENT_GUEST_PANICKED only */
> +#define PANICKED_PAUSE  2   /* emit QEVENT_GUEST_PANICKED and pause VM */
> +#define PANICKED_QUIT   3   /* emit QEVENT_GUEST_PANICKED and quit VM */
> +
> +static int panicked_action = PANICKED_REPORT;
> +
> +static void kvm_pv_port_read(IORange *iorange, uint64_t offset, unsigned 
> width,
> + uint64_t *data)
> +{
> +*data = (1 << KVM_PV_FEATURE_PANICKED);
> +}
> +
> +static void panicked_mon_event(const char *action)
> +{
> +QObject *data;
> +
> +data = qobject_from_jsonf("{ 'action': %s }", action);
> +monitor_protocol_event(QEVENT_GUEST_PANICKED, data);
> +qobject_decref(data);
> +}
> +
> +static void panicked_perform_action(void)
> +{
> +switch(panicked_action) {
> +case PANICKED_REPORT:
> +panicked_mon_event("report");
> +break;
> +
> +case PANICKED_PAUSE:
> +panicked_mon_event("pause");
> +vm_stop(RUN_STATE_GUEST_PANICKED);
> +break;
> +
> +case PANICKED_QUIT:
> +panicked_mon_event("quit");
> +exit(0);
> +break;
> +}

Having the data argument is not needed/wanted. The mngt app can guess it if it
needs to know it, but I think it doesn't want to.

> +}
> +
> +static void kvm_pv_port_write(IORange *iorange, uint64_t offset, unsigned 
> width,
> +  uint64_t data)
> +{
> +if (data == KVM_PV_PANICKED)
> +panicked_perform_action();
> +}
> +
> +static void kvm_pv_port_destructor(IORange *iorange)
> +{
> +g_free(iorange);
> +}
> +
> +static IORangeOps pv_io_range_ops = {
> +.read = kvm_pv_port_read,
> +.write = kvm_pv_port_write,
> +.destructor = kvm_pv_port_destructor,
> +};
> +
> +void kvm_pv_port_init(void)
> +{
> +IORange *pv_io_range = g_malloc(sizeof(IORange));
> +
> +iorange_init(pv_io_range, &pv_io_range_ops, 0x505, 1);
> +ioport_register(pv_io_range);
> +}
> +
> +int select_panicked_action(const char *p)
> +{
> +if (strcasecmp(p, "report") == 0)
> +panicked_action = PANICKED_REPORT;
> +else if (strcasecmp(p, "pause") == 0)
> +panicked_action = PANICKED_PAUSE;
> +else if (strcasecmp(p, "quit") == 0)
> +panicked_action = PANICKED_QUIT;
> +else
> +return -1;
> +
> +return 0;
> +}
> diff --git a/kvm-stub.c b/kvm-stub.c
> index 47c573d..4cf977e 100644
> --- a/kvm-stub.c
> +++ b/kvm-stub.c
> @@ -128,3 +128,12 @@ int kvm_on_sigbus(int code, void *addr)
>  {
>  return 1;
>  }
> +
> +void kvm_pv_port_init(void)
> +{
> +}
> +
> +int select_panicked_action(const char *p)
> +{
> +return -1;
> +}
> diff --git a/kvm.h b/kvm.h
> index 4ccae8c..95075cf 100644
> --- a/kvm.h
> +++ b/kvm.h
> @@ -60,6 +60,9 @@ int kvm_has_gsi_routing(void);
>  
>  int kvm_allows_irq0_override(void);
>  
> +void kvm_pv_port_init(void);
> +int select_panicked_action(const char *p);
> +
>  #ifdef NEED_CPU_H
>  int kvm_init_vcpu(CPUArchState *env);
>  
> diff --git a/monitor.c b/monitor.c
> index 12a6fe2..83cb059 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -493,6 +493,9 @@ void monitor_protocol_event(M

Re: [Qemu-devel] [PATCH 1/3] start vm after reseting it

2012-05-30 Thread Luiz Capitulino
On Mon, 21 May 2012 14:49:32 +0800
Wen Congyang  wrote:

> The guest should run after reseting it, but it does not
> run if its old state is RUN_STATE_INTERNAL_ERROR or RUN_STATE_PAUSED.
> 
> Signed-off-by: Wen Congyang 
> ---
>  vl.c |1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/vl.c b/vl.c
> index 23ab3a3..7f5fed8 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1539,6 +1539,7 @@ static bool main_loop_should_exit(void)
>  if (runstate_check(RUN_STATE_INTERNAL_ERROR) ||
>  runstate_check(RUN_STATE_SHUTDOWN)) {
>  runstate_set(RUN_STATE_PAUSED);
> +vm_start();

Please, drop the runstate_set() call. I think you also have to
to call bdrv_iostatus_reset(), as qmp_cont() does.

>  }
>  }
>  if (qemu_powerdown_requested()) {

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] Enable LTR/OBFF before device is used by driver

2012-05-30 Thread Don Dutile

While you are making the other recommended fixes, could
you add/create a pci_obff_supported() function, like the pci_ltr_supported()
function, and more importantly, add it to the pci_disable_obff() function?
The latter does not check that DEVCAP2 is supported, and thus, could be
writing to non-existent (potentially private) cap space (i.e., a V1 CAP device,
which do exist in the marketplace).

The [enable,disable]_ltr functions do a good job of doing pci_ltr_supported()
checks, but the obff enable,disable functions should have similar calls.


On 05/13/2012 10:48 PM, Xudong Hao wrote:

Enable LTR(Latency tolerance reporting) and OBFF(optimized buffer flush/fill) in
  pci_enable_device(), so that they are enabled before the device is used by 
driver.

Signed-off-by: Xudong Hao

---
  drivers/pci/pci.c |   29 +
  1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 111569c..2369883 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1134,6 +1134,31 @@ int pci_load_and_free_saved_state(struct pci_dev *dev,
  }
  EXPORT_SYMBOL_GPL(pci_load_and_free_saved_state);

+static void pci_enable_dev_caps(struct pci_dev *dev)
+{
+   /* set default value */
+   unsigned long type = PCI_EXP_OBFF_SIGNAL_ALWAYS;
+
+   /* LTR(Latency tolerance reporting) allows devices to send
+* messages to the root complex indicating their latency
+* tolerance for snooped&  unsnooped memory transactions.
+*/
+   pci_enable_ltr(dev);
+
+   /* OBFF (optimized buffer flush/fill), where supported,
+* can help improve energy efficiency by giving devices
+* information about when interrupts and other activity
+* will have a reduced power impact.
+*/
+   pci_enable_obff(dev, type);
+}
+
+static void pci_disable_dev_caps(struct pci_dev *dev)
+{
+   pci_disable_obff(dev);
+   pci_disable_ltr(dev);
+}
+
  static int do_pci_enable_device(struct pci_dev *dev, int bars)
  {
 int err;
@@ -1146,6 +1171,9 @@ static int do_pci_enable_device(struct pci_dev *dev, int 
bars)
 return err;
 pci_fixup_device(pci_fixup_enable, dev);

+   /* Enable some device capibility before it's used by driver. */
+   pci_enable_dev_caps(dev);
+
 return 0;
  }

@@ -1361,6 +1389,7 @@ static void do_pci_disable_device(struct pci_dev *dev)
 }

 pcibios_disable_device(dev);
+   pci_disable_dev_caps(dev);
  }

  /**
--
1.6.0.rc1

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM test: use new functions in cdrom_test

2012-05-30 Thread Lukáš Doktor

I forgot add pull request link:
https://github.com/autotest/autotest/pull/368

Dne 30.5.2012 16:43, Lukáš Doktor napsal(a):

Use get_block and other framework functions in cdrom test. Also
don't fail the whole test when tray-status reporting is not supported
by qemu and other cleanups.

Signed-off-by: Lukáš Doktor
---
  client/tests/kvm/tests/cdrom.py |  118 ---
  client/virt/subtests.cfg.sample |2 +
  2 files changed, 50 insertions(+), 70 deletions(-)

diff --git a/client/tests/kvm/tests/cdrom.py b/client/tests/kvm/tests/cdrom.py
index 089150b..4390796 100644
--- a/client/tests/kvm/tests/cdrom.py
+++ b/client/tests/kvm/tests/cdrom.py
@@ -21,7 +21,7 @@ def run_cdrom(test, params, env):
  3) * If cdrom_test_autounlock is set, verifies that device is unlocked
 <300s after boot
  4) Eject cdrom using monitor and change with another iso several times.
-5) Eject cdrom in guest and check tray status reporting.
+5) * If cdrom_test_tray_status = yes, tests tray reporting.
  6) Try to format cdrom and check the return string.
  7) Mount cdrom device.
  8) Copy file from cdrom and compare files using diff.
@@ -35,6 +35,10 @@ def run_cdrom(test, params, env):
  eject CDROM directly after insert
  @param cfg: cdrom_test_autounlock - Test whether guest OS unlocks cdrom
  after boot (<300s after VM is booted)
+@param cfg: cdrom_test_tray_status - Test tray reporting (eject and insert
+ CD couple of times in guest).
+
+@warning: Check dmesg for block device failures
  """
  def master_cdroms(params):
  """ Creates 'new' cdrom with one file on it """
@@ -43,7 +47,7 @@ def run_cdrom(test, params, env):
  cdrom_cd1 = params.get("cdrom_cd1")
  if not os.path.isabs(cdrom_cd1):
  cdrom_cd1 = os.path.join(test.bindir, cdrom_cd1)
-cdrom_dir = os.path.realpath(os.path.dirname(cdrom_cd1))
+cdrom_dir = os.path.dirname(cdrom_cd1)
  utils.run("dd if=/dev/urandom of=orig bs=10M count=1")
  utils.run("dd if=/dev/urandom of=new bs=10M count=1")
  utils.run("mkisofs -o %s/orig.iso orig" % cdrom_dir)
@@ -55,57 +59,27 @@ def run_cdrom(test, params, env):
  error.context("cleaning up temp cdrom images")
  os.remove("%s/new.iso" % cdrom_dir)

-def get_block_info(re_device='[^\n][^:]+'):
-""" Gets device string and file from kvm-monitor """
-blocks = vm.monitor.info("block")
-devices = []
-files = []
-if isinstance(blocks, str):
-devices = re.findall('(%s): .*' % re_device, blocks)
-if devices:
-for dev in devices:
-cdfile = re.findall('%s: .*file=(\S*) ' % dev, blocks)
-if cdfile:
-cdfile = os.path.realpath(cdfile[0])
-else:
-cdfile = None
-files.append(cdfile)
-else:
-for block in blocks:
-if re.match(re_device, block['device']):
-devices.append(block['device'])
-try:
-cdfile = block['inserted']['file']
-if cdfile:
-cdfile = os.path.realpath(cdfile)
-except KeyError:
-cdfile = None
-files.append(cdfile)
-return (devices, files)
-
-def get_cdrom_info(device):
+def get_cdrom_file(device):
  """
  @param device: qemu monitor device
  @return: file associated with $device device
  """
-(_, cdfile) = get_block_info(device)
-logging.debug("Device name: %s, ISO: %s", device, cdfile[0])
-return cdfile[0]
-
-def check_cdrom_locked(cdrom):
-""" Checks whether the cdrom is locked """
  blocks = vm.monitor.info("block")
+cdfile = None
  if isinstance(blocks, str):
-lock_str = "locked=1"
-for block in blocks.splitlines():
-if cdrom in block and lock_str in block:
-return True
+cdfile = re.findall('%s: .*file=(\S*) ' % device, blocks)
+if not cdfile:
+return None
+else:
+cdfile = cdfile[0]
  else:
  for block in blocks:
-if ('inserted' in block.keys() and
-block['inserted']['file'] == cdrom):
-return block['locked']
-return False
+if block['device'] == device:
+try:
+cdfile = block['inserted']['file']
+except KeyError:
+continue
+return cdfile

  def check_cdrom_tray(cdrom):
  """ Checks whether 

[PATCH] KVM test: use new functions in cdrom_test

2012-05-30 Thread Lukáš Doktor
Use get_block and other framework functions in cdrom test. Also
don't fail the whole test when tray-status reporting is not supported
by qemu and other cleanups.

Signed-off-by: Lukáš Doktor 
---
 client/tests/kvm/tests/cdrom.py |  118 ---
 client/virt/subtests.cfg.sample |2 +
 2 files changed, 50 insertions(+), 70 deletions(-)

diff --git a/client/tests/kvm/tests/cdrom.py b/client/tests/kvm/tests/cdrom.py
index 089150b..4390796 100644
--- a/client/tests/kvm/tests/cdrom.py
+++ b/client/tests/kvm/tests/cdrom.py
@@ -21,7 +21,7 @@ def run_cdrom(test, params, env):
 3) * If cdrom_test_autounlock is set, verifies that device is unlocked
<300s after boot
 4) Eject cdrom using monitor and change with another iso several times.
-5) Eject cdrom in guest and check tray status reporting.
+5) * If cdrom_test_tray_status = yes, tests tray reporting.
 6) Try to format cdrom and check the return string.
 7) Mount cdrom device.
 8) Copy file from cdrom and compare files using diff.
@@ -35,6 +35,10 @@ def run_cdrom(test, params, env):
 eject CDROM directly after insert
 @param cfg: cdrom_test_autounlock - Test whether guest OS unlocks cdrom
 after boot (<300s after VM is booted)
+@param cfg: cdrom_test_tray_status - Test tray reporting (eject and insert
+ CD couple of times in guest).
+
+@warning: Check dmesg for block device failures
 """
 def master_cdroms(params):
 """ Creates 'new' cdrom with one file on it """
@@ -43,7 +47,7 @@ def run_cdrom(test, params, env):
 cdrom_cd1 = params.get("cdrom_cd1")
 if not os.path.isabs(cdrom_cd1):
 cdrom_cd1 = os.path.join(test.bindir, cdrom_cd1)
-cdrom_dir = os.path.realpath(os.path.dirname(cdrom_cd1))
+cdrom_dir = os.path.dirname(cdrom_cd1)
 utils.run("dd if=/dev/urandom of=orig bs=10M count=1")
 utils.run("dd if=/dev/urandom of=new bs=10M count=1")
 utils.run("mkisofs -o %s/orig.iso orig" % cdrom_dir)
@@ -55,57 +59,27 @@ def run_cdrom(test, params, env):
 error.context("cleaning up temp cdrom images")
 os.remove("%s/new.iso" % cdrom_dir)
 
-def get_block_info(re_device='[^\n][^:]+'):
-""" Gets device string and file from kvm-monitor """
-blocks = vm.monitor.info("block")
-devices = []
-files = []
-if isinstance(blocks, str):
-devices = re.findall('(%s): .*' % re_device, blocks)
-if devices:
-for dev in devices:
-cdfile = re.findall('%s: .*file=(\S*) ' % dev, blocks)
-if cdfile:
-cdfile = os.path.realpath(cdfile[0])
-else:
-cdfile = None
-files.append(cdfile)
-else:
-for block in blocks:
-if re.match(re_device, block['device']):
-devices.append(block['device'])
-try:
-cdfile = block['inserted']['file']
-if cdfile:
-cdfile = os.path.realpath(cdfile)
-except KeyError:
-cdfile = None
-files.append(cdfile)
-return (devices, files)
-
-def get_cdrom_info(device):
+def get_cdrom_file(device):
 """
 @param device: qemu monitor device
 @return: file associated with $device device
 """
-(_, cdfile) = get_block_info(device)
-logging.debug("Device name: %s, ISO: %s", device, cdfile[0])
-return cdfile[0]
-
-def check_cdrom_locked(cdrom):
-""" Checks whether the cdrom is locked """
 blocks = vm.monitor.info("block")
+cdfile = None
 if isinstance(blocks, str):
-lock_str = "locked=1"
-for block in blocks.splitlines():
-if cdrom in block and lock_str in block:
-return True
+cdfile = re.findall('%s: .*file=(\S*) ' % device, blocks)
+if not cdfile:
+return None
+else:
+cdfile = cdfile[0]
 else:
 for block in blocks:
-if ('inserted' in block.keys() and
-block['inserted']['file'] == cdrom):
-return block['locked']
-return False
+if block['device'] == device:
+try:
+cdfile = block['inserted']['file']
+except KeyError:
+continue
+return cdfile
 
 def check_cdrom_tray(cdrom):
 """ Checks whether the tray is opend """
@@ -121,7 +95,7 @@ def run_cdrom(test, params, env):
 for block in blocks:
 if block['device'] == cdrom an

Re: [PATCH] kvm: optimize ISR lookups

2012-05-30 Thread Avi Kivity
On 05/24/2012 01:00 AM, Thomas Gleixner wrote:

> Thought more about that.
> 
> We have a clear distinction between HW accessed data and software
> accessed data.
> 
> If I look at TPR then it is special cased already and it does:
> 
>case APIC_TASKPRI:
> report_tpr_access(apic, false|true);
> /* fall thru */
> 
> And the fall through is using the general accessor for all not special
> cased registers.
> 
> So all you have to do is 
> 
>case APIC_TASKPRI:
> report_tpr_access(apic, false|true);
> + return access_mapped_reg(...);
> 
> Instead of the fall through.
> 
> So there is no synchronizing back and forth problem simply because you
> already have a special case for that register.
> 
> I know you'll argue that the tpr reporting is a special hack for
> windows guests, at least that's what the changelog tells.
> 
> But even if we have a few more registers accessed by hardware and if
> they do not require a special casing, I really doubt that the overhead
> of special casing those few regs will be worse than not having the
> obvious optimization in place.
> 
> And looking deeper it's a total non issue. The apic mapping is 4k. The
> register stride is strictly 0x10. That makes a total of 256 possible
> registers.
> 
> So now you have two possibilites:
> 
> 1) Create a 256 bit == 64byte bitfield to select the one or the other
>representation.
> 
>The overhead of checking the bit is not going to be observable.
> 
> 2) Create a 256 function pointer array == 2k resp. 1k (64 / 32bit)
> 
>That's not a lot of memory even if you have to maintain two
>separate variants for read and write, but it allows you to get rid
>of the already horribly compiled switch case in apic_read/write and
>you'll get the optional stuff like report_tpr_access() w/o extra
>conditionals just for free.
> 
>An extra goodie is that you can catch any access to a non existing
>register which you now just silently ignore.  And that allows you
>to react on any future hardware oddities without adding a single
>runtime conditional.
> 
>This is stricly x86 and x86 is way better at dealing with indirect
>calls than with the mess gcc creates compiling those switch case
>constructs.
> 
>So I'd go for that and rather spend the memory and the time in
>setting up the function pointers on init/ioctl than dealing with
>the inconsistency of HW/SW representation with magic hacks.
> 

I like the bitmap version, it seems very lightweight.  But by itself it
doesn't allow us to use bitmap_weight (or the other bitmap accessors),
unless you assume beforehand that those registers will never be in the
hardware-layout region.

(you also need extra code for copying the APIC state to and from
userspace; right now we just memcpy the APIC page)


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC V8 17/17] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock

2012-05-30 Thread Raghavendra K T

On 05/30/2012 05:24 PM, Jan Kiszka wrote:

On 2012-05-02 12:09, Raghavendra K T wrote:

From: Raghavendra K T

KVM_HC_KICK_CPU  hypercall added to wakeup halted vcpu in paravirtual spinlock
enabled guest.

KVM_FEATURE_PV_UNHALT enables guest to check whether pv spinlock can be enabled
in guest.

Thanks Alex for KVM_HC_FEATURES inputs and Vatsa for rewriting KVM_HC_KICK_CPU


This contains valuable documentation for features that are already
supported. Can you break them out and post as separate patch already?
One comment on them below.



That sounds like a good idea. Sure, will do that.



Signed-off-by: Srivatsa Vaddagiri
Signed-off-by: Raghavendra K T
---
  Documentation/virtual/kvm/cpuid.txt  |4 ++
  Documentation/virtual/kvm/hypercalls.txt |   60 ++
  2 files changed, 64 insertions(+), 0 deletions(-)
diff --git a/Documentation/virtual/kvm/cpuid.txt 
b/Documentation/virtual/kvm/cpuid.txt
index 8820685..062dff9 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -39,6 +39,10 @@ KVM_FEATURE_CLOCKSOURCE2   || 3 || kvmclock 
available at msrs
  KVM_FEATURE_ASYNC_PF   || 4 || async pf can be enabled by
 ||   || writing to msr 0x4b564d02
  --
+KVM_FEATURE_PV_UNHALT  || 6 || guest checks this feature bit
+   ||   || before enabling paravirtualized
+   ||   || spinlock support.
+--
  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side
 ||   || per-cpu warps are expected in
 ||   || kvmclock.
diff --git a/Documentation/virtual/kvm/hypercalls.txt 
b/Documentation/virtual/kvm/hypercalls.txt
new file mode 100644
index 000..bc3f14a
--- /dev/null
+++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -0,0 +1,60 @@
+KVM Hypercalls Documentation
+===
+The template for each hypercall is:
+1. Hypercall name, value.
+2. Architecture(s)
+3. Status (deprecated, obsolete, active)
+4. Purpose
+
+1. KVM_HC_VAPIC_POLL_IRQ
+
+Value: 1
+Architecture: x86
+Purpose: None


Purpose: Trigger guest exit so that the host can check for pending
interrupts on reentry.


will add fold this and resend.

[...]

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] virtio_blk: unlock vblk->lock during kick

2012-05-30 Thread Christian Borntraeger
On 30/05/12 15:19, Stefan Hajnoczi wrote:
> Holding the vblk->lock across kick causes poor scalability in SMP
> guests.  If one CPU is doing virtqueue kick and another CPU touches the
> vblk->lock it will have to spin until virtqueue kick completes.
> 
> This patch reduces system% CPU utilization in SMP guests that are
> running multithreaded I/O-bound workloads.  The improvements are small
> but show as iops and SMP are increased.

Funny, recently I got a bug report regarding spinlock lockup
(see http://lkml.indiana.edu/hypermail/linux/kernel/1205.3/02201.html)
Turned out that blk_done was called on many guest cpus while the guest
was heavily paging on one virtio block device. (and the guest had much
more cpus than the host)
This patch will probably reduce the pressure for those cases as well.
we can then finish requests if somebody else is doing the kick.

IIRC there were some other approaches to address this lock holding during
kick but this looks like the less intrusive one.

> Signed-off-by: Stefan Hajnoczi 
Acked-by: Christian Borntraeger 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] virtio_blk: unlock vblk->lock during kick

2012-05-30 Thread Stefan Hajnoczi
Holding the vblk->lock across kick causes poor scalability in SMP
guests.  If one CPU is doing virtqueue kick and another CPU touches the
vblk->lock it will have to spin until virtqueue kick completes.

This patch reduces system% CPU utilization in SMP guests that are
running multithreaded I/O-bound workloads.  The improvements are small
but show as iops and SMP are increased.

Khoa Huynh  provided initial performance data that
indicates this optimization is worthwhile at high iops.

Asias He  reports the following fio results:

Host: Linux 3.4.0+ #302 SMP x86_64 GNU/Linux
Guest: same as host kernel

Average 3 runs:
with locked kick
-
readiops=119907.50 bw=59954.00 runt=35018.50 io=2048.00
write   iops=217187.00 bw=108594.00 runt=19312.00 io=2048.00
readiops=33948.00 bw=16974.50 runt=186820.50 io=3095.70
write   iops=35014.00 bw=17507.50 runt=181151.00 io=3095.70
clat (usec) max=3484.10 avg=121085.38 stdev=174416.11 min=0.00
clat (usec) max=3438.30 avg=59863.35 stdev=116607.69 min=0.00
clat (usec) max=3745.65 avg=454501.30 stdev=332699.00 min=0.00
clat (usec) max=4089.75 avg=442374.99 stdev=304874.62 min=0.00
cpu sys=615.12 majf=24080.50 ctx=64253616.50 usr=68.08 minf=17907363.00
cpu sys=1235.95 majf=23389.00 ctx=59788148.00 usr=98.34 minf=20020008.50
cpu sys=764.96 majf=28414.00 ctx=848279274.00 usr=36.39 minf=19737254.00
cpu sys=714.13 majf=21853.50 ctx=854608972.00 usr=33.56 minf=18256760.50

with unlocked kick
-
readiops=118559.00 bw=59279.66 runt=35400.66 io=2048.00
write   iops=227560.00 bw=113780.33 runt=18440.00 io=2048.00
readiops=34567.66 bw=17284.00 runt=183497.33 io=3095.70
write   iops=34589.33 bw=17295.00 runt=183355.00 io=3095.70
clat (usec) max=3485.56 avg=121989.58 stdev=197355.15 min=0.00
clat (usec) max=3222.33 avg=57784.11 stdev=141002.89 min=0.00
clat (usec) max=4060.93 avg=447098.65 stdev=315734.33 min=0.00
clat (usec) max=3656.30 avg=447281.70 stdev=314051.33 min=0.00
cpu sys=683.78 majf=24501.33 ctx=64435364.66 usr=68.91 minf=17907893.33
cpu sys=1218.24 majf=25000.33 ctx=60451475.00 usr=101.04 minf=19757720.00
cpu sys=740.39 majf=24809.00 ctx=845290443.66 usr=37.25 minf=19349958.33
cpu sys=723.63 majf=27597.33 ctx=850199927.33 usr=35.35 minf=19092343.00

FIO config file
-

[global]
exec_prerun="echo 3 > /proc/sys/vm/drop_caches"
group_reporting
norandommap
ioscheduler=noop
thread
bs=512
size=4MB
direct=1
filename=/dev/vdb
numjobs=256
ioengine=aio
iodepth=64
loops=3

Signed-off-by: Stefan Hajnoczi 
---
 drivers/block/virtio_blk.c |   10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 693187d..1a50f41 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -201,8 +201,14 @@ static void do_virtblk_request(struct request_queue *q)
issued++;
}
 
-   if (issued)
-   virtqueue_kick(vblk->vq);
+   if (!issued)
+   return;
+
+   if (virtqueue_kick_prepare(vblk->vq)) {
+   spin_unlock(&vblk->lock);
+   virtqueue_notify(vblk->vq);
+   spin_lock(&vblk->lock);
+   }
 }
 
 /* return id (s/n) string for *disk to *id_str
-- 
1.7.10

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC V8 17/17] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock

2012-05-30 Thread Jan Kiszka
On 2012-05-02 12:09, Raghavendra K T wrote:
> From: Raghavendra K T  
> 
> KVM_HC_KICK_CPU  hypercall added to wakeup halted vcpu in paravirtual spinlock
> enabled guest.
> 
> KVM_FEATURE_PV_UNHALT enables guest to check whether pv spinlock can be 
> enabled
> in guest.
> 
> Thanks Alex for KVM_HC_FEATURES inputs and Vatsa for rewriting KVM_HC_KICK_CPU

This contains valuable documentation for features that are already
supported. Can you break them out and post as separate patch already?
One comment on them below.

> 
> Signed-off-by: Srivatsa Vaddagiri 
> Signed-off-by: Raghavendra K T 
> ---
>  Documentation/virtual/kvm/cpuid.txt  |4 ++
>  Documentation/virtual/kvm/hypercalls.txt |   60 
> ++
>  2 files changed, 64 insertions(+), 0 deletions(-)
> diff --git a/Documentation/virtual/kvm/cpuid.txt 
> b/Documentation/virtual/kvm/cpuid.txt
> index 8820685..062dff9 100644
> --- a/Documentation/virtual/kvm/cpuid.txt
> +++ b/Documentation/virtual/kvm/cpuid.txt
> @@ -39,6 +39,10 @@ KVM_FEATURE_CLOCKSOURCE2   || 3 || kvmclock 
> available at msrs
>  KVM_FEATURE_ASYNC_PF   || 4 || async pf can be enabled by
> ||   || writing to msr 0x4b564d02
>  
> --
> +KVM_FEATURE_PV_UNHALT  || 6 || guest checks this feature bit
> +   ||   || before enabling 
> paravirtualized
> +   ||   || spinlock support.
> +--
>  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no 
> guest-side
> ||   || per-cpu warps are expected in
> ||   || kvmclock.
> diff --git a/Documentation/virtual/kvm/hypercalls.txt 
> b/Documentation/virtual/kvm/hypercalls.txt
> new file mode 100644
> index 000..bc3f14a
> --- /dev/null
> +++ b/Documentation/virtual/kvm/hypercalls.txt
> @@ -0,0 +1,60 @@
> +KVM Hypercalls Documentation
> +===
> +The template for each hypercall is:
> +1. Hypercall name, value.
> +2. Architecture(s)
> +3. Status (deprecated, obsolete, active)
> +4. Purpose
> +
> +1. KVM_HC_VAPIC_POLL_IRQ
> +
> +Value: 1
> +Architecture: x86
> +Purpose: None

Purpose: Trigger guest exit so that the host can check for pending
interrupts on reentry.

> +
> +2. KVM_HC_MMU_OP
> +
> +Value: 2
> +Architecture: x86
> +Status: deprecated.
> +Purpose: Support MMU operations such as writing to PTE,
> +flushing TLB, release PT.
> +
> +3. KVM_HC_FEATURES
> +
> +Value: 3
> +Architecture: PPC
> +Status: active
> +Purpose: Expose hypercall availability to the guest. On x86 platforms, cpuid
> +used to enumerate which hypercalls are available. On PPC, either device tree
> +based lookup ( which is also what EPAPR dictates) OR KVM specific enumeration
> +mechanism (which is this hypercall) can be used.
> +
> +4. KVM_HC_PPC_MAP_MAGIC_PAGE
> +
> +Value: 4
> +Architecture: PPC
> +Status: active
> +Purpose: To enable communication between the hypervisor and guest there is a
> +shared page that contains parts of supervisor visible register state.
> +The guest can map this shared page to access its supervisor register through
> +memory using this hypercall.
> +
> +5. KVM_HC_KICK_CPU
> +
> +Value: 5
> +Architecture: x86
> +Status: active
> +Purpose: Hypercall used to wakeup a vcpu from HLT state
> +
> +Usage example : A vcpu of a paravirtualized guest that is busywaiting in 
> guest
> +kernel mode for an event to occur (ex: a spinlock to become available) can
> +execute HLT instruction once it has busy-waited for more than a threshold
> +time-interval. Execution of HLT instruction would cause the hypervisor to put
> +the vcpu to sleep until occurence of an appropriate event. Another vcpu of 
> the
> +same guest can wakeup the sleeping vcpu by issuing KVM_HC_KICK_CPU hypercall,
> +specifying APIC ID of the vcpu to be wokenup.
> +
> +TODO:
> +1. more information on input and output needed?
> +2. Add more detail to purpose of hypercalls.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/1] Enable LTR/OBFF before device is used by driver

2012-05-30 Thread Hao, Xudong
> -Original Message-
> From: Bjorn Helgaas [mailto:bhelg...@google.com]
> Sent: Saturday, May 19, 2012 9:20 AM
> To: Xudong Hao
> Cc: linux-...@vger.kernel.org; linux-ker...@vger.kernel.org;
> kvm@vger.kernel.org; a...@redhat.com; alex.william...@redhat.com; Zhang,
> Xiantao; Hao, Xudong
> Subject: Re: [PATCH 1/1] Enable LTR/OBFF before device is used by driver
> 
> On Sun, May 13, 2012 at 8:48 PM, Xudong Hao 
> wrote:
> > Enable LTR(Latency tolerance reporting) and OBFF(optimized buffer 
> > flush/fill)
> in
> >  pci_enable_device(), so that they are enabled before the device is used by
> driver.
> 
> Please split this into two patches (one for LTR and another for OBFF)
> so they can be reverted individually if they cause trouble.  

OK.

> It would
> be nice if you bundled these together with your other "save/restore
> max Latency Value" patch so they were all together on the mailing
> list.
> 
Sure, I'll modify the save/restore patch and bundle them together.

> I read the LTR sections of the PCIe spec, but I can't figure out how
> it's supposed to work.  It says "power management policies ... can be
> implemented to consider Endpoint service requirements."  Does that
> mean there's some OS software that might be involved, or is this just
> a matter of software flipping the LTR-enable bit and the hardware
> doing everything else?  How confident can we be that enabling this is
> safe?
> 

Software only set the LTR-enable bit, then hardware/chipset/device do 
everything else. There are one thing that software can be involved: software 
can configure maximum latency tolerance.

> For OBFF, is there some OS piece not included here that tells a Root
> Complex that "now is a good time for Endpoints to do something," e.g.,
> the spec mentions an "operating system timer tick."  Is there some
> benefit to this patch without that piece?  I don't understand the big
> picture yet.
> 

As like LTR, OBFF do not need OS do additional changes, just set obff-enable 
bit.

> > Signed-off-by: Xudong Hao 
> >
> > ---
> >  drivers/pci/pci.c |   29 +
> >  1 files changed, 29 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 111569c..2369883 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -1134,6 +1134,31 @@ int pci_load_and_free_saved_state(struct
> pci_dev *dev,
> >  }
> >  EXPORT_SYMBOL_GPL(pci_load_and_free_saved_state);
> >
> > +static void pci_enable_dev_caps(struct pci_dev *dev)
> > +{
> > +       /* set default value */
> > +       unsigned long type = PCI_EXP_OBFF_SIGNAL_ALWAYS;
> 
> There's only one use of this value, so skip the variable and just use
> PCI_EXP_OBFF_SIGNAL_ALWAYS in the call.
> 

Ok.

> The comment at pci_enable_obff() says PCI_OBFF_SIGNAL_L0 is the
> preferred type, so please explain why you're not using that.
> 

Yes, here it's better to set PCI_OBFF_SIGNAL_L0 by default.

> > +
> > +       /* LTR(Latency tolerance reporting) allows devices to send
> > +        * messages to the root complex indicating their latency
> > +        * tolerance for snooped & unsnooped memory transactions.
> > +        */
> 
> Follow Linux comment style, i.e.,
> 
> /*
>  * LTR ...
>  */
> 

Will modify, Thanks.

> > +       pci_enable_ltr(dev);
> > +
> > +       /* OBFF (optimized buffer flush/fill), where supported,
> > +        * can help improve energy efficiency by giving devices
> > +        * information about when interrupts and other activity
> > +        * will have a reduced power impact.
> > +        */
> > +       pci_enable_obff(dev, type);
> > +}
> > +
> > +static void pci_disable_dev_caps(struct pci_dev *dev)
> > +{
> > +       pci_disable_obff(dev);
> > +       pci_disable_ltr(dev);
> > +}
> > +
> >  static int do_pci_enable_device(struct pci_dev *dev, int bars)
> >  {
> >        int err;
> > @@ -1146,6 +1171,9 @@ static int do_pci_enable_device(struct pci_dev
> *dev, int bars)
> >                return err;
> >        pci_fixup_device(pci_fixup_enable, dev);
> >
> > +       /* Enable some device capibility before it's used by driver. */
> > +       pci_enable_dev_caps(dev);
> 
> Why is this here?  It seems similar to what's already in
> pci_init_capabilities().  Is there a reason to do this in the
> pci_enable_device() path rather than in the pci_device_add() path?
> 

pci_enable_device is called by any pci driver including kvm driver, Considering 
such a case in kvm, when device is assigned to guest(the device will be reset), 
we want not host lose those advanced PM feature, so add it in pci_enable_device 
so that kvm driver call it.
 
> > +
> >        return 0;
> >  }
> >
> > @@ -1361,6 +1389,7 @@ static void do_pci_disable_device(struct pci_dev
> *dev)
> >        }
> >
> >        pcibios_disable_device(dev);
> > +       pci_disable_dev_caps(dev);
> >  }
> >
> >  /**
> > --
> > 1.6.0.rc1
> >
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to

Re: [PATCH RFC V8 0/17] Paravirtualized ticket spinlocks

2012-05-30 Thread Raghavendra K T

On 05/16/2012 08:49 AM, Raghavendra K T wrote:

On 05/14/2012 12:15 AM, Raghavendra K T wrote:

On 05/07/2012 08:22 PM, Avi Kivity wrote:

I could not come with pv-flush results (also Nikunj had clarified that
the result was on NOn PLE


I'd like to see those numbers, then.

Ingo, please hold on the kvm-specific patches, meanwhile.

[...]

To summarise,
with 32 vcpu guest with nr thread=32 we get around 27% improvement. In
very low/undercommitted systems we may see very small improvement or
small acceptable degradation ( which it deserves).



For large guests, current value SPIN_THRESHOLD, along with ple_window 
needed some of research/experiment.


[Thanks to Jeremy/Nikunj for inputs and help in result analysis ]

I started with debugfs spinlock/histograms, and ran experiments with 32, 
64 vcpu guests for spin threshold of 2k, 4k, 8k, 16k, and 32k with

1vm/2vm/4vm  for kernbench, sysbench, ebizzy, hackbench.
[ spinlock/histogram  gives logarithmic view of lockwait times ]

machine: PLE machine  with 32 cores.

Here is the result summary.
The summary includes 2 part,
(1) %improvement w.r.t 2K spin threshold,
(2) improvement w.r.t sum of histogram numbers in debugfs (that gives 
rough indication of contention/cpu time wasted)


 For e.g 98% for 4k threshold kbench 1 vm would imply, there is a 98% 
reduction in sigma(histogram values) compared to 2k case


Result for 32 vcpu guest
==
++---+---+---+---+
|Base-2k | 4k|8k |   16k |32k|
++---+---+---+---+
| kbench-1vm |   44  |   50  |   46  |   41  |
|  SPINHisto-1vm |   98  |   99  |   99  |   99  |
| kbench-2vm |   25  |   45  |   49  |   45  |
|  SPINHisto-2vm |   31  |   91  |   99  |   99  |
| kbench-4vm |  -13  |  -27  |   -2  |   -4  |
|  SPINHisto-4vm |   29  |   66  |   95  |   99  |
++---+---+---+---+
| ebizzy-1vm |  954  |  942  |  913  |  915  |
|  SPINHisto-1vm |   96  |   99  |   99  |   99  |
| ebizzy-2vm |  158  |  135  |  123  |  106  |
|  SPINHisto-2vm |   90  |   98  |   99  |   99  |
| ebizzy-4vm |  -13  |  -28  |  -33  |  -37  |
|  SPINHisto-4vm |   83  |   98  |   99  |   99  |
++---+---+---+---+
| hbench-1vm |   48  |   56  |   52  |   64  |
|  SPINHisto-1vm |   92  |   95  |   99  |   99  |
| hbench-2vm |   32  |   40  |   39  |   21  |
|  SPINHisto-2vm |   74  |   96  |   99  |   99  |
| hbench-4vm |   27  |   15  |3  |  -57  |
|  SPINHisto-4vm |   68  |   88  |   94  |   97  |
++---+---+---+---+
|sysbnch-1vm |0  |0  |1  |0  |
|  SPINHisto-1vm |   76  |   98  |   99  |   99  |
|sysbnch-2vm |   -1  |3  |   -1  |   -4  |
|  SPINHisto-2vm |   82  |   94  |   96  |   99  |
|sysbnch-4vm |0  |   -2  |   -8  |  -14  |
|  SPINHisto-4vm |   57  |   79  |   88  |   95  |
++---+---+---+---+

result for 64  vcpu guest
=
++---+---+---+---+
|Base-2k | 4k|8k |   16k |32k|
++---+---+---+---+
| kbench-1vm |1  |  -11  |  -25  |   31  |
|  SPINHisto-1vm |3  |   10  |   47  |   99  |
| kbench-2vm |   15  |   -9  |  -66  |  -15  |
|  SPINHisto-2vm |2  |   11  |   19  |   90  |
++---+---+---+---+
| ebizzy-1vm |  784  | 1097  |  978  |  930  |
|  SPINHisto-1vm |   74  |   97  |   98  |   99  |
| ebizzy-2vm |   43  |   48  |   56  |   32  |
|  SPINHisto-2vm |   58  |   93  |   97  |   98  |
++---+---+---+---+
| hbench-1vm |8  |   55  |   56  |   62  |
|  SPINHisto-1vm |   18  |   69  |   96  |   99  |
| hbench-2vm |   13  |  -14  |  -75  |  -29  |
|  SPINHisto-2vm |   57  |   74  |   80  |   97  |
++---+---+---+---+
|sysbnch-1vm |9  |   11  |   15  |   10  |
|  SPINHisto-1vm |   80  |   93  |   98  |   99  |
|sysbnch-2vm |3  |3  |4  |2  |
|  SPINHisto-2vm |   72  |

[PATCH 2/4] KVM: PPC: Book3S HV: Make the guest hash table size configurable

2012-05-30 Thread Alexander Graf
From: Paul Mackerras 

This adds a new ioctl to enable userspace to control the size of the guest
hashed page table (HPT) and to clear it out when resetting the guest.
The KVM_PPC_ALLOCATE_HTAB ioctl is a VM ioctl and takes as its parameter
a pointer to a u32 containing the desired order of the HPT (log base 2
of the size in bytes), which is updated on successful return to the
actual order of the HPT which was allocated.

There must be no vcpus running at the time of this ioctl.  To enforce
this, we now keep a count of the number of vcpus running in
kvm->arch.vcpus_running.

If the ioctl is called when a HPT has already been allocated, we don't
reallocate the HPT but just clear it out.  We first clear the
kvm->arch.rma_setup_done flag, which has two effects: (a) since we hold
the kvm->lock mutex, it will prevent any vcpus from starting to run until
we're done, and (b) it means that the first vcpu to run after we're done
will re-establish the VRMA if necessary.

If userspace doesn't call this ioctl before running the first vcpu, the
kernel will allocate a default-sized HPT at that point.  We do it then
rather than when creating the VM, as the code did previously, so that
userspace has a chance to do the ioctl if it wants.

When allocating the HPT, we can allocate either from the kernel page
allocator, or from the preallocated pool.  If userspace is asking for
a different size from the preallocated HPTs, we first try to allocate
using the kernel page allocator.  Then we try to allocate from the
preallocated pool, and then if that fails, we try allocating decreasing
sizes from the kernel page allocator, down to the minimum size allowed
(256kB).  Note that the kernel page allocator limits allocations to
1 << CONFIG_FORCE_MAX_ZONEORDER pages, which by default corresponds to
16MB (on 64-bit powerpc, at least).

Signed-off-by: Paul Mackerras 
[agraf: fix module compilation]
Signed-off-by: Alexander Graf 
---
 Documentation/virtual/kvm/api.txt|   36 +
 arch/powerpc/include/asm/kvm_book3s_64.h |7 +-
 arch/powerpc/include/asm/kvm_host.h  |4 +
 arch/powerpc/include/asm/kvm_ppc.h   |3 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  123 +++---
 arch/powerpc/kvm/book3s_hv.c |   40 +++---
 arch/powerpc/kvm/book3s_hv_builtin.c |5 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |   15 ++--
 arch/powerpc/kvm/powerpc.c   |   18 +
 include/linux/kvm.h  |3 +
 10 files changed, 200 insertions(+), 54 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 9301266..310fe50 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1930,6 +1930,42 @@ The "pte_enc" field provides a value that can OR'ed into 
the hash
 PTE's RPN field (ie, it needs to be shifted left by 12 to OR it
 into the hash PTE second double word).
 
+
+4.75 KVM_PPC_ALLOCATE_HTAB
+
+Capability: KVM_CAP_PPC_ALLOC_HTAB
+Architectures: powerpc
+Type: vm ioctl
+Parameters: Pointer to u32 containing hash table order (in/out)
+Returns: 0 on success, -1 on error
+
+This requests the host kernel to allocate an MMU hash table for a
+guest using the PAPR paravirtualization interface.  This only does
+anything if the kernel is configured to use the Book 3S HV style of
+virtualization.  Otherwise the capability doesn't exist and the ioctl
+returns an ENOTTY error.  The rest of this description assumes Book 3S
+HV.
+
+There must be no vcpus running when this ioctl is called; if there
+are, it will do nothing and return an EBUSY error.
+
+The parameter is a pointer to a 32-bit unsigned integer variable
+containing the order (log base 2) of the desired size of the hash
+table, which must be between 18 and 46.  On successful return from the
+ioctl, it will have been updated with the order of the hash table that
+was allocated.
+
+If no hash table has been allocated when any vcpu is asked to run
+(with the KVM_RUN ioctl), the host kernel will allocate a
+default-sized hash table (16 MB).
+
+If this ioctl is called when a hash table has already been allocated,
+the kernel will clear out the existing hash table (zero all HPTEs) and
+return the hash table order in the parameter.  (If the guest is using
+the virtualized real-mode area (VRMA) facility, the kernel will
+re-create the VMRA HPTEs on the next KVM_RUN of any vcpu.)
+
+
 5. The kvm_run structure
 
 
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index b0c08b1..0dd1d86 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -36,11 +36,8 @@ static inline void svcpu_put(struct 
kvmppc_book3s_shadow_vcpu *svcpu)
 #define SPAPR_TCE_SHIFT12
 
 #ifdef CONFIG_KVM_BOOK3S_64_HV
-/* For now use fixed-size 16MB page table */
-#define HPT_ORDER  24
-#define HPT_NPTEG  (1ul << (HPT_ORDER - 7))

[PATCH 3/4] KVM: PPC: booke: Added DECAR support

2012-05-30 Thread Alexander Graf
From: Bharat Bhushan 

Added the decrementer auto-reload support. DECAR is readable
on e500v2/e500mc and later cpus.

Signed-off-by: Bharat Bhushan 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_host.h |2 ++
 arch/powerpc/kvm/booke.c|5 +
 arch/powerpc/kvm/booke_emulate.c|3 +++
 arch/powerpc/kvm/e500_emulate.c |3 +++
 4 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index dd783be..50ea12f 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -418,7 +418,9 @@ struct kvm_vcpu_arch {
ulong mcsrr1;
ulong mcsr;
u32 dec;
+#ifdef CONFIG_BOOKE
u32 decar;
+#endif
u32 tbl;
u32 tbu;
u32 tcr;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 72f13f4..86681ee 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1267,6 +1267,11 @@ void kvmppc_decrementer_func(unsigned long data)
 {
struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
 
+   if (vcpu->arch.tcr & TCR_ARE) {
+   vcpu->arch.dec = vcpu->arch.decar;
+   kvmppc_emulate_dec(vcpu);
+   }
+
kvmppc_set_tsr_bits(vcpu, TSR_DIS);
 }
 
diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c
index 6c76397..9eb9809 100644
--- a/arch/powerpc/kvm/booke_emulate.c
+++ b/arch/powerpc/kvm/booke_emulate.c
@@ -129,6 +129,9 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int 
sprn, ulong spr_val)
kvmppc_set_tcr(vcpu, spr_val);
break;
 
+   case SPRN_DECAR:
+   vcpu->arch.decar = spr_val;
+   break;
/*
 * Note: SPRG4-7 are user-readable.
 * These values are loaded into the real SPRGs when resuming the
diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c
index 8b99e07..e04b0ef 100644
--- a/arch/powerpc/kvm/e500_emulate.c
+++ b/arch/powerpc/kvm/e500_emulate.c
@@ -269,6 +269,9 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int 
sprn, ulong *spr_val)
*spr_val = vcpu->arch.shared->mas7_3 >> 32;
break;
 #endif
+   case SPRN_DECAR:
+   *spr_val = vcpu->arch.decar;
+   break;
case SPRN_TLB0CFG:
*spr_val = vcpu->arch.tlbcfg[0];
break;
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] KVM: PPC: Not optimizing MSR_CE and MSR_ME with paravirt.

2012-05-30 Thread Alexander Graf
From: Bharat Bhushan 

If there is pending critical or machine check interrupt then guest
would like to capture it when guest enable MSR.CE and MSR_ME respectively.
Also as mostly MSR_CE and MSR_ME are updated with rfi/rfci/rfmii
which anyway traps so removing the the paravirt optimization for MSR.CE
and MSR.ME.

Signed-off-by: Bharat Bhushan 
Signed-off-by: Alexander Graf 
---
 Documentation/virtual/kvm/ppc-pv.txt |2 --
 arch/powerpc/kernel/kvm_emul.S   |2 +-
 2 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/ppc-pv.txt 
b/Documentation/virtual/kvm/ppc-pv.txt
index 6e7c370..4911cf9 100644
--- a/Documentation/virtual/kvm/ppc-pv.txt
+++ b/Documentation/virtual/kvm/ppc-pv.txt
@@ -109,8 +109,6 @@ The following bits are safe to be set inside the guest:
 
   MSR_EE
   MSR_RI
-  MSR_CR
-  MSR_ME
 
 If any other bit changes in the MSR, please still use mtmsr(d).
 
diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S
index 62ceb2a..e100ff3 100644
--- a/arch/powerpc/kernel/kvm_emul.S
+++ b/arch/powerpc/kernel/kvm_emul.S
@@ -122,7 +122,7 @@ kvm_emulate_mtmsrd_len:
.long (kvm_emulate_mtmsrd_end - kvm_emulate_mtmsrd) / 4
 
 
-#define MSR_SAFE_BITS (MSR_EE | MSR_CE | MSR_ME | MSR_RI)
+#define MSR_SAFE_BITS (MSR_EE | MSR_RI)
 #define MSR_CRITICAL_BITS ~MSR_SAFE_BITS
 
 .global kvm_emulate_mtmsr
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] KVM: PPC: Factor out guest epapr initialization

2012-05-30 Thread Alexander Graf
From: Liu Yu-B13201 

epapr paravirtualization support is now a Kconfig
selectable option

Signed-off-by: Liu Yu 
[stuart.yo...@freescale.com: misc minor fixes, description update]
Signed-off-by: Stuart Yoder 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/epapr_hcalls.h |2 +
 arch/powerpc/kernel/Makefile|1 +
 arch/powerpc/kernel/epapr_hcalls.S  |   25 +++
 arch/powerpc/kernel/epapr_paravirt.c|   52 +++
 arch/powerpc/kernel/kvm.c   |   28 ++---
 arch/powerpc/kernel/kvm_emul.S  |   10 --
 arch/powerpc/platforms/Kconfig  |9 +
 7 files changed, 92 insertions(+), 35 deletions(-)
 create mode 100644 arch/powerpc/kernel/epapr_hcalls.S
 create mode 100644 arch/powerpc/kernel/epapr_paravirt.c

diff --git a/arch/powerpc/include/asm/epapr_hcalls.h 
b/arch/powerpc/include/asm/epapr_hcalls.h
index 976835d..bf2c06c 100644
--- a/arch/powerpc/include/asm/epapr_hcalls.h
+++ b/arch/powerpc/include/asm/epapr_hcalls.h
@@ -153,6 +153,8 @@
 #define EV_HCALL_CLOBBERS2 EV_HCALL_CLOBBERS3, "r5"
 #define EV_HCALL_CLOBBERS1 EV_HCALL_CLOBBERS2, "r4"
 
+extern bool epapr_paravirt_enabled;
+extern u32 epapr_hypercall_start[];
 
 /*
  * We use "uintptr_t" to define a register because it's guaranteed to be a
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 83afacd..bb282dd 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -128,6 +128,7 @@ ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC),)
 obj-y  += ppc_save_regs.o
 endif
 
+obj-$(CONFIG_EPAPR_PARAVIRT)   += epapr_paravirt.o epapr_hcalls.o
 obj-$(CONFIG_KVM_GUEST)+= kvm.o kvm_emul.o
 
 # Disable GCOV in odd or sensitive code
diff --git a/arch/powerpc/kernel/epapr_hcalls.S 
b/arch/powerpc/kernel/epapr_hcalls.S
new file mode 100644
index 000..697b390
--- /dev/null
+++ b/arch/powerpc/kernel/epapr_hcalls.S
@@ -0,0 +1,25 @@
+/*
+ * Copyright (C) 2012 Freescale Semiconductor, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Hypercall entry point. Will be patched with device tree instructions. */
+.global epapr_hypercall_start
+epapr_hypercall_start:
+   li  r3, -1
+   nop
+   nop
+   nop
+   blr
diff --git a/arch/powerpc/kernel/epapr_paravirt.c 
b/arch/powerpc/kernel/epapr_paravirt.c
new file mode 100644
index 000..028aeae
--- /dev/null
+++ b/arch/powerpc/kernel/epapr_paravirt.c
@@ -0,0 +1,52 @@
+/*
+ * ePAPR para-virtualization support.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright (C) 2012 Freescale Semiconductor, Inc.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+bool epapr_paravirt_enabled;
+
+static int __init epapr_paravirt_init(void)
+{
+   struct device_node *hyper_node;
+   const u32 *insts;
+   int len, i;
+
+   hyper_node = of_find_node_by_path("/hypervisor");
+   if (!hyper_node)
+   return -ENODEV;
+
+   insts = of_get_property(hyper_node, "hcall-instructions", &len);
+   if (!insts)
+   return -ENODEV;
+
+   if (len % 4 || len > (4 * 4))
+   return -ENODEV;
+
+   for (i = 0; i < (len / 4); i++)
+   patch_instruction(epapr_hypercall_start + i, insts[i]);
+
+   epapr_paravirt_enabled = true;
+
+   return 0;
+}
+
+early_initcall(epapr_paravirt_init);
diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 62bdf23..1c13307 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define KVM_MAGIC_PAGE (-4096L)
 #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x)
@@ -726,7 +727,7 @@ unsigned long kvm_hypercall(unsigned long *in,
unsigned long register r11 asm("r11") = nr;
unsigned long register r12 asm("r12");
 
-   asm volatile("blkvm_hypercall_start"
+   asm volatile("blepapr_hypercall_start"
 : "=r"(r0), "=r"(r3), "=r"(r

[PULL 0/4] ppc patch queue 2012-05-30

2012-05-30 Thread Alexander Graf
Hi Avi,

This is my current patch queue for ppc. Please pull.

Changes this time include:

  - Generalize KVM_GUEST support to overall ePAPR code
  - Fix reset for Book3S HV
  - Fix machine check deferral when CONFIG_KVM_GUEST=y
  - Add support for BookE register DECAR

Alex


The following changes since commit b48b2c3e50433ff6f7e46186daa7f986bd960215:
  Jonas Bonn (1):
openrisc: use generic strnlen_user() function

are available in the git repository at:

  git://github.com/agraf/linux-2.6.git for-upstream

Bharat Bhushan (2):
  KVM: PPC: booke: Added DECAR support
  KVM: PPC: Not optimizing MSR_CE and MSR_ME with paravirt.

Liu Yu-B13201 (1):
  KVM: PPC: Factor out guest epapr initialization

Paul Mackerras (1):
  KVM: PPC: Book3S HV: Make the guest hash table size configurable

 Documentation/virtual/kvm/api.txt|   36 +
 Documentation/virtual/kvm/ppc-pv.txt |2 -
 arch/powerpc/include/asm/epapr_hcalls.h  |2 +
 arch/powerpc/include/asm/kvm_book3s_64.h |7 +-
 arch/powerpc/include/asm/kvm_host.h  |6 ++
 arch/powerpc/include/asm/kvm_ppc.h   |3 +-
 arch/powerpc/kernel/Makefile |1 +
 arch/powerpc/kernel/epapr_hcalls.S   |   25 ++
 arch/powerpc/kernel/epapr_paravirt.c |   52 +
 arch/powerpc/kernel/kvm.c|   28 +--
 arch/powerpc/kernel/kvm_emul.S   |   12 +---
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  123 +++---
 arch/powerpc/kvm/book3s_hv.c |   40 +++---
 arch/powerpc/kvm/book3s_hv_builtin.c |5 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |   15 ++--
 arch/powerpc/kvm/booke.c |5 +
 arch/powerpc/kvm/booke_emulate.c |3 +
 arch/powerpc/kvm/e500_emulate.c  |3 +
 arch/powerpc/kvm/powerpc.c   |   18 +
 arch/powerpc/platforms/Kconfig   |9 ++
 include/linux/kvm.h  |3 +
 21 files changed, 306 insertions(+), 92 deletions(-)
 create mode 100644 arch/powerpc/kernel/epapr_hcalls.S
 create mode 100644 arch/powerpc/kernel/epapr_paravirt.c
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: UDP problem with virtio...

2012-05-30 Thread Erik Brakkee

Stefan Hajnoczi wrote:

On Sat, May 26, 2012 at 11:23 AM, Erik Brakkee  wrote:

I have done some more experiments and it does work when using a Centos 6.2
guest. Therefore, it is most likely a compatibility issues between the guest
and host virtio implementation.
I read somewhere there is some sort of compatibility mode for virtio but
could not find how to activate it. Is there a way to make these different
virtio host/guest implementations talk to each other?

They should be compatible.

I noticed in your domain XML the VM has two virtio-net interfaces.
Could that be affecting your mount success?

Did you collect a packet capture inside the guest?  (tcpdump -nli eth0
-s0 -w /tmp/dump.pcap)

It would be interesting to understand what the guest sees during the
failed and successful UDP NFS mounts.

Can you be more specific about the mount failure.  Is there an error
message?  Anything interesting in the guest's dmesg?

Did you collect a packet capture on the host?

Stefan


Hi all,

I sent a reply earlier but apparently not to this mailing list.
The problem went away after installing the latest software updates on 
the openSuSE 11.3 guest. I tried reproducing it from a backup of the VM 
running the older kernel but was unable to do this on a openSuSE 12.1 
host. To reproduce it I would have had to use the centos 6.2 host but 
that would be quite a lot more work to do. So I am leaving the issue now 
as resolved. It was probably some earlier bug that was fixed.


Cheers
  Erik


--

Nonsense and other useful things: http://brakkee.org
MountainHoppers: http://mountainhoppers.nl
Track Detective: http://trackdetective.com
Twitter: http://twitter.com/ErikBrakkee



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] pci-assign: Drop write-only AssignedDevRegion::num

2012-05-30 Thread Jan Kiszka
No one was reading back this field.

Signed-off-by: Jan Kiszka 
---
 hw/device-assignment.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index da50069..713e32a 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -95,7 +95,6 @@ typedef struct {
 void *r_virtbase;/* mmapped access address for memory regions */
 uint32_t r_baseport; /* the base guest port for I/O regions */
 } u;
-int num;/* our index within v_addrs[] */
 pcibus_t e_size;/* emulated size of region in bytes */
 pcibus_t r_size;/* real size of region in bytes */
 PCIRegion *region;
@@ -449,7 +448,6 @@ static int assigned_dev_register_regions(PCIRegion 
*io_regions,
 for (i = 0; i < regions_num; i++, cur_region++) {
 if (!cur_region->valid)
 continue;
-pci_dev->v_addrs[i].num = i;
 
 /* handle memory io regions */
 if (cur_region->type & IORESOURCE_MEM) {
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/5] pci-assign: Drop kvm_assigned_irq::host_irq initialization

2012-05-30 Thread Jan Kiszka
real_device.irq is never set explicitly, thus remains 0. So we can
simply drop this line as assigned_irq_data is zero-initialized anyway.

Signed-off-by: Jan Kiszka 
---
 hw/device-assignment.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 713e32a..9271002 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -883,7 +883,6 @@ static int assign_irq(AssignedDevice *dev)
 memset(&assigned_irq_data, 0, sizeof(assigned_irq_data));
 assigned_irq_data.assigned_dev_id = calc_assigned_dev_id(dev);
 assigned_irq_data.guest_irq = irq;
-assigned_irq_data.host_irq = dev->real_device.irq;
 if (dev->irq_requested_type) {
 assigned_irq_data.flags = dev->irq_requested_type;
 r = kvm_deassign_irq(kvm_state, &assigned_irq_data);
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/5] pci-assign: Drop iommu property

2012-05-30 Thread Jan Kiszka
Disabling the IOMMU for an assigned device was never more than a highly
experimental features and is no longer supported by host kernels >= 3.2.
So drop this useless control from property list.

Signed-off-by: Jan Kiszka 
---
 hw/device-assignment.c |   22 ++
 hw/device-assignment.h |6 ++
 2 files changed, 8 insertions(+), 20 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 077d81e..b9b955b 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -749,21 +749,13 @@ static int assign_device(AssignedDevice *dev)
 assigned_dev_data.busnr = dev->h_busnr;
 assigned_dev_data.devfn = dev->h_devfn;
 
-/* We always enable the IOMMU unless disabled on the command line */
-if (dev->features & ASSIGNED_DEVICE_USE_IOMMU_MASK) {
-if (!kvm_check_extension(kvm_state, KVM_CAP_IOMMU)) {
-fprintf(stderr, "No IOMMU found.  Unable to assign device 
\"%s\"\n",
-dev->dev.qdev.id);
-return -ENODEV;
-}
-assigned_dev_data.flags |= KVM_DEV_ASSIGN_ENABLE_IOMMU;
-}
-if (!(dev->features & ASSIGNED_DEVICE_USE_IOMMU_MASK)) {
-fprintf(stderr,
-"WARNING: Assigning a device without IOMMU protection can "
-"cause host memory corruption if the device issues DMA write "
-"requests!\n");
+assigned_dev_data.flags = KVM_DEV_ASSIGN_ENABLE_IOMMU;
+if (!kvm_check_extension(kvm_state, KVM_CAP_IOMMU)) {
+fprintf(stderr, "No IOMMU found.  Unable to assign device \"%s\"\n",
+dev->dev.qdev.id);
+return -ENODEV;
 }
+
 if (dev->features & ASSIGNED_DEVICE_SHARE_INTX_MASK &&
 kvm_has_intx_set_mask()) {
 assigned_dev_data.flags |= KVM_DEV_ASSIGN_PCI_2_3;
@@ -1782,8 +1774,6 @@ PropertyInfo qdev_prop_hostaddr = {
 static Property da_properties[] =
 {
 DEFINE_PROP("host", AssignedDevice, host, qdev_prop_hostaddr, 
PCIHostDevice),
-DEFINE_PROP_BIT("iommu", AssignedDevice, features,
-   ASSIGNED_DEVICE_USE_IOMMU_BIT, true),
 DEFINE_PROP_BIT("prefer_msi", AssignedDevice, features,
 ASSIGNED_DEVICE_PREFER_MSI_BIT, false),
 DEFINE_PROP_BIT("share_intx", AssignedDevice, features,
diff --git a/hw/device-assignment.h b/hw/device-assignment.h
index 5d271d5..1ef2dbe 100644
--- a/hw/device-assignment.h
+++ b/hw/device-assignment.h
@@ -74,11 +74,9 @@ typedef struct {
 PCIRegion *region;
 } AssignedDevRegion;
 
-#define ASSIGNED_DEVICE_USE_IOMMU_BIT   0
-#define ASSIGNED_DEVICE_PREFER_MSI_BIT  1
-#define ASSIGNED_DEVICE_SHARE_INTX_BIT  2
+#define ASSIGNED_DEVICE_PREFER_MSI_BIT  0
+#define ASSIGNED_DEVICE_SHARE_INTX_BIT  1
 
-#define ASSIGNED_DEVICE_USE_IOMMU_MASK  (1 << ASSIGNED_DEVICE_USE_IOMMU_BIT)
 #define ASSIGNED_DEVICE_PREFER_MSI_MASK (1 << ASSIGNED_DEVICE_PREFER_MSI_BIT)
 #define ASSIGNED_DEVICE_SHARE_INTX_MASK (1 << ASSIGNED_DEVICE_SHARE_INTX_BIT)
 
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/5] pci-assign: Privatize type definitions

2012-05-30 Thread Jan Kiszka
No need to carry the type definitions in device-assignment.h. Move them
over to allow dropping the header once we use an INTx routing notifier
instead of exporting assigned_dev_update_irqs.

Signed-off-by: Jan Kiszka 
---
 hw/device-assignment.c |   90 +
 hw/device-assignment.h |   96 
 2 files changed, 90 insertions(+), 96 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index b9b955b..da50069 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "qemu-kvm.h"
@@ -40,6 +41,7 @@
 #include "monitor.h"
 #include "range.h"
 #include "sysemu.h"
+#include "pci.h"
 
 #define MSIX_PAGE_SIZE 0x1000
 
@@ -61,6 +63,94 @@
 #define DEBUG(fmt, ...) do { } while(0)
 #endif
 
+typedef struct PCIHostDevice {
+int seg;
+int bus;
+int dev;
+int func;
+} PCIHostDevice;
+
+typedef struct {
+int type;   /* Memory or port I/O */
+int valid;
+uint32_t base_addr;
+uint32_t size;/* size of the region */
+int resource_fd;
+} PCIRegion;
+
+typedef struct {
+uint8_t bus, dev, func; /* Bus inside domain, device and function */
+int irq;/* IRQ number */
+uint16_t region_number; /* number of active regions */
+
+/* Port I/O or MMIO Regions */
+PCIRegion regions[PCI_NUM_REGIONS - 1];
+int config_fd;
+} PCIDevRegions;
+
+typedef struct {
+MemoryRegion container;
+MemoryRegion real_iomem;
+union {
+void *r_virtbase;/* mmapped access address for memory regions */
+uint32_t r_baseport; /* the base guest port for I/O regions */
+} u;
+int num;/* our index within v_addrs[] */
+pcibus_t e_size;/* emulated size of region in bytes */
+pcibus_t r_size;/* real size of region in bytes */
+PCIRegion *region;
+} AssignedDevRegion;
+
+#define ASSIGNED_DEVICE_PREFER_MSI_BIT  0
+#define ASSIGNED_DEVICE_SHARE_INTX_BIT  1
+
+#define ASSIGNED_DEVICE_PREFER_MSI_MASK (1 << ASSIGNED_DEVICE_PREFER_MSI_BIT)
+#define ASSIGNED_DEVICE_SHARE_INTX_MASK (1 << ASSIGNED_DEVICE_SHARE_INTX_BIT)
+
+typedef struct {
+uint32_t addr_lo;
+uint32_t addr_hi;
+uint32_t data;
+uint32_t ctrl;
+} MSIXTableEntry;
+
+typedef struct AssignedDevice {
+PCIDevice dev;
+PCIHostDevice host;
+uint32_t features;
+int intpin;
+uint8_t debug_flags;
+AssignedDevRegion v_addrs[PCI_NUM_REGIONS - 1];
+PCIDevRegions real_device;
+int run;
+int girq;
+uint16_t h_segnr;
+uint8_t h_busnr;
+uint8_t h_devfn;
+int irq_requested_type;
+int bound;
+struct {
+#define ASSIGNED_DEVICE_CAP_MSI (1 << 0)
+#define ASSIGNED_DEVICE_CAP_MSIX (1 << 1)
+uint32_t available;
+#define ASSIGNED_DEVICE_MSI_ENABLED (1 << 0)
+#define ASSIGNED_DEVICE_MSIX_ENABLED (1 << 1)
+#define ASSIGNED_DEVICE_MSIX_MASKED (1 << 2)
+uint32_t state;
+} cap;
+uint8_t emulate_config_read[PCI_CONFIG_SPACE_SIZE];
+uint8_t emulate_config_write[PCI_CONFIG_SPACE_SIZE];
+int irq_entries_nr;
+struct kvm_irq_routing_entry *entry;
+MSIXTableEntry *msix_table;
+target_phys_addr_t msix_table_addr;
+uint16_t msix_max;
+MemoryRegion mmio;
+char *configfd_name;
+int32_t bootindex;
+QLIST_ENTRY(AssignedDevice) next;
+} AssignedDevice;
+
 static void assigned_dev_load_option_rom(AssignedDevice *dev);
 
 static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev);
diff --git a/hw/device-assignment.h b/hw/device-assignment.h
index 1ef2dbe..3fcb804 100644
--- a/hw/device-assignment.h
+++ b/hw/device-assignment.h
@@ -28,102 +28,6 @@
 #ifndef __DEVICE_ASSIGNMENT_H__
 #define __DEVICE_ASSIGNMENT_H__
 
-#include 
-#include "qemu-common.h"
-#include "qemu-queue.h"
-#include "pci.h"
-
-/* From include/linux/pci.h in the kernel sources */
-#define PCI_DEVFN(slot, func)   slot) & 0x1f) << 3) | ((func) & 0x07))
-
-typedef struct PCIHostDevice {
-int seg;
-int bus;
-int dev;
-int func;
-} PCIHostDevice;
-
-typedef struct {
-int type;   /* Memory or port I/O */
-int valid;
-uint32_t base_addr;
-uint32_t size;/* size of the region */
-int resource_fd;
-} PCIRegion;
-
-typedef struct {
-uint8_t bus, dev, func; /* Bus inside domain, device and function */
-int irq;/* IRQ number */
-uint16_t region_number; /* number of active regions */
-
-/* Port I/O or MMIO Regions */
-PCIRegion regions[PCI_NUM_REGIONS - 1];
-int config_fd;
-} PCIDevRegions;
-
-typedef struct {
-MemoryRegion container;
-MemoryRegion real_iomem;
-union {
-void *r_virtbase;/* mmapped access address for memory regions */
-uint32_t r_baseport; /* the base guest port for I/O regions */
-} u;
-int num;/* our index within v_addrs[] */
-pcibus_t e_size;

[PATCH 1/5] pci-assign: Hide ioport regions on lacking sysfs support

2012-05-30 Thread Jan Kiszka
As suggested by Alex: Instead of failing if the kernel does not allow us
to speak to an ioport region, warn the user but, hide the region and
continue.

Signed-off-by: Jan Kiszka 
---
 hw/device-assignment.c |7 ---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 9ad5de5..077d81e 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -430,10 +430,11 @@ static int assigned_dev_register_regions(PCIRegion 
*io_regions,
 ret);
 abort();
 } else if (errno != EINVAL) {
-fprintf(stderr,
-"Kernel doesn't support ioport resource access.\n");
+fprintf(stderr, "Kernel doesn't support ioport resource "
+"access, hiding this region.\n");
 close(pci_dev->v_addrs[i].region->resource_fd);
-return -1;
+cur_region->valid = 0;
+continue;
 }
 
 pci_dev->v_addrs[i].u.r_baseport = cur_region->base_addr;
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/5] pci-assign: More small cleanups

2012-05-30 Thread Jan Kiszka
Besides an update for hiding unsupported ioport BARs and tiny code
massages, this removes the obsolete iommu property from the pci-assign
device.

Jan Kiszka (5):
  pci-assign: Hide ioport regions on lacking sysfs support
  pci-assign: Drop iommu property
  pci-assign: Privatize type definitions
  pci-assign: Drop write-only AssignedDevRegion::num
  pci-assign: Drop kvm_assigned_irq::host_irq initialization

 hw/device-assignment.c |  120 +++
 hw/device-assignment.h |   98 ---
 2 files changed, 99 insertions(+), 119 deletions(-)

-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pci-assign: Hide ioport regions on lacking sysfs support

2012-05-30 Thread Jan Kiszka
On 2012-05-30 10:50, Avi Kivity wrote:
> On 05/30/2012 11:47 AM, Jan Kiszka wrote:
>> On 2012-05-30 10:21, Avi Kivity wrote:
>>> On 05/29/2012 08:28 PM, Alex Williamson wrote:
 On Tue, 2012-05-29 at 20:13 +0300, Avi Kivity wrote:
> On 05/29/2012 08:04 PM, Jan Kiszka wrote:
>> As suggested by Alex: Instead of failing if the kernel does not allow us
>> to speak to an ioport region, warn the user but, hide the region and
>> continue.
>
>
> Should we not, in addition, abort if the region is actually used?  A
> guest malfunction is likely if we don't.

 The only way we could know that it's used is if it's the device ends up
 with no valid regions as a result of this.  Otherwise it's dependent on
 both the device and the driver whether it can still function without the
 i/o port regions.  Thanks,
>>>
>>> If the I/O callback is called, we know it's used.
>>
>> We neither expose the region to the guest (so the guest has no clue
>> where to write to unless it assumes a fixed address - of which we have
>> no clue) nor register any callback for it.
> 
> Ah, I thought you expose the BAR but don't back it with anything.  No
> idea which approach is better, so we might as well try yours first.

Great. But don't apply this version, I just found some testing
left-over. Will send and update soon.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pci-assign: Hide ioport regions on lacking sysfs support

2012-05-30 Thread Avi Kivity
On 05/30/2012 11:47 AM, Jan Kiszka wrote:
> On 2012-05-30 10:21, Avi Kivity wrote:
>> On 05/29/2012 08:28 PM, Alex Williamson wrote:
>>> On Tue, 2012-05-29 at 20:13 +0300, Avi Kivity wrote:
 On 05/29/2012 08:04 PM, Jan Kiszka wrote:
> As suggested by Alex: Instead of failing if the kernel does not allow us
> to speak to an ioport region, warn the user but, hide the region and
> continue.


 Should we not, in addition, abort if the region is actually used?  A
 guest malfunction is likely if we don't.
>>>
>>> The only way we could know that it's used is if it's the device ends up
>>> with no valid regions as a result of this.  Otherwise it's dependent on
>>> both the device and the driver whether it can still function without the
>>> i/o port regions.  Thanks,
>> 
>> If the I/O callback is called, we know it's used.
> 
> We neither expose the region to the guest (so the guest has no clue
> where to write to unless it assumes a fixed address - of which we have
> no clue) nor register any callback for it.

Ah, I thought you expose the BAR but don't back it with anything.  No
idea which approach is better, so we might as well try yours first.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pci-assign: Hide ioport regions on lacking sysfs support

2012-05-30 Thread Jan Kiszka
On 2012-05-30 10:21, Avi Kivity wrote:
> On 05/29/2012 08:28 PM, Alex Williamson wrote:
>> On Tue, 2012-05-29 at 20:13 +0300, Avi Kivity wrote:
>>> On 05/29/2012 08:04 PM, Jan Kiszka wrote:
 As suggested by Alex: Instead of failing if the kernel does not allow us
 to speak to an ioport region, warn the user but, hide the region and
 continue.
>>>
>>>
>>> Should we not, in addition, abort if the region is actually used?  A
>>> guest malfunction is likely if we don't.
>>
>> The only way we could know that it's used is if it's the device ends up
>> with no valid regions as a result of this.  Otherwise it's dependent on
>> both the device and the driver whether it can still function without the
>> i/o port regions.  Thanks,
> 
> If the I/O callback is called, we know it's used.

We neither expose the region to the guest (so the guest has no clue
where to write to unless it assumes a fixed address - of which we have
no clue) nor register any callback for it.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] spapr: Add "memop" hypercall

2012-05-30 Thread Alexander Graf

On 28.05.2012, at 12:40, Avi Kivity wrote:

> On 05/25/2012 06:12 AM, Benjamin Herrenschmidt wrote:
>> 
>> BTW. This is a qemu patch, and that hypercall isn't KVM related at all,
>> ie, it's implemented in qemu and is used with or without KVM, so
>> documenting it in the kernel tree makes little sense. Same goes with
>> H_RTAS.
>> 
>> I'll add a doc to qemu in my next spin of it.
>> 
> 
> Depends.  How do you detect it exists?  Are you detecting kvm, or qemu,
> or the hypercall itself?

The hypercall itself. SLOF is the only user. QEMU provides SLOF. SLOF calls the 
hypercall. If the hypercall returns "I don't exist", it doesn't exist. :)

> I'd hate us to find ourselves in a maze of disconnected documentation
> with no clear guidelines on when a feature is available and when it is not.

Yeah, but semantically these hypercalls are on the same layer as fw_cfg. So 
they clearly belong to QEMU. In fact, they're also used when running with 
emulation.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pci-assign: Hide ioport regions on lacking sysfs support

2012-05-30 Thread Avi Kivity
On 05/29/2012 08:28 PM, Alex Williamson wrote:
> On Tue, 2012-05-29 at 20:13 +0300, Avi Kivity wrote:
>> On 05/29/2012 08:04 PM, Jan Kiszka wrote:
>> > As suggested by Alex: Instead of failing if the kernel does not allow us
>> > to speak to an ioport region, warn the user but, hide the region and
>> > continue.
>> 
>> 
>> Should we not, in addition, abort if the region is actually used?  A
>> guest malfunction is likely if we don't.
> 
> The only way we could know that it's used is if it's the device ends up
> with no valid regions as a result of this.  Otherwise it's dependent on
> both the device and the driver whether it can still function without the
> i/o port regions.  Thanks,

If the I/O callback is called, we know it's used.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html