Re: [Qemu-devel] [PATCH 09/19] Introduce event-tap.

2011-01-19 Thread Yoshiaki Tamura
2011/1/19 Kevin Wolf :
> Am 19.01.2011 14:04, schrieb Yoshiaki Tamura:
 +static void event_tap_blk_flush(EventTapBlkReq *blk_req)
 +{
 +    BlockDriverState *bs;
 +
 +    bs = bdrv_find(blk_req->device_name);
>>>
>>> Please store the BlockDriverState in blk_req. This code loops over all
>>> block devices and does a string comparison - and that for each request.
>>> You can also save the qemu_strdup() when creating the request.
>>>
>>> In the few places where you really need the device name (might be the
>>> case for load/save, I'm not sure), you can still get it from the
>>> BlockDriverState.
>>
>> I would do so for the primary side.  Although we haven't
>> implemented it yet, we want to replay block requests from the
>> block layer on the secondary side, and we need the device name
>> to restore the BlockDriverState.
>
> Hm, I see. I'm not happy about it, but I don't have a suggestion right
> away how to avoid it.
>
>>>
 +
 +    if (blk_req->is_flush) {
 +        bdrv_aio_flush(bs, blk_req->reqs[0].cb, blk_req->reqs[0].opaque);
>>>
>>> You need to handle errors. If bdrv_aio_flush returns NULL, call the
>>> callback with -EIO.
>>
>> I'll do so.
>>
>>>
 +        return;
 +    }
 +
 +    bdrv_aio_writev(bs, blk_req->reqs[0].sector, blk_req->reqs[0].qiov,
 +                    blk_req->reqs[0].nb_sectors, blk_req->reqs[0].cb,
 +                    blk_req->reqs[0].opaque);
>>>
>>> Same here.
>>>
 +    bdrv_flush(bs);
>>>
>>> This looks really strange. What is this supposed to do?
>>>
>>> One point is that you write it immediately after bdrv_aio_write, so you
>>> get an fsync for which you don't know if it includes the current write
>>> request or if it doesn't. Which data do you want to get flushed to the disk?
>>
>> I was expecting to flush the aio request that was just initiated.
>> Am I misunderstanding the function?
>
> Seems so. The function names don't use really clear terminology either,
> so you're not the first one to fall in this trap. Basically we have:
>
> * qemu_aio_flush() waits for all AIO requests to complete. I think you
> wanted to have exactly this, but only for a single block device. Such a
> function doesn't exist yet.
>
> * bdrv_flush() makes sure that all successfully completed requests are
> written to disk (by calling fsync)
>
> * bdrv_aio_flush() is the asynchronous version of bdrv_flush, i.e. run
> the fsync in the thread pool

Then what I wanted to do is call qemu_aio_flush first, then
bdrv_flush.  It should work like live migration.
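To make that ordering concrete, a minimal sketch (not code from the
series; event_tap_flush_all() is a hypothetical helper name):

    /* Wait for all in-flight AIO requests to complete, then fsync the
     * completed writes, before sending the final guest state.  Note
     * qemu_aio_flush() is global; a per-device variant doesn't exist
     * yet, as Kevin points out above. */
    static void event_tap_flush_all(BlockDriverState *bs)
    {
        qemu_aio_flush();   /* wait for all pending AIO requests */
        bdrv_flush(bs);     /* force the completed writes out to disk */
    }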

>
>>> The other thing is that you introduce a bdrv_flush for each request,
>>> basically forcing everyone to something very similar to writethrough
>>> mode. I'm sure this will have a big impact on performance.
>>
>> The reason is to avoid inversion of queued requests.  Although
>> processing them one by one is heavy, wouldn't having requests flushed
>> to disk out of order break the disk image?
>
> No, that's fine. If a guest issues two requests at the same time, they
> may complete in any order. You just need to make sure that you don't
> call the completion callback before the request really has completed.

We need to flush the requests, meaning both aio and fsync, before
sending the final state of the guest, to make sure we can switch to
the secondary safely.

> I'm just starting to wonder if the guest won't timeout the requests if
> they are queued for too long. Even more, with IDE, it can only handle
> one request at a time, so not completing requests doesn't sound like a
> good idea at all. In what intervals is the event-tap queue flushed?

The requests are flushed once each transaction completes, so it
doesn't happen at fixed intervals.

> On the other hand, if you complete before actually writing out, you
> don't get timeouts, but you signal success to the guest when the request
> could still fail. What would you do in this case? With a writeback cache
> mode we're fine, we can just fail the next flush (until then nothing is
> guaranteed to be on disk and order doesn't matter either), but with
> cache=writethrough we're in serious trouble.
>
> Have you thought about this problem? Maybe we end up having to flush the
> event-tap queue for each single write in writethrough mode.

Yes, and that's what I'm trying to do at this point.  I know that
performance matters a lot, but sacrificing reliability for
performance isn't a good idea right now.  I first want to lay the
groundwork, and then focus on optimization.  Note that without the
dirty bitmap optimization, Kemari suffers a lot in sending RAM.
Anthony and I discussed taking this approach at KVM Forum.

>>> Additionally, error handling is missing.
>>
>> I looked at the code using bdrv_flush and realized some of it
>> doesn't handle errors, but scsi-disk.c does.  Should every caller
>> handle errors, or does it depend on the usage?
>
> I added the return code only recently, it was a void function
> previously. Probably some error handling should be added to all of them.

Ah :)  Glad to hear that.
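For reference, a minimal sketch of the -EIO handling agreed on earlier
in this thread (the reqs[0] field names follow the quoted event-tap
code; the rest is an assumption, not code from the series):

    BlockDriverAIOCB *acb = bdrv_aio_flush(bs, blk_req->reqs[0].cb,
                                           blk_req->reqs[0].opaque);
    if (!acb) {
        /* submission failed: complete the request with an error */
        blk_req->reqs[0].cb(blk_req->reqs[0].opaque, -EIO);
    }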

Re: [Qemu-devel] [PATCH 14/19] block: insert event-tap to bdrv_aio_writev() and bdrv_aio_flush().

2011-01-19 Thread Yoshiaki Tamura
2011/1/19 Kevin Wolf :
> Am 19.01.2011 14:16, schrieb Yoshiaki Tamura:
>> 2011/1/19 Kevin Wolf :
>>> Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
 The event-tap function is called only when it is on, and only for requests
 sent from device emulators.

 Signed-off-by: Yoshiaki Tamura 
 ---
  block.c |   11 +++
  1 files changed, 11 insertions(+), 0 deletions(-)

 diff --git a/block.c b/block.c
 index ff2795b..85bd8b8 100644
 --- a/block.c
 +++ b/block.c
 @@ -28,6 +28,7 @@
  #include "block_int.h"
  #include "module.h"
  #include "qemu-objects.h"
 +#include "event-tap.h"

  #ifdef CONFIG_BSD
  #include 
 @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
      if (bdrv_check_request(bs, sector_num, nb_sectors))
          return NULL;

 +    if (bs->device_name && event_tap_is_on()) {
 +        return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
 +                                         cb, opaque);
 +    }
 +
      if (bs->dirty_bitmap) {
          blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
                                           opaque);
>>>
>>> Just noticed the context here... Does this patch break block migration
>>> when event-tap is on?
>>
>> I don't think so.  event-tap will call bdrv_aio_writev() upon
>> flushing requests, and it shouldn't affect block migration.  Blocks
>> written after the synchronization should be marked as dirty and
>> should be sent in the next round.  Am I missing the point?
>
> No, that makes sense. I don't have a complete understanding of the whole
> series yet, so there may well be more misunderstandings on my side.

It's OK.  I'm glad that you're reviewing.

>>> Another question that came to my mind is if we really hook everything we
>>> need. I think we'll need to have a hook in bdrv_flush as well. I don't
>>> know if you do hook qemu_aio_flush and friends -  does a call cause
>>> event-tap to flush its queue? If not, a call to qemu_aio_flush might
>>> hang qemu because it's waiting for requests to complete which are
>>> actually stuck in the event-tap queue.
>>
>> No, it isn't queued in event-tap.  Marcelo pointed out that we should
>> hook bdrv_aio_flush to avoid request inversion, which made sense
>> to me.  Do we need to hook bdrv_flush for the same reason?  If
>
> bdrv_flush() is the synchronous version of bdrv_aio_flush(), so in
> general it seems likely that we need to do the same.

Hmm.  Because it's synchronous, we need to start synchronization
right away, and once that's done, flush the requests queued in
event-tap, then return.
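A rough sketch of such a hook, mirroring the bdrv_aio_writev() hook from
patch 14/19 (event_tap_bdrv_flush() is a hypothetical name; the details
are assumptions, not code from the series):

    int bdrv_flush(BlockDriverState *bs)
    {
        if (bs->device_name[0] && event_tap_is_on()) {
            /* start synchronization right away, flush the queued
             * requests, then return */
            return event_tap_bdrv_flush(bs);
        }
        /* ... existing bdrv_flush() implementation continues here ... */
    }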

>> we hook bdrv_flush and qemu_aio_flush, we're going to loop forever
>> because the synchronization code calls vm_stop, which calls
>> bdrv_flush_all and qemu_aio_flush.
>
> qemu_aio_flush doesn't invoke any bdrv_* functions, so I don't see why
> we would loop forever. It just waits for AIO requests to complete.
>
> I just looked up the code and I think the situation is a bit different
> than I thought originally: qemu_aio_flush waits only for completion of
> requests which belong to a driver that has registered using
> qemu_aio_set_fd_handler. So this means that AIO requests queued in
> event-tap are not considered in-flight requests and we won't get stuck
> in a loop. Maybe we have no problem in fact. :-)

I had the same thoughts.  We don't have to hook qemu_aio_flush.

> On the other hand, e.g. migration relies on the fact that after a
> qemu_aio_flush, all AIO requests that the device model has submitted are
> completed. I think event-tap must take care that the requests which are
> queued are not forgotten to be migrated. (Maybe the code already
> considers this, I'm just writing down what comes to my mind...)

That's where event-tap calls qemu_aio_flush.  It should be
almost the same as for live migration.  Requests queued in event-tap
are replayed on the secondary side; that is the core design of
Kemari.

Thanks,

Yoshi

>
> Kevin


Re: [PATCH v3 1/3] KVM: fix rcu usage warning in kvm_arch_vcpu_ioctl_set_sregs()

2011-01-19 Thread Xiao Guangrong
On 01/20/2011 02:13 AM, Marcelo Tosatti wrote:
> On Wed, Jan 19, 2011 at 01:16:23PM +0800, Xiao Guangrong wrote:
>> On 01/12/2011 03:39 PM, Xiao Guangrong wrote:
>>> Fix:
>>>
>>> [ 1001.499596] ===
>>> [ 1001.499599] [ INFO: suspicious rcu_dereference_check() usage. ]
>>> [ 1001.499601] ---
>>> [ 1001.499604] include/linux/kvm_host.h:301 invoked rcu_dereference_check() without protection!
>>> ..
>>> [ 1001.499636] Pid: 6035, comm: qemu-system-x86 Not tainted 2.6.37-rc6+ #62
>>> [ 1001.499638] Call Trace:
>>> [ 1001.499644]  [] lockdep_rcu_dereference+0x9d/0xa5
>>> [ 1001.499653]  [] gfn_to_memslot+0x8d/0xc8 [kvm]
>>> [ 1001.499661]  [] gfn_to_hva+0x16/0x3f [kvm]
>>> [ 1001.499669]  [] kvm_read_guest_page+0x1e/0x5e [kvm]
>>> [ 1001.499681]  [] kvm_read_guest_page_mmu+0x53/0x5e [kvm]
>>> [ 1001.499699]  [] load_pdptrs+0x3f/0x9c [kvm]
>>> [ 1001.499705]  [] ? vmx_set_cr0+0x507/0x517 [kvm_intel]
>>> [ 1001.499717]  [] kvm_arch_vcpu_ioctl_set_sregs+0x1f3/0x3c0 [kvm]
>>> [ 1001.499727]  [] kvm_vcpu_ioctl+0x6a5/0xbc5 [kvm]
>>>
>>> Signed-off-by: Xiao Guangrong 
>>
>> Ping ...?
> 
> Applied this fix. For the make_all_cpus_request optimization, can you
> show numbers with this new version? Because now there is LOCK# similarly
> to the spinlock.

Marcelo,

Sure :-), here are the simple kernbench test results:

Before patch:

real    5m6.493s
user    3m57.847s
sys     9m7.115s

real    5m1.750s
user    4m0.109s
sys     9m10.192s

After patch:

real    5m0.140s
user    3m57.956s
sys     8m58.339s

real    4m56.314s
user    4m0.303s
sys     8m55.774s
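(Averaging the two runs: real time drops from about 5m4.1s before the
patch to about 4m58.2s after, roughly a 2% improvement.)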

 



Economic Growth and Community Development

2011-01-19 Thread Grantscanada.org
New 2011 edition.

Canadian Business Publications is offering the public a revised edition of
the Canadian Subsidy Directory, a guide containing more than 3000 direct and
indirect financial subsidies, grants and loans offered by government
departments and agencies, foundations, associations and organizations.
In this new 2011 edition all programs are well described.

The Canadian Subsidy Directory is the most comprehensive tool to start up a
business, improve existing activities, set up a business plan, or obtain
assistance from experts in fields such as: industry, transport, agriculture,
communications, municipal infrastructure, education, import-export, labor,
construction and renovation, the service sector, hi-tech industries, research
and development, joint ventures, arts, cinema, theatre, music and the
recording industry, the self-employed, contests, and new talents.

NEW: The Canadian Subsidy Directory (database) offers continuously updated 
information for non-profit organizations, businesses, municipalities, 
individuals and aboriginals.
This database contains more than 3000 subsidies, grants or loans offered by 
various Canadian governments agencies and foundations.
Completely hosted, no database software to download or manage.

Viewable With Your Favorite Browser, Online Anywhere, Anytime. Access from 
anywhere in the world on any internet connected PC or Mac.

Canadian Subsidy Directory (All Canada, federal + provincial + foundations)
Available in three different formats:
. CD-rom (.Acrobat Reader file, .pdf) / $69.95
. Printed (Free cd included) / $149.95
. On-line database access (1 year subscription) / $129.95

If you require any additional information, please contact:
Natasha Robillard, toll free 1-866-322-3376 
450-229-1942 (local)


This commercial email message is broadcast by Canadian Business Publications.
To remove your email from the distribution list please use:
subventionque...@gmail.com

Ontario:
4865 Hwy 138, r.r. 1
St-Andrews west
On
K0C 2A0

Quebec:
1999 Le Skieur
Ste-Adele
Qc
J8B 2Y9

---

[FRENCH]

Annuaire des Subventions au Quebec 2011
Legal deposit - Bibliothèque Nationale du Québec
ISBN 978-2-922870-03-9

Les Publications Canadiennes is offering the public a revised edition of the
ANNUAIRE DES SUBVENTIONS AU QUÉBEC, containing more than 1800 assistance and
subsidy programs from the various levels of government and other
organizations. The new 2011 edition describes the programs as well as the
amounts allocated.

The Annuaire des Subventions au Quebec is the ideal tool for starting a
business, improving an existing business, putting together a business plan,
or obtaining help from expert business advisors:
Business start-ups, studies, research, arts, agriculture, import-export,
labor, cinema, loans, promotion, grants, theatre, transport, communications,
business creation and development, construction and renovation, aerospace,
contests, new talents, assistance for associations, organizations and
foundations, computing, music, the recording industry, business plans,
market surveys, infrastructure, assistance for the self-employed, and more!

NEW: Publications Canadiennes is proud to announce the launch of the online
Annuaire des Subventions au Québec.
In the form of a database, this new tool gives direct access to the
assistance and subsidy programs available for Quebec.

Continuously updated, this database lets users quickly browse the programs
offered by a multitude of governments and foundations, considerably reducing
the time spent on research.

The online Annuaire des Subventions is accessible through a web browser and
runs on a FileMaker platform using the "instant web publishing" feature.
Extremely versatile and powerful, the Annuaire des Subventions au Québec is
now a very user-friendly solution for Windows, Mac and the Web.

Access to the online database / $129.95
Annuaire des Subventions au Quebec 2011 (CD-rom, .pdf) / $69.95
Annuaire des Subventions au Quebec 2011 (printed + free CD-rom) / $149.95

Information. Toll-free: 866-322-3376; 450-229-1942 (local)

--
This advertising message is distributed by Canadian Business Publications
To remove your email address: subventionque...@gmail.com

1999 Le Skieur
Ste-Adele
Qc
J8B 2Y9
1-866-322-3376

Ontario:
4865 Hwy 138, r.r. 1
St-Andrews west
On
K0C 2A0


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Blue Swirl
On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori wrote:
> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>>
>> So they interact with KVM (need kvm_state), and they interact with the
>> emulated PCI bus.  Could you elaborate on the fundamental difference
>> between the two interactions that makes you choose the (hypothetical)
>> KVM bus over the PCI bus as device parent?
>>
>
> It's almost arbitrary, but I would say it's the direction that I/Os flow.
>
> But if the underlying observation is that the device tree is not really a
> tree, you're 100% correct.  This is part of why a factory interface that
> just takes a parent bus is too simplistic.
>
> I think we ought to introduce a -pci-device option that is specifically for
> creating PCI devices that doesn't require a parent bus argument but provides
> a way to specify stable addressing (for instance, using a linear index).

I think kvm_state should not be a property of any device or bus. It
should be split into more logical pieces.

Some parts of it could remain in CPUState, because they are associated
with a VCPU.

Also, for example, irqfd could be considered an object similar to the
char or block devices QEMU provides to devices. Would it make sense
to introduce new host types for passing parts of kvm_state to devices?

I'd also make the coalesced MMIO stuff part of the memory object. We
don't pass any state references when using cpu_physical_memory_rw(), but
that could be changed.
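For context, the interface Blue Swirl mentions carries no state
reference at all in this era:

    /* QEMU's physical memory accessor, circa 2011: there is no
     * per-device or per-VM state argument that coalesced-MMIO data
     * could ride along with. */
    void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
                                int len, int is_write);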


Re: MIPS, io-thread, icount and wfi

2011-01-19 Thread Edgar E. Iglesias
On Wed, Jan 19, 2011 at 03:02:26PM -0200, Marcelo Tosatti wrote:
> On Tue, Jan 18, 2011 at 11:00:57AM +0100, Jan Kiszka wrote:
> > On 2011-01-18 01:19, Edgar E. Iglesias wrote:
> > > On Mon, Jan 17, 2011 at 11:03:08AM +0100, Edgar E. Iglesias wrote:
> > >> Hi,
> > >>
> > >> I'm running an io-thread enabled qemu-system-mipsel with icount.
> > >> When the guest (linux) goes to sleep through the wait insn (waiting
> > >> to be woken up by future timer interrupts), the thing deadlocks.
> > >>
> > >> IIUC, this is because vm timers are driven by icount, but the CPU is
> > >> halted so icount makes no progress and time stands still.
> > >>
> > >> I've locally disabled vcpu halting when icount is enabled, that
> > >> works around my problem but of course makes qemu consume 100% host cpu.
> > >>
> > >> I don't know why I only see this problem with io-thread builds?
> > >> Could be related to timing and luck.
> > >>
> > >> Would be interesting to know if someone has any info on how this was
> > >> intended to work (if it was)? And if there are ideas for better
> > >> workarounds or fixes that don't disable vcpu halting entirely.
> > > 
> > > Hi,
> > > 
> > > I've found the problem. For some reason io-thread builds use a
> > > static timeout for wait loops. The entire chunk of code that
> > > makes sure qemu_icount makes forward progress when the CPU's
> > > are idle has been ifdef'ed away...
> > > 
> > > This fixes the problem for me, hopefully without affecting
> > > io-thread runs without icount.
> > > 
> > > commit 0f4f3a919952500b487b438c5520f07a1c6be35b
> > > Author: Edgar E. Iglesias 
> > > Date:   Tue Jan 18 01:01:57 2011 +0100
> > > 
> > > qemu-timer: Fix timeout calc for io-thread with icount
> > > 
> > > Make sure we always make forward progress with qemu_icount to
> > > avoid deadlocks. For io-thread, use the static 1000 timeout
> > > only if icount is disabled.
> > > 
> > > Signed-off-by: Edgar E. Iglesias 
> > > 
> > > diff --git a/qemu-timer.c b/qemu-timer.c
> > > index 95814af..db1ec49 100644
> > > --- a/qemu-timer.c
> > > +++ b/qemu-timer.c
> > > @@ -110,7 +110,6 @@ static int64_t cpu_get_clock(void)
> > >  }
> > >  }
> > >  
> > > -#ifndef CONFIG_IOTHREAD
> > >  static int64_t qemu_icount_delta(void)
> > >  {
> > >  if (!use_icount) {
> > > @@ -124,7 +123,6 @@ static int64_t qemu_icount_delta(void)
> > >  return cpu_get_icount() - cpu_get_clock();
> > >  }
> > >  }
> > > -#endif
> > >  
> > >  /* enable cpu_get_ticks() */
> > >  void cpu_enable_ticks(void)
> > > @@ -1077,9 +1075,17 @@ void quit_timers(void)
> > >  
> > >  int qemu_calculate_timeout(void)
> > >  {
> > > -#ifndef CONFIG_IOTHREAD
> > >  int timeout;
> > >  
> > > +#ifdef CONFIG_IOTHREAD
> > > +/* When using icount, making forward progress with qemu_icount when the
> > > +   guest CPU is idle is critical. We only use the static io-thread timeout
> > > +   for non icount runs.  */
> > > +if (!use_icount) {
> > > +return 1000;
> > > +}
> > > +#endif
> > > +
> > >  if (!vm_running)
> > >  timeout = 5000;
> > >  else {
> > > @@ -1110,8 +1116,5 @@ int qemu_calculate_timeout(void)
> > >  }
> > >  
> > >  return timeout;
> > > -#else /* CONFIG_IOTHREAD */
> > > -return 1000;
> > > -#endif
> > >  }
> > >  
> > > 
> > > 
> > 
> > This logic and timeout values were imported on iothread merge. And I bet
> > at least the timeout value of 1s (vs. 5s) can still be found in
> > qemu-kvm. Maybe someone over there can remember the rationales behind
> > choosing this value.
> > 
> > Jan
> 
> This timeout is for the main select() call, so there is not a lot of
> reasoning behind it: it's just how long to wait when there's no activity
> on the file descriptors.

OK, I suspected something like that. Thanks to both of you for the info.
I'll give people a couple of days to complain about the patch; if no one
does, I'll apply it.

Cheers


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Anthony Liguori

On 01/19/2011 12:52 PM, Daniel P. Berrange wrote:
> On Wed, Jan 19, 2011 at 11:51:58AM -0600, Anthony Liguori wrote:
>> On 01/19/2011 11:01 AM, Daniel P. Berrange wrote:
>>> The reason we specify 'bus' is that we wanted to be flexible wrt
>>> upgrades of libvirt, without needing restarts of QEMU instances
>>> it manages. That way we can introduce new functionality into
>>> libvirt that relies on it having previously set 'bus' on all
>>> active QEMUs.
>>>
>>> If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to
>>> be adding the extra bridges. I'd expect that QEMU provided just
>>> the first bridge and then libvirt would specify how many more
>>> bridges to create at boot or hotplug them later. So it wouldn't
>>> ever need to parse topology.
>>
>> Yeah, but replacing the main chipset will certainly change the PCI
>> topology such that if you're specifying bus=X and addr=X and then
>> also using -M pc, unless you're parsing the default topology to come
>> up with the addressing, it will break in the future.
>
> We never use a bare '-M pc' though, we always canonicalize to
> one of the versioned forms.  So if we run '-M pc-0.12', then
> neither the main PCI chipset nor topology would have changed
> in newer QEMU.  Of course if we deployed a new VM with
> '-M pc-0.20' that might have a new PCI chipset, so bus=pci.0
> might have a different meaning than it did when used with
> '-M pc-0.12', but I don't think that's an immediate problem

Right, but you expose bus addressing via the XML, no?  That means that
if a user specifies something like '1.0', and you translate that to
bus='pci.0',addr='1.0', then when pc-0.50 comes out and slot 1.0 is used
for the integrated 3D VGA graphics adapter, guest creation will fail.

If you expose topological configuration to the user, the guest will not
keep working down the road unless you come up with a scheme where you
map addresses to a different address range for newer machine types.

Regards,

Anthony Liguori

> Regards,
> Daniel



Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 11:42:18AM -0600, Anthony Liguori wrote:
> On 01/19/2011 11:35 AM, Daniel P. Berrange wrote:
> >On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
> >>On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
> >>>On 01/18/11 18:09, Anthony Liguori wrote:
> On 01/18/2011 10:56 AM, Jan Kiszka wrote:
> >>The device model topology is 100% a hidden architectural detail.
> >This is true for the sysbus, it is obviously not the case for PCI and
> >similarly discoverable buses. There we have a guest-explorable topology
> >that is currently equivalent to the qdev layout.
> But we also don't do PCI passthrough so we really haven't even explored
> how that maps in qdev. I don't know if qemu-kvm has attempted to
> qdev-ify it.
> >>>It is qdev-ified.  It is a normal pci device from qdev's point of view.
> >>>
> >>>BTW: is there any reason why (vfio-based) pci passthrough couldn't
> >>>work with tcg?
> >>>
> The -device interface is a stable interface. Right now, you don't
> specify any type of identifier of the pci bus when you create a PCI
> device. It's implied in the interface.
> >>>Wrong.  You can specify the bus you want attach the device to via
> >>>bus=.  This is true for *every* device, including all pci
> >>>devices.  If unspecified qdev uses the first bus it finds.
> >>>
> >>>As long as there is a single pci bus only there is simply no need
> >>>to specify it, thats why nobody does that today.
> >>Right.  In terms of specifying bus=, what are we promising re:
> >>compatibility?  Will there always be a pci.0?  If we add some
> >>PCI-to-PCI bridges in order to support more devices, is libvirt
> >>support to parse the hierarchy and figure out which bus to put the
> >>device on?
> >The answer to your questions probably differs depending on
> >whether '-nodefconfig' and '-nodefaults' are set on the
> >command line.  If they are set, then I'd expect to only
> >ever see one PCI bus with name pci.0 forever more, unless
> >I explicitly ask for more. If they are not set, then you
> >might expect to see multiple PCI buses appear by magic
> 
> Yeah, we can't promise that.  If you use -M pc, you aren't
> guaranteed a stable PCI bus topology even with
> -nodefconfig/-nodefaults.

That's why we never use '-M pc' when actually invoking QEMU.
If the user specifies 'pc' in the XML, we canonicalize that
to the versioned alternative like 'pc-0.12' before invoking
QEMU. We also expose the list of versioned machines to apps
so they can do canonicalization themselves if desired.

Regards,
Daniel


Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock

2011-01-19 Thread Jeremy Fitzhardinge
On 01/19/2011 09:21 AM, Peter Zijlstra wrote:
> On Wed, 2011-01-19 at 22:42 +0530, Srivatsa Vaddagiri wrote:
>> Add two hypercalls to the KVM hypervisor to support pv-ticketlocks.
>>
>> KVM_HC_WAIT_FOR_KICK blocks the calling vcpu until another vcpu kicks it
>> or it is woken up because of an event like an interrupt.
>>
>> KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu.
>>
>> The presence of these hypercalls is indicated to the guest via
>> KVM_FEATURE_WAIT_FOR_KICK/KVM_CAP_WAIT_FOR_KICK. Qemu needs a corresponding
>> patch to pass up the presence of this feature to the guest via cpuid. The
>> patch to qemu will be sent separately.
>
> I didn't really read the patch, and I totally forgot everything from
> when I looked at the Xen series, but does the Xen/KVM hypercall
> interface for this include the vcpu to await the kick from?
>
> My guess is not, since the ticket locks used don't know who the owner
> is, which is of course, sad. There are FIFO spinlock implementations
> that can do this though.. although I think they all have a bigger memory
> footprint.

At least in the Xen code, a current owner isn't very useful, because we
need the current owner to kick the *next* owner to life at release time,
which we can't do without some structure recording which ticket belongs
to which cpu.

(A reminder: the big problem with ticket locks is not with the current
owner getting preempted, but making sure the next VCPU gets scheduled
quickly when the current one releases; without that all the waiting
VCPUs burn the timeslices until the VCPU scheduler gets around to
scheduling the actual next in line.)

At present, the code needs to scan an array of percpu "I am waiting on
lock X with ticket Y" structures to work out who's next.  The search is
somewhat optimised by keeping a cpuset of which CPUs are actually
blocked on spinlocks, but it's still going to scale badly with lots of CPUs.
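An approximation (not verbatim Xen code; names are assumed) of the
per-CPU record and the cpumask optimisation just described:

    /* per-CPU "I am waiting on lock X with ticket Y" record */
    struct spinning {
        arch_spinlock_t *lock;   /* which lock this CPU is blocked on */
        __ticket_t ticket;       /* which ticket it holds */
    };
    static DEFINE_PER_CPU(struct spinning, spinning);

    /* CPUs currently blocked on a spinlock; narrows the release-time
     * scan to CPUs that might hold the next ticket */
    static cpumask_t waiting_cpus;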

I haven't thought of a good way to improve on this; an obvious approach
is to just add a pointer to the spinlock and hang an explicit linked
list off it, but that's incompatible with wanting to avoid expanding the
lock.

You could have a table of auxiliary per-lock data hashed on the lock
address, but it's not clear to me that it's an improvement on the array
approach, especially given the synchronization issues of keeping that
structure up to date (do we have a generic lockless hashtable
implementation?).  But perhaps it's one of those things that makes sense
at larger scales.

> The reason for wanting this should be clear I guess, it allows PI.

Well, if we can expand the spinlock to include an owner, then all this
becomes moot...

J
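For readers following along, a hedged sketch of the guest-side pairing
the hypercalls at the top of the thread imply (kvm_hypercall0/1 are the
standard guest helpers from asm/kvm_para.h; whether KVM_HC_KICK_CPU
takes an APIC id is an assumption here -- the actual ABI is defined by
the patch):

    static void pv_wait_for_kick(void)
    {
        kvm_hypercall0(KVM_HC_WAIT_FOR_KICK);    /* block until kicked
                                                    or interrupted */
    }

    static void pv_kick_cpu(unsigned long apicid)
    {
        kvm_hypercall1(KVM_HC_KICK_CPU, apicid); /* wake the chosen vcpu */
    }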




Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 11:51:58AM -0600, Anthony Liguori wrote:
> On 01/19/2011 11:01 AM, Daniel P. Berrange wrote:
> >
> >The reason we specify 'bus' is that we wanted to be flexible wrt
> >upgrades of libvirt, without needing restarts of QEMU instances
> >it manages. That way we can introduce new functionality into
> >libvirt that relies on it having previously set 'bus' on all
> >active QEMUs.
> >
> >If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to
> >be adding the extra bridges. I'd expect that QEMU provided just
> >the first bridge and then libvirt would specify how many more
> >bridges to create at boot or hotplug them later. So it wouldn't
> >ever need to parse topology.
> 
> Yeah, but replacing the main chipset will certainly change the PCI
> topology such that if you're specifying bus=X and addr=X and then
> also using -M pc, unless you're parsing the default topology to come
> up with the addressing, it will break in the future.

We never use a bare '-M pc' though, we always canonicalize to
one of the versioned forms.  So if we run '-M pc-0.12', then
neither the main PCI chipset nor topology would have changed
in newer QEMU.  Of course if we deployed a new VM with
'-M pc-0.20' that might have a new PCI chipset, so bus=pci.0
might have a different meaning than it did when used with
'-M pc-0.12', but I don't think that's an immediate problem.

Regards,
Daniel


[PATCH] KVM test: Small change on shell background cmd for ioquit

2011-01-19 Thread Lucas Meneghel Rodrigues
The current form of the shell one-liner was causing problems
on distros such as F14. We don't actually need nohup when
executing the parallel dds.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/tests_base.cfg.sample |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 360cf80..265d51c 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -803,7 +803,7 @@ variants:
 
 - ioquit:
 type = ioquit
-background_cmd = "for i in 1 2 3 4; do (nohup dd if=/dev/urandom of=/tmp/file bs=102400 count=1000 &) done"
+background_cmd = "for i in 1 2 3 4; do (dd if=/dev/urandom of=/tmp/file bs=102400 count=1000 &); done"
 check_cmd = ps -a |grep dd
 login_timeout = 360
 
-- 
1.7.3.4



Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock

2011-01-19 Thread Srivatsa Vaddagiri
On Wed, Jan 19, 2011 at 06:21:12PM +0100, Peter Zijlstra wrote:
> I didn't really read the patch, and I totally forgot everything from
> when I looked at the Xen series, but does the Xen/KVM hypercall
> interface for this include the vcpu to await the kick from?

No not yet, for reasons you mention below.

> My guess is not, since the ticket locks used don't know who the owner
> is, which is of course, sad. There are FIFO spinlock implementations
> that can do this though.. although I think they all have a bigger memory
> footprint.
> 
> The reason for wanting this should be clear I guess, it allows PI.

Yeah - that would be good to experiment with.

- vatsa


Re: [PATCH v3 1/3] KVM: fix rcu usage warning in kvm_arch_vcpu_ioctl_set_sregs()

2011-01-19 Thread Marcelo Tosatti
On Wed, Jan 19, 2011 at 01:16:23PM +0800, Xiao Guangrong wrote:
> On 01/12/2011 03:39 PM, Xiao Guangrong wrote:
> > Fix:
> > 
> > [ 1001.499596] ===
> > [ 1001.499599] [ INFO: suspicious rcu_dereference_check() usage. ]
> > [ 1001.499601] ---
> > [ 1001.499604] include/linux/kvm_host.h:301 invoked rcu_dereference_check() without protection!
> > ..
> > [ 1001.499636] Pid: 6035, comm: qemu-system-x86 Not tainted 2.6.37-rc6+ #62
> > [ 1001.499638] Call Trace:
> > [ 1001.499644]  [] lockdep_rcu_dereference+0x9d/0xa5
> > [ 1001.499653]  [] gfn_to_memslot+0x8d/0xc8 [kvm]
> > [ 1001.499661]  [] gfn_to_hva+0x16/0x3f [kvm]
> > [ 1001.499669]  [] kvm_read_guest_page+0x1e/0x5e [kvm]
> > [ 1001.499681]  [] kvm_read_guest_page_mmu+0x53/0x5e [kvm]
> > [ 1001.499699]  [] load_pdptrs+0x3f/0x9c [kvm]
> > [ 1001.499705]  [] ? vmx_set_cr0+0x507/0x517 [kvm_intel]
> > [ 1001.499717]  [] kvm_arch_vcpu_ioctl_set_sregs+0x1f3/0x3c0 [kvm]
> > [ 1001.499727]  [] kvm_vcpu_ioctl+0x6a5/0xbc5 [kvm]
> > 
> > Signed-off-by: Xiao Guangrong 
> 
> Ping ...?

Applied this fix. For the make_all_cpus_request optimization, can you
show numbers with this new version? Because now there is LOCK# similarly
to the spinlock.



Re: MIPS, io-thread, icount and wfi

2011-01-19 Thread Marcelo Tosatti
On Tue, Jan 18, 2011 at 11:00:57AM +0100, Jan Kiszka wrote:
> On 2011-01-18 01:19, Edgar E. Iglesias wrote:
> > On Mon, Jan 17, 2011 at 11:03:08AM +0100, Edgar E. Iglesias wrote:
> >> Hi,
> >>
> >> I'm running an io-thread enabled qemu-system-mipsel with icount.
> >> When the guest (linux) goes to sleep through the wait insn (waiting
> >> to be woken up by future timer interrupts), the thing deadlocks.
> >>
> >> IIUC, this is because vm timers are driven by icount, but the CPU is
> >> halted so icount makes no progress and time stands still.
> >>
> >> I've locally disabled vcpu halting when icount is enabled, that
> >> works around my problem but of course makes qemu consume 100% host cpu.
> >>
> >> I don't know why I only see this problem with io-thread builds?
> >> Could be related to timing and luck.
> >>
> >> Would be interesting to know if someone has any info on how this was
> >> intended to work (if it was)? And if there are ideas for better
> >> workarounds or fixes that don't disable vcpu halting entirely.
> > 
> > Hi,
> > 
> > I've found the problem. For some reason io-thread builds use a
> > static timeout for wait loops. The entire chunk of code that
> > makes sure qemu_icount makes forward progress when the CPUs
> > are idle has been ifdef'ed away...
> > 
> > This fixes the problem for me, hopefully without affecting
> > io-thread runs without icount.
> > 
> > commit 0f4f3a919952500b487b438c5520f07a1c6be35b
> > Author: Edgar E. Iglesias 
> > Date:   Tue Jan 18 01:01:57 2011 +0100
> > 
> > qemu-timer: Fix timeout calc for io-thread with icount
> > 
> > Make sure we always make forward progress with qemu_icount to
> > avoid deadlocks. For io-thread, use the static 1000 timeout
> > only if icount is disabled.
> > 
> > Signed-off-by: Edgar E. Iglesias 
> > 
> > diff --git a/qemu-timer.c b/qemu-timer.c
> > index 95814af..db1ec49 100644
> > --- a/qemu-timer.c
> > +++ b/qemu-timer.c
> > @@ -110,7 +110,6 @@ static int64_t cpu_get_clock(void)
> >  }
> >  }
> >  
> > -#ifndef CONFIG_IOTHREAD
> >  static int64_t qemu_icount_delta(void)
> >  {
> >  if (!use_icount) {
> > @@ -124,7 +123,6 @@ static int64_t qemu_icount_delta(void)
> >  return cpu_get_icount() - cpu_get_clock();
> >  }
> >  }
> > -#endif
> >  
> >  /* enable cpu_get_ticks() */
> >  void cpu_enable_ticks(void)
> > @@ -1077,9 +1075,17 @@ void quit_timers(void)
> >  
> >  int qemu_calculate_timeout(void)
> >  {
> > -#ifndef CONFIG_IOTHREAD
> >  int timeout;
> >  
> > +#ifdef CONFIG_IOTHREAD
> > +/* When using icount, making forward progress with qemu_icount when the
> > +   guest CPU is idle is critical. We only use the static io-thread timeout
> > +   for non icount runs.  */
> > +if (!use_icount) {
> > +return 1000;
> > +}
> > +#endif
> > +
> >  if (!vm_running)
> >  timeout = 5000;
> >  else {
> > @@ -1110,8 +1116,5 @@ int qemu_calculate_timeout(void)
> >  }
> >  
> >  return timeout;
> > -#else /* CONFIG_IOTHREAD */
> > -return 1000;
> > -#endif
> >  }
> >  
> > 
> > 
> 
> This logic and timeout values were imported on iothread merge. And I bet
> at least the timeout value of 1s (vs. 5s) can still be found in
> qemu-kvm. Maybe someone over there can remember the rationales behind
> choosing this value.
> 
> Jan

This timeout is for the main select() call, so there is not a lot of
reasoning behind it: it's just how long to wait when there's no activity
on the file descriptors.



Re: [PATCH unit-tests 1/3] Task stack pointer should point to the end of the page.

2011-01-19 Thread Marcelo Tosatti
On Tue, Jan 11, 2011 at 03:30:05PM +0200, Gleb Natapov wrote:
> 
> Signed-off-by: Gleb Natapov 
> ---
>  lib/x86/desc.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/lib/x86/desc.c b/lib/x86/desc.c
> index 1bd4421..11bd2a2 100644
> --- a/lib/x86/desc.c
> +++ b/lib/x86/desc.c
> @@ -362,7 +362,7 @@ void setup_tss32(void)
>   tss[i].cr3 = read_cr3();
>   tss[i].ss0 = tss[i].ss1 = tss[i].ss2 = 0x10;
>   tss[i].esp = tss[i].esp0 = tss[i].esp1 = tss[i].esp2 =
> - (u32)tss_stack[i];
> + (u32)tss_stack[i] + 4096;
>   tss[i].cs = 0x08;
>   tss[i].ds = tss[i].es = tss[i].fs = tss[i].gs = tss[i].ss = 
> 0x10;
>   tss[i].iomap_base = (u16)desc_size;
> -- 
> 1.7.2.3

Applied, thanks.



[PATCH 5/5] KVM test: Removing scripts/hugepage.py

2011-01-19 Thread Lucas Meneghel Rodrigues
Now that its functionality has been reimplemented as
KVM test infrastructure.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/scripts/hugepage.py |  118 --
 1 files changed, 0 insertions(+), 118 deletions(-)
 delete mode 100755 client/tests/kvm/scripts/hugepage.py

diff --git a/client/tests/kvm/scripts/hugepage.py b/client/tests/kvm/scripts/hugepage.py
deleted file mode 100755
index 8a1b0f6..000
--- a/client/tests/kvm/scripts/hugepage.py
+++ /dev/null
@@ -1,118 +0,0 @@
-#!/usr/bin/python
-# -*- coding: utf-8 -*-
-import os, sys, time
-
-"""
-Simple script to allocate enough hugepages for KVM testing purposes.
-"""
-
-class HugePageError(Exception):
-"""
-Simple wrapper for the builtin Exception class.
-"""
-pass
-
-
-class HugePage:
-def __init__(self, hugepage_path=None):
-"""
-Gets environment variable values and calculates the target number
-of huge memory pages.
-
-@param hugepage_path: Path where to mount hugetlbfs path, if not
-yet configured.
-"""
-self.vms = len(os.environ['KVM_TEST_vms'].split())
-self.mem = int(os.environ['KVM_TEST_mem'])
-try:
-self.max_vms = int(os.environ['KVM_TEST_max_vms'])
-except KeyError:
-self.max_vms = 0
-
-if hugepage_path:
-self.hugepage_path = hugepage_path
-else:
-self.hugepage_path = '/mnt/kvm_hugepage'
-
-self.hugepage_size = self.get_hugepage_size()
-self.target_hugepages = self.get_target_hugepages()
-print "Number of VMs this test will use: %d" % self.vms
-print "Amount of memory used by each vm: %s" % self.mem
-print ("System setting for large memory page size: %s" %
-   self.hugepage_size)
-print ("Number of large memory pages needed for this test: %s" %
-   self.target_hugepages)
-
-
-def get_hugepage_size(self):
-"""
-Get the current system setting for huge memory page size.
-"""
-meminfo = open('/proc/meminfo', 'r').readlines()
-huge_line_list = [h for h in meminfo if h.startswith("Hugepagesize")]
-try:
-return int(huge_line_list[0].split()[1])
-except ValueError, e:
-raise HugePageError("Could not get huge page size setting from "
-"/proc/meminfo: %s" % e)
-
-
-def get_target_hugepages(self):
-"""
-Calculate the target number of hugepages for testing purposes.
-"""
-if self.vms < self.max_vms:
-self.vms = self.max_vms
-# memory of all VMs plus qemu overhead of 64MB per guest
-vmsm = (self.vms * self.mem) + (self.vms * 64)
-return int(vmsm * 1024 / self.hugepage_size)
-
-
-def set_hugepages(self):
-"""
-Sets the hugepage limit to the target hugepage value calculated.
-"""
-hugepage_cfg = open("/proc/sys/vm/nr_hugepages", "r+")
-hp = hugepage_cfg.readline()
-while int(hp) < self.target_hugepages:
-loop_hp = hp
-hugepage_cfg.write(str(self.target_hugepages))
-hugepage_cfg.flush()
-hugepage_cfg.seek(0)
-hp = int(hugepage_cfg.readline())
-if loop_hp == hp:
-raise HugePageError("Cannot set the kernel hugepage setting "
-"to the target value of %d hugepages." %
-self.target_hugepages)
-hugepage_cfg.close()
-print ("Successfuly set %s large memory pages on host " %
-   self.target_hugepages)
-
-
-def mount_hugepage_fs(self):
-"""
-Verify if there's a hugetlbfs mount set. If there's none, will set up
-a hugetlbfs mount using the class attribute that defines the mount
-point.
-"""
-if not os.path.ismount(self.hugepage_path):
-if not os.path.isdir(self.hugepage_path):
-os.makedirs(self.hugepage_path)
-cmd = "mount -t hugetlbfs none %s" % self.hugepage_path
-if os.system(cmd):
-raise HugePageError("Cannot mount hugetlbfs path %s" %
-self.hugepage_path)
-
-
-def setup(self):
-self.set_hugepages()
-self.mount_hugepage_fs()
-
-
-if __name__ == "__main__":
-if len(sys.argv) < 2:
-huge_page = HugePage()
-else:
-huge_page = HugePage(sys.argv[1])
-
-huge_page.setup()
-- 
1.7.3.4
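As a quick worked example of get_target_hugepages() in the script removed
above (values assumed for illustration): with 2 VMs of 512 MB each and a
2048 kB hugepage size, vmsm = (2 * 512) + (2 * 64) = 1152 MB, so the
target is 1152 * 1024 / 2048 = 576 hugepages.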



[PATCH 2/5] KVM test: Modify enospc test to not require scripts/check_image.py v2

2011-01-19 Thread Lucas Meneghel Rodrigues
With this we prepare to remove the aforementioned script.

Changes from v1:
* Accounting for the loss of the check_image_critical
logic: now we just trap kvm_vm.VMError exceptions and
log them as errors if they happen.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/tests/enospc.py |   13 +++--
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/client/tests/kvm/tests/enospc.py b/client/tests/kvm/tests/enospc.py
index 2d8b628..6a149f9 100644
--- a/client/tests/kvm/tests/enospc.py
+++ b/client/tests/kvm/tests/enospc.py
@@ -1,7 +1,7 @@
 import logging, commands, time, os, re
 from autotest_lib.client.common_lib import error
 from autotest_lib.client.bin import utils
-import kvm_test_utils
+import kvm_test_utils, kvm_vm
 
 
 def run_enospc(test, params, env):
@@ -46,11 +46,12 @@ def run_enospc(test, params, env):
 if "paused" in status:
 pause_n += 1
 logging.info("Checking all images in use by the VM")
-script_path = os.path.join(test.bindir, "scripts/check_image.py")
-try:
-cmd_result = utils.run('python %s' % script_path)
-except error.CmdError, e:
-logging.debug(e.result_obj.stdout)
+for image_name in vm.params.objects("images"):
+image_params = vm.params.object_params(image_name)
+try:
+kvm_vm.check_image(image_params, test.bindir)
+except kvm_vm.VMError, e:
+logging.error(e)
 logging.info("Guest paused, extending Logical Volume size")
 try:
 cmd_result = utils.run("lvextend -L +200M /dev/vgtest/lvtest")
-- 
1.7.3.4



[PATCH 3/5] KVM test: Removing scripts/check_image.py

2011-01-19 Thread Lucas Meneghel Rodrigues
As its functionality was implemented as part of the
framework on previous patches.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/scripts/check_image.py |   99 ---
 1 files changed, 0 insertions(+), 99 deletions(-)
 delete mode 100644 client/tests/kvm/scripts/check_image.py

diff --git a/client/tests/kvm/scripts/check_image.py b/client/tests/kvm/scripts/check_image.py
deleted file mode 100644
index 2b5c227..000
--- a/client/tests/kvm/scripts/check_image.py
+++ /dev/null
@@ -1,99 +0,0 @@
-import os, sys, commands
-
-
-class ImageCheckError(Exception):
-"""
-Simple wrapper for the builtin Exception class.
-"""
-pass
-
-
-class ImageCheck(object):
-"""
-Check qcow2 image by qemu-img info/check command.
-"""
-def __init__(self):
-"""
-Gets params from environment variables and sets class attributes.
-"""
-self.image_path_list = []
-client_dir =  os.environ['AUTODIR']
-self.kvm_dir = os.path.join(client_dir, 'tests/kvm')
-img_to_check = os.environ['KVM_TEST_images'].split()
-
-for img in img_to_check:
-img_name_str = "KVM_TEST_image_name_%s" % img
-if not os.environ.has_key(img_name_str):
-img_name_str = "KVM_TEST_image_name"
-img_format_str = "KVM_TEST_image_format_%s" % img
-if os.environ.has_key(img_format_str):
-image_format = os.environ[img_format_str]
-else:
-image_format = os.environ['KVM_TEST_image_format']
-if image_format != "qcow2":
-continue
-image_name = os.environ[img_name_str]
-image_filename = "%s.%s" % (image_name, image_format)
-image_filename = os.path.join(self.kvm_dir, image_filename)
-self.image_path_list.append(image_filename)
-if os.environ.has_key('KVM_TEST_qemu_img_binary'):
-self.qemu_img_path = os.environ['KVM_TEST_qemu_img_binary']
-else:
-self.qemu_img_path = os.path.join(self.kvm_dir, 'qemu-img')
-self.qemu_img_check = True
-cmd = "%s |grep check" % self.qemu_img_path
-(s1, output) = commands.getstatusoutput(cmd)
-if s1:
-self.qemu_img_check = False
-print "Command qemu-img check not available, not checking..."
-cmd = "%s |grep info" % self.qemu_img_path
-(s2, output) = commands.getstatusoutput(cmd)
-if s2:
-self.qemu_img_check = False
-print "Command qemu-img info not available, not checking..."
-
-def exec_img_cmd(self, cmd_type, image_path):
-"""
-Run qemu-img info/check on given image.
-
-@param cmd_type: Sub command used together with qemu.
-@param image_path: Real path of the image.
-"""
-cmd = ' '.join([self.qemu_img_path, cmd_type, image_path])
-print "Checking image with command %s" % cmd
-(status, output) = commands.getstatusoutput(cmd)
-print output
-if status or (cmd_type == "check" and not "No errors" in output):
-msg = "Command %s failed" % cmd
-return False, msg
-else:
-return True, ''
-
-
-def check_image(self):
-"""
-Run qemu-img info/check to check the image in list.
-
-If the image checking is failed, raise an exception.
-"""
-# Check all the image in list.
-errmsg = []
-for image_path in self.image_path_list:
-if not os.path.exists(image_path):
-print "Image %s does not exist!" % image_path
-continue
-s, o = self.exec_img_cmd('info', image_path)
-if not s:
-errmsg.append(o)
-s, o = self.exec_img_cmd('check', image_path)
-if not s:
-errmsg.append(o)
-
-if len(errmsg) > 0:
-raise ImageCheckError('Errors were found, please check log!')
-
-
-if __name__ == "__main__":
-image_check = ImageCheck()
-if image_check.qemu_img_check:
-image_check.check_image()
-- 
1.7.3.4



[PATCH 4/5] KVM test: Turning hugepages setup into KVM autotest infrastructure v2

2011-01-19 Thread Lucas Meneghel Rodrigues
So we can get rid of scripts/hugepage.py. The implementation
strategy is to have a kvm_utils.HugePageConfig class that
can do both hugepages setup and cleanup, and call it during
pre/postprocessing.

Changes from v1:
- Stylistic changes on usage of error.context as suggested
by Michael

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/kvm_preprocessing.py  |8 +++
 client/tests/kvm/kvm_utils.py  |  101 
 client/tests/kvm/tests_base.cfg.sample |3 +-
 3 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/client/tests/kvm/kvm_preprocessing.py b/client/tests/kvm/kvm_preprocessing.py
index 4a6e0f8..41455cf 100644
--- a/client/tests/kvm/kvm_preprocessing.py
+++ b/client/tests/kvm/kvm_preprocessing.py
@@ -254,6 +254,10 @@ def preprocess(test, params, env):
 logging.debug("KVM userspace version: %s" % kvm_userspace_version)
 test.write_test_keyval({"kvm_userspace_version": kvm_userspace_version})
 
+if params.get("setup_hugepages") == "yes":
+h = kvm_utils.HugePageConfig(params)
+h.setup()
+
 # Execute any pre_commands
 if params.get("pre_command"):
 process_command(test, params, env, params.get("pre_command"),
@@ -350,6 +354,10 @@ def postprocess(test, params, env):
 env["tcpdump"].close()
 del env["tcpdump"]
 
+if params.get("setup_hugepages") == "yes":
+h = kvm_utils.HugePageConfig(params)
+h.cleanup()
+
 # Execute any post_commands
 if params.get("post_command"):
 process_command(test, params, env, params.get("post_command"),
diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index 78c9f25..632badb 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -1265,6 +1265,107 @@ class KvmLoggingConfig(logging_config.LoggingConfig):
 verbose=verbose)
 
 
+class HugePageConfig:
+def __init__(self, params):
+"""
+Gets environment variable values and calculates the target number
+of huge memory pages.
+
+@param params: Dict like object containing parameters for the test.
+"""
+self.vms = len(params.objects("vms"))
+self.mem = int(params.get("mem"))
+self.max_vms = int(params.get("max_vms", 0))
+self.hugepage_path = '/mnt/kvm_hugepage'
+self.hugepage_size = self.get_hugepage_size()
+self.target_hugepages = self.get_target_hugepages()
+self.kernel_hp_file = '/proc/sys/vm/nr_hugepages'
+
+
+def get_hugepage_size(self):
+"""
+Get the current system setting for huge memory page size.
+"""
+meminfo = open('/proc/meminfo', 'r').readlines()
+huge_line_list = [h for h in meminfo if h.startswith("Hugepagesize")]
+try:
+return int(huge_line_list[0].split()[1])
+except ValueError, e:
+raise ValueError("Could not get huge page size setting from "
+ "/proc/meminfo: %s" % e)
+
+
+def get_target_hugepages(self):
+"""
+Calculate the target number of hugepages for testing purposes.
+"""
+if self.vms < self.max_vms:
+self.vms = self.max_vms
+# memory of all VMs plus qemu overhead of 64MB per guest
+vmsm = (self.vms * self.mem) + (self.vms * 64)
+return int(vmsm * 1024 / self.hugepage_size)
+
+
+@error.context_aware
+def set_hugepages(self):
+"""
+Sets the hugepage limit to the target hugepage value calculated.
+"""
+error.context("setting hugepages limit to %s" % self.target_hugepages)
+hugepage_cfg = open(self.kernel_hp_file, "r+")
+hp = hugepage_cfg.readline()
+while int(hp) < self.target_hugepages:
+loop_hp = hp
+hugepage_cfg.write(str(self.target_hugepages))
+hugepage_cfg.flush()
+hugepage_cfg.seek(0)
+hp = int(hugepage_cfg.readline())
+if loop_hp == hp:
+raise ValueError("Cannot set the kernel hugepage setting "
+ "to the target value of %d hugepages." %
+ self.target_hugepages)
+hugepage_cfg.close()
+logging.debug("Successfuly set %s large memory pages on host ",
+  self.target_hugepages)
+
+
+@error.context_aware
+def mount_hugepage_fs(self):
+"""
+Verify if there's a hugetlbfs mount set. If there's none, will set up
+a hugetlbfs mount using the class attribute that defines the mount
+point.
+"""
+error.context("mounting hugepages path")
+if not os.path.ismount(self.hugepage_path):
+if not os.path.isdir(self.hugepage_path):
+os.makedirs(self.hugepage_path)
+cmd = "mount -t hugetlbfs none %s" % self.hugepage_path
+utils.sy

[PATCH 0/5] Removing scripts/check_image.py and scripts/hugepage.py

2011-01-19 Thread Lucas Meneghel Rodrigues
Fixed patchsets according to Michael's comments and re-sending them.

The purpose of these patchsets is to start work on superseding and removing
the scripts under the kvm test scripts/ subdirectory, by turning them into
regular parts of the kvm autotest framework.

Lucas Meneghel Rodrigues (5):
  KVM test: Introduce check_image postprocess directive v2
  KVM test: Modify enospc test to not require scripts/check_image.py v2
  KVM test: Removing scripts/check_image.py
  KVM test: Turning hugepages setup into KVM autotest infrastructure v2
  KVM test: Removing scripts/hugepage.py

 client/tests/kvm/kvm_preprocessing.py   |   11 +++-
 client/tests/kvm/kvm_utils.py   |  101 ++
 client/tests/kvm/kvm_vm.py  |   60 
 client/tests/kvm/scripts/check_image.py |   99 --
 client/tests/kvm/scripts/hugepage.py|  118 ---
 client/tests/kvm/tests/enospc.py|   13 ++--
 client/tests/kvm/tests_base.cfg.sample  |9 +--
 7 files changed, 180 insertions(+), 231 deletions(-)
 delete mode 100644 client/tests/kvm/scripts/check_image.py
 delete mode 100755 client/tests/kvm/scripts/hugepage.py

-- 
1.7.3.4



[PATCH 1/5] KVM test: Introduce check_image postprocess directive v2

2011-01-19 Thread Lucas Meneghel Rodrigues
With check_image_foo = yes, KVM autotest will use
qemu-img to perform checks on the qcow2 image.

This new functionality is intended to replace the
check_image.py script. The plan is to supersede most
of the pre/post scripts in place throughout KVM
autotest.

Changes from v1:
- We already raise exceptions during postprocessing only
if the test passed, which avoids masking the original test
failure reason; so eliminate the check_image_critical logic
altogether and always raise errors.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/kvm_preprocessing.py  |3 +-
 client/tests/kvm/kvm_vm.py |   60 
 client/tests/kvm/tests_base.cfg.sample |6 +---
 3 files changed, 63 insertions(+), 6 deletions(-)

diff --git a/client/tests/kvm/kvm_preprocessing.py b/client/tests/kvm/kvm_preprocessing.py
index 56acf0c..4a6e0f8 100644
--- a/client/tests/kvm/kvm_preprocessing.py
+++ b/client/tests/kvm/kvm_preprocessing.py
@@ -93,11 +93,12 @@ def preprocess_vm(test, params, env, name):
 def postprocess_image(test, params):
 """
 Postprocess a single QEMU image according to the instructions in params.
-Currently this function just removes an image if requested.
 
 @param test: An Autotest test object.
 @param params: A dict containing image postprocessing parameters.
 """
+if params.get("check_image") == "yes":
+kvm_vm.check_image(params, test.bindir)
 if params.get("remove_image") == "yes":
 kvm_vm.remove_image(params, test.bindir)
 
diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
index 18d10ef..393f1ab 100755
--- a/client/tests/kvm/kvm_vm.py
+++ b/client/tests/kvm/kvm_vm.py
@@ -47,6 +47,15 @@ class VMImageMissingError(VMError):
 return "CD image file not found: %r" % self.filename
 
 
+class VMImageCheckError(VMError):
+def __init__(self, filename):
+VMError.__init__(self, filename)
+self.filename = filename
+
+def __str__(self):
+return "Errors found on image: %r" % self.filename
+
+
 class VMBadPATypeError(VMError):
 def __init__(self, pa_type):
 VMError.__init__(self, pa_type)
@@ -239,6 +248,57 @@ def remove_image(params, root_dir):
 logging.debug("Image file %s not found")
 
 
+def check_image(params, root_dir):
+"""
+Check an image using qemu-img.
+
+@param params: Dictionary containing the test parameters.
+@param root_dir: Base directory for relative filenames.
+
+@note: params should contain:
+   image_name -- the name of the image file, without extension
+   image_format -- the format of the image (qcow2, raw etc)
+
+@raise VMImageCheckError: In case qemu-img check fails on the image.
+"""
+image_filename = get_image_filename(params, root_dir)
+logging.debug("Checking image file %s..." % image_filename)
+qemu_img_cmd = kvm_utils.get_path(root_dir,
+  params.get("qemu_img_binary", 
"qemu-img"))
+image_is_qcow2 = params.get("image_format") == 'qcow2'
+if os.path.exists(image_filename) and image_is_qcow2:
+# Verifying if qemu-img supports 'check'
+q_result = utils.run(qemu_img_cmd, ignore_status=True)
+q_output = q_result.stdout
+check_img = True
+if not "check" in q_output:
+logging.error("qemu-img does not support 'check', "
+  "skipping check...")
+check_img = False
+if not "info" in q_output:
+logging.error("qemu-img does not support 'info', "
+  "skipping check...")
+check_img = False
+if check_img:
+try:
+utils.system("%s info %s" % (qemu_img_cmd, image_filename))
+except error.CmdError:
+logging.error("Error getting info from image %s",
+  image_filename)
+try:
+utils.system("%s check %s" % (qemu_img_cmd, image_filename))
+except error.CmdError:
+raise VMImageCheckError(image_filename)
+
+else:
+if not os.path.exists(image_filename):
+logging.debug("Image file %s not found, skipping check...",
+  image_filename)
+elif not image_is_qcow2:
+logging.debug("Image file %s not qcow2, skipping check...",
+  image_filename)
+
+
 class VM:
 """
 This class handles all basic VM operations.
diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index 28064a8..3299af7 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -2357,11 +2357,7 @@ kdump:
 variants:
 - @qcow2:
 image_format = qcow2
-post_command += " python scripts/check_image.py;"
-post_command_timeout = 600
-post_command_noncritical = yes
-ioquit:
-p

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Anthony Liguori

On 01/19/2011 11:01 AM, Daniel P. Berrange wrote:


The reason we specify 'bus' is that we wanted to be flexible wrt
upgrades of libvirt, without needing restarts of QEMU instances
it manages. That way we can introduce new functionality into
libvirt that relies on it having previously set 'bus' on all
active QEMUs.

If QEMU gains PCI-to-PCI bridges, then I wouldn't expect QEMU itself
to be adding the extra bridges. I'd expect that QEMU provided just
the first bridge and then libvirt would specify how many more
bridges to create at boot or hotplug them later. So it wouldn't
ever need to parse the topology.
   


Yeah, but replacing the main chipset will certainly change the PCI 
topology, such that if you're specifying bus=X and addr=Y and also 
using -M pc, it will break in the future unless you're parsing the 
default topology to come up with the addressing.


That's why I think something simpler, like a linear index that QEMU maps 
to a static location in the topology, is probably the best future-proof 
interface.


Regards,

Anthony Liguori


Regards,
Daniel
   


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock

2011-01-19 Thread Peter Zijlstra
On Wed, 2011-01-19 at 22:53 +0530, Srivatsa Vaddagiri wrote:
> On Wed, Jan 19, 2011 at 10:42:39PM +0530, Srivatsa Vaddagiri wrote:
> > Add two hypercalls to KVM hypervisor to support pv-ticketlocks.
> > 
> > KVM_HC_WAIT_FOR_KICK blocks the calling vcpu until another vcpu kicks it
> > or it is woken up because of an event like an interrupt.
> 
> One possibility is to extend this hypercall to do a directed yield as well, 
> which needs some more thought. Another issue that needs to be resolved with
> pv-ticketlocks is the impact on intra-VM fairness. A guest experiencing heavy
> contention can keep yielding cpu, allowing other VMs to get more time
> than they deserve.

No, yield sucks, if you know the owner you can do actual PI.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Anthony Liguori

On 01/19/2011 11:19 AM, Daniel P. Berrange wrote:


In our past experiance though, *not* specifying attributes like
these has also been pretty bad from a forward compatibility
perspective too. We're kind of damned either way, so on balance
we decided we'd specify every attribute in qdev that's related
to unique identification of devices&  their inter-relationships.
By strictly locking down the topology we were defining, we ought
to have a more stable ABI in face of future changes. I accept
this might not always work out, so we may have to adjust things
over time still. Predicting the future is hard :-)
   


There are two distinct things here:

1) creating exactly the same virtual machine (like for migration) given 
a newer version of QEMU


2) creating a reasonably similar virtual machine given a newer version 
of QEMU


For (1), you cannot use -M pc.  You could use things like bus=X,addr=Y, 
but much better is for QEMU to dump a device config file and to just 
reuse that instead of guessing what you need.
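
For instance, QEMU's existing -writeconfig/-readconfig options point in 
this direction, assuming they capture enough of the machine:

    qemu -M pc ... -writeconfig vm.cfg   # dump the device config once
    qemu -readconfig vm.cfg ...          # later: recreate the same machine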


For (2), you cannot use bus=X,addr=Y because it makes assumptions about 
the PCI topology which may change in newer -M pc's.


I think libvirt needs to treat these two scenarios differently to support 
forward compatibility.


Regards,

Anthony Liguori


Daniel
   


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Anthony Liguori

On 01/19/2011 11:35 AM, Daniel P. Berrange wrote:

On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
   

On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
 

On 01/18/11 18:09, Anthony Liguori wrote:
   

On 01/18/2011 10:56 AM, Jan Kiszka wrote:
 
   

The device model topology is 100% a hidden architectural detail.
 

This is true for the sysbus, it is obviously not the case for PCI and
similarly discoverable buses. There we have a guest-explorable topology
that is currently equivalent to the qdev layout.
   

But we also don't do PCI passthrough so we really haven't even explored
how that maps in qdev. I don't know if qemu-kvm has attempted to
qdev-ify it.
 

It is qdev-ified.  It is a normal pci device from qdev's point of view.

BTW: is there any reason why (vfio-based) pci passthrough couldn't
work with tcg?

   

The -device interface is a stable interface. Right now, you don't
specify any type of identifier of the pci bus when you create a PCI
device. It's implied in the interface.
 

Wrong.  You can specify the bus you want to attach the device to via
bus=.  This is true for *every* device, including all pci
devices.  If unspecified qdev uses the first bus it finds.

As long as there is a single pci bus only there is simply no need
to specify it, that's why nobody does that today.
   

Right.  In terms of specifying bus=, what are we promising re:
compatibility?  Will there always be a pci.0?  If we add some
PCI-to-PCI bridges in order to support more devices, is libvirt
supposed to parse the hierarchy and figure out which bus to put the
device on?
 

The answer to your questions probably differ depending on
whether '-nodefconfig' and '-nodefaults' are set on the
command line.  If they are set, then I'd expect to only
ever see one PCI bus with name pci.0 forever more, unless
I explicitly ask for more. If they are not set, then you
might expect to see multiple PCI buses appear by magic
   


Yeah, we can't promise that.  If you use -M pc, you aren't guaranteed a 
stable PCI bus topology even with -nodefconfig/-nodefaults.


Regards,

Anthony Liguori


Daniel
   


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
> On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
> >On 01/18/11 18:09, Anthony Liguori wrote:
> >>On 01/18/2011 10:56 AM, Jan Kiszka wrote:
> >>>
> The device model topology is 100% a hidden architectural detail.
> >>>This is true for the sysbus, it is obviously not the case for PCI and
> >>>similarly discoverable buses. There we have a guest-explorable topology
> >>>that is currently equivalent to the qdev layout.
> >>
> >>But we also don't do PCI passthrough so we really haven't even explored
> >>how that maps in qdev. I don't know if qemu-kvm has attempted to
> >>qdev-ify it.
> >
> >It is qdev-ified.  It is a normal pci device from qdev's point of view.
> >
> >BTW: is there any reason why (vfio-based) pci passthrough couldn't
> >work with tcg?
> >
> >>The -device interface is a stable interface. Right now, you don't
> >>specify any type of identifier of the pci bus when you create a PCI
> >>device. It's implied in the interface.
> >
> >Wrong.  You can specify the bus you want to attach the device to via
> >bus=.  This is true for *every* device, including all pci
> >devices.  If unspecified qdev uses the first bus it finds.
> >
> >As long as there is a single pci bus only there is simply no need
> to specify it, that's why nobody does that today.
> 
> Right.  In terms of specifying bus=, what are we promising re:
> compatibility?  Will there always be a pci.0?  If we add some
> PCI-to-PCI bridges in order to support more devices, is libvirt
> supposed to parse the hierarchy and figure out which bus to put the
> device on?

The answer to your questions probably differ depending on
whether '-nodefconfig' and '-nodefaults' are set on the
command line.  If they are set, then I'd expect to only
ever see one PCI bus with name pci.0 forever more, unless
I explicitly ask for more. If they are not set, then you
might expect to see multiple PCI buses appear by magic

Daniel
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Jan Kiszka
On 2011-01-19 17:57, Anthony Liguori wrote:
> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>> So they interact with KVM (need kvm_state), and they interact with the
>> emulated PCI bus.  Could you elaborate on the fundamental difference
>> between the two interactions that makes you choose the (hypothetical)
>> KVM bus over the PCI bus as device parent?
>>
> 
> It's almost arbitrary, but I would say it's the direction that I/Os flow.

We need both if we want KVM buses. They are useless for enumerating a
device on the bus the guest actually sees it on.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock

2011-01-19 Thread Srivatsa Vaddagiri
On Wed, Jan 19, 2011 at 10:42:39PM +0530, Srivatsa Vaddagiri wrote:
> Add two hypercalls to KVM hypervisor to support pv-ticketlocks.
> 
> KVM_HC_WAIT_FOR_KICK blocks the calling vcpu until another vcpu kicks it or it
> is woken up because of an event like an interrupt.

One possibility is to extend this hypercall to do a directed yield as well, 
which needs some more thought. Another issue that needs to be resolved with
pv-ticketlocks is the impact on intra-VM fairness. A guest experiencing heavy
contention can keep yielding cpu, allowing other VMs to get more time than they
deserve.

- vatsa
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] debugfs: Add support to print u32 array

2011-01-19 Thread Srivatsa Vaddagiri
Add debugfs support to print u32-arrays.

Most of this comes from the Xen hypervisor sources, which have been
refactored to make the code common for other users as well.

Signed-off-by: Srivatsa Vaddagiri 
Signed-off-by: Suzuki Poulose 

---
 arch/x86/xen/debugfs.c|  104 
 arch/x86/xen/debugfs.h|4 -
 arch/x86/xen/mmu.c|2 
 arch/x86/xen/multicalls.c |6 +-
 arch/x86/xen/spinlock.c   |2 
 fs/debugfs/file.c |  108 ++
 include/linux/debugfs.h   |   11 
 7 files changed, 124 insertions(+), 113 deletions(-)

Index: linux-2.6.37/arch/x86/xen/debugfs.c
===
--- linux-2.6.37.orig/arch/x86/xen/debugfs.c
+++ linux-2.6.37/arch/x86/xen/debugfs.c
@@ -19,107 +19,3 @@ struct dentry * __init xen_init_debugfs(
return d_xen_debug;
 }
 
-struct array_data
-{
-   void *array;
-   unsigned elements;
-};
-
-static int u32_array_open(struct inode *inode, struct file *file)
-{
-   file->private_data = NULL;
-   return nonseekable_open(inode, file);
-}
-
-static size_t format_array(char *buf, size_t bufsize, const char *fmt,
-  u32 *array, unsigned array_size)
-{
-   size_t ret = 0;
-   unsigned i;
-
-   for(i = 0; i < array_size; i++) {
-   size_t len;
-
-   len = snprintf(buf, bufsize, fmt, array[i]);
-   len++;  /* ' ' or '\n' */
-   ret += len;
-
-   if (buf) {
-   buf += len;
-   bufsize -= len;
-   buf[-1] = (i == array_size-1) ? '\n' : ' ';
-   }
-   }
-
-   ret++;  /* \0 */
-   if (buf)
-   *buf = '\0';
-
-   return ret;
-}
-
-static char *format_array_alloc(const char *fmt, u32 *array, unsigned 
array_size)
-{
-   size_t len = format_array(NULL, 0, fmt, array, array_size);
-   char *ret;
-
-   ret = kmalloc(len, GFP_KERNEL);
-   if (ret == NULL)
-   return NULL;
-
-   format_array(ret, len, fmt, array, array_size);
-   return ret;
-}
-
-static ssize_t u32_array_read(struct file *file, char __user *buf, size_t len,
- loff_t *ppos)
-{
-   struct inode *inode = file->f_path.dentry->d_inode;
-   struct array_data *data = inode->i_private;
-   size_t size;
-
-   if (*ppos == 0) {
-   if (file->private_data) {
-   kfree(file->private_data);
-   file->private_data = NULL;
-   }
-
-   file->private_data = format_array_alloc("%u", data->array, 
data->elements);
-   }
-
-   size = 0;
-   if (file->private_data)
-   size = strlen(file->private_data);
-
-   return simple_read_from_buffer(buf, len, ppos, file->private_data, 
size);
-}
-
-static int xen_array_release(struct inode *inode, struct file *file)
-{
-   kfree(file->private_data);
-
-   return 0;
-}
-
-static const struct file_operations u32_array_fops = {
-   .owner  = THIS_MODULE,
-   .open   = u32_array_open,
-   .release= xen_array_release,
-   .read   = u32_array_read,
-   .llseek = no_llseek,
-};
-
-struct dentry *xen_debugfs_create_u32_array(const char *name, mode_t mode,
-   struct dentry *parent,
-   u32 *array, unsigned elements)
-{
-   struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL);
-
-   if (data == NULL)
-   return NULL;
-
-   data->array = array;
-   data->elements = elements;
-
-   return debugfs_create_file(name, mode, parent, data, &u32_array_fops);
-}
Index: linux-2.6.37/arch/x86/xen/debugfs.h
===
--- linux-2.6.37.orig/arch/x86/xen/debugfs.h
+++ linux-2.6.37/arch/x86/xen/debugfs.h
@@ -3,8 +3,4 @@
 
 struct dentry * __init xen_init_debugfs(void);
 
-struct dentry *xen_debugfs_create_u32_array(const char *name, mode_t mode,
-   struct dentry *parent,
-   u32 *array, unsigned elements);
-
 #endif /* _XEN_DEBUGFS_H */
Index: linux-2.6.37/arch/x86/xen/mmu.c
===
--- linux-2.6.37.orig/arch/x86/xen/mmu.c
+++ linux-2.6.37/arch/x86/xen/mmu.c
@@ -2757,7 +2757,7 @@ static int __init xen_mmu_debugfs(void)
debugfs_create_u32("mmu_update", 0444, d_mmu_debug, 
&mmu_stats.mmu_update);
debugfs_create_u32("mmu_update_extended", 0444, d_mmu_debug,
   &mmu_stats.mmu_update_extended);
-   xen_debugfs_create_u32_array("mmu_update_histo", 0444, d_mmu_debug,
+   debugfs_create_u32_array("mmu_update_histo", 0444, d_mmu_debug,
 mmu_stats.mmu_update_histo, 20);
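
With the helper in generic code, other users can do the same as the
mmu.c conversion above. A usage sketch only; the array name and size
below are made-up example values, the signature is the one this patch
adds to include/linux/debugfs.h:

#include <linux/debugfs.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/types.h>

/* Sketch: expose an arbitrary u32 histogram via the common helper.
 * "demo_histo" and its size are illustrative values only. */
static u32 demo_histo[16];

static int __init demo_debugfs_init(void)
{
	debugfs_create_u32_array("demo_histo", 0444, NULL,
				 demo_histo, ARRAY_SIZE(demo_histo));
	return 0;
}
late_initcall(demo_debugfs_init);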

Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock

2011-01-19 Thread Peter Zijlstra
On Wed, 2011-01-19 at 22:42 +0530, Srivatsa Vaddagiri wrote:
> Add two hypercalls to KVM hypervisor to support pv-ticketlocks.
> 
> KVM_HC_WAIT_FOR_KICK blocks the calling vcpu until another vcpu kicks it or it
> is woken up because of an event like an interrupt.
> 
> KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu.
> 
> The presence of these hypercalls is indicated to the guest via
> KVM_FEATURE_WAIT_FOR_KICK/KVM_CAP_WAIT_FOR_KICK. Qemu needs a corresponding
> patch to pass up the presence of this feature to the guest via cpuid. A patch
> to qemu will be sent separately.


I didn't really read the patch, and I totally forgot everything from
when I looked at the Xen series, but does the Xen/KVM hypercall
interface for this include the vcpu to await the kick from?

My guess is not, since the ticket locks used don't know who the owner
is, which is of course, sad. There are FIFO spinlock implementations
that can do this though.. although I think they all have a bigger memory
footprint.

The reason for wanting this should be clear I guess, it allows PI.
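
For illustration only (hypothetical, not what the posted series does,
since plain ticket locks cannot name the owner): a PI-capable interface
would have the waiter pass the holder in, e.g.

static void guest_block_on_owner(int holder_vcpu_id)
{
	/* hypothetical extra argument: lets the host boost or yield
	 * to the lock holder instead of blocking blindly */
	kvm_hypercall1(KVM_HC_WAIT_FOR_KICK, holder_vcpu_id);
}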
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 10:54:10AM -0600, Anthony Liguori wrote:
> On 01/19/2011 07:11 AM, Markus Armbruster wrote:
> >Gerd Hoffmann  writes:
> >
> >>On 01/18/11 18:09, Anthony Liguori wrote:
> >>>On 01/18/2011 10:56 AM, Jan Kiszka wrote:
> >The device model topology is 100% a hidden architectural detail.
> This is true for the sysbus, it is obviously not the case for PCI and
> similarly discoverable buses. There we have a guest-explorable topology
> that is currently equivalent to the qdev layout.
> >>>But we also don't do PCI passthrough so we really haven't even explored
> >>>how that maps in qdev. I don't know if qemu-kvm has attempted to
> >>>qdev-ify it.
> >>It is qdev-ified.  It is a normal pci device from qdev's point of view.
> >>
> >>BTW: is there any reason why (vfio-based) pci passthrough couldn't
> >>work with tcg?
> >>
> >>>The -device interface is a stable interface. Right now, you don't
> >>>specify any type of identifier of the pci bus when you create a PCI
> >>>device. It's implied in the interface.
> >>Wrong.  You can specify the bus you want to attach the device to via
> >>bus=.  This is true for *every* device, including all pci
> >>devices. If unspecified qdev uses the first bus it finds.
> >>
> >>As long as there is a single pci bus only there is simply no need to
> >>specify it, that's why nobody does that today.  Once q35 finally
> >>arrives this will change of course.
> >As far as I know, libvirt does it already.
> 
> I think that's a bad idea from a forward compatibility perspective.

In our past experiance though, *not* specifying attributes like
these has also been pretty bad from a forward compatibility
perspective too. We're kind of damned either way, so on balance
we decided we'd specify every attribute in qdev that's related
to unique identification of devices & their inter-relationships.
By strictly locking down the topology we were defining, we ought
to have a more stable ABI in face of future changes. I accept
this might not always work out, so we may have to adjust things
over time still. Predicting the future is hard :-)

Daniel
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] kvm guest : Add support for pv-ticketlocks

2011-01-19 Thread Srivatsa Vaddagiri
This patch extends Linux guests running on the KVM hypervisor to support
pv-ticketlocks. Very early during bootup, a paravirtualized KVM guest detects
whether the hypervisor has the required feature (KVM_FEATURE_WAIT_FOR_KICK) to
support pv-ticketlocks. If so, support for pv-ticketlocks is registered via
pv_lock_ops.
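
In outline, the early registration amounts to the following simplified
sketch (not the literal patch code; kvm_para_has_feature() is the
existing helper, the pv_lock_ops wiring is elided):

#include <linux/init.h>
#include <asm/kvm_para.h>

/* Sketch: bail out early unless the host advertises the feature,
 * then wire up the pv-ticketlock hooks. */
void __init kvm_guest_early_init(void)
{
	if (!kvm_para_has_feature(KVM_FEATURE_WAIT_FOR_KICK))
		return;
	/* register the pv-ticketlock hooks in pv_lock_ops here */
}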

Signed-off-by: Srivatsa Vaddagiri 
Signed-off-by: Suzuki Poulose 

---
 arch/x86/Kconfig|9 +
 arch/x86/include/asm/kvm_para.h |8 +
 arch/x86/kernel/head64.c|3 
 arch/x86/kernel/kvm.c   |  208 
 4 files changed, 228 insertions(+)

Index: linux-2.6.37/arch/x86/Kconfig
===
--- linux-2.6.37.orig/arch/x86/Kconfig
+++ linux-2.6.37/arch/x86/Kconfig
@@ -508,6 +508,15 @@ config KVM_GUEST
  This option enables various optimizations for running under the KVM
  hypervisor.
 
+config KVM_DEBUG_FS
+   bool "Enable debug information for KVM Guests in debugfs"
+   depends on KVM_GUEST
+   default n
+   ---help---
+ This option enables collection of various statistics for KVM guest.
+ Statistics are displayed in debugfs filesystem. Enabling this option
+ may incur significant overhead.
+
 source "arch/x86/lguest/Kconfig"
 
 config PARAVIRT
Index: linux-2.6.37/arch/x86/include/asm/kvm_para.h
===
--- linux-2.6.37.orig/arch/x86/include/asm/kvm_para.h
+++ linux-2.6.37/arch/x86/include/asm/kvm_para.h
@@ -162,8 +162,16 @@ static inline unsigned int kvm_arch_para
 
 #ifdef CONFIG_KVM_GUEST
 void __init kvm_guest_init(void);
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+void __init kvm_guest_early_init(void);
+#else
+#define kvm_guest_early_init() do { } while (0)
+#endif
+
 #else
 #define kvm_guest_init() do { } while (0)
+#define kvm_guest_early_init() do { } while (0)
 #endif
 
 #endif /* __KERNEL__ */
Index: linux-2.6.37/arch/x86/kernel/head64.c
===
--- linux-2.6.37.orig/arch/x86/kernel/head64.c
+++ linux-2.6.37/arch/x86/kernel/head64.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -118,6 +119,8 @@ void __init x86_64_start_reservations(ch
 
reserve_ebda_region();
 
+   kvm_guest_early_init();
+
/*
 * At this point everything still needed from the boot loader
 * or BIOS or kernel text should be early reserved or marked not
Index: linux-2.6.37/arch/x86/kernel/kvm.c
===
--- linux-2.6.37.orig/arch/x86/kernel/kvm.c
+++ linux-2.6.37/arch/x86/kernel/kvm.c
@@ -238,3 +238,211 @@ void __init kvm_guest_init(void)
 
paravirt_ops_setup();
 }
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+#ifdef CONFIG_KVM_DEBUG_FS
+
+#include 
+#include 
+
+static struct kvm_spinlock_stats
+{
+   u32 taken_slow;
+   u32 taken_slow_pickup;
+
+   u32 released_slow;
+   u32 released_slow_kicked;
+
+#define HISTO_BUCKETS  30
+   u32 histo_spin_blocked[HISTO_BUCKETS+1];
+
+   u64 time_blocked;
+} spinlock_stats;
+
+static u8 zero_stats;
+
+static inline void check_zero(void)
+{
+   if (unlikely(zero_stats)) {
+   memset(&spinlock_stats, 0, sizeof(spinlock_stats));
+   zero_stats = 0;
+   }
+}
+
+#define ADD_STATS(elem, val)   \
+   do { check_zero(); spinlock_stats.elem += (val); } while (0)
+
+static inline u64 spin_time_start(void)
+{
+   return sched_clock();
+}
+
+static void __spin_time_accum(u64 delta, u32 *array)
+{
+   unsigned index = ilog2(delta);
+
+   check_zero();
+
+   if (index < HISTO_BUCKETS)
+   array[index]++;
+   else
+   array[HISTO_BUCKETS]++;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+   u32 delta = sched_clock() - start;
+
+   __spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
+   spinlock_stats.time_blocked += delta;
+}
+
+static struct dentry *d_spin_debug;
+static struct dentry *d_kvm_debug;
+
+struct dentry *kvm_init_debugfs(void)
+{
+   d_kvm_debug = debugfs_create_dir("kvm", NULL);
+   if (!d_kvm_debug)
+   printk(KERN_WARNING "Could not create 'kvm' debugfs 
directory\n");
+
+   return d_kvm_debug;
+}
+
+static int __init kvm_spinlock_debugfs(void)
+{
+   struct dentry *d_kvm = kvm_init_debugfs();
+
+   if (d_kvm == NULL)
+   return -ENOMEM;
+
+   d_spin_debug = debugfs_create_dir("spinlocks", d_kvm);
+
+   debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats);
+
+   debugfs_create_u32("taken_slow", 0444, d_spin_debug,
+  &spinlock_stats.taken_slow);
+   debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
+  &spinlock_stats.taken_slow_pickup);
+
+   debugfs_create_u32("released_slow", 

[PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock

2011-01-19 Thread Srivatsa Vaddagiri
Add two hypercalls to KVM hypervisor to support pv-ticketlocks.

KVM_HC_WAIT_FOR_KICK blocks the calling vcpu until another vcpu kicks it or it
is woken up because of an event like an interrupt.

KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu.

The presence of these hypercalls is indicated to the guest via
KVM_FEATURE_WAIT_FOR_KICK/KVM_CAP_WAIT_FOR_KICK. Qemu needs a corresponding
patch to pass up the presence of this feature to the guest via cpuid. A patch
to qemu will be sent separately.
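
For illustration, guest-side usage of the two hypercalls might look like
the sketch below (kvm_hypercall0/1 are the existing helpers from
asm/kvm_para.h; the lock bookkeeping around them is omitted):

#include <linux/kvm_para.h>
#include <asm/kvm_para.h>

/* Sketch only: a contending vcpu blocks, the releasing vcpu kicks it.
 * Real callers must handle spurious wakeups and re-check the lock. */
static void guest_block_until_kicked(void)
{
	kvm_hypercall0(KVM_HC_WAIT_FOR_KICK);
}

static void guest_kick_vcpu(int vcpu_id)
{
	kvm_hypercall1(KVM_HC_KICK_CPU, vcpu_id);
}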


Signed-off-by: Srivatsa Vaddagiri 
Signed-off-by: Suzuki Poulose 

---
 arch/x86/include/asm/kvm_para.h |2 +
 arch/x86/kvm/x86.c  |   69 +++-
 include/linux/kvm.h |1 
 include/linux/kvm_host.h|1 
 include/linux/kvm_para.h|2 +
 virt/kvm/kvm_main.c |1 
 6 files changed, 75 insertions(+), 1 deletion(-)

Index: linux-2.6.37/arch/x86/include/asm/kvm_para.h
===
--- linux-2.6.37.orig/arch/x86/include/asm/kvm_para.h
+++ linux-2.6.37/arch/x86/include/asm/kvm_para.h
@@ -16,6 +16,8 @@
 #define KVM_FEATURE_CLOCKSOURCE0
 #define KVM_FEATURE_NOP_IO_DELAY   1
 #define KVM_FEATURE_MMU_OP 2
+#define KVM_FEATURE_WAIT_FOR_KICK   4
+
 /* This indicates that the new set of kvmclock msrs
  * are available. The use of 0x11 and 0x12 is deprecated
  */
Index: linux-2.6.37/arch/x86/kvm/x86.c
===
--- linux-2.6.37.orig/arch/x86/kvm/x86.c
+++ linux-2.6.37/arch/x86/kvm/x86.c
@@ -1922,6 +1922,7 @@ int kvm_dev_ioctl_check_extension(long e
case KVM_CAP_DEBUGREGS:
case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:
+   case KVM_CAP_WAIT_FOR_KICK:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -2340,7 +2341,8 @@ static void do_cpuid_ent(struct kvm_cpui
entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) |
 (1 << KVM_FEATURE_NOP_IO_DELAY) |
 (1 << KVM_FEATURE_CLOCKSOURCE2) |
-(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
+(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+(1 << KVM_FEATURE_WAIT_FOR_KICK);
entry->ebx = 0;
entry->ecx = 0;
entry->edx = 0;
@@ -4766,6 +4768,63 @@ int kvm_hv_hypercall(struct kvm_vcpu *vc
return 1;
 }
 
+/*
+ * kvm_pv_wait_for_kick_op : Block until kicked by either a KVM_HC_KICK_CPU
+ * hypercall or an event like an interrupt.
+ *
+ * @vcpu : vcpu which is blocking.
+ */
+static void kvm_pv_wait_for_kick_op(struct kvm_vcpu *vcpu)
+{
+   DEFINE_WAIT(wait);
+
+   /*
+* Blocking on vcpu->wq allows us to wake up sooner if required to
+* service pending events (like interrupts).
+*
+* Also set state to TASK_INTERRUPTIBLE before checking vcpu->kicked to
+* avoid racing with kvm_pv_kick_cpu_op().
+*/
+   prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
+
+   do {
+   /*
+* Somebody has already tried kicking us. Acknowledge that
+* and terminate the wait.
+*/
+   if (vcpu->kicked) {
+   vcpu->kicked = 0;
+   break;
+   }
+
+   /* Let's wait for either KVM_HC_KICK_CPU or some other event
+* to wake us up.
+*/
+
+   srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
+   schedule();
+   vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+
+   } while (0);
+
+   finish_wait(&vcpu->wq, &wait);
+}
+
+/*
+ * kvm_pv_kick_cpu_op:  Kick a vcpu.
+ *
+ * @cpu - vcpu to be kicked.
+ */
+static void kvm_pv_kick_cpu_op(struct kvm *kvm, int cpu)
+{
+   struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, cpu);
+
+   if (vcpu) {
+   vcpu->kicked = 1;
+   wake_up_interruptible(&vcpu->wq);
+   }
+}
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
unsigned long nr, a0, a1, a2, a3, ret;
@@ -4802,6 +4861,14 @@ int kvm_emulate_hypercall(struct kvm_vcp
case KVM_HC_MMU_OP:
r = kvm_pv_mmu_op(vcpu, a0, hc_gpa(vcpu, a1, a2), &ret);
break;
+   case KVM_HC_WAIT_FOR_KICK:
+   kvm_pv_wait_for_kick_op(vcpu);
+   ret = 0;
+   break;
+   case KVM_HC_KICK_CPU:
+   kvm_pv_kick_cpu_op(vcpu->kvm, a0);
+   ret = 0;
+   break;
default:
ret = -KVM_ENOSYS;
break;
Index: linux-2.6.37/include/linux/kvm.h
===
--- linux-2.6.37.orig/include/linux/kvm.h
+++ linux-2.6.37/include/linux/kvm.h
@@ -540,6 +540,7 @@ struct kvm_ppc_

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Anthony Liguori

On 01/19/2011 07:15 AM, Markus Armbruster wrote:

So they interact with KVM (need kvm_state), and they interact with the
emulated PCI bus.  Could you elaborate on the fundamental difference
between the two interactions that makes you choose the (hypothetical)
KVM bus over the PCI bus as device parent?
   


It's almost arbitrary, but I would say it's the direction that I/Os flow.

But if the underlying observation is that the device tree is not really 
a tree, you're 100% correct.  This is part of why a factory interface 
that just takes a parent bus is too simplistic.


I think we ought to introduce a -pci-device option that is specifically 
for creating PCI devices that doesn't require a parent bus argument but 
provides a way to specify stable addressing (for instance, using a 
linear index).
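
As a sketch of what that mapping could be (the 32-slots-per-bus fan-out
is an assumed constant for illustration, not anything QEMU promises):

/* Map a stable linear index to a fixed (bus, slot) location, so that
 * management tools never need to parse the PCI topology themselves. */
enum { SLOTS_PER_BUS = 32 };    /* illustrative fan-out */

struct pci_location {
    int bus;
    int slot;
};

static struct pci_location index_to_location(int index)
{
    struct pci_location loc = {
        .bus  = index / SLOTS_PER_BUS,
        .slot = index % SLOTS_PER_BUS,
    };
    return loc;
}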


Regards,

Anthony Liguori


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Anthony Liguori

On 01/19/2011 07:11 AM, Markus Armbruster wrote:

Gerd Hoffmann  writes:

   

On 01/18/11 18:09, Anthony Liguori wrote:
 

On 01/18/2011 10:56 AM, Jan Kiszka wrote:
   
 

The device model topology is 100% a hidden architectural detail.
   

This is true for the sysbus, it is obviously not the case for PCI and
similarly discoverable buses. There we have a guest-explorable topology
that is currently equivalent to the qdev layout.
 

But we also don't do PCI passthrough so we really haven't even explored
how that maps in qdev. I don't know if qemu-kvm has attempted to
qdev-ify it.
   

It is qdev-ified.  It is a normal pci device from qdev's point of view.

BTW: is there any reason why (vfio-based) pci passthrough couldn't
work with tcg?

 

The -device interface is a stable interface. Right now, you don't
specify any type of identifier of the pci bus when you create a PCI
device. It's implied in the interface.
   

Wrong.  You can specify the bus you want to attach the device to via
bus=.  This is true for *every* device, including all pci
devices. If unspecified qdev uses the first bus it finds.

As long as there is a single pci bus only there is simply no need to
specify it, that's why nobody does that today.  Once q35 finally
arrives this will change of course.
 

As far as I know, libvirt does it already.
   


I think that's a bad idea from a forward compatibility perspective.

Regards,

Anthony Liguori


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
> On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
> >On 01/18/11 18:09, Anthony Liguori wrote:
> >>On 01/18/2011 10:56 AM, Jan Kiszka wrote:
> >>>
> The device model topology is 100% a hidden architectural detail.
> >>>This is true for the sysbus, it is obviously not the case for PCI and
> >>>similarly discoverable buses. There we have a guest-explorable topology
> >>>that is currently equivalent to the qdev layout.
> >>
> >>But we also don't do PCI passthrough so we really haven't even explored
> >>how that maps in qdev. I don't know if qemu-kvm has attempted to
> >>qdev-ify it.
> >
> >It is qdev-ified.  It is a normal pci device from qdev's point of view.
> >
> >BTW: is there any reason why (vfio-based) pci passthrough couldn't
> >work with tcg?
> >
> >>The -device interface is a stable interface. Right now, you don't
> >>specify any type of identifier of the pci bus when you create a PCI
> >>device. It's implied in the interface.
> >
> >Wrong.  You can specify the bus you want to attach the device to via
> >bus=.  This is true for *every* device, including all pci
> >devices.  If unspecified qdev uses the first bus it finds.
> >
> >As long as there is a single pci bus only there is simply no need
> >to specify it, that's why nobody does that today.
> 
> Right.  In terms of specifying bus=, what are we promising re:
> compatibility?  Will there always be a pci.0?  If we add some
> PCI-to-PCI bridges in order to support more devices, is libvirt
> supposed to parse the hierarchy and figure out which bus to put the
> device on?

The reason we specify 'bus' is that we wanted to be flexible wrt
upgrades of libvirt, without needing restarts of QEMU instances
it manages. That way we can introduce new functionality into
libvirt that relies on it having previously set 'bus' on all
active QEMUs.

If QEMU gains PCI-to-PCI bridges, then I wouldn't expect QEMU itself
to be adding the extra bridges. I'd expect that QEMU provided just
the first bridge and then libvirt would specify how many more
bridges to create at boot or hotplug them later. So it wouldn't
ever need to parse the topology.

Regards,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Anthony Liguori

On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:

On 01/18/11 18:09, Anthony Liguori wrote:

On 01/18/2011 10:56 AM, Jan Kiszka wrote:



The device model topology is 100% a hidden architectural detail.

This is true for the sysbus, it is obviously not the case for PCI and
similarly discoverable buses. There we have a guest-explorable topology
that is currently equivalent to the qdev layout.


But we also don't do PCI passthrough so we really haven't even explored
how that maps in qdev. I don't know if qemu-kvm has attempted to
qdev-ify it.


It is qdev-ified.  It is a normal pci device from qdev's point of view.

BTW: is there any reason why (vfio-based) pci passthrough couldn't 
work with tcg?



The -device interface is a stable interface. Right now, you don't
specify any type of identifier of the pci bus when you create a PCI
device. It's implied in the interface.


Wrong.  You can specify the bus you want to attach the device to via 
bus=.  This is true for *every* device, including all pci 
devices.  If unspecified qdev uses the first bus it finds.


As long as there is a single pci bus only there is simply no need to 
specify it, that's why nobody does that today.


Right.  In terms of specifying bus=, what are we promising re: 
compatibility?  Will there always be a pci.0?  If we add some PCI-to-PCI 
bridges in order to support more devices, is libvirt supposed to parse 
the hierarchy and figure out which bus to put the device on?


Regards,

Anthony Liguori


  Once q35 finally arrives this will change of course.

cheers,
  Gerd


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/14] PV ticket locks without expanding spinlock

2011-01-19 Thread Srivatsa Vaddagiri
On Tue, Nov 16, 2010 at 01:08:31PM -0800, Jeremy Fitzhardinge wrote:
> From: Jeremy Fitzhardinge 
> 
> Hi all,
> 
> This is a revised version of the pvticket lock series.

The 3-patch series to follow this email extends KVM-hypervisor and Linux guest 
running on KVM-hypervisor to support pv-ticket spinlocks.

Two hypercalls are being introduced in KVM hypervisor, one that allows a
vcpu (spinning on a lock) to block and another that allows a vcpu to kick
another out of blocking state.

Patches are against the 2.6.37 mainline kernel. I don't yet have numbers
to show the benefit of pv-ticketlocks - I would think the benefit
should be similar across hypervisors (Xen and KVM in this case).

- vatsa
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] device-assignment: Properly terminate vmsd.fields

2011-01-19 Thread Juan Quintela
Alex Williamson  wrote:
> The vmsd code expects the fields structure to be properly terminated,
> not NULL.  An assigned device should never be saved or restored, and
> recent qemu fixes to the no_migrate flag should ensure this, but let's
> avoid setting the wrong precedent.
>
> Signed-off-by: Alex Williamson 

Acked-by: Juan Quintela 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] add bootindex parameter to assigned device

2011-01-19 Thread Marcelo Tosatti
On Mon, Jan 10, 2011 at 11:54:54AM +0200, Gleb Natapov wrote:
> 
> Signed-off-by: Gleb Natapov 
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index e97f565..0038526 100644

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 14/19] block: insert event-tap to bdrv_aio_writev() and bdrv_aio_flush().

2011-01-19 Thread Kevin Wolf
Am 19.01.2011 14:16, schrieb Yoshiaki Tamura:
> 2011/1/19 Kevin Wolf :
>> Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
>>> event-tap function is called only when it is on, and requests sent
>>> from device emulators.
>>>
>>> Signed-off-by: Yoshiaki Tamura 
>>> ---
>>>  block.c |   11 +++
>>>  1 files changed, 11 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/block.c b/block.c
>>> index ff2795b..85bd8b8 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -28,6 +28,7 @@
>>>  #include "block_int.h"
>>>  #include "module.h"
>>>  #include "qemu-objects.h"
>>> +#include "event-tap.h"
>>>
>>>  #ifdef CONFIG_BSD
>>>  #include 
>>> @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState 
>>> *bs, int64_t sector_num,
>>>  if (bdrv_check_request(bs, sector_num, nb_sectors))
>>>  return NULL;
>>>
>>> +if (bs->device_name && event_tap_is_on()) {
>>> +return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
>>> + cb, opaque);
>>> +}
>>> +
>>>  if (bs->dirty_bitmap) {
>>>  blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
>>>   opaque);
>>
>> Just noticed the context here... Does this patch break block migration
>> when event-tap is on?
> 
> I don't think so.  event-tap will call bdrv_aio_writev() upon
> flushing requests and it shouldn't affect block-migration.  The
> block written after the synchronization should be marked as dirty
> and should be sent in the next round.  Am I missing the point?

No, that makes sense. I don't have a complete understanding of the whole
series yet, so there may well be more misunderstandings on my side.

>> Another question that came to my mind is if we really hook everything we
>> need. I think we'll need to have a hook in bdrv_flush as well. I don't
>> know if you do hook qemu_aio_flush and friends -  does a call cause
>> event-tap to flush its queue? If not, a call to qemu_aio_flush might
>> hang qemu because it's waiting for requests to complete which are
>> actually stuck in the event-tap queue.
> 
> No, it doesn't queue at event-tap.  Marcelo pointed out that we should
> hook bdrv_aio_flush to avoid request inversion, which made sense
> to me.  Do we need to hook bdrv_flush for that same reason?  If

bdrv_flush() is the synchronous version of bdrv_aio_flush(), so in
general it seems likely that we need to do the same.

> we hook bdrv_flush and qemu_aio_flush, we're going to loop forever
> because the synchronization code is calling vm_stop, which calls
> bdrv_flush_all and qemu_aio_flush.

qemu_aio_flush doesn't invoke any bdrv_* functions, so I don't see why
we would loop forever. It just waits for AIO requests to complete.

I just looked up the code and I think the situation is a bit different
than I thought originally: qemu_aio_flush waits only for completion of
requests which belong to a driver that has registered using
qemu_aio_set_fd_handler. So this means that AIO requests queued in
event-tap are not considered in-flight requests and we won't get stuck
in a loop. Maybe we have no problem in fact. :-)

On the other hand, e.g. migration relies on the fact that after a
qemu_aio_flush, all AIO requests that the device model has submitted are
completed. I think event-tap must take care that the requests which are
queued are not forgotten to be migrated. (Maybe the code already
considers this, I'm just writing down what comes to my mind...)

Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


qemu-kvm.git BUG - 'Two devices with same boot index 0'

2011-01-19 Thread Lucas Meneghel Rodrigues
Hi guys, I've noticed the following regression on qemu.git HEAD:

VMCreateError: VM creation command failed:
"/root/autotest/client/tests/kvm/qemu
-name 'vm1'
-monitor
unix:'/tmp/monitor-humanmonitor1-20110119-083240-ocD9',server,nowait
-serial unix:'/tmp/serial-20110119-083240-ocD9',server,nowait
-drive
file='/tmp/kvm_autotest_root/images/f14-64.qcow2',index=0,if=virtio,cache=none,boot=on
-device virtio-net-pci,netdev=idyBlEoF,mac='9a:52:ec:f7:49:f4'
-netdev user,id=idyBlEoF,hostfwd=tcp::5000-:22,hostfwd=tcp::5001-:12323
-m 512
-smp 2
-drive
file='/tmp/kvm_autotest_root/isos/linux/Fedora-14-x86_64-DVD.iso',media=cdrom,index=2
-drive
file='/tmp/kvm_autotest_root/images/f14-64/ks.iso',media=cdrom,index=1
-kernel '/tmp/kvm_autotest_root/images/f14-64/vmlinuz'
-initrd '/tmp/kvm_autotest_root/images/f14-64/initrd.img'
-vnc :1
-boot d
--append 'ks=cdrom nicdelay=60 console=ttyS0,115200 console=tty0'"
(status: 1,output: 'Two devices with same boot index 0\n')

I took the liberty of splitting up the qemu-kvm command line into blocks
so I could verify the problem. So, after looking at the command line
components, it is evident that kvm autotest constructed a valid, correct
command line and somehow during the processing of the command line
qemu-kvm thinks 2 devices have the same boot index, when in reality they
don't.

Please let's look into this. I am available for any further
clarifications.

Cheers!

Lucas

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 09/19] Introduce event-tap.

2011-01-19 Thread Kevin Wolf
Am 19.01.2011 14:04, schrieb Yoshiaki Tamura:
>>> +static void event_tap_blk_flush(EventTapBlkReq *blk_req)
>>> +{
>>> +BlockDriverState *bs;
>>> +
>>> +bs = bdrv_find(blk_req->device_name);
>>
>> Please store the BlockDriverState in blk_req. This code loops over all
>> block devices and does a string comparison - and that for each request.
>> You can also save the qemu_strdup() when creating the request.
>>
>> In the few places where you really need the device name (might be the
>> case for load/save, I'm not sure), you can still get it from the
>> BlockDriverState.
> 
> I would do so for the primary side.  Although we haven't
> implemented yet, we want to replay block requests from block
> layer on the secondary side, and need device name to restore
> BlockDriverState.

Hm, I see. I'm not happy about it, but I don't have a suggestion right
away how to avoid it.

>>
>>> +
>>> +if (blk_req->is_flush) {
>>> +bdrv_aio_flush(bs, blk_req->reqs[0].cb, blk_req->reqs[0].opaque);
>>
>> You need to handle errors. If bdrv_aio_flush returns NULL, call the
>> callback with -EIO.
> 
> I'll do so.
> 
>>
>>> +return;
>>> +}
>>> +
>>> +bdrv_aio_writev(bs, blk_req->reqs[0].sector, blk_req->reqs[0].qiov,
>>> +blk_req->reqs[0].nb_sectors, blk_req->reqs[0].cb,
>>> +blk_req->reqs[0].opaque);
>>
>> Same here.
>>
>>> +bdrv_flush(bs);
>>
>> This looks really strange. What is this supposed to do?
>>
>> One point is that you write it immediately after bdrv_aio_write, so you
>> get an fsync for which you don't know if it includes the current write
>> request or if it doesn't. Which data do you want to get flushed to the disk?
> 
> I was expecting to flush the aio request that was just initiated.
> Am I misunderstanding the function?

Seems so. The function names don't use really clear terminology either,
so you're not the first one to fall into this trap. Basically we have:

* qemu_aio_flush() waits for all AIO requests to complete. I think you
wanted to have exactly this, but only for a single block device. Such a
function doesn't exist yet.

* bdrv_flush() makes sure that all successfully completed requests are
written to disk (by calling fsync)

* bdrv_aio_flush() is the asynchronous version of bdrv_flush, i.e. run
the fsync in the thread pool
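
To make the distinction concrete, that missing per-device wait would look
roughly like the sketch below. A minimal sketch only: the in_flight_count
field is assumed and does not exist on BlockDriverState today, while
qemu_aio_wait() is the existing helper that processes one round of
completions.

/* Hypothetical per-device variant of qemu_aio_flush(). */
static void bdrv_wait_aio_completion(BlockDriverState *bs)
{
    while (bs->in_flight_count > 0) {   /* assumed counter */
        qemu_aio_wait();                /* handle one completion round */
    }
}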

>> The other thing is that you introduce a bdrv_flush for each request,
>> basically forcing everyone to something very similar to writethrough
>> mode. I'm sure this will have a big impact on performance.
> 
> The reason is to avoid inversion of queued requests.  Although
> processing one-by-one is heavy, wouldn't having requests flushed
> to disk out of order break the disk image?

No, that's fine. If a guest issues two requests at the same time, they
may complete in any order. You just need to make sure that you don't
call the completion callback before the request really has completed.

I'm just starting to wonder if the guest won't time out the requests if
they are queued for too long. Even more, with IDE, it can only handle
one request at a time, so not completing requests doesn't sound like a
good idea at all. In what intervals is the event-tap queue flushed?

On the other hand, if you complete before actually writing out, you
don't get timeouts, but you signal success to the guest when the request
could still fail. What would you do in this case? With a writeback cache
mode we're fine, we can just fail the next flush (until then nothing is
guaranteed to be on disk and order doesn't matter either), but with
cache=writethrough we're in serious trouble.

Have you thought about this problem? Maybe we end up having to flush the
event-tap queue for each single write in writethrough mode.
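
As a sketch of that option (all names here are hypothetical, event-tap
has no such helper today):

/* Hypothetical: in writethrough mode, drain the event-tap queue before
 * completing a write, so success is never signalled for data that can
 * still fail to reach the disk. */
extern void event_tap_flush_queue(void);    /* assumed helper */

static void event_tap_complete_write(int writethrough)
{
    if (writethrough) {
        event_tap_flush_queue();
    }
    /* ...then invoke the guest-visible completion callback... */
}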

>> Additionally, error handling is missing.
> 
> I looked at the codes using bdrv_flush and realized some of them
> doesn't handle errors, but scsi-disk.c does.  Should everyone
> handle errors or depends on the usage?

I added the return code only recently; it was a void function
previously. Probably some error handling should be added to all of them.

Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V5 3/4] qmp,nmi: convert do_inject_nmi() to QObject

2011-01-19 Thread Luiz Capitulino
On Mon, 10 Jan 2011 17:28:14 +0800
Lai Jiangshan  wrote:

> Make it possible to inject an NMI via the qemu-monitor-protocol.
> We use "inject-nmi" for the qmp command name; the meaning is clearer.
> 
> Signed-off-by:  Lai Jiangshan 
> ---
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index a49fcd4..4db413d 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -724,7 +724,8 @@ ETEXI
>  .args_type  = "cpu-index:i?",
>  .params = "[cpu]",
>  .help   = "inject an NMI on all CPUs or the given CPU",
> -.mhandler.cmd = do_inject_nmi,
> +.user_print = monitor_user_noop,
> +.mhandler.cmd_new = do_inject_nmi,
>  },
>  #endif
>  STEXI
> diff --git a/monitor.c b/monitor.c
> index 952f67f..1bee840 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -2517,7 +2517,7 @@ static void do_wav_capture(Monitor *mon, const QDict 
> *qdict)
>  #endif
>  
>  #if defined(TARGET_I386)
> -static void do_inject_nmi(Monitor *mon, const QDict *qdict)
> +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject 
> **ret_data)
>  {
>  CPUState *env;
>  int cpu_index;
> @@ -2525,15 +2525,17 @@ static void do_inject_nmi(Monitor *mon, const QDict 
> *qdict)
>  if (!qdict_get(qdict, "cpu-index")) {
>  for (env = first_cpu; env != NULL; env = env->next_cpu)
>  cpu_interrupt(env, CPU_INTERRUPT_NMI);
> -return;
> +return 0;
>  }
>  
>  cpu_index = qdict_get_int(qdict, "cpu-index");
>  for (env = first_cpu; env != NULL; env = env->next_cpu)
>  if (env->cpu_index == cpu_index) {
>  cpu_interrupt(env, CPU_INTERRUPT_NMI);
> -break;
> +return 0;
>  }
> +
> +return -1;
>  }
>  #endif
>  
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 56c4d8b..c2d619c 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -429,6 +429,33 @@ Example:
>  
>  EQMP
>  
> +#if defined(TARGET_I386)
> +{
> +.name   = "inject_nmi",

Please use a hyphen.

> +.args_type  = "cpu-index:i?",
> +.params = "[cpu]",
> +.help   = "inject an NMI on all CPUs or the given CPU",
> +.user_print = monitor_user_noop,
> +.mhandler.cmd_new = do_inject_nmi,
> +},
> +#endif
> +SQMP
> +inject_nmi
> +--
> +
> +Inject an NMI on the given CPU (x86 only).

Please, explain that we can also inject on all CPUs.

> +
> +Arguments:
> +
> +- "cpu_index": the index of the CPU to be injected NMI (json-int)

It's actually "cpu-index", and you should write "(json-int, optional)".

> +
> +Example:
> +
> +-> { "execute": "inject_nmi", "arguments": { "cpu-index": 0 } }

Hyphen.

> +<- { "return": {} }
> +
> +EQMP
> +
>  {
>  .name   = "migrate",
>  .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V5 2/4] nmi: make cpu-index argument optional

2011-01-19 Thread Luiz Capitulino

Sorry for the long delay on this one, in general looks good, I have just
a few small comments.

On Mon, 10 Jan 2011 17:27:51 +0800
Lai Jiangshan  wrote:

> When the argument "cpu-index" is not given,
> then "nmi" command will inject NMI on all CPUs.

Please, state that we're changing the human monitor behavior on this.

> This simulates the NMI button on a physical machine.
> 
> Thanks to Markus Armbruster for correcting the logic
> detecting "cpu-index" is given or not.
> 
> Signed-off-by:  Lai Jiangshan 
> ---
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index 99b96a8..a49fcd4 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -721,9 +721,9 @@ ETEXI
>  #if defined(TARGET_I386)
>  {
>  .name   = "nmi",
> -.args_type  = "cpu-index:i",
> -.params = "cpu",
> -.help   = "inject an NMI on the given CPU",
> +.args_type  = "cpu-index:i?",
> +.params = "[cpu]",
> +.help   = "inject an NMI on all CPUs or the given CPU",

IMO, it's better to be a bit more clear, something like: "Inject an NMI on all
CPUs if no argument is given, otherwise inject it on the specified CPU".

>  .mhandler.cmd = do_inject_nmi,
>  },
>  #endif
> diff --git a/monitor.c b/monitor.c
> index fd18887..952f67f 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -2520,8 +2520,15 @@ static void do_wav_capture(Monitor *mon, const QDict 
> *qdict)
>  static void do_inject_nmi(Monitor *mon, const QDict *qdict)
>  {
>  CPUState *env;
> -int cpu_index = qdict_get_int(qdict, "cpu-index");
> +int cpu_index;
>  
> +if (!qdict_get(qdict, "cpu-index")) {

Please, use qdict_haskey().

> +for (env = first_cpu; env != NULL; env = env->next_cpu)
> +cpu_interrupt(env, CPU_INTERRUPT_NMI);
> +return;
> +}
> +
> +cpu_index = qdict_get_int(qdict, "cpu-index");
>  for (env = first_cpu; env != NULL; env = env->next_cpu)
>  if (env->cpu_index == cpu_index) {
>  cpu_interrupt(env, CPU_INTERRUPT_NMI);
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 14/19] block: insert event-tap to bdrv_aio_writev() and bdrv_aio_flush().

2011-01-19 Thread Yoshiaki Tamura
2011/1/19 Kevin Wolf :
> Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
>> event-tap function is called only when it is on, and requests sent
>> from device emulators.
>>
>> Signed-off-by: Yoshiaki Tamura 
>> ---
>>  block.c |   11 +++
>>  1 files changed, 11 insertions(+), 0 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index ff2795b..85bd8b8 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -28,6 +28,7 @@
>>  #include "block_int.h"
>>  #include "module.h"
>>  #include "qemu-objects.h"
>> +#include "event-tap.h"
>>
>>  #ifdef CONFIG_BSD
>>  #include 
>> @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState 
>> *bs, int64_t sector_num,
>>      if (bdrv_check_request(bs, sector_num, nb_sectors))
>>          return NULL;
>>
>> +    if (bs->device_name && event_tap_is_on()) {
>> +        return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
>> +                                         cb, opaque);
>> +    }
>> +
>>      if (bs->dirty_bitmap) {
>>          blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
>>                                           opaque);
>
> Just noticed the context here... Does this patch break block migration
> when event-tap is on?

I don't think so.  event-tap will call bdrv_aio_writev() upon
flushing requests and it shouldn't affect block-migration.  The
block written after the synchronization should be marked as dirty
and should be sent in the next round.  Am I missing the point?

> Another question that came to my mind is if we really hook everything we
> need. I think we'll need to have a hook in bdrv_flush as well. I don't
> know if you do hook qemu_aio_flush and friends -  does a call cause
> event-tap to flush its queue? If not, a call to qemu_aio_flush might
> hang qemu because it's waiting for requests to complete which are
> actually stuck in the event-tap queue.

No, it doesn't queue at event-tap.  Marcelo pointed out that we should
hook bdrv_aio_flush to avoid request inversion, which made sense
to me.  Do we need to hook bdrv_flush for that same reason?  If
we hook bdrv_flush and qemu_aio_flush, we're going to loop forever
because the synchronization code is calling vm_stop, which calls
bdrv_flush_all and qemu_aio_flush.

Yoshi

>
> Kevin
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Markus Armbruster
Anthony Liguori  writes:

> On 01/18/2011 09:43 AM, Jan Kiszka wrote:
>> On 2011-01-18 16:04, Anthony Liguori wrote:
>>
>>> On 01/18/2011 08:28 AM, Jan Kiszka wrote:
>>>  
 On 2011-01-12 11:31, Jan Kiszka wrote:


> Am 12.01.2011 11:22, Avi Kivity wrote:
>
>  
>> On 01/11/2011 03:54 PM, Anthony Liguori wrote:
>>
>>
>>> Right, we should introduce a KVMBus that KVM devices are created on.
>>> The devices can get at KVMState through the BusState.
>>>
>>>  
>> There is no kvm bus in a PC (I looked).  We're bending the device model
>> here because a device is implemented in the kernel and not in
>> userspace.  An implementation detail is magnified beyond all proportions.
>>
>> An ioapic that is implemented by kvm lives in exactly the same place
>> that the qemu ioapic lives in.  An assigned pci device lives on the PCI
>> bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
>> it elsewhere, not through creating imaginary buses that don't exist.
>>
>>
>>
> Exactly.
>
> So we can either "infect" the whole device tree with kvm (or maybe a
> more generic accelerator structure that also deals with Xen) or we need
> to pull the reference inside the device's init function from some global
> service (kvm_get_state).
>
>  
 Note that this topic is still waiting for good suggestions, specifically
 from those who believe in kvm_state references :). This is not only
 blocking kvmstate merge but will affect KVM irqchips as well.

 It boils down to how we reasonably pass a kvm_state reference from
 machine init code to a sysbus device. I'm probably biased, but I don't
 see any way that does not work against the idea of confining access to
 kvm_state or breaks device instantiation from the command line or a
 config file.


>>> A KVM device should sit on a KVM specific bus that hangs off of sysbus.
>>> It can get to kvm_state through that bus.
>>>
>>> That bus doesn't get instantiated through qdev so requiring a pointer
>>> argument should not be an issue.
>>>
>>>  
>> This design is in conflict with the requirement to attach KVM-assisted
>> devices also to their home bus, e.g. an assigned PCI device to the PCI
>> bus. We don't support multi-homed qdev devices.
>>
>
> The bus topology reflects how I/O flows in and out of a device.  We do
> not model a perfect PC bus architecture and I don't think we ever
> intend to.  Instead, we model a functional architecture.
>
> I/O from an assigned device does not flow through the emulated PCI
> bus.  Therefore, it does not belong on the emulated PCI bus.
>
> Assigned devices need to interact with the emulated PCI bus, but they
> shouldn't be children of it.

So they interact with KVM (need kvm_state), and they interact with the
emulated PCI bus.  Could you elaborate on the fundamental difference
between the two interactions that makes you choose the (hypothetical)
KVM bus over the PCI bus as device parent?


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Markus Armbruster
Gerd Hoffmann  writes:

> On 01/18/11 18:09, Anthony Liguori wrote:
>> On 01/18/2011 10:56 AM, Jan Kiszka wrote:
>>>
 The device model topology is 100% a hidden architectural detail.
>>> This is true for the sysbus, it is obviously not the case for PCI and
>>> similarly discoverable buses. There we have a guest-explorable topology
>>> that is currently equivalent to the qdev layout.
>>
>> But we also don't do PCI passthrough so we really haven't even explored
>> how that maps in qdev. I don't know if qemu-kvm has attempted to
>> qdev-ify it.
>
> It is qdev-ified.  It is a normal pci device from qdev's point of view.
>
> BTW: is there any reason why (vfio-based) pci passthrough couldn't
> work with tcg?
>
>> The -device interface is a stable interface. Right now, you don't
>> specify any type of identifier of the pci bus when you create a PCI
>> device. It's implied in the interface.
>
> Wrong.  You can specify the bus you want to attach the device to via
> bus=.  This is true for *every* device, including all pci
> devices.  If unspecified, qdev uses the first bus it finds.
>
> As long as there is a single pci bus only, there is simply no need to
> specify it; that's why nobody does that today.  Once q35 finally
> arrives, this will change of course.

As far as I know, libvirt does it already.


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Markus Armbruster
Anthony Liguori  writes:

> On 01/18/2011 10:56 AM, Jan Kiszka wrote:
>>
>>> The device model topology is 100% a hidden architectural detail.
>>>  
>> This is true for the sysbus, it is obviously not the case for PCI and
>> similarly discoverable buses. There we have a guest-explorable topology
>> that is currently equivalent to the qdev layout.
>>
>
> But we also don't do PCI passthrough so we really haven't even
> explored how that maps in qdev.  I don't know if qemu-kvm has
> attempted to qdev-ify it.
>
 Management and analysis tools must be able to traverse the system buses
 and find guest devices this way.

>>> We need to provide a compatible interface to the guest.  If you agree
>>> with my above statements, then you'll also agree that we can do this
>>> without keeping the device model topology stable.
>>>
>>> But we also need to provide a compatible interface to management tools.
>>> Exposing the device model topology as a compatible interface
>>> artificially limits us.  It's far better to provide higher level
>>> supported interfaces to give us the flexibility to change the device
>>> model as we need to.
>>>  
>> How do you want to change qdev to keep the guest and management tool
>> view stable while branching off kvm sub-buses?
>
> The qdev device model is not a stable interface.  I think that's been
> clear from the very beginning.
>
>>   Please propose such
>> extensions so that they can be discussed. IIUC, that would be second
>> relation between qdev and qbus instances besides the physical topology.
>> What further use cases (besides passing kvm_state around) do you have in
>> mind?
>>
>
> The -device interface is a stable interface.  Right now, you don't
> specify any type of identifier of the pci bus when you create a PCI
> device.  It's implied in the interface.

Now I'm confused.  Isn't "-device FOO,bus=pci.0" specifying the PCI bus?

[...]


Re: [Qemu-devel] [PATCH 09/19] Introduce event-tap.

2011-01-19 Thread Yoshiaki Tamura
2011/1/19 Kevin Wolf :
> Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
>> event-tap controls when to start an FT transaction, and provides proxy
>> functions to be called from net/block devices.  During an FT transaction, it
>> queues up net/block requests and flushes them when the transaction
>> completes.
>>
>> Signed-off-by: Yoshiaki Tamura 
>> Signed-off-by: OHMURA Kei 
>
> One general comment: at first glance this seems to mix block and net
> (and some other things) arbitrarily instead of having a section for
> handling all block stuff, then network, etc.
>
> Is there a specific reason for the order in which you put the functions?
> If not, maybe reordering them might improve readability.

Thanks.  I'll rework that.

>
>> ---
>>  Makefile.target |    1 +
>>  event-tap.c     |  847 +++
>>  event-tap.h     |   42 +++
>>  qemu-tool.c     |   24 ++
>>  trace-events    |    9 +
>>  5 files changed, 923 insertions(+), 0 deletions(-)
>>  create mode 100644 event-tap.c
>>  create mode 100644 event-tap.h
>>
>> diff --git a/Makefile.target b/Makefile.target
>> index e15b1c4..f36cd75 100644
>> --- a/Makefile.target
>> +++ b/Makefile.target
>> @@ -199,6 +199,7 @@ obj-y += rwhandler.o
>>  obj-$(CONFIG_KVM) += kvm.o kvm-all.o
>>  obj-$(CONFIG_NO_KVM) += kvm-stub.o
>>  LIBS+=-lz
>> +obj-y += event-tap.o
>>
>>  QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
>>  QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
>> diff --git a/event-tap.c b/event-tap.c
>> new file mode 100644
>> index 0000000..f492708
>> --- /dev/null
>> +++ b/event-tap.c
>
>> @@ -0,0 +1,847 @@
>> +/*
>> + * Event Tap functions for QEMU
>> + *
>> + * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation.
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the COPYING file in the top-level directory.
>> + */
>> +
>> +#include "qemu-common.h"
>> +#include "qemu-error.h"
>> +#include "block.h"
>> +#include "block_int.h"
>> +#include "ioport.h"
>> +#include "osdep.h"
>> +#include "sysemu.h"
>> +#include "hw/hw.h"
>> +#include "net.h"
>> +#include "event-tap.h"
>> +#include "trace.h"
>> +
>> +enum EVENT_TAP_STATE {
>> +    EVENT_TAP_OFF,
>> +    EVENT_TAP_ON,
>> +    EVENT_TAP_FLUSH,
>> +    EVENT_TAP_LOAD,
>> +    EVENT_TAP_REPLAY,
>> +};
>> +
>> +static enum EVENT_TAP_STATE event_tap_state = EVENT_TAP_OFF;
>> +static BlockDriverAIOCB dummy_acb; /* we may need a pool for dummies */
>
> Indeed, bdrv_aio_cancel will segfault this way.
>
> If you use dummies instead of real ACBs the only way to correctly
> implement bdrv_aio_cancel is waiting for all in-flight AIOs
> (qemu_aio_flush).

So I need to insert a new event-tap function into bdrv_aio_cancel
to do that.
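
A toy model of what such a hook could look like (made-up names;
qemu_aio_flush is stood in for by a print): since a dummy ACB carries
nothing driver-specific to cancel, the only safe behaviour is to wait
until every in-flight request, including the queued one, has completed.

#include <stdio.h>

struct acb { int is_dummy; };

static void wait_for_all_aio(void)   /* stand-in for qemu_aio_flush() */
{
    printf("waiting for all in-flight AIO to complete\n");
}

static void aio_cancel(struct acb *acb)
{
    if (acb->is_dummy) {
        wait_for_all_aio();   /* request completes; nothing left to cancel */
        return;
    }
    printf("driver-specific cancel\n");
}

int main(void)
{
    struct acb dummy = { 1 };
    aio_cancel(&dummy);
    return 0;
}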

>
>> +typedef struct EventTapIOport {
>> +    uint32_t address;
>> +    uint32_t data;
>> +    int      index;
>> +} EventTapIOport;
>> +
>> +#define MMIO_BUF_SIZE 8
>> +
>> +typedef struct EventTapMMIO {
>> +    uint64_t address;
>> +    uint8_t  buf[MMIO_BUF_SIZE];
>> +    int      len;
>> +} EventTapMMIO;
>> +
>> +typedef struct EventTapNetReq {
>> +    char *device_name;
>> +    int iovcnt;
>> +    struct iovec *iov;
>> +    int vlan_id;
>> +    bool vlan_needed;
>> +    bool async;
>> +    NetPacketSent *sent_cb;
>> +} EventTapNetReq;
>> +
>> +#define MAX_BLOCK_REQUEST 32
>> +
>> +typedef struct EventTapBlkReq {
>> +    char *device_name;
>> +    int num_reqs;
>> +    int num_cbs;
>> +    bool is_flush;
>> +    BlockRequest reqs[MAX_BLOCK_REQUEST];
>> +    BlockDriverCompletionFunc *cb[MAX_BLOCK_REQUEST];
>> +    void *opaque[MAX_BLOCK_REQUEST];
>> +} EventTapBlkReq;
>> +
>> +#define EVENT_TAP_IOPORT (1 << 0)
>> +#define EVENT_TAP_MMIO   (1 << 1)
>> +#define EVENT_TAP_NET    (1 << 2)
>> +#define EVENT_TAP_BLK    (1 << 3)
>> +
>> +#define EVENT_TAP_TYPE_MASK (EVENT_TAP_NET - 1)
>> +
>> +typedef struct EventTapLog {
>> +    int mode;
>> +    union {
>> +        EventTapIOport ioport;
>> +        EventTapMMIO mmio;
>> +    };
>> +    union {
>> +        EventTapNetReq net_req;
>> +        EventTapBlkReq blk_req;
>> +    };
>> +    QTAILQ_ENTRY(EventTapLog) node;
>> +} EventTapLog;
>> +
>> +static EventTapLog *last_event_tap;
>> +
>> +static QTAILQ_HEAD(, EventTapLog) event_list;
>> +static QTAILQ_HEAD(, EventTapLog) event_pool;
>> +
>> +static int (*event_tap_cb)(void);
>> +static QEMUBH *event_tap_bh;
>> +static VMChangeStateEntry *vmstate;
>> +
>> +static void event_tap_bh_cb(void *p)
>> +{
>> +    if (event_tap_cb) {
>> +        event_tap_cb();
>> +    }
>> +
>> +    qemu_bh_delete(event_tap_bh);
>> +    event_tap_bh = NULL;
>> +}
>> +
>> +static void event_tap_schedule_bh(void)
>> +{
>> +    trace_event_tap_ignore_bh(!!event_tap_bh);
>> +
>> +    /* if bh is already set, we ignore it for now */
>> +    if (event_tap_bh) {
>> +        return;
>> +    }
>> +
>> +    event_tap_bh = qemu_bh_new(event_tap_bh_cb, NULL);
>> +    qemu_bh_schedule(event_tap_bh);
>> +
>> +    return ;
>> +}
>> +
>> +static void event_tap_alloc_net_r

Re: [Qemu-devel] [PATCH 14/19] block: insert event-tap to bdrv_aio_writev() and bdrv_aio_flush().

2011-01-19 Thread Yoshiaki Tamura
2011/1/19 Kevin Wolf :
> Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
>> event-tap functions are called only when event-tap is on and requests are
>> sent from device emulators.
>>
>> Signed-off-by: Yoshiaki Tamura 
>> ---
>>  block.c |   11 +++
>>  1 files changed, 11 insertions(+), 0 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index ff2795b..85bd8b8 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -28,6 +28,7 @@
>>  #include "block_int.h"
>>  #include "module.h"
>>  #include "qemu-objects.h"
>> +#include "event-tap.h"
>>
>>  #ifdef CONFIG_BSD
>>  #include 
>> @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>>      if (bdrv_check_request(bs, sector_num, nb_sectors))
>>          return NULL;
>>
>> +    if (bs->device_name && event_tap_is_on()) {
>
> bs->device_name is a pointer to a char array contained in bs, so it's
> never NULL. You probably mean *bs->device_name?

Yes, thanks for pointing that out.  The bug didn't show up because
event_tap_is_on() was false when flushing after the synchronization.
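
For reference, the fix Kevin suggests is a one-character change to the
hunk quoted above:

-    if (bs->device_name && event_tap_is_on()) {
+    if (*bs->device_name && event_tap_is_on()) {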

Yoshi

>
> Kevin


[Bug 27052] Module KVM : unable to handle kernel NULL pointer dereference at

2011-01-19 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=27052





--- Comment #3 from prochazka  2011-01-19 11:34:21 ---
Sorry, without hugepages, the bug is always here:






rmap_remove: 8802455bfff8 0->BUG
[ cut here ]
kernel BUG at arch/x86/kvm/mmu.c:695!
invalid opcode:  [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
CPU 2
Modules linked in: kvm_intel kvm

Pid: 28761, comm: qemu Not tainted 2.6.37 #3 MS-9192-01S/Express5800/120Rj-2
[N8100-1407E]
RIP: 0010:[]  [] drop_spte+0x1de/0x1f0
[kvm]
RSP: 0018:88078db35a18  EFLAGS: 00010292
RAX: 002b RBX: 8802455bfff8 RCX: 0003
RDX: 81d970c8 RSI: 0082 RDI: 0246
RBP: 88078db35a28 R08: 000106f1 R09: 
R10:  R11: 000f R12: 8801cd2c8000
R13: 010147fc R14: 88078da98000 R15: 88078db35a84
FS:  7f4085c02710() GS:8800cfc8() knlGS:
CS:  0010 DS: 002b ES: 002b CR0: 8005003b
CR2: 1806107a CR3: 0007641ee000 CR4: 26e0
DR0: 0001 DR1: 0002 DR2: 0001
DR3: 000a DR6: 0ff0 DR7: 0400
Process qemu (pid: 28761, threadinfo 88078db34000, task 8801b8264000)
Stack:
 0ff8 88077abdb280 88078db35ab8 a0021075
 00040001 add2 006d5f42 0001
  ea01 88078db35a78 001f010031ed
Call Trace:
 [] paging32_sync_page+0xe5/0x1c0 [kvm]
 [] __kvm_sync_page+0x5a/0xb0 [kvm]
 [] mmu_sync_children+0x249/0x350 [kvm]
 [] ? kvm_mmu_pte_write+0x29a/0xaa0 [kvm]
 [] ? seg_base+0x1a/0x30 [kvm]
 [] ? mmu_free_roots+0xc2/0x180 [kvm]
 [] ? kvm_mmu_get_page+0x4b5/0x710 [kvm]
 [] mmu_sync_roots+0xc8/0x160 [kvm]
 [] kvm_mmu_load+0x80/0x420 [kvm]
 [] kvm_arch_vcpu_ioctl_run+0xc95/0xe20 [kvm]
 [] ? native_load_tr_desc+0x11/0x20
 [] ? kvm_arch_vcpu_load+0x50/0x140 [kvm]
 [] kvm_vcpu_ioctl+0x561/0x860 [kvm]
 [] ? schedule+0x31c/0x990
 [] ? kvm_vm_ioctl+0x0/0x3e0 [kvm]
 [] do_vfs_ioctl+0xa7/0x560
 [] ? sys_futex+0xce/0x170
 [] sys_ioctl+0x4f/0x80
 [] system_call_fastpath+0x16/0x1b
Code: e1 0f 0b eb fe 48 89 de 48 c7 c7 4e ab 03 a0 31 c0 e8 2a 20 99 e1 0f 0b
eb fe 48 89 de 48 c7 c7 33 ab 03 a0 31 c0 e8 15 20 99 e1 <0f> 0b eb fe 66 66 66
66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89
RIP  [] drop_spte+0x1de/0x1f0 [kvm]
 RSP 
---[ end trace 75c63c522243bec6 ]---
rmap_remove: 8807d245fff8 0->BUG
[ cut here ]
kernel BUG at arch/x86/kvm/mmu.c:695!
invalid opcode:  [#2] SMP
last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
CPU 4
Modules linked in: kvm_intel kvm

Pid: 17775, comm: qemu Tainted: G  D 2.6.37 #3
MS-9192-01S/Express5800/120Rj-2 [N8100-1407E]
RIP: 0010:[]  [] drop_spte+0x1de/0x1f0
[kvm]
RSP: 0018:88002646ba18  EFLAGS: 00010292
RAX: 002b RBX: 8807d245fff8 RCX: 0003
RDX: 81d970c8 RSI: 0082 RDI: 0246
RBP: 88002646ba28 R08: 00011256 R09: 
R10:  R11: 000f R12: 88002645c000
R13: 098d67fc R14: 8800264e R15: 88002646ba84
FS:  7ff5b0c75710() GS:8800cfd0() knlGS:
CS:  0010 DS: 002b ES: 002b CR0: 8005003b
CR2: e2248000 CR3: 26435000 CR4: 26e0
DR0: 00a0 DR1:  DR2: 0003
DR3: 00b0 DR6: 0ff0 DR7: 0400
Process qemu (pid: 17775, threadinfo 88002646a000, task 88005d4dc000)
Stack:
 0ff8 8801b7ef10a0 88002646bab8 a0021075
 0001 0001045c 00228e71 0001
  ea01 88002646ba78 0008010031ed
Call Trace:
 [] paging32_sync_page+0xe5/0x1c0 [kvm]
 [] __kvm_sync_page+0x5a/0xb0 [kvm]
 [] mmu_sync_children+0x249/0x350 [kvm]
 [] ? x86_emulate_insn+0x1e41/0x6350 [kvm]
 [] ? seg_base+0x1a/0x30 [kvm]
 [] ? mmu_free_roots+0xc2/0x180 [kvm]
 [] ? kvm_mmu_get_page+0x4b5/0x710 [kvm]
 [] mmu_sync_roots+0xc8/0x160 [kvm]
 [] kvm_mmu_load+0x80/0x420 [kvm]
 [] kvm_arch_vcpu_ioctl_run+0xc95/0xe20 [kvm]
 [] ? kvm_arch_vcpu_load+0x50/0x140 [kvm]
 [] kvm_vcpu_ioctl+0x561/0x860 [kvm]
 [] ? sys_sendto+0x138/0x140
 [] do_vfs_ioctl+0xa7/0x560
 [] sys_ioctl+0x4f/0x80
 [] system_call_fastpath+0x16/0x1b
Code: e1 0f 0b eb fe 48 89 de 48 c7 c7 4e ab 03 a0 31 c0 e8 2a 20 99 e1 0f 0b
eb fe 48 89 de 48 c7 c7 33 ab 03 a0 31 c0 e8 15 20 99 e1 <0f> 0b eb fe 66 66 66
66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89
RIP  [] drop_spte+0x1de/0x1f0 [kvm]
 RSP 
---[ end trace 75c63c522243bec7 ]---
DEV-10.98.98.1:~#


[Bug 27052] Module KVM : unable to handle kernel NULL pointer dereference at

2011-01-19 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=27052





--- Comment #2 from prochazka  2011-01-19 11:26:16 ---
It seems that without hugepages ( -mem-prealloc -mem-path /hugepages ),
I cannot reproduce this bug.

/usr/local/bin/qemu -name R005 -vga std -net
tap,vlan=0,name=interne,ifname=vmtap5 -net
nic,vlan=0,macaddr=ac:de:48:3f:74:73,model=rtl8139 -localtime -usb -usbdevice
tablet -vnc 10.98.98.1:105 -monitor tcp:127.0.0.1:10105,server,nowait,nodelay
-m 256 -pidfile /var/run/qemu/R005.pid -net
vde,port=55,vlan=5,sock=/tmpsafe/neoswitch_bridge,name=externe -net
nic,vlan=5,macaddr=ac:de:48:15:c2:f3,model=rtl8139 -rtc base=localtime -drive
file=/mnt/vdisk/images/VM-R005.1294325971.722755,index=0,media=disk,snapshot=on,cache=writeback
-drive
file=/swapfile-guest/swap1,if=ide,index=1,media=disk,snapshot=on,boot=off -fda
fat:floppy:/mnt/vdisk/diskconf/R005



[Bug 27052] Module KVM : unable to handle kernel NULL pointer dereference at

2011-01-19 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=27052





--- Comment #1 from prochazka  2011-01-19 11:05:05 ---

cpuinfo and cmdline : 

/usr/local/bin/qemu -name R005 -vga std -net
tap,vlan=0,name=interne,ifname=vmtap5 -net
nic,vlan=0,macaddr=ac:de:48:3f:74:73,model=rtl8139 -localtime -usb -usbdevice
tablet -vnc 10.98.98.1:105 -monitor tcp:127.0.0.1:10105,server,nowait,nodelay
-m 256 -pidfile /var/run/qemu/R005.pid -net
vde,port=55,vlan=5,sock=/tmpsafe/neoswitch_bridge,name=externe -net
nic,vlan=5,macaddr=ac:de:48:15:c2:f3,model=rtl8139 -mem-prealloc -mem-path
/hugepages -rtc base=localtime -drive
file=/mnt/vdisk/images/VM-R005.1294325971.722755,index=0,media=disk,snapshot=on,cache=writeback
-drive
file=/swapfile-guest/swap1,if=ide,index=1,media=disk,snapshot=on,boot=off -fda
fat:floppy:/mnt/vdisk/diskconf/R005


DEV-10.98.98.1:~# cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 23
model name  : Intel(R) Xeon(R) CPU   E5420  @ 2.50GHz
stepping: 6
cpu MHz : 2493.297
cache size  : 6144 KB
physical id : 0
siblings: 4
core id : 0
cpu cores   : 4
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor
ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm dts tpr_shadow vnmi
flexpriority
bogomips: 4986.59
clflush size: 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 23
model name  : Intel(R) Xeon(R) CPU   E5420  @ 2.50GHz
stepping: 6
cpu MHz : 2493.297
cache size  : 6144 KB
physical id : 1
siblings: 4
core id : 0
cpu cores   : 4
apicid  : 4
initial apicid  : 4
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor
ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm dts tpr_shadow vnmi
flexpriority
bogomips: 4987.73
clflush size: 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 6
model   : 23
model name  : Intel(R) Xeon(R) CPU   E5420  @ 2.50GHz
stepping: 6
cpu MHz : 2493.297
cache size  : 6144 KB
physical id : 0
siblings: 4
core id : 1
cpu cores   : 4
apicid  : 1
initial apicid  : 1
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor
ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm dts tpr_shadow vnmi
flexpriority
bogomips: 4987.66
clflush size: 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 6
model   : 23
model name  : Intel(R) Xeon(R) CPU   E5420  @ 2.50GHz
stepping: 6
cpu MHz : 2493.297
cache size  : 6144 KB
physical id : 1
siblings: 4
core id : 1
cpu cores   : 4
apicid  : 5
initial apicid  : 5
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor
ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm dts tpr_shadow vnmi
flexpriority
bogomips: 4987.67
clflush size: 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:

processor   : 4
vendor_id   : GenuineIntel
cpu family  : 6
model   : 23
model name  : Intel(R) Xeon(R) CPU   E5420  @ 2.50GHz
stepping: 6
cpu MHz : 2493.297
cache size  : 6144 KB
physical id : 0
siblings: 4
core id : 2
cpu cores   : 4
apicid  : 2
initial apicid  : 2
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush d

[Bug 27052] New: Module KVM : unable to handle kernel NULL pointer dereference at

2011-01-19 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=27052

   Summary: Module KVM : unable to handle kernel NULL pointer
dereference at
   Product: Virtualization
   Version: unspecified
Kernel Version: 2.6.37
  Platform: All
OS/Version: Linux
  Tree: Mainline
Status: NEW
  Severity: blocking
  Priority: P1
 Component: kvm
AssignedTo: virtualization_...@kernel-bugs.osdl.org
ReportedBy: prochazka.nico...@gmail.com
Regression: No


Hello,
With the latest git tree for qemu-kvm on kernel 2.6.37, guests (Windows
XP) running under qemu-kvm cause the oops below after some time.  It
seems to be specific to certain servers, here a NEC Express 5800.  It is
difficult for me to reproduce this problem elsewhere (no access to this
server); on this server the bug is 100% reproducible.


BUG: unable to handle kernel NULL pointer dereference at   (null)
IP: [] gfn_to_rmap+0x2a/0x70 [kvm]
PGD 1ca9c3067 PUD 1ccba0067 PMD 0
Oops:  [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
CPU 3
Modules linked in: kvm_intel kvm

Pid: 17700, comm: qemu Not tainted 2.6.37 #3 MS-9192-01S/Express5800/120Rh-1
[N8100-F]
RIP: 0010:[]  [] gfn_to_rmap+0x2a/0x70
[kvm]
RSP: 0018:8801caa179f8  EFLAGS: 00010246
RAX:  RBX: f001 RCX: 8801ca14e948
RDX:  RSI: f001 RDI: 000fee01
RBP: 8801caa17a08 R08: 8801ca14e000 R09: 0022
R10:  R11: 0001 R12: 0001
R13: 09204ffc R14: 8801ca99 R15: 8801caa17a84
FS:  7f0c0ab6f710() GS:8800cfcc() knlGS:
CS:  0010 DS: 002b ES: 002b CR0: 8005003b
CR2:  CR3: 0001cb8b7000 CR4: 26e0
DR0: 00a0 DR1:  DR2: 0003
DR3: 00b0 DR6: 0ff0 DR7: 0400
Process qemu (pid: 17700, threadinfo 8801caa16000, task 8802463b4000)
Stack:
 88019f29fff8 8801ca954000 8801caa17a28 a001c44c
 0ff8 880208c2d820 8801caa17ab8 a0021075
 00040001 5f99 00424f99 0001
Call Trace:
 [] drop_spte+0x7c/0x1f0 [kvm]
 [] paging32_sync_page+0xe5/0x1c0 [kvm]
 [] __kvm_sync_page+0x5a/0xb0 [kvm]
 [] mmu_sync_children+0x249/0x350 [kvm]
 [] ? kvm_mmu_pte_write+0x29a/0xaa0 [kvm]
 [] ? seg_base+0x1a/0x30 [kvm]
 [] ? mmu_free_roots+0xc2/0x180 [kvm]
 [] ? kvm_mmu_get_page+0x4b5/0x710 [kvm]
 [] mmu_sync_roots+0xc8/0x160 [kvm]
 [] kvm_mmu_load+0x80/0x420 [kvm]
 [] kvm_arch_vcpu_ioctl_run+0xc95/0xe20 [kvm]
 [] ? wake_futex+0x40/0x60
 [] ? kvm_arch_vcpu_load+0x50/0x140 [kvm]
 [] kvm_vcpu_ioctl+0x561/0x860 [kvm]
 [] ? schedule+0x31c/0x990
 [] do_vfs_ioctl+0xa7/0x560
 [] ? fput+0x34/0x280
 [] ? sys_futex+0xce/0x170
 [] sys_ioctl+0x4f/0x80
 [] system_call_fastpath+0x16/0x1b
Code: 00 55 48 89 e5 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 0f 1f 44 00 00 41
89 d4 48 89 f3 e8af 3d fe ff 41 83 fc 01 48
89 c2 75 1a <48> 2b 18 48 8d 04 dd 00 00 00 00 48 03 42 18 48 8b 1c
   24 4c 8b
RIP  [] gfn_to_rmap+0x2a/0x70 [kvm]
 RSP 
CR2: 
---[ end trace 02041dca60973834 ]---



Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Gerd Hoffmann

On 01/18/11 18:09, Anthony Liguori wrote:

On 01/18/2011 10:56 AM, Jan Kiszka wrote:



The device model topology is 100% a hidden architectural detail.

This is true for the sysbus, it is obviously not the case for PCI and
similarly discoverable buses. There we have a guest-explorable topology
that is currently equivalent to the qdev layout.


But we also don't do PCI passthrough so we really haven't even explored
how that maps in qdev. I don't know if qemu-kvm has attempted to
qdev-ify it.


It is qdev-ified.  It is a normal pci device from qdev's point of view.

BTW: is there any reason why (vfio-based) pci passthrough couldn't work 
with tcg?



The -device interface is a stable interface. Right now, you don't
specify any type of identifier of the pci bus when you create a PCI
device. It's implied in the interface.


Wrong.  You can specify the bus you want to attach the device to via
bus=.  This is true for *every* device, including all pci devices.
If unspecified, qdev uses the first bus it finds.


As long as there is a single pci bus only, there is simply no need to
specify it; that's why nobody does that today.  Once q35 finally arrives,
this will change of course.


cheers,
  Gerd


Re: [Qemu-devel] [PATCH 14/19] block: insert event-tap to bdrv_aio_writev() and bdrv_aio_flush().

2011-01-19 Thread Kevin Wolf
Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
> event-tap functions are called only when event-tap is on and requests are
> sent from device emulators.
> 
> Signed-off-by: Yoshiaki Tamura 
> ---
>  block.c |   11 +++
>  1 files changed, 11 insertions(+), 0 deletions(-)
> 
> diff --git a/block.c b/block.c
> index ff2795b..85bd8b8 100644
> --- a/block.c
> +++ b/block.c
> @@ -28,6 +28,7 @@
>  #include "block_int.h"
>  #include "module.h"
>  #include "qemu-objects.h"
> +#include "event-tap.h"
>  
>  #ifdef CONFIG_BSD
>  #include 
> @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>  if (bdrv_check_request(bs, sector_num, nb_sectors))
>  return NULL;
>  
> +if (bs->device_name && event_tap_is_on()) {
> +return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
> + cb, opaque);
> +}
> +
>  if (bs->dirty_bitmap) {
>  blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
>   opaque);

Just noticed the context here... Does this patch break block migration
when event-tap is on?

Another question that came to my mind is whether we really hook everything
we need.  I think we'll need to have a hook in bdrv_flush as well.  I don't
know if you hook qemu_aio_flush and friends - does a call cause
event-tap to flush its queue?  If not, a call to qemu_aio_flush might
hang qemu because it's waiting for requests to complete which are
actually stuck in the event-tap queue.

Kevin


Re: [Qemu-devel] [PATCH 09/19] Introduce event-tap.

2011-01-19 Thread Kevin Wolf
Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
> event-tap controls when to start an FT transaction, and provides proxy
> functions to be called from net/block devices.  During an FT transaction, it
> queues up net/block requests and flushes them when the transaction
> completes.
> 
> Signed-off-by: Yoshiaki Tamura 
> Signed-off-by: OHMURA Kei 

One general comment: at first glance this seems to mix block and net
(and some other things) arbitrarily instead of having a section for
handling all block stuff, then network, etc.

Is there a specific reason for the order in which you put the functions?
If not, maybe reordering them might improve readability.

> ---
>  Makefile.target |1 +
>  event-tap.c |  847 +++
>  event-tap.h |   42 +++
>  qemu-tool.c |   24 ++
>  trace-events|9 +
>  5 files changed, 923 insertions(+), 0 deletions(-)
>  create mode 100644 event-tap.c
>  create mode 100644 event-tap.h
> 
> diff --git a/Makefile.target b/Makefile.target
> index e15b1c4..f36cd75 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -199,6 +199,7 @@ obj-y += rwhandler.o
>  obj-$(CONFIG_KVM) += kvm.o kvm-all.o
>  obj-$(CONFIG_NO_KVM) += kvm-stub.o
>  LIBS+=-lz
> +obj-y += event-tap.o
>  
>  QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
>  QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
> diff --git a/event-tap.c b/event-tap.c
> new file mode 100644
> index 0000000..f492708
> --- /dev/null
> +++ b/event-tap.c

> @@ -0,0 +1,847 @@
> +/*
> + * Event Tap functions for QEMU
> + *
> + * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu-common.h"
> +#include "qemu-error.h"
> +#include "block.h"
> +#include "block_int.h"
> +#include "ioport.h"
> +#include "osdep.h"
> +#include "sysemu.h"
> +#include "hw/hw.h"
> +#include "net.h"
> +#include "event-tap.h"
> +#include "trace.h"
> +
> +enum EVENT_TAP_STATE {
> +EVENT_TAP_OFF,
> +EVENT_TAP_ON,
> +EVENT_TAP_FLUSH,
> +EVENT_TAP_LOAD,
> +EVENT_TAP_REPLAY,
> +};
> +
> +static enum EVENT_TAP_STATE event_tap_state = EVENT_TAP_OFF;
> +static BlockDriverAIOCB dummy_acb; /* we may need a pool for dummies */

Indeed, bdrv_aio_cancel will segfault this way.

If you use dummies instead of real ACBs the only way to correctly
implement bdrv_aio_cancel is waiting for all in-flight AIOs
(qemu_aio_flush).

> +typedef struct EventTapIOport {
> +uint32_t address;
> +uint32_t data;
> +int  index;
> +} EventTapIOport;
> +
> +#define MMIO_BUF_SIZE 8
> +
> +typedef struct EventTapMMIO {
> +uint64_t address;
> +uint8_t  buf[MMIO_BUF_SIZE];
> +int  len;
> +} EventTapMMIO;
> +
> +typedef struct EventTapNetReq {
> +char *device_name;
> +int iovcnt;
> +struct iovec *iov;
> +int vlan_id;
> +bool vlan_needed;
> +bool async;
> +NetPacketSent *sent_cb;
> +} EventTapNetReq;
> +
> +#define MAX_BLOCK_REQUEST 32
> +
> +typedef struct EventTapBlkReq {
> +char *device_name;
> +int num_reqs;
> +int num_cbs;
> +bool is_flush;
> +BlockRequest reqs[MAX_BLOCK_REQUEST];
> +BlockDriverCompletionFunc *cb[MAX_BLOCK_REQUEST];
> +void *opaque[MAX_BLOCK_REQUEST];
> +} EventTapBlkReq;
> +
> +#define EVENT_TAP_IOPORT (1 << 0)
> +#define EVENT_TAP_MMIO   (1 << 1)
> +#define EVENT_TAP_NET(1 << 2)
> +#define EVENT_TAP_BLK(1 << 3)
> +
> +#define EVENT_TAP_TYPE_MASK (EVENT_TAP_NET - 1)
> +
> +typedef struct EventTapLog {
> +int mode;
> +union {
> +EventTapIOport ioport;
> +EventTapMMIO mmio;
> +};
> +union {
> +EventTapNetReq net_req;
> +EventTapBlkReq blk_req;
> +};
> +QTAILQ_ENTRY(EventTapLog) node;
> +} EventTapLog;
> +
> +static EventTapLog *last_event_tap;
> +
> +static QTAILQ_HEAD(, EventTapLog) event_list;
> +static QTAILQ_HEAD(, EventTapLog) event_pool;
> +
> +static int (*event_tap_cb)(void);
> +static QEMUBH *event_tap_bh;
> +static VMChangeStateEntry *vmstate;
> +
> +static void event_tap_bh_cb(void *p)
> +{
> +if (event_tap_cb) {
> +event_tap_cb();
> +}
> +
> +qemu_bh_delete(event_tap_bh);
> +event_tap_bh = NULL;
> +}
> +
> +static void event_tap_schedule_bh(void)
> +{
> +trace_event_tap_ignore_bh(!!event_tap_bh);
> +
> +/* if bh is already set, we ignore it for now */
> +if (event_tap_bh) {
> +return;
> +}
> +
> +event_tap_bh = qemu_bh_new(event_tap_bh_cb, NULL);
> +qemu_bh_schedule(event_tap_bh);
> +
> +return ;
> +}
> +
> +static void event_tap_alloc_net_req(EventTapNetReq *net_req,
> +   VLANClientState *vc,
> +   const struct iovec *iov, int iovcnt,
> +   NetPacketSent *sent_cb, bool async)
> +{
> +int i;
> +
> +net_req->iovcnt = iovcnt;
> +net_req->async 

Re: [PATCH 1/2] KVM test: Turning hugepages setup into KVM autotest infrastructure

2011-01-19 Thread Michael Goldish
On 01/19/2011 03:56 AM, Lucas Meneghel Rodrigues wrote:
> So we can get rid of scripts/hugepage.py. The implementation
> strategy is to have a kvm_utils.HugePageConfig class that
> can do both hugepages setup and cleanup, and call it during
> pre/postprocessing.
> 
> Signed-off-by: Lucas Meneghel Rodrigues 
> ---
>  client/tests/kvm/kvm_preprocessing.py  |8 +++
>  client/tests/kvm/kvm_utils.py  |  102 
> 
>  client/tests/kvm/tests_base.cfg.sample |3 +-
>  3 files changed, 111 insertions(+), 2 deletions(-)
> 
> diff --git a/client/tests/kvm/kvm_preprocessing.py 
> b/client/tests/kvm/kvm_preprocessing.py
> index 4a6e0f8..41455cf 100644
> --- a/client/tests/kvm/kvm_preprocessing.py
> +++ b/client/tests/kvm/kvm_preprocessing.py
> @@ -254,6 +254,10 @@ def preprocess(test, params, env):
>  logging.debug("KVM userspace version: %s" % kvm_userspace_version)
>  test.write_test_keyval({"kvm_userspace_version": kvm_userspace_version})
>  
> +if params.get("setup_hugepages") == "yes":
> +h = kvm_utils.HugePageConfig(params)
> +h.setup()
> +
>  # Execute any pre_commands
>  if params.get("pre_command"):
>  process_command(test, params, env, params.get("pre_command"),
> @@ -350,6 +354,10 @@ def postprocess(test, params, env):
>  env["tcpdump"].close()
>  del env["tcpdump"]
>  
> +if params.get("setup_hugepages") == "yes":
> +h = kvm_utils.HugePageConfig(params)
> +h.cleanup()
> +
>  # Execute any post_commands
>  if params.get("post_command"):
>  process_command(test, params, env, params.get("post_command"),
> diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
> index 78c9f25..b61ff94 100644
> --- a/client/tests/kvm/kvm_utils.py
> +++ b/client/tests/kvm/kvm_utils.py
> @@ -1265,6 +1265,108 @@ class KvmLoggingConfig(logging_config.LoggingConfig):
>  verbose=verbose)
>  
>  
> +class HugePageConfig:
> +def __init__(self, params):
> +"""
> +Gets environment variable values and calculates the target number
> +of huge memory pages.
> +
> +@param params: Dict like object containing parameters for the test.
> +"""
> +self.vms = len(params.objects("vms"))
> +self.mem = int(params.get("mem"))
> +self.max_vms = int(params.get("max_vms", 0))
> +self.hugepage_path = '/mnt/kvm_hugepage'
> +self.hugepage_size = self.get_hugepage_size()
> +self.target_hugepages = self.get_target_hugepages()
> +self.kernel_hp_file = '/proc/sys/vm/nr_hugepages'
> +
> +
> +def get_hugepage_size(self):
> +"""
> +Get the current system setting for huge memory page size.
> +"""
> +meminfo = open('/proc/meminfo', 'r').readlines()
> +huge_line_list = [h for h in meminfo if h.startswith("Hugepagesize")]
> +try:
> +return int(huge_line_list[0].split()[1])
> +except ValueError, e:
> +raise ValueError("Could not get huge page size setting from "
> + "/proc/meminfo: %s" % e)
> +
> +
> +def get_target_hugepages(self):
> +"""
> +Calculate the target number of hugepages for testing purposes.
> +"""
> +if self.vms < self.max_vms:
> +self.vms = self.max_vms
> +# memory of all VMs plus qemu overhead of 64MB per guest
> +vmsm = (self.vms * self.mem) + (self.vms * 64)
> +return int(vmsm * 1024 / self.hugepage_size)
> +
> +
> +@error.context_aware
> +def set_hugepages(self):
> +"""
> +Sets the hugepage limit to the target hugepage value calculated.
> +"""
> +error.base_context("Setting hugepages limit to %s" %
> +   self.target_hugepages)
> +hugepage_cfg = open(self.kernel_hp_file, "r+")
> +hp = hugepage_cfg.readline()
> +while int(hp) < self.target_hugepages:
> +loop_hp = hp
> +hugepage_cfg.write(str(self.target_hugepages))
> +hugepage_cfg.flush()
> +hugepage_cfg.seek(0)
> +hp = int(hugepage_cfg.readline())
> +if loop_hp == hp:
> +raise ValueError("Cannot set the kernel hugepage setting "
> + "to the target value of %d hugepages." %
> + self.target_hugepages)
> +hugepage_cfg.close()
> +logging.debug("Successfuly set %s large memory pages on host ",
> +  self.target_hugepages)
> +
> +
> +@error.context_aware
> +def mount_hugepage_fs(self):
> +"""
> +Verify if there's a hugetlbfs mount set. If there's none, will set up
> +a hugetlbfs mount using the class attribute that defines the mount
> +point.
> +"""
> +error.base_context("Mounting hugepages path"

Re: Flow Control and Port Mirroring Revisited

2011-01-19 Thread Simon Horman
On Tue, Jan 18, 2011 at 10:13:33PM +0200, Michael S. Tsirkin wrote:
> On Tue, Jan 18, 2011 at 11:41:22AM -0800, Rick Jones wrote:
> > >So it won't be all that simple to implement well, and before we try,
> > >I'd like to know whether there are applications that are helped
> > >by it. For example, we could try to measure latency at various
> > >pps and see whether the backpressure helps. netperf has -b, -w
> > >flags which might help these measurements.
> > 
> > Those options are enabled when one adds --enable-burst to the
> > pre-compilation ./configure  of netperf (one doesn't have to
> > recompile netserver).  However, if one is also looking at latency
> > statistics via the -j option in the top-of-trunk, or simply at the
> > histogram with --enable-histogram on the ./configure and a verbosity
> > level of 2 (global -v 2) then one wants the very top of trunk
> > netperf from:
> > 
> > http://www.netperf.org/svn/netperf2/trunk
> > 
> > to get the recently added support for accurate (netperf level) RTT
> > measurements on burst-mode request/response tests.
> > 
> > happy benchmarking,
> > 
> > rick jones

Thanks Rick, that is really helpful.

> > PS - the enhanced latency statistics from -j are only available in
> > the "omni" version of the TCP_RR test.  To get that add a
> > --enable-omni to the ./configure - and in this case both netperf and
> > netserver have to be recompiled.
> 
> 
> Is this TCP only? I would love to get latency data from UDP as well.

At a glance, -- -T UDP is what you are after.
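
Something like the following should exercise it (assuming a trunk netperf
built with --enable-omni, per Rick's note above; the host name is a
placeholder):

  netperf -H remotehost -t OMNI -j -v 2 -- -T UDP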


Re: [Qemu-devel] [PATCH 14/19] block: insert event-tap to bdrv_aio_writev() and bdrv_aio_flush().

2011-01-19 Thread Kevin Wolf
Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
> event-tap functions are called only when event-tap is on and requests are
> sent from device emulators.
> 
> Signed-off-by: Yoshiaki Tamura 
> ---
>  block.c |   11 +++
>  1 files changed, 11 insertions(+), 0 deletions(-)
> 
> diff --git a/block.c b/block.c
> index ff2795b..85bd8b8 100644
> --- a/block.c
> +++ b/block.c
> @@ -28,6 +28,7 @@
>  #include "block_int.h"
>  #include "module.h"
>  #include "qemu-objects.h"
> +#include "event-tap.h"
>  
>  #ifdef CONFIG_BSD
>  #include 
> @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>  if (bdrv_check_request(bs, sector_num, nb_sectors))
>  return NULL;
>  
> +if (bs->device_name && event_tap_is_on()) {

bs->device_name is a pointer to a char array contained in bs, so it's
never NULL. You probably mean *bs->device_name?

Kevin


Re: [PATCH 2/3] KVM: Emulate MSI-X table in kernel

2011-01-19 Thread Sheng Yang
On Monday 17 January 2011 20:39:30 Marcelo Tosatti wrote:
> On Mon, Jan 17, 2011 at 08:18:22PM +0800, Sheng Yang wrote:
> > > > +   goto out;
> > > > +
> > > > +   mmio = &mmio_dev->mmio[idx];
> > > > +   entry = (addr - mmio->table_base_addr) / PCI_MSIX_ENTRY_SIZE;
> > > > +   entry_base = mmio->table_base_va + entry * PCI_MSIX_ENTRY_SIZE;
> > > > +   ctrl_pos = (u32 *)(entry_base + PCI_MSIX_ENTRY_VECTOR_CTRL);
> > > > +
> > > > +   if (get_user(old_ctrl, ctrl_pos))
> > > > +   goto out;
> > > > +
> > > > +   /* No allow writing to other fields when entry is unmasked */
> > > > +   if (!(old_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT) &&
> > > > +   offset != PCI_MSIX_ENTRY_VECTOR_CTRL)
> > > > +   goto out;
> > > > +
> > > > +   if (copy_to_user((void __user *)(entry_base + offset), val, len))
> > > > +   goto out;
> > > 
> > > Instead of copying to/from userspace (which is subject to swapin,
> > > unexpected values), you could include the guest written value in a
> > > kvm_run structure, along with address. Qemu-kvm would use that to
> > > synchronize its copy of the table, on KVM_EXIT_MSIX_ROUTING_UPDATE
> > > exit.
> > 
> > We want to accelerate MSI-X mask bit accesses, which in most cases won't
> > exit to userspace.  That's the cost we want to optimize.  Also it's
> > possible for userspace to read the correct value of the MMIO (but mostly
> > userspace can't write to it, in order to prevent synchronization issues).
> 
> Yes, I meant exit to userspace only when necessary, but instead of
> copying directly every time, record the value the guest has written in
> kvm_run and exit with KVM_EXIT_MSIX_ROUTING_UPDATE.
> 
> > > > +   if (get_user(new_ctrl, ctrl_pos))
> > > > +   goto out;
> > > > +
> > > > +   if ((offset < PCI_MSIX_ENTRY_VECTOR_CTRL && len == 4) ||
> > > > +   (offset < PCI_MSIX_ENTRY_DATA && len == 8))
> > > > +   ret = -ENOTSYNC;
> > > > +   if (old_ctrl == new_ctrl)
> > > > +   goto out;
> > > > +   if (!(old_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT) &&
> > > > +   (new_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT))
> > > > +   r = update_msix_mask_bit(mmio_dev->kvm, mmio, entry, 1);
> > > > +   else if ((old_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT) &&
> > > > +   !(new_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT))
> > > > +   r = update_msix_mask_bit(mmio_dev->kvm, mmio, entry, 0);
> > > 
> > > Then you rely on the kernel copy of the values to enable/disable irq.
> > 
> > Yes, they are guest-owned assigned IRQs.  Any potential issue?
> 
> Nothing prevents the kernel from enabling or disabling irq multiple
> times with this code (what prevents it is a qemu-kvm that behaves as
> expected). This is not very good.
> 
> Perhaps the guest can only harm itself with that, but i'm not sure.

MSI-X interrupts are not shared, so I think the guest can only harm itself
if it was doing it wrong.
> 
> And also if an msix table page is swapped out guest will hang.
> 
> > > > +   return r;
> > > > +}
> > > 
> > > This is not consistent with registration, where there are no checks
> > > regarding validity of assigned device id. So why is it necessary?
> > 
> > I don't quite understand.  We need to free the mmio anyway; otherwise it
> > would result in wrong mmio interception...
> 
> If you return -EOPNOTSUPP in case kvm_find_assigned_dev fails in the
> read/write paths, there is no need to free anything.

It may work with assigned devices, but the potential users of this include
vfio drivers and emulated devices.  And I don't think it's a good idea to
have a registration process but no free process...

--
regards
Yang, Sheng

> 
> > > There is a lock ordering problem BTW:
> > > 
> > > - read/write handlers: dev->lock -> kvm->lock
> > > - vm_ioctl_deassign_device -> kvm_free_msix_mmio: kvm->lock ->
> > > dev->lock
> > 
> > Good catch!  Will fix it (and the other comments, of course).
> > 
> > --
> > regards
> > Yang, Sheng
> > 
> > > > +
> > > > +int kvm_vm_ioctl_unregister_msix_mmio(struct kvm *kvm,
> > > > + struct kvm_msix_mmio_user *mmio_user)
> > > > +{
> > > > +   struct kvm_msix_mmio mmio;
> > > > +
> > > > +   mmio.dev_id = mmio_user->dev_id;
> > > > +   mmio.type = mmio_user->type;
> > > > +
> > > > +   return kvm_free_msix_mmio(kvm, &mmio);
> > > > +}
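
One possible shape of the kvm_run payload Marcelo suggests above (purely
hypothetical; this is not an existing kernel ABI and all field names are
made up):

#include <linux/types.h>

/* Hypothetical sketch: the guest write travels in kvm_run and the vcpu
 * exits with KVM_EXIT_MSIX_ROUTING_UPDATE, letting qemu-kvm sync its
 * copy of the table without the kernel dereferencing userspace
 * pointers. */
struct kvm_msix_mmio_exit {
        __u64 gpa;      /* guest physical address the guest wrote */
        __u64 value;    /* value written */
        __u32 len;      /* access width in bytes */
        __u32 pad;
};
/* would be added to the union of exit payloads in struct kvm_run */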


Re: [PATCH 3/3] KVM: Add documents for MSI-X MMIO API

2011-01-19 Thread Sheng Yang
On Monday 17 January 2011 20:45:55 Avi Kivity wrote:
> On 01/17/2011 02:35 PM, Sheng Yang wrote:
> > On Monday 17 January 2011 20:21:45 Avi Kivity wrote:
> > >  On 01/06/2011 12:19 PM, Sheng Yang wrote:
> > >  >  Signed-off-by: Sheng Yang
> > >  >  ---
> > >  >  
>  >  >  Documentation/kvm/api.txt |   41 +
>  >  >  1 files changed, 41 insertions(+), 0 deletions(-)
> > >  >  
> > >  >  diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
> > >  >  index e1a9297..4978b94 100644
> > >  >  --- a/Documentation/kvm/api.txt
> > >  >  +++ b/Documentation/kvm/api.txt
> > >  >  @@ -1263,6 +1263,47 @@ struct kvm_assigned_msix_entry {
> > >  >  
> > >  >__u16 padding[3];
> > >  >
> > >  >};
> > >  >  
> > >  >  +4.54 KVM_REGISTER_MSIX_MMIO
> > >  >  +
> > >  >  +Capability: KVM_CAP_MSIX_MMIO
> > >  >  +Architectures: x86
> > >  >  +Type: vm ioctl
> > >  >  +Parameters: struct kvm_msix_mmio_user (in)
> > >  >  +Returns: 0 on success, -1 on error
> > >  >  +
>  >  This API indicates an MSI-X MMIO address of a guest device.  Then all
>  >  MMIO operations would be handled by the kernel.  When necessary (e.g.
>  >  MSI data/address changed), KVM would exit to userspace using
>  >  KVM_EXIT_MSIX_ROUTING_UPDATE to indicate the MMIO modification and
>  >  require userspace to update the IRQ routing table.
> > >  >  +
> > >  >  +struct kvm_msix_mmio_user {
> > >  >  + __u32 dev_id;
> > >  >  + __u16 type; /* Device type and MMIO address type */
> > >  >  + __u16 max_entries_nr;   /* Maximum entries supported */
> > >  >  + __u64 base_addr;/* Guest physical address of MMIO */
>  >  + __u64 base_va;  /* Host virtual address of MMIO mapping */
> > >  >  + __u64 flags;/* Reserved for now */
> > >  >  + __u64 reserved[4];
> > >  >  +};
> > >  >  +
> > >  >  +Current device type can be:
> > >  >  +#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV  (1<<   0)
> > >  >  +
> > >  >  +Current MMIO type can be:
> > >  >  +#define KVM_MSIX_MMIO_TYPE_BASE_TABLE(1<<   8)
> > >  >  +
> > >  
> > >  How does userspace know which entry of which table changed?  Need a
> > >  field in struct kvm_run for that.
> > 
> > We already get the guest MMIO address for that in the exit information.
> > I've created a chain of handlers in qemu to handle it.
> 
> But we already decoded the table and entry...

But the handler is still wrapped by vcpu_mmio_write(), as part of MMIO
handling, so it's not very handy to get the table and entry out.  Also,
the updater in userspace can share most of its logic with the ordinary
userspace MMIO handler, which takes an address as parameter.  So I think
we don't need to pass the decoded table_id and entry to userspace.

--
regards
Yang, Sheng
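
A stand-alone model of the address-keyed handler chain described above
(made-up names; not the actual qemu code): the MSI-X updater plugs into
the same dispatch path as any other MMIO handler and re-decodes table and
entry from the address alone.

#include <stdio.h>
#include <stdint.h>

#define MSIX_TABLE_BASE     0xf0000000ULL   /* placeholder address */
#define PCI_MSIX_ENTRY_SIZE 16              /* per the PCI spec */

struct mmio_handler {
    uint64_t base, size;
    void (*write)(uint64_t addr, uint64_t val);
    struct mmio_handler *next;
};

static void msix_table_write(uint64_t addr, uint64_t val)
{
    uint64_t entry = (addr - MSIX_TABLE_BASE) / PCI_MSIX_ENTRY_SIZE;
    printf("update MSI-X entry %llu\n", (unsigned long long)entry);
}

static struct mmio_handler msix = {
    MSIX_TABLE_BASE, 0x1000, msix_table_write, NULL
};
static struct mmio_handler *chain = &msix;

static void mmio_write(uint64_t addr, uint64_t val)
{
    struct mmio_handler *h;

    for (h = chain; h; h = h->next) {
        if (addr >= h->base && addr - h->base < h->size) {
            h->write(addr, val);   /* handler decodes from the address */
            return;
        }
    }
    printf("unclaimed MMIO write at 0x%llx\n", (unsigned long long)addr);
}

int main(void)
{
    mmio_write(MSIX_TABLE_BASE + 2 * PCI_MSIX_ENTRY_SIZE, 0);
    return 0;
}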