Re: [Qemu-devel] [Question] Question about ACPI table in qemu

2018-09-11 Thread Paolo Bonzini
On 11/09/2018 18:54, Li Qiang wrote:
> Hi all,
> 
> I noticed that both qemu and seabios create the ACPI table.
> At first I thought the BIOS's ACPI tables would overwrite QEMU's if SeaBIOS
> is compiled with CONFIG_ACPI.

Yes, SeaBIOS's ACPI tables are not used anymore, to remove the need to
update QEMU and SeaBIOS in sync.

> But after I read this -->
> https://lists.gnu.org/archive/html/qemu-devel/2013-02/msg04555.html
> 
> It says:
> 
> "just have QEMU pass the tables to SeaBIOS for it to copy
>  into memory like it does on CSM, coreboot, and Xen"
> 
> So where does this 'copy' happen? Or is there something else I missed?

The copy happens in SeaBIOS's romfile_loader_execute function (called by
qemu_platform_setup).
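
For readers who want the bigger picture: QEMU does not drop finished tables at
fixed addresses; it exposes the table blobs together with a small command
script (the "etc/table-loader" fw_cfg file), and romfile_loader_execute walks
that script, copying blobs into guest memory, patching cross-table pointers
and fixing checksums.  The fragment below is only a rough, self-contained
sketch of the pointer-patch and checksum steps, with invented names and
layouts; it is not the SeaBIOS implementation.

/* Toy model of the ADD_POINTER / ADD_CHECKSUM loader commands (sketch only). */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* ADD_CHECKSUM: set blob[csum_ofs] so that the sum over [start, start+len) is 0 */
static void add_checksum(uint8_t *blob, size_t csum_ofs, size_t start, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = start; i < start + len; i++) {
        sum += blob[i];
    }
    blob[csum_ofs] -= sum;              /* csum_ofs lies inside the summed range */
}

/* ADD_POINTER: the field at 'ofs' holds an offset; add the target blob's base address */
static void add_pointer(uint8_t *blob, size_t ofs, uint32_t target_base)
{
    uint32_t v;
    memcpy(&v, blob + ofs, sizeof(v));
    v += target_base;
    memcpy(blob + ofs, &v, sizeof(v));
}

int main(void)
{
    uint8_t table[16] = { 0 };                  /* stand-in for a copied ACPI table */

    add_pointer(table, 4, 0x07f00000);          /* patch a pointer field */
    add_checksum(table, 9, 0, sizeof(table));   /* make the table sum to zero */
    printf("patched checksum byte: 0x%02x\n", table[9]);
    return 0;
}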

Thanks,

Paolo



Re: [Qemu-devel] [Question] Question about the i440FX device

2018-09-09 Thread Li Qiang
Thanks Paolo,

Paolo Bonzini wrote on Mon, Sep 10, 2018 at 7:11 AM:

> On 07/09/2018 08:32, Li Qiang wrote:
> > Hello all,
> >
> > I want to know why the i440FX device in the following 'info qtree' output
> > is placed under the pci.0 bus.  In the chip spec here:
> > -->https://wiki.qemu.org/images/b/bb/29054901.pdf
> > I don't see this device.
> >
> > Can anyone give me some hints?
>
> Hi,
>
> the device implements what is in "3.2. PCI Configuration Space Mapped
> Registers" in the i440FX spec.
>
>
Indeed, I also found the comments in 'i440fx_class_init' useful.
As I understand it, 'PCII440FXState' is the "PCI-facing part of the host
bridge" and 'I440FXState' is the "host-facing part".

Thanks,
Li Qiang



> Paolo
>


Re: [Qemu-devel] [Question] Question about the i440FX device

2018-09-09 Thread Paolo Bonzini
On 07/09/2018 08:32, Li Qiang wrote:
> Hello all,
> 
> I want to know why the i440FX device in the following 'info qtree' output is
> placed under the pci.0 bus.  In the chip spec here:
> -->https://wiki.qemu.org/images/b/bb/29054901.pdf
> I don't see this device.
> 
> Can anyone give me some hints?

Hi,

the device implements what is in "3.2. PCI Configuration Space Mapped
Registers" in the i440FX spec.

Paolo



Re: [Qemu-devel] Some question about savem/qcow2 incremental snapshot

2018-05-08 Thread Eric Blake

On 12/25/2017 01:33 AM, He Junyan wrote:

hi all:

I am now focusing on snapshot optimization for Intel NVDIMM-backed memory.
Unlike normal memory, an NVDIMM may be 128G, 256G or even more for just one
guest, and it is slower than normal memory, so completing a single snapshot
save can sometimes take several minutes. Even with compression enabled, one
snapshot point may consume more than 30G of disk space.
We decided to add incremental snapshot saving to address this: store only
the differences between snapshot points, to save both time and disk space.
But the current snapshot/savevm framework does not seem to support this.
We would need to add snapshot dependencies and extra operations when we LOAD
and DELETE a snapshot point.
Is it possible to modify the savevm framework and add incremental snapshot
support to the QCOW2 format?


In general, the list has tended to focus on external snapshots rather than
internal ones, with persistent bitmaps as the proposed mechanism for tracking
incremental differences between snapshots.  But yes, it is certainly feasible
that patches teaching internal snapshots to take advantage of incremental
relationships may prove useful.  You will need to document any enhancements
to the qcow2 file format and get them approved first, as interoperability
demands that anyone reading the same spec be able to interpret an image that
uses an internal snapshot with an incremental diff.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] Some question about savem/qcow2 incremental snapshot

2018-01-15 Thread He Junyan
On 三, 2018-01-10 at 20:17 +, Stefan Hajnoczi wrote:
> On Wed, Jan 10, 2018 at 8:15 PM, Dr. David Alan Gilbert
>  wrote:
> > 
> > * Stefan Hajnoczi (stefa...@gmail.com) wrote:
> > > 
> > > On Tue, Jan 9, 2018 at 7:55 PM, Dr. David Alan Gilbert
> > >  wrote:
> > > > 
> > > > > 
> > > > > Certain guest operations like rebooting or zeroing memory
> > > > > will defeat
> > > > > the incremental guest RAM snapshot feature.  It's worth
> > > > > thinking about
> > > > > these cases to make sure this feature would be worth it in
> > > > > real use
> > > > > cases.
> > > > But those probably wouldn't upset an NVDimm?
> > > If the guest dirties all RAM then the incremental snapshot
> > > feature
> > > degrades to a full snapshot.  I'm asking if there are common
> > > operations where that happens.
> > > 
> > > I seem to remember Windows guests zero all pages on cold
> > > boot.  Maybe
> > > that's not the case anymore.
> > > 
> > > Worth checking before embarking on this feature because it could
> > > be a
> > > waste of effort if it turns out real-world guests dirty all
> > > memory in
> > > common cases.
> > Right, but I'm hoping that there's some magic somewhere where an
> > NVDimm doesn't
> > get zero'd because of a cold boot since that would seem to make it
> > volatile.
> This feature isn't specific to NVDIMM though.  It would be equally
> useful for regular RAM.
> 
> Stefan
> 

Thanks for all your advice.
I have already done a lot of investigation and written some code. My thinking
is as follows:
1. As a first step, I will use the is_active() function to make this work
only for NVDIMM memory regions and only for snapshot saving, not for live
migration. I understand that it could work for all kinds of memory, but
always enabling dirty log tracking may lower the guest's performance. Live
migration has to copy all of the data anyway, so it does not seem to benefit
from this approach.
2. Saving and loading are relatively easy to do, while deleting a snapshot
point needs a lot of work. The current framework simply deletes all the data
of one snapshot point in one shot. I want to add reference counting to the
QCOW2 L1/L2 tables for clusters whose data is depended on by another snapshot
point, so that such data can be kept when deleting (a rough sketch of the
idea follows below).

I will also check later whether a cold boot zeroes all pages.
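
A rough sketch of the deletion idea from point 2 above (my own illustration,
not qcow2 code): shared clusters carry a reference count, and deleting a
snapshot point only frees the clusters whose count drops to zero.

/* Sketch only: per-cluster refcounts so that deleting one snapshot point
 * keeps clusters that another snapshot point still depends on. */
#include <stdio.h>

#define NUM_CLUSTERS 8

static unsigned refcount[NUM_CLUSTERS];

/* A snapshot point is modelled as the set of clusters its tables point at. */
static void snapshot_take(const int *clusters, int n)
{
    for (int i = 0; i < n; i++) {
        refcount[clusters[i]]++;
    }
}

static void snapshot_delete(const int *clusters, int n)
{
    for (int i = 0; i < n; i++) {
        if (--refcount[clusters[i]] == 0) {
            printf("cluster %d freed\n", clusters[i]);
        } else {
            printf("cluster %d kept (still referenced)\n", clusters[i]);
        }
    }
}

int main(void)
{
    int snap_a[] = { 0, 1, 2 };   /* full snapshot point */
    int snap_b[] = { 2, 3 };      /* incremental point sharing cluster 2 with A */

    snapshot_take(snap_a, 3);
    snapshot_take(snap_b, 2);
    snapshot_delete(snap_a, 3);   /* clusters 0 and 1 are freed, cluster 2 is kept */
    return 0;
}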
 
Thanks,
Junyan








Re: [Qemu-devel] Some question about savem/qcow2 incremental snapshot

2018-01-10 Thread Stefan Hajnoczi
On Wed, Jan 10, 2018 at 8:15 PM, Dr. David Alan Gilbert
 wrote:
> * Stefan Hajnoczi (stefa...@gmail.com) wrote:
>> On Tue, Jan 9, 2018 at 7:55 PM, Dr. David Alan Gilbert
>>  wrote:
>> >> Certain guest operations like rebooting or zeroing memory will defeat
>> >> the incremental guest RAM snapshot feature.  It's worth thinking about
>> >> these cases to make sure this feature would be worth it in real use
>> >> cases.
>> >
>> > But those probably wouldn't upset an NVDimm?
>>
>> If the guest dirties all RAM then the incremental snapshot feature
>> degrades to a full snapshot.  I'm asking if there are common
>> operations where that happens.
>>
>> I seem to remember Windows guests zero all pages on cold boot.  Maybe
>> that's not the case anymore.
>>
>> Worth checking before embarking on this feature because it could be a
>> waste of effort if it turns out real-world guests dirty all memory in
>> common cases.
>
> Right, but I'm hoping that there's some magic somewhere where an NVDimm 
> doesn't
> get zero'd because of a cold boot since that would seem to make it
> volatile.

This feature isn't specific to NVDIMM though.  It would be equally
useful for regular RAM.

Stefan



Re: [Qemu-devel] Some question about savem/qcow2 incremental snapshot

2018-01-10 Thread Dr. David Alan Gilbert
* Stefan Hajnoczi (stefa...@gmail.com) wrote:
> On Tue, Jan 9, 2018 at 7:55 PM, Dr. David Alan Gilbert
>  wrote:
> >> Certain guest operations like rebooting or zeroing memory will defeat
> >> the incremental guest RAM snapshot feature.  It's worth thinking about
> >> these cases to make sure this feature would be worth it in real use
> >> cases.
> >
> > But those probably wouldn't upset an NVDimm?
> 
> If the guest dirties all RAM then the incremental snapshot feature
> degrades to a full snapshot.  I'm asking if there are common
> operations where that happens.
> 
> I seem to remember Windows guests zero all pages on cold boot.  Maybe
> that's not the case anymore.
> 
> Worth checking before embarking on this feature because it could be a
> waste of effort if it turns out real-world guests dirty all memory in
> common cases.

Right, but I'm hoping that there's some magic somewhere where an NVDimm doesn't
get zero'd because of a cold boot since that would seem to make it
volatile.

Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] Some question about savem/qcow2 incremental snapshot

2018-01-10 Thread Stefan Hajnoczi
On Tue, Jan 9, 2018 at 7:55 PM, Dr. David Alan Gilbert
 wrote:
>> Certain guest operations like rebooting or zeroing memory will defeat
>> the incremental guest RAM snapshot feature.  It's worth thinking about
>> these cases to make sure this feature would be worth it in real use
>> cases.
>
> But those probably wouldn't upset an NVDimm?

If the guest dirties all RAM then the incremental snapshot feature
degrades to a full snapshot.  I'm asking if there are common
operations where that happens.

I seem to remember Windows guests zero all pages on cold boot.  Maybe
that's not the case anymore.

Worth checking before embarking on this feature because it could be a
waste of effort if it turns out real-world guests dirty all memory in
common cases.



Re: [Qemu-devel] Some question about savem/qcow2 incremental snapshot

2018-01-09 Thread Dr. David Alan Gilbert
* Stefan Hajnoczi (stefa...@gmail.com) wrote:
> On Mon, Dec 25, 2017 at 07:54:00AM +, He, Junyan wrote:
> > I am now focusing on snapshot optimization for Intel NVDimm kind memory. 
> > Different from the normal memory, the NVDimm may be 128G, 256G or even more 
> > for just one guest, and its speed is slower than the normal memory. So 
> > sometimes it may take several minutes to complete just one snapshot saving. 
> > Even with compression enabled, the snapshot point may consume more than 30G 
> > disk space. We decide to add incremental kind snapshot saving to resolve 
> > this. Just store difference between snapshot points to save time and disk 
> > space.
> > But the current snapshot/save_vm framework seems not to support this.
> > We need to add snapshot dependency and extra operations when we LOAD and 
> > DELETE the snapshot point.
> > Is that possible to modify the savevm framework and add some incremental 
> > snapshot support to QCOW2 format?
> 
> It sounds like you'd like to save incremental guest RAM snapshots based
> on the contents of earlier snapshots.  QEMU has no way of doing this at
> the moment.
> 
> In order to do this QEMU needs to keep dirty memory logging enabled at
> all times so it knows which pages have been written since the last
> snapshot.  This will affect guest performance.

Keeping the dirty logging going is something that COLO does, to send
incremental snapshots to the secondary.   As you say, it does slow
things down and you have to be able to keep the state between the
migration runs.

> Certain guest operations like rebooting or zeroing memory will defeat
> the incremental guest RAM snapshot feature.  It's worth thinking about
> these cases to make sure this feature would be worth it in real use
> cases.

But those probably wouldn't upset an NVDimm?

> What you're proposing isn't specific to NVDIMM.  Regular RAM use cases
> also benefit from incremental snapshots because less disk space is
> required for the savevm data.
> 
> Some questions about your use case:
> 
> 1. Does the guest have -device nvdimm?
> 
> 2. Is the qcow2 file on NVDIMM?
> 
> Stefan

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] Some question about savem/qcow2 incremental snapshot

2018-01-09 Thread Stefan Hajnoczi
On Mon, Dec 25, 2017 at 07:54:00AM +, He, Junyan wrote:
> I am now focusing on snapshot optimization for Intel NVDIMM-backed memory.
> Unlike normal memory, an NVDIMM may be 128G, 256G or even more for just one
> guest, and it is slower than normal memory, so completing a single snapshot
> save can sometimes take several minutes. Even with compression enabled, one
> snapshot point may consume more than 30G of disk space. We decided to add
> incremental snapshot saving to address this: store only the differences
> between snapshot points, to save both time and disk space.
> But the current snapshot/savevm framework does not seem to support this.
> We would need to add snapshot dependencies and extra operations when we LOAD
> and DELETE a snapshot point.
> Is it possible to modify the savevm framework and add incremental snapshot
> support to the QCOW2 format?

It sounds like you'd like to save incremental guest RAM snapshots based
on the contents of earlier snapshots.  QEMU has no way of doing this at
the moment.

In order to do this QEMU needs to keep dirty memory logging enabled at
all times so it knows which pages have been written since the last
snapshot.  This will affect guest performance.
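
To make that concrete, here is a toy sketch of dirty-logging-driven
incremental saving (my own illustration, nothing from QEMU's migration code;
page size and on-disk layout invented): only pages written since the previous
snapshot are saved, and the bitmap is cleared afterwards, so a guest that
rewrites all of its RAM degrades this back to a full save.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096
#define NUM_PAGES 16

static uint8_t ram[NUM_PAGES][PAGE_SIZE];
static uint8_t dirty[NUM_PAGES];            /* maintained by (simulated) dirty logging */

static void guest_write(int page, int off, uint8_t val)
{
    ram[page][off] = val;
    dirty[page] = 1;                        /* the write is recorded in the bitmap */
}

static void save_incremental(FILE *out)
{
    for (int p = 0; p < NUM_PAGES; p++) {
        if (!dirty[p]) {
            continue;                       /* unchanged since the last snapshot */
        }
        fwrite(&p, sizeof(p), 1, out);      /* record the page number ... */
        fwrite(ram[p], PAGE_SIZE, 1, out);  /* ... and its current contents */
        dirty[p] = 0;                       /* the next snapshot starts clean */
    }
}

int main(void)
{
    FILE *out = fopen("incremental.bin", "wb");

    guest_write(3, 0, 0xab);
    guest_write(7, 42, 0xcd);
    if (out) {
        save_incremental(out);              /* writes only pages 3 and 7 */
        fclose(out);
    }
    return 0;
}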

Certain guest operations like rebooting or zeroing memory will defeat
the incremental guest RAM snapshot feature.  It's worth thinking about
these cases to make sure this feature would be worth it in real use
cases.

What you're proposing isn't specific to NVDIMM.  Regular RAM use cases
also benefit from incremental snapshots because less disk space is
required for the savevm data.

Some questions about your use case:

1. Does the guest have -device nvdimm?

2. Is the qcow2 file on NVDIMM?

Stefan




Re: [Qemu-devel] [SPARC] question on LEON IRQMP interrupt controller.

2018-01-08 Thread Jean-Christophe DUBOIS

On 08/01/2018 at 20:56, Mark Cave-Ayland wrote:
Thanks for the patch! I'm afraid I don't really have any experience 
with LEON as my focus is sun4m/sun4u, however I'm happy to take 
patches Acked/Reviewed by Fabien as the current LEON maintainer


I am waiting for Fabien's feedback after my experiment with tsim.


if they don't cause any regressions in my own tests.


Of course.

JC




ATB,

Mark. 






Re: [Qemu-devel] [SPARC] question on LEON IRQMP interrupt controller.

2018-01-08 Thread Mark Cave-Ayland

On 02/01/18 11:13, Jean-Christophe DUBOIS wrote:


Hi Mark, Artyom,

I am wondering if the IRQMP code in hw/intc/grlib_irqmp.c is correct 
when it comes to acknowledging interrupts.


With the current code an interrupt can be lowered/acked only by an "ack" 
from the processor, which means that the trap handler related to this 
external interrupt needs to run for the ack to happen.


In particular this means that the interrupt cannot be acked only by 
software. Even if the software clears the "pending" interrupts (by 
writing to the CLEAR_OFFSET register before the interrupt handler is 
run), this does not clear the interrupt to the processor (which is kept 
asserted until the handler is run and the interrupt acked by the 
processor). Do you know if this is indeed the intended behavior? (I 
understand that for most operating systems the interrupt handler will 
eventually be run, so this makes no difference.)


I would expect that clearing an interrupt through software (by writing to 
the CLEAR_OFFSET register) would have the same effect as the processor 
acknowledgment (and could avoid running the interrupt handler if things 
have already been taken care of by software).


Unfortunately the documentation I got (on the web) on the IRQMP is not 
very clear on the topic.


Anyway you can find below the patch I'd like to provide for IRQMP.

Thanks


Thanks for the patch! I'm afraid I don't really have any experience with 
LEON as my focus is sun4m/sun4u, however I'm happy to take patches 
Acked/Reviewed by Fabien as the current LEON maintainer if they don't 
cause any regressions in my own tests.



ATB,

Mark.



Re: [Qemu-devel] [SPARC] question on LEON IRQMP interrupt controller.

2018-01-06 Thread Jean-Christophe DUBOIS

Hi,

So after trying my code on tsim, I can confirm that the software is 
indeed able to clear/ack the interrupt without requiring the ack from 
the processor.


Things are a bit strange with tsim, as the simulator doesn't seem to 
respect time delays when the processor is in sleep/idle mode and jumps 
straight to the next timer expiration/interrupt (incrementing the 
required clock cycles in the process).


And since the evaluation version of tsim stops the program after 2^32 clock 
cycles, the program could not run for more than 86 seconds (with the LEON 
clock at 50 MHz).


But still, the interrupts are acknowledged by software only, without 
requiring the trap handler to run or the processor to ack the interrupt 
in hardware.


So I believe that my proposed modification is correct.
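
(The patch itself is not reproduced in this archive.  The following is only a
sketch, with invented names, of the behaviour described above: both the CPU
acknowledge path and a guest write to the clear register drop the pending bit,
after which the line towards the CPU is recomputed from pending & mask.)

#include <stdint.h>
#include <stdio.h>

struct irqmp {
    uint32_t pending;
    uint32_t mask;
    int cpu_irq_line;                 /* 1 = asserted towards the CPU */
};

static void irqmp_update(struct irqmp *s)
{
    s->cpu_irq_line = (s->pending & s->mask) != 0;
}

/* CPU acknowledge: the trap handler for 'irq' has been entered */
static void irqmp_cpu_ack(struct irqmp *s, int irq)
{
    s->pending &= ~(1u << irq);
    irqmp_update(s);
}

/* Guest write to the clear register: same effect, no trap handler needed */
static void irqmp_clear_write(struct irqmp *s, uint32_t val)
{
    s->pending &= ~val;
    irqmp_update(s);
}

int main(void)
{
    struct irqmp s = { .pending = 1u << 5, .mask = ~0u, .cpu_irq_line = 0 };

    irqmp_update(&s);
    printf("line before software clear: %d\n", s.cpu_irq_line);   /* 1 */
    irqmp_clear_write(&s, 1u << 5);                                /* ack by software only */
    printf("line after software clear:  %d\n", s.cpu_irq_line);   /* 0 */
    (void)irqmp_cpu_ack;                                           /* the other ack path */
    return 0;
}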

On a related note, tsim uses different interrupt numbers than QEMU for the 
timer and other devices. So what is the reference LEON3 platform that QEMU 
is emulating with leon3_generic?


JC

On 03/01/2018 at 10:23, j...@tribudubois.net wrote:

On 2018-01-02 at 19:58, Fabien Chouteau wrote:

Hello Jean-Christophe,

I'm the original author of this patch and I add in copy my colleague
Frederic.

On 02/01/2018 12:13, Jean-Christophe DUBOIS wrote:

I am wondering if the IRQMP code in hw/intc/grlib_irqmp.c is correct
when it comes to acknowledging interrupts.

With the actual code an interrupt can be lowered/acked only by an
"ack" from the processor which means that the trap handler related to
this external interrupt needs to be run for the ack to happen.

In particular this means that the interrupt cannot be acked only by
software. Even if the software clears the "pending" interrupts (by
writing to the CLEAR_OFFSET register before the interrupt handler is
run) this does not clear the interrupt to the processor (which is kept
asserted until the handler is run and the interrupt acked by the
processor). Do you know if this is indeed the intended behavior (I
understand that for most operating system the interrupt handler will
be run at last and this does not make a difference)?

I would expect that clearing interrupt through software (by writing to
the CLEAR_OFFSET register) would have the same effect as the processor
acknowledgment (and could avoid to run the interrupt handler if things
have already been taken care of by software).

Unfortunately the documentation I got (on the web) on the IRQMP is not
very clear on the topic.



I don't remember all the details of this CPU on top of my head, I worked
on this years ago.

If you have access to a real board the best would be to compare the
behavior of the CPU on it.


Unfortunately I don't have a real board (yet).


There's also a cycle accurate simulator of
Leon3, you can download an evaluation version here:
http://www.gaisler.com/index.php/downloads/simulators


OK, I will try the tsim simulator from Gaisler as a reference.




Anyway you can find below the patch I'd like to provide for IRQMP.



Can you explain the reason for this change? Why can't you use the
current interrupt handling?


I am working on a cooperative multitasking kernel (with no 
preemption). So the kernel does not handle interrupt-related traps 
(actually the kernel does not handle the interrupt controller at all). All 
interrupts are masked at all times, for both applications and the kernel, 
so no interrupt trap handler is ever going to trigger (except for IRQ 15, 
which is not maskable).


When the tasks have nothing to do, the kernel goes to sleep using ASR19 
on LEON. The system is then awakened by the next interrupt, and the kernel 
schedules the task that handles the interrupt controller.


On LEON, I can go to sleep until the first interrupt. Once the first 
interrupt has been raised, the LEON will never be able to get to 
sleep/idle again through ASR19 (it exits immediately), even if the 
interrupt controller handling task clears the interrupt (by writing to 
the CLEAR_OFFSET register). And this is because, in the current QEMU 
implementation, the interrupt can only be acknowledged in the 
interrupt controller by the CPU, through the triggering of the related 
trap handler.


So I am wondering if this is indeed the expected behavior on real 
hardware (interrupts can only be acked by the processor and not by 
software).


Note: On a related subject, I am wondering whether the system (put into 
idle/sleep through ASR19) would wake up on an interrupt if interrupts 
are all masked through PSR (PIL) (it does wake up in QEMU for now). I 
will also test this on tsim before trying it on real hardware someday.




Regards,









Re: [Qemu-devel] [SPARC] question on LEON IRQMP interrupt controller.

2018-01-03 Thread jcd

On 2018-01-02 at 19:58, Fabien Chouteau wrote:

Hello Jean-Christophe,

I'm the original author of this patch and I add in copy my colleague
Frederic.

On 02/01/2018 12:13, Jean-Christophe DUBOIS wrote:

I am wondering if the IRQMP code in hw/intc/grlib_irqmp.c is correct
when it comes to acknowledging interrupts.

With the actual code an interrupt can be lowered/acked only by an
"ack" from the processor which means that the trap handler related to
this external interrupt needs to be run for the ack to happen.

In particular this means that the interrupt cannot be acked only by
software. Even if the software clears the "pending" interrupts (by
writing to the CLEAR_OFFSET register before the interrupt handler is
run) this does not clear the interrupt to the processor (which is kept
asserted until the handler is run and the interrupt acked by the
processor). Do you know if this is indeed the intended behavior (I
understand that for most operating system the interrupt handler will
be run at last and this does not make a difference)?

I would expect that clearing interrupt through software (by writing to
the CLEAR_OFFSET register) would have the same effect as the processor
acknowledgment (and could avoid to run the interrupt handler if things
have already been taken care of by software).

Unfortunately the documentation I got (on the web) on the IRQMP is not
very clear on the topic.



I don't remember all the details of this CPU on top of my head, I 
worked

on this years ago.

If you have access to a real board the best would be to compare the
behavior of the CPU on it.


Unfortunately I don't have a real board (yet).


There's also a cycle accurate simulator of
Leon3, you can download an evaluation version here:
http://www.gaisler.com/index.php/downloads/simulators


OK, I will try the tsim simulator from Gaisler as a reference.




Anyway you can find below the patch I'd like to provide for IRQMP.



Can you explain the reason for this change? Why can't you use the
current interrupt handling?


I am working on a cooperative multitasking kernel (with no preemption). 
So the kernel does not handle interrupt-related traps (actually the 
kernel does not handle the interrupt controller at all). All interrupts are 
masked at all times, for both applications and the kernel, so no interrupt 
trap handler is ever going to trigger (except for IRQ 15, which is not 
maskable).


When the tasks have nothing to do, the kernel goes to sleep using ASR19 
on LEON. The system is then awakened by the next interrupt, and the kernel 
schedules the task that handles the interrupt controller.


On LEON, I can go to sleep until the first interrupt. Once the first 
interrupt has been raised, the LEON will never be able to get to 
sleep/idle again through ASR19 (it exits immediately), even if the 
interrupt controller handling task clears the interrupt (by writing to the 
CLEAR_OFFSET register). And this is because, in the current QEMU 
implementation, the interrupt can only be acknowledged in the interrupt 
controller by the CPU, through the triggering of the related trap 
handler.


So I am wondering if this is indeed the expected behavior on real 
hardware (interrupts can only be acked by the processor and not by 
software).


Note: On a related subject, I am wondering whether the system (put into 
idle/sleep through ASR19) would wake up on an interrupt if interrupts 
are all masked through PSR (PIL) (it does wake up in QEMU for now). I 
will also test this on tsim before trying it on real hardware someday.




Regards,





Re: [Qemu-devel] [SPARC] question on LEON IRQMP interrupt controller.

2018-01-02 Thread Fabien Chouteau
Hello Jean-Christophe,

I'm the original author of this patch, and I'm adding my colleague
Frederic in copy.

On 02/01/2018 12:13, Jean-Christophe DUBOIS wrote:
> I am wondering if the IRQMP code in hw/intc/grlib_irqmp.c is correct
> when it comes to acknowledging interrupts.
>
> With the actual code an interrupt can be lowered/acked only by an
> "ack" from the processor which means that the trap handler related to
> this external interrupt needs to be run for the ack to happen.
>
> In particular this means that the interrupt cannot be acked only by
> software. Even if the software clears the "pending" interrupts (by
> writing to the CLEAR_OFFSET register before the interrupt handler is
> run) this does not clear the interrupt to the processor (which is kept
> asserted until the handler is run and the interrupt acked by the
> processor). Do you know if this is indeed the intended behavior (I
> understand that for most operating system the interrupt handler will
> be run at last and this does not make a difference)?
>
> I would expect that clearing interrupt through software (by writing to
> the CLEAR_OFFSET register) would have the same effect as the processor
> acknowledgment (and could avoid to run the interrupt handler if things
> have already been taken care of by software).
>
> Unfortunately the documentation I got (on the web) on the IRQMP is not
> very clear on the topic.
>

I don't remember all the details of this CPU off the top of my head; I worked
on this years ago.

If you have access to a real board the best would be to compare the
behavior of the CPU on it. There's also a cycle accurate simulator of
Leon3, you can download an evaluation version here:
http://www.gaisler.com/index.php/downloads/simulators


> Anyway you can find below the patch I'd like to provide for IRQMP.
>

Can you explain the reason for this change? Why can't you use the
current interrupt handling?

Regards,




Re: [Qemu-devel] coroutine question, for NBD debugging

2017-11-04 Thread Paolo Bonzini
On 03/11/2017 21:03, Eric Blake wrote:
> In include/qemu/coroutine.h, we have:
> 
> /**
>  * Yield the coroutine for a given duration
>  *
>  * Behaves similarly to co_sleep_ns(), but the sleeping coroutine will be
>  * resumed when using aio_poll().
>  */
> void coroutine_fn co_aio_sleep_ns(AioContext *ctx, QEMUClockType type,
> 
> but there is no co_sleep_ns() anywhere.  What should the documentation
> actually state?

The old doc comment for co_sleep_ns was:

/**
 * Yield the coroutine for a given duration
 *
 * Note this function uses timers and hence only works when a main loop is in
 * use.  See main-loop.h and do not use from qemu-tool programs.
 */

So probably:

This function uses timers and hence needs to know the event loop
(#AioContext) to place the timer on.  In any case, co_aio_sleep_ns()
does not affect the #AioContext where the current coroutine is running,
as the coroutine will restart on the same #AioContext that it is
running on.
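
For completeness, a minimal usage sketch of the function being documented
(the coroutine and the way the AioContext is obtained are invented for the
example; only the co_aio_sleep_ns() call itself reflects the interface above):

#include "qemu/osdep.h"
#include "qemu/coroutine.h"
#include "qemu/timer.h"

static void coroutine_fn my_delay_co(void *opaque)
{
    AioContext *ctx = opaque;

    /* Yields this coroutine; a timer armed on 'ctx' re-enters it one second
     * later, and it keeps running on the same AioContext afterwards. */
    co_aio_sleep_ns(ctx, QEMU_CLOCK_REALTIME, 1 * NANOSECONDS_PER_SECOND);
}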

Thanks,

Paolo

> 
> Meanwhile, I'm trying to figure out if there is an easy way to hack
> qemu-nbd to send two structured replies interleaved (as in A.1, B.1,
> A.2, B.2) to ensure the NBD client coroutines properly handle
> interleaved responses, but adding a plain sleep() between A.1 and A.2
> does not work, and I'm not sure whether co_aio_sleep_ns() would work for
> the hack instead.
> 
> The hack I'm currently playing with splits reads and structured errors
> into two parts, attempting to use sleeps to force the server to send
> replies out of order:
> 
> diff --git c/nbd/server.c w/nbd/server.c
> index bcf0cdb47c..4b642127bb 100644
> --- c/nbd/server.c
> +++ w/nbd/server.c
> @@ -1285,13 +1285,24 @@ static int coroutine_fn nbd_co_send_structured_read(NBDClient *client,
>          {.iov_base = &chunk, .iov_len = sizeof(chunk)},
>          {.iov_base = data, .iov_len = size}
>      };
> +    int ret;
> 
>      trace_nbd_co_send_structured_read(handle, offset, data, size);
> -    set_be_chunk(&chunk.h, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_OFFSET_DATA,
> +    set_be_chunk(&chunk.h, 0, NBD_REPLY_TYPE_OFFSET_DATA,
>                   handle, sizeof(chunk) - sizeof(chunk.h) + size);
>      stq_be_p(&chunk.offset, offset);
> 
> -    return nbd_co_send_iov(client, iov, 2, errp);
> +    ret = nbd_co_send_iov(client, iov, 2, errp);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    sleep(2);
> +
> +    set_be_chunk(&chunk.h, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_NONE,
> +                 handle, 0);
> +    iov[0].iov_len = sizeof(chunk.h);
> +    return nbd_co_send_iov(client, iov, 1, errp);
>  }
> 
>  static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
> @@ -1306,16 +1317,27 @@ static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
>          {.iov_base = &chunk, .iov_len = sizeof(chunk)},
>          {.iov_base = (char *)msg, .iov_len = msg ? strlen(msg) : 0},
>      };
> +    int ret;
> 
>      assert(nbd_err);
>      trace_nbd_co_send_structured_error(handle, nbd_err,
>                                         nbd_err_lookup(nbd_err), msg ? msg : "");
> -    set_be_chunk(&chunk.h, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_ERROR, handle,
> +    set_be_chunk(&chunk.h, 0, NBD_REPLY_TYPE_ERROR, handle,
>                   sizeof(chunk) - sizeof(chunk.h) + iov[1].iov_len);
>      stl_be_p(&chunk.error, nbd_err);
>      stw_be_p(&chunk.message_length, iov[1].iov_len);
> 
> -    return nbd_co_send_iov(client, iov, 1 + !!iov[1].iov_len, errp);
> +    ret = nbd_co_send_iov(client, iov, 1 + !!iov[1].iov_len, errp);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    sleep(1);
> +
> +    set_be_chunk(&chunk.h, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_NONE,
> +                 handle, 0);
> +    iov[0].iov_len = sizeof(chunk.h);
> +    return nbd_co_send_iov(client, iov, 1, errp);
>  }
> 
>  /* nbd_co_receive_request
> 
> 
> coupled with a client that does:
> 
> $ ./qemu-io -f raw nbd://localhost:10809/foo --trace='nbd_*' -c
> 'aio_write -P 3 1 1' -c 'aio_read -P 1 1 1'
> 
> but the trace shows that the client does not receive B.1 until after it
> has blocked waiting for A.2, which is not what I was hoping to see.
> 






Re: [Qemu-devel] coroutine question, for NBD debugging

2017-11-03 Thread Eric Blake
On 11/03/2017 03:03 PM, Eric Blake wrote:
> In include/qemu/coroutine.h, we have:
> 
> /**
>  * Yield the coroutine for a given duration
>  *
>  * Behaves similarly to co_sleep_ns(), but the sleeping coroutine will be
>  * resumed when using aio_poll().
>  */
> void coroutine_fn co_aio_sleep_ns(AioContext *ctx, QEMUClockType type,
> 
> but there is no co_sleep_ns() anywhere.  What should the documentation
> actually state?

This part still needs fixing; but partially answering myself:

> 
> Meanwhile, I'm trying to figure out if there is an easy way to hack
> qemu-nbd to send two structured replies interleaved (as in A.1, B.1,
> A.2, B.2) to ensure the NBD client coroutines properly handle
> interleaved responses, but adding a plain sleep() between A.1 and A.2
> does not work, and I'm not sure whether co_aio_sleep_ns() would work for
> the hack instead.
> 
> The hack I'm currently playing with splits reads and structured errors
> into two parts, attempting to use sleeps to force the server to send
> replies out of order:
> 

> +sleep(2);

Using this instead of sleep():

+co_aio_sleep_ns(client->exp->ctx, QEMU_CLOCK_REALTIME, 20);

> coupled with a client that does:
> 
> $ ./qemu-io -f raw nbd://localhost:10809/foo --trace='nbd_*' -c
> 'aio_write -P 3 1 1' -c 'aio_read -P 1 1 1'
> 

produced the trace I was looking for:

2837@1509745520.815687:nbd_send_request Sending request to server: {
.from = 1, .len = 1, .handle = 93924431318768, .flags = 0x1, .type = 1
(write) }
2837@1509745520.815764:nbd_send_request Sending request to server: {
.from = 1, .len = 1, .handle = 93924431318769, .flags = 0x0, .type = 0
(read) }
2837@1509745520.815913:nbd_receive_structured_reply_chunk Got structured
reply chunk: { flags = 0x0, type = 32769, handle = 93924431318768,
length = 25 }
2837@1509745520.856401:nbd_receive_structured_reply_chunk Got structured
reply chunk: { flags = 0x0, type = 1, handle = 93924431318769, length = 9 }
2837@1509745521.817279:nbd_receive_structured_reply_chunk Got structured
reply chunk: { flags = 0x1, type = 0, handle = 93924431318768, length = 0 }
aio_write failed: Operation not permitted
2837@1509745522.817474:nbd_receive_structured_reply_chunk Got structured
reply chunk: { flags = 0x1, type = 0, handle = 93924431318769, length = 0 }
read 1/1 bytes at offset 1
1 bytes, 1 ops; 0:00:02.00 (0.499560 bytes/sec and 0.4996 ops/sec)

which means I'm fairly confident that our client implementation is doing
the right thing with regards to complex out-of-order/interleaved
structured replies!  Always a nice thing to demonstrate.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode

2017-10-23 Thread Jiri Denemark
On Fri, Oct 20, 2017 at 15:04:57 +0200, David Hildenbrand wrote:
> On 20.10.2017 14:50, Jiri Denemark wrote:
> > The thing is libvirt calls query-cpu-model-expansion to check what the
> > host CPU is. This 'host-model' CPU is replaced with the probed CPU model
> > when a domain starts. The problem described by Marc is that the probed
> > CPU model cannot be used universally with all machine types. So starting
> > a domain on such host with machine type s390-virtio-ccw-2.10 works, but
> > a domain with machine type s390-virtio-ccw-2.9 fails to start with the
> > same probed CPU model.
> > 
> 
> My assumption would be that the CPU model is copied into the XML when
> the domain first starts. This is what the documentation describes.

The CPU model is copied into the XML each time the domain starts.

> So when upgrading QEMU, the CPU model in the XML is still the same
> (z13), even though something different is now reported in the host info
> after upgrading QEMU (z14).
> 
> In this case it would continue to work.
> 
> The problem is that the CPU model copied into the XML doesn't
> remain there, right? It is suddenly replaced with a z14 host model.

Even preserving the actual CPU model in the XML from the first start
wouldn't solve the issue. You can always create a new domain with
s390-virtio-ccw-2.9 machine type even on the upgraded host.

Jirka



Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode

2017-10-20 Thread David Hildenbrand

> 
> I intend to put some brain-power in this too. Probably next week.
> 
> My general impression is, that I have a at places different understanding
> of how things should work compared to David. Especially when it comes
> to this concept of persistent copying, and also an end-user-digestible
> definition of what host-model does. (I think this with copying capabilities
> from whatever xml which is subject to convoluted caching is a bit too
> hard to digest for an end user not involved in the development of qemu
> and libvirt).

When reading the doc I was assuming that it would be a persistent copy.
But it would only fix part of the issue.

In the end, it specifies quite clearly what is copied. And if that is
not runnable with the selected machine, bad luck. Therefore ...

> 
> I had a conversation with Boris a couple of hours ago, and it seems, things
> are pretty convoluted.
> 
> If I understand the train of thought here (David) it can be summarized like 
> this:
> For a certain QEMU binary no aspect of the cpu-models may depend on the
> machine type. In particular, compat properties and compat handling is
> not alowed to alter cpu-models (whether the available cpu models nor the
> capabilities of each of these).

... I always recommended avoiding such compatibility settings in the
machine. But before we had CPU models, it was really all we had. Now we
should let go of it.

We might be able to fix this one (gs) and take care of it in the future,
but ...

> 
> This we would have to make a part of the external interface. That way
> one could be sure that the 'cpu capabilities' are machine-type independent
> (that is, the same for all the machine types).

... the problem starts once nasty stuff like zPCI comes into place. If we
ever have (other?) machine-dependent stuff that toggles CPU features, we can
only try to limit the damage.

> 
> Or did I get this completely wrong? (Based on the answer branches my
> think).

Not at all.

> 
> Halil
> 


-- 

Thanks,

David



Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode

2017-10-20 Thread Halil Pasic


On 10/20/2017 04:12 PM, Christian Borntraeger wrote:
> 
> 
> On 10/20/2017 04:06 PM, David Hildenbrand wrote:
>> On 20.10.2017 16:02, Christian Borntraeger wrote:
>>>
>>>
>>> On 10/20/2017 03:51 PM, David Hildenbrand wrote:
>>> [...]
> The problem goes much further.
> A fresh guest with
>
> 
>  hvm
>
>
>
> does not start. No migration from an older system is necessary.
>

 Yes, as stated in the documentation "copying host CPU definition from
 capabilities XML" this can not work. And it works just as documented.
 Not saying this is a nice thing :)

 I think we should try to fix gs_allowed (if possible) and avoid
 something like that in the future. This would avoid other complexity
 involved when suddenly having X host models.
>>>
>>> Maybe this one is really a proper fix. It will allow the guest to start
>>> and on migration the cpu model will complain if the target cannot provide 
>>> gs.
>>> Similar things can happen if - for example - the host kernel lacks some 
>>> features.
>>
>> Right, just what I had in mind.
>>
>>>
>>>
>>> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
>>> index 77169bb..97a08fa 100644
>>> --- a/hw/s390x/s390-virtio-ccw.c
>>> +++ b/hw/s390x/s390-virtio-ccw.c
>>> @@ -430,7 +430,6 @@ static void ccw_machine_class_init(ObjectClass *oc, 
>>> void *data)
>>>  s390mc->ri_allowed = true;
>>>  s390mc->cpu_model_allowed = true;
>>>  s390mc->css_migration_enabled = true;
>>> -s390mc->gs_allowed = true;
>>>  mc->init = ccw_init;
>>>  mc->reset = s390_machine_reset;
>>>  mc->hot_add_cpu = s390_hot_add_cpu;
>>> @@ -509,12 +508,6 @@ bool cpu_model_allowed(void)
>>>  return get_machine_class()->cpu_model_allowed;
>>>  }
>>>  
>>> -bool gs_allowed(void)
>>> -{
>>> -/* for "none" machine this results in true */
>>> -return get_machine_class()->gs_allowed;
>>> -}
>>> -
>>>  static char *machine_get_loadparm(Object *obj, Error **errp)
>>>  {
>>>  S390CcwMachineState *ms = S390_CCW_MACHINE(obj);
>>> @@ -757,7 +750,6 @@ static void ccw_machine_2_9_class_options(MachineClass 
>>> *mc)
>>>  {
>>>  S390CcwMachineClass *s390mc = S390_MACHINE_CLASS(mc);
>>>  
>>> -s390mc->gs_allowed = false;
>>>  ccw_machine_2_10_class_options(mc);
>>>  SET_MACHINE_COMPAT(mc, CCW_COMPAT_2_9);
>>>  s390mc->css_migration_enabled = false;
>>> diff --git a/include/hw/s390x/s390-virtio-ccw.h 
>>> b/include/hw/s390x/s390-virtio-ccw.h
>>> index a9a90c2..1de53b0 100644
>>> --- a/include/hw/s390x/s390-virtio-ccw.h
>>> +++ b/include/hw/s390x/s390-virtio-ccw.h
>>> @@ -40,7 +40,6 @@ typedef struct S390CcwMachineClass {
>>>  bool ri_allowed;
>>>  bool cpu_model_allowed;
>>>  bool css_migration_enabled;
>>> -bool gs_allowed;
>>>  } S390CcwMachineClass;
>>>  
>>>  /* runtime-instrumentation allowed by the machine */
>>> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
>>> index a0d5052..3f13fc2 100644
>>> --- a/target/s390x/kvm.c
>>> +++ b/target/s390x/kvm.c
>>> @@ -362,7 +362,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>>  cap_ri = 1;
>>>  }
>>>  }
>>> -if (gs_allowed()) {
>>> +if (cpu_model_allowed()) {
>>>  if (kvm_vm_enable_cap(s, KVM_CAP_S390_GS, 0) == 0) {
>>>  cap_gs = 1;
>>>  }
>>>
>>
>> And the last hunk makes sure we keep same handling for machines without
>> CPU model support and therefore no way to mask support. For all recent
>> machines, we expect CPU model checks to be in place.
>>
>> Will have to think about this a bit more. Will you send this as a proper
>> patch?
> 
> After thinking about it :-)
> 

I intend to put some brain-power in this too. Probably next week.

My general impression is that, in places, I have a different understanding
of how things should work compared to David. Especially when it comes
to this concept of persistent copying, and also to an end-user-digestible
definition of what host-model does. (I think this business of copying
capabilities from whatever XML, subject to convoluted caching, is a bit too
hard to digest for an end user not involved in the development of QEMU
and libvirt.)

I had a conversation with Boris a couple of hours ago, and it seems things
are pretty convoluted.

If I understand the train of thought here (David), it can be summarized like
this:
for a certain QEMU binary, no aspect of the CPU models may depend on the
machine type. In particular, compat properties and compat handling are
not allowed to alter CPU models (neither the available CPU models nor the
capabilities of each of these).

This we would have to make part of the external interface. That way
one could be sure that the 'cpu capabilities' are machine-type independent
(that is, the same for all the machine types).

Or did I get this completely wrong? (Based on the answer branches, I think.)

Halil




Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode

2017-10-20 Thread Christian Borntraeger


On 10/20/2017 04:06 PM, David Hildenbrand wrote:
> On 20.10.2017 16:02, Christian Borntraeger wrote:
>>
>>
>> On 10/20/2017 03:51 PM, David Hildenbrand wrote:
>> [...]
 The problem goes much further.
 A fresh guest with

 
  hvm



 does not start. No migration from an older system is necessary.

>>>
>>> Yes, as stated in the documentation "copying host CPU definition from
>>> capabilities XML" this can not work. And it works just as documented.
>>> Not saying this is a nice thing :)
>>>
>>> I think we should try to fix gs_allowed (if possible) and avoid
>>> something like that in the future. This would avoid other complexity
>>> involved when suddenly having X host models.
>>
>> Maybe this one is really a proper fix. It will allow the guest to start
>> and on migration the cpu model will complain if the target cannot provide gs.
>> Similar things can happen if - for example - the host kernel lacks some 
>> features.
> 
> Right, just what I had in mind.
> 
>>
>>
>> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
>> index 77169bb..97a08fa 100644
>> --- a/hw/s390x/s390-virtio-ccw.c
>> +++ b/hw/s390x/s390-virtio-ccw.c
>> @@ -430,7 +430,6 @@ static void ccw_machine_class_init(ObjectClass *oc, void 
>> *data)
>>  s390mc->ri_allowed = true;
>>  s390mc->cpu_model_allowed = true;
>>  s390mc->css_migration_enabled = true;
>> -s390mc->gs_allowed = true;
>>  mc->init = ccw_init;
>>  mc->reset = s390_machine_reset;
>>  mc->hot_add_cpu = s390_hot_add_cpu;
>> @@ -509,12 +508,6 @@ bool cpu_model_allowed(void)
>>  return get_machine_class()->cpu_model_allowed;
>>  }
>>  
>> -bool gs_allowed(void)
>> -{
>> -/* for "none" machine this results in true */
>> -return get_machine_class()->gs_allowed;
>> -}
>> -
>>  static char *machine_get_loadparm(Object *obj, Error **errp)
>>  {
>>  S390CcwMachineState *ms = S390_CCW_MACHINE(obj);
>> @@ -757,7 +750,6 @@ static void ccw_machine_2_9_class_options(MachineClass 
>> *mc)
>>  {
>>  S390CcwMachineClass *s390mc = S390_MACHINE_CLASS(mc);
>>  
>> -s390mc->gs_allowed = false;
>>  ccw_machine_2_10_class_options(mc);
>>  SET_MACHINE_COMPAT(mc, CCW_COMPAT_2_9);
>>  s390mc->css_migration_enabled = false;
>> diff --git a/include/hw/s390x/s390-virtio-ccw.h 
>> b/include/hw/s390x/s390-virtio-ccw.h
>> index a9a90c2..1de53b0 100644
>> --- a/include/hw/s390x/s390-virtio-ccw.h
>> +++ b/include/hw/s390x/s390-virtio-ccw.h
>> @@ -40,7 +40,6 @@ typedef struct S390CcwMachineClass {
>>  bool ri_allowed;
>>  bool cpu_model_allowed;
>>  bool css_migration_enabled;
>> -bool gs_allowed;
>>  } S390CcwMachineClass;
>>  
>>  /* runtime-instrumentation allowed by the machine */
>> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
>> index a0d5052..3f13fc2 100644
>> --- a/target/s390x/kvm.c
>> +++ b/target/s390x/kvm.c
>> @@ -362,7 +362,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>  cap_ri = 1;
>>  }
>>  }
>> -if (gs_allowed()) {
>> +if (cpu_model_allowed()) {
>>  if (kvm_vm_enable_cap(s, KVM_CAP_S390_GS, 0) == 0) {
>>  cap_gs = 1;
>>  }
>>
> 
> And the last hunk makes sure we keep same handling for machines without
> CPU model support and therefore no way to mask support. For all recent
> machines, we expect CPU model checks to be in place.
> 
> Will have to think about this a bit more. Will you send this as a proper
> patch?

After thinking about it :-)




Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode

2017-10-20 Thread David Hildenbrand
On 20.10.2017 16:02, Christian Borntraeger wrote:
> 
> 
> On 10/20/2017 03:51 PM, David Hildenbrand wrote:
> [...]
>>> The problem goes much further.
>>> A fresh guest with
>>>
>>> 
>>>  hvm
>>>
>>>
>>>
>>> does not start. No migration from an older system is necessary.
>>>
>>
>> Yes, as stated in the documentation "copying host CPU definition from
>> capabilities XML" this can not work. And it works just as documented.
>> Not saying this is a nice thing :)
>>
>> I think we should try to fix gs_allowed (if possible) and avoid
>> something like that in the future. This would avoid other complexity
>> involved when suddenly having X host models.
> 
> Maybe this one is really a proper fix. It will allow the guest to start
> and on migration the cpu model will complain if the target cannot provide gs.
> Similar things can happen if - for example - the host kernel lacks some 
> features.

Right, just what I had in mind.

> 
> 
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index 77169bb..97a08fa 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -430,7 +430,6 @@ static void ccw_machine_class_init(ObjectClass *oc, void 
> *data)
>  s390mc->ri_allowed = true;
>  s390mc->cpu_model_allowed = true;
>  s390mc->css_migration_enabled = true;
> -s390mc->gs_allowed = true;
>  mc->init = ccw_init;
>  mc->reset = s390_machine_reset;
>  mc->hot_add_cpu = s390_hot_add_cpu;
> @@ -509,12 +508,6 @@ bool cpu_model_allowed(void)
>  return get_machine_class()->cpu_model_allowed;
>  }
>  
> -bool gs_allowed(void)
> -{
> -/* for "none" machine this results in true */
> -return get_machine_class()->gs_allowed;
> -}
> -
>  static char *machine_get_loadparm(Object *obj, Error **errp)
>  {
>  S390CcwMachineState *ms = S390_CCW_MACHINE(obj);
> @@ -757,7 +750,6 @@ static void ccw_machine_2_9_class_options(MachineClass 
> *mc)
>  {
>  S390CcwMachineClass *s390mc = S390_MACHINE_CLASS(mc);
>  
> -s390mc->gs_allowed = false;
>  ccw_machine_2_10_class_options(mc);
>  SET_MACHINE_COMPAT(mc, CCW_COMPAT_2_9);
>  s390mc->css_migration_enabled = false;
> diff --git a/include/hw/s390x/s390-virtio-ccw.h 
> b/include/hw/s390x/s390-virtio-ccw.h
> index a9a90c2..1de53b0 100644
> --- a/include/hw/s390x/s390-virtio-ccw.h
> +++ b/include/hw/s390x/s390-virtio-ccw.h
> @@ -40,7 +40,6 @@ typedef struct S390CcwMachineClass {
>  bool ri_allowed;
>  bool cpu_model_allowed;
>  bool css_migration_enabled;
> -bool gs_allowed;
>  } S390CcwMachineClass;
>  
>  /* runtime-instrumentation allowed by the machine */
> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
> index a0d5052..3f13fc2 100644
> --- a/target/s390x/kvm.c
> +++ b/target/s390x/kvm.c
> @@ -362,7 +362,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>  cap_ri = 1;
>  }
>  }
> -if (gs_allowed()) {
> +if (cpu_model_allowed()) {
>  if (kvm_vm_enable_cap(s, KVM_CAP_S390_GS, 0) == 0) {
>  cap_gs = 1;
>  }
> 

And the last hunk makes sure we keep same handling for machines without
CPU model support and therefore no way to mask support. For all recent
machines, we expect CPU model checks to be in place.

Will have to think about this a bit more. Will you send this as a proper
patch?

-- 

Thanks,

David



Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode

2017-10-20 Thread Christian Borntraeger


On 10/20/2017 03:51 PM, David Hildenbrand wrote:
[...]
>> The problem goes much further.
>> A fresh guest with
>>
>> 
>>  hvm
>>
>>
>>
>> does not start. No migration from an older system is necessary.
>>
> 
> Yes, as stated in the documentation "copying host CPU definition from
> capabilities XML" this can not work. And it works just as documented.
> Not saying this is a nice thing :)
> 
> I think we should try to fix gs_allowed (if possible) and avoid
> something like that in the future. This would avoid other complexity
> involved when suddenly having X host models.

Maybe this one is really a proper fix. It will allow the guest to start
and on migration the cpu model will complain if the target cannot provide gs.
Similar things can happen if - for example - the host kernel lacks some 
features.


diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 77169bb..97a08fa 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -430,7 +430,6 @@ static void ccw_machine_class_init(ObjectClass *oc, void 
*data)
 s390mc->ri_allowed = true;
 s390mc->cpu_model_allowed = true;
 s390mc->css_migration_enabled = true;
-s390mc->gs_allowed = true;
 mc->init = ccw_init;
 mc->reset = s390_machine_reset;
 mc->hot_add_cpu = s390_hot_add_cpu;
@@ -509,12 +508,6 @@ bool cpu_model_allowed(void)
 return get_machine_class()->cpu_model_allowed;
 }
 
-bool gs_allowed(void)
-{
-/* for "none" machine this results in true */
-return get_machine_class()->gs_allowed;
-}
-
 static char *machine_get_loadparm(Object *obj, Error **errp)
 {
 S390CcwMachineState *ms = S390_CCW_MACHINE(obj);
@@ -757,7 +750,6 @@ static void ccw_machine_2_9_class_options(MachineClass *mc)
 {
 S390CcwMachineClass *s390mc = S390_MACHINE_CLASS(mc);
 
-s390mc->gs_allowed = false;
 ccw_machine_2_10_class_options(mc);
 SET_MACHINE_COMPAT(mc, CCW_COMPAT_2_9);
 s390mc->css_migration_enabled = false;
diff --git a/include/hw/s390x/s390-virtio-ccw.h 
b/include/hw/s390x/s390-virtio-ccw.h
index a9a90c2..1de53b0 100644
--- a/include/hw/s390x/s390-virtio-ccw.h
+++ b/include/hw/s390x/s390-virtio-ccw.h
@@ -40,7 +40,6 @@ typedef struct S390CcwMachineClass {
 bool ri_allowed;
 bool cpu_model_allowed;
 bool css_migration_enabled;
-bool gs_allowed;
 } S390CcwMachineClass;
 
 /* runtime-instrumentation allowed by the machine */
diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
index a0d5052..3f13fc2 100644
--- a/target/s390x/kvm.c
+++ b/target/s390x/kvm.c
@@ -362,7 +362,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 cap_ri = 1;
 }
 }
-if (gs_allowed()) {
+if (cpu_model_allowed()) {
 if (kvm_vm_enable_cap(s, KVM_CAP_S390_GS, 0) == 0) {
 cap_gs = 1;
 }




Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode

2017-10-20 Thread David Hildenbrand
On 20.10.2017 15:49, Christian Borntraeger wrote:
> 
> 
> On 10/20/2017 03:43 PM, David Hildenbrand wrote:
>> On 20.10.2017 15:36, Christian Borntraeger wrote:
>>>
>>>
>>> On 10/20/2017 03:16 PM, David Hildenbrand wrote:

> Hi all,
>
> we recently encountered the problem that the 'host-model' [1] has to be
> related to the machine type of a domain. We have following problem:
>
>Let's assume we've a z13 system with a QEMU 2.9 and we define a
>domain using the default s390-virtio-ccw machine together with the
>host-model CPU mode [1]. The definition will have the machine
>expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode
>in the domain definition. In a next step we upgrade to QEMU 2.10
>(first version to recognize z14). Everything is still fine, even
>though the machine runs in 2.9 compatibility mode. Finally we
>upgrade to a z14. As a consequence it is not possible to start the
>domain anymore as the machine type doesn't support our CPU host
>model (which is expanded at start time of the domain).

 Actually, what is the cause of that problem? I assume it is the gs
 feature (gs_allowed)?

 We should really avoid such things (..._allowed) for CPU model features
 in the future and clue all new such stuff to cpumodel_allowed.
>>>
>>> Yes, starting a guest with
>>>
>>> hvm
>>>   
>>>   
>>>
>>> results in
>>>
>>> qemu-system-s390x: Some features requested in the CPU model are not 
>>> available in the configuration: gs 
>>>
>>> Tying it to cpumodel_allowed would not help, migration-wise though.
>>> libvirt would still transform
>>>
>>>
>>> hvm
>>>   
>>>   
>>
>> My point was, that the host model would have to be copied and _remain_
>> there when s390-ccw-virtio was expanded to s390-ccw-virtio-2.9.
>>
>> So really replacing  by the model z13-base
>> This would at least fix this issue. Just like s390-ccw-virtio get's
>> replaced and remains thats way.
>>
>> But this might for sure have other problems.
> 
> The problem goes much further.
> A fresh guest with
> 
> 
>  hvm
>
>
> 
> does not start. No migration from an older system is necessary.
> 

Yes, as stated in the documentation "copying host CPU definition from
capabilities XML" this can not work. And it works just as documented.
Not saying this is a nice thing :)

I think we should try to fix gs_allowed (if possible) and avoid
something like that in the future. This would avoid other complexity
involved when suddenly having X host models.

-- 

Thanks,

David



Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode

2017-10-20 Thread Christian Borntraeger


On 10/20/2017 03:43 PM, David Hildenbrand wrote:
> On 20.10.2017 15:36, Christian Borntraeger wrote:
>>
>>
>> On 10/20/2017 03:16 PM, David Hildenbrand wrote:
>>>
 Hi all,

 we recently encountered the problem that the 'host-model' [1] has to be
 related to the machine type of a domain. We have following problem:

Let's assume we've a z13 system with a QEMU 2.9 and we define a
domain using the default s390-virtio-ccw machine together with the
host-model CPU mode [1]. The definition will have the machine
expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode
in the domain definition. In a next step we upgrade to QEMU 2.10
(first version to recognize z14). Everything is still fine, even
though the machine runs in 2.9 compatibility mode. Finally we
upgrade to a z14. As a consequence it is not possible to start the
domain anymore as the machine type doesn't support our CPU host
model (which is expanded at start time of the domain).
>>>
>>> Actually, what is the cause of that problem? I assume it is the gs
>>> feature (gs_allowed)?
>>>
>>> We should really avoid such things (..._allowed) for CPU model features
>>> in the future and clue all new such stuff to cpumodel_allowed.
>>
>> Yes, starting a guest with
>>
>> hvm
>>   
>>   
>>
>> results in
>>
>> qemu-system-s390x: Some features requested in the CPU model are not 
>> available in the configuration: gs 
>>
>> Tying it to cpumodel_allowed would not help, migration-wise though.
>> libvirt would still transform
>>
>>
>> hvm
>>   
>>   
> 
> My point was, that the host model would have to be copied and _remain_
> there when s390-ccw-virtio was expanded to s390-ccw-virtio-2.9.
> 
> So really replacing  by the model z13-base
> This would at least fix this issue. Just like s390-ccw-virtio get's
> replaced and remains thats way.
> 
> But this might for sure have other problems.

The problem goes much further.
A fresh guest with


 hvm
   
   

does not start. No migration from an older system is necessary.




Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode

2017-10-20 Thread David Hildenbrand
On 20.10.2017 15:36, Christian Borntraeger wrote:
> 
> 
> On 10/20/2017 03:16 PM, David Hildenbrand wrote:
>>
>>> Hi all,
>>>
>>> we recently encountered the problem that the 'host-model' [1] has to be
>>> related to the machine type of a domain. We have following problem:
>>>
>>>Let's assume we've a z13 system with a QEMU 2.9 and we define a
>>>domain using the default s390-virtio-ccw machine together with the
>>>host-model CPU mode [1]. The definition will have the machine
>>>expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode
>>>in the domain definition. In a next step we upgrade to QEMU 2.10
>>>(first version to recognize z14). Everything is still fine, even
>>>though the machine runs in 2.9 compatibility mode. Finally we
>>>upgrade to a z14. As a consequence it is not possible to start the
>>>domain anymore as the machine type doesn't support our CPU host
>>>model (which is expanded at start time of the domain).
>>
>> Actually, what is the cause of that problem? I assume it is the gs
>> feature (gs_allowed)?
>>
>> We should really avoid such things (..._allowed) for CPU model features
>> in the future and clue all new such stuff to cpumodel_allowed.
> 
> Yes, starting a guest with
>
> hvm
>   
>   
> 
> results in
> 
> qemu-system-s390x: Some features requested in the CPU model are not available 
> in the configuration: gs 
> 
> Tying it to cpumodel_allowed would not help, migration-wise though.
> libvirt would still transform
> 
>
> hvm
>   
>   

My point was, that the host model would have to be copied and _remain_
there when s390-ccw-virtio was expanded to s390-ccw-virtio-2.9.

So really replacing <cpu mode='host-model'/> by the model z13-base.
This would at least fix this issue. Just like s390-ccw-virtio gets
replaced and remains that way.

But this might of course have other problems.

> 
> 
> into 
> -cpu 
> z14-base,aen=on,cmmnt=on,aefsi=on,mepoch=on,msa8=on,msa7=on,msa6=on,msa5=on,msa4=on,
>  \
> msa3=on,msa2=on,msa1=on,sthyi=on,edat=on,ri=on,edat2=on,vx=on,ipter=on,vxeh=on,vxpd=on,
>  \
> esop=on,iep=on,cte=on,ais=on,gs=on,zpci=on,sea_esop2=on,te=on,cmm=on
>  ^
> because cpu model is certainly there. Now the guest would start but migration 
> would
> later fail because what we create now would never have been possible with 2.9.

Migration is a totally different story, as tooling has to make sure to
find a CPU model that is compatible over all host models. So
cpumodel_allowed would indeed work. (at least in my theory ;) )

> 
> If libvirt could get information from QEMU depending on the machine version, 
> this would
> make it work. But of course this has other issues.

I am not sure if that is the right thing to do.

The documentation states clearly that the host model is copied. If that
is not runnable, fix the setup.

> 
> Christian
> 


-- 

Thanks,

David



Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode

2017-10-20 Thread Christian Borntraeger


On 10/20/2017 03:16 PM, David Hildenbrand wrote:
> 
>> Hi all,
>>
>> we recently encountered the problem that the 'host-model' [1] has to be
>> related to the machine type of a domain. We have following problem:
>>
>>Let's assume we've a z13 system with a QEMU 2.9 and we define a
>>domain using the default s390-virtio-ccw machine together with the
>>host-model CPU mode [1]. The definition will have the machine
>>expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode
>>in the domain definition. In a next step we upgrade to QEMU 2.10
>>(first version to recognize z14). Everything is still fine, even
>>though the machine runs in 2.9 compatibility mode. Finally we
>>upgrade to a z14. As a consequence it is not possible to start the
>>domain anymore as the machine type doesn't support our CPU host
>>model (which is expanded at start time of the domain).
> 
> Actually, what is the cause of that problem? I assume it is the gs
> feature (gs_allowed)?
> 
> We should really avoid such things (..._allowed) for CPU model features
> in the future and clue all new such stuff to cpumodel_allowed.

Yes, starting a guest with
   
hvm
  
  

results in

qemu-system-s390x: Some features requested in the CPU model are not available 
in the configuration: gs 

Tying it to cpumodel_allowed would not help, migration-wise though.
libvirt would still transform

   
hvm
  
  


into 
-cpu 
z14-base,aen=on,cmmnt=on,aefsi=on,mepoch=on,msa8=on,msa7=on,msa6=on,msa5=on,msa4=on,
 \
msa3=on,msa2=on,msa1=on,sthyi=on,edat=on,ri=on,edat2=on,vx=on,ipter=on,vxeh=on,vxpd=on,
 \
esop=on,iep=on,cte=on,ais=on,gs=on,zpci=on,sea_esop2=on,te=on,cmm=on
 ^
because cpu model is certainly there. Now the guest would start but migration 
would
later fail because what we create now would never have been possible with 2.9.

If libvirt could get information from QEMU depending on the machine version, 
this would
make it work. But of course this has other issues.

Christian




Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode

2017-10-20 Thread David Hildenbrand

> Hi all,
> 
> we recently encountered the problem that the 'host-model' [1] has to be
> related to the machine type of a domain. We have following problem:
> 
>Let's assume we've a z13 system with a QEMU 2.9 and we define a
>domain using the default s390-virtio-ccw machine together with the
>host-model CPU mode [1]. The definition will have the machine
>expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode
>in the domain definition. In a next step we upgrade to QEMU 2.10
>(first version to recognize z14). Everything is still fine, even
>though the machine runs in 2.9 compatibility mode. Finally we
>upgrade to a z14. As a consequence it is not possible to start the
>domain anymore as the machine type doesn't support our CPU host
>model (which is expanded at start time of the domain).

Actually, what is the cause of that problem? I assume it is the gs
feature (gs_allowed)?

We should really avoid such things (..._allowed) for CPU model features
in the future and clue all new such stuff to cpumodel_allowed.

-- 

Thanks,

David



Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode

2017-10-20 Thread David Hildenbrand
On 20.10.2017 14:50, Jiri Denemark wrote:
> On Fri, Oct 20, 2017 at 13:37:42 +0200, David Hildenbrand wrote:
>> On 20.10.2017 13:09, Marc Hartmayer wrote:
>>> we recently encountered the problem that the 'host-model' [1] has to be
>>> related to the machine type of a domain. We have following problem:
>>>
>>>Let's assume we've a z13 system with a QEMU 2.9 and we define a
>>>domain using the default s390-virtio-ccw machine together with the
>>>host-model CPU mode [1]. The definition will have the machine
>>>expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode
>>>in the domain definition. In a next step we upgrade to QEMU 2.10
>>>(first version to recognize z14). Everything is still fine, even
>>>though the machine runs in 2.9 compatibility mode. Finally we
>>>upgrade to a z14. As a consequence it is not possible to start the
>>>domain anymore as the machine type doesn't support our CPU host
>>>model (which is expanded at start time of the domain).
>>
>> Yes, the host model may vary depending on QEMU version and to some
>> degree also on compatibility machines (although I always try to push
>> people to not introduce any new such stuff).
>>
>> Quoting for the libvirt documentation: https://libvirt.org/formatdomain.html
>>
>> "host-model
>> The host-model mode is essentially a shortcut to copying host CPU
>> definition from capabilities XML into domain XML. Since the CPU
>> definition is copied just before starting a domain, exactly the same XML
>> can be used on different hosts while still providing the best guest CPU
>> each host supports."
>>
>> You're telling me that that copying does not happen, which seems to be
>> the problematic part about this in my opinion.
>>
>> So I am not sure if probing anything else (as you noted below) is the
>> correct thing to do. Copy it and you have a migration_safe and even
>> static version of the _current_ host CPU.
> 
> The thing is libvirt calls query-cpu-model-expansion to check what the
> host CPU is. This 'host-model' CPU is replaced with the probed CPU model
> when a domain starts. The problem described by Marc is that the probed
> CPU model cannot be used universally with all machine types. So starting
> a domain on such host with machine type s390-virtio-ccw-2.10 works, but
> a domain with machine type s390-virtio-ccw-2.9 fails to start with the
> same probed CPU model.
> 

My assumption would be that the CPU model is copied into the XML when
the domain first starts. This is what the documentation describes.

So when upgrading QEMU, the CPU model in the XML is still the same
(z13), even though something different is now reported in the host info
after upgrading QEMU (z14).

In this case it would continue to work.

The problem is that the CPU model copied into the XML doesn't
remain there, right? It is suddenly replaced with a z14 host model.

> Jirka
> 


-- 

Thanks,

David



Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode

2017-10-20 Thread Jiri Denemark
On Fri, Oct 20, 2017 at 13:37:42 +0200, David Hildenbrand wrote:
> On 20.10.2017 13:09, Marc Hartmayer wrote:
> > we recently encountered the problem that the 'host-model' [1] has to be
> > related to the machine type of a domain. We have following problem:
> > 
> >Let's assume we've a z13 system with a QEMU 2.9 and we define a
> >domain using the default s390-virtio-ccw machine together with the
> >host-model CPU mode [1]. The definition will have the machine
> >expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode
> >in the domain definition. In a next step we upgrade to QEMU 2.10
> >(first version to recognize z14). Everything is still fine, even
> >though the machine runs in 2.9 compatibility mode. Finally we
> >upgrade to a z14. As a consequence it is not possible to start the
> >domain anymore as the machine type doesn't support our CPU host
> >model (which is expanded at start time of the domain).
> 
> Yes, the host model may vary depending on QEMU version and to some
> degree also on compatibility machines (although I always try to push
> people to not introduce any new such stuff).
> 
> Quoting for the libvirt documentation: https://libvirt.org/formatdomain.html
> 
> "host-model
> The host-model mode is essentially a shortcut to copying host CPU
> definition from capabilities XML into domain XML. Since the CPU
> definition is copied just before starting a domain, exactly the same XML
> can be used on different hosts while still providing the best guest CPU
> each host supports."
> 
> You're telling me that that copying does not happen, which seems to be
> the problematic part about this in my opinion.
> 
> So I am not sure if probing anything else (as you noted below) is the
> correct thing to do. Copy it and you have a migration_safe and even
> static version of the _current_ host CPU.

The thing is libvirt calls query-cpu-model-expansion to check what the
host CPU is. This 'host-model' CPU is replaced with the probed CPU model
when a domain starts. The problem described by Marc is that the probed
CPU model cannot be used universally with all machine types. So starting
a domain on such host with machine type s390-virtio-ccw-2.10 works, but
a domain with machine type s390-virtio-ccw-2.9 fails to start with the
same probed CPU model.

Jirka



Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode

2017-10-20 Thread Jiri Denemark
On Fri, Oct 20, 2017 at 13:09:26 +0200, Marc Hartmayer wrote:
> we recently encountered the problem that the 'host-model' [1] has to be
> related to the machine type of a domain. We have following problem:
> 
>Let's assume we've a z13 system with a QEMU 2.9 and we define a
>domain using the default s390-virtio-ccw machine together with the
>host-model CPU mode [1]. The definition will have the machine
>expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode
>in the domain definition. In a next step we upgrade to QEMU 2.10
>(first version to recognize z14). Everything is still fine, even
>though the machine runs in 2.9 compatibility mode. Finally we
>upgrade to a z14. As a consequence it is not possible to start the
>domain anymore as the machine type doesn't support our CPU host
>model (which is expanded at start time of the domain).
> 
> For determining the actual host-model the QMP command
> 'query-cpu-model-expansion' is used. This is done once per QEMU binary
> and the result of it is cached by libvirt. The problem with that is
> that libvirt queries with the newest machine type of the QEMU binary
> for the host CPU model.

No, libvirt probes QEMU with -machine none.

> We could now either probe the host CPU model for each QEMU binary +
> machine type combination and for this we've to start a new QEMU
> process each time.

This is not really a viable solution.

Jirka



Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode

2017-10-20 Thread David Hildenbrand
On 20.10.2017 13:09, Marc Hartmayer wrote:
> On Thu, Oct 12, 2017 at 02:07 PM +0200, Jiri Denemark  
> wrote:
>> On Mon, Oct 09, 2017 at 10:16:48 +0200, Marc Hartmayer wrote:
>>> On Thu, Oct 05, 2017 at 02:11 PM +0200, Jiri Denemark  
>>> wrote:
 But it's going to be a bit complicated because we ask QEMU what the
 host CPU is and the interface we used would need to be enhanced to
 give us different results for all supported machine types so that we
 can select the right one when a domain is started.
>>>
>>> So how do we deal with this?
>>
>> I think we need to start discussing this on qemu-devel list. If I
>> remember correctly, David Hildenbrand is the original author of the
>> query-cpu-model-expansion QMP command.
>>
>> Please, Cc me so that the thread is more visible to me.
>>
>> Thanks,
>>
>> Jirka
>>
> 
> Hi all,
> 
> we recently encountered the problem that the 'host-model' [1] has to be
> related to the machine type of a domain. We have following problem:
> 
>Let's assume we've a z13 system with a QEMU 2.9 and we define a
>domain using the default s390-virtio-ccw machine together with the
>host-model CPU mode [1]. The definition will have the machine
>expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode
>in the domain definition. In a next step we upgrade to QEMU 2.10
>(first version to recognize z14). Everything is still fine, even
>though the machine runs in 2.9 compatibility mode. Finally we
>upgrade to a z14. As a consequence it is not possible to start the
>domain anymore as the machine type doesn't support our CPU host
>model (which is expanded at start time of the domain).

Yes, the host model may vary depending on QEMU version and to some
degree also on compatibility machines (although I always try to push
people to not introduce any new such stuff).

Quoting for the libvirt documentation: https://libvirt.org/formatdomain.html

"host-model
The host-model mode is essentially a shortcut to copying host CPU
definition from capabilities XML into domain XML. Since the CPU
definition is copied just before starting a domain, exactly the same XML
can be used on different hosts while still providing the best guest CPU
each host supports."

You're telling me that that copying does not happen, which seems to be
the problematic part about this in my opinion.

So I am not sure if probing anything else (as you noted below) is the
correct thing to do. Copy it and you have a migration_safe and even
static version of the _current_ host CPU.

> 
> For determining the actual host-model the QMP command
> 'query-cpu-model-expansion' is used. This is done once per QEMU binary
> and the result of it is cached by libvirt. The problem with that is
> that libvirt queries with the newest machine type of the QEMU binary
> for the host CPU model. But now we realized that we have to consider
> the machine type of each domain for the determination of the host CPU
> model of a domain.
> 
> We could now either probe the host CPU model for each QEMU binary +
> machine type combination and for this we've to start a new QEMU
> process each time. Or we can add a QMP command/parameter that allows
> us to determine the host CPU model(s) with respect to the passed
> machine type and/or all supported machine types.
> 
> The QMP command query-cpu-model-expansion is currently described in
> qapi-schema.json as follows:
> 
> 
> ##
> # @query-cpu-model-expansion:
> #
> # Expands a given CPU model (or a combination of CPU model + additional 
> options)
> # to different granularities, allowing tooling to get an understanding what a
> # specific CPU model looks like in QEMU under a certain configuration.
> #
> # This interface can be used to query the "host" CPU model.
> #
> # The data returned by this command may be affected by:
> #
> # * QEMU version: CPU models may look different depending on the QEMU version.
> # (Except for CPU models reported as "static" in query-cpu-definitions.)
> # * machine-type: CPU model may look different depending on the machine-type.
> # (Except for CPU models reported as "static" in query-cpu-definitions.)
> # * machine options (including accelerator): in some architectures, CPU models
> # may look different depending on machine and accelerator options. (Except for
> # CPU models reported as "static" in query-cpu-definitions.)
> # * "-cpu" arguments and global properties: arguments to the -cpu option and
> # global properties may affect expansion of CPU models. Using
> # query-cpu-model-expansion while using these is not advised.
> #
> # Some architectures may not support all expansion types. s390x supports
> # "full" and "static".
> #
> # Returns: a CpuModelExpansionInfo. Returns an error if expanding CPU models 
> is
> # not supported, if the model cannot be expanded, if the model contains
> # an unknown CPU definition name, unknown properties or properties
> # with a wrong type. Also 

Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode

2017-10-20 Thread Marc Hartmayer
On Thu, Oct 12, 2017 at 02:07 PM +0200, Jiri Denemark  
wrote:
> On Mon, Oct 09, 2017 at 10:16:48 +0200, Marc Hartmayer wrote:
>> On Thu, Oct 05, 2017 at 02:11 PM +0200, Jiri Denemark  
>> wrote:
>> > But it's going to be a bit complicated because we ask QEMU what the
>> > host CPU is and the interface we used would need to be enhanced to
>> > give us different results for all supported machine types so that we
>> > can select the right one when a domain is started.
>>
>> So how do we deal with this?
>
> I think we need to start discussing this on qemu-devel list. If I
> remember correctly, David Hildenbrand is the original author of the
> query-cpu-model-expansion QMP command.
>
> Please, Cc me so that the thread is more visible to me.
>
> Thanks,
>
> Jirka
>

Hi all,

we recently encountered the problem that the 'host-model' [1] has to be
related to the machine type of a domain. We have following problem:

   Let's assume we've a z13 system with a QEMU 2.9 and we define a
   domain using the default s390-virtio-ccw machine together with the
   host-model CPU mode [1]. The definition will have the machine
   expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode
   in the domain definition. In a next step we upgrade to QEMU 2.10
   (first version to recognize z14). Everything is still fine, even
   though the machine runs in 2.9 compatibility mode. Finally we
   upgrade to a z14. As a consequence it is not possible to start the
   domain anymore as the machine type doesn't support our CPU host
   model (which is expanded at start time of the domain).

For determining the actual host-model the QMP command
'query-cpu-model-expansion' is used. This is done once per QEMU binary
and the result of it is cached by libvirt. The problem with that is
that libvirt queries with the newest machine type of the QEMU binary
for the host CPU model. But now we realized that we have to consider
the machine type of each domain for the determination of the host CPU
model of a domain.

We could now either probe the host CPU model for each QEMU binary +
machine type combination and for this we've to start a new QEMU
process each time. Or we can add a QMP command/parameter that allows
us to determine the host CPU model(s) with respect to the passed
machine type and/or all supported machine types.

The QMP command query-cpu-model-expansion is currently described in
qapi-schema.json as follows:


##
# @query-cpu-model-expansion:
#
# Expands a given CPU model (or a combination of CPU model + additional options)
# to different granularities, allowing tooling to get an understanding what a
# specific CPU model looks like in QEMU under a certain configuration.
#
# This interface can be used to query the "host" CPU model.
#
# The data returned by this command may be affected by:
#
# * QEMU version: CPU models may look different depending on the QEMU version.
# (Except for CPU models reported as "static" in query-cpu-definitions.)
# * machine-type: CPU model may look different depending on the machine-type.
# (Except for CPU models reported as "static" in query-cpu-definitions.)
# * machine options (including accelerator): in some architectures, CPU models
# may look different depending on machine and accelerator options. (Except for
# CPU models reported as "static" in query-cpu-definitions.)
# * "-cpu" arguments and global properties: arguments to the -cpu option and
# global properties may affect expansion of CPU models. Using
# query-cpu-model-expansion while using these is not advised.
#
# Some architectures may not support all expansion types. s390x supports
# "full" and "static".
#
# Returns: a CpuModelExpansionInfo. Returns an error if expanding CPU models is
# not supported, if the model cannot be expanded, if the model contains
# an unknown CPU definition name, unknown properties or properties
# with a wrong type. Also returns an error if an expansion type is
# not supported.
#
# Since: 2.8.0
##
{ 'command': 'query-cpu-model-expansion',
'data': { 'type': 'CpuModelExpansionType',
'model': 'CpuModelInfo' },
'returns': 'CpuModelExpansionInfo' }



What do you think?

Thank you.

 Marc



[1] https://libvirt.org/formatdomain.html#elementsCPU

--
Beste Grüße / Kind regards
   Marc Hartmayer

IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294




Re: [Qemu-devel] HDMI Question

2017-09-27 Thread Alistair Francis
On Wed, Sep 27, 2017 at 5:31 AM, Ni, Xingrong  wrote:
> Hello,
>
> I am using qemu and hit a question and would like to ask here, can we 
> simulate a HDMI device in Qemu, if so, how to do it?

Hello,

I don't see any reason why you can't. We (Xilinx) have a display port
device in QEMU that we use.

Basically how it works is you model the devices registers and use
QEMUs display backend (usually over SDL or VNC) to display the image.

Here is a pretty recent example of a display port device in QEMU:
https://github.com/qemu/qemu/blob/master/hw/display/xlnx_dp.c

In saying that display devices are pretty intense and this will not be
a trivial task.

Thanks,
Alistair

>
>
> Thanks & Best Wishes!
> Rocky
>
>
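
As a rough illustration of the pattern Alistair describes (model the device's
registers, register a console, push pixels on each refresh), here is a hedged
sketch against the generic console API; MyDisplayState and the my_display_*
names are invented, and error handling plus the actual register/VRAM modelling
are omitted:

#include "qemu/osdep.h"
#include "ui/console.h"

typedef struct MyDisplayState {
    QemuConsole *con;
    /* guest-visible state would live here: framebuffer base, format, ... */
} MyDisplayState;

/* Called on every display refresh: convert the guest framebuffer into the
 * console surface and tell the UI backend (SDL/VNC/GTK) what changed. */
static void my_display_gfx_update(void *opaque)
{
    MyDisplayState *s = opaque;
    DisplaySurface *surface = qemu_console_surface(s->con);

    /* ... read guest registers/VRAM and draw into 'surface' here ... */

    dpy_gfx_update(s->con, 0, 0,
                   surface_width(surface), surface_height(surface));
}

static const GraphicHwOps my_display_ops = {
    .gfx_update = my_display_gfx_update,
};

/* Typically called from the device's realize function. */
static void my_display_init_console(MyDisplayState *s, DeviceState *dev)
{
    s->con = graphic_console_init(dev, 0, &my_display_ops, s);
}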



Re: [Qemu-devel] question:a question about throttle and hot-unplug

2017-09-14 Thread Alberto Garcia
On Thu 14 Sep 2017 11:41:40 AM CEST, WangJie (Captain) wrote:

> the patch you committed: 
> https://github.com/qemu/qemu/commit/7ca7f0f6db1fedd28d490795d778cf23979a2aa7#diff-ea36ba0f79150cc299732696a069caba
>
> removes blk_io_limits_disable from blk_remove_bs.
>
> Then, if a disk with QoS configured is hot-unplugged from the VM, the backend
> of the disk remains in the throttle group.
>
> So when I hot-unplug and hot-plug a disk and use the same throttle
> group name, it will lead to a qemu crash.
>
> and Eric committed a patch as follows that fixed the bug on qemu-kvm 2.9.0-rc4:
> https://github.com/qemu/qemu/commit/1606e4cf8a976513ecac70ad6642a7ec45744cf5#diff-7cb66df56045598b75a219eebc27efb6
>
> Is what I said as below correct?

That sounds correct to me. Here's the original discussion for reference:

https://lists.gnu.org/archive/html/qemu-block/2017-04/msg00019.html

Berto



Re: [Qemu-devel] slirp: question about using SMB on windows host with linux guest

2017-08-08 Thread FONNEMANN Mark
Michael-



OK, I was confused until reading your email but I have now been able to get it 
to work.

Unfortunately, I find that it is *very* unreliable when trying to mount the 
share; It only works sometimes.

See attached screenshot.

Do you have any suggestions on how to fix this problem?

I am using qemu 2.9.0.



Best Regards / Mit freundlichen Grüßen / 敬具,

Mark.



-Original Message-
From: Michael Tokarev [mailto:m...@tls.msk.ru]
Sent: Sunday, May 14, 2017 02:47
To: FONNEMANN Mark <mark.fonnem...@hexagon.com>; qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] slirp: question about using SMB on windows host with 
linux guest



07.05.2017 00:30, FONNEMANN Mark write:

> Hello-

>

> I am trying to share files between Windows host and Linux guest using
> slirp’s built-in SMB server.



There's no such thing as "slirp built-in SMB Server". Qemu does not have an SMB 
server built-in, qemu uses samba to provide smb functionality, it spawns smbd.  
So you have to have smbd on windows in order to use this functionality.



However, on windows, it's easy to use native CIFS server, just share the 
directory you want to share and connect to your windows machine from within 
guest.



/mjt




Re: [Qemu-devel] slirp: question about using SMB on windows host with linux guest

2017-05-14 Thread Michael Tokarev
07.05.2017 00:30, FONNEMANN Mark write:
> Hello-
> 
> I am trying to share files between Windows host and Linux guest using
> slirp’s built-in SMB server.

There's no such thing as "slirp built-in SMB Server". Qemu does not have an SMB 
server
built-in, qemu uses samba to provide smb functionality, it spawns smbd.  So you 
have
to have smbd on windows in order to use this functionality.

However, on windows, it's easy to use native CIFS server, just share the 
directory you
want to share and connect to your windows machine from within guest.

/mjt



Re: [Qemu-devel] slirp: question about using SMB on windows host with linux guest

2017-05-13 Thread FONNEMANN Mark
>I am trying to share files between Windows host and Linux guest using slirp’s 
>built-in SMB server.
>I start QEMU 2.9.0 using the following:
>
>C:\Program Files\qemu>qemu-system-i386.exe -readconfig 
>\Users\mfonnemann\qemu.cfg -net nic -net user,smb=\Users\mfonnemann\smb
>
>I then try to mount the SMB share first with 10.0.2.4:
>
>root@qemu:~# mount -t cifs //10.0.2.4/Users/mfonnemann/smb tmp -o 
>user=mfonnemann
>Password:
> CIFS VFS: Error connecting to IPv4 socket. Aborting 
> operation
> CIFS VFS: cifs_mount failed w/return code = -113
>mount error 113 = No route to host
>mount: mount.cifs failed with status -1
>
>And then with 10.0.2.2:
>
>root@qemu:~# mount -t cifs //10.0.2.2/Users/mfonnemann/smb tmp -o 
>user=mfonnemann
>Password:
>CIFS VFS: Unknown RFC 1002 frame
>
> CIFS VFS: Send error in SessSetup = -13
> CIFS VFS: cifs_mount failed w/return code = -13
>mount error 13 = Permission denied
>mount: mount.cifs failed with status -1
>
>Regarding RFC 1002 frame, I see the following when using “dmesg”:
>
>CIFS VFS: Unknown RFC 1002 frame
> Received Data: : dump of 36 bytes of data at 0xf71400c0
>
> 008d 57770064 6e006900 6f006400 . . . . d . w W . i . 
> n . d . o
> 73007700 31002000 20003000 72005000 . w . s .   . 1 . 0 . 
>   . P . r
> 20006f00 . o .

I just retried both again using Windows 7 Professional and Windows 10
Professional.
I also tried using QEMU 2.8.
Not sure what else to try.



Re: [Qemu-devel] quick question about live VM migration

2017-04-11 Thread Daniel P. Berrange
On Mon, Apr 10, 2017 at 01:07:14PM -0700, Vinod Chegu wrote:
> Hello,
> 
> A very basic question...
> 
> In a vanilla KVM env. (i.e. non open stack etc) when  a VM is live migrated
> to another host does the MAC address of the vNICs and the DHCP IP address
> for these interfaces remain the same ?

*All* aspects of the virtual machine hardware *must* remain the same. If you
have any differences in configuration of QEMU on the target host you are likely
to either prevent live migration working, or break the guest OS after migration
is finished. This includes MAC address, PCI addresses, device versions, etc,
etc

> Thought the answer was a yes as the goal is to minimize disruption to the
> apps etc. in the VM, but wanted to double check (couldn't verify as I don't
> have a ready test env handy at the moment).
> 
> BTW, please assume that the MAC address for the vNIC interfaces were not
> explicitly specified at the time of VM creation.

It is the responsibility of whoever/whatever launches QEMU on the target
host to ensure the hardware config is identical. The only safe way todo
this is ensure you specify every single aspect of guest hardware when
launching QEMU, including the MAC address. 

FWIW, if you are using libvirt to manage QEMU, then libvirt goes to great
lengths to ensure this happens (which is why QEMU command lines generated
by libvirt are so huge :-)

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



Re: [Qemu-devel] [libvirt] Question regarding Snapshots

2017-02-27 Thread Eric Blake
On 02/27/2017 07:17 AM, Kashyap Chamarthy wrote:
> On Mon, Feb 27, 2017 at 01:33:56PM +0100, Kashyap Chamarthy wrote:
> 
> [...]
> 
 Can you make a snapshot, then go back to the base and ignore that snapshot
 like a closed branch, make another snapshot, and then go back and forth
 from each to each?

Yes.  Or graphically, you can have:

   -- A ... B ... C
  /
base <
  \
   -- D ... E ... F

and toggle between any of the named snapshots at will.  As Kashyap and
others pointed out, libvirt support for reverting to a particular
external snapshot is not fully present, so there are some manual steps
involved, but the task is still doable.

> 
>> For the long answer to the complications involved in reverting to
>> external snapshots, refer this (long read):
>>
>> https://wiki.libvirt.org/page/I_created_an_external_snapshot,_but_libvirt_will_not_let_me_delete_or_revert_to_it
> 
> [...]
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [libvirt] Question regarding Snapshots

2017-02-27 Thread Kashyap Chamarthy
On Mon, Feb 27, 2017 at 01:33:56PM +0100, Kashyap Chamarthy wrote:

[...]

> > > Can you make a snapshot, then go back to the base and ignore that snapshot
> > > like a closed branch, make another snapshot, and then go back and forth
> > > from each to each?
> 
> If you edit the '--disk' element as above and point to the right
> 'snapshot', you should be able to switch between several of them.

[...]

> And conveniently, you can tell libvirt to not track the metadata:
> 
>   $ virsh snapshot-create-as --domain vm1 guest-state1 \
>   --diskspec vda,file=/export/overlay1.qcow2 \
>   --disk-only --atomic --no-metadata
> 
> This way, libvirt will not track 'overlay1.qcow2'.  But if you do need
> it, as you know, just update the 

Err, unfinished sentence there.  Meant to write: just update the
'source file' attribute in the guest XML, pointing to the said disk.

> For the long answer to the complications involved in reverting to
> external snapshots, refer this (long read):
> 
> https://wiki.libvirt.org/page/I_created_an_external_snapshot,_but_libvirt_will_not_let_me_delete_or_revert_to_it

[...]

-- 
/kashyap



Re: [Qemu-devel] A question about PCI device address spaces

2016-12-26 Thread David Gibson
On Mon, Dec 26, 2016 at 01:01:34PM +0200, Marcel Apfelbaum wrote:
> On 12/22/2016 11:42 AM, Peter Xu wrote:
> > Hello,
> > 
> 
> Hi Peter,
> 
> > Since this is a general topic, I picked it out from the VT-d
> > discussion and put it here, just want to be more clear of it.
> > 
> > The issue is, whether we have exposed too much address spaces for
> > emulated PCI devices?
> > 
> > Now for each PCI device, we are having PCIDevice::bus_master_as for
> > the device visible address space, which derived from
> > pci_device_iommu_address_space():
> > 
> > AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
> > {
> >     PCIBus *bus = PCI_BUS(dev->bus);
> >     PCIBus *iommu_bus = bus;
> >
> >     while (iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) {
> >         iommu_bus = PCI_BUS(iommu_bus->parent_dev->bus);
> >     }
> >     if (iommu_bus && iommu_bus->iommu_fn) {
> >         return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, dev->devfn);
> >     }
> >     return &address_space_memory;
> > }
> > 
> > By default (for no-iommu case), it's pointed to system memory space,
> > which includes MMIO, and looks wrong - PCI device should not be able to
> > write to MMIO regions.
> > 
> 
> Why? As far as I know a PCI device can start a read/write transaction
> to virtually any address, it doesn't matter if it 'lands' in RAM or an MMIO
> region mapped by another device. But I might be wrong, need to read the spec
> again...

So as I noted in another mail, my earlier comment which led Peter to
say that was misleading.  In particular I was talking about *non PCI*
MMIO devices, which barely exist on x86 (and even there the statement
won't necessarily be true).

> The PCI transaction will eventually reach the Root Complex/PCI host bridge
> where an IOMMU or some other hw entity can sanitize/translate, but is out of
> the scope of the device itself.

Right, but we're not talking about the device, or purely within PCI
address space.  We're explicitly talking about what addresses the
RC/host bridge will translate between PCI space and CPU address space.
I'm betting that even on x86, it won't be the whole 64-bit address
space (otherwise how would the host bridge know whether another PCI
device might be listening on that address).

> The Root Complex will 'translate' the transaction into a memory read/write
> on behalf of the device and pass it to the memory controller.
> If the transaction target is another device, I am not sure if the
> Root Complex will re-route by itself or pass it to the Memory Controller.

It will either re-route itself, or simply drop it, possibly depending
on configuration.  I'm sure the MC won't be bouncing transactions back
to PCI space.  Note that for vanilla PCI the question is moot - the
cycle will be broadcast on the bus segment and something will pick it
up - either a device or the host bridge.  If multiple things try to
respond to the same addresses, things will go badly wrong.

> > As an example, if we dump a PCI device address space into detail on
> > x86_64 system, we can see (this is address space for a virtio-net-pci
> > device on an Q35 machine with 6G memory):
> > 
> > -0009 (prio 0, RW): pc.ram
> > 000a-000a (prio 1, RW): vga.vram
> > 000b-000b (prio 1, RW): vga-lowmem
> > 000c-000c9fff (prio 0, RW): pc.ram
> > 000ca000-000ccfff (prio 0, RW): pc.ram
> > 000cd000-000ebfff (prio 0, RW): pc.ram
> > 000ec000-000e (prio 0, RW): pc.ram
> > 000f-000f (prio 0, RW): pc.ram
> > 0010-7fff (prio 0, RW): pc.ram
> > b000-bfff (prio 0, RW): pcie-mmcfg-mmio
> > fd00-fdff (prio 1, RW): vga.vram
> > fe00-fe000fff (prio 0, RW): virtio-pci-common
> > fe001000-fe001fff (prio 0, RW): virtio-pci-isr
> > fe002000-fe002fff (prio 0, RW): virtio-pci-device
> > fe003000-fe003fff (prio 0, RW): virtio-pci-notify
> > febd0400-febd041f (prio 0, RW): vga ioports remapped
> > febd0500-febd0515 (prio 0, RW): bochs dispi interface
> > febd0600-febd0607 (prio 0, RW): qemu extended regs
> > febd1000-febd102f (prio 0, RW): msix-table
> > febd1800-febd1807 (prio 0, RW): msix-pba
> > febd2000-febd2fff (prio 1, RW): ahci
> > fec0-fec00fff (prio 0, RW): kvm-ioapic
> > fed0-fed003ff (prio 0, RW): hpet
> > fed1c000-fed1 (prio 1, RW): lpc-rcrb-mmio
> > fee0-feef (prio 4096, RW): kvm-apic-msi
> > fffc- (prio 0, R-): pc.bios
> > 0001-0001 (prio 0, RW): pc.ram
> > 
> > 

Re: [Qemu-devel] A question about PCI device address spaces

2016-12-26 Thread Marcel Apfelbaum

On 12/22/2016 11:42 AM, Peter Xu wrote:

Hello,



Hi Peter,


Since this is a general topic, I picked it out from the VT-d
discussion and put it here, just want to be more clear of it.

The issue is, whether we have exposed too much address spaces for
emulated PCI devices?

Now for each PCI device, we are having PCIDevice::bus_master_as for
the device visible address space, which derived from
pci_device_iommu_address_space():

AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
{
    PCIBus *bus = PCI_BUS(dev->bus);
    PCIBus *iommu_bus = bus;

    while (iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) {
        iommu_bus = PCI_BUS(iommu_bus->parent_dev->bus);
    }
    if (iommu_bus && iommu_bus->iommu_fn) {
        return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, dev->devfn);
    }
    return &address_space_memory;
}

By default (for no-iommu case), it's pointed to system memory space,
which includes MMIO, and looks wrong - PCI device should not be able to
write to MMIO regions.



Why? As far as I know a PCI device can start a read/write transaction
to virtually any address, it doesn't matter if it 'lands' in RAM or an MMIO
region mapped by another device. But I might be wrong, need to read the spec
again...

The PCI transaction will eventually reach the Root Complex/PCI host bridge
where an IOMMU or some other hw entity can sanitize/translate, but is out of
the scope of the device itself.

The Root Complex will 'translate' the transaction into a memory read/write
on behalf of the device and pass it to the memory controller.
If the transaction target is another device, I am not sure if the
Root Complex will re-route by itself or pass it to the Memory Controller.



As an example, if we dump a PCI device address space into detail on
x86_64 system, we can see (this is address space for a virtio-net-pci
device on an Q35 machine with 6G memory):

-0009 (prio 0, RW): pc.ram
000a-000a (prio 1, RW): vga.vram
000b-000b (prio 1, RW): vga-lowmem
000c-000c9fff (prio 0, RW): pc.ram
000ca000-000ccfff (prio 0, RW): pc.ram
000cd000-000ebfff (prio 0, RW): pc.ram
000ec000-000e (prio 0, RW): pc.ram
000f-000f (prio 0, RW): pc.ram
0010-7fff (prio 0, RW): pc.ram
b000-bfff (prio 0, RW): pcie-mmcfg-mmio
fd00-fdff (prio 1, RW): vga.vram
fe00-fe000fff (prio 0, RW): virtio-pci-common
fe001000-fe001fff (prio 0, RW): virtio-pci-isr
fe002000-fe002fff (prio 0, RW): virtio-pci-device
fe003000-fe003fff (prio 0, RW): virtio-pci-notify
febd0400-febd041f (prio 0, RW): vga ioports remapped
febd0500-febd0515 (prio 0, RW): bochs dispi interface
febd0600-febd0607 (prio 0, RW): qemu extended regs
febd1000-febd102f (prio 0, RW): msix-table
febd1800-febd1807 (prio 0, RW): msix-pba
febd2000-febd2fff (prio 1, RW): ahci
fec0-fec00fff (prio 0, RW): kvm-ioapic
fed0-fed003ff (prio 0, RW): hpet
fed1c000-fed1 (prio 1, RW): lpc-rcrb-mmio
fee0-feef (prio 4096, RW): kvm-apic-msi
fffc- (prio 0, R-): pc.bios
0001-0001 (prio 0, RW): pc.ram

So are the "pc.ram" regions here the only ones that we should expose
to PCI devices? (it should contain all of them, including the low-mem
ones and the >=4g one)



As I previously said, it does not have to be RAM only, but let's wait
also for Michael's opinion.


And, should this rule work for all platforms?


The PCI rules should be generic for all platforms, but I don't know
the other platforms.

Thanks,
Marcel

Or say, would it be a problem if I directly change address_space_memory in
pci_device_iommu_address_space() into something else, which only
contains RAMs? (of course this won't affect any platform that has
IOMMU, aka, customized PCIBus::iommu_fn function)

(btw, I'd appreciate if anyone has quick answer on why we have lots of
 continuous "pc.ram" in low 2g range - from can_merge() I guess they
 seem to have different dirty_log_mask, romd_mode, etc., but I still
 would like to know why they are having these difference. Anyway, this
 is totally an "optional question" just to satisfy my own curiosity :)

Thanks,

-- peterx






Re: [Qemu-devel] A question about PCI device address spaces

2016-12-25 Thread Peter Xu
On Fri, Dec 23, 2016 at 11:21:53AM +, Peter Maydell wrote:
> On 22 December 2016 at 09:42, Peter Xu  wrote:
> > Hello,
> >
> > Since this is a general topic, I picked it out from the VT-d
> > discussion and put it here, just want to be more clear of it.
> >
> > The issue is, whether we have exposed too much address spaces for
> > emulated PCI devices?
> >
> > Now for each PCI device, we are having PCIDevice::bus_master_as for
> > the device visible address space, which derived from
> > pci_device_iommu_address_space():
> >
> > AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
> > {
> >     PCIBus *bus = PCI_BUS(dev->bus);
> >     PCIBus *iommu_bus = bus;
> >
> >     while (iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) {
> >         iommu_bus = PCI_BUS(iommu_bus->parent_dev->bus);
> >     }
> >     if (iommu_bus && iommu_bus->iommu_fn) {
> >         return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, dev->devfn);
> >     }
> >     return &address_space_memory;
> > }
> >
> > By default (for no-iommu case), it's pointed to system memory space,
> > which includes MMIO, and looks wrong - PCI device should not be able to
> > write to MMIO regions.
> 
> This is just legacy, I think, ie a combination of "this used to
> be system memory space so let's not break things" and "PC works
> mostly like this". It should be possible for the PCI host bridge
> emulation to set things up so that the device's visible address
> space is whatever it feels like. The PCI APIs we have for doing
> this have "iommu" in the name but they work just as well even
> if the host bridge doesn't actually have an iommu and is just
> setting up a fixed or slightly configurable mapping.
> I think it just hasn't been implemented because for guests which
> aren't misbehaving it doesn't make any difference.

Hmm, yes I see ppc e500 is using that for setting up its own address
space, possibly x86 can leverage it too when we really need this. For
now, I see no strong reason for this enhancement, so let me keep it as
it is, and wait until we have both a strong reason and a PCI guru.

Thank you for your answer! (to Paolo/David as well)

-- peterx



Re: [Qemu-devel] A question about PCI device address spaces

2016-12-23 Thread Peter Maydell
On 22 December 2016 at 09:42, Peter Xu  wrote:
> Hello,
>
> Since this is a general topic, I picked it out from the VT-d
> discussion and put it here, just want to be more clear of it.
>
> The issue is, whether we have exposed too much address spaces for
> emulated PCI devices?
>
> Now for each PCI device, we are having PCIDevice::bus_master_as for
> the device visible address space, which derived from
> pci_device_iommu_address_space():
>
> AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
> {
>     PCIBus *bus = PCI_BUS(dev->bus);
>     PCIBus *iommu_bus = bus;
>
>     while (iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) {
>         iommu_bus = PCI_BUS(iommu_bus->parent_dev->bus);
>     }
>     if (iommu_bus && iommu_bus->iommu_fn) {
>         return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, dev->devfn);
>     }
>     return &address_space_memory;
> }
>
> By default (for no-iommu case), it's pointed to system memory space,
> which includes MMIO, and looks wrong - PCI device should not be able to
> write to MMIO regions.

This is just legacy, I think, ie a combination of "this used to
be system memory space so let's not break things" and "PC works
mostly like this". It should be possible for the PCI host bridge
emulation to set things up so that the device's visible address
space is whatever it feels like. The PCI APIs we have for doing
this have "iommu" in the name but they work just as well even
if the host bridge doesn't actually have an iommu and is just
setting up a fixed or slightly configurable mapping.
I think it just hasn't been implemented because for guests which
aren't misbehaving it doesn't make any difference.

thanks
-- PMM
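
To illustrate what such a host-bridge-provided mapping could look like, here is
a hedged sketch (not existing QEMU code; the ram_only_* and host_bridge_* names
are invented, and it aliases a single contiguous RAM region for simplicity) of
installing a fixed, RAM-only DMA address space through the same
pci_setup_iommu() hook, even though no real IOMMU is modelled:

#include "qemu/osdep.h"
#include "exec/memory.h"
#include "hw/pci/pci.h"

static MemoryRegion ram_only_mr;
static AddressSpace ram_only_as;

/* Every device on the bus gets the same DMA-visible address space. */
static AddressSpace *ram_only_dma_as(PCIBus *bus, void *opaque, int devfn)
{
    return &ram_only_as;
}

/* Called once by the host bridge model: alias only the RAM region into a new
 * root MemoryRegion and hand that out as the devices' bus master space. */
static void host_bridge_setup_dma(PCIBus *bus, MemoryRegion *ram)
{
    memory_region_init_alias(&ram_only_mr, NULL, "pci-dma-ram", ram, 0,
                             memory_region_size(ram));
    address_space_init(&ram_only_as, &ram_only_mr, "pci-dma-ram-only");
    pci_setup_iommu(bus, ram_only_dma_as, NULL);
}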



Re: [Qemu-devel] A question about PCI device address spaces

2016-12-22 Thread David Gibson
On Thu, Dec 22, 2016 at 05:42:40PM +0800, Peter Xu wrote:
> Hello,
> 
> Since this is a general topic, I picked it out from the VT-d
> discussion and put it here, just want to be more clear of it.
> 
> The issue is, whether we have exposed too much address spaces for
> emulated PCI devices?
> 
> Now for each PCI device, we are having PCIDevice::bus_master_as for
> the device visible address space, which derived from
> pci_device_iommu_address_space():
> 
> AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
> {
>     PCIBus *bus = PCI_BUS(dev->bus);
>     PCIBus *iommu_bus = bus;
>
>     while (iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) {
>         iommu_bus = PCI_BUS(iommu_bus->parent_dev->bus);
>     }
>     if (iommu_bus && iommu_bus->iommu_fn) {
>         return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, dev->devfn);
>     }
>     return &address_space_memory;
> }
> 
> By default (for no-iommu case), it's pointed to system memory space,
> which includes MMIO, and looks wrong - PCI device should not be able to
> write to MMIO regions.

Sorry, I've realized my earlier comments were a bit misleading.

I'm pretty sure the inbound (==DMA) window(s) will be less than the
full 64-bit address space.  However, that doesn't necessarily mean it
won't cover *any* MMIO.

Plus, of course, any MMIO that's provided by PCI (or legacy ISA)
devices - and on the PC platform, that's nearly everything - will also
be visible in PCI space, since it doesn't need to go through the
inbound window for that at all.  Strictly speaking PCI-provided MMIO
may not appear at the same address in PCI space as it does in the
system memory space, but for PC they will be.  By platform convention
the outbound windows are also identity mappings.

Part of the reason I was misleading was that I was thinking of non-PC
platforms, which often have more "native" MMIO devices on the CPU side
of of the PCI host bridge.

> As an example, if we dump a PCI device address space into detail on
> x86_64 system, we can see (this is address space for a virtio-net-pci
> device on an Q35 machine with 6G memory):
> 
> -0009 (prio 0, RW): pc.ram
> 000a-000a (prio 1, RW): vga.vram
> 000b-000b (prio 1, RW): vga-lowmem
> 000c-000c9fff (prio 0, RW): pc.ram
> 000ca000-000ccfff (prio 0, RW): pc.ram
> 000cd000-000ebfff (prio 0, RW): pc.ram
> 000ec000-000e (prio 0, RW): pc.ram
> 000f-000f (prio 0, RW): pc.ram
> 0010-7fff (prio 0, RW): pc.ram
> b000-bfff (prio 0, RW): pcie-mmcfg-mmio
> fd00-fdff (prio 1, RW): vga.vram
> fe00-fe000fff (prio 0, RW): virtio-pci-common
> fe001000-fe001fff (prio 0, RW): virtio-pci-isr
> fe002000-fe002fff (prio 0, RW): virtio-pci-device
> fe003000-fe003fff (prio 0, RW): virtio-pci-notify
> febd0400-febd041f (prio 0, RW): vga ioports remapped
> febd0500-febd0515 (prio 0, RW): bochs dispi interface
> febd0600-febd0607 (prio 0, RW): qemu extended regs
> febd1000-febd102f (prio 0, RW): msix-table
> febd1800-febd1807 (prio 0, RW): msix-pba
> febd2000-febd2fff (prio 1, RW): ahci
> fec0-fec00fff (prio 0, RW): kvm-ioapic
> fed0-fed003ff (prio 0, RW): hpet
> fed1c000-fed1 (prio 1, RW): lpc-rcrb-mmio
> fee0-feef (prio 4096, RW): kvm-apic-msi
> fffc- (prio 0, R-): pc.bios
> 0001-0001 (prio 0, RW): pc.ram
> 
> So are the "pc.ram" regions here the only ones that we should expose
> to PCI devices? (it should contain all of them, including the low-mem
> ones and the >=4g one)
> 
> And, should this rule work for all platforms? Or say, would it be a
> problem if I directly change address_space_memory in
> pci_device_iommu_address_space() into something else, which only
> contains RAMs? (of course this won't affect any platform that has
> IOMMU, aka, customized PCIBus::iommu_fn function)

No, the arragement of both inbound and outbound windows is certainly
platform dependent (strictly speaking, dependent on the model and
configuration of the host bridge, but that tends to be tied strongly
to the platform).  I think address_space_memory is the closest
approximation we're going to get that works for multiple platforms -
having both inbound and outbound windows identity mapped is pretty
common, I believe, even if they don't strictly speaking cover the
whole address space.

> (btw, I'd appreciate if anyone has quick answer on why we have lots of
>  continuous "pc.ram" in low 2g range - from 

Re: [Qemu-devel] A question about PCI device address spaces

2016-12-22 Thread Paolo Bonzini


On 22/12/2016 10:42, Peter Xu wrote:
> Hello,
> 
> Since this is a general topic, I picked it out from the VT-d
> discussion and put it here, just want to be more clear of it.
> 
> The issue is, whether we have exposed too much address spaces for
> emulated PCI devices?
> 
> Now for each PCI device, we are having PCIDevice::bus_master_as for
> the device visible address space, which derived from
> pci_device_iommu_address_space():
> 
> AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
> {
>     PCIBus *bus = PCI_BUS(dev->bus);
>     PCIBus *iommu_bus = bus;
>
>     while (iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) {
>         iommu_bus = PCI_BUS(iommu_bus->parent_dev->bus);
>     }
>     if (iommu_bus && iommu_bus->iommu_fn) {
>         return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, dev->devfn);
>     }
>     return &address_space_memory;
> }
> 
> By default (for no-iommu case), it's pointed to system memory space,
> which includes MMIO, and looks wrong - PCI device should not be able to
> write to MMIO regions.

Hmm, I think that unless you describe that with ACS, PCI devices should
be able to write to MMIO regions.  Possibly not _all_ of them, I don't
know (maybe they cannot write to MMCONFIG?) but after all they can write
to the MSI region, and that is an MMIO region.

If the IOMMU translate callback included the MemTxAttrs, ACS Source
Validation could probably be implemented with an IOMMU region on the
root complex.  Most of the other ACS features either do not apply, or
they are already implicit in the way that all PCI devices' address
spaces are based on address_space_memory.

Paolo

> As an example, if we dump a PCI device address space into detail on
> x86_64 system, we can see (this is address space for a virtio-net-pci
> device on an Q35 machine with 6G memory):
> 
> -0009 (prio 0, RW): pc.ram
> 000a-000a (prio 1, RW): vga.vram
> 000b-000b (prio 1, RW): vga-lowmem
> 000c-000c9fff (prio 0, RW): pc.ram
> 000ca000-000ccfff (prio 0, RW): pc.ram
> 000cd000-000ebfff (prio 0, RW): pc.ram
> 000ec000-000e (prio 0, RW): pc.ram
> 000f-000f (prio 0, RW): pc.ram
> 0010-7fff (prio 0, RW): pc.ram
> b000-bfff (prio 0, RW): pcie-mmcfg-mmio
> fd00-fdff (prio 1, RW): vga.vram
> fe00-fe000fff (prio 0, RW): virtio-pci-common
> fe001000-fe001fff (prio 0, RW): virtio-pci-isr
> fe002000-fe002fff (prio 0, RW): virtio-pci-device
> fe003000-fe003fff (prio 0, RW): virtio-pci-notify
> febd0400-febd041f (prio 0, RW): vga ioports remapped
> febd0500-febd0515 (prio 0, RW): bochs dispi interface
> febd0600-febd0607 (prio 0, RW): qemu extended regs
> febd1000-febd102f (prio 0, RW): msix-table
> febd1800-febd1807 (prio 0, RW): msix-pba
> febd2000-febd2fff (prio 1, RW): ahci
> fec0-fec00fff (prio 0, RW): kvm-ioapic
> fed0-fed003ff (prio 0, RW): hpet
> fed1c000-fed1 (prio 1, RW): lpc-rcrb-mmio
> fee0-feef (prio 4096, RW): kvm-apic-msi
> fffc- (prio 0, R-): pc.bios
> 0001-0001 (prio 0, RW): pc.ram
> 
> So are the "pc.ram" regions here the only ones that we should expose
> to PCI devices? (it should contain all of them, including the low-mem
> ones and the >=4g one)
> 
> And, should this rule work for all platforms? Or say, would it be a
> problem if I directly change address_space_memory in
> pci_device_iommu_address_space() into something else, which only
> contains RAMs? (of course this won't affect any platform that has
> IOMMU, aka, customized PCIBus::iommu_fn function)
> 
> (btw, I'd appreciate if anyone has quick answer on why we have lots of
>  continuous "pc.ram" in low 2g range - from can_merge() I guess they
>  seem to have different dirty_log_mask, romd_mode, etc., but I still
>  would like to know why they are having these difference. Anyway, this
>  is totally an "optional question" just to satisfy my own curiosity :)
> 
> Thanks,
> 
> -- peterx
> 



Re: [Qemu-devel] virtIO question

2016-11-14 Thread Stefan Hajnoczi
On Mon, Nov 14, 2016 at 08:36:39PM +0800, zhun...@gmail.com wrote:
> I have a question about qemu. Is it a bug in qemu version 1.2?
> In qemu version 1.2, it sets the avail event with this code:
>  if (vq->vdev->guest_features & (1 << VIRTIO_RING_F_EVENT_IDX)) {
> vring_avail_event(vq, vring_avail_idx(vq));
> }
>  and in version 2.7 the code is
>  if (virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX)) {
> vring_set_avail_event(vq, vq->last_avail_idx);
> }
> 
> A big difference between these is the value: vring_avail_idx(vq) is the latest
> value of VRingAvail.idx, and vq->last_avail_idx is not. I think it really makes
> a difference which of the two values is used, and I think the latter is right. Is it?

qemu.git/master has both vring_set_avail_event(vq, vring_avail_idx(vq))
and vring_set_avail_event(vq, vq->last_avail_idx) so I'm not sure what
you are referring to.

Please use git-blame(1) and git-log(1) to investigate changes to the
code yourself.

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-devel] virtIO question

2016-11-14 Thread Stefan Hajnoczi
On Sat, Nov 12, 2016 at 04:43:21PM +0800, zhun...@gmail.com wrote:
> Thanks, the expression is not the key problem, I just wrote it wrong. The key
> problem is that, from what I get from the code, every time the driver adds an
> sg it will call virtqueue_kick. For example, in the network driver's
> start_xmit function, it calls xmit_skb to generate an sg list and add it to
> the queue, then calls virtqueue_kick. Why does it handle it like this? Can
> you explain it to me? Thank you very much!

I can see there *are* cases in Linux 4.9-rc1 virtio_net.c:start_xmit()
where virtqueue_kick() is skipped so that multiple tx packets can be
added to the virtqueue in a single kick.

static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
{
    ...
    bool kick = !skb->xmit_more;

    ...
    if (kick || netif_xmit_stopped(txq))
        virtqueue_kick(sq->vq);
    ...
}

Also keep in mind that the virtio driver APIs in Linux are used by
multiple device drivers (virtio_net.ko, virtio_blk.ko, etc).
virtio_net.ko does not use all features offered by the API.

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-devel] virtIO question

2016-11-13 Thread zhun...@gmail.com
Hello teacher, I may have found where the driver adds multiple buffers to the
virtqueue and then kicks the qemu side. It is when the driver uses NAPI to poll
the device to get buffers, and it is in the receive queue. But in the transmit
queue, every time the driver adds a buffer to the virtqueue, the kick function
is called! Why? Does qemu handle buffers faster than the driver adds them?

thank you very much!



zhun...@gmail.com
 
From: Stefan Hajnoczi
Date: 2016-11-11 20:03
To: zhun...@gmail.com
CC: qemu
Subject: Re: Re: [Qemu-devel] virtIO question
On Thu, Nov 10, 2016 at 08:16:38PM +0800, zhun...@gmail.com wrote:
> From this point of view ,I think it make sense well, thank you very much!
>  but I have another question about notify mechanism between virtIO driver and 
> qemu.
> according the source code of Linux and qemu,
> when driver add a sg buffer to send queue named sq,
> sq->vq->vring.avail->idx++
> vq->num_added++
> and then use virtqueue_kick_prepare to make sure if need notify qemu.
> it (new_idx-event_idx)<(new_idx-old_idx)
 
This expression is wrong.  The specification and Linux code both say:
 
  (u16)(new_idx - event_idx - 1) < (u16)(new_idx - old_idx)
 
Both the (u16) and the -1 matter.  Maybe that's why you are confused by
this?
 
> if it is true,then notify other side.
> However,every time driver add a sg,then virtqueue_kick_prepare is called,and 
> vq->num_added  is reseted to 0,so in fact ,I think vq->num_added is always 0 
> or 1。
 
A driver may add multiple buffers to the virtqueue by calling
virtqueue_add_sgs() or similar functions multiple times before kicking.
Therefore vq->num_added > 1 is possible.
 
> as to qemu side,every time when pop a elem from virtqueue,it set 
> VRingUsed.ring[vring.num] to the lastest VRingAvail.idx, this according the 
> arithmetic ((new_idx-event_idx)<(new_idx-old_idx)),it seems that this 
> mechanism does not make sense
 
You are basically asking "how does event_idx work?".  The specification
says:
 
  "The driver can ask the device to delay interrupts until an entry with
  an index specified by the “used_event” field is written in the used ring
  (equivalently, until the idx field in the used ring will reach the
  value used_event + 1)."
 
and:
 
  "The device can ask the driver to delay notifications until an entry
  with an index specified by the “avail_event” field is written in the
  available ring (equivalently, until the idx field in the used ring will
  reach the value avail_event + 1)."
 
Whenever the device or driver wants to notify, it first checks if the
index update crossed the event index set by the other side.


Re: [Qemu-devel] virtIO question

2016-11-12 Thread zhun...@gmail.com
Thanks, the expression is not the key problem, I just wrote it wrong. The key
problem is that, from what I get from the code, every time the driver adds an
sg it will call virtqueue_kick. For example, in the network driver's start_xmit
function, it calls xmit_skb to generate an sg list and add it to the queue,
then calls virtqueue_kick. Why does it handle it like this? Can you explain it
to me? Thank you very much!



zhun...@gmail.com
 
From: Stefan Hajnoczi
Date: 2016-11-11 20:03
To: zhun...@gmail.com
CC: qemu
Subject: Re: Re: [Qemu-devel] virtIO question
On Thu, Nov 10, 2016 at 08:16:38PM +0800, zhun...@gmail.com wrote:
> From this point of view ,I think it make sense well, thank you very much!
>  but I have another question about notify mechanism between virtIO driver and 
> qemu.
> according the source code of Linux and qemu,
> when driver add a sg buffer to send queue named sq,
> sq->vq->vring.avail->idx++
> vq->num_added++
> and then use virtqueue_kick_prepare to make sure if need notify qemu.
> it (new_idx-event_idx)<(new_idx-old_idx)
 
This expression is wrong.  The specification and Linux code both say:
 
  (u16)(new_idx - event_idx - 1) < (u16)(new_idx - old_idx)
 
Both the (u16) and the -1 matter.  Maybe that's why you are confused by
this?
 
> if it is true,then notify other side.
> However,every time driver add a sg,then virtqueue_kick_prepare is called,and 
> vq->num_added  is reseted to 0,so in fact ,I think vq->num_added is always 0 
> or 1.
 
A driver may add multiple buffers to the virtqueue by calling
virtqueue_add_sgs() or similar functions multiple times before kicking.
Therefore vq->num_added > 1 is possible.
 
> as to qemu side,every time when pop a elem from virtqueue,it set 
> VRingUsed.ring[vring.num] to the lastest VRingAvail.idx, this according the 
> arithmetic ((new_idx-event_idx)<(new_idx-old_idx)),it seems that this 
> mechanism does not make sense
 
You are basically asking "how does event_idx work?".  The specification
says:
 
  "The driver can ask the device to delay interrupts until an entry with
  an index specified by the “used_event” field is written in the used ring
  (equivalently, until the idx field in the used ring will reach the
  value used_event + 1)."
 
and:
 
  "The device can ask the driver to delay notifications until an entry
  with an index specified by the “avail_event” field is written in the
  available ring (equivalently, until the idx field in the used ring will
  reach the value avail_event + 1)."
 
Whenever the device or driver wants to notify, it first checks if the
index update crossed the event index set by the other side.


Re: [Qemu-devel] virtIO question

2016-11-11 Thread jack
   Thanks, but I really could not find where the driver adds multiple
   buffers before calling the kick function; this is the key problem.

   Sent from Meizu MX5

    Original message 
   From: Stefan Hajnoczi <stefa...@gmail.com>
   Date: Friday, November 11, 20:03
   To: zhun...@gmail.com
   Cc: qemu <qemu-devel@nongnu.org>
   Subject: Re: Re: [Qemu-devel] virtIO question
   On Thu, Nov 10, 2016 at 08:16:38PM +0800, zhun...@gmail.com wrote:
   > From this point of view ,I think it make sense well, thank you very
   much!
   >  but I have another question about notify mechanism between virtIO
   driver and qemu.
   > according the source code of Linux and qemu,
   > when driver add a sg buffer to send queue named sq,
   > sq->vq->vring.avail->idx++
   > vq->num_added++
   > and then use virtqueue_kick_prepare to make sure if need notify qemu.
   > it (new_idx-event_idx)<(new_idx-old_idx)
   This expression is wrong.  The specification and Linux code both say:
 (u16)(new_idx - event_idx - 1) < (u16)(new_idx - old_idx)
   Both the (u16) and the -1 matter.  Maybe that's why you are confused by
   this?
   > if it is true,then notify other side.
   > However,every time driver add a sg,then virtqueue_kick_prepare is
   called,and vq->num_added  is reseted to 0,so in fact ,I think
   vq->num_added is always 0 or 1.
   A driver may add multiple buffers to the virtqueue by calling
   virtqueue_add_sgs() or similar functions multiple times before kicking.
   Therefore vq->num_added > 1 is possible.
   > as to qemu side,every time when pop a elem from virtqueue,it set
   VRingUsed.ring[vring.num] to the lastest VRingAvail.idx, this according
   the arithmetic ((new_idx-event_idx)<(new_idx-old_idx)),it seems that
   this mechanism does not make sense
   You are basically asking "how does event_idx work?".  The specification
   says:
 "The driver can ask the device to delay interrupts until an entry
   with
 an index specified by the “used_event� field is written in the
   used ring
 (equivalently, until the idx field in the used ring will reach the
 value used_event + 1)."
   and:
 "The device can ask the driver to delay noti�cations until an entry
 with an index specified by the “avail_event� field is written in
   the
 available ring (equivalently, until the idx field in the used ring
   will
 reach the value avail_event + 1)."
   Whenever the device or driver wants to notify, it first checks if the
   index update crossed the event index set by the other side.




Re: [Qemu-devel] virtIO question

2016-11-11 Thread Stefan Hajnoczi
On Thu, Nov 10, 2016 at 08:16:38PM +0800, zhun...@gmail.com wrote:
> From this point of view ,I think it make sense well, thank you very much!
>  but I have another question about notify mechanism between virtIO driver and 
> qemu.
> according the source code of Linux and qemu,
> when driver add a sg buffer to send queue named sq,
> sq->vq->vring.avail->idx++
> vq->num_added++
> and then use virtqueue_kick_prepare to make sure if need notify qemu.
> it (new_idx-event_idx)<(new_idx-old_idx)

This expression is wrong.  The specification and Linux code both say:

  (u16)(new_idx - event_idx - 1) < (u16)(new_idx - old_idx)

Both the (u16) and the -1 matter.  Maybe that's why you are confused by
this?

> if it is true,then notify other side.
> However,every time driver add a sg,then virtqueue_kick_prepare is called,and 
> vq->num_added  is reseted to 0,so in fact ,I think vq->num_added is always 0 
> or 1.

A driver may add multiple buffers to the virtqueue by calling
virtqueue_add_sgs() or similar functions multiple times before kicking.
Therefore vq->num_added > 1 is possible.

> as to qemu side,every time when pop a elem from virtqueue,it set 
> VRingUsed.ring[vring.num] to the lastest VRingAvail.idx, this according the 
> arithmetic ((new_idx-event_idx)<(new_idx-old_idx)),it seems that this 
> mechanism does not make sense

You are basically asking "how does event_idx work?".  The specification
says:

  "The driver can ask the device to delay interrupts until an entry with
  an index specified by the “used_event” field is written in the used ring
  (equivalently, until the idx field in the used ring will reach the
  value used_event + 1)."

and:

  "The device can ask the driver to delay notifications until an entry
  with an index specified by the “avail_event” field is written in the
  available ring (equivalently, until the idx field in the used ring will
  reach the value avail_event + 1)."

Whenever the device or driver wants to notify, it first checks if the
index update crossed the event index set by the other side.
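
For reference, the Linux virtio headers wrap that check in a small helper
(vring_need_event()); a minimal sketch of the same logic, with illustrative
names, looks like this:

static inline int need_event(uint16_t event_idx, uint16_t new_idx,
                             uint16_t old_idx)
{
    /* Notify only if the event index set by the other side was crossed by
     * the entries added since the last notification (old_idx .. new_idx).
     * The uint16_t casts make the comparison safe across index wrap-around.
     */
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}

Only when the index published by the other side falls inside the range of
entries added since the last notification does a kick/interrupt actually get
sent; otherwise it is suppressed.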




Re: [Qemu-devel] virtIO question

2016-11-10 Thread zhun...@gmail.com
From this point of view, I think it makes sense now. Thank you very much!
But I have another question about the notify mechanism between the virtIO driver 
and qemu.
According to the source code of Linux and qemu,
when the driver adds an sg buffer to the send queue named sq:
sq->vq->vring.avail->idx++
vq->num_added++
and then it uses virtqueue_kick_prepare to decide whether it needs to notify qemu:
(new_idx-event_idx)<(new_idx-old_idx)
If it is true, then it notifies the other side.
However, every time the driver adds an sg, virtqueue_kick_prepare is called and 
vq->num_added is reset to 0, so in fact I think vq->num_added is always 0 or 1.
As to the qemu side, every time it pops an elem from the virtqueue, it sets 
VRingUsed.ring[vring.num] to the latest VRingAvail.idx. According to the 
arithmetic ((new_idx-event_idx)<(new_idx-old_idx)), it seems that this mechanism 
does not make sense.
I do not know if I have described it clearly, or can you give me an example to 
show how it makes sense!!
thanks a lot!


zhun...@gmail.com
 
From: Stefan Hajnoczi
Date: 2016-11-10 18:32
To: zhun...@gmail.com
CC: jkhasdev; qemu
Subject: Re: [Qemu-devel] virtIO question
On Wed, Nov 09, 2016 at 06:58:16PM +0800, zhun...@gmail.com wrote:
> I want to ask a another question,why a virt_queue in virtio include in_sgs 
> and out_sgs,for example,send_queue of virtIO net driver have in_sgs and 
> out_sgs,when transmit data,It add buffer to out_sgs of send_queue,but how it 
> to use in_sgs??
 
You can think of every virtqueue buffer as having two scatter-gather
lists:
1. out_sgs are driver->device buffers (e.g. tx packet payload)
2. in_sgs are device->driver buffers (e.g. rx packet payload)
 
Look at the virtio-net ctrl virtqueue (see spec and
virtio_net_handle_ctrl() for details).  Each buffer has:
 
1. struct virtio_net_ctrl_hdr (out_sgs)
2. request-specific fields (out_sgs)
3. virtio_net_ctrl_ack status byte (in_sgs)
 
The device parses the request and performs the operation.  Then it fills
in the result (success or error code) in the status byte.
 
Processing ctrl virtqueue buffers therefore requires both guest memory
reads (out_sgs) and writes (in_sgs).  Most of the other virtio devices
also use bi-directional buffers.
 
This may not be obvious if you only consider the virtio-net tx
virtqueue, for example, where buffers use out_sgs only.
 
Hope this makes sense.  If not, look at the specification again and
think about how virtio-net ctrl request processing works.


Re: [Qemu-devel] virtIO question

2016-11-10 Thread Stefan Hajnoczi
On Wed, Nov 09, 2016 at 06:58:16PM +0800, zhun...@gmail.com wrote:
> I want to ask a another question,why a virt_queue in virtio include in_sgs 
> and out_sgs,for example,send_queue of virtIO net driver have in_sgs and 
> out_sgs,when transmit data,It add buffer to out_sgs of send_queue,but how it 
> to use in_sgs??

You can think of every virtqueue buffer as having two scatter-gather
lists:
1. out_sgs are driver->device buffers (e.g. tx packet payload)
2. in_sgs are device->driver buffers (e.g. rx packet payload)

Look at the virtio-net ctrl virtqueue (see spec and
virtio_net_handle_ctrl() for details).  Each buffer has:

1. struct virtio_net_ctrl_hdr (out_sgs)
2. request-specific fields (out_sgs)
3. virtio_net_ctrl_ack status byte (in_sgs)

The device parses the request and performs the operation.  Then it fills
in the result (success or error code) in the status byte.

Processing ctrl virtqueue buffers therefore requires both guest memory
reads (out_sgs) and writes (in_sgs).  Most of the other virtio devices
also use bi-directional buffers.

This may not be obvious if you only consider the virtio-net tx
virtqueue, for example, where buffers use out_sgs only.

Hope this makes sense.  If not, look at the specification again and
think about how virtio-net ctrl request processing works.
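
For a rough idea of what the driver side of such a ctrl request can look like,
here is a simplified sketch (error handling omitted; ctrl_hdr, ctrl_data, status
and cookie are illustrative locals, not the exact virtio_net.ko code):

struct scatterlist hdr_sg, data_sg, status_sg;
struct scatterlist *sgs[3];
u8 status = ~0;

sg_init_one(&hdr_sg, &ctrl_hdr, sizeof(ctrl_hdr));     /* driver->device (out) */
sg_init_one(&data_sg, &ctrl_data, sizeof(ctrl_data));  /* driver->device (out) */
sg_init_one(&status_sg, &status, sizeof(status));      /* device->driver (in)  */

sgs[0] = &hdr_sg;
sgs[1] = &data_sg;
sgs[2] = &status_sg;

/* 2 out_sgs followed by 1 in_sg: the device reads the header and the
 * request-specific fields, then writes the status byte back. */
virtqueue_add_sgs(vq, sgs, 2, 1, cookie, GFP_ATOMIC);
virtqueue_kick(vq);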




Re: [Qemu-devel] virtIO question

2016-11-09 Thread zhun...@gmail.com
I want to ask another question: why does a virt_queue in virtio include in_sgs and 
out_sgs? For example, the send_queue of the virtIO net driver has in_sgs and 
out_sgs; when transmitting data, it adds a buffer to the out_sgs of the send_queue, 
but how does it use the in_sgs??



zhun...@gmail.com
 
From: jitendra kumar khasdev
Date: 2016-11-05 23:41
To: Peter Maydell
CC: zhun...@gmail.com; qemu-devel
Subject: Re: [Qemu-devel] virtIO question
Have you looked at the virtio specification?

No.
 
This describes
the overall structure and communication mechanism, which
QEMU and Linux each only implement one half of:
http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html

Thanks Peter. This doc looks me interesting. 

---
Jitendra 


Re: [Qemu-devel] virtIO question

2016-11-05 Thread jack
   Thanks a lot! I will read the document carefully!

   Sent from Meizu MX5

    Original message 
   From: jitendra kumar khasdev 
   Date: Saturday, November 5, 23:11
   To: Peter Maydell 
   Cc: zhun...@gmail.com, qemu-devel 
   Subject: Re: [Qemu-devel] virtIO question

 Have you looked at the virtio specification?

   No.

 This describes
 the overall structure and communication mechanism, which
 QEMU and Linux each only implement one half of:
 [1]http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html

   Thanks Peter. This doc looks interesting to me.
   ---
   Jitendra

References

   1. http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html


Re: [Qemu-devel] virtIO question

2016-11-05 Thread jitendra kumar khasdev
>
> Have you looked at the virtio specification?


No.


> This describes
> the overall structure and communication mechanism, which
> QEMU and Linux each only implement one half of:
> http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html


Thanks Peter. This doc looks interesting to me.

---
Jitendra


Re: [Qemu-devel] virtIO question

2016-11-05 Thread Peter Maydell
On 5 November 2016 at 14:46, jitendra kumar khasdev  wrote:
> here is what I find difficulty in understanding the code,
>
> 1. Qemu virtio datastructures are not so clear.
> 2. I didn't find any sort high level design with respective qemu for virtIO.
> 3. How qemu virtIO and linux virtIO does shared memoru communication.
>
> I will deeply thankful, if somebody help me to kick start this because I am
> really willing to work on virtIO. I am hoping somebody in community will
> help to give me some direction to achive the goal.

Have you looked at the virtio specification? This describes
the overall structure and communication mechanism, which
QEMU and Linux each only implement one half of:
http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html

Hopefully it will provide the higher level overview
that you are looking for. Many of the data structures
and their purposes are also directly required and
described by the spec.

thanks
-- PMM



Re: [Qemu-devel] virtIO question

2016-11-05 Thread jitendra kumar khasdev
Hi All,

I am also exploring the virtio implementation in qemu. I am trying to understand
virtio in Linux (guest) and qemu, but I am not able to understand the I/O stack
between Linux and qemu.

Here is where I find difficulty in understanding the code:

1. The qemu virtio data structures are not so clear.
2. I didn't find any sort of high-level design of virtIO with respect to qemu.
3. How do qemu virtIO and Linux virtIO do shared-memory communication?

I will be deeply thankful if somebody helps me to kick-start this, because I am
really willing to work on virtIO. I am hoping somebody in the community will
help to give me some direction to achieve the goal.


Jitendra





On Sat, Nov 5, 2016 at 4:20 PM, zhun...@gmail.com  wrote:

> who can explain the means of idx in VRingUsed and VRingAvail structure
> about virtIO??
> thanks!
>
>
>
> zhun...@gmail.com
>


Re: [Qemu-devel] A question about virtual machine communication with Guest through serial device

2016-11-04 Thread Stefan Hajnoczi
On Mon, Oct 31, 2016 at 06:41:23AM +, Liu Pedroa wrote:
> Hi Everyone.
> 
> 
>   I  have a sample question. As the subset description.
> 
> 
> Guest OS : Ubuntu OS
> 
> QEMU :  linux kernel 4.8
> 
> 
> so i want the virtual client OS can communication( can send and receive 
> messages through serial device) with Guest OS. what should i do?

Add a chardev to your QEMU command-line (try -serial
unix:/tmp/test.sock,server,nowait).  Read the qemu(1) man page for
details on chardevs and all the options.

Inside the guest you need to access the serial port (e.g. /dev/ttyS0 on
Linux but make sure no login tty is running on that port).

On the host you can access the UNIX domain socket given on your QEMU
command-line.  That would be /tmp/test.sock in this example.
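
For example, with the -serial unix:/tmp/test.sock,server,nowait chardev from
above, something like this should work (illustrative commands):

  # host side: attach to the chardev's UNIX socket
  socat - UNIX-CONNECT:/tmp/test.sock

  # guest side: read from / write to the serial port
  cat /dev/ttyS0
  echo hello > /dev/ttyS0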

Also consider using virtio-serial instead of the traditional ISA serial
device.  It allows you to add/remove named (e.g. org.myproject.agent.0)
ports at run-time.  Search online to find example syntax.

Stefan




Re: [Qemu-devel] A question about this commit "char: convert from GIOChannel to QIOChannel"

2016-10-17 Thread Paolo Bonzini


- Original Message -
> From: "wangyunjian" 
> To: "Paolo Bonzini" 
> Cc: berra...@redhat.com, qemu-devel@nongnu.org, "caihe" 
> Sent: Monday, October 17, 2016 4:17:45 PM
> Subject: RE: A question about this commit "char: convert from GIOChannel to 
> QIOChannel"
> 
> In function tcp_chr_close, the s->sioc needs call object_unref() to be
> released ?
> 
> About the below code:
> 
> @@ -3205,10 +3205,13 @@ static void tcp_chr_close(CharDriverState *chr)
>  s->listen_tag = 0;
>  }
>  if (s->listen_ioc) {
>  object_unref(OBJECT(s->listen_ioc));
>  }
> +if (s->sioc) {
> +object_unref(OBJECT(s->sioc));
> +}
>  if (s->read_msgfds_num) {
>  for (i = 0; i < s->read_msgfds_num; i++) {
>  close(s->read_msgfds[i]);
>  }
>  g_free(s->read_msgfds);

This is done in tcp_chr_free_connection.  Maybe your tree does not have
commit 5b498459b441f4639fd4c39c35345637bfc6c16c?

Paolo

> Thanks
> 
> Yunjian
> 
> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Monday, October 17, 2016 9:20 PM
> To: wangyunjian
> Cc: berra...@redhat.com; qemu-devel@nongnu.org; caihe
> Subject: Re: A question about this commit "char: convert from GIOChannel to
> QIOChannel"
> 
> 
> 
> - Original Message -
> > From: "wangyunjian" 
> > To: berra...@redhat.com, pbonz...@redhat.com, qemu-devel@nongnu.org
> > Cc: "caihe" 
> > Sent: Monday, October 17, 2016 3:02:32 PM
> > Subject: A question about this commit "char: convert from GIOChannel to
> > QIOChannel"
> > 
> > Commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2 “char: convert from
> > GIOChannel to QIOChannel”, the old version will call closesocket when
> > tcp_chr_close and udp_chr_close are called. But the new version will
> > not call closesocket.
> > 
> > This can bring to a socket leak?
> 
> Hi, closesocket is called in io/channel-socket.c
> (qio_channel_socket_finalize).
> 
> Thanks,
> 
> Paolo
> 



Re: [Qemu-devel] A question about this commit "char: convert from GIOChannel to QIOChannel"

2016-10-17 Thread wangyunjian
In function tcp_chr_close, the s->sioc needs call object_unref() to be released 
? 

About the below code:

@@ -3205,10 +3205,13 @@ static void tcp_chr_close(CharDriverState *chr)
 s->listen_tag = 0;
 }
 if (s->listen_ioc) {
 object_unref(OBJECT(s->listen_ioc));
 }
+if (s->sioc) {
+object_unref(OBJECT(s->sioc));
+}
 if (s->read_msgfds_num) {
 for (i = 0; i < s->read_msgfds_num; i++) {
 close(s->read_msgfds[i]);
 }
 g_free(s->read_msgfds);

Thanks

Yunjian

-Original Message-
From: Paolo Bonzini [mailto:pbonz...@redhat.com] 
Sent: Monday, October 17, 2016 9:20 PM
To: wangyunjian
Cc: berra...@redhat.com; qemu-devel@nongnu.org; caihe
Subject: Re: A question about this commit "char: convert from GIOChannel to 
QIOChannel"



- Original Message -
> From: "wangyunjian" 
> To: berra...@redhat.com, pbonz...@redhat.com, qemu-devel@nongnu.org
> Cc: "caihe" 
> Sent: Monday, October 17, 2016 3:02:32 PM
> Subject: A question about this commit "char: convert from GIOChannel to 
> QIOChannel"
> 
> Commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2 “char: convert from 
> GIOChannel to QIOChannel”, the old version will call closesocket when 
> tcp_chr_close and udp_chr_close are called. But the new version will 
> not call closesocket.
> 
> This can bring to a socket leak?

Hi, closesocket is called in io/channel-socket.c (qio_channel_socket_finalize).

Thanks,

Paolo


Re: [Qemu-devel] A question about this commit "char: convert from GIOChannel to QIOChannel"

2016-10-17 Thread Paolo Bonzini


- Original Message -
> From: "wangyunjian" 
> To: berra...@redhat.com, pbonz...@redhat.com, qemu-devel@nongnu.org
> Cc: "caihe" 
> Sent: Monday, October 17, 2016 3:02:32 PM
> Subject: A question about this commit "char: convert from GIOChannel to 
> QIOChannel"
> 
> Commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2 “char: convert from
> GIOChannel to QIOChannel”,
> the old version will call closesocket when tcp_chr_close and udp_chr_close
> are called. But the new version will not call closesocket.
> 
> This can bring to a socket leak?

Hi, closesocket is called in io/channel-socket.c (qio_channel_socket_finalize).

Thanks,

Paolo



Re: [Qemu-devel] A question about postcopy safety

2016-09-05 Thread Daniel P. Berrange
On Mon, Sep 05, 2016 at 02:52:14PM +0100, Dr. David Alan Gilbert wrote:
> * liut...@yahoo.com (liut...@yahoo.com) wrote:
> > Hi David,
> 
> Hi Liutao,
> 
> > I'm studying the process of postcopy migration, and I found that the memory 
> > pages migrated from source to destination are not encrypted. Does this make 
> > the VM vulnerable if it's memory has been tampered with during postcopy 
> > migration?
> > 
> > I think precopy has less risk because the source's memory is always 
> > altering. If one page is tampered with during network transfer, with source 
> > still running, then a later version of that page may keep updating. So it 
> > would be quite difficult to track all different page versions, and tamper 
> > with the final version of one page.
> > 
> > But when it comes to postcopy, the situation is riskier because one 
> > specific page is only transferred once. It's easy to capture all 
> > transferring memory pages, tamper and resend.
> 
> I don't think there's much difference between precopy and postcopy for 
> security;
> the only secure way to do migration is over an encrypted transport and that 
> solves
> it for both precopy and postcopy.

Agreed, there's no real world difference in the security of pre & post copy.
If you care about security there's no avoiding the need to use an encrypted
transport.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] A question about postcopy safety

2016-09-05 Thread Dr. David Alan Gilbert
* liut...@yahoo.com (liut...@yahoo.com) wrote:
> Hi David,

Hi Liutao,

> I'm studying the process of postcopy migration, and I found that the memory 
> pages migrated from source to destination are not encrypted. Does this make 
> the VM vulnerable if it's memory has been tampered with during postcopy 
> migration?
> 
> I think precopy has less risk because the source's memory is always altering. 
> If one page is tampered with during network transfer, with source still 
> running, then a later version of that page may keep updating. So it would be 
> quite difficult to track all different page versions, and tamper with the 
> final version of one page.
> 
> But when it comes to postcopy, the situation is riskier because one specific 
> page is only transferred once. It's easy to capture all transferring memory 
> pages, tamper and resend.

I don't think there's much difference between precopy and postcopy for security;
the only secure way to do migration is over an encrypted transport and that 
solves
it for both precopy and postcopy.

I don't think it would be that hard for a malicious person to track the pages 
in precopy;
and indeed what they could do is wait until an interesting page comes along
(say one with a hash or the data they're interested in) and then insert a new 
version
of that page later with their own nasty version on - postcopy wouldn't allow
that second version.

The challenge is to get a nice fast high speed encryption layer, and for 
post-copy
it should have low added latency.

> 
> When the memory been tampered with, the safety of the VM will be compromised.
> 
> Any ideas? thank you!Liutao

Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] A question about postcopy safety

2016-08-29 Thread Kashyap Chamarthy
On Mon, Aug 29, 2016 at 12:51:20PM +, liut...@yahoo.com wrote:
> Hi David,I'm studying the process of postcopy migration, and I found
> that the memory pages migrated from source to destination are not
> encrypted. Does this make the VM vulnerable if it's memory has been
> tampered with during postcopy migration?

If you already haven't, you might want to take a look at this post,
which discusses the security details during live migration with
post-copy.

https://www.berrange.com/posts/2016/08/16/improving-qemu-security-part-7-tls-support-for-migration/

It also has an example of setting the 'tls-creds' field with
'migrate-set-parameters' QMP command to use TLS, before triggering
'migrate' QMP command.

> I think precopy has less risk because the source's memory is always
> altering. If one page is tampered with during network transfer, with
> source still running, then a later version of that page may keep
> updating. So it would be quite difficult to track all different page
> versions, and tamper with the final version of one page.
> 
> But when it comes to postcopy, the situation is riskier because one
> specific page is only transferred once. It's easy to capture all
> transferring memory pages, tamper and resend.
> 
> When the memory been tampered with, the safety of the VM will be
> compromised.
> 
> Any ideas? thank you!Liutao

-- 
/kashyap



Re: [Qemu-devel] a question

2016-08-26 Thread Peter Maydell
On 26 August 2016 at 10:31, Michael Rolnik  wrote:
> I want to partially implement AT STK500 board  in order to run FreeRTOS AVR
> / ATMegaAVR demo.
> if you look into ATmega32 documentation you will see that, for example,
> Timer/Countet1 registers are held together at memory addresses [0x46 ..
> 0xff], but interrupt masks and enable bits are at some other place, actually
> 0x58 .. 0x59.

(0x58..0x59 is a subset of 0x46..0xff -- I think from the docs
that you meant 0x46..0x4f.)

I think I would treat this set of timers as a single device that
implements all the timers, and which implements also the timer
interrupt mask/enable registers. Where there are multiple disjoint
register sets you can do this by having the device provide multiple
MemoryRegions, which the SoC container device then maps into the
memory space at the right places.

If you plan to support multiple CPUs which reuse some of the
individual timer modules but not all of them, you can structure
the code to allow convenient reuse, but that is an internal
detail of the overall "timers" device.

> create non memory mapped devices. Each device will export the following
> functions
>
> avr__init
> avr__read_
> avr__write_
>
> create "big" memory mapped device, which will cover the whole IO space and
> it will call _read_/_write_ functions of the devices when there is an access
> to specific address in the IO memory space.

You can do all this with the memory region API. Your various
devices can provide multiple memory regions (implementing the bits
of memory space that they care about), and then a higher level
SoC container device can map those into the right places in a
container memory region which it then exports to the board level.
You don't need to invent a new API for having devices provide registers.
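
A rough sketch of that structure (the device and type names here are
illustrative and untested; only the memory API calls are real QEMU functions):

#include "qemu/osdep.h"
#include "hw/sysbus.h"

typedef struct {
    SysBusDevice parent_obj;
    MemoryRegion regs;      /* counter/compare registers     */
    MemoryRegion imsk;      /* interrupt mask/flag registers */
    /* ... timer state ... */
} AVRTimerState;

#define TYPE_AVR_TIMER "avr-timer"
#define AVR_TIMER(obj) OBJECT_CHECK(AVRTimerState, (obj), TYPE_AVR_TIMER)

static uint64_t avr_timer_read(void *opaque, hwaddr addr, unsigned size)
{
    /* addr is relative to the start of whichever window is accessed */
    return 0;
}

static void avr_timer_write(void *opaque, hwaddr addr, uint64_t val,
                            unsigned size)
{
    /* decode and update timer state here */
}

static const MemoryRegionOps avr_timer_ops = {
    .read = avr_timer_read,
    .write = avr_timer_write,
    .endianness = DEVICE_NATIVE_ENDIAN,
};

static void avr_timer_init(Object *obj)
{
    AVRTimerState *s = AVR_TIMER(obj);
    SysBusDevice *sbd = SYS_BUS_DEVICE(obj);

    /* one MemoryRegion per disjoint register window; a real device would
     * use separate ops or opaque pointers to tell the windows apart */
    memory_region_init_io(&s->regs, obj, &avr_timer_ops, s,
                          "avr-timer-regs", 0x0a);   /* e.g. 0x46..0x4f */
    memory_region_init_io(&s->imsk, obj, &avr_timer_ops, s,
                          "avr-timer-imsk", 0x02);   /* e.g. 0x58..0x59 */
    sysbus_init_mmio(sbd, &s->regs);
    sysbus_init_mmio(sbd, &s->imsk);
}

/* ...and in the SoC container code, map them where the datasheet says: */
/*   sysbus_mmio_map(SYS_BUS_DEVICE(timer), 0, 0x46);  */
/*   sysbus_mmio_map(SYS_BUS_DEVICE(timer), 1, 0x58);  */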

thanks
-- PMM



Re: [Qemu-devel] A question about this commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2

2016-08-25 Thread Daniel P. Berrange
On Thu, Aug 25, 2016 at 01:21:47PM +, Gaohaifeng (A) wrote:
> On Tue, Aug 23, 2016 at 08:57:44AM +, Gaohaifeng (A) wrote:
> > Hi Daniel & Paolo,
> > > 
> > > Commit 9894dc0c "char: convert from GIOChannel to QIOChannel", about
> > > 
> > > the below code segment:
> > > 
> > > -static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, 
> > > void *opaque)
> > > +static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, 
> > > +void *opaque)
> > > {
> > >  CharDriverState *chr = opaque;
> > >  TCPCharDriver *s = chr->opaque;
> > > @@ -2938,9 +2801,7 @@ static gboolean tcp_chr_read(GIOChannel *chan, 
> > > GIOCondition cond, void *opaque)
> > >  if (len > s->max_size)
> > >  len = s->max_size;
> > >  size = tcp_chr_recv(chr, (void *)buf, len);
> > > -if (size == 0 ||
> > > -(size < 0 &&
> > > - socket_error() != EAGAIN && socket_error() != EWOULDBLOCK)) {
> > > +if (size == 0 || size == -1) {
> > >  /* connection closed */
> > >  tcp_chr_disconnect(chr);
> > >  } else if (size > 0) {
> > > 
> > > The old version will not call tcp_chr_disconnect when error is not 
> > > EAGAIN.
> > > The new version will call tcp_chr_disconnect when size==-1. From the 
> > > tcp_chr_recv function we see EAGIN will return -1, so EAGIN will call 
> > > tcp_chr_disconnect.
> > > 
> > > We meet an issue when Guest VM use DPDK(1.6.0) l2fwd, it may exit 
> > > since link status is not up. The link is down because 
> > > tcp_chr_disconnect will set it.
> > > 
> > > Can you explain it why EAGIN here changes the behavior ?
> 
> > tcp_chr_read is used in a call to io_add_watch_poll() which sets up an 
> > event loop watch so that tcp_chr_read is called when there is incoming data 
> > to be read.  IOW, when tcp_chr_read is invoked, I would always expect that 
> > at least 1 byte is available, and so EAGAIN should not occurr.
> > 
> > So I'm puzzled why you would get EAGAIN at all, unless there is some kind 
> > of race with other code also consuming bytes from the same socket.
> 
> It could easily reproduce this when repeating 'rmmod virtio_net;modprobe 
> virtio_net'.
> The call stack:
> #0  tcp_chr_disconnect (chr=0x7f09d5b2d540) at qemu-char.c:2867
> #1  0x7f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, 
> opaque=0x7f09d5b2d540) at qemu-char.c:2917
> #2  0x7f09d46364fa in qio_channel_fd_source_dispatch 
> (source=0x7f093e603b30, callback=0x7f09d43b1fcb , 
> user_data=0x7f09d5b2d540) at io/channel-watch.c:84
> #3  0x7f09d2ea in g_main_context_dispatch () from 
> /lib64/libglib-2.0.so.0
> #4  0x7f09d45bfc96 in glib_pollfds_poll () at main-loop.c:213
> #5  0x7f09d45bfd73 in os_host_main_loop_wait (timeout=991853) at 
> main-loop.c:258
> #6  0x7f09d45bfe23 in main_loop_wait (nonblocking=0) at main-loop.c:506
> #7  0x7f09d43bbdd2 in main_loop () at vl.c:2119
> #8  0x7f09d43c380e in main (argc=56, argv=0x7ffe7ad669a8, 
> envp=0x7ffe7ad66b70) at vl.c:4896
> (gdb) f 1
> #1  0x7f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, 
> opaque=0x7f09d5b2d540) at qemu-char.c:2917
> 2917  tcp_chr_disconnect(chr);
> (gdb) p errno
> $1 = 11
> 
> EAGAIN = 11

I think what is happening is a race condition - the main event loop thread
sees I/O incoming triggering tcp_chr_read. Meanwhile the vhost-user.c file is
triggering vhost_user_read() in a different thread which consumes the incoming
I/O before tcp_chr_read can handle it.

IOW, you are right, we should have kept the EAGAIN handling code in tcp_chr_read
to avoid the disconnects.
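
Something along these lines (a sketch only, assuming tcp_chr_recv() still maps
the would-block case to -1 with errno set to EAGAIN, as the wrapper did at the
time):

    size = tcp_chr_recv(chr, (void *)buf, len);
    if (size == 0 ||
        (size < 0 && errno != EAGAIN && errno != EWOULDBLOCK)) {
        /* connection closed; a would-block read no longer forces a
         * disconnect */
        tcp_chr_disconnect(chr);
    } else if (size > 0) {
        ...
    }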

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] A question about this commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2

2016-08-25 Thread Gaohaifeng (A)
On Tue, Aug 23, 2016 at 08:57:44AM +, Gaohaifeng (A) wrote:
> Hi Daniel & Paolo,
> > 
> > Commit 9894dc0c "char: convert from GIOChannel to QIOChannel", about
> > 
> > the below code segment:
> > 
> > -static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, 
> > void *opaque)
> > +static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, 
> > +void *opaque)
> > {
> >  CharDriverState *chr = opaque;
> >  TCPCharDriver *s = chr->opaque;
> > @@ -2938,9 +2801,7 @@ static gboolean tcp_chr_read(GIOChannel *chan, 
> > GIOCondition cond, void *opaque)
> >  if (len > s->max_size)
> >  len = s->max_size;
> >  size = tcp_chr_recv(chr, (void *)buf, len);
> > -if (size == 0 ||
> > -(size < 0 &&
> > - socket_error() != EAGAIN && socket_error() != EWOULDBLOCK)) {
> > +if (size == 0 || size == -1) {
> >  /* connection closed */
> >  tcp_chr_disconnect(chr);
> >  } else if (size > 0) {
> > 
> > The old version will not call tcp_chr_disconnect when error is not 
> > EAGAIN.
> > The new version will call tcp_chr_disconnect when size==-1. From the 
> > tcp_chr_recv function we see EAGIN will return -1, so EAGIN will call 
> > tcp_chr_disconnect.
> > 
> > We meet an issue when Guest VM use DPDK(1.6.0) l2fwd, it may exit 
> > since link status is not up. The link is down because 
> > tcp_chr_disconnect will set it.
> > 
> > Can you explain it why EAGIN here changes the behavior ?

> tcp_chr_read is used in a call to io_add_watch_poll() which sets up an event 
> loop watch so that tcp_chr_read is called when there is incoming data to be 
> read.  IOW, when tcp_chr_read is invoked, I would always expect that at least 
> 1 byte is available, and so EAGAIN should not occurr.
> 
> So I'm puzzled why you would get EAGAIN at all, unless there is some kind of 
> race with other code also consuming bytes from the same socket.

It could easily reproduce this when repeating 'rmmod virtio_net;modprobe 
virtio_net'.
The call stack:
#0  tcp_chr_disconnect (chr=0x7f09d5b2d540) at qemu-char.c:2867
#1  0x7f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, 
opaque=0x7f09d5b2d540) at qemu-char.c:2917
#2  0x7f09d46364fa in qio_channel_fd_source_dispatch 
(source=0x7f093e603b30, callback=0x7f09d43b1fcb , 
user_data=0x7f09d5b2d540) at io/channel-watch.c:84
#3  0x7f09d2ea in g_main_context_dispatch () from 
/lib64/libglib-2.0.so.0
#4  0x7f09d45bfc96 in glib_pollfds_poll () at main-loop.c:213
#5  0x7f09d45bfd73 in os_host_main_loop_wait (timeout=991853) at 
main-loop.c:258
#6  0x7f09d45bfe23 in main_loop_wait (nonblocking=0) at main-loop.c:506
#7  0x7f09d43bbdd2 in main_loop () at vl.c:2119
#8  0x7f09d43c380e in main (argc=56, argv=0x7ffe7ad669a8, 
envp=0x7ffe7ad66b70) at vl.c:4896
(gdb) f 1
#1  0x7f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, 
opaque=0x7f09d5b2d540) at qemu-char.c:2917
2917tcp_chr_disconnect(chr);
(gdb) p errno
$1 = 11

EAGAIN = 11

> 
> Regards,
> Daniel


Re: [Qemu-devel] A question about this commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2

2016-08-24 Thread Daniel P. Berrange
On Tue, Aug 23, 2016 at 08:57:44AM +, Gaohaifeng (A) wrote:
> Hi Daniel & Paolo,
> 
> Commit 9894dc0c "char: convert from GIOChannel to QIOChannel", about
> 
> the below code segment:
> 
> -static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void 
> *opaque)
> +static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, void 
> *opaque)
> {
>  CharDriverState *chr = opaque;
>  TCPCharDriver *s = chr->opaque;
> @@ -2938,9 +2801,7 @@ static gboolean tcp_chr_read(GIOChannel *chan, 
> GIOCondition cond, void *opaque)
>  if (len > s->max_size)
>  len = s->max_size;
>  size = tcp_chr_recv(chr, (void *)buf, len);
> -if (size == 0 ||
> -(size < 0 &&
> - socket_error() != EAGAIN && socket_error() != EWOULDBLOCK)) {
> +if (size == 0 || size == -1) {
>  /* connection closed */
>  tcp_chr_disconnect(chr);
>  } else if (size > 0) {
> 
> The old version will not call tcp_chr_disconnect when error is not
> EAGAIN.
> The new version will call tcp_chr_disconnect when size==-1. From the
> tcp_chr_recv function we see EAGIN will return -1, so EAGIN will call
> tcp_chr_disconnect.
> 
> We meet an issue when Guest VM use DPDK(1.6.0) l2fwd, it may exit since
> link status is not up. The link is down because tcp_chr_disconnect will
> set it.
> 
> Can you explain it why EAGIN here changes the behavior ?

tcp_chr_read is used in a call to io_add_watch_poll() which sets up an
event loop watch so that tcp_chr_read is called when there is incoming
data to be read.  IOW, when tcp_chr_read is invoked, I would always
expect that at least 1 byte is available, and so EAGAIN should not
occurr.

So I'm puzzled why you would get EAGAIN at all, unless there is some
kind of race with other code also consuming bytes from the same
socket.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] A question about tb_next_offset[2]

2016-07-27 Thread Peter Maydell
On 26 July 2016 at 19:49, Kartik Ramkrishnan  wrote:
> Hello all,
>
>I am running an i386 binary in user mode using qemu.
>
>In the code, I am looking for the next location that the simulated
> program counter jumps to when a TranslationBlock completes execution. This
> address should be the guest address, not the address on the host.

The place we go to is whatever the PC in the CPUState says
at the point when we finish execution of the TB.

> I found a variable called tb_next_offset[2] in 'struct TranslationBlock' in
> exec-all.h ,  which says (offset of original jump target). It sounds like
> the jump offset can be added to the target code's last address to obtain
> the new PC value.

This field was renamed earlier this year, so it's called
jmp_reset_offset.

It's part of an optimisation where rather than finishing the
TB, and then going out to the top level loop and looking up
the next TB from the guest PC, we can patch the first TB
to directly jump to the second TB. jmp_reset_offset is the
offset within the generated code of the branch instruction
which we need to patch in order to create this direct link.
It has nothing to do with the guest address.

thanks
-- PMM



Re: [Qemu-devel] A question about 9894dc0cdcc397ee5b26370bc53da6d360a363c2

2016-07-19 Thread Gonglei (Arei)
Hi,

I got it, thanks for your quick reply :)


Regards,
-Gonglei


> -Original Message-
> From: Daniel P. Berrange [mailto:berra...@redhat.com]
> Sent: Monday, July 18, 2016 5:07 PM
> To: Gonglei (Arei)
> Cc: Paolo Bonzini; Lilijun (Jerry); qemu-devel@nongnu.org
> Subject: Re: A question about 9894dc0cdcc397ee5b26370bc53da6d360a363c2
> 
> On Mon, Jul 18, 2016 at 08:41:32AM +, Gonglei (Arei) wrote:
> > Hi Daniel & Paolo,
> >
> > Commit 9894dc0c "char: convert from GIOChannel to QIOChannel",
> > about the below code segment:
> >
> > static bool qemu_chr_open_socket_fd(CharDriverState *chr, Error **errp)
> >  {
> >  TCPCharDriver *s = chr->opaque;
> > -int fd;
> > +QIOChannelSocket *sioc = qio_channel_socket_new();
> >
> >  if (s->is_listen) {
> > -fd = socket_listen(s->addr, errp);
> > +if (qio_channel_socket_listen_sync(sioc, s->addr, errp) < 0) {
> > +goto fail;
> > +}
> > +qemu_chr_finish_socket_connection(chr, sioc);
> >  } else if (s->reconnect_time) {
> > -fd = socket_connect(s->addr, errp, qemu_chr_socket_connected,
> chr);
> > -return fd >= 0;
> > +qio_channel_socket_connect_async(sioc, s->addr,
> > +
> qemu_chr_socket_connected,
> > + chr, NULL);
> >  } else {
> > -fd = socket_connect(s->addr, errp, NULL, NULL);
> > -}
> > -if (fd < 0) {
> > -return false;
> > +if (qio_channel_socket_connect_sync(sioc, s->addr, errp) < 0) {
> > +goto fail;
> > +}
> > +qemu_chr_finish_socket_connection(chr, sioc);
> >  }
> >
> > -qemu_chr_finish_socket_connection(chr, fd);
> >  return true;
> > +
> > + fail:
> > +object_unref(OBJECT(sioc));
> > +return false;
> >  }
> >
> > Why did you change socket_connect() to qio_channel_socket_connect_async
> > but not qio_channel_socket_connect_sync when s->reconnect_time is ture?
> Thanks.
> > I can't get the reason from the commit message.
> 
> When the socket_connect() function is called with a non-null value for
> the NonBlockingConnectHandler parameter, it'll put the socket into
> non-blocking mode before doing the connection. Therefore socket_connect()
> method is no longer guaranteed to actually connect to the socket - it
> may simply start the connection and leave an event handler to finish
> it. This avoids blocking the caller waiting for the connectionb to
> finish which may take non-negligible time with TCP sockets.
> 
> The qio_channel_socket_connect_sync() method will always block until
> completion of the connect, so we use qio_channel_socket_connect_async
> which always does the connection attempt in the background.
> 
> The old code which may or may not finish the connect() in the background
> was a recipe for bugs because it forced callers to have 2 completely
> separate codepaths for the same API call and chances are only one of
> those code paths would be tested by the developers. With the code code
> we're guaranteed to always complete in the background, so the callers
> only have 1 codepath to worry about and so they'll be forced to ensure
> that codepath works.
> 
> > PS: We encountered a problem that when config vhost-user-net reconneticon
> function,
> > the virtio_net_get_features() didn't get a correct value after this changed,
> and it was
> > fixed when we change the async to sync.
> 
> It seems that you were mostly lucky with the old code in that presumably
> socket_connect() would complete the connection fully. There was never any
> guarantee of this though - it was perfectly feasible that socket_connect()
> would finish the connect in the background. So IMHO the vhost user code
> was already buggy if it doesn't cope with the connect finishing in the
> background.
> 
> 
> Regards,
> Daniel
> --
> |: http://berrange.com  -o-
> http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o-
> http://virt-manager.org :|
> |: http://autobuild.org   -o-
> http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org   -o-
> http://live.gnome.org/gtk-vnc :|


Re: [Qemu-devel] A question about 9894dc0cdcc397ee5b26370bc53da6d360a363c2

2016-07-18 Thread Daniel P. Berrange
On Mon, Jul 18, 2016 at 08:41:32AM +, Gonglei (Arei) wrote:
> Hi Daniel & Paolo,
> 
> Commit 9894dc0c "char: convert from GIOChannel to QIOChannel", 
> about the below code segment:
> 
> static bool qemu_chr_open_socket_fd(CharDriverState *chr, Error **errp)
>  {
>  TCPCharDriver *s = chr->opaque;
> -int fd;
> +QIOChannelSocket *sioc = qio_channel_socket_new();
>  
>  if (s->is_listen) {
> -fd = socket_listen(s->addr, errp);
> +if (qio_channel_socket_listen_sync(sioc, s->addr, errp) < 0) {
> +goto fail;
> +}
> +qemu_chr_finish_socket_connection(chr, sioc);
>  } else if (s->reconnect_time) {
> -fd = socket_connect(s->addr, errp, qemu_chr_socket_connected, chr);
> -return fd >= 0;
> +qio_channel_socket_connect_async(sioc, s->addr,
> + qemu_chr_socket_connected,
> + chr, NULL);
>  } else {
> -fd = socket_connect(s->addr, errp, NULL, NULL);
> -}
> -if (fd < 0) {
> -return false;
> +if (qio_channel_socket_connect_sync(sioc, s->addr, errp) < 0) {
> +goto fail;
> +}
> +qemu_chr_finish_socket_connection(chr, sioc);
>  }
>  
> -qemu_chr_finish_socket_connection(chr, fd);
>  return true;
> +
> + fail:
> +object_unref(OBJECT(sioc));
> +return false;
>  }
> 
> Why did you change socket_connect() to qio_channel_socket_connect_async
> but not qio_channel_socket_connect_sync when s->reconnect_time is ture? 
> Thanks.
> I can't get the reason from the commit message.

When the socket_connect() function is called with a non-null value for
the NonBlockingConnectHandler parameter, it'll put the socket into
non-blocking mode before doing the connection. Therefore socket_connect()
method is no longer guaranteed to actually connect to the socket - it
may simply start the connection and leave an event handler to finish
it. This avoids blocking the caller waiting for the connection to
finish which may take non-negligible time with TCP sockets.

The qio_channel_socket_connect_sync() method will always block until
completion of the connect, so we use qio_channel_socket_connect_async
which always does the connection attempt in the background.

The old code which may or may not finish the connect() in the background
was a recipe for bugs because it forced callers to have 2 completely
separate codepaths for the same API call and chances are only one of
those code paths would be tested by the developers. With the new code
we're guaranteed to always complete in the background, so the callers
only have 1 codepath to worry about and so they'll be forced to ensure
that codepath works.

> PS: We encountered a problem that when config vhost-user-net reconneticon 
> function,
> the virtio_net_get_features() didn't get a correct value after this changed, 
> and it was
> fixed when we change the async to sync.

It seems that you were mostly lucky with the old code in that presumably
socket_connect() would complete the connection fully. There was never any
guarantee of this though - it was perfectly feasible that socket_connect()
would finish the connect in the background. So IMHO the vhost user code
was already buggy if it doesn't cope with the connect finishing in the
background.


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [Nbd] question on ioctl NBD_SET_FLAGS

2016-05-04 Thread Alex Bligh

On 20 Apr 2016, at 17:18, Wouter Verhelst  wrote:

> Hi Eric,
> 
> On Wed, Apr 20, 2016 at 09:42:02AM -0600, Eric Blake wrote:
> [...]
>> But in 3.9, the overlap bug was still present, and the set of global
>> flags had grown to include NBD_FLAG_NO_ZEROS (commit 038e066), which
>> overlaps with NBD_FLAG_READ_ONLY.  Ouch.  This means that a client
>> talking to a server that advertises NO_ZEROES means that the client will
>> mistakenly tell the kernel to treat EVERY export as read-only, even if
>> the client doesn't respond with NBD_FLAG_C_NO_ZEROES.
>> 
>> 3.10 fixed things; negotiate() now uses uint16_t *flags (instead of u32
>> *), and no longer tries to merge global flags with transmission flags;
>> only the transmission flags are ever passed to the kernel via
>> NBD_SET_FLAGS.  Maybe it's good that there was only 2 weeks between 3.9
>> and 3.10, so hopefully few distros are trying to ship that broken version.
> 
> Well, yeah, since 3.10 was an "oops" release when 3.9 exposed that bug
> (which indeed had existed for a while) and which was reported quite
> quickly on the list. Released versions of nbd which have the bug exist
> though, and trying to have a 3.8 (or below) client talk to a 3.9 (or
> above) server has the same issue.
> 
> I decided that there was no way in which I could fix it, and that "the
> export is readonly" is bad but not a "critical data loss" kind of bug,
> so releasing 3.10 was pretty much the only sane thing I could do (other
> than delaying NO_ZEROES, which might have worked too).

For what it's worth, Ubuntu 14.04 (still under long term
support) ships with nbd-client 3.7 and I've just had a bug
report filed against gonbdserver where all exports go readonly.
   https://github.com/abligh/gonbdserver/issues/4

I've added an option to disable NBD_FLAG_NO_ZEROS on nbd-server
but that's pretty horrible. And I think this will affect all
pre-3.10 clients talking to 3.9 and later servers.

I'm going to find a minimal patch to nbd-client and offer that
to Ubuntu / Debian.

This message is here in part so I have something to point them at
on the mailing list :-)

-- 
Alex Bligh







Re: [Qemu-devel] [Nbd] question on ioctl NBD_SET_FLAGS

2016-04-20 Thread Wouter Verhelst
Hi Eric,

On Wed, Apr 20, 2016 at 09:42:02AM -0600, Eric Blake wrote:
[...]
> But in 3.9, the overlap bug was still present, and the set of global
> flags had grown to include NBD_FLAG_NO_ZEROS (commit 038e066), which
> overlaps with NBD_FLAG_READ_ONLY.  Ouch.  This means that a client
> talking to a server that advertises NO_ZEROES means that the client will
> mistakenly tell the kernel to treat EVERY export as read-only, even if
> the client doesn't respond with NBD_FLAG_C_NO_ZEROES.
> 
> 3.10 fixed things; negotiate() now uses uint16_t *flags (instead of u32
> *), and no longer tries to merge global flags with transmission flags;
> only the transmission flags are ever passed to the kernel via
> NBD_SET_FLAGS.  Maybe it's good that there was only 2 weeks between 3.9
> and 3.10, so hopefully few distros are trying to ship that broken version.

Well, yeah, since 3.10 was an "oops" release when 3.9 exposed that bug
(which indeed had existed for a while) and which was reported quite
quickly on the list. Released versions of nbd which have the bug exist
though, and trying to have a 3.8 (or below) client talk to a 3.9 (or
above) server has the same issue.

I decided that there was no way in which I could fix it, and that "the
export is readonly" is bad but not a "critical data loss" kind of bug,
so releasing 3.10 was pretty much the only sane thing I could do (other
than delaying NO_ZEROES, which might have worked too).

[...]
> But do we need to document in the kernel code that existing clients
> mistakenly pass too many bits to the NBD_SET_FLAGS ioctl, so that if we
> ever reach the future point where we need more than 16 transmission
> flags, AND where we have more than 2 global flags defined, existing qemu
> 2.5 clients don't confuse the kernel when calling NBD_SET_FLAGS?  Or do
> we think that it is unlikely enough to worry about, where by the time
> there are more than 16 transmission flags, users are likely to already
> be using new-enough qemu that doesn't send global flags to the kernel?

I'm going to assume the latter is the case, yeah, but we could skip bits
16 and 17 (the only two handshake flags currently defined) and use the
upper 14 bits before using the lower 2 if it does turn out to be a
problem.
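
To make the overlap concrete (flag values as in the protocol document; nbd_fd
is an illustrative file descriptor): bit 1 is NBD_FLAG_NO_ZEROES in the
handshake flags but NBD_FLAG_READ_ONLY in the transmission flags, so merging
the two sets into one word before NBD_SET_FLAGS turns every export read-only:

/* handshake (global) flags */
#define NBD_FLAG_FIXED_NEWSTYLE  (1 << 0)
#define NBD_FLAG_NO_ZEROES       (1 << 1)

/* transmission (per-export) flags */
#define NBD_FLAG_HAS_FLAGS       (1 << 0)
#define NBD_FLAG_READ_ONLY       (1 << 1)

uint32_t handshake    = NBD_FLAG_FIXED_NEWSTYLE | NBD_FLAG_NO_ZEROES;
uint16_t transmission = NBD_FLAG_HAS_FLAGS;     /* a writable export */

/* pre-3.10 clients effectively did this ... */
uint32_t merged = handshake | transmission;
/* ... and the kernel then sees NBD_FLAG_READ_ONLY (bit 1) set */

/* 3.10 behaviour: only the 16-bit transmission flags reach the ioctl */
ioctl(nbd_fd, NBD_SET_FLAGS, (unsigned long)transmission);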

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12



Re: [Qemu-devel] TCG question reg. tcg_cpu_exec()

2016-03-25 Thread Paolo Bonzini


On 14/03/2016 16:26, Pranith Kumar wrote:
> I am trying to understand what scenarios can result in no TBs
> executing for that CPU. My understanding is that there is a pending
> operation which needs to be handled before we can execute TBs from
> this CPU(I/O?).

For example the CPU could be halted and won't execute until it receives
an interrupt.

Paolo



Re: [Qemu-devel] Quick question on NVME on qemu

2015-10-12 Thread Igor Mammedov
On Sat, 10 Oct 2015 00:39:43 -0700
sahlot arvind  wrote:

> Hello,
> 
> Does qemu emulate a proper NVMe controller?  Can someone please tell me how
As far as I'm aware, QEMU doesn't emulate any NVMe controller yet.

> can I use it to emulate a proper NVMe controller? Is there some way to
> enable logging PCI accesses to NVMe controller so that I can see what
> accesses my driver or OS is doing?
> 
> Thanks in advance!
> Sahlot




Re: [Qemu-devel] Quick question on NVME on qemu

2015-10-12 Thread Kevin Wolf
Am 12.10.2015 um 10:05 hat Igor Mammedov geschrieben:
> On Sat, 10 Oct 2015 00:39:43 -0700
> sahlot arvind  wrote:
> 
> > Hello,
> > 
> > Does qemu emulate a proper NVMe controller?  Can someone please tell me how
> s far as I'm aware, QEMU doesn't emulate any NVMe controller yet.

It does, though I'm not sure how complete it is. You can use it with
-drive file=...,if=none,id=some_id -device nvme,drive=some_id
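
Spelled out with a concrete image that would look something like this (the
device also wants a serial property set):

  qemu-system-x86_64 ... \
      -drive file=nvme.img,format=raw,if=none,id=nvme0 \
      -device nvme,drive=nvme0,serial=deadbeef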

Kevin

> > can I use it to emulate a proper NVMe controller? Is there some way to
> > enable logging PCI accesses to NVMe controller so that I can see what
> > accesses my driver or OS is doing?
> > 
> > Thanks in advance!
> > Sahlot
> 
> 



Re: [Qemu-devel] Quick question on NVME on qemu

2015-10-12 Thread Keith Busch

On Mon, 12 Oct 2015, Kevin Wolf wrote:

Am 12.10.2015 um 10:05 hat Igor Mammedov geschrieben:

On Sat, 10 Oct 2015 00:39:43 -0700
sahlot arvind  wrote:


Hello,

Does qemu emulate a proper NVMe controller?  Can someone please tell me how

s far as I'm aware, QEMU doesn't emulate any NVMe controller yet.


It does, though I'm not sure how complete it is. You can use it with
-drive file=...,if=none,id=some_id -device nvme,drive=some_id


Yep, that's right. It is a basic 1.0 compliant (mostly) controller. There
are updates for up to 1.2 compliance that are in a public tree, but not
merged to mainline.

  http://git.infradead.org/users/kbusch/qemu-nvme.git


can I use it to emulate a proper NVMe controller? Is there some way to
enable logging PCI accesses to NVMe controller so that I can see what
accesses my driver or OS is doing?


It's not a problem to add more NVMe features. There's currently no such
logging mechanism, though. It shouldn't be too difficult to add if you
want to take a stab at it.



Re: [Qemu-devel] virtio: Question on VRING_DESC_F_WRITE flag

2015-05-18 Thread Stefan Hajnoczi
On Wed, May 13, 2015 at 07:16:13PM +, Subhashini Venkataraman wrote:
 I have a question on VRING_DESC_F_WRITE flag.  As per the specifications, if 
 this flag is set, then the corresponding buffer pointed by the descriptor is 
 Write only, otherwise Read Only.  I am trying to understand why there is no 
 READ_AND_WRITE option, i.e. a buffer could be Read by device emulation and 
 written back into.

All virtio devices are designed so that descriptors are read-only or
write-only.  A typical request looks like this:

 [header (RO)][data (RO)][status (WO)]

or

 [header (RO)][data (WO)][status (WO)]

Not sure what the original requirement for this design was, but the
read-only/write-only approach means that buffers could be sent over a
transport (like a pipe or TCP socket).  If a buffer was read-write then
it would involve sending back a modified copy, which is unnecessary.
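
As a quick sketch of how a virtio-blk read request maps onto this layout (req
is an illustrative driver-side structure, not the exact virtio_blk.ko code):

struct scatterlist hdr_sg, data_sg, status_sg;
struct scatterlist *sgs[3];

sg_init_one(&hdr_sg, &req->hdr, sizeof(req->hdr));   /* read-only for the device  */
sg_init_one(&data_sg, req->buf, req->len);           /* write-only on a disk read */
sg_init_one(&status_sg, &req->status, 1);            /* write-only status byte    */

sgs[0] = &hdr_sg;
sgs[1] = &data_sg;
sgs[2] = &status_sg;

/* the header is the only out_sg; data and status are in_sgs */
virtqueue_add_sgs(vq, sgs, 1, 2, req, GFP_ATOMIC);

For a disk write the data scatterlist simply moves into the out_sgs group
instead; no buffer ever needs to be both read and written by the device.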

Stefan




Re: [Qemu-devel] One question to lowlevel/xl/xl.c and lowlevel/xc/xc.c

2015-03-25 Thread Ian Campbell
On Wed, 2015-03-25 at 09:18 +0800, Chen, Tiejun wrote:
 Actually my problem is that, I need to add a new parameter, 'flag', like 
 this, xc_assign_device(xxx,xxx,flag). So if I don't refine xc.c, tools 
 can't be compiled successfully. Or maybe you're suggesting I may isolate 
 this file while building tools, right?

I answered this in my original reply:
 it is acceptable for the new
parameter to not be plumbed up to the Python bindings until someone
comes along with a requirement to use it from Python. IOW you can just
pass whatever the nop value is for the new argument.

The nop value is whatever value should be passed to retain the current
behaviour.

Ian.




Re: [Qemu-devel] One question to lowlevel/xl/xl.c and lowlevel/xc/xc.c

2015-03-25 Thread Chen, Tiejun

On 2015/3/25 18:26, Ian Campbell wrote:

On Wed, 2015-03-25 at 09:18 +0800, Chen, Tiejun wrote:

Actually my problem is that, I need to add a new parameter, 'flag', like
this, xc_assign_device(xxx,xxx,flag). So if I don't refine xc.c, tools
can't be compiled successfully. Or maybe you're suggesting I may isolate
this file while building tools, right?


I answered this in my original reply:
  it is acceptable for the new
 parameter to not be plumbed up to the Python bindings until someone
 comes along with a requirement to use it from Python. IOW you can just
 pass whatever the nop value is for the new argument.



Yes, I knew this, but I was getting a little confused again once we started 
to talk about whether xc.c is compiled...


And I will try to do this.


The nop value is whatever value should be passed to retain the current
behaviour.


Sure.

Thanks
Tiejun




Re: [Qemu-devel] One question to lowlevel/xl/xl.c and lowlevel/xc/xc.c

2015-03-24 Thread Ian Campbell
On Tue, 2015-03-24 at 18:15 +0800, Chen, Tiejun wrote:
 On 2015/3/24 17:51, Ian Campbell wrote:
  On Tue, 2015-03-24 at 16:47 +0800, Chen, Tiejun wrote:
  All guys,
 
 
 Thanks for your reply.
 
  Sorry to bother you.
 
  I have a question to two files, tools/python/xen/lowlevel/xc/xc.c and
  tools/python/xen/lowlevel/xl/xl.c. Who is a caller to those methods like
  pyxc_methods[] and pyxl_methods[]?
 
  They are registered with the Python runtime, so they are called from
  Python code. The first member of the struct is the pythonic function
 
 Sorry I don't understanding this. So seems you mean instead of xl, this 
 is called by the third party user with python?

Yes, tools/python/xen is the python bindings for various C libraries
supported by Xen.

NB, the libxl ones are broken and not even compiled right now, you can
ignore them.

 
  name, e.g. from xc.c:
   { domain_create,
 
 Otherwise, often we always perform `xl create xxx' to create a VM. So I 
 think this should go into this flow like this,
 
 xl_cmdtable.c:main_create()
   |
   + create_domain()
   |
   + libxl_domain_create_new()
   |
   + do_domain_create()
   |
   + 
 Right?

Yes, xl is written in C not python so tools/python doesn't enter the
picture.

 
 (PyCFunction)pyxc_domain_create,
 
 So I don't see 'pyxc_domain_create' is called. Or I'm missing something...

Chances are that there are no intree users of this code any more, xend
would have used it at one time with something like:
import xen.lowlevel.xc
xc = xen.lowlevel.xc.xc()
xc.domain_create()
etc.

 
  In my specific case, I'm trying to introduce a new flag to each device
  while assigning device. So this means I have to add a parameter, 'flag',
  into
 
  int xc_assign_device(
xc_interface *xch,
uint32_t domid,
uint32_t machine_sbdf)
 
  Then this is extended as
 
  int xc_assign_device(
xc_interface *xch,
uint32_t domid,
uint32_t machine_sbdf,
uint32_t flag)
 
  After this introduction, obviously I should cover all cases using
  xc_assign_device(). And also I found this fallout goes into these two
  files. For example, here pyxc_assign_device() is involved. Currently it
  has two parameters, 'dom' and 'pci_str', and as I understand 'pci_str'
  should represent all pci devices with SBDF format, right?
 
  It appears so, yes.
 
 But I don't know exactly what rule should be followed to construct this
  sort of flag into 'pci_str', or any reasonable idea to achieve my goal?
 
  If it is non-trivial to fix them IMHO it is acceptable for the new
  parameter to not be plumbed up to the Python bindings until someone
  comes along with a requirement to use it from Python. IOW you can just
  pass whatever the nop value is for the new argument.
 
 
 Should I extend this 'pci_str' like Seg,bus,device,function:flag? But 
 I'm not sure if I'm breaking the existing usage since like I said, I 
 don't know what scenarios are using these methods.

Like I said in the paragraph above, if it is complicated then it is fine
to ignore this new parameter from Python.

I don't know what the semantics of flag is, if it is per SBDF then I
suppose if you really wanted to expose this here then you would need to
invent some syntax for doing so.

Ian.
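
(For illustration, the extended libxc call and a caller that keeps the
current behaviour by passing the nop value might look roughly like the
sketch below; the flag name XC_DEVICE_NO_FLAG and the variable names are
made up here, they are not the names used in the real tree.)

    /* extended prototype, as discussed above (sketch only) */
    int xc_assign_device(
        xc_interface *xch,
        uint32_t domid,
        uint32_t machine_sbdf,
        uint32_t flag);

    /* a caller that does not care about the new behaviour just passes the
       nop value, so the current semantics are retained */
    #define XC_DEVICE_NO_FLAG 0

    int rc = xc_assign_device(xch, domid, machine_sbdf, XC_DEVICE_NO_FLAG);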




Re: [Qemu-devel] One question to lowlevel/xl/xl.c and lowlevel/xc/xc.c

2015-03-24 Thread Chen, Tiejun

On 2015/3/24 18:20, Ian Campbell wrote:

On Tue, 2015-03-24 at 18:15 +0800, Chen, Tiejun wrote:

On 2015/3/24 17:51, Ian Campbell wrote:

On Tue, 2015-03-24 at 16:47 +0800, Chen, Tiejun wrote:

All guys,



Thanks for your reply.


Sorry to bother you.

I have a question about two files, tools/python/xen/lowlevel/xc/xc.c and
tools/python/xen/lowlevel/xl/xl.c. Who is the caller of those methods like
pyxc_methods[] and pyxl_methods[]?


They are registered with the Python runtime, so they are called from
Python code. The first member of the struct is the pythonic function


Sorry, I don't understand this. So it seems you mean that instead of xl, this
is called by third-party users from Python?


Yes, tools/python/xen is the python bindings for various C libraries
supported by Xen.


Thanks for your explanation.



NB, the libxl ones are broken and not even compiled right now, you can
ignore them.


Looks this is still compiled now.






name, e.g. from xc.c:
  { domain_create,


Otherwise, we usually perform `xl create xxx' to create a VM. So I
think this should go into this flow like this,

xl_cmdtable.c:main_create()
|
+ create_domain()
|
+ libxl_domain_create_new()
|
+ do_domain_create()
|
+ 
Right?


Yes, xl is written in C not python so tools/python doesn't enter the
picture.


Yeah.






(PyCFunction)pyxc_domain_create,


So I don't see where 'pyxc_domain_create' is called. Or maybe I'm missing something...


Chances are that there are no intree users of this code any more, xend
would have used it at one time with something like:
import xen.lowlevel.xc
 xc = xen.lowlevel.xc.xc()
 xc.domain_create()
etc.




In my specific case, I'm trying to introduce a new flag to each device
while assigning device. So this means I have to add a parameter, 'flag',
into

int xc_assign_device(
   xc_interface *xch,
   uint32_t domid,
   uint32_t machine_sbdf)

Then this is extended as

int xc_assign_device(
   xc_interface *xch,
   uint32_t domid,
   uint32_t machine_sbdf,
   uint32_t flag)

After this introduction, obviously I should cover all cases using
xc_assign_device(). And also I found this fallout goes into these two
files. For example, here pyxc_assign_device() is involved. Currently it
has two parameters, 'dom' and 'pci_str', and as I understand 'pci_str'
should represent all pci devices with SBDF format, right?


It appears so, yes.


But I don't know exactly what rule should be followed to construct this
sort of flag into 'pci_str', or any reasonable idea to achieve my goal?


If it is non-trivial to fix them IMHO it is acceptable for the new
parameter to not be plumbed up to the Python bindings until someone
comes along with a requirement to use it from Python. IOW you can just
pass whatever the nop value is for the new argument.



Should I extend this 'pci_str' like Seg,bus,device,function:flag? But
I'm not sure if I'm breaking the existing usage since like I said, I
don't know what scenarios are using these methods.


Like I said in the paragraph above, if it is complicated then it is fine
to ignore this new parameter from Python.

I don't know what the semantics of flag is, if it is per SBDF then I


Yes, this should be a flag specific to a SBDF.

You know, I'm working to fix RMRR completely. Based on some discussion 
about that design (I assume you may have read that thread previously :)), 
now we probably need to pass a flag to introduce our policy.



suppose if you really wanted to expose this here then you would need to
invent some syntax for doing so.



Definitely.

When I finish this I will send it to you to review technically.

Again, really appreciate your clarification to me.

Thanks
Tiejun



Re: [Qemu-devel] One question to lowlevel/xl/xl.c and lowlevel/xc/xc.c

2015-03-24 Thread Ian Campbell
On Tue, 2015-03-24 at 18:31 +0800, Chen, Tiejun wrote:
  NB, the libxl ones are broken and not even compiled right now, you can
  ignore them.
 
 Looks this is still compiled now.

xc is, xl is not, I am sure of that.

  I don't know what the semantics of flag is, if it is per SBDF then I
 
 Yes, this should be a flag specific to a SBDF.
 
 You know, I'm working to fix RMRR completely. Based on some discussion 
 about that design (I assume you may have read that thread previously :)), 
 now we probably need to pass a flag to introduce our policy.

Unless you have a concrete requirement to expose RMRR via the Python
bindings to libxc (i.e. you know somebody is using them) then I think
you should not bother.

Making RMRR work via the (C) interface to libxl used by xl and libvirt
is sufficient for a new in tree feature.

Ian.
  suppose if you really wanted to expose this here then you would need to
  invent some syntax for doing so.
 
 
 Definitely.
 
 When I finish this I will send it to you to review technically.
 
 Again, really appreciate your clarification to me.
 
 Thanks
 Tiejun





Re: [Qemu-devel] One question to lowlevel/xl/xl.c and lowlevel/xc/xc.c

2015-03-24 Thread Ian Campbell
On Tue, 2015-03-24 at 16:47 +0800, Chen, Tiejun wrote:
 All guys,
 
 Sorry to bother you.
 
 I have a question about two files, tools/python/xen/lowlevel/xc/xc.c and 
 tools/python/xen/lowlevel/xl/xl.c. Who is the caller of those methods like 
 pyxc_methods[] and pyxl_methods[]?

They are registered with the Python runtime, so they are called from
Python code. The first member of the struct is the pythonic function
name, e.g. from xc.c:
{ "domain_create",
  (PyCFunction)pyxc_domain_create,
  METH_VARARGS | METH_KEYWORDS, "\n"
  "Create a new domain.\n"
  " dom [int, 0]: Domain identifier to use (allocated if zero).\n"
  "Returns: [int] new domain identifier; -1 on error.\n" },
defines a method called domain_create, in the xen.lowlevel.xc namespace.

  And how should we call these approaches?

I'm not sure what you are asking here.

 In my specific case, I'm trying to introduce a new flag to each device 
 while assigning device. So this means I have to add a parameter, 'flag', 
 into
 
 int xc_assign_device(
  xc_interface *xch,
  uint32_t domid,
  uint32_t machine_sbdf)
 
 Then this is extended as
 
 int xc_assign_device(
  xc_interface *xch,
  uint32_t domid,
  uint32_t machine_sbdf,
  uint32_t flag)
 
 After this introduction, obviously I should cover all cases using 
 xc_assign_device(). And also I found this fallout goes into these two 
 files. For example, here pyxc_assign_device() is involved. Currently it 
 has two parameters, 'dom' and 'pci_str', and as I understand 'pci_str' 
 should represent all pci devices with SBDF format, right?

It appears so, yes.

 But I don't know exactly what rule should be followed to construct this 
 sort of flag into 'pci_str', or any reasonable idea to achieve my goal?

If it is non-trivial to fix them IMHO it is acceptable for the new
parameter to not be plumbed up to the Python bindings until someone
comes along with a requirement to use it from Python. IOW you can just
pass whatever the nop value is for the new argument.

Ian.
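
(As a rough sketch of how a pyxc_methods[]-style table becomes callable from
Python with the Python 2 C API used by the in-tree bindings; the real xc.c
wraps this in a class object, so the details differ.)

    #include <Python.h>

    static PyObject *pyxc_domain_create(PyObject *self, PyObject *args,
                                        PyObject *kwds)
    {
        /* would call into libxc here */
        return PyInt_FromLong(0);
    }

    static PyMethodDef pyxc_methods[] = {
        { "domain_create", (PyCFunction)pyxc_domain_create,
          METH_VARARGS | METH_KEYWORDS, "Create a new domain.\n" },
        { NULL, NULL, 0, NULL }
    };

    /* registering the table is what makes xc.domain_create(...) callable
       by name from Python code */
    PyMODINIT_FUNC initxc(void)
    {
        Py_InitModule("xc", pyxc_methods);
    }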




Re: [Qemu-devel] One question to lowlevel/xl/xl.c and lowlevel/xc/xc.c

2015-03-24 Thread Chen, Tiejun

On 2015/3/24 17:51, Ian Campbell wrote:

On Tue, 2015-03-24 at 16:47 +0800, Chen, Tiejun wrote:

All guys,



Thanks for your reply.


Sorry to bother you.

I have a question about two files, tools/python/xen/lowlevel/xc/xc.c and
tools/python/xen/lowlevel/xl/xl.c. Who is the caller of those methods like
pyxc_methods[] and pyxl_methods[]?


They are registered with the Python runtime, so they are called from
Python code. The first member of the struct is the pythonic function


Sorry, I don't understand this. So it seems you mean that instead of xl, this 
is called by third-party users from Python?



name, e.g. from xc.c:
 { domain_create,


Otherwise, we usually perform `xl create xxx' to create a VM. So I 
think this should go into this flow like this,


xl_cmdtable.c:main_create()
|
+ create_domain()
|
+ libxl_domain_create_new()
|
+ do_domain_create()
|
+ 
Right?


   (PyCFunction)pyxc_domain_create,


So I don't see where 'pyxc_domain_create' is called. Or maybe I'm missing something...


   METH_VARARGS | METH_KEYWORDS, \n
   Create a new domain.\n
dom[int, 0]:Domain identifier to use (allocated if 
zero).\n
   Returns: [int] new domain identifier; -1 on error.\n },
defines a method called domain_create, in the xen.lowlevel.xc namespace.


  And how should we call these approaches?


I'm not sure what you are asking here.


If you can give a real case to call this, things couldn't be better :)




In my specific case, I'm trying to introduce a new flag to each device
while assigning device. So this means I have to add a parameter, 'flag',
into

int xc_assign_device(
  xc_interface *xch,
  uint32_t domid,
  uint32_t machine_sbdf)

Then this is extended as

int xc_assign_device(
  xc_interface *xch,
  uint32_t domid,
  uint32_t machine_sbdf,
  uint32_t flag)

After this introduction, obviously I should cover all cases using
xc_assign_device(). And also I found this fallout goes into these two
files. For example, here pyxc_assign_device() is involved. Currently it
has two parameters, 'dom' and 'pci_str', and as I understand 'pci_str'
should represent all pci devices with SBDF format, right?


It appears so, yes.


But I don't know exactly what rule should be followed to construct this
sort of flag into 'pci_str', or any reasonable idea to achieve my goal?


If it is non-trivial to fix them IMHO it is acceptable for the new
parameter to not be plumbed up to the Python bindings until someone
comes along with a requirement to use it from Python. IOW you can just
pass whatever the nop value is for the new argument.



Should I extend this 'pci_str' like Seg,bus,device,function:flag? But 
I'm not sure if I'm breaking the existing usage since like I said, I 
don't know what scenarios are using these methods.


Thanks
Tiejun



Re: [Qemu-devel] One question to lowlevel/xl/xl.c and lowlevel/xc/xc.c

2015-03-24 Thread Chen, Tiejun

On 2015/3/24 18:40, Ian Campbell wrote:

On Tue, 2015-03-24 at 18:31 +0800, Chen, Tiejun wrote:

NB, the libxl ones are broken and not even compiled right now, you can
ignore them.


Looks this is still compiled now.


xc is, xl is not, I am sure of that.


Indeed, you're right :)




I don't know what the semantics of flag is, if it is per SBDF then I


Yes, this should be a flag specific to a SBDF.

You know, I'm working to fix RMRR completely. Based on some discussion
about that design (I assume you may have read that thread previously :)),
now we probably need to pass a flag to introduce our policy.


Unless you have a concrete requirement to expose RMRR via the Python
bindings to libxc (i.e. you know somebody is using them) then I think
you should not bother.


Actually my problem is that, I need to add a new parameter, 'flag', like 
this, xc_assign_device(xxx,xxx,flag). So if I don't refine xc.c, tools 
can't be compiled successfully. Or maybe you're suggesting I may isolate 
this file while building tools, right?




Making RMRR work via the (C) interface to libxl used by xl and libvirt
is sufficient for a new in tree feature.


Yeah.

Thanks
Tiejun



Ian.

suppose if you really wanted to expose this here then you would need to
invent some syntax for doing so.



Definitely.

When I finish this I will send it to you to review technically.

Again, really appreciate your clarification to me.

Thanks
Tiejun








Re: [Qemu-devel] Dummy question for setting up a serial connection between host and guest

2015-03-13 Thread Markus Armbruster
Christopher Covington c...@codeaurora.org writes:

 Hi Alex,

 On 03/10/2015 09:12 PM, Alex Sun wrote:
 I downloaded QEMU 2.2.0, and built a qemu-system-arm from there.
 I loaded a versatile kernel 2.6.32.5 and my own file system.
 
 #qemu-system-arm -pidfile /tmp/qemu_0_pids/0.pid -M versatilepb -option-rom
 efi-rtl8139.rom -initrd newinitrd -kernel vmlinuz-2.6.32-5-versatile -append
 root=/dev/sda1 console=ttyAMA0 -hda new.qcow2 -drive file=fat:rw:./data
 -nographic -no-reboot
 
 I see following message while the kernel is booting:
 
 [3.873429] dev:f1: ttyAMA0 at MMIO 0x101f1000 (irq = 12) is a AMBA/PL011
 [3.884631] console [ttyAMA0] enabled
 [3.889783] dev:f2: ttyAMA1 at MMIO 0x101f2000 (irq = 13) is a AMBA/PL011
 [3.890646] dev:f3: ttyAMA2 at MMIO 0x101f3000 (irq = 14) is a AMBA/PL011
 
 I think the serial ports of ttyAMA[0-2] are enabled.
 Then I try to do serial redirection:
 
 #qemu-system-arm -pidfile /tmp/qemu_0_pids/0.pid -M versatilepb -option-rom
 efi-rtl8139.rom -initrd newinitrd -kernel vmlinuz-2.6.32-5-versatile -append
 root=/dev/sda1 console=ttyAMA0 -hda new.qcow2 -drive file=fat:rw:./data
 -nographic -no-reboot *-serial stdio -serial pty*

 However, I get error message:
 QEMU 2.2.0 monitor - type 'help' for more information
 *(qemu) qemu-system-arm: -serial stdio: cannot use stdio by multiple 
 character
 devices*
 
 It's weird to me. I don't think there is anything other than qemu itself
 holding stdio. 

You can connect multiple users to a character device in multiplexing
mode.  Quoting the fine manual:

A character device may be used in multiplexing mode by multiple
front-ends.  The key sequence of @key{Control-a} and @key{c} will
rotate the input focus between attached front-ends. Specify
@option{mux=on} to enable this mode.

 I think the conflict is that when you specify -nographic, the QEMU monitor
 uses stdio. I find it necessary to add -monitor none to all of my
 command lines.

I use -nodefaults.  One stone, many annoying birds.
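
(As a concrete sketch of the multiplexing approach quoted above: one stdio
chardev shared by the serial console and the monitor, with Control-a c to
switch between them.  Adapt the rest of the command line to your setup; the
versatilepb option is only carried over from the original command.)

    qemu-system-arm -M versatilepb ... -nographic \
        -chardev stdio,mux=on,id=char0 \
        -serial chardev:char0 \
        -mon chardev=char0,mode=readline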



Re: [Qemu-devel] Dummy question for setting up a serial connection between host and guest

2015-03-12 Thread Christopher Covington
Hi Alex,

On 03/10/2015 09:12 PM, Alex Sun wrote:
 I downloaded QEMU 2.2.0, and built a qemu-system-arm from there.
 I loaded a versatile kernel 2.6.32.5 and my own file system.
 
 #qemu-system-arm -pidfile /tmp/qemu_0_pids/0.pid -M versatilepb -option-rom
 efi-rtl8139.rom -initrd newinitrd -kernel vmlinuz-2.6.32-5-versatile -append
 root=/dev/sda1 console=ttyAMA0 -hda new.qcow2 -drive file=fat:rw:./data
 -nographic -no-reboot
 
 I see following message while the kernel is booting:
 
 [3.873429] dev:f1: ttyAMA0 at MMIO 0x101f1000 (irq = 12) is a AMBA/PL011
 [3.884631] console [ttyAMA0] enabled
 [3.889783] dev:f2: ttyAMA1 at MMIO 0x101f2000 (irq = 13) is a AMBA/PL011
 [3.890646] dev:f3: ttyAMA2 at MMIO 0x101f3000 (irq = 14) is a AMBA/PL011
 
 I think the serial ports of ttyAMA[0-2] are enabled.
 Then I try to do serial redirection:
 
 #qemu-system-arm -pidfile /tmp/qemu_0_pids/0.pid -M versatilepb -option-rom
 efi-rtl8139.rom -initrd newinitrd -kernel vmlinuz-2.6.32-5-versatile -append
 root=/dev/sda1 console=ttyAMA0 -hda new.qcow2 -drive file=fat:rw:./data
 -nographic -no-reboot *-serial stdio -serial pty*

 However, I get error message:
 QEMU 2.2.0 monitor - type 'help' for more information
 *(qemu) qemu-system-arm: -serial stdio: cannot use stdio by multiple character
 devices*
 
 It's weird to me. I don't think there is anything other than qemu itself
 holding stdio. 

I think the conflict is that when you specify -nographic, the QEMU monitor
uses stdio. I find it necessary to add -monitor none to all of my command 
lines.

Regards,
Chris

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [Qemu-devel] Dummy question for setting up a serial connection between host and guest

2015-03-12 Thread Alex Sun
Thank you Chris!
Your suggestion is really working!!! Awesome!!!
 I find that in this way, if I input ^C in stdio, I'm gonna kill the qemu
process. But I think it's fine for me ;-).

Did you manage to make serial communication work between the host (which runs
the QEMU client) and the guest (QEMU client)?
I tried to use -serial pty but realized that it is only one-way traffic,
e.g. as I experimented, I can send traffic from guest to host but I can't
do that from host to guest.

If it's not gonna work, I guess I have to use a socket to make that
communication.

Thanks!
Alex


On Thu, Mar 12, 2015 at 9:33 AM, Christopher Covington c...@codeaurora.org
wrote:

 Hi Alex,

 On 03/10/2015 09:12 PM, Alex Sun wrote:
  I downloaded QEMU 2.2.0, and built a qemu-system-arm from there.
  I loaded a versatile kernel 2.6.32.5 and my own file system.
 
  #qemu-system-arm -pidfile /tmp/qemu_0_pids/0.pid -M versatilepb
 -option-rom
  efi-rtl8139.rom -initrd newinitrd -kernel vmlinuz-2.6.32-5-versatile
 -append
  root=/dev/sda1 console=ttyAMA0 -hda new.qcow2 -drive file=fat:rw:./data
  -nographic -no-reboot
 
  I see following message while the kernel is booting:
 
  [3.873429] dev:f1: ttyAMA0 at MMIO 0x101f1000 (irq = 12) is a
 AMBA/PL011
  [3.884631] console [ttyAMA0] enabled
  [3.889783] dev:f2: ttyAMA1 at MMIO 0x101f2000 (irq = 13) is a
 AMBA/PL011
  [3.890646] dev:f3: ttyAMA2 at MMIO 0x101f3000 (irq = 14) is a
 AMBA/PL011
 
  I think the serial ports of ttyAMA[0-2] are enabled.
  Then I try to do serial redirection:
 
  #qemu-system-arm -pidfile /tmp/qemu_0_pids/0.pid -M versatilepb
 -option-rom
  efi-rtl8139.rom -initrd newinitrd -kernel vmlinuz-2.6.32-5-versatile
 -append
  root=/dev/sda1 console=ttyAMA0 -hda new.qcow2 -drive file=fat:rw:./data
  -nographic -no-reboot *-serial stdio -serial pty*
 
  However, I get error message:
  QEMU 2.2.0 monitor - type 'help' for more information
  *(qemu) qemu-system-arm: -serial stdio: cannot use stdio by multiple
 character
  devices*
 
  It's weird to me. I don't think there is anything other than qemu itself
  holding stdio.

 I think the conflict is that when you specify -nographic, the QEMU monitor
 uses stdio. I find it necessary to add -monitor none to all of my
 command lines.

 Regards,
 Chris

 --
 Qualcomm Innovation Center, Inc.
 The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
 a Linux Foundation Collaborative Project
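
(On the pty question above: the pty QEMU allocates is bidirectional, but
something on the host must open it read/write for input to reach the guest.
A hedged example, with /dev/pts/3 standing in for whatever path QEMU
actually printed:

    screen /dev/pts/3 115200
    # or, without screen:
    socat -,raw,echo=0 /dev/pts/3

Both attach an interactive host terminal to the guest serial port.)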



Re: [Qemu-devel] a question for control queue

2015-01-14 Thread Jason Wang

On 01/09/2015 02:41 PM, Ouyang, Changchun wrote:

 Hi all,

 I have a question about the control queue in qemu,

 When qemu has configured the control queue, and the guest has also
 negotiated the control queue successfully with qemu,

 will qemu let vhost know when the guest tries to use the control queue to
 send some commands?


Currently not. Vhost is only in charge of data path so control virtqueue
is still handled by qemu. So the filtering does not even work if vhost
is used. The plan is to let management (libvirt) be notified when the guest
wants to do filtering. And then libvirt can configure the filter of host
devices.

  

 Or could vhost also set up the control queue to communicate
 directly with the control queue in the guest?


Technically, we can.

 How to do that?


Just do what we did for the rx virtqueue and tx virtqueue. But I see
several issues:

- For security reasons, qemu is usually run as a non-privileged process.
This means the vhost kernel thread does not have the privilege to configure
the filter in the host.
- The vhost kernel thread knows little about the backend (which could be
tun/macvtap or even a packet socket).

But for vhost-user implementation, it may make sense but I'm not sure.

 Hope anyone could shed some lights on  this question.

 Thanks in advance!

  

 Thanks

 Changchun





Re: [Qemu-devel] Fwd: [question] About MSI for vioscsi

2015-01-08 Thread Wangting (Kathy)


On 2015-1-8 18:03, Vadim Rozenfeld wrote:
 On Thu, 2015-01-08 at 17:27 +0800, Wangting (Kathy) wrote:

 On 2015-1-8 17:01, Vadim Rozenfeld wrote:
 On Thu, 2015-01-08 at 16:40 +0800, Wangting (Kathy) wrote:
 Hi Vadim,

In order to enable MSI for vioscsi in virtio-win-0.1-74, I try to merge

 the patch about Bug 1077566 into it, but I find that MSI still doesn't 
 work,

 because the value of IRQ is not negative in the device manager.

I try to trace the I/O path with ENABLE_COM_DEBUG, and I find that

 pMsixCapOffset->MessageControl.MSIXEnable is zero when

 StorPortGetBusData returned in VioScsiFindAdapter, so MSI can't be enabled.

I have tested that MSI is successfully enabled in virtio-win-0.1-81.

 But I still want to find out why the patch does not work in 
 virtio-win-0.1-74.

 Are there some important points I missed?

 Did you enable MSISupported in inf file?

 Cheers,
 Vadim.

 Where is the inf file? Do you mean qemupciserial.inf?
 
 vioscsi.inx in the source directory. After stamping,
 it will be copied into vioscsi.inf, as part of build
 process.
 
 

Sorry, I forgot to update vioscsi.inf after compiling. The patch is OK.

Thanks.

Ting


Thank you very much.

Best wishes,

Ting Wang
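
(For reference, MSI enablement from a driver INF follows the standard
Windows registry pattern sketched below; the section names here are
placeholders, the actual names in vioscsi.inx differ:

    [install_section.NT.HW]
    AddReg = msi_addreg

    [msi_addreg]
    HKR, "Interrupt Management", 0x00000010
    HKR, "Interrupt Management\MessageSignaledInterruptProperties", 0x00000010
    HKR, "Interrupt Management\MessageSignaledInterruptProperties", MSISupported, 0x00010001, 1

Without the MSISupported value the PnP manager keeps line-based interrupts,
which would match the symptom of MSIXEnable reading as zero.)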














 
 
 
 




Re: [Qemu-devel] Fwd: [question] About MSI for vioscsi

2015-01-08 Thread Vadim Rozenfeld
On Thu, 2015-01-08 at 16:40 +0800, Wangting (Kathy) wrote:
 Hi Vadim,
 
In order to enable MSI for vioscsi in virtio-win-0.1-74, I try to merge
 
 the patch about Bug 1077566 into it, but I find that MSI still doesn't work,
 
 because the value of IRQ is not negative in the device manager.
 
I try to trace the I/O path with ENABLE_COM_DEBUG, and I find that
 
 pMsixCapOffset->MessageControl.MSIXEnable is zero when
 
 StorPortGetBusData returned in VioScsiFindAdapter, so MSI can't be enabled.
 
I have tested that MSI is successfully enabled in virtio-win-0.1-81.
 
 But I still want to find out why the patch does not work in virtio-win-0.1-74.
 
 Are there some important points I missed?

Did you enable MSISupported in inf file?

Cheers,
Vadim.

 
Thank you very much.
 
Best wishes,
 
Ting Wang
 
 
 
 
 
 
 
 





Re: [Qemu-devel] Fwd: [question] About MSI for vioscsi

2015-01-08 Thread Wangting (Kathy)


On 2015-1-8 17:01, Vadim Rozenfeld wrote:
 On Thu, 2015-01-08 at 16:40 +0800, Wangting (Kathy) wrote:
 Hi Vadim,

In order to enable MSI for vioscsi in virtio-win-0.1-74, I try to merge

 the patch about Bug 1077566 into it, but I find that MSI still doesn't work,

 because the value of IRQ is not negative in the device manager.

I try to trace the I/O path with ENABLE_COM_DEBUG, and I find that

 pMsixCapOffset->MessageControl.MSIXEnable is zero when

 StorPortGetBusData returned in VioScsiFindAdapter, so MSI can't be enabled.

I have tested that MSI is successfully enabled in virtio-win-0.1-81.

 But I still want to find out why the patch does not work in virtio-win-0.1-74.

 Are there some important points I missed?
 
 Did you enable MSISupported in inf file?
 
 Cheers,
 Vadim.
 
Where is the inf file? Do you mean qemupciserial.inf?

Thank you very much.

Best wishes,

Ting Wang








 
 
 
 




Re: [Qemu-devel] Fwd: [question] About MSI for vioscsi

2015-01-08 Thread Vadim Rozenfeld
On Thu, 2015-01-08 at 17:27 +0800, Wangting (Kathy) wrote:
 
 On 2015-1-8 17:01, Vadim Rozenfeld wrote:
  On Thu, 2015-01-08 at 16:40 +0800, Wangting (Kathy) wrote:
  Hi Vadim,
 
 In order to enable MSI for vioscsi in virtio-win-0.1-74, I try to merge
 
  the patch about Bug 1077566 into it, but I find that MSI still doesn't 
  work,
 
  because the value of IRQ is not negative in the device manager.
 
 I try to trace the I/O path with ENABLE_COM_DEBUG, and I find that
 
  pMsixCapOffset->MessageControl.MSIXEnable is zero when
 
  StorPortGetBusData returned in VioScsiFindAdapter, so MSI can't be enabled.
 
 I have tested that MSI is successfully enabled in virtio-win-0.1-81.
 
  But I still want to find out why the patch does not work in 
  virtio-win-0.1-74.

  Are there some important points I missed?
  
  Did you enable MSISupported in inf file?
  
  Cheers,
  Vadim.
  
 Where is the inf file? Do you mean qemupciserial.inf?

vioscsi.inx in the source directory. After stamping,
it will be copied into vioscsi.inf, as part of build
process.


 
 Thank you very much.
 
 Best wishes,
 
 Ting Wang
 
 
 
 
 
 
 
 
  
  
  
  
 
 





Re: [Qemu-devel] checkpatch.pl question

2014-06-08 Thread Paolo Bonzini

Il 07/06/2014 18:27, Peter Maydell ha scritto:

On 7 June 2014 17:00, Stefan Weil s...@weilnetz.de wrote:

Am 07.06.2014 16:58, schrieb Peter Maydell:

It's clearly something to do with it getting confused by the type name,
because you can suppress the warning by just changing it so it has
an _t suffix, for instance. In particular notice that the set of allowed
type names constructed by the build_types() subroutine is extremely
limited: it looks to me as if the script makes assumptions based on
kernel style (where 'struct foo' is preferred over a typedef and plain
'foo') that allow it to assume that if it's not one of a very few allowed
formats then it's not a type name.



Yes, but that's only part of the story. checkpatch.pl contains some
special handling for the standard C data types and also knows some Linux
data type patterns. It also handles the above case correctly in most
circumstances because there is special code for * used in function
argument lists.


Interesting. There are obviously multiple situations where it
fails in this way for different reasons. This is the case I saw yesterday:

@@ -125,19 +125,20 @@ static TCGRegSet tcg_target_available_regs[2];
 static TCGRegSet tcg_target_call_clobber_regs;

 #if TCG_TARGET_INSN_UNIT_SIZE == 1
-static inline void tcg_out8(TCGContext *s, uint8_t v)
+static __attribute__((unused)) inline void tcg_out8(TCGContext *s, uint8_t v)
 {
 *s->code_ptr++ = v;
 }

No macros or previous context for it to get confused by.
Saying TCGContext_t instead silences the warning.


Given QEMU's coding style, any camelcase identifier (or equivalently, an 
identifier containing both an uppercase and a lowercase letter) could be 
considered a type.


Paolo



Re: [Qemu-devel] checkpatch.pl question

2014-06-07 Thread Peter Maydell
On 6 June 2014 08:17, Markus Armbruster arm...@redhat.com wrote:
 --debug values=1 produces

 188  . extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
 188  EEVVVNTTVVCCVVVCC

 which suggests it recognizes the declaration just fine.

 Copying a few possible victims^Wexperts.

It's clearly something to do with it getting confused by the type name,
because you can suppress the warning by just changing it so it has
an _t suffix, for instance. In particular notice that the set of allowed
type names constructed by the build_types() subroutine is extremely
limited: it looks to me as if the script makes assumptions based on
kernel style (where 'struct foo' is preferred over a typedef and plain
'foo') that allow it to assume that if it's not one of a very few allowed
formats then it's not a type name.

I find this script extremely hard to understand, though.

thanks
-- PMM



Re: [Qemu-devel] checkpatch.pl question

2014-06-07 Thread Stefan Weil
Am 07.06.2014 16:58, schrieb Peter Maydell:
 On 6 June 2014 08:17, Markus Armbruster arm...@redhat.com wrote:
 --debug values=1 produces

 188  . extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
 188  EEVVVNTTVVCCVVVCC

 which suggests it recognizes the declaration just fine.

 Copying a few possible victims^Wexperts.
 
 It's clearly something to do with it getting confused by the type name,
 because you can suppress the warning by just changing it so it has
 an _t suffix, for instance. In particular notice that the set of allowed
 type names constructed by the build_types() subroutine is extremely
 limited: it looks to me as if the script makes assumptions based on
 kernel style (where 'struct foo' is preferred over a typedef and plain
 'foo') that allow it to assume that if it's not one of a very few allowed
 formats then it's not a type name.
 
 I find this script extremely hard to understand, though.
 
 thanks
 -- PMM
 

Yes, but that's only part of the story. checkpatch.pl contains some
special handling for the standard C data types and also knows some Linux
data type patterns. It also handles the above case correctly in most
circumstances because there is special code for * used in function
argument lists.

Here checkpatch.pl gets confused by a totally different line of the patch:

 type_init(register_vfio_pci_dev_type)

Simply adding a semicolon at the end of that line would help, but is of
course not correct (note that there is already some QEMU code which adds
such a semicolon and which should be fixed). It's generally very
difficult or even impossible for code analysers with a limited view to
analyse macro calls correctly. A human reviewer has the same problem.

checkpatch.pl could be improved to handle the patch correctly: the
type_init line and the line which raises the error message belong to
different files. Obviously some global state variables are not reset
when checkpatch.pl detects patch code belonging to a new file. I'm still
searching which ones. A similar problem might also occur when more than
one patch is checked: this also does not reset all relevant variables
when switching from one patch file to the next one. The original Linux
version of checkpatch.pl shows the same bug.

Regards
Stefan

PS. I just sent a first patch for checkpatch.pl which is totally
unrelated to the above problem, but which I noticed while I investigated it.
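
(To illustrate the construct involved with an analogous macro - this is not
QEMU's real type_init() definition - a function-like macro whose expansion
already ends in '}' is legally written without a trailing semicolon, which
is exactly what leaves a line-oriented checker thinking a statement is
still open:

    /* stand-in for QEMU's type_init(): runs the function at startup */
    #define type_init(fn) \
        static void __attribute__((constructor)) init_##fn(void) { fn(); }

    static void register_vfio_pci_dev_type(void)
    {
        /* type registration would happen here */
    }

    type_init(register_vfio_pci_dev_type)   /* valid C: no ';' required */

    int main(void) { return 0; }
)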


