Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sun, Jun 06, 2010 at 12:10:07PM +0200, Jan Kiszka wrote: > Gleb Natapov wrote: > > On Sun, Jun 06, 2010 at 10:07:48AM +0200, Jan Kiszka wrote: > >> Gleb Natapov wrote: > >>> On Sun, Jun 06, 2010 at 09:39:04AM +0200, Jan Kiszka wrote: > Gleb Natapov wrote: > > On Sat, Jun 05, 2010 at 02:04:01AM +0200, Jan Kiszka wrote: > >>> I'd like to also support EOI handling. When the guest clears the > >>> interrupt condtion, the EOI callback would be called. This could occur > >>> much later than the IRQ delivery time. I'm not sure if we need the > >>> result code in that case. > >>> > >>> If any intermediate device (IOAPIC?) needs to be informed about either > >>> delivery or EOI also, it could create a proxy message with its > >>> callbacks in place. But we need then a separate opaque field (in > >>> addition to payload) to store the original message. > >>> > >>> struct IRQMsg { > >>> DeviceState *src; > >>> void (*delivery_cb)(IRQMsg *msg, int result); > >>> void (*eoi_cb)(IRQMsg *msg, int result); > >>> void *src_opaque; > >>> void *payload; > >>> }; > >> Extending the lifetime of IRQMsg objects beyond the delivery call stack > >> means qemu_malloc/free for every delivery. I think it takes a _very_ > >> appealing reason to justify this. But so far I do not see any use case > >> for eio_cb at all. > >> > > I dislike use of eoi for reinfecting missing interrupts since > > it eliminates use of internal PIC/APIC queue of not yet delivered > > interrupts. PIC and APIC has internal queue that can handle two > > elements: > > one is delivered, but not yet acked interrupt in isr and another is > > pending interrupt in irr. Using eoi callback (or ack notifier as it's > > called inside kernel) interrupt will be considered coalesced even if irr > > is cleared, but no ack was received for previously delivered interrupt. > > But ack notifiers actually has another use: device assignment. There is > > a plan to move device assignment from kernel to userspace and for that > > ack notifiers will have to be extended to userspace too. If so we can > > use them to do irq decoalescing as well. I doubt they should be part > > of IRQMsg though. Why not do what kernel does: have globally registered > > notifier based on irqchip/pin. > I read this twice but I still don't get your plan. Do you like or > dislike using EIO for de-coalescing? And how should these notifiers work? > > >>> That's because I confused myself :) I _dislike_ them to be used, but > >>> since device assignment requires ack notifiers anyway may be it is better > >>> to introduce one mechanism for device assignmen + de-coalescing instead > >>> of introducing two different mechanism. Using ack notifiers should be > >>> easy: RTC registers ack notifier and keep track of delivered interrupts. > >>> If timer triggers after previews irq was set, but before it was acked > >>> coalesced counter is incremented. In ack notifier callback coalesced > >>> counter is checked and if it is not zero new irq is set. > >> Ack notifier registrations and event deliveries still need to be routed. > >> Piggy-backing this on IRQ messages may be unavoidable for that reason. > > It is done in the kernel without piggy-backing. > > As it does not include any IRQ routers in front of the interrupt > controller. Maybe it works for x86, but it is no generic solution. > x86 has IRQ router in front of interrupt controller inside pci host bridge. 
> Also, periodic timer sources get no information about the fact that > their interrupt is masked somewhere along the path to the VCPUs and will > possibly replay countless IRQs when the masking ends, no? > Correct, for that we have mask notifiers in the kernel. Gets ugly by the minute. > > > >> Anyway, I'm going to post my HPET updates with the infrastructure for > >> IRQMsg now. Maybe it's helpful to see the other option in reality. > >> > > One other think to consider current approach does not always work. > > Win2K3-64bit-smp and Win2k8-64bit-smp configure RTC interrupt to be > > broadcasted to all cpus, but only boot cpu does time calculation. With > > current approach if interrupt is delivered to at least one vcpu > > it will not be considered coalesced, but if cpu it was delivered to is > > not cpu that does time accounting then clock will drift. > > That means we would have to fire callbacks per receiving CPU and report > its number back. Is there a way to find out if we are running such a > guest without an '-enable-win2k[38]-64bit-smp-rtc-drift-fix'? > Not that I know of. -- Gleb.
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Gleb Natapov wrote: > On Sun, Jun 06, 2010 at 10:07:48AM +0200, Jan Kiszka wrote: >> Gleb Natapov wrote: >>> On Sun, Jun 06, 2010 at 09:39:04AM +0200, Jan Kiszka wrote: Gleb Natapov wrote: > On Sat, Jun 05, 2010 at 02:04:01AM +0200, Jan Kiszka wrote: >>> I'd like to also support EOI handling. When the guest clears the >>> interrupt condtion, the EOI callback would be called. This could occur >>> much later than the IRQ delivery time. I'm not sure if we need the >>> result code in that case. >>> >>> If any intermediate device (IOAPIC?) needs to be informed about either >>> delivery or EOI also, it could create a proxy message with its >>> callbacks in place. But we need then a separate opaque field (in >>> addition to payload) to store the original message. >>> >>> struct IRQMsg { >>> DeviceState *src; >>> void (*delivery_cb)(IRQMsg *msg, int result); >>> void (*eoi_cb)(IRQMsg *msg, int result); >>> void *src_opaque; >>> void *payload; >>> }; >> Extending the lifetime of IRQMsg objects beyond the delivery call stack >> means qemu_malloc/free for every delivery. I think it takes a _very_ >> appealing reason to justify this. But so far I do not see any use case >> for eio_cb at all. >> > I dislike use of eoi for reinfecting missing interrupts since > it eliminates use of internal PIC/APIC queue of not yet delivered > interrupts. PIC and APIC has internal queue that can handle two elements: > one is delivered, but not yet acked interrupt in isr and another is > pending interrupt in irr. Using eoi callback (or ack notifier as it's > called inside kernel) interrupt will be considered coalesced even if irr > is cleared, but no ack was received for previously delivered interrupt. > But ack notifiers actually has another use: device assignment. There is > a plan to move device assignment from kernel to userspace and for that > ack notifiers will have to be extended to userspace too. If so we can > use them to do irq decoalescing as well. I doubt they should be part > of IRQMsg though. Why not do what kernel does: have globally registered > notifier based on irqchip/pin. I read this twice but I still don't get your plan. Do you like or dislike using EIO for de-coalescing? And how should these notifiers work? >>> That's because I confused myself :) I _dislike_ them to be used, but >>> since device assignment requires ack notifiers anyway may be it is better >>> to introduce one mechanism for device assignmen + de-coalescing instead >>> of introducing two different mechanism. Using ack notifiers should be >>> easy: RTC registers ack notifier and keep track of delivered interrupts. >>> If timer triggers after previews irq was set, but before it was acked >>> coalesced counter is incremented. In ack notifier callback coalesced >>> counter is checked and if it is not zero new irq is set. >> Ack notifier registrations and event deliveries still need to be routed. >> Piggy-backing this on IRQ messages may be unavoidable for that reason. > It is done in the kernel without piggy-backing. As it does not include any IRQ routers in front of the interrupt controller. Maybe it works for x86, but it is no generic solution. Also, periodic timer sources get no information about the fact that their interrupt is masked somewhere along the path to the VCPUs and will possibly replay countless IRQs when the masking ends, no? > >> Anyway, I'm going to post my HPET updates with the infrastructure for >> IRQMsg now. Maybe it's helpful to see the other option in reality. 
>> > One other think to consider current approach does not always work. > Win2K3-64bit-smp and Win2k8-64bit-smp configure RTC interrupt to be > broadcasted to all cpus, but only boot cpu does time calculation. With > current approach if interrupt is delivered to at least one vcpu > it will not be considered coalesced, but if cpu it was delivered to is > not cpu that does time accounting then clock will drift. That means we would have to fire callbacks per receiving CPU and report its number back. Is there a way to find out if we are running such a guest without an '-enable-win2k[38]-64bit-smp-rtc-drift-fix'? Jan
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sun, Jun 06, 2010 at 10:07:48AM +0200, Jan Kiszka wrote: > Gleb Natapov wrote: > > On Sun, Jun 06, 2010 at 09:39:04AM +0200, Jan Kiszka wrote: > >> Gleb Natapov wrote: > >>> On Sat, Jun 05, 2010 at 02:04:01AM +0200, Jan Kiszka wrote: > > I'd like to also support EOI handling. When the guest clears the > > interrupt condtion, the EOI callback would be called. This could occur > > much later than the IRQ delivery time. I'm not sure if we need the > > result code in that case. > > > > If any intermediate device (IOAPIC?) needs to be informed about either > > delivery or EOI also, it could create a proxy message with its > > callbacks in place. But we need then a separate opaque field (in > > addition to payload) to store the original message. > > > > struct IRQMsg { > > DeviceState *src; > > void (*delivery_cb)(IRQMsg *msg, int result); > > void (*eoi_cb)(IRQMsg *msg, int result); > > void *src_opaque; > > void *payload; > > }; > Extending the lifetime of IRQMsg objects beyond the delivery call stack > means qemu_malloc/free for every delivery. I think it takes a _very_ > appealing reason to justify this. But so far I do not see any use case > for eio_cb at all. > > >>> I dislike use of eoi for reinfecting missing interrupts since > >>> it eliminates use of internal PIC/APIC queue of not yet delivered > >>> interrupts. PIC and APIC has internal queue that can handle two elements: > >>> one is delivered, but not yet acked interrupt in isr and another is > >>> pending interrupt in irr. Using eoi callback (or ack notifier as it's > >>> called inside kernel) interrupt will be considered coalesced even if irr > >>> is cleared, but no ack was received for previously delivered interrupt. > >>> But ack notifiers actually has another use: device assignment. There is > >>> a plan to move device assignment from kernel to userspace and for that > >>> ack notifiers will have to be extended to userspace too. If so we can > >>> use them to do irq decoalescing as well. I doubt they should be part > >>> of IRQMsg though. Why not do what kernel does: have globally registered > >>> notifier based on irqchip/pin. > >> I read this twice but I still don't get your plan. Do you like or > >> dislike using EIO for de-coalescing? And how should these notifiers work? > >> > > That's because I confused myself :) I _dislike_ them to be used, but > > since device assignment requires ack notifiers anyway may be it is better > > to introduce one mechanism for device assignmen + de-coalescing instead > > of introducing two different mechanism. Using ack notifiers should be > > easy: RTC registers ack notifier and keep track of delivered interrupts. > > If timer triggers after previews irq was set, but before it was acked > > coalesced counter is incremented. In ack notifier callback coalesced > > counter is checked and if it is not zero new irq is set. > > Ack notifier registrations and event deliveries still need to be routed. > Piggy-backing this on IRQ messages may be unavoidable for that reason. It is done in the kernel without piggy-backing. > > Anyway, I'm going to post my HPET updates with the infrastructure for > IRQMsg now. Maybe it's helpful to see the other option in reality. > One other think to consider current approach does not always work. Win2K3-64bit-smp and Win2k8-64bit-smp configure RTC interrupt to be broadcasted to all cpus, but only boot cpu does time calculation. 
With the current approach, if the interrupt is delivered to at least one vcpu it will not be considered coalesced; but if the cpu it was delivered to is not the cpu that does time accounting, then the clock will drift. -- Gleb.
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Gleb Natapov wrote: > On Sun, Jun 06, 2010 at 09:39:04AM +0200, Jan Kiszka wrote: >> Gleb Natapov wrote: >>> On Sat, Jun 05, 2010 at 02:04:01AM +0200, Jan Kiszka wrote: > I'd like to also support EOI handling. When the guest clears the > interrupt condtion, the EOI callback would be called. This could occur > much later than the IRQ delivery time. I'm not sure if we need the > result code in that case. > > If any intermediate device (IOAPIC?) needs to be informed about either > delivery or EOI also, it could create a proxy message with its > callbacks in place. But we need then a separate opaque field (in > addition to payload) to store the original message. > > struct IRQMsg { > DeviceState *src; > void (*delivery_cb)(IRQMsg *msg, int result); > void (*eoi_cb)(IRQMsg *msg, int result); > void *src_opaque; > void *payload; > }; Extending the lifetime of IRQMsg objects beyond the delivery call stack means qemu_malloc/free for every delivery. I think it takes a _very_ appealing reason to justify this. But so far I do not see any use case for eio_cb at all. >>> I dislike use of eoi for reinfecting missing interrupts since >>> it eliminates use of internal PIC/APIC queue of not yet delivered >>> interrupts. PIC and APIC has internal queue that can handle two elements: >>> one is delivered, but not yet acked interrupt in isr and another is >>> pending interrupt in irr. Using eoi callback (or ack notifier as it's >>> called inside kernel) interrupt will be considered coalesced even if irr >>> is cleared, but no ack was received for previously delivered interrupt. >>> But ack notifiers actually has another use: device assignment. There is >>> a plan to move device assignment from kernel to userspace and for that >>> ack notifiers will have to be extended to userspace too. If so we can >>> use them to do irq decoalescing as well. I doubt they should be part >>> of IRQMsg though. Why not do what kernel does: have globally registered >>> notifier based on irqchip/pin. >> I read this twice but I still don't get your plan. Do you like or >> dislike using EIO for de-coalescing? And how should these notifiers work? >> > That's because I confused myself :) I _dislike_ them to be used, but > since device assignment requires ack notifiers anyway may be it is better > to introduce one mechanism for device assignmen + de-coalescing instead > of introducing two different mechanism. Using ack notifiers should be > easy: RTC registers ack notifier and keep track of delivered interrupts. > If timer triggers after previews irq was set, but before it was acked > coalesced counter is incremented. In ack notifier callback coalesced > counter is checked and if it is not zero new irq is set. Ack notifier registrations and event deliveries still need to be routed. Piggy-backing this on IRQ messages may be unavoidable for that reason. Anyway, I'm going to post my HPET updates with the infrastructure for IRQMsg now. Maybe it's helpful to see the other option in reality. Jan signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sun, Jun 06, 2010 at 07:39:49AM +, Blue Swirl wrote: > On Sun, Jun 6, 2010 at 7:15 AM, Gleb Natapov wrote: > > On Sat, Jun 05, 2010 at 02:04:01AM +0200, Jan Kiszka wrote: > >> > I'd like to also support EOI handling. When the guest clears the > >> > interrupt condtion, the EOI callback would be called. This could occur > >> > much later than the IRQ delivery time. I'm not sure if we need the > >> > result code in that case. > >> > > >> > If any intermediate device (IOAPIC?) needs to be informed about either > >> > delivery or EOI also, it could create a proxy message with its > >> > callbacks in place. But we need then a separate opaque field (in > >> > addition to payload) to store the original message. > >> > > >> > struct IRQMsg { > >> > DeviceState *src; > >> > void (*delivery_cb)(IRQMsg *msg, int result); > >> > void (*eoi_cb)(IRQMsg *msg, int result); > >> > void *src_opaque; > >> > void *payload; > >> > }; > >> > >> Extending the lifetime of IRQMsg objects beyond the delivery call stack > >> means qemu_malloc/free for every delivery. I think it takes a _very_ > >> appealing reason to justify this. But so far I do not see any use case > >> for eio_cb at all. > >> > > I dislike use of eoi for reinfecting missing interrupts since > > it eliminates use of internal PIC/APIC queue of not yet delivered > > interrupts. PIC and APIC has internal queue that can handle two elements: > > one is delivered, but not yet acked interrupt in isr and another is > > pending interrupt in irr. Using eoi callback (or ack notifier as it's > > called inside kernel) interrupt will be considered coalesced even if irr > > is cleared, but no ack was received for previously delivered interrupt. > > But ack notifiers actually has another use: device assignment. There is > > a plan to move device assignment from kernel to userspace and for that > > ack notifiers will have to be extended to userspace too. If so we can > > use them to do irq decoalescing as well. I doubt they should be part > > of IRQMsg though. Why not do what kernel does: have globally registered > > notifier based on irqchip/pin. > > Because translation at IOAPIC may be lossy, IRQs from many devices > pointing to the same vector? With IRQMsg you know where a specific > message came from. The situation is different inside the kernel: it > manages both translation and registration, whereas in QEMU we could > only control registration. Configuring IOAPIC like that is against x86 architecture. OS will not be able to map from interrupt vector back to device. -- Gleb.
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sun, Jun 06, 2010 at 09:39:04AM +0200, Jan Kiszka wrote: > Gleb Natapov wrote: > > On Sat, Jun 05, 2010 at 02:04:01AM +0200, Jan Kiszka wrote: > >>> I'd like to also support EOI handling. When the guest clears the > >>> interrupt condtion, the EOI callback would be called. This could occur > >>> much later than the IRQ delivery time. I'm not sure if we need the > >>> result code in that case. > >>> > >>> If any intermediate device (IOAPIC?) needs to be informed about either > >>> delivery or EOI also, it could create a proxy message with its > >>> callbacks in place. But we need then a separate opaque field (in > >>> addition to payload) to store the original message. > >>> > >>> struct IRQMsg { > >>> DeviceState *src; > >>> void (*delivery_cb)(IRQMsg *msg, int result); > >>> void (*eoi_cb)(IRQMsg *msg, int result); > >>> void *src_opaque; > >>> void *payload; > >>> }; > >> Extending the lifetime of IRQMsg objects beyond the delivery call stack > >> means qemu_malloc/free for every delivery. I think it takes a _very_ > >> appealing reason to justify this. But so far I do not see any use case > >> for eio_cb at all. > >> > > I dislike use of eoi for reinfecting missing interrupts since > > it eliminates use of internal PIC/APIC queue of not yet delivered > > interrupts. PIC and APIC has internal queue that can handle two elements: > > one is delivered, but not yet acked interrupt in isr and another is > > pending interrupt in irr. Using eoi callback (or ack notifier as it's > > called inside kernel) interrupt will be considered coalesced even if irr > > is cleared, but no ack was received for previously delivered interrupt. > > But ack notifiers actually has another use: device assignment. There is > > a plan to move device assignment from kernel to userspace and for that > > ack notifiers will have to be extended to userspace too. If so we can > > use them to do irq decoalescing as well. I doubt they should be part > > of IRQMsg though. Why not do what kernel does: have globally registered > > notifier based on irqchip/pin. > > I read this twice but I still don't get your plan. Do you like or > dislike using EIO for de-coalescing? And how should these notifiers work? > That's because I confused myself :) I _dislike_ using them, but since device assignment requires ack notifiers anyway maybe it is better to introduce one mechanism for device assignment + de-coalescing instead of introducing two different mechanisms. Using ack notifiers should be easy: the RTC registers an ack notifier and keeps track of delivered interrupts. If the timer triggers after the previous irq was set, but before it was acked, the coalesced counter is incremented. In the ack notifier callback the coalesced counter is checked and, if it is not zero, a new irq is set. -- Gleb.
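To make the recipe above concrete, here is a minimal sketch of the RTC side, assuming an ack-notifier hook of the kind discussed; the type and function names are invented for illustration and are not the actual RTC code:

    /* Sketch only: rtc_raise_irq() and the notifier registration are placeholders. */
    typedef struct RTCCoalesceSketch {
        int irq_pending;            /* raised but not yet acked by the guest */
        int coalesced;              /* ticks that fired while a previous tick was unacked */
    } RTCCoalesceSketch;

    static void rtc_periodic_tick(RTCCoalesceSketch *s)
    {
        if (s->irq_pending) {
            s->coalesced++;         /* previous irq not acked yet: count this tick as lost */
        } else {
            s->irq_pending = 1;
            /* rtc_raise_irq(s); */
        }
    }

    /* Runs from the ack notifier registered for the RTC's irqchip/pin. */
    static void rtc_irq_acked(RTCCoalesceSketch *s)
    {
        s->irq_pending = 0;
        if (s->coalesced) {
            s->coalesced--;
            s->irq_pending = 1;     /* replay one missed tick right away */
            /* rtc_raise_irq(s); */
        }
    }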
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sun, Jun 6, 2010 at 7:15 AM, Gleb Natapov wrote: > On Sat, Jun 05, 2010 at 02:04:01AM +0200, Jan Kiszka wrote: >> > I'd like to also support EOI handling. When the guest clears the >> > interrupt condtion, the EOI callback would be called. This could occur >> > much later than the IRQ delivery time. I'm not sure if we need the >> > result code in that case. >> > >> > If any intermediate device (IOAPIC?) needs to be informed about either >> > delivery or EOI also, it could create a proxy message with its >> > callbacks in place. But we need then a separate opaque field (in >> > addition to payload) to store the original message. >> > >> > struct IRQMsg { >> > DeviceState *src; >> > void (*delivery_cb)(IRQMsg *msg, int result); >> > void (*eoi_cb)(IRQMsg *msg, int result); >> > void *src_opaque; >> > void *payload; >> > }; >> >> Extending the lifetime of IRQMsg objects beyond the delivery call stack >> means qemu_malloc/free for every delivery. I think it takes a _very_ >> appealing reason to justify this. But so far I do not see any use case >> for eio_cb at all. >> > I dislike use of eoi for reinfecting missing interrupts since > it eliminates use of internal PIC/APIC queue of not yet delivered > interrupts. PIC and APIC has internal queue that can handle two elements: > one is delivered, but not yet acked interrupt in isr and another is > pending interrupt in irr. Using eoi callback (or ack notifier as it's > called inside kernel) interrupt will be considered coalesced even if irr > is cleared, but no ack was received for previously delivered interrupt. > But ack notifiers actually has another use: device assignment. There is > a plan to move device assignment from kernel to userspace and for that > ack notifiers will have to be extended to userspace too. If so we can > use them to do irq decoalescing as well. I doubt they should be part > of IRQMsg though. Why not do what kernel does: have globally registered > notifier based on irqchip/pin. Because translation at IOAPIC may be lossy, IRQs from many devices pointing to the same vector? With IRQMsg you know where a specific message came from. The situation is different inside the kernel: it manages both translation and registration, whereas in QEMU we could only control registration.
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Gleb Natapov wrote: > On Sat, Jun 05, 2010 at 02:04:01AM +0200, Jan Kiszka wrote: >>> I'd like to also support EOI handling. When the guest clears the >>> interrupt condtion, the EOI callback would be called. This could occur >>> much later than the IRQ delivery time. I'm not sure if we need the >>> result code in that case. >>> >>> If any intermediate device (IOAPIC?) needs to be informed about either >>> delivery or EOI also, it could create a proxy message with its >>> callbacks in place. But we need then a separate opaque field (in >>> addition to payload) to store the original message. >>> >>> struct IRQMsg { >>> DeviceState *src; >>> void (*delivery_cb)(IRQMsg *msg, int result); >>> void (*eoi_cb)(IRQMsg *msg, int result); >>> void *src_opaque; >>> void *payload; >>> }; >> Extending the lifetime of IRQMsg objects beyond the delivery call stack >> means qemu_malloc/free for every delivery. I think it takes a _very_ >> appealing reason to justify this. But so far I do not see any use case >> for eio_cb at all. >> > I dislike use of eoi for reinfecting missing interrupts since > it eliminates use of internal PIC/APIC queue of not yet delivered > interrupts. PIC and APIC has internal queue that can handle two elements: > one is delivered, but not yet acked interrupt in isr and another is > pending interrupt in irr. Using eoi callback (or ack notifier as it's > called inside kernel) interrupt will be considered coalesced even if irr > is cleared, but no ack was received for previously delivered interrupt. > But ack notifiers actually has another use: device assignment. There is > a plan to move device assignment from kernel to userspace and for that > ack notifiers will have to be extended to userspace too. If so we can > use them to do irq decoalescing as well. I doubt they should be part > of IRQMsg though. Why not do what kernel does: have globally registered > notifier based on irqchip/pin. I read this twice but I still don't get your plan. Do you like or dislike using EIO for de-coalescing? And how should these notifiers work? Jan signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sat, Jun 05, 2010 at 02:04:01AM +0200, Jan Kiszka wrote: > > I'd like to also support EOI handling. When the guest clears the > > interrupt condtion, the EOI callback would be called. This could occur > > much later than the IRQ delivery time. I'm not sure if we need the > > result code in that case. > > > > If any intermediate device (IOAPIC?) needs to be informed about either > > delivery or EOI also, it could create a proxy message with its > > callbacks in place. But we need then a separate opaque field (in > > addition to payload) to store the original message. > > > > struct IRQMsg { > > DeviceState *src; > > void (*delivery_cb)(IRQMsg *msg, int result); > > void (*eoi_cb)(IRQMsg *msg, int result); > > void *src_opaque; > > void *payload; > > }; > > Extending the lifetime of IRQMsg objects beyond the delivery call stack > means qemu_malloc/free for every delivery. I think it takes a _very_ > appealing reason to justify this. But so far I do not see any use case > for eio_cb at all. > I dislike the use of eoi for reinjecting missing interrupts since it eliminates the use of the internal PIC/APIC queue of not-yet-delivered interrupts. The PIC and APIC have an internal queue that can handle two elements: one is a delivered but not yet acked interrupt in the isr, and the other is a pending interrupt in the irr. Using the eoi callback (or ack notifier, as it's called inside the kernel), an interrupt will be considered coalesced even if the irr is cleared but no ack was received for the previously delivered interrupt. But ack notifiers actually have another use: device assignment. There is a plan to move device assignment from the kernel to userspace, and for that ack notifiers will have to be extended to userspace too. If so, we can use them to do irq de-coalescing as well. I doubt they should be part of IRQMsg though. Why not do what the kernel does: have a globally registered notifier based on irqchip/pin. -- Gleb.
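For reference, the "globally registered notifier based on irqchip/pin" could look roughly like the following in userspace; the names here are made up for the sketch and only loosely mirror the kernel's ack-notifier idea:

    /* Hypothetical registry, not existing QEMU API. */
    typedef struct IRQAckNotifier IRQAckNotifier;
    struct IRQAckNotifier {
        int irqchip;                                /* e.g. PIC master/slave or IOAPIC */
        int pin;
        void (*irq_acked)(IRQAckNotifier *an);
        IRQAckNotifier *next;
    };

    static IRQAckNotifier *ack_notifier_list;

    static void register_irq_ack_notifier(IRQAckNotifier *an)
    {
        an->next = ack_notifier_list;
        ack_notifier_list = an;
    }

    /* Called by the PIC/APIC model whenever the guest acks (EOIs) an interrupt. */
    static void notify_acked_irq(int irqchip, int pin)
    {
        IRQAckNotifier *an;

        for (an = ack_notifier_list; an != NULL; an = an->next) {
            if (an->irqchip == irqchip && an->pin == pin) {
                an->irq_acked(an);
            }
        }
    }

A device like the RTC would embed such a notifier in its state and do the de-coalescing work from the callback, without any message object ever leaving the delivery call stack.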
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Blue Swirl wrote: > On Sat, Jun 5, 2010 at 8:27 AM, Jan Kiszka wrote: >> Blue Swirl wrote: >>> On Sat, Jun 5, 2010 at 12:04 AM, Jan Kiszka wrote: Blue Swirl wrote: > On Thu, Jun 3, 2010 at 7:06 AM, Gleb Natapov wrote: >> On Thu, Jun 03, 2010 at 10:03:00AM +0300, Gleb Natapov wrote: >>> On Thu, Jun 03, 2010 at 08:59:23AM +0200, Jan Kiszka wrote: Gleb Natapov wrote: > On Thu, Jun 03, 2010 at 08:23:46AM +0200, Jan Kiszka wrote: >> Blue Swirl wrote: >>> But how about if we introduced instead a message based IRQ? Then the >>> message could specify the originator device, maybe ACK/coalesce/NACK >>> callbacks and a bigger payload than just 1 bit of level. I think >>> that >>> could achieve the same coalescing effect as what the bidirectional >>> IRQ. The payload could be useful for other purposes, for example >>> Sparc64 IRQ messages contain three 64 bit words. >> If there are more users than just IRQ de-coalescing, this indeed >> sounds >> superior. We could pass objects like this one around: >> >> struct qemu_irq_msg { >> void (*delivery_cb)(int result); >> void *payload; >> }; >> >> They would be valid within the scope of the IRQ handlers. Whoever >> terminates or actually delivers the IRQ would invoke the callback. >> And >> platforms like sparc64 could evaluate the additional payload pointer >> in >> their irqchips or wherever they need it. IRQ routers on platforms >> that >> make use of these messages would have to replicate them when >> forwarding >> an event. >> >> OK? >> > Let me see if I understand you correctly. qemu_set_irq() will get > additional parameter qemu_irq_msg and if irq was not coalesced > delivery_cb is called, so there is a guaranty that if delivery_cb is > called it is done before qemu_set_irq() returns. Correct? If the side that triggers an IRQ passes a message object with a non-NULL callback, it is supposed to be called unconditionally, passing the result of the delivery (delivered, masked, coalesced). And yes, the callback will be invoked in the context of the irq handler, so before qemu_set_irq (or rather some new qemu_set_irq_msg) returns. >>> Looks fine to me. >>> >> Except that delivery_cb should probably get pointer to qemu_irq_msg as a >> parameter. > I'd like to also support EOI handling. When the guest clears the > interrupt condtion, the EOI callback would be called. This could occur > much later than the IRQ delivery time. I'm not sure if we need the > result code in that case. > > If any intermediate device (IOAPIC?) needs to be informed about either > delivery or EOI also, it could create a proxy message with its > callbacks in place. But we need then a separate opaque field (in > addition to payload) to store the original message. > > struct IRQMsg { > DeviceState *src; > void (*delivery_cb)(IRQMsg *msg, int result); > void (*eoi_cb)(IRQMsg *msg, int result); > void *src_opaque; > void *payload; > }; Extending the lifetime of IRQMsg objects beyond the delivery call stack means qemu_malloc/free for every delivery. I think it takes a _very_ appealing reason to justify this. But so far I do not see any use case for eio_cb at all. >>> I think it's safer to use allocation model anyway because this will be >>> generic code. For example, an intermediate device may want to queue >>> the IRQs. Alternatively, the callbacks could use DeviceState and some >>> opaque which can be used as the callback context: >>> void (*delivery_cb)(DeviceState *src, void *src_opaque, int result); >>> >>> EOI can be added later if needed, QEMU seems to work fine now without >>> it. 
But based on IOAPIC data sheet, I'd suppose it should be need to >>> pass EOI from LAPIC to IOAPIC. It could be used by coalescing as >>> another opportunity to inject IRQs though I guess the guest will ack >>> the IRQ at the same time for both RTC and APIC. >> Let's wait for a real use case for an extended IRQMsg lifetime. For now >> we are fine with stack-allocated messages which are way simpler to >> handle. I'm already drafting a first prototype based on this model. >> Switching to dynamic allocation may still happen later on once the >> urgent need shows up. > > Passing around stack allocated objects is asking for trouble. I'd much > rather use the DeviceState/opaque version then, so at least > destination should not need to use IRQMsg for anything. Right, I've hidden the IRQMsg object from the target handler, temporarily storing it in qemu_irq instead. qemu_irq_handler had to be touched anyway, so I'm now passing the IRQ object to it.
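Purely as a guess at what the prototype mentioned above might look like (the actual patch may differ; every name in this sketch is speculative), the message could be parked in the IRQ object for the duration of the call and the handler handed the IRQ object itself:

    /* Speculative sketch, not the real patch. */
    typedef struct IRQMsg IRQMsg;
    typedef struct IRQState *qemu_irq;

    struct IRQState {
        void (*handler)(qemu_irq irq, int n, int level);  /* handler now receives the IRQ object */
        int n;
        IRQMsg *msg;              /* only valid while a delivery is in progress */
    };

    static void qemu_set_irq_msg(qemu_irq irq, int level, IRQMsg *msg)
    {
        irq->msg = msg;           /* stack-allocated by the caller */
        irq->handler(irq, irq->n, level);
        irq->msg = NULL;          /* invalid once the call stack unwinds */
    }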
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sat, Jun 5, 2010 at 8:27 AM, Jan Kiszka wrote: > Blue Swirl wrote: >> On Sat, Jun 5, 2010 at 12:04 AM, Jan Kiszka wrote: >>> Blue Swirl wrote: On Thu, Jun 3, 2010 at 7:06 AM, Gleb Natapov wrote: > On Thu, Jun 03, 2010 at 10:03:00AM +0300, Gleb Natapov wrote: >> On Thu, Jun 03, 2010 at 08:59:23AM +0200, Jan Kiszka wrote: >>> Gleb Natapov wrote: On Thu, Jun 03, 2010 at 08:23:46AM +0200, Jan Kiszka wrote: > Blue Swirl wrote: >> But how about if we introduced instead a message based IRQ? Then the >> message could specify the originator device, maybe ACK/coalesce/NACK >> callbacks and a bigger payload than just 1 bit of level. I think that >> could achieve the same coalescing effect as what the bidirectional >> IRQ. The payload could be useful for other purposes, for example >> Sparc64 IRQ messages contain three 64 bit words. > If there are more users than just IRQ de-coalescing, this indeed > sounds > superior. We could pass objects like this one around: > > struct qemu_irq_msg { > void (*delivery_cb)(int result); > void *payload; > }; > > They would be valid within the scope of the IRQ handlers. Whoever > terminates or actually delivers the IRQ would invoke the callback. And > platforms like sparc64 could evaluate the additional payload pointer > in > their irqchips or wherever they need it. IRQ routers on platforms that > make use of these messages would have to replicate them when > forwarding > an event. > > OK? > Let me see if I understand you correctly. qemu_set_irq() will get additional parameter qemu_irq_msg and if irq was not coalesced delivery_cb is called, so there is a guaranty that if delivery_cb is called it is done before qemu_set_irq() returns. Correct? >>> If the side that triggers an IRQ passes a message object with a non-NULL >>> callback, it is supposed to be called unconditionally, passing the >>> result of the delivery (delivered, masked, coalesced). And yes, the >>> callback will be invoked in the context of the irq handler, so before >>> qemu_set_irq (or rather some new qemu_set_irq_msg) returns. >>> >> Looks fine to me. >> > Except that delivery_cb should probably get pointer to qemu_irq_msg as a > parameter. I'd like to also support EOI handling. When the guest clears the interrupt condtion, the EOI callback would be called. This could occur much later than the IRQ delivery time. I'm not sure if we need the result code in that case. If any intermediate device (IOAPIC?) needs to be informed about either delivery or EOI also, it could create a proxy message with its callbacks in place. But we need then a separate opaque field (in addition to payload) to store the original message. struct IRQMsg { DeviceState *src; void (*delivery_cb)(IRQMsg *msg, int result); void (*eoi_cb)(IRQMsg *msg, int result); void *src_opaque; void *payload; }; >>> Extending the lifetime of IRQMsg objects beyond the delivery call stack >>> means qemu_malloc/free for every delivery. I think it takes a _very_ >>> appealing reason to justify this. But so far I do not see any use case >>> for eio_cb at all. >> >> I think it's safer to use allocation model anyway because this will be >> generic code. For example, an intermediate device may want to queue >> the IRQs. Alternatively, the callbacks could use DeviceState and some >> opaque which can be used as the callback context: >> void (*delivery_cb)(DeviceState *src, void *src_opaque, int result); >> >> EOI can be added later if needed, QEMU seems to work fine now without >> it. 
But based on IOAPIC data sheet, I'd suppose it should be need to >> pass EOI from LAPIC to IOAPIC. It could be used by coalescing as >> another opportunity to inject IRQs though I guess the guest will ack >> the IRQ at the same time for both RTC and APIC. > > Let's wait for a real use case for an extended IRQMsg lifetime. For now > we are fine with stack-allocated messages which are way simpler to > handle. I'm already drafting a first prototype based on this model. > Switching to dynamic allocation may still happen later on once the > urgent need shows up. Passing around stack-allocated objects is asking for trouble. I'd much rather use the DeviceState/opaque version then, so at least the destination should not need to use IRQMsg for anything.
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Blue Swirl wrote: > On Sat, Jun 5, 2010 at 12:04 AM, Jan Kiszka wrote: >> Blue Swirl wrote: >>> On Thu, Jun 3, 2010 at 7:06 AM, Gleb Natapov wrote: On Thu, Jun 03, 2010 at 10:03:00AM +0300, Gleb Natapov wrote: > On Thu, Jun 03, 2010 at 08:59:23AM +0200, Jan Kiszka wrote: >> Gleb Natapov wrote: >>> On Thu, Jun 03, 2010 at 08:23:46AM +0200, Jan Kiszka wrote: Blue Swirl wrote: > But how about if we introduced instead a message based IRQ? Then the > message could specify the originator device, maybe ACK/coalesce/NACK > callbacks and a bigger payload than just 1 bit of level. I think that > could achieve the same coalescing effect as what the bidirectional > IRQ. The payload could be useful for other purposes, for example > Sparc64 IRQ messages contain three 64 bit words. If there are more users than just IRQ de-coalescing, this indeed sounds superior. We could pass objects like this one around: struct qemu_irq_msg { void (*delivery_cb)(int result); void *payload; }; They would be valid within the scope of the IRQ handlers. Whoever terminates or actually delivers the IRQ would invoke the callback. And platforms like sparc64 could evaluate the additional payload pointer in their irqchips or wherever they need it. IRQ routers on platforms that make use of these messages would have to replicate them when forwarding an event. OK? >>> Let me see if I understand you correctly. qemu_set_irq() will get >>> additional parameter qemu_irq_msg and if irq was not coalesced >>> delivery_cb is called, so there is a guaranty that if delivery_cb is >>> called it is done before qemu_set_irq() returns. Correct? >> If the side that triggers an IRQ passes a message object with a non-NULL >> callback, it is supposed to be called unconditionally, passing the >> result of the delivery (delivered, masked, coalesced). And yes, the >> callback will be invoked in the context of the irq handler, so before >> qemu_set_irq (or rather some new qemu_set_irq_msg) returns. >> > Looks fine to me. > Except that delivery_cb should probably get pointer to qemu_irq_msg as a parameter. >>> I'd like to also support EOI handling. When the guest clears the >>> interrupt condtion, the EOI callback would be called. This could occur >>> much later than the IRQ delivery time. I'm not sure if we need the >>> result code in that case. >>> >>> If any intermediate device (IOAPIC?) needs to be informed about either >>> delivery or EOI also, it could create a proxy message with its >>> callbacks in place. But we need then a separate opaque field (in >>> addition to payload) to store the original message. >>> >>> struct IRQMsg { >>> DeviceState *src; >>> void (*delivery_cb)(IRQMsg *msg, int result); >>> void (*eoi_cb)(IRQMsg *msg, int result); >>> void *src_opaque; >>> void *payload; >>> }; >> Extending the lifetime of IRQMsg objects beyond the delivery call stack >> means qemu_malloc/free for every delivery. I think it takes a _very_ >> appealing reason to justify this. But so far I do not see any use case >> for eio_cb at all. > > I think it's safer to use allocation model anyway because this will be > generic code. For example, an intermediate device may want to queue > the IRQs. Alternatively, the callbacks could use DeviceState and some > opaque which can be used as the callback context: > void (*delivery_cb)(DeviceState *src, void *src_opaque, int result); > > EOI can be added later if needed, QEMU seems to work fine now without > it. But based on IOAPIC data sheet, I'd suppose it should be need to > pass EOI from LAPIC to IOAPIC. 
It could be used by coalescing as > another opportunity to inject IRQs though I guess the guest will ack > the IRQ at the same time for both RTC and APIC. Let's wait for a real use case for an extended IRQMsg lifetime. For now we are fine with stack-allocated messages which are way simpler to handle. I'm already drafting a first prototype based on this model. Switching to dynamic allocation may still happen later on once the urgent need shows up. Jan
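The trade-off being weighed above, sketched with invented names: a stack-allocated message costs nothing per delivery but cannot outlive the call, whereas queueing it anywhere forces a per-delivery allocation (qemu_malloc/free in QEMU terms):

    #include <stdlib.h>
    #include <string.h>

    /* Illustration only. */
    struct irq_msg_sketch {
        void (*delivery_cb)(struct irq_msg_sketch *msg, int result);
        void *payload;
    };

    /* Stack model: valid only inside the delivery call chain. */
    static void raise_with_stack_msg(void (*deliver)(struct irq_msg_sketch *msg))
    {
        struct irq_msg_sketch msg = { 0 };
        deliver(&msg);            /* nobody may keep a pointer to msg after this returns */
    }

    /* Heap model: required as soon as an intermediate device wants to queue the message. */
    static struct irq_msg_sketch *copy_msg_for_queueing(const struct irq_msg_sketch *msg)
    {
        struct irq_msg_sketch *copy = malloc(sizeof(*copy));
        if (copy != NULL) {
            memcpy(copy, msg, sizeof(*copy));
        }
        return copy;              /* must be freed after the queued delivery completes */
    }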
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sat, Jun 5, 2010 at 12:04 AM, Jan Kiszka wrote: > Blue Swirl wrote: >> On Thu, Jun 3, 2010 at 7:06 AM, Gleb Natapov wrote: >>> On Thu, Jun 03, 2010 at 10:03:00AM +0300, Gleb Natapov wrote: On Thu, Jun 03, 2010 at 08:59:23AM +0200, Jan Kiszka wrote: > Gleb Natapov wrote: >> On Thu, Jun 03, 2010 at 08:23:46AM +0200, Jan Kiszka wrote: >>> Blue Swirl wrote: But how about if we introduced instead a message based IRQ? Then the message could specify the originator device, maybe ACK/coalesce/NACK callbacks and a bigger payload than just 1 bit of level. I think that could achieve the same coalescing effect as what the bidirectional IRQ. The payload could be useful for other purposes, for example Sparc64 IRQ messages contain three 64 bit words. >>> If there are more users than just IRQ de-coalescing, this indeed sounds >>> superior. We could pass objects like this one around: >>> >>> struct qemu_irq_msg { >>> void (*delivery_cb)(int result); >>> void *payload; >>> }; >>> >>> They would be valid within the scope of the IRQ handlers. Whoever >>> terminates or actually delivers the IRQ would invoke the callback. And >>> platforms like sparc64 could evaluate the additional payload pointer in >>> their irqchips or wherever they need it. IRQ routers on platforms that >>> make use of these messages would have to replicate them when forwarding >>> an event. >>> >>> OK? >>> >> Let me see if I understand you correctly. qemu_set_irq() will get >> additional parameter qemu_irq_msg and if irq was not coalesced >> delivery_cb is called, so there is a guaranty that if delivery_cb is >> called it is done before qemu_set_irq() returns. Correct? > If the side that triggers an IRQ passes a message object with a non-NULL > callback, it is supposed to be called unconditionally, passing the > result of the delivery (delivered, masked, coalesced). And yes, the > callback will be invoked in the context of the irq handler, so before > qemu_set_irq (or rather some new qemu_set_irq_msg) returns. > Looks fine to me. >>> Except that delivery_cb should probably get pointer to qemu_irq_msg as a >>> parameter. >> >> I'd like to also support EOI handling. When the guest clears the >> interrupt condtion, the EOI callback would be called. This could occur >> much later than the IRQ delivery time. I'm not sure if we need the >> result code in that case. >> >> If any intermediate device (IOAPIC?) needs to be informed about either >> delivery or EOI also, it could create a proxy message with its >> callbacks in place. But we need then a separate opaque field (in >> addition to payload) to store the original message. >> >> struct IRQMsg { >> DeviceState *src; >> void (*delivery_cb)(IRQMsg *msg, int result); >> void (*eoi_cb)(IRQMsg *msg, int result); >> void *src_opaque; >> void *payload; >> }; > > Extending the lifetime of IRQMsg objects beyond the delivery call stack > means qemu_malloc/free for every delivery. I think it takes a _very_ > appealing reason to justify this. But so far I do not see any use case > for eio_cb at all. I think it's safer to use allocation model anyway because this will be generic code. For example, an intermediate device may want to queue the IRQs. Alternatively, the callbacks could use DeviceState and some opaque which can be used as the callback context: void (*delivery_cb)(DeviceState *src, void *src_opaque, int result); EOI can be added later if needed, QEMU seems to work fine now without it. But based on IOAPIC data sheet, I'd suppose it should be need to pass EOI from LAPIC to IOAPIC. 
It could be used by the coalescing logic as another opportunity to inject IRQs, though I guess the guest will ack the IRQ at the same time for both RTC and APIC.
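A sketch of the DeviceState/opaque variant suggested above, with DeviceState used opaquely and the helper names invented; the point is that the sink only ever sees three plain callback arguments and never an IRQMsg:

    /* Illustration only; not existing QEMU API. */
    typedef struct DeviceState DeviceState;   /* QEMU's device base type, used opaquely here */

    typedef void (*irq_delivery_cb)(DeviceState *src, void *src_opaque, int result);

    struct irq_feedback {
        DeviceState *src;
        void *src_opaque;
        irq_delivery_cb delivery_cb;
    };

    /* The terminating chip reports the outcome without knowing anything about the source. */
    static void report_delivery(const struct irq_feedback *fb, int result)
    {
        if (fb->delivery_cb != NULL) {
            fb->delivery_cb(fb->src, fb->src_opaque, result);
        }
    }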
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Blue Swirl wrote: > On Thu, Jun 3, 2010 at 7:06 AM, Gleb Natapov wrote: >> On Thu, Jun 03, 2010 at 10:03:00AM +0300, Gleb Natapov wrote: >>> On Thu, Jun 03, 2010 at 08:59:23AM +0200, Jan Kiszka wrote: Gleb Natapov wrote: > On Thu, Jun 03, 2010 at 08:23:46AM +0200, Jan Kiszka wrote: >> Blue Swirl wrote: >>> But how about if we introduced instead a message based IRQ? Then the >>> message could specify the originator device, maybe ACK/coalesce/NACK >>> callbacks and a bigger payload than just 1 bit of level. I think that >>> could achieve the same coalescing effect as what the bidirectional >>> IRQ. The payload could be useful for other purposes, for example >>> Sparc64 IRQ messages contain three 64 bit words. >> If there are more users than just IRQ de-coalescing, this indeed sounds >> superior. We could pass objects like this one around: >> >> struct qemu_irq_msg { >> void (*delivery_cb)(int result); >> void *payload; >> }; >> >> They would be valid within the scope of the IRQ handlers. Whoever >> terminates or actually delivers the IRQ would invoke the callback. And >> platforms like sparc64 could evaluate the additional payload pointer in >> their irqchips or wherever they need it. IRQ routers on platforms that >> make use of these messages would have to replicate them when forwarding >> an event. >> >> OK? >> > Let me see if I understand you correctly. qemu_set_irq() will get > additional parameter qemu_irq_msg and if irq was not coalesced > delivery_cb is called, so there is a guaranty that if delivery_cb is > called it is done before qemu_set_irq() returns. Correct? If the side that triggers an IRQ passes a message object with a non-NULL callback, it is supposed to be called unconditionally, passing the result of the delivery (delivered, masked, coalesced). And yes, the callback will be invoked in the context of the irq handler, so before qemu_set_irq (or rather some new qemu_set_irq_msg) returns. >>> Looks fine to me. >>> >> Except that delivery_cb should probably get pointer to qemu_irq_msg as a >> parameter. > > I'd like to also support EOI handling. When the guest clears the > interrupt condtion, the EOI callback would be called. This could occur > much later than the IRQ delivery time. I'm not sure if we need the > result code in that case. > > If any intermediate device (IOAPIC?) needs to be informed about either > delivery or EOI also, it could create a proxy message with its > callbacks in place. But we need then a separate opaque field (in > addition to payload) to store the original message. > > struct IRQMsg { > DeviceState *src; > void (*delivery_cb)(IRQMsg *msg, int result); > void (*eoi_cb)(IRQMsg *msg, int result); > void *src_opaque; > void *payload; > }; Extending the lifetime of IRQMsg objects beyond the delivery call stack means qemu_malloc/free for every delivery. I think it takes a _very_ appealing reason to justify this. But so far I do not see any use case for eio_cb at all. Jan signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Thu, Jun 3, 2010 at 7:06 AM, Gleb Natapov wrote: > On Thu, Jun 03, 2010 at 10:03:00AM +0300, Gleb Natapov wrote: >> On Thu, Jun 03, 2010 at 08:59:23AM +0200, Jan Kiszka wrote: >> > Gleb Natapov wrote: >> > > On Thu, Jun 03, 2010 at 08:23:46AM +0200, Jan Kiszka wrote: >> > >> Blue Swirl wrote: >> > >>> But how about if we introduced instead a message based IRQ? Then the >> > >>> message could specify the originator device, maybe ACK/coalesce/NACK >> > >>> callbacks and a bigger payload than just 1 bit of level. I think that >> > >>> could achieve the same coalescing effect as what the bidirectional >> > >>> IRQ. The payload could be useful for other purposes, for example >> > >>> Sparc64 IRQ messages contain three 64 bit words. >> > >> If there are more users than just IRQ de-coalescing, this indeed sounds >> > >> superior. We could pass objects like this one around: >> > >> >> > >> struct qemu_irq_msg { >> > >> void (*delivery_cb)(int result); >> > >> void *payload; >> > >> }; >> > >> >> > >> They would be valid within the scope of the IRQ handlers. Whoever >> > >> terminates or actually delivers the IRQ would invoke the callback. And >> > >> platforms like sparc64 could evaluate the additional payload pointer in >> > >> their irqchips or wherever they need it. IRQ routers on platforms that >> > >> make use of these messages would have to replicate them when forwarding >> > >> an event. >> > >> >> > >> OK? >> > >> >> > > Let me see if I understand you correctly. qemu_set_irq() will get >> > > additional parameter qemu_irq_msg and if irq was not coalesced >> > > delivery_cb is called, so there is a guaranty that if delivery_cb is >> > > called it is done before qemu_set_irq() returns. Correct? >> > >> > If the side that triggers an IRQ passes a message object with a non-NULL >> > callback, it is supposed to be called unconditionally, passing the >> > result of the delivery (delivered, masked, coalesced). And yes, the >> > callback will be invoked in the context of the irq handler, so before >> > qemu_set_irq (or rather some new qemu_set_irq_msg) returns. >> > >> Looks fine to me. >> > Except that delivery_cb should probably get pointer to qemu_irq_msg as a > parameter. I'd like to also support EOI handling. When the guest clears the interrupt condtion, the EOI callback would be called. This could occur much later than the IRQ delivery time. I'm not sure if we need the result code in that case. If any intermediate device (IOAPIC?) needs to be informed about either delivery or EOI also, it could create a proxy message with its callbacks in place. But we need then a separate opaque field (in addition to payload) to store the original message. struct IRQMsg { DeviceState *src; void (*delivery_cb)(IRQMsg *msg, int result); void (*eoi_cb)(IRQMsg *msg, int result); void *src_opaque; void *payload; };
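The proxy idea above might look like this: the intermediate chip copies the message, substitutes its own callbacks, and parks the original in src_opaque so that results can be forwarded. IRQMsg is the struct proposed in this thread, and the IOAPIC helper functions are invented for the sketch:

    /* Sketch of the proxy-message pattern; not existing QEMU code. */
    typedef struct DeviceState DeviceState;
    typedef struct IRQMsg IRQMsg;
    struct IRQMsg {
        DeviceState *src;
        void (*delivery_cb)(IRQMsg *msg, int result);
        void (*eoi_cb)(IRQMsg *msg, int result);
        void *src_opaque;
        void *payload;
    };

    static void ioapic_proxy_delivery(IRQMsg *proxy, int result)
    {
        IRQMsg *orig = proxy->src_opaque;        /* the device's original message */
        /* ...update IOAPIC state (e.g. remote IRR) here... */
        if (orig->delivery_cb) {
            orig->delivery_cb(orig, result);     /* forward the result to the source */
        }
    }

    static void ioapic_proxy_eoi(IRQMsg *proxy, int result)
    {
        IRQMsg *orig = proxy->src_opaque;
        if (orig->eoi_cb) {
            orig->eoi_cb(orig, result);          /* forward the EOI as well */
        }
    }

    /* Build a proxy for an incoming message before passing it on to the local APIC. */
    static void ioapic_make_proxy(IRQMsg *orig, IRQMsg *proxy)
    {
        *proxy = *orig;
        proxy->delivery_cb = ioapic_proxy_delivery;
        proxy->eoi_cb = ioapic_proxy_eoi;
        proxy->src_opaque = orig;                /* keep a handle on the original message */
    }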
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Thu, Jun 03, 2010 at 10:03:00AM +0300, Gleb Natapov wrote: > On Thu, Jun 03, 2010 at 08:59:23AM +0200, Jan Kiszka wrote: > > Gleb Natapov wrote: > > > On Thu, Jun 03, 2010 at 08:23:46AM +0200, Jan Kiszka wrote: > > >> Blue Swirl wrote: > > >>> But how about if we introduced instead a message based IRQ? Then the > > >>> message could specify the originator device, maybe ACK/coalesce/NACK > > >>> callbacks and a bigger payload than just 1 bit of level. I think that > > >>> could achieve the same coalescing effect as what the bidirectional > > >>> IRQ. The payload could be useful for other purposes, for example > > >>> Sparc64 IRQ messages contain three 64 bit words. > > >> If there are more users than just IRQ de-coalescing, this indeed sounds > > >> superior. We could pass objects like this one around: > > >> > > >> struct qemu_irq_msg { > > >> void (*delivery_cb)(int result); > > >> void *payload; > > >> }; > > >> > > >> They would be valid within the scope of the IRQ handlers. Whoever > > >> terminates or actually delivers the IRQ would invoke the callback. And > > >> platforms like sparc64 could evaluate the additional payload pointer in > > >> their irqchips or wherever they need it. IRQ routers on platforms that > > >> make use of these messages would have to replicate them when forwarding > > >> an event. > > >> > > >> OK? > > >> > > > Let me see if I understand you correctly. qemu_set_irq() will get > > > additional parameter qemu_irq_msg and if irq was not coalesced > > > delivery_cb is called, so there is a guaranty that if delivery_cb is > > > called it is done before qemu_set_irq() returns. Correct? > > > > If the side that triggers an IRQ passes a message object with a non-NULL > > callback, it is supposed to be called unconditionally, passing the > > result of the delivery (delivered, masked, coalesced). And yes, the > > callback will be invoked in the context of the irq handler, so before > > qemu_set_irq (or rather some new qemu_set_irq_msg) returns. > > > Looks fine to me. > Except that delivery_cb should probably get pointer to qemu_irq_msg as a parameter. -- Gleb.
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Thu, Jun 03, 2010 at 08:59:23AM +0200, Jan Kiszka wrote: > Gleb Natapov wrote: > > On Thu, Jun 03, 2010 at 08:23:46AM +0200, Jan Kiszka wrote: > >> Blue Swirl wrote: > >>> But how about if we introduced instead a message based IRQ? Then the > >>> message could specify the originator device, maybe ACK/coalesce/NACK > >>> callbacks and a bigger payload than just 1 bit of level. I think that > >>> could achieve the same coalescing effect as what the bidirectional > >>> IRQ. The payload could be useful for other purposes, for example > >>> Sparc64 IRQ messages contain three 64 bit words. > >> If there are more users than just IRQ de-coalescing, this indeed sounds > >> superior. We could pass objects like this one around: > >> > >> struct qemu_irq_msg { > >>void (*delivery_cb)(int result); > >>void *payload; > >> }; > >> > >> They would be valid within the scope of the IRQ handlers. Whoever > >> terminates or actually delivers the IRQ would invoke the callback. And > >> platforms like sparc64 could evaluate the additional payload pointer in > >> their irqchips or wherever they need it. IRQ routers on platforms that > >> make use of these messages would have to replicate them when forwarding > >> an event. > >> > >> OK? > >> > > Let me see if I understand you correctly. qemu_set_irq() will get > > additional parameter qemu_irq_msg and if irq was not coalesced > > delivery_cb is called, so there is a guaranty that if delivery_cb is > > called it is done before qemu_set_irq() returns. Correct? > > If the side that triggers an IRQ passes a message object with a non-NULL > callback, it is supposed to be called unconditionally, passing the > result of the delivery (delivered, masked, coalesced). And yes, the > callback will be invoked in the context of the irq handler, so before > qemu_set_irq (or rather some new qemu_set_irq_msg) returns. > Looks fine to me. -- Gleb.
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Gleb Natapov wrote: > On Thu, Jun 03, 2010 at 08:23:46AM +0200, Jan Kiszka wrote: >> Blue Swirl wrote: >>> But how about if we introduced instead a message based IRQ? Then the >>> message could specify the originator device, maybe ACK/coalesce/NACK >>> callbacks and a bigger payload than just 1 bit of level. I think that >>> could achieve the same coalescing effect as what the bidirectional >>> IRQ. The payload could be useful for other purposes, for example >>> Sparc64 IRQ messages contain three 64 bit words. >> If there are more users than just IRQ de-coalescing, this indeed sounds >> superior. We could pass objects like this one around: >> >> struct qemu_irq_msg { >> void (*delivery_cb)(int result); >> void *payload; >> }; >> >> They would be valid within the scope of the IRQ handlers. Whoever >> terminates or actually delivers the IRQ would invoke the callback. And >> platforms like sparc64 could evaluate the additional payload pointer in >> their irqchips or wherever they need it. IRQ routers on platforms that >> make use of these messages would have to replicate them when forwarding >> an event. >> >> OK? >> > Let me see if I understand you correctly. qemu_set_irq() will get > additional parameter qemu_irq_msg and if irq was not coalesced > delivery_cb is called, so there is a guaranty that if delivery_cb is > called it is done before qemu_set_irq() returns. Correct? If the side that triggers an IRQ passes a message object with a non-NULL callback, it is supposed to be called unconditionally, passing the result of the delivery (delivered, masked, coalesced). And yes, the callback will be invoked in the context of the irq handler, so before qemu_set_irq (or rather some new qemu_set_irq_msg) returns. Jan signature.asc Description: OpenPGP digital signature
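To illustrate the contract described above, a sink such as a PIC model could look roughly like this; the result constants and names are invented, and the important property is that the callback always fires, synchronously, before the set-irq call returns:

    /* Hypothetical sink-side sketch; not the i8259 model. */
    enum { IRQ_DELIVERED, IRQ_MASKED, IRQ_COALESCED };

    struct qemu_irq_msg {
        void (*delivery_cb)(int result);
        void *payload;
    };

    struct pic_sketch {
        unsigned irr;               /* pending, not yet serviced */
        unsigned imr;               /* masked lines */
    };

    static void pic_set_irq_msg(struct pic_sketch *s, int irq, int level,
                                struct qemu_irq_msg *msg)
    {
        unsigned mask = 1u << irq;
        int result = IRQ_DELIVERED;

        if (level) {
            if (s->imr & mask) {
                result = IRQ_MASKED;        /* line currently masked off */
            } else if (s->irr & mask) {
                result = IRQ_COALESCED;     /* previous request still pending */
            }
            s->irr |= mask;
        }
        if (msg && msg->delivery_cb) {
            msg->delivery_cb(result);       /* reported before the caller's set-irq returns */
        }
    }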
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Thu, Jun 03, 2010 at 08:23:46AM +0200, Jan Kiszka wrote: > Blue Swirl wrote: > > But how about if we introduced instead a message based IRQ? Then the > > message could specify the originator device, maybe ACK/coalesce/NACK > > callbacks and a bigger payload than just 1 bit of level. I think that > > could achieve the same coalescing effect as what the bidirectional > > IRQ. The payload could be useful for other purposes, for example > > Sparc64 IRQ messages contain three 64 bit words. > > If there are more users than just IRQ de-coalescing, this indeed sounds > superior. We could pass objects like this one around: > > struct qemu_irq_msg { > void (*delivery_cb)(int result); > void *payload; > }; > > They would be valid within the scope of the IRQ handlers. Whoever > terminates or actually delivers the IRQ would invoke the callback. And > platforms like sparc64 could evaluate the additional payload pointer in > their irqchips or wherever they need it. IRQ routers on platforms that > make use of these messages would have to replicate them when forwarding > an event. > > OK? > Let me see if I understand you correctly. qemu_set_irq() will get an additional parameter, qemu_irq_msg, and if the irq was not coalesced delivery_cb is called, so there is a guarantee that if delivery_cb is called it is done before qemu_set_irq() returns. Correct? -- Gleb.
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Blue Swirl wrote: > But how about if we introduced instead a message based IRQ? Then the > message could specify the originator device, maybe ACK/coalesce/NACK > callbacks and a bigger payload than just 1 bit of level. I think that > could achieve the same coalescing effect as what the bidirectional > IRQ. The payload could be useful for other purposes, for example > Sparc64 IRQ messages contain three 64 bit words. If there are more users than just IRQ de-coalescing, this indeed sounds superior. We could pass objects like this one around: struct qemu_irq_msg { void (*delivery_cb)(int result); void *payload; }; They would be valid within the scope of the IRQ handlers. Whoever terminates or actually delivers the IRQ would invoke the callback. And platforms like sparc64 could evaluate the additional payload pointer in their irqchips or wherever they need it. IRQ routers on platforms that make use of these messages would have to replicate them when forwarding an event. OK? Jan
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Tue, Jun 1, 2010 at 6:30 PM, Gleb Natapov wrote: > On Tue, Jun 01, 2010 at 06:00:20PM +, Blue Swirl wrote: >> >> > Looks like irr in apic is never cleared. Probably bug in userspace apic >> >> > emulation. I'll look into it. Try to route interrupt via APIC (not >> >> > ExtInt), >> >> > or use qemu-kvm with in kernel irq chip. >> >> >> >> With APIC you mean Fixed? Then the IRQ is not delivered at all. >> > You need to deliver it through IOAPIC. >> >> In this version, when USE_APIC is defined, IRQs are routed via IOAPIC >> and APIC, PIC interrupts are disabled. Otherwise, IOAPIC and APIC are >> left to reset state and PIC is used. >> >> With the PIC version, there are bogus injections. With IOAPIC and APIC >> dots come at 2Hz. -enable-kvm has no effect. > > I looked into this briefly. Virtual wire case is not handled correctly, > and pic lacks coalescing detection at all (the reason BTW because proper > solution was rejected, so only hack required by Windows was added). > Windows routes RTC interrupts through IOAPIC and this case works as you > can see. -enable-kvm on qemu should not make any difference since it > uses the same userspace pic/ioapic/apic code as TCG. It would be > interesting to try with qemu-kvm which uses in-kernel irq chips by > default. > > BTW I managed to compile you test program with gcc 4.2. 4.4 and 4.3 > produced non-working binaries for me. This exercise showed me that the IRQ connectivity between APIC and RTC can vary a lot because of guest activities. So it may be impossible to know at APIC if some IRQ came from RTC and so the coalescing must be accounted at RTC (or close to RTC). This makes a generic solution less useful, I think the net result would be to refactor rtc_coalesced_timer*() to a new file. Also my generic approach of time base adjustment would need to be steered by RTC because APIC can't. So I think I'll have to abandon this idea for now. Sorry for the delay, thanks for the interesting discussion though. Coming back to $SUBJECT: I don't think the bidirectional IRQ is a good solution either, but it may be slightly better than current hack (I didn't realize the full ugliness of the hack before, namely that APIC and RTC are really not connected at all). The same effect could be achieved with two signals going to opposite directions, like APICREQ/APICACK on 82093 IOAPIC but again, if any IRQs point to same vector at IOAPIC, there is no way to get back to originator. With bidirectional IRQs, there's a one-shot chance at the delivery time, but not later at ACK/EOI time. But how about if we introduced instead a message based IRQ? Then the message could specify the originator device, maybe ACK/coalesce/NACK callbacks and a bigger payload than just 1 bit of level. I think that could achieve the same coalescing effect as what the bidirectional IRQ. The payload could be useful for other purposes, for example Sparc64 IRQ messages contain three 64 bit words.
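Accounting at the RTC (or close to it), as suggested above, could look roughly like the following. The irq_coalesced counter and the two injection paths ("on ack" and "from timer") mirror the behaviour of the -rtc-td-hack discussed in this thread, but the structure and helper names here are a simplified illustration, not the actual QEMU RTC code:

    #include <stdint.h>

    typedef struct RTCState {
        uint32_t irq_coalesced;   /* ticks raised but reported as not delivered */
        /* qemu_irq line, periodic/coalesced timers, CMOS registers, ... */
    } RTCState;

    /* hypothetical helper: raise IRQ8 and return non-zero if the delivery
       feedback (callback or apic_get_irq_delivered()) says it went through */
    int rtc_try_inject(RTCState *s);

    static void rtc_periodic_tick(RTCState *s)
    {
        if (!rtc_try_inject(s)) {
            s->irq_coalesced++;       /* "coalesced irqs increased to N" */
        }
    }

    /* called when the guest acks register C, or from a slow catch-up timer */
    static void rtc_reinject(RTCState *s)
    {
        if (s->irq_coalesced && rtc_try_inject(s)) {
            s->irq_coalesced--;       /* "injecting on ack" / "injecting from timer" */
        }
    }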
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Tue, Jun 01, 2010 at 06:00:20PM +, Blue Swirl wrote: > >> > Looks like irr in apic is never cleared. Probably bug in userspace apic > >> > emulation. I'll look into it. Try to route interrupt via APIC (not > >> > ExtInt), > >> > or use qemu-kvm with in kernel irq chip. > >> > >> With APIC you mean Fixed? Then the IRQ is not delivered at all. > > You need to deliver it through IOAPIC. > > In this version, when USE_APIC is defined, IRQs are routed via IOAPIC > and APIC, PIC interrupts are disabled. Otherwise, IOAPIC and APIC are > left to reset state and PIC is used. > > With the PIC version, there are bogus injections. With IOAPIC and APIC > dots come at 2Hz. -enable-kvm has no effect. I looked into this briefly. The virtual wire case is not handled correctly, and the PIC lacks coalescing detection entirely (the reason, BTW, is that the proper solution was rejected, so only the hack required by Windows was added). Windows routes RTC interrupts through the IOAPIC, and that case works, as you can see. -enable-kvm on qemu should not make any difference since it uses the same userspace pic/ioapic/apic code as TCG. It would be interesting to try with qemu-kvm, which uses in-kernel irq chips by default. BTW, I managed to compile your test program with gcc 4.2; 4.4 and 4.3 produced non-working binaries for me. -- Gleb.
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Mon, May 31, 2010 at 5:19 AM, Gleb Natapov wrote: > On Sun, May 30, 2010 at 08:21:30PM +, Blue Swirl wrote: >> On Sun, May 30, 2010 at 8:07 PM, Gleb Natapov wrote: >> > On Sun, May 30, 2010 at 07:37:59PM +, Blue Swirl wrote: >> >> On Sun, May 30, 2010 at 1:49 PM, Gleb Natapov wrote: >> >> > On Sun, May 30, 2010 at 12:56:26PM +, Blue Swirl wrote: >> >> >> >> Well, I'd like to get the test program also trigger it. Now I'm >> >> >> >> getting: >> >> >> >> apic: write: 0350 = >> >> >> >> apic: apic_reset_irq_delivered: old coalescing 0 >> >> >> >> apic: apic_local_deliver: vector 3 delivery mode 0 >> >> >> >> apic: apic_set_irq: coalescing 1 >> >> >> >> apic: apic_get_irq_delivered: returning coalescing 1 >> >> >> >> apic: apic_reset_irq_delivered: old coalescing 1 >> >> >> >> apic: apic_local_deliver: vector 3 delivery mode 0 >> >> >> >> apic: apic_set_irq: coalescing 0 >> >> >> >> apic: apic_get_irq_delivered: returning coalescing 0 >> >> >> >> apic: apic_reset_irq_delivered: old coalescing 0 >> >> >> >> apic: apic_local_deliver: vector 3 delivery mode 0 >> >> >> >> apic: apic_set_irq: coalescing 0 >> >> >> >> >> >> > So interrupt is _alway_ coalesced. If apic_get_irq_delivered() returns >> >> > 0 it means the interrupt was not delivered. >> >> >> >> That seems strange. I changed the program so that the handler gets >> >> executed, also output a dot to serial from the handler. I changed the >> >> frequency to 2Hz. >> >> >> >> Now, if I leave out -rtc-td-hack, I get the dots at 2Hz as expected. >> >> With -rtc-td-hack, the dots come out much faster. I added >> >> DEBUG_COALESCING also to RTC, with that enabled I get: >> >> qemu -L . -bios coalescing.bin -d in_asm,int -no-hpet -rtc-td-hack >> >> -serial stdio >> >> cmos: coalesced irqs scaled to 0 >> >> cmos: coalesced irqs increased to 1 >> >> cmos: injecting on ack >> >> .cmos: injecting on ack >> >> .cmos: injecting on ack >> >> .cmos: injecting on ack >> >> .cmos: injecting on ack >> >> .cmos: injecting on ack >> >> .cmos: injecting on ack >> >> .cmos: injecting on ack >> >> .cmos: injecting on ack >> >> .cmos: injecting on ack >> >> .cmos: injecting on ack >> >> .cmos: injecting on ack >> >> .cmos: injecting on ack >> >> .cmos: injecting on ack >> >> .cmos: injecting on ack >> >> .cmos: injecting on ack >> >> .cmos: injecting on ack >> >> .cmos: injecting on ack >> >> .cmos: injecting on ack >> >> .cmos: injecting on ack >> >> ..cmos: injecting from timer >> >> .cmos: coalesced irqs increased to 2 >> >> cmos: injecting on ack >> >> >> >> So, there are bogus injections. >> > >> > Looks like irr in apic is never cleared. Probably bug in userspace apic >> > emulation. I'll look into it. Try to route interrupt via APIC (not ExtInt), >> > or use qemu-kvm with in kernel irq chip. >> >> With APIC you mean Fixed? Then the IRQ is not delivered at all. > You need to deliver it through IOAPIC. In this version, when USE_APIC is defined, IRQs are routed via IOAPIC and APIC, PIC interrupts are disabled. Otherwise, IOAPIC and APIC are left to reset state and PIC is used. With the PIC version, there are bogus injections. With IOAPIC and APIC dots come at 2Hz. -enable-kvm has no effect. coalescing.S Description: Binary data coalescing_apic.bin.bz2 Description: BZip2 compressed data coalescing_pic.bin.bz2 Description: BZip2 compressed data
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Blue Swirl wrote: > On Sun, May 30, 2010 at 12:24 PM, Jan Kiszka wrote: >> Blue Swirl wrote: Linux don't use RTC as time source and I don't know about BSD, so no Linux or BSD test case for you, sorry. Run WindowXP standard HAL and put heavy load on the host. You can run video inside the gust to trigger coalescing more easily. >>> I don't have Windows XP, sorry. >> ReactOS [1], at least its 32-bit version, appears to use the RTC as well. > I tried LiveCD and QEMU versions, both seem to hang at boot. Is that expected? Live-CD works for me (in contrast to the pre-installed image in KVM mode), but it doesn't show the desired RTC usage. Jan
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sun, May 30, 2010 at 08:21:30PM +, Blue Swirl wrote: > On Sun, May 30, 2010 at 8:07 PM, Gleb Natapov wrote: > > On Sun, May 30, 2010 at 07:37:59PM +, Blue Swirl wrote: > >> On Sun, May 30, 2010 at 1:49 PM, Gleb Natapov wrote: > >> > On Sun, May 30, 2010 at 12:56:26PM +, Blue Swirl wrote: > >> >> >> Well, I'd like to get the test program also trigger it. Now I'm > >> >> >> getting: > >> >> >> apic: write: 0350 = > >> >> >> apic: apic_reset_irq_delivered: old coalescing 0 > >> >> >> apic: apic_local_deliver: vector 3 delivery mode 0 > >> >> >> apic: apic_set_irq: coalescing 1 > >> >> >> apic: apic_get_irq_delivered: returning coalescing 1 > >> >> >> apic: apic_reset_irq_delivered: old coalescing 1 > >> >> >> apic: apic_local_deliver: vector 3 delivery mode 0 > >> >> >> apic: apic_set_irq: coalescing 0 > >> >> >> apic: apic_get_irq_delivered: returning coalescing 0 > >> >> >> apic: apic_reset_irq_delivered: old coalescing 0 > >> >> >> apic: apic_local_deliver: vector 3 delivery mode 0 > >> >> >> apic: apic_set_irq: coalescing 0 > >> >> >> > >> > So interrupt is _alway_ coalesced. If apic_get_irq_delivered() returns > >> > 0 it means the interrupt was not delivered. > >> > >> That seems strange. I changed the program so that the handler gets > >> executed, also output a dot to serial from the handler. I changed the > >> frequency to 2Hz. > >> > >> Now, if I leave out -rtc-td-hack, I get the dots at 2Hz as expected. > >> With -rtc-td-hack, the dots come out much faster. I added > >> DEBUG_COALESCING also to RTC, with that enabled I get: > >> qemu -L . -bios coalescing.bin -d in_asm,int -no-hpet -rtc-td-hack -serial > >> stdio > >> cmos: coalesced irqs scaled to 0 > >> cmos: coalesced irqs increased to 1 > >> cmos: injecting on ack > >> .cmos: injecting on ack > >> .cmos: injecting on ack > >> .cmos: injecting on ack > >> .cmos: injecting on ack > >> .cmos: injecting on ack > >> .cmos: injecting on ack > >> .cmos: injecting on ack > >> .cmos: injecting on ack > >> .cmos: injecting on ack > >> .cmos: injecting on ack > >> .cmos: injecting on ack > >> .cmos: injecting on ack > >> .cmos: injecting on ack > >> .cmos: injecting on ack > >> .cmos: injecting on ack > >> .cmos: injecting on ack > >> .cmos: injecting on ack > >> .cmos: injecting on ack > >> .cmos: injecting on ack > >> ..cmos: injecting from timer > >> .cmos: coalesced irqs increased to 2 > >> cmos: injecting on ack > >> > >> So, there are bogus injections. > > > > Looks like irr in apic is never cleared. Probably bug in userspace apic > > emulation. I'll look into it. Try to route interrupt via APIC (not ExtInt), > > or use qemu-kvm with in kernel irq chip. > > With APIC you mean Fixed? Then the IRQ is not delivered at all. You need to deliver it through IOAPIC. -- Gleb.
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sun, May 30, 2010 at 8:07 PM, Gleb Natapov wrote: > On Sun, May 30, 2010 at 07:37:59PM +, Blue Swirl wrote: >> On Sun, May 30, 2010 at 1:49 PM, Gleb Natapov wrote: >> > On Sun, May 30, 2010 at 12:56:26PM +, Blue Swirl wrote: >> >> >> Well, I'd like to get the test program also trigger it. Now I'm >> >> >> getting: >> >> >> apic: write: 0350 = >> >> >> apic: apic_reset_irq_delivered: old coalescing 0 >> >> >> apic: apic_local_deliver: vector 3 delivery mode 0 >> >> >> apic: apic_set_irq: coalescing 1 >> >> >> apic: apic_get_irq_delivered: returning coalescing 1 >> >> >> apic: apic_reset_irq_delivered: old coalescing 1 >> >> >> apic: apic_local_deliver: vector 3 delivery mode 0 >> >> >> apic: apic_set_irq: coalescing 0 >> >> >> apic: apic_get_irq_delivered: returning coalescing 0 >> >> >> apic: apic_reset_irq_delivered: old coalescing 0 >> >> >> apic: apic_local_deliver: vector 3 delivery mode 0 >> >> >> apic: apic_set_irq: coalescing 0 >> >> >> >> > So interrupt is _alway_ coalesced. If apic_get_irq_delivered() returns >> > 0 it means the interrupt was not delivered. >> >> That seems strange. I changed the program so that the handler gets >> executed, also output a dot to serial from the handler. I changed the >> frequency to 2Hz. >> >> Now, if I leave out -rtc-td-hack, I get the dots at 2Hz as expected. >> With -rtc-td-hack, the dots come out much faster. I added >> DEBUG_COALESCING also to RTC, with that enabled I get: >> qemu -L . -bios coalescing.bin -d in_asm,int -no-hpet -rtc-td-hack -serial >> stdio >> cmos: coalesced irqs scaled to 0 >> cmos: coalesced irqs increased to 1 >> cmos: injecting on ack >> .cmos: injecting on ack >> .cmos: injecting on ack >> .cmos: injecting on ack >> .cmos: injecting on ack >> .cmos: injecting on ack >> .cmos: injecting on ack >> .cmos: injecting on ack >> .cmos: injecting on ack >> .cmos: injecting on ack >> .cmos: injecting on ack >> .cmos: injecting on ack >> .cmos: injecting on ack >> .cmos: injecting on ack >> .cmos: injecting on ack >> .cmos: injecting on ack >> .cmos: injecting on ack >> .cmos: injecting on ack >> .cmos: injecting on ack >> .cmos: injecting on ack >> ..cmos: injecting from timer >> .cmos: coalesced irqs increased to 2 >> cmos: injecting on ack >> >> So, there are bogus injections. > > Looks like irr in apic is never cleared. Probably bug in userspace apic > emulation. I'll look into it. Try to route interrupt via APIC (not ExtInt), > or use qemu-kvm with in kernel irq chip. With APIC you mean Fixed? Then the IRQ is not delivered at all.
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sun, May 30, 2010 at 07:37:59PM +, Blue Swirl wrote: > On Sun, May 30, 2010 at 1:49 PM, Gleb Natapov wrote: > > On Sun, May 30, 2010 at 12:56:26PM +, Blue Swirl wrote: > >> >> Well, I'd like to get the test program also trigger it. Now I'm getting: > >> >> apic: write: 0350 = > >> >> apic: apic_reset_irq_delivered: old coalescing 0 > >> >> apic: apic_local_deliver: vector 3 delivery mode 0 > >> >> apic: apic_set_irq: coalescing 1 > >> >> apic: apic_get_irq_delivered: returning coalescing 1 > >> >> apic: apic_reset_irq_delivered: old coalescing 1 > >> >> apic: apic_local_deliver: vector 3 delivery mode 0 > >> >> apic: apic_set_irq: coalescing 0 > >> >> apic: apic_get_irq_delivered: returning coalescing 0 > >> >> apic: apic_reset_irq_delivered: old coalescing 0 > >> >> apic: apic_local_deliver: vector 3 delivery mode 0 > >> >> apic: apic_set_irq: coalescing 0 > >> >> > > So interrupt is _alway_ coalesced. If apic_get_irq_delivered() returns > > 0 it means the interrupt was not delivered. > > That seems strange. I changed the program so that the handler gets > executed, also output a dot to serial from the handler. I changed the > frequency to 2Hz. > > Now, if I leave out -rtc-td-hack, I get the dots at 2Hz as expected. > With -rtc-td-hack, the dots come out much faster. I added > DEBUG_COALESCING also to RTC, with that enabled I get: > qemu -L . -bios coalescing.bin -d in_asm,int -no-hpet -rtc-td-hack -serial > stdio > cmos: coalesced irqs scaled to 0 > cmos: coalesced irqs increased to 1 > cmos: injecting on ack > .cmos: injecting on ack > .cmos: injecting on ack > .cmos: injecting on ack > .cmos: injecting on ack > .cmos: injecting on ack > .cmos: injecting on ack > .cmos: injecting on ack > .cmos: injecting on ack > .cmos: injecting on ack > .cmos: injecting on ack > .cmos: injecting on ack > .cmos: injecting on ack > .cmos: injecting on ack > .cmos: injecting on ack > .cmos: injecting on ack > .cmos: injecting on ack > .cmos: injecting on ack > .cmos: injecting on ack > .cmos: injecting on ack > ..cmos: injecting from timer > .cmos: coalesced irqs increased to 2 > cmos: injecting on ack > > So, there are bogus injections. Looks like irr in apic is never cleared. Probably bug in userspace apic emulation. I'll look into it. Try to route interrupt via APIC (not ExtInt), or use qemu-kvm with in kernel irq chip. -- Gleb.
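For reference, the pattern behind the apic_reset_irq_delivered()/apic_get_irq_delivered() lines in the log above is roughly the following fragment (simplified; s stands for the RTC device state and s->irq for its output line, as in the earlier sketch):

    /* around each periodic-tick injection in the -rtc-td-hack path */
    apic_reset_irq_delivered();      /* clear the global "delivered" flag */
    qemu_irq_raise(s->irq);          /* IRQ8 towards PIC/IOAPIC/APIC */
    if (apic_get_irq_delivered()) {
        /* the tick reached a VCPU: nothing to compensate for */
    } else {
        /* not delivered - coalesced, or routed somewhere the flag is never
           set (e.g. the ExtInt/PIC path discussed above) - so it is counted
           for reinjection */
        s->irq_coalesced++;
    }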
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sun, May 30, 2010 at 1:49 PM, Gleb Natapov wrote: > On Sun, May 30, 2010 at 12:56:26PM +, Blue Swirl wrote: >> >> Well, I'd like to get the test program also trigger it. Now I'm getting: >> >> apic: write: 0350 = >> >> apic: apic_reset_irq_delivered: old coalescing 0 >> >> apic: apic_local_deliver: vector 3 delivery mode 0 >> >> apic: apic_set_irq: coalescing 1 >> >> apic: apic_get_irq_delivered: returning coalescing 1 >> >> apic: apic_reset_irq_delivered: old coalescing 1 >> >> apic: apic_local_deliver: vector 3 delivery mode 0 >> >> apic: apic_set_irq: coalescing 0 >> >> apic: apic_get_irq_delivered: returning coalescing 0 >> >> apic: apic_reset_irq_delivered: old coalescing 0 >> >> apic: apic_local_deliver: vector 3 delivery mode 0 >> >> apic: apic_set_irq: coalescing 0 >> >> > So interrupt is _alway_ coalesced. If apic_get_irq_delivered() returns > 0 it means the interrupt was not delivered. That seems strange. I changed the program so that the handler gets executed, also output a dot to serial from the handler. I changed the frequency to 2Hz. Now, if I leave out -rtc-td-hack, I get the dots at 2Hz as expected. With -rtc-td-hack, the dots come out much faster. I added DEBUG_COALESCING also to RTC, with that enabled I get: qemu -L . -bios coalescing.bin -d in_asm,int -no-hpet -rtc-td-hack -serial stdio cmos: coalesced irqs scaled to 0 cmos: coalesced irqs increased to 1 cmos: injecting on ack .cmos: injecting on ack .cmos: injecting on ack .cmos: injecting on ack .cmos: injecting on ack .cmos: injecting on ack .cmos: injecting on ack .cmos: injecting on ack .cmos: injecting on ack .cmos: injecting on ack .cmos: injecting on ack .cmos: injecting on ack .cmos: injecting on ack .cmos: injecting on ack .cmos: injecting on ack .cmos: injecting on ack .cmos: injecting on ack .cmos: injecting on ack .cmos: injecting on ack .cmos: injecting on ack ..cmos: injecting from timer .cmos: coalesced irqs increased to 2 cmos: injecting on ack So, there are bogus injections. coalescing.S Description: Binary data coalescing.bin.bz2 Description: BZip2 compressed data
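Each dot in the output above corresponds to one pass through the guest's RTC handler. In C, the essential steps of such a handler (print a dot, ack the RTC so the next periodic IRQ can fire, then EOI the interrupt controller) would look roughly like this; the port numbers are the standard PC ones, but the code is an illustration rather than the actual coalescing.S handler, and it skips details such as waiting for the UART transmitter:

    static inline void outb(unsigned short port, unsigned char val)
    {
        __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
    }

    static inline unsigned char inb(unsigned short port)
    {
        unsigned char val;
        __asm__ volatile ("inb %1, %0" : "=a"(val) : "Nd"(port));
        return val;
    }

    void rtc_irq_handler(void)
    {
        outb(0x3f8, '.');      /* dot to COM1, as in the test program */
        outb(0x70, 0x0c);      /* select RTC register C ... */
        (void)inb(0x71);       /* ... reading it acks the periodic interrupt */
        outb(0xa0, 0x20);      /* EOI to the slave PIC (RTC is IRQ8) */
        outb(0x20, 0x20);      /* EOI to the master PIC */
        /* in the APIC variant, the EOI is a write to the local APIC instead */
    }

Without the register C read, the RTC stops raising further periodic interrupts, which is also why the "injecting on ack" path exists on the host side.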
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sun, May 30, 2010 at 1:49 PM, Gleb Natapov wrote: > On Sun, May 30, 2010 at 12:56:26PM +, Blue Swirl wrote: >> >> Well, I'd like to get the test program also trigger it. Now I'm getting: >> >> apic: write: 0350 = >> >> apic: apic_reset_irq_delivered: old coalescing 0 >> >> apic: apic_local_deliver: vector 3 delivery mode 0 >> >> apic: apic_set_irq: coalescing 1 >> >> apic: apic_get_irq_delivered: returning coalescing 1 >> >> apic: apic_reset_irq_delivered: old coalescing 1 >> >> apic: apic_local_deliver: vector 3 delivery mode 0 >> >> apic: apic_set_irq: coalescing 0 >> >> apic: apic_get_irq_delivered: returning coalescing 0 >> >> apic: apic_reset_irq_delivered: old coalescing 0 >> >> apic: apic_local_deliver: vector 3 delivery mode 0 >> >> apic: apic_set_irq: coalescing 0 >> >> > So interrupt is _alway_ coalesced. If apic_get_irq_delivered() returns > 0 it means the interrupt was not delivered. > >> >> It looks like some other IRQs cause the coalescing, because also >> >> looking at RTC code, it seems it's not possible for RTC to raise the >> >> IRQ (except update IRQ, alarm etc.) without calling >> >> apic_reset_irq_delivered(). >> >> >> >> I've attached my test program. Compile: >> >> gcc -m32 -o coalescing coalescing.S -ffreestanding -nostdlib -Wl,-T >> >> coalescing.ld -g && objcopy -Obinary coalescing coalescing.bin >> >> >> >> Run: >> >> qemu -L . -bios coalescing.bin -no-hpet -rtc-td-hack >> >> >> > The application does not work for me. Looks like it fails to enter >> > protected mode. $pc jumps from 0xfff0 to 0x000f003e >> > and back. >> >> Strange. Here's a working binary. >> > Your binary works here too. What compiler are you using? Using built-in specs. Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 4.3.2-1.1' --with-bugurl=file:///usr/share/doc/gcc-4.3/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.3 --program-suffix=-4.3 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --enable-cld --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 4.3.2 (Debian 4.3.2-1.1)
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sun, May 30, 2010 at 12:56:26PM +, Blue Swirl wrote: > >> Well, I'd like to get the test program also trigger it. Now I'm getting: > >> apic: write: 0350 = > >> apic: apic_reset_irq_delivered: old coalescing 0 > >> apic: apic_local_deliver: vector 3 delivery mode 0 > >> apic: apic_set_irq: coalescing 1 > >> apic: apic_get_irq_delivered: returning coalescing 1 > >> apic: apic_reset_irq_delivered: old coalescing 1 > >> apic: apic_local_deliver: vector 3 delivery mode 0 > >> apic: apic_set_irq: coalescing 0 > >> apic: apic_get_irq_delivered: returning coalescing 0 > >> apic: apic_reset_irq_delivered: old coalescing 0 > >> apic: apic_local_deliver: vector 3 delivery mode 0 > >> apic: apic_set_irq: coalescing 0 > >> So interrupt is _alway_ coalesced. If apic_get_irq_delivered() returns 0 it means the interrupt was not delivered. > >> It looks like some other IRQs cause the coalescing, because also > >> looking at RTC code, it seems it's not possible for RTC to raise the > >> IRQ (except update IRQ, alarm etc.) without calling > >> apic_reset_irq_delivered(). > >> > >> I've attached my test program. Compile: > >> gcc -m32 -o coalescing coalescing.S -ffreestanding -nostdlib -Wl,-T > >> coalescing.ld -g && objcopy -Obinary coalescing coalescing.bin > >> > >> Run: > >> qemu -L . -bios coalescing.bin -no-hpet -rtc-td-hack > >> > > The application does not work for me. Looks like it fails to enter > > protected mode. $pc jumps from 0xfff0 to 0x000f003e > > and back. > > Strange. Here's a working binary. > Your binary works here too. What compiler are you using? -- Gleb.
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
2010/5/30 Gleb Natapov : > On Sun, May 30, 2010 at 12:10:16PM +, Blue Swirl wrote: >> >> >> You missed the key word 'stopped'. If the timer is really stopped, no >> >> >> IRQs should ever come out afterwards, just like on real HW. For the >> >> >> emulation, this means loss of ticks which should have been delivered >> >> >> before the change. >> >> >> >> >> > I haven't missed it. I describe to you reality of the situation. You >> >> > want >> >> > to change reality to be more close to what you want it to be by adding >> >> > words to my description. >> >> >> >> Quoting Jan: 'So what to do with the backlog when the timer is >> >> stopped?' I didn't add any words to your description, please be more >> >> careful with your attributions. Why do you think I want to change the >> >> reality? >> > Please refer to my words when you answer to my quote. You quoted my >> > answer to you statement: >> > Gleb only mentioned the frequency change, I thought that was not so big >> > problem. But I don't think this case should be allowed happen at all, >> > it can't exist on real HW. >> >> With 'this case' I was referring to 'case with timer stopped', not >> 'case which Gleb mentioned'. >> >> > No 'stopped' was under discussion nowhere. >> >> It's clearly written there in the sentence Jan wrote. >> > Jan, not me, but lets leave this topic alone since you agree that > stopped is just a case of frequency change anyway. > >> > FWIW 'stopped' is just a case >> > of frequency change. >> >> True. >> >> > >> >> >> >> XP frequency change isn't the same case as timer being stopped. >> >> >> > And what is the big difference exactly? >> >> Because after the timer is stopped, its extremely unrealistic to send >> any IRQs. Whereas if the frequency is changed to some other nonzero >> value, we can cheat and inject some more queued IRQs. >> > Correct, when gets disables clock source (by reset or any other means) > coalesced backlog should be forgotten. > >> Anyway, if this case is not interesting because it doesn't happen in >> real life emulation scenarios, we can forget it no matter how buggy >> the current QEMU implementation is. >> >> >> > Please just go write code, experiment, debug >> >> > and _then_ come here with design. >> >> >> >> I added some debugging to RTC, PIC and APIC. I also built a small >> >> guest in x86 assembly to test the coalescing. However, in the tests >> >> with this guest and others I noticed that the coalescing only happens >> >> in some obscure conditions. >> > So try with real guest and with real load. >> >> Well, I'd like to get the test program also trigger it. Now I'm getting: >> apic: write: 0350 = >> apic: apic_reset_irq_delivered: old coalescing 0 >> apic: apic_local_deliver: vector 3 delivery mode 0 >> apic: apic_set_irq: coalescing 1 >> apic: apic_get_irq_delivered: returning coalescing 1 >> apic: apic_reset_irq_delivered: old coalescing 1 >> apic: apic_local_deliver: vector 3 delivery mode 0 >> apic: apic_set_irq: coalescing 0 >> apic: apic_get_irq_delivered: returning coalescing 0 >> apic: apic_reset_irq_delivered: old coalescing 0 >> apic: apic_local_deliver: vector 3 delivery mode 0 >> apic: apic_set_irq: coalescing 0 >> >> It looks like some other IRQs cause the coalescing, because also >> looking at RTC code, it seems it's not possible for RTC to raise the >> IRQ (except update IRQ, alarm etc.) without calling >> apic_reset_irq_delivered(). >> >> I've attached my test program. 
Compile: >> gcc -m32 -o coalescing coalescing.S -ffreestanding -nostdlib -Wl,-T >> coalescing.ld -g && objcopy -Obinary coalescing coalescing.bin >> >> Run: >> qemu -L . -bios coalescing.bin -no-hpet -rtc-td-hack >> > The application does not work for me. Looks like it fails to enter > protected mode. $pc jumps from 0xfff0 to 0x000f003e > and back. > >> >> >> >> By default the APIC's delivery method for IRQs is ExtInt and >> >> coalescing counting happens only with Fixed. This means that the guest >> >> needs to reprogram APIC. It also looks like RTC interrupts need to be >> >> triggered. But I didn't see both of these to happen simultaneously in >> >> my tests with Linux and Windows guests. Of course, -rtc-td-hack flag >> >> must be used and I also disabled HPET to be sure that RTC would be >> >> used. >> >> >> >> With DEBUG_COALESCING enabled, I just get increasing numbers for >> >> apic_irq_delivered: >> >> apic: apic_set_irq: coalescing 67123 >> >> apic: apic_set_irq: coalescing 67124 >> >> apic: apic_set_irq: coalescing 67125 >> > So have you actually used -rtc-td-hack option? I compiled head of >> > qemu.git with DEBUG_COALESCING and run WindowsXP guest with -rtc-td-hack >> > and I get: >> > apic: apic_reset_irq_delivered: old coalescing 3 >> > apic: apic_set_irq: coalescing 1 >> > apic: apic_get_irq_delivered: returning coalescing 1 >> > apic: apic_set_irq: coalescing 2 >> > apic: apic_set_irq: coalescing 3 >> > apic: apic_set_irq: coalescing 4 >> > apic: apic_set_irq: coal
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sun, May 30, 2010 at 12:24 PM, Jan Kiszka wrote: > Blue Swirl wrote: >>> Linux don't use RTC as time source and I don't know about BSD, so no >>> Linux or BSD test case for you, sorry. Run WindowXP standard HAL and put >>> heavy load on the host. You can run video inside the gust to trigger >>> coalescing more easily. >> >> I don't have Windows XP, sorry. > > ReactOS [1], at least its 32-bit version, appears to use the RTC as well. I tried LiveCD and QEMU versions, both seem to hang at boot. Is that expected?
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
2010/5/30 Gleb Natapov : > On Sun, May 30, 2010 at 12:10:16PM +, Blue Swirl wrote: >> >> >> You missed the key word 'stopped'. If the timer is really stopped, no >> >> >> IRQs should ever come out afterwards, just like on real HW. For the >> >> >> emulation, this means loss of ticks which should have been delivered >> >> >> before the change. >> >> >> >> >> > I haven't missed it. I describe to you reality of the situation. You >> >> > want >> >> > to change reality to be more close to what you want it to be by adding >> >> > words to my description. >> >> >> >> Quoting Jan: 'So what to do with the backlog when the timer is >> >> stopped?' I didn't add any words to your description, please be more >> >> careful with your attributions. Why do you think I want to change the >> >> reality? >> > Please refer to my words when you answer to my quote. You quoted my >> > answer to you statement: >> > Gleb only mentioned the frequency change, I thought that was not so big >> > problem. But I don't think this case should be allowed happen at all, >> > it can't exist on real HW. >> >> With 'this case' I was referring to 'case with timer stopped', not >> 'case which Gleb mentioned'. >> >> > No 'stopped' was under discussion nowhere. >> >> It's clearly written there in the sentence Jan wrote. >> > Jan, not me, but lets leave this topic alone since you agree that > stopped is just a case of frequency change anyway. > >> > FWIW 'stopped' is just a case >> > of frequency change. >> >> True. >> >> > >> >> >> >> XP frequency change isn't the same case as timer being stopped. >> >> >> > And what is the big difference exactly? >> >> Because after the timer is stopped, its extremely unrealistic to send >> any IRQs. Whereas if the frequency is changed to some other nonzero >> value, we can cheat and inject some more queued IRQs. >> > Correct, when gets disables clock source (by reset or any other means) > coalesced backlog should be forgotten. > >> Anyway, if this case is not interesting because it doesn't happen in >> real life emulation scenarios, we can forget it no matter how buggy >> the current QEMU implementation is. >> >> >> > Please just go write code, experiment, debug >> >> > and _then_ come here with design. >> >> >> >> I added some debugging to RTC, PIC and APIC. I also built a small >> >> guest in x86 assembly to test the coalescing. However, in the tests >> >> with this guest and others I noticed that the coalescing only happens >> >> in some obscure conditions. >> > So try with real guest and with real load. >> >> Well, I'd like to get the test program also trigger it. Now I'm getting: >> apic: write: 0350 = >> apic: apic_reset_irq_delivered: old coalescing 0 >> apic: apic_local_deliver: vector 3 delivery mode 0 >> apic: apic_set_irq: coalescing 1 >> apic: apic_get_irq_delivered: returning coalescing 1 >> apic: apic_reset_irq_delivered: old coalescing 1 >> apic: apic_local_deliver: vector 3 delivery mode 0 >> apic: apic_set_irq: coalescing 0 >> apic: apic_get_irq_delivered: returning coalescing 0 >> apic: apic_reset_irq_delivered: old coalescing 0 >> apic: apic_local_deliver: vector 3 delivery mode 0 >> apic: apic_set_irq: coalescing 0 >> >> It looks like some other IRQs cause the coalescing, because also >> looking at RTC code, it seems it's not possible for RTC to raise the >> IRQ (except update IRQ, alarm etc.) without calling >> apic_reset_irq_delivered(). >> >> I've attached my test program. 
Compile: >> gcc -m32 -o coalescing coalescing.S -ffreestanding -nostdlib -Wl,-T >> coalescing.ld -g && objcopy -Obinary coalescing coalescing.bin >> >> Run: >> qemu -L . -bios coalescing.bin -no-hpet -rtc-td-hack >> > The application does not work for me. Looks like it fails to enter > protected mode. $pc jumps from 0xfff0 to 0x000f003e > and back. Strange. Here's a working binary. > >> >> >> >> By default the APIC's delivery method for IRQs is ExtInt and >> >> coalescing counting happens only with Fixed. This means that the guest >> >> needs to reprogram APIC. It also looks like RTC interrupts need to be >> >> triggered. But I didn't see both of these to happen simultaneously in >> >> my tests with Linux and Windows guests. Of course, -rtc-td-hack flag >> >> must be used and I also disabled HPET to be sure that RTC would be >> >> used. >> >> >> >> With DEBUG_COALESCING enabled, I just get increasing numbers for >> >> apic_irq_delivered: >> >> apic: apic_set_irq: coalescing 67123 >> >> apic: apic_set_irq: coalescing 67124 >> >> apic: apic_set_irq: coalescing 67125 >> > So have you actually used -rtc-td-hack option? I compiled head of >> > qemu.git with DEBUG_COALESCING and run WindowsXP guest with -rtc-td-hack >> > and I get: >> > apic: apic_reset_irq_delivered: old coalescing 3 >> > apic: apic_set_irq: coalescing 1 >> > apic: apic_get_irq_delivered: returning coalescing 1 >> > apic: apic_set_irq: coalescing 2 >> > apic: apic_set_irq: coalescing 3 >> > apic: apic_set_irq: coales
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sun, May 30, 2010 at 12:10:16PM +, Blue Swirl wrote: > >> >> You missed the key word 'stopped'. If the timer is really stopped, no > >> >> IRQs should ever come out afterwards, just like on real HW. For the > >> >> emulation, this means loss of ticks which should have been delivered > >> >> before the change. > >> >> > >> > I haven't missed it. I describe to you reality of the situation. You want > >> > to change reality to be more close to what you want it to be by adding > >> > words to my description. > >> > >> Quoting Jan: 'So what to do with the backlog when the timer is > >> stopped?' I didn't add any words to your description, please be more > >> careful with your attributions. Why do you think I want to change the > >> reality? > > Please refer to my words when you answer to my quote. You quoted my > > answer to you statement: > > Gleb only mentioned the frequency change, I thought that was not so big > > problem. But I don't think this case should be allowed happen at all, > > it can't exist on real HW. > > With 'this case' I was referring to 'case with timer stopped', not > 'case which Gleb mentioned'. > > > No 'stopped' was under discussion nowhere. > > It's clearly written there in the sentence Jan wrote. > Jan, not me, but lets leave this topic alone since you agree that stopped is just a case of frequency change anyway. > > FWIW 'stopped' is just a case > > of frequency change. > > True. > > > > >> > >> XP frequency change isn't the same case as timer being stopped. > >> > > And what is the big difference exactly? > > Because after the timer is stopped, its extremely unrealistic to send > any IRQs. Whereas if the frequency is changed to some other nonzero > value, we can cheat and inject some more queued IRQs. > Correct, when gets disables clock source (by reset or any other means) coalesced backlog should be forgotten. > Anyway, if this case is not interesting because it doesn't happen in > real life emulation scenarios, we can forget it no matter how buggy > the current QEMU implementation is. > > >> > Please just go write code, experiment, debug > >> > and _then_ come here with design. > >> > >> I added some debugging to RTC, PIC and APIC. I also built a small > >> guest in x86 assembly to test the coalescing. However, in the tests > >> with this guest and others I noticed that the coalescing only happens > >> in some obscure conditions. > > So try with real guest and with real load. > > Well, I'd like to get the test program also trigger it. Now I'm getting: > apic: write: 0350 = > apic: apic_reset_irq_delivered: old coalescing 0 > apic: apic_local_deliver: vector 3 delivery mode 0 > apic: apic_set_irq: coalescing 1 > apic: apic_get_irq_delivered: returning coalescing 1 > apic: apic_reset_irq_delivered: old coalescing 1 > apic: apic_local_deliver: vector 3 delivery mode 0 > apic: apic_set_irq: coalescing 0 > apic: apic_get_irq_delivered: returning coalescing 0 > apic: apic_reset_irq_delivered: old coalescing 0 > apic: apic_local_deliver: vector 3 delivery mode 0 > apic: apic_set_irq: coalescing 0 > > It looks like some other IRQs cause the coalescing, because also > looking at RTC code, it seems it's not possible for RTC to raise the > IRQ (except update IRQ, alarm etc.) without calling > apic_reset_irq_delivered(). > > I've attached my test program. Compile: > gcc -m32 -o coalescing coalescing.S -ffreestanding -nostdlib -Wl,-T > coalescing.ld -g && objcopy -Obinary coalescing coalescing.bin > > Run: > qemu -L . 
-bios coalescing.bin -no-hpet -rtc-td-hack > The application does not work for me. Looks like it fails to enter protected mode. $pc jumps from 0xfff0 to 0x000f003e and back. > >> > >> By default the APIC's delivery method for IRQs is ExtInt and > >> coalescing counting happens only with Fixed. This means that the guest > >> needs to reprogram APIC. It also looks like RTC interrupts need to be > >> triggered. But I didn't see both of these to happen simultaneously in > >> my tests with Linux and Windows guests. Of course, -rtc-td-hack flag > >> must be used and I also disabled HPET to be sure that RTC would be > >> used. > >> > >> With DEBUG_COALESCING enabled, I just get increasing numbers for > >> apic_irq_delivered: > >> apic: apic_set_irq: coalescing 67123 > >> apic: apic_set_irq: coalescing 67124 > >> apic: apic_set_irq: coalescing 67125 > > So have you actually used -rtc-td-hack option? I compiled head of > > qemu.git with DEBUG_COALESCING and run WindowsXP guest with -rtc-td-hack > > and I get: > > apic: apic_reset_irq_delivered: old coalescing 3 > > apic: apic_set_irq: coalescing 1 > > apic: apic_get_irq_delivered: returning coalescing 1 > > apic: apic_set_irq: coalescing 2 > > apic: apic_set_irq: coalescing 3 > > apic: apic_set_irq: coalescing 4 > > apic: apic_set_irq: coalescing 5 > > apic: apic_set_irq: coalescing 6 > > apic: apic_reset_irq_delivered: old coalescing 6 > > apic: apic_set_irq: coalescing 1 > > api
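Following the point in the message above that the coalesced backlog should be forgotten when the guest disables the clock source (by reset or any other means), the device-side bookkeeping from the earlier RTCState sketch would simply be cleared at that point; the function below is an illustration with hypothetical names:

    static void rtc_set_periodic_rate(RTCState *s, int64_t new_period)
    {
        if (new_period == 0) {
            s->irq_coalesced = 0;   /* source stopped or reset: drop the backlog */
        } else {
            /* plain frequency change: keep the backlog and replay it at the
               new rate, as described elsewhere in the thread */
        }
        /* reprogram the periodic timer with new_period here */
    }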
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Blue Swirl wrote: >> Linux don't use RTC as time source and I don't know about BSD, so no >> Linux or BSD test case for you, sorry. Run WindowXP standard HAL and put >> heavy load on the host. You can run video inside the gust to trigger >> coalescing more easily. > > I don't have Windows XP, sorry. ReactOS [1], at least its 32-bit version, appears to use the RTC as well. Jan [1] http://www.reactos.org/de/download.html
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
2010/5/30 Gleb Natapov : > On Sat, May 29, 2010 at 09:21:14PM +, Blue Swirl wrote: >> On Sat, May 29, 2010 at 4:37 PM, Gleb Natapov wrote: >> > On Sat, May 29, 2010 at 04:13:22PM +, Blue Swirl wrote: >> >> On Sat, May 29, 2010 at 2:46 PM, Gleb Natapov wrote: >> >> > On Sat, May 29, 2010 at 09:35:30AM +, Blue Swirl wrote: >> >> >> > I still don't see how the alternative is supposed to simplify our >> >> >> > life >> >> >> > or improve the efficiency of the de-coalescing workaround. It's >> >> >> > rather >> >> >> > problematic like Gleb pointed out: The de-coalescing logic needs to >> >> >> > be >> >> >> > informed about periodicity changes that can only be delivered along >> >> >> > IRQs. So what to do with the backlog when the timer is stopped? >> >> >> >> >> >> What happens with the current design? Gleb only mentioned the >> >> >> frequency change, I thought that was not so big problem. But I don't >> >> >> think this case should be allowed happen at all, it can't exist on >> >> >> real HW. >> >> >> >> >> > Hm, why it can't exist on real HW? Do simple exercise. Run WindowsXP >> >> > inside QEMU, connect with gdb to QEMU process and check what frequency >> >> > RTC configured with (hint: it will be 64Hz), now run video inside the >> >> > guest and check frequency again (hint: it will be 1Khz). >> >> >> >> You missed the key word 'stopped'. If the timer is really stopped, no >> >> IRQs should ever come out afterwards, just like on real HW. For the >> >> emulation, this means loss of ticks which should have been delivered >> >> before the change. >> >> >> > I haven't missed it. I describe to you reality of the situation. You want >> > to change reality to be more close to what you want it to be by adding >> > words to my description. >> >> Quoting Jan: 'So what to do with the backlog when the timer is >> stopped?' I didn't add any words to your description, please be more >> careful with your attributions. Why do you think I want to change the >> reality? > Please refer to my words when you answer to my quote. You quoted my > answer to you statement: > Gleb only mentioned the frequency change, I thought that was not so big > problem. But I don't think this case should be allowed happen at all, > it can't exist on real HW. With 'this case' I was referring to 'case with timer stopped', not 'case which Gleb mentioned'. > No 'stopped' was under discussion nowhere. It's clearly written there in the sentence Jan wrote. > FWIW 'stopped' is just a case > of frequency change. True. > >> >> XP frequency change isn't the same case as timer being stopped. >> > And what is the big difference exactly? Because after the timer is stopped, its extremely unrealistic to send any IRQs. Whereas if the frequency is changed to some other nonzero value, we can cheat and inject some more queued IRQs. Anyway, if this case is not interesting because it doesn't happen in real life emulation scenarios, we can forget it no matter how buggy the current QEMU implementation is. >> > Please just go write code, experiment, debug >> > and _then_ come here with design. >> >> I added some debugging to RTC, PIC and APIC. I also built a small >> guest in x86 assembly to test the coalescing. However, in the tests >> with this guest and others I noticed that the coalescing only happens >> in some obscure conditions. > So try with real guest and with real load. Well, I'd like to get the test program also trigger it. 
Now I'm getting: apic: write: 0350 = apic: apic_reset_irq_delivered: old coalescing 0 apic: apic_local_deliver: vector 3 delivery mode 0 apic: apic_set_irq: coalescing 1 apic: apic_get_irq_delivered: returning coalescing 1 apic: apic_reset_irq_delivered: old coalescing 1 apic: apic_local_deliver: vector 3 delivery mode 0 apic: apic_set_irq: coalescing 0 apic: apic_get_irq_delivered: returning coalescing 0 apic: apic_reset_irq_delivered: old coalescing 0 apic: apic_local_deliver: vector 3 delivery mode 0 apic: apic_set_irq: coalescing 0 It looks like some other IRQs cause the coalescing, because also looking at RTC code, it seems it's not possible for RTC to raise the IRQ (except update IRQ, alarm etc.) without calling apic_reset_irq_delivered(). I've attached my test program. Compile: gcc -m32 -o coalescing coalescing.S -ffreestanding -nostdlib -Wl,-T coalescing.ld -g && objcopy -Obinary coalescing coalescing.bin Run: qemu -L . -bios coalescing.bin -no-hpet -rtc-td-hack >> >> By default the APIC's delivery method for IRQs is ExtInt and >> coalescing counting happens only with Fixed. This means that the guest >> needs to reprogram APIC. It also looks like RTC interrupts need to be >> triggered. But I didn't see both of these to happen simultaneously in >> my tests with Linux and Windows guests. Of course, -rtc-td-hack flag >> must be used and I also disabled HPET to be sure that RTC would be >> used. >> >> With DEBUG_COALESCING enabled, I just get increasing numbers for >> apic_irq_delivered:
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On 05/29/2010 12:15 PM, Blue Swirl wrote: This would allow counting the executed instructions and limit it. Thus we could emulate a 500MHz CPU on a 2GHz CPU more accurately. Why would you want to limit number of instruction executed by guest if CPU has nothing else to do anyway? The problem occurs not when we have spare cycles so give to a guest, but in opposite case. I think one problem is that the guest has executed too much compared to what would happen with real HW with a lesser CPU. That explains the RTC frequency reprogramming case. The root cause is that while qemu is scheduled out, the real time clock keeps ticking. Since we can't stop real time, we must compensate in other ways. So write the code and show us. You haven't show any evidence that RTC is the wrong place. RTC knows when interrupt was acknowledge to RTC, it know when clock frequency changes, it know when device reset happened. APIC knows only that interrupt was coalesced. It doesn't even know that it may be masked by a guest in IOAPIC (interrupts delivered while they are masked not considered coalesced). Oh, I thought interrupt masking was the reason for coalescing! What exactly is the reason then? The above. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On 05/28/2010 01:19 AM, Jan Kiszka wrote: Still, this does not answer: - How do you want to detect lost timer ticks? Your (and Gleb's) approach: during injection An alternative: during guest ack The normal pattern is inj/ack/inj/ack; if we see inj/inj/inj we know the guest isn't keeping up. - What subsystem(s) keeps track of the backlog? Clearly, the clock chip (hpet/rtc/pit). With the alternative approach, the clock emulation requests that acks be reported via a qemu_irq interface. - And depending on the above: How to detect at all that a specific IRQ is a timer tick? Clearly, a blocker unless the clock is responsible for timekeeping. -- error compiling committee.c: too many arguments to function
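A sketch of the alternative described here: the clock chip asks for acks to be reported back (via a qemu_irq it hands to the interrupt controller) and compares injections against acks, so inj/inj/inj without an intervening ack is detected as the guest falling behind. The wiring, counters and names below are illustrative only:

    #include <stdint.h>

    typedef struct ClockChip {
        uint64_t injected;   /* ticks we have raised */
        uint64_t acked;      /* ticks the guest has acknowledged */
    } ClockChip;

    /* called by the interrupt controller, through a qemu_irq handler
       registered by the clock chip, when the guest acks the vector */
    static void clock_ack_notify(void *opaque, int n, int level)
    {
        ClockChip *s = opaque;
        s->acked++;
    }

    static void clock_tick(ClockChip *s)
    {
        if (s->injected > s->acked) {
            /* inj/inj/... with no ack in between: the guest is not keeping
               up, so account this tick as backlog instead of injecting */
            return;
        }
        s->injected++;
        /* qemu_irq_raise(s->irq); */
    }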
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sun, May 30, 2010 at 11:41:36AM +, Blue Swirl wrote: > >> I meant that in the scenario, the guest won't change the RTC before > >> 30Mcyc because of some built in determinism in the guest. At that > >> point, because of some reason, the change would happen. > >> > > I still don't understand what are you trying to say here. Guest changes > > frequency because of some even in the guest. It is totally independent > > of what happens in QEMUs RTC emulation. > > I'm trying to understand the order of events. In the scenario, the > order of events on real HW would be: > 10Mcyc: tick IRQ 1 > 20Mcyc: tick IRQ 2 > 30Mcyc: tick IRQ 3 > 30Mcyc: reprogram timer > 31Mcyc: tick IRQ 4 > 32Mcyc: tick IRQ 5 > 33Mcyc: tick IRQ 6 > 34Mcyc: tick IRQ 7 > > With QEMU, the order could become: > 30Mcyc: reprogram timer > 30.5Mcyc: tick IRQ 1 > 31Mcyc: tick IRQ 2 > 31.5Mcyc: tick IRQ 3 > 32Mcyc: tick IRQ 4 > 32.5Mcyc: tick IRQ 5 > 33Mcyc: tick IRQ 6 > 34Mcyc: tick IRQ 7 > > Correct? Not sure, your description is not complete. Let me try: On real HW: 10Mcyc: tick IRQ 1 -> delivered to an OS 20Mcyc: tick IRQ 2 -> delivered to an OS 30Mcyc: tick IRQ 3 -> delivered to an OS 30Mcyc: reprogram timer 31Mcyc: tick IRQ 4 -> delivered to an OS 32Mcyc: tick IRQ 5 -> delivered to an OS 33Mcyc: tick IRQ 6 -> delivered to an OS 34Mcyc: tick IRQ 7 -> delivered to an OS With QEMU: 10Mcyc: tick IRQ 1 -> coalesced 20Mcyc: tick IRQ 2 -> coalesced 30Mcyc: tick IRQ 3 -> coalesced 30Mcyc: reprogram timer 30.1Mcyc: tick IRQ 1 -> delivered to an OS 30.2Mcyc: tick IRQ 2 -> delivered to an OS 30.3Mcyc: tick IRQ 3 -> delivered to an OS 31Mcyc: tick IRQ 4 -> delivered to an OS 32Mcyc: tick IRQ 5 -> delivered to an OS 33Mcyc: tick IRQ 6 -> delivered to an OS 34Mcyc: tick IRQ 7 -> delivered to an OS -- Gleb.
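The catch-up behaviour in the QEMU timeline above (coalesced ticks replayed at 0.1Mcyc spacing before the normal 1Mcyc ticks resume) amounts to choosing a shorter interval whenever a backlog exists; a toy version, with an arbitrary divisor:

    static int64_t next_tick_delay(int64_t period, uint32_t backlog)
    {
        if (backlog == 0) {
            return period;       /* 31Mcyc, 32Mcyc, ... in the example above */
        }
        return period / 10;      /* 30.1Mcyc, 30.2Mcyc, 30.3Mcyc, ... */
    }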
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
2010/5/30 Gleb Natapov : > On Sat, May 29, 2010 at 08:52:34PM +, Blue Swirl wrote: >> On Sat, May 29, 2010 at 4:32 PM, Gleb Natapov wrote: >> > On Sat, May 29, 2010 at 04:03:22PM +, Blue Swirl wrote: >> >> 2010/5/29 Gleb Natapov : >> >> > On Sat, May 29, 2010 at 09:15:11AM +, Blue Swirl wrote: >> >> >> >> There is no code, because we're still at architecture design stage. >> >> >> >> >> >> >> > Try to write test code to understand the problem better. >> >> >> >> >> >> I will. >> >> >> >> >> > Please do ASAP. This discussion shows that you don't understand what is >> >> > the >> >> > problem that we are dialing with. >> >> >> >> Which part of the problem you think I don't understand? >> >> >> > It seams to me you don't understand how Windows uses RTC for time >> > keeping and how the QEMU solves the problem today. >> >> RTC causes periodic interrupts and Windows interrupt handler >> increments jiffies, like Linux? >> > Linux does much more complicated things than that to keep time, so the > only way to fix time drift in Linux was to introduce pvclock. For Window > it is not so accurate too, since Windows can change clock frequency any > time it can't calculate time from jiffies, it needs to update clock at > each time tick. > >> >> >> >> >> >> guests could also be assisted with special handling (like >> >> >> >> >> >> win2k >> >> >> >> >> >> install hack), for example guest instructions could be counted >> >> >> >> >> >> (approximately, for example using TB size or TSC) and only >> >> >> >> >> >> inject >> >> >> >> >> >> after at least N instructions have passed. >> >> >> >> >> > Guest instructions cannot be easily counted in KVM (it can be >> >> >> >> >> > done more >> >> >> >> >> > or less reliably using perf counters, may be). >> >> >> >> >> >> >> >> >> >> Aren't there any debug registers or perf counters, which can >> >> >> >> >> generate >> >> >> >> >> an interrupt after some number of instructions have been >> >> >> >> >> executed? >> >> >> >> > Don't think debug registers have something like that and they are >> >> >> >> > available for guest use anyway. Perf counters differs greatly >> >> >> >> > from CPU >> >> >> >> > to CPU (even between two CPUs of the same manufacturer), and we >> >> >> >> > want to >> >> >> >> > keep using them for profiling guests. And I don't see what >> >> >> >> > problem it >> >> >> >> > will solve anyway that can be solved by simple delay between irq >> >> >> >> > reinjection. >> >> >> >> >> >> >> >> This would allow counting the executed instructions and limit it. >> >> >> >> Thus >> >> >> >> we could emulate a 500MHz CPU on a 2GHz CPU more accurately. >> >> >> >> >> >> >> > Why would you want to limit number of instruction executed by guest >> >> >> > if >> >> >> > CPU has nothing else to do anyway? The problem occurs not when we >> >> >> > have >> >> >> > spare cycles so give to a guest, but in opposite case. >> >> >> >> >> >> I think one problem is that the guest has executed too much compared >> >> >> to what would happen with real HW with a lesser CPU. That explains the >> >> >> RTC frequency reprogramming case. >> >> > You think wrong. The problem is exactly opposite: the guest haven't >> >> > had enough execution time between two time interrupts. I don't know what >> >> > RTC frequency reprogramming case you are talking about here. >> >> >> >> The case you told me where N pending tick IRQs exist but the guest >> >> wants to change the RTC frequency from 64Hz to 1024Hz. >> >> >> >> Let's make this more concrete. 
1 GHz CPU, initially 100Hz RTC, so >> >> 10Mcycles/tick or 10ms/tick. At T = 30Mcycles, guest wants to change >> >> the frequency to 1000Hz. >> >> >> >> The problem for emulation is that for the same 3 ticks, there has been >> >> so little execution power that the ticks have been coalesced. But >> >> isn't the guest cycle count then much lower than 30Mcyc? >> >> >> >> Isn't it so that the guest must be above 30Mcyc to be able to want the >> >> change? But if we reach that point, the problem must have not been >> >> too little execution time, but too much. >> >> >> > Sorry I tried hard to understand what have you said above but failed. >> > What do you mean "to be able to want the change"? Guest sometimes wants >> > to get 64 timer interrupts per second and sometimes it wants to get 1024 >> > timer interrupt per second. It wants it not as a result of time drift or >> > anything. It's just how guest behaves. You seams to be to fixated on >> > guest frequency change. It's just something you have to take into >> > account when you reinject interrupts. >> >> I meant that in the scenario, the guest won't change the RTC before >> 30Mcyc because of some built in determinism in the guest. At that >> point, because of some reason, the change would happen. >> > I still don't understand what are you trying to say here. Guest changes > frequency because of some even in the guest. It is totally independent > of what happens in QEMUs RTC emulation
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sat, May 29, 2010 at 09:21:14PM +, Blue Swirl wrote: > On Sat, May 29, 2010 at 4:37 PM, Gleb Natapov wrote: > > On Sat, May 29, 2010 at 04:13:22PM +, Blue Swirl wrote: > >> On Sat, May 29, 2010 at 2:46 PM, Gleb Natapov wrote: > >> > On Sat, May 29, 2010 at 09:35:30AM +, Blue Swirl wrote: > >> >> > I still don't see how the alternative is supposed to simplify our life > >> >> > or improve the efficiency of the de-coalescing workaround. It's rather > >> >> > problematic like Gleb pointed out: The de-coalescing logic needs to be > >> >> > informed about periodicity changes that can only be delivered along > >> >> > IRQs. So what to do with the backlog when the timer is stopped? > >> >> > >> >> What happens with the current design? Gleb only mentioned the > >> >> frequency change, I thought that was not so big problem. But I don't > >> >> think this case should be allowed happen at all, it can't exist on > >> >> real HW. > >> >> > >> > Hm, why it can't exist on real HW? Do simple exercise. Run WindowsXP > >> > inside QEMU, connect with gdb to QEMU process and check what frequency > >> > RTC configured with (hint: it will be 64Hz), now run video inside the > >> > guest and check frequency again (hint: it will be 1Khz). > >> > >> You missed the key word 'stopped'. If the timer is really stopped, no > >> IRQs should ever come out afterwards, just like on real HW. For the > >> emulation, this means loss of ticks which should have been delivered > >> before the change. > >> > > I haven't missed it. I describe to you reality of the situation. You want > > to change reality to be more close to what you want it to be by adding > > words to my description. > > Quoting Jan: 'So what to do with the backlog when the timer is > stopped?' I didn't add any words to your description, please be more > careful with your attributions. Why do you think I want to change the > reality? Please refer to my words when you answer to my quote. You quoted my answer to you statement: Gleb only mentioned the frequency change, I thought that was not so big problem. But I don't think this case should be allowed happen at all, it can't exist on real HW. No 'stopped' was under discussion nowhere. FWIW 'stopped' is just a case of frequency change. > > XP frequency change isn't the same case as timer being stopped. > And what is the big difference exactly? > > Please just go write code, experiment, debug > > and _then_ come here with design. > > I added some debugging to RTC, PIC and APIC. I also built a small > guest in x86 assembly to test the coalescing. However, in the tests > with this guest and others I noticed that the coalescing only happens > in some obscure conditions. So try with real guest and with real load. > > By default the APIC's delivery method for IRQs is ExtInt and > coalescing counting happens only with Fixed. This means that the guest > needs to reprogram APIC. It also looks like RTC interrupts need to be > triggered. But I didn't see both of these to happen simultaneously in > my tests with Linux and Windows guests. Of course, -rtc-td-hack flag > must be used and I also disabled HPET to be sure that RTC would be > used. > > With DEBUG_COALESCING enabled, I just get increasing numbers for > apic_irq_delivered: > apic: apic_set_irq: coalescing 67123 > apic: apic_set_irq: coalescing 67124 > apic: apic_set_irq: coalescing 67125 So have you actually used -rtc-td-hack option? 
I compiled head of qemu.git with DEBUG_COALESCING and ran a WindowsXP guest with -rtc-td-hack and I get:

apic: apic_reset_irq_delivered: old coalescing 3
apic: apic_set_irq: coalescing 1
apic: apic_get_irq_delivered: returning coalescing 1
apic: apic_set_irq: coalescing 2
apic: apic_set_irq: coalescing 3
apic: apic_set_irq: coalescing 4
apic: apic_set_irq: coalescing 5
apic: apic_set_irq: coalescing 6
apic: apic_reset_irq_delivered: old coalescing 6
apic: apic_set_irq: coalescing 1
apic: apic_get_irq_delivered: returning coalescing 1

> > If the hack were active, the numbers would be close to zero (or at
> least some point) because apic_reset_irq_delivered would be called,
> but this does not happen. Could you specify a clear test case with
> which the coalescing action could be tested? Linux or BSD based,
> please.

Linux doesn't use the RTC as a time source and I don't know about BSD, so no Linux or BSD test case for you, sorry. Run WindowsXP with the standard HAL and put heavy load on the host. You can run video inside the guest to trigger coalescing more easily.

> > >> But what if the guest changed the frequency very often, and between
> >> changes used zero value, like 64Hz -> 0Hz -> 128Hz -> 0Hz -> 64Hz?
> > Too bad, the world is not perfect.
> >
> > --
> > Gleb.
> >

--
Gleb.
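For readers following the log above, here is a minimal self-contained model of the flow being debugged. The two apic_* helper names are the ones appearing in the DEBUG_COALESCING output; the RTC side and the APIC state are simplified stand-ins, not the actual hw/ code:

#include <stdio.h>

static int irq_delivered;   /* models the counter printed as "coalescing N"    */
static int irr_pending;     /* models "previous tick raised but not yet acked" */

static void apic_reset_irq_delivered(void) { irq_delivered = 0; }
static int  apic_get_irq_delivered(void)   { return irq_delivered; }

/* Models apic_set_irq(): delivery only counts when the IRR bit was still clear. */
static void apic_set_irq(void)
{
    if (!irr_pending) {
        irr_pending = 1;
        irq_delivered++;
    }
}

static int irq_coalesced;   /* RTC-side backlog of ticks owed to the guest */

static void rtc_periodic_tick(void)
{
    apic_reset_irq_delivered();
    apic_set_irq();                      /* inject the tick                    */
    if (!apic_get_irq_delivered()) {
        irq_coalesced++;                 /* it piled up on a still-pending one */
    }
}

int main(void)
{
    rtc_periodic_tick();                 /* delivered                          */
    rtc_periodic_tick();                 /* coalesced: first tick not acked    */
    irr_pending = 0;                     /* guest finally acks                 */
    rtc_periodic_tick();                 /* delivered again                    */
    printf("coalesced backlog: %d\n", irq_coalesced);
    return 0;
}

Roughly speaking, the -rtc-td-hack path later drains irq_coalesced by reinjecting ticks, which is why Gleb's log shows apic_reset_irq_delivered and apic_get_irq_delivered bracketing each injection while Blue Swirl's run never reaches that path.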
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sat, May 29, 2010 at 08:52:34PM +, Blue Swirl wrote: > On Sat, May 29, 2010 at 4:32 PM, Gleb Natapov wrote: > > On Sat, May 29, 2010 at 04:03:22PM +, Blue Swirl wrote: > >> 2010/5/29 Gleb Natapov : > >> > On Sat, May 29, 2010 at 09:15:11AM +, Blue Swirl wrote: > >> >> >> There is no code, because we're still at architecture design stage. > >> >> >> > >> >> > Try to write test code to understand the problem better. > >> >> > >> >> I will. > >> >> > >> > Please do ASAP. This discussion shows that you don't understand what is > >> > the > >> > problem that we are dialing with. > >> > >> Which part of the problem you think I don't understand? > >> > > It seams to me you don't understand how Windows uses RTC for time > > keeping and how the QEMU solves the problem today. > > RTC causes periodic interrupts and Windows interrupt handler > increments jiffies, like Linux? > Linux does much more complicated things than that to keep time, so the only way to fix time drift in Linux was to introduce pvclock. For Window it is not so accurate too, since Windows can change clock frequency any time it can't calculate time from jiffies, it needs to update clock at each time tick. > >> >> >> >> >> guests could also be assisted with special handling (like win2k > >> >> >> >> >> install hack), for example guest instructions could be counted > >> >> >> >> >> (approximately, for example using TB size or TSC) and only > >> >> >> >> >> inject > >> >> >> >> >> after at least N instructions have passed. > >> >> >> >> > Guest instructions cannot be easily counted in KVM (it can be > >> >> >> >> > done more > >> >> >> >> > or less reliably using perf counters, may be). > >> >> >> >> > >> >> >> >> Aren't there any debug registers or perf counters, which can > >> >> >> >> generate > >> >> >> >> an interrupt after some number of instructions have been executed? > >> >> >> > Don't think debug registers have something like that and they are > >> >> >> > available for guest use anyway. Perf counters differs greatly from > >> >> >> > CPU > >> >> >> > to CPU (even between two CPUs of the same manufacturer), and we > >> >> >> > want to > >> >> >> > keep using them for profiling guests. And I don't see what problem > >> >> >> > it > >> >> >> > will solve anyway that can be solved by simple delay between irq > >> >> >> > reinjection. > >> >> >> > >> >> >> This would allow counting the executed instructions and limit it. > >> >> >> Thus > >> >> >> we could emulate a 500MHz CPU on a 2GHz CPU more accurately. > >> >> >> > >> >> > Why would you want to limit number of instruction executed by guest if > >> >> > CPU has nothing else to do anyway? The problem occurs not when we have > >> >> > spare cycles so give to a guest, but in opposite case. > >> >> > >> >> I think one problem is that the guest has executed too much compared > >> >> to what would happen with real HW with a lesser CPU. That explains the > >> >> RTC frequency reprogramming case. > >> > You think wrong. The problem is exactly opposite: the guest haven't > >> > had enough execution time between two time interrupts. I don't know what > >> > RTC frequency reprogramming case you are talking about here. > >> > >> The case you told me where N pending tick IRQs exist but the guest > >> wants to change the RTC frequency from 64Hz to 1024Hz. > >> > >> Let's make this more concrete. 1 GHz CPU, initially 100Hz RTC, so > >> 10Mcycles/tick or 10ms/tick. At T = 30Mcycles, guest wants to change > >> the frequency to 1000Hz. 
> >> > >> The problem for emulation is that for the same 3 ticks, there has been > >> so little execution power that the ticks have been coalesced. But > >> isn't the guest cycle count then much lower than 30Mcyc? > >> > >> Isn't it so that the guest must be above 30Mcyc to be able to want the > >> change? But if we reach that point, the problem must have not been > >> too little execution time, but too much. > >> > > Sorry I tried hard to understand what have you said above but failed. > > What do you mean "to be able to want the change"? Guest sometimes wants > > to get 64 timer interrupts per second and sometimes it wants to get 1024 > > timer interrupt per second. It wants it not as a result of time drift or > > anything. It's just how guest behaves. You seams to be to fixated on > > guest frequency change. It's just something you have to take into > > account when you reinject interrupts. > > I meant that in the scenario, the guest won't change the RTC before > 30Mcyc because of some built in determinism in the guest. At that > point, because of some reason, the change would happen. > I still don't understand what are you trying to say here. Guest changes frequency because of some even in the guest. It is totally independent of what happens in QEMUs RTC emulation. > > > >> > > >> >> > >> >> > > >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > And even if the rate did not matter, the APIC woult still > >> >> >> >> >> > have to now > >> >> >> >>
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sat, May 29, 2010 at 4:37 PM, Gleb Natapov wrote: > On Sat, May 29, 2010 at 04:13:22PM +, Blue Swirl wrote: >> On Sat, May 29, 2010 at 2:46 PM, Gleb Natapov wrote: >> > On Sat, May 29, 2010 at 09:35:30AM +, Blue Swirl wrote: >> >> > I still don't see how the alternative is supposed to simplify our life >> >> > or improve the efficiency of the de-coalescing workaround. It's rather >> >> > problematic like Gleb pointed out: The de-coalescing logic needs to be >> >> > informed about periodicity changes that can only be delivered along >> >> > IRQs. So what to do with the backlog when the timer is stopped? >> >> >> >> What happens with the current design? Gleb only mentioned the >> >> frequency change, I thought that was not so big problem. But I don't >> >> think this case should be allowed happen at all, it can't exist on >> >> real HW. >> >> >> > Hm, why it can't exist on real HW? Do simple exercise. Run WindowsXP >> > inside QEMU, connect with gdb to QEMU process and check what frequency >> > RTC configured with (hint: it will be 64Hz), now run video inside the >> > guest and check frequency again (hint: it will be 1Khz). >> >> You missed the key word 'stopped'. If the timer is really stopped, no >> IRQs should ever come out afterwards, just like on real HW. For the >> emulation, this means loss of ticks which should have been delivered >> before the change. >> > I haven't missed it. I describe to you reality of the situation. You want > to change reality to be more close to what you want it to be by adding > words to my description. Quoting Jan: 'So what to do with the backlog when the timer is stopped?' I didn't add any words to your description, please be more careful with your attributions. Why do you think I want to change the reality? XP frequency change isn't the same case as timer being stopped. > Please just go write code, experiment, debug > and _then_ come here with design. I added some debugging to RTC, PIC and APIC. I also built a small guest in x86 assembly to test the coalescing. However, in the tests with this guest and others I noticed that the coalescing only happens in some obscure conditions. By default the APIC's delivery method for IRQs is ExtInt and coalescing counting happens only with Fixed. This means that the guest needs to reprogram APIC. It also looks like RTC interrupts need to be triggered. But I didn't see both of these to happen simultaneously in my tests with Linux and Windows guests. Of course, -rtc-td-hack flag must be used and I also disabled HPET to be sure that RTC would be used. With DEBUG_COALESCING enabled, I just get increasing numbers for apic_irq_delivered: apic: apic_set_irq: coalescing 67123 apic: apic_set_irq: coalescing 67124 apic: apic_set_irq: coalescing 67125 If the hack were active, the numbers would be close to zero (or at least some point) because apic_reset_irq_delivered would be called, but this does not happen. Could you specify a clear test case with which the coalescing action could be tested? Linux or BSD based, please. >> But what if the guest changed the frequency very often, and between >> changes used zero value, like 64Hz -> 0Hz -> 128Hz -> 0Hz -> 64Hz? > Too bad, the world is not perfect. > > -- > Gleb. >
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sat, May 29, 2010 at 4:32 PM, Gleb Natapov wrote: > On Sat, May 29, 2010 at 04:03:22PM +, Blue Swirl wrote: >> 2010/5/29 Gleb Natapov : >> > On Sat, May 29, 2010 at 09:15:11AM +, Blue Swirl wrote: >> >> >> There is no code, because we're still at architecture design stage. >> >> >> >> >> > Try to write test code to understand the problem better. >> >> >> >> I will. >> >> >> > Please do ASAP. This discussion shows that you don't understand what is the >> > problem that we are dialing with. >> >> Which part of the problem you think I don't understand? >> > It seams to me you don't understand how Windows uses RTC for time > keeping and how the QEMU solves the problem today. RTC causes periodic interrupts and Windows interrupt handler increments jiffies, like Linux? >> >> >> >> >> guests could also be assisted with special handling (like win2k >> >> >> >> >> install hack), for example guest instructions could be counted >> >> >> >> >> (approximately, for example using TB size or TSC) and only inject >> >> >> >> >> after at least N instructions have passed. >> >> >> >> > Guest instructions cannot be easily counted in KVM (it can be >> >> >> >> > done more >> >> >> >> > or less reliably using perf counters, may be). >> >> >> >> >> >> >> >> Aren't there any debug registers or perf counters, which can >> >> >> >> generate >> >> >> >> an interrupt after some number of instructions have been executed? >> >> >> > Don't think debug registers have something like that and they are >> >> >> > available for guest use anyway. Perf counters differs greatly from >> >> >> > CPU >> >> >> > to CPU (even between two CPUs of the same manufacturer), and we want >> >> >> > to >> >> >> > keep using them for profiling guests. And I don't see what problem it >> >> >> > will solve anyway that can be solved by simple delay between irq >> >> >> > reinjection. >> >> >> >> >> >> This would allow counting the executed instructions and limit it. Thus >> >> >> we could emulate a 500MHz CPU on a 2GHz CPU more accurately. >> >> >> >> >> > Why would you want to limit number of instruction executed by guest if >> >> > CPU has nothing else to do anyway? The problem occurs not when we have >> >> > spare cycles so give to a guest, but in opposite case. >> >> >> >> I think one problem is that the guest has executed too much compared >> >> to what would happen with real HW with a lesser CPU. That explains the >> >> RTC frequency reprogramming case. >> > You think wrong. The problem is exactly opposite: the guest haven't >> > had enough execution time between two time interrupts. I don't know what >> > RTC frequency reprogramming case you are talking about here. >> >> The case you told me where N pending tick IRQs exist but the guest >> wants to change the RTC frequency from 64Hz to 1024Hz. >> >> Let's make this more concrete. 1 GHz CPU, initially 100Hz RTC, so >> 10Mcycles/tick or 10ms/tick. At T = 30Mcycles, guest wants to change >> the frequency to 1000Hz. >> >> The problem for emulation is that for the same 3 ticks, there has been >> so little execution power that the ticks have been coalesced. But >> isn't the guest cycle count then much lower than 30Mcyc? >> >> Isn't it so that the guest must be above 30Mcyc to be able to want the >> change? But if we reach that point, the problem must have not been >> too little execution time, but too much. >> > Sorry I tried hard to understand what have you said above but failed. > What do you mean "to be able to want the change"? 
Guest sometimes wants > to get 64 timer interrupts per second and sometimes it wants to get 1024 > timer interrupt per second. It wants it not as a result of time drift or > anything. It's just how guest behaves. You seams to be to fixated on > guest frequency change. It's just something you have to take into > account when you reinject interrupts. I meant that in the scenario, the guest won't change the RTC before 30Mcyc because of some built in determinism in the guest. At that point, because of some reason, the change would happen. > >> > >> >> >> >> > >> >> >> >> >> >> >> >> >> >> >> >> >> >> > And even if the rate did not matter, the APIC woult still have >> >> >> >> >> > to now >> >> >> >> >> > about the fact that an IRQ is really periodic and does not >> >> >> >> >> > only appear >> >> >> >> >> > as such for a certain interval. This really does not sound like >> >> >> >> >> > simplifying things or even make them cleaner. >> >> >> >> >> >> >> >> >> >> It would, the voodoo would be contained only in APIC, RTC would >> >> >> >> >> be >> >> >> >> >> just like any other device. With the bidirectional irqs, this >> >> >> >> >> voodoo >> >> >> >> >> would probably eventually spread to many other devices. The >> >> >> >> >> logical >> >> >> >> >> conclusion of that would be a system where all devices would be >> >> >> >> >> careful not to disturb the guest at wrong moment because that >> >> >> >> >> would >> >> >> >> >> trigger a bug. >> >> >> >> >> >> >> >>
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sat, May 29, 2010 at 04:13:22PM +, Blue Swirl wrote: > On Sat, May 29, 2010 at 2:46 PM, Gleb Natapov wrote: > > On Sat, May 29, 2010 at 09:35:30AM +, Blue Swirl wrote: > >> > I still don't see how the alternative is supposed to simplify our life > >> > or improve the efficiency of the de-coalescing workaround. It's rather > >> > problematic like Gleb pointed out: The de-coalescing logic needs to be > >> > informed about periodicity changes that can only be delivered along > >> > IRQs. So what to do with the backlog when the timer is stopped? > >> > >> What happens with the current design? Gleb only mentioned the > >> frequency change, I thought that was not so big problem. But I don't > >> think this case should be allowed happen at all, it can't exist on > >> real HW. > >> > > Hm, why it can't exist on real HW? Do simple exercise. Run WindowsXP > > inside QEMU, connect with gdb to QEMU process and check what frequency > > RTC configured with (hint: it will be 64Hz), now run video inside the > > guest and check frequency again (hint: it will be 1Khz). > > You missed the key word 'stopped'. If the timer is really stopped, no > IRQs should ever come out afterwards, just like on real HW. For the > emulation, this means loss of ticks which should have been delivered > before the change. > I haven't missed it. I describe to you reality of the situation. You want to change reality to be more close to what you want it to be by adding words to my description. Please just go write code, experiment, debug and _then_ come here with design. > But what if the guest changed the frequency very often, and between > changes used zero value, like 64Hz -> 0Hz -> 128Hz -> 0Hz -> 64Hz? Too bad, the world is not perfect. -- Gleb.
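As a concrete illustration of what handling the backlog across a frequency change (or a stop) could mean in code, here is a hedged sketch of one possible policy: rescale the owed ticks so the amount of owed wall-clock time stays constant, and drop them when the timer is truly stopped. The names are illustrative, not taken from hw/mc146818rtc.c:

#include <stdint.h>

static uint32_t irq_coalesced;   /* ticks still owed to the guest */
static int64_t  period_ns;       /* current periodic-timer period */

static void rtc_change_period(int64_t new_period_ns)
{
    if (new_period_ns <= 0) {
        /* Timer stopped: no further acks will come, so drop the backlog. */
        irq_coalesced = 0;
    } else if (period_ns > 0 && irq_coalesced > 0) {
        /* Keep the owed wall-clock time constant across the change. */
        irq_coalesced = irq_coalesced * period_ns / new_period_ns;
    }
    period_ns = new_period_ns;
}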
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Blue Swirl wrote:
> On Sat, May 29, 2010 at 10:16 AM, Jan Kiszka wrote:
>> Blue Swirl wrote:
>> This is - according to my current understanding - the proposed
>> alternative architecture:
>>
>>      .---------------.
>>      | de-coalescing |
>>      |     logic     |
>>      '---------------'
>>          ^       |
>>   period,|       |IRQ
>> coalesced|       |(or tuned VM clock)
>>  (yes/no)|       v
>>  .-----.      .--------.      .------.
>>  | RTC |-IRQ->| router |-IRQ->| APIC |
>>  '-----'      '--------'      '------'
>>     |          ^    |            ^
>>     |          |    |            |
>>     '--period--'    '---period---'

> The period information is already known by the higher level clock
> management,

The timer device models program the next event of some qemu-timer. There is no tag attached explaining the timer subsystem or anyone else the reason behind this programming.

>>> Yes, but why would we care? All timers in the system besides RTC
>>> should be affected likewise, this would in fact be the benefit from
>>> handling the problem at higher level.

>> And how does your equation for calculating the clock slow-down look like?
>
> How about icount_adjust()?

I would suggest that you implement your ideas now. Please keep us informed about the progress as this series (and more) depends on a decision.

Jan
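Since icount_adjust() keeps coming up, a rough sketch of the feedback idea behind that kind of clock tuning may help. This is not the actual icount_adjust() body, just the control loop it implements: compare the instruction-derived virtual clock against real time and adjust how many nanoseconds each instruction is charged.

#include <stdint.h>

/* Virtual time advances by (1 << icount_time_shift) ns per executed instruction. */
static int icount_time_shift = 10;

static void adjust_virtual_clock(int64_t vm_clock_ns, int64_t host_clock_ns)
{
    int64_t delta = vm_clock_ns - host_clock_ns;

    if (delta > 0 && icount_time_shift > 0) {
        icount_time_shift--;   /* guest clock ran ahead of real time: slow it down */
    } else if (delta < 0) {
        icount_time_shift++;   /* guest clock fell behind: speed it up             */
    }
    /* The real implementation also re-biases the clock so it never jumps
     * backwards when the shift changes. */
}

Whether such a global slow-down can replace per-source de-coalescing is exactly the open question in this part of the thread.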
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sat, May 29, 2010 at 2:46 PM, Gleb Natapov wrote: > On Sat, May 29, 2010 at 09:35:30AM +, Blue Swirl wrote: >> > I still don't see how the alternative is supposed to simplify our life >> > or improve the efficiency of the de-coalescing workaround. It's rather >> > problematic like Gleb pointed out: The de-coalescing logic needs to be >> > informed about periodicity changes that can only be delivered along >> > IRQs. So what to do with the backlog when the timer is stopped? >> >> What happens with the current design? Gleb only mentioned the >> frequency change, I thought that was not so big problem. But I don't >> think this case should be allowed happen at all, it can't exist on >> real HW. >> > Hm, why it can't exist on real HW? Do simple exercise. Run WindowsXP > inside QEMU, connect with gdb to QEMU process and check what frequency > RTC configured with (hint: it will be 64Hz), now run video inside the > guest and check frequency again (hint: it will be 1Khz). You missed the key word 'stopped'. If the timer is really stopped, no IRQs should ever come out afterwards, just like on real HW. For the emulation, this means loss of ticks which should have been delivered before the change. But what if the guest changed the frequency very often, and between changes used zero value, like 64Hz -> 0Hz -> 128Hz -> 0Hz -> 64Hz?
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sat, May 29, 2010 at 04:03:22PM +, Blue Swirl wrote: > 2010/5/29 Gleb Natapov : > > On Sat, May 29, 2010 at 09:15:11AM +, Blue Swirl wrote: > >> >> There is no code, because we're still at architecture design stage. > >> >> > >> > Try to write test code to understand the problem better. > >> > >> I will. > >> > > Please do ASAP. This discussion shows that you don't understand what is the > > problem that we are dialing with. > > Which part of the problem you think I don't understand? > It seams to me you don't understand how Windows uses RTC for time keeping and how the QEMU solves the problem today. > >> >> >> >> guests could also be assisted with special handling (like win2k > >> >> >> >> install hack), for example guest instructions could be counted > >> >> >> >> (approximately, for example using TB size or TSC) and only inject > >> >> >> >> after at least N instructions have passed. > >> >> >> > Guest instructions cannot be easily counted in KVM (it can be done > >> >> >> > more > >> >> >> > or less reliably using perf counters, may be). > >> >> >> > >> >> >> Aren't there any debug registers or perf counters, which can generate > >> >> >> an interrupt after some number of instructions have been executed? > >> >> > Don't think debug registers have something like that and they are > >> >> > available for guest use anyway. Perf counters differs greatly from CPU > >> >> > to CPU (even between two CPUs of the same manufacturer), and we want > >> >> > to > >> >> > keep using them for profiling guests. And I don't see what problem it > >> >> > will solve anyway that can be solved by simple delay between irq > >> >> > reinjection. > >> >> > >> >> This would allow counting the executed instructions and limit it. Thus > >> >> we could emulate a 500MHz CPU on a 2GHz CPU more accurately. > >> >> > >> > Why would you want to limit number of instruction executed by guest if > >> > CPU has nothing else to do anyway? The problem occurs not when we have > >> > spare cycles so give to a guest, but in opposite case. > >> > >> I think one problem is that the guest has executed too much compared > >> to what would happen with real HW with a lesser CPU. That explains the > >> RTC frequency reprogramming case. > > You think wrong. The problem is exactly opposite: the guest haven't > > had enough execution time between two time interrupts. I don't know what > > RTC frequency reprogramming case you are talking about here. > > The case you told me where N pending tick IRQs exist but the guest > wants to change the RTC frequency from 64Hz to 1024Hz. > > Let's make this more concrete. 1 GHz CPU, initially 100Hz RTC, so > 10Mcycles/tick or 10ms/tick. At T = 30Mcycles, guest wants to change > the frequency to 1000Hz. > > The problem for emulation is that for the same 3 ticks, there has been > so little execution power that the ticks have been coalesced. But > isn't the guest cycle count then much lower than 30Mcyc? > > Isn't it so that the guest must be above 30Mcyc to be able to want the > change? But if we reach that point, the problem must have not been > too little execution time, but too much. > Sorry I tried hard to understand what have you said above but failed. What do you mean "to be able to want the change"? Guest sometimes wants to get 64 timer interrupts per second and sometimes it wants to get 1024 timer interrupt per second. It wants it not as a result of time drift or anything. It's just how guest behaves. You seams to be to fixated on guest frequency change. 
It's just something you have to take into account when you reinject interrupts. > > > >> > >> > > >> >> >> > >> >> >> >> > >> >> >> >> > And even if the rate did not matter, the APIC woult still have > >> >> >> >> > to now > >> >> >> >> > about the fact that an IRQ is really periodic and does not only > >> >> >> >> > appear > >> >> >> >> > as such for a certain interval. This really does not sound like > >> >> >> >> > simplifying things or even make them cleaner. > >> >> >> >> > >> >> >> >> It would, the voodoo would be contained only in APIC, RTC would be > >> >> >> >> just like any other device. With the bidirectional irqs, this > >> >> >> >> voodoo > >> >> >> >> would probably eventually spread to many other devices. The > >> >> >> >> logical > >> >> >> >> conclusion of that would be a system where all devices would be > >> >> >> >> careful not to disturb the guest at wrong moment because that > >> >> >> >> would > >> >> >> >> trigger a bug. > >> >> >> >> > >> >> >> > This voodoo will be so complex and unreliable that it will make > >> >> >> > RTC hack > >> >> >> > pale in comparison (and I still don't see how you are going to > >> >> >> > make it > >> >> >> > actually work). > >> >> >> > >> >> >> Implement everything inside APIC: only coalescing and reinjection. > >> >> > APIC has zero info needed to implement reinjection correctly as was > >> >> > shown to you several time in this thread and you simply keep ignoring > >> >> >
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
2010/5/29 Gleb Natapov : > On Sat, May 29, 2010 at 09:15:11AM +, Blue Swirl wrote: >> >> There is no code, because we're still at architecture design stage. >> >> >> > Try to write test code to understand the problem better. >> >> I will. >> > Please do ASAP. This discussion shows that you don't understand what is the > problem that we are dialing with. Which part of the problem you think I don't understand? >> >> >> >> guests could also be assisted with special handling (like win2k >> >> >> >> install hack), for example guest instructions could be counted >> >> >> >> (approximately, for example using TB size or TSC) and only inject >> >> >> >> after at least N instructions have passed. >> >> >> > Guest instructions cannot be easily counted in KVM (it can be done >> >> >> > more >> >> >> > or less reliably using perf counters, may be). >> >> >> >> >> >> Aren't there any debug registers or perf counters, which can generate >> >> >> an interrupt after some number of instructions have been executed? >> >> > Don't think debug registers have something like that and they are >> >> > available for guest use anyway. Perf counters differs greatly from CPU >> >> > to CPU (even between two CPUs of the same manufacturer), and we want to >> >> > keep using them for profiling guests. And I don't see what problem it >> >> > will solve anyway that can be solved by simple delay between irq >> >> > reinjection. >> >> >> >> This would allow counting the executed instructions and limit it. Thus >> >> we could emulate a 500MHz CPU on a 2GHz CPU more accurately. >> >> >> > Why would you want to limit number of instruction executed by guest if >> > CPU has nothing else to do anyway? The problem occurs not when we have >> > spare cycles so give to a guest, but in opposite case. >> >> I think one problem is that the guest has executed too much compared >> to what would happen with real HW with a lesser CPU. That explains the >> RTC frequency reprogramming case. > You think wrong. The problem is exactly opposite: the guest haven't > had enough execution time between two time interrupts. I don't know what > RTC frequency reprogramming case you are talking about here. The case you told me where N pending tick IRQs exist but the guest wants to change the RTC frequency from 64Hz to 1024Hz. Let's make this more concrete. 1 GHz CPU, initially 100Hz RTC, so 10Mcycles/tick or 10ms/tick. At T = 30Mcycles, guest wants to change the frequency to 1000Hz. The problem for emulation is that for the same 3 ticks, there has been so little execution power that the ticks have been coalesced. But isn't the guest cycle count then much lower than 30Mcyc? Isn't it so that the guest must be above 30Mcyc to be able to want the change? But if we reach that point, the problem must have not been too little execution time, but too much. > >> >> > >> >> >> >> >> >> >> >> >> >> >> > And even if the rate did not matter, the APIC woult still have to >> >> >> >> > now >> >> >> >> > about the fact that an IRQ is really periodic and does not only >> >> >> >> > appear >> >> >> >> > as such for a certain interval. This really does not sound like >> >> >> >> > simplifying things or even make them cleaner. >> >> >> >> >> >> >> >> It would, the voodoo would be contained only in APIC, RTC would be >> >> >> >> just like any other device. With the bidirectional irqs, this voodoo >> >> >> >> would probably eventually spread to many other devices. 
The logical >> >> >> >> conclusion of that would be a system where all devices would be >> >> >> >> careful not to disturb the guest at wrong moment because that would >> >> >> >> trigger a bug. >> >> >> >> >> >> >> > This voodoo will be so complex and unreliable that it will make RTC >> >> >> > hack >> >> >> > pale in comparison (and I still don't see how you are going to make >> >> >> > it >> >> >> > actually work). >> >> >> >> >> >> Implement everything inside APIC: only coalescing and reinjection. >> >> > APIC has zero info needed to implement reinjection correctly as was >> >> > shown to you several time in this thread and you simply keep ignoring >> >> > it. >> >> >> >> On the contrary, APIC is actually the only source of the IRQ ack >> >> information. RTC hack would not work without APIC (or the >> >> bidirectional IRQ) passing this info to RTC. >> >> >> >> What APIC doesn't have now is the timer frequency or period info. This >> >> is known by RTC and also higher levels managing the clocks. >> >> >> > So APIC has one bit of information and RTC everything else. >> >> The information known by RTC (timer period) is also known by higher levels. >> > What do you mean by higher level here? vl.c or APIC. vl.c, qemu-timer.c. >> > The current >> > approach (and proposed patch) brings this one bit of information to RTC, >> > you are arguing that RTC should be able to communicate all its info to >> > APIC. Sorry I don't see that your way has any advantage. Just more >> > complex interface and it is much easier to get
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
2010/5/28 Gleb Natapov : > On Fri, May 28, 2010 at 08:06:45PM +, Blue Swirl wrote: >> 2010/5/28 Gleb Natapov : >> > On Thu, May 27, 2010 at 06:37:10PM +, Blue Swirl wrote: >> >> 2010/5/27 Gleb Natapov : >> >> > On Wed, May 26, 2010 at 08:35:00PM +, Blue Swirl wrote: >> >> >> On Wed, May 26, 2010 at 8:09 PM, Jan Kiszka wrote: >> >> >> > Blue Swirl wrote: >> >> >> >> On Tue, May 25, 2010 at 9:44 PM, Jan Kiszka >> >> >> >> wrote: >> >> >> >>> Anthony Liguori wrote: >> >> >> On 05/25/2010 02:09 PM, Blue Swirl wrote: >> >> >> > On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka >> >> >> > wrote: >> >> >> > >> >> >> >> From: Jan Kiszka >> >> >> >> >> >> >> >> This allows to communicate potential IRQ coalescing during >> >> >> >> delivery from >> >> >> >> the sink back to the source. Targets that support IRQ coalescing >> >> >> >> workarounds need to register handlers that return the >> >> >> >> appropriate >> >> >> >> QEMU_IRQ_* code, and they have to propergate the code across >> >> >> >> all IRQ >> >> >> >> redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, >> >> >> >> it can >> >> >> >> apply its workaround. If multiple sinks exist, the source may >> >> >> >> only >> >> >> >> consider an IRQ coalesced if all other sinks either report >> >> >> >> QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED. >> >> >> >> >> >> >> > No real devices are interested whether any of their output lines >> >> >> > are >> >> >> > even connected. This would introduce a new signal type, >> >> >> > bidirectional >> >> >> > multi-level, which is not correct. >> >> >> > >> >> >> I don't think it's really an issue of correct, but I wouldn't >> >> >> disagree >> >> >> to a suggestion that we ought to introduce a new signal type for >> >> >> this >> >> >> type of bidirectional feedback. Maybe it's qemu_coalesced_irq >> >> >> and has a >> >> >> similar interface as qemu_irq. >> >> >> >>> A separate type would complicate the delivery of the feedback value >> >> >> >>> across GPIO pins (as Paul requested for the RTC->HPET routing). >> >> >> >>> >> >> >> > I think the real solution to coalescing is put the logic inside >> >> >> > one >> >> >> > device, in this case APIC because it has the information about >> >> >> > irq >> >> >> > delivery. APIC could monitor incoming RTC irqs for frequency >> >> >> > information and whether they get delivered or not. If not, an >> >> >> > internal >> >> >> > timer is installed which injects the lost irqs. >> >> >> >>> That won't fly as the IRQs will already arrive at the APIC with a >> >> >> >>> sufficiently high jitter. At the bare minimum, you need to tell the >> >> >> >>> interrupt controller about the fact that a particular IRQ should be >> >> >> >>> delivered at a specific regular rate. For this, you also need a >> >> >> >>> generic >> >> >> >>> interface - nothing really "won". >> >> >> >> >> >> >> >> OK, let's simplify: just reinject at next possible chance. No need >> >> >> >> to >> >> >> >> monitor or tell anything. >> >> >> > >> >> >> > There are guests that won't like this (I know of one in-house, but >> >> >> > others may even have more examples), specifically if you end up >> >> >> > firing >> >> >> > multiple IRQs in a row due to a longer backlog. For that reason, the >> >> >> > RTC >> >> >> > spreads the reinjection according to the current rate. >> >> >> >> >> >> Then reinject with a constant delay, or next CPU exit. Such buggy >> >> > If guest's time frequency is the same as host time frequency you can't >> >> > reinject with constant delay. 
That is why current code mixes two >> >> > approaches: reinject M interrupts in a raw then delay. >> >> >> >> This approach can be also used by APIC-only version. >> >> >> > I don't know what APIC-only version you are talking about. I haven't >> > seen the code and I don't understand hand waving, sorry. >> >> There is no code, because we're still at architecture design stage. >> > Try to write test code to understand the problem better. I will. >> >> >> guests could also be assisted with special handling (like win2k >> >> >> install hack), for example guest instructions could be counted >> >> >> (approximately, for example using TB size or TSC) and only inject >> >> >> after at least N instructions have passed. >> >> > Guest instructions cannot be easily counted in KVM (it can be done more >> >> > or less reliably using perf counters, may be). >> >> >> >> Aren't there any debug registers or perf counters, which can generate >> >> an interrupt after some number of instructions have been executed? >> > Don't think debug registers have something like that and they are >> > available for guest use anyway. Perf counters differs greatly from CPU >> > to CPU (even between two CPUs of the same manufacturer), and we want to >> > keep using them for profilin
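The "reinject M interrupts in a row then delay" policy Gleb describes above can be pictured with a short hedged sketch; the burst limit and names are illustrative, not the constants used by the current code:

#include <stdint.h>
#include <stdbool.h>

#define MAX_BURST 2              /* assumed burst limit                       */

static uint32_t irq_coalesced;   /* backlog of ticks the guest never saw      */

/* Called when the guest acks a tick: decide whether to push another one now. */
static bool reinject_on_ack(uint32_t *burst)
{
    if (irq_coalesced == 0) {
        *burst = 0;
        return false;            /* nothing owed                              */
    }
    if (*burst < MAX_BURST) {
        (*burst)++;
        irq_coalesced--;
        return true;             /* inject back to back                       */
    }
    *burst = 0;
    return false;                /* enough in a row: wait for the next period */
}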
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
2010/5/29 Jan Kiszka : > Gleb Natapov wrote: >> On Fri, May 28, 2010 at 08:06:45PM +, Blue Swirl wrote: >>> 2010/5/28 Gleb Natapov : On Thu, May 27, 2010 at 06:37:10PM +, Blue Swirl wrote: > 2010/5/27 Gleb Natapov : >> On Wed, May 26, 2010 at 08:35:00PM +, Blue Swirl wrote: >>> On Wed, May 26, 2010 at 8:09 PM, Jan Kiszka wrote: Blue Swirl wrote: > On Tue, May 25, 2010 at 9:44 PM, Jan Kiszka wrote: >> Anthony Liguori wrote: >>> On 05/25/2010 02:09 PM, Blue Swirl wrote: On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka wrote: > From: Jan Kiszka > > This allows to communicate potential IRQ coalescing during > delivery from > the sink back to the source. Targets that support IRQ coalescing > workarounds need to register handlers that return the appropriate > QEMU_IRQ_* code, and they have to propergate the code across all > IRQ > redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it > can > apply its workaround. If multiple sinks exist, the source may only > consider an IRQ coalesced if all other sinks either report > QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED. > No real devices are interested whether any of their output lines are even connected. This would introduce a new signal type, bidirectional multi-level, which is not correct. >>> I don't think it's really an issue of correct, but I wouldn't >>> disagree >>> to a suggestion that we ought to introduce a new signal type for >>> this >>> type of bidirectional feedback. Maybe it's qemu_coalesced_irq and >>> has a >>> similar interface as qemu_irq. >> A separate type would complicate the delivery of the feedback value >> across GPIO pins (as Paul requested for the RTC->HPET routing). >> I think the real solution to coalescing is put the logic inside one device, in this case APIC because it has the information about irq delivery. APIC could monitor incoming RTC irqs for frequency information and whether they get delivered or not. If not, an internal timer is installed which injects the lost irqs. >> That won't fly as the IRQs will already arrive at the APIC with a >> sufficiently high jitter. At the bare minimum, you need to tell the >> interrupt controller about the fact that a particular IRQ should be >> delivered at a specific regular rate. For this, you also need a >> generic >> interface - nothing really "won". > OK, let's simplify: just reinject at next possible chance. No need to > monitor or tell anything. There are guests that won't like this (I know of one in-house, but others may even have more examples), specifically if you end up firing multiple IRQs in a row due to a longer backlog. For that reason, the RTC spreads the reinjection according to the current rate. >>> Then reinject with a constant delay, or next CPU exit. Such buggy >> If guest's time frequency is the same as host time frequency you can't >> reinject with constant delay. That is why current code mixes two >> approaches: reinject M interrupts in a raw then delay. > This approach can be also used by APIC-only version. > I don't know what APIC-only version you are talking about. I haven't seen the code and I don't understand hand waving, sorry. >>> There is no code, because we're still at architecture design stage. >>> >> Try to write test code to understand the problem better. >> >>> guests could also be assisted with special handling (like win2k >>> install hack), for example guest instructions could be counted >>> (approximately, for example using TB size or TSC) and only inject >>> after at least N instructions have passed. 
>> Guest instructions cannot be easily counted in KVM (it can be done more >> or less reliably using perf counters, may be). > Aren't there any debug registers or perf counters, which can generate > an interrupt after some number of instructions have been executed? Don't think debug registers have something like that and they are available for guest use anyway. Perf counters differs greatly from CPU to CPU (even between two CPUs of the same manufacturer), and we want to keep using them for profiling guests. And I don't see what problem it will solve anyway that can be solved by simple delay between irq reinjection. >>> This would allow counting the executed instructions and limit it. Thus >>> we could emulate a 500MHz CPU on a 2GHz CPU more
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Blue Swirl wrote: >>> On the contrary, APIC is actually the only source of the IRQ ack >>> information. RTC hack would not work without APIC (or the >>> bidirectional IRQ) passing this info to RTC. >>> >>> What APIC doesn't have now is the timer frequency or period info. This >>> is known by RTC and also higher levels managing the clocks. >>> >> So APIC has one bit of information and RTC everything else. > > The information known by RTC (timer period) is also known by higher levels. Curious to see where you'll find this. > >> The current >> approach (and proposed patch) brings this one bit of information to RTC, >> you are arguing that RTC should be able to communicate all its info to >> APIC. Sorry I don't see that your way has any advantage. Just more >> complex interface and it is much easier to get it wrong for other time >> sources. > > I don't think anymore that APIC should be handling this but the > generic stuff, like vl.c or exec.c. Then there would be only > information passing from APIC to higher levels. You neglect the the information required to associate a periodic source (e.g. RTC) with an IRQ sink (e.g. APIC). Without that, you will have a hard time figuring out if a reported IRQ coalescing requires any activities or should simply be welcomed (for I/O IRQs). > >>> I keep ignoring the idea that the current model, where both RTC and >>> APIC must somehow work together to make coalescing work, is the only >>> possible just because it is committed and it happens to work in some >>> cases. It would be much better to concentrate this to one place, APIC >>> or preferably higher level where it may benefit other timers too. >>> Provided of course that the other models can be made to work. >>> >> So write the code and show us. You haven't show any evidence that RTC is >> the wrong place. RTC knows when interrupt was acknowledge to RTC, it >> know when clock frequency changes, it know when device reset happened. >> APIC knows only that interrupt was coalesced. It doesn't even know that >> it may be masked by a guest in IOAPIC (interrupts delivered while they >> are masked not considered coalesced). > > Oh, I thought interrupt masking was the reason for coalescing! What > exactly is the reason then? Missing acks, ie. the IRQ is still pending when the next one arrives. You want to filter out masked/suppressed IRQs to avoid running the de-coalescing logic on sources that are actually cut off (like the RTC IRQ when the HPET took over). > >> Time source knows only when >> frequency changes and may be when device reset happens if timer is >> stopped by device on reset. So RTC is actually a sweet spot if you want >> to minimize amount of info you need to pass between various layers. >> > Maybe that version would not bend backwards as much as the current to > cater for buggy hosts. > You mean "buggy guests"? >>> Yes, sorry. >>> What guests are not buggy in your opinion? Linux tries hard to be smart and as a result the only way to have stable clock with it is to go paravirt. >>> I'm not an OS designer, but I think an OS should never crash, even if >>> a burst of IRQs is received. Reprogramming the timer should consider >>> the pending IRQ situation (0 or 1 with real HW). Those bugs are one >>> cause of the problem. >> OS should never crash in the absence of HW bugs? I doubt you can design >> an OS that can run in a face of any HW failure. Anyway here we are >> trying to solve guests time keeping problem not crashes. 
Do you think >> you can design OS that can keep time accurately no matter how crazy all >> HW clock behaves? > > I think my OS design skills are not relevant in this discussion, but > IIRC there are fault tolerant operating systems for extreme conditions > so it can be done. No one can influence the design of released OS versions anymore. > >> The fact is that timer device is not "just like any >> other device" in virtual world. Any other device is easy: you just >> implement spec as close as possible and everything works. For time >> source device this is not enough. You can implement RTC+HPET to the >> letter and your guest will drift like crazy. > It's doable: a cycle accurate emulator will not cause any drift, > without any voodoo. The interrupts would come after executing the same > instruction as the real HW. For emulating any sufficiently buggy > guests in any sufficiently desperate low resource conditions, this may > be the only option that will always work. > Yes, but qemu and kvm are not cycle accurate emulators and don't strive to be one. On the contrary KVM runs at native host CPU speed most of the time, so any emulation done between two instruction is theoretically noticeable for a guest. TSC is bypassed directly to a guest too, so keeping all time source in perfect sync is also impossible. >>> That is actually another cause of the problem. KVM gives the guest an >>> ill
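Jan's definition is easy to state in code. A self-contained model of the three outcomes he distinguishes - delivered, coalesced because the previous IRQ is still pending, and masked (which must not feed the de-coalescing logic) - could look like this; the types and names are illustrative only:

#include <stdio.h>

enum irq_result { IRQ_DELIVERED, IRQ_COALESCED, IRQ_MASKED };

struct pin {
    int masked;     /* e.g. the RTC line cut off after the HPET took over */
    int pending;    /* models the IRR bit: raised but not yet acked       */
};

static enum irq_result inject(struct pin *p)
{
    if (p->masked) {
        return IRQ_MASKED;      /* suppressed: no de-coalescing action needed */
    }
    if (p->pending) {
        return IRQ_COALESCED;   /* missing ack: previous IRQ still pending    */
    }
    p->pending = 1;
    return IRQ_DELIVERED;
}

int main(void)
{
    struct pin rtc_pin = { 0, 0 };
    printf("%d %d\n", inject(&rtc_pin), inject(&rtc_pin)); /* delivered, then coalesced */
    rtc_pin.pending = 0;                                   /* guest acks                */
    printf("%d\n", inject(&rtc_pin));                      /* delivered again           */
    return 0;
}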
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sat, May 29, 2010 at 9:45 AM, Jan Kiszka wrote: > Blue Swirl wrote: >> 2010/5/29 Jan Kiszka : >>> Gleb Natapov wrote: On Fri, May 28, 2010 at 08:06:45PM +, Blue Swirl wrote: > 2010/5/28 Gleb Natapov : >> On Thu, May 27, 2010 at 06:37:10PM +, Blue Swirl wrote: >>> 2010/5/27 Gleb Natapov : On Wed, May 26, 2010 at 08:35:00PM +, Blue Swirl wrote: > On Wed, May 26, 2010 at 8:09 PM, Jan Kiszka wrote: >> Blue Swirl wrote: >>> On Tue, May 25, 2010 at 9:44 PM, Jan Kiszka >>> wrote: Anthony Liguori wrote: > On 05/25/2010 02:09 PM, Blue Swirl wrote: >> On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka >> wrote: >> >>> From: Jan Kiszka >>> >>> This allows to communicate potential IRQ coalescing during >>> delivery from >>> the sink back to the source. Targets that support IRQ coalescing >>> workarounds need to register handlers that return the >>> appropriate >>> QEMU_IRQ_* code, and they have to propergate the code across >>> all IRQ >>> redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, >>> it can >>> apply its workaround. If multiple sinks exist, the source may >>> only >>> consider an IRQ coalesced if all other sinks either report >>> QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED. >>> >> No real devices are interested whether any of their output lines >> are >> even connected. This would introduce a new signal type, >> bidirectional >> multi-level, which is not correct. >> > I don't think it's really an issue of correct, but I wouldn't > disagree > to a suggestion that we ought to introduce a new signal type for > this > type of bidirectional feedback. Maybe it's qemu_coalesced_irq > and has a > similar interface as qemu_irq. A separate type would complicate the delivery of the feedback value across GPIO pins (as Paul requested for the RTC->HPET routing). >> I think the real solution to coalescing is put the logic inside >> one >> device, in this case APIC because it has the information about >> irq >> delivery. APIC could monitor incoming RTC irqs for frequency >> information and whether they get delivered or not. If not, an >> internal >> timer is installed which injects the lost irqs. That won't fly as the IRQs will already arrive at the APIC with a sufficiently high jitter. At the bare minimum, you need to tell the interrupt controller about the fact that a particular IRQ should be delivered at a specific regular rate. For this, you also need a generic interface - nothing really "won". >>> OK, let's simplify: just reinject at next possible chance. No need >>> to >>> monitor or tell anything. >> There are guests that won't like this (I know of one in-house, but >> others may even have more examples), specifically if you end up >> firing >> multiple IRQs in a row due to a longer backlog. For that reason, the >> RTC >> spreads the reinjection according to the current rate. > Then reinject with a constant delay, or next CPU exit. Such buggy If guest's time frequency is the same as host time frequency you can't reinject with constant delay. That is why current code mixes two approaches: reinject M interrupts in a raw then delay. >>> This approach can be also used by APIC-only version. >>> >> I don't know what APIC-only version you are talking about. I haven't >> seen the code and I don't understand hand waving, sorry. > There is no code, because we're still at architecture design stage. > Try to write test code to understand the problem better. 
> guests could also be assisted with special handling (like win2k > install hack), for example guest instructions could be counted > (approximately, for example using TB size or TSC) and only inject > after at least N instructions have passed. Guest instructions cannot be easily counted in KVM (it can be done more or less reliably using perf counters, may be). >>> Aren't there any debug registers or perf counters, which can generate >>> an interrupt after some number of instructions have been executed? >> Don't think debug registers have something like that and they are >> available for guest use anyway. Perf counters differs greatly from CPU >> to CPU (eve
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Blue Swirl wrote: > 2010/5/29 Jan Kiszka : >> Gleb Natapov wrote: >>> On Fri, May 28, 2010 at 08:06:45PM +, Blue Swirl wrote: 2010/5/28 Gleb Natapov : > On Thu, May 27, 2010 at 06:37:10PM +, Blue Swirl wrote: >> 2010/5/27 Gleb Natapov : >>> On Wed, May 26, 2010 at 08:35:00PM +, Blue Swirl wrote: On Wed, May 26, 2010 at 8:09 PM, Jan Kiszka wrote: > Blue Swirl wrote: >> On Tue, May 25, 2010 at 9:44 PM, Jan Kiszka >> wrote: >>> Anthony Liguori wrote: On 05/25/2010 02:09 PM, Blue Swirl wrote: > On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka > wrote: > >> From: Jan Kiszka >> >> This allows to communicate potential IRQ coalescing during >> delivery from >> the sink back to the source. Targets that support IRQ coalescing >> workarounds need to register handlers that return the appropriate >> QEMU_IRQ_* code, and they have to propergate the code across all >> IRQ >> redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, >> it can >> apply its workaround. If multiple sinks exist, the source may >> only >> consider an IRQ coalesced if all other sinks either report >> QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED. >> > No real devices are interested whether any of their output lines > are > even connected. This would introduce a new signal type, > bidirectional > multi-level, which is not correct. > I don't think it's really an issue of correct, but I wouldn't disagree to a suggestion that we ought to introduce a new signal type for this type of bidirectional feedback. Maybe it's qemu_coalesced_irq and has a similar interface as qemu_irq. >>> A separate type would complicate the delivery of the feedback value >>> across GPIO pins (as Paul requested for the RTC->HPET routing). >>> > I think the real solution to coalescing is put the logic inside > one > device, in this case APIC because it has the information about irq > delivery. APIC could monitor incoming RTC irqs for frequency > information and whether they get delivered or not. If not, an > internal > timer is installed which injects the lost irqs. >>> That won't fly as the IRQs will already arrive at the APIC with a >>> sufficiently high jitter. At the bare minimum, you need to tell the >>> interrupt controller about the fact that a particular IRQ should be >>> delivered at a specific regular rate. For this, you also need a >>> generic >>> interface - nothing really "won". >> OK, let's simplify: just reinject at next possible chance. No need to >> monitor or tell anything. > There are guests that won't like this (I know of one in-house, but > others may even have more examples), specifically if you end up firing > multiple IRQs in a row due to a longer backlog. For that reason, the > RTC > spreads the reinjection according to the current rate. Then reinject with a constant delay, or next CPU exit. Such buggy >>> If guest's time frequency is the same as host time frequency you can't >>> reinject with constant delay. That is why current code mixes two >>> approaches: reinject M interrupts in a raw then delay. >> This approach can be also used by APIC-only version. >> > I don't know what APIC-only version you are talking about. I haven't > seen the code and I don't understand hand waving, sorry. There is no code, because we're still at architecture design stage. >>> Try to write test code to understand the problem better. 
>>> guests could also be assisted with special handling (like win2k install hack), for example guest instructions could be counted (approximately, for example using TB size or TSC) and only inject after at least N instructions have passed. >>> Guest instructions cannot be easily counted in KVM (it can be done more >>> or less reliably using perf counters, may be). >> Aren't there any debug registers or perf counters, which can generate >> an interrupt after some number of instructions have been executed? > Don't think debug registers have something like that and they are > available for guest use anyway. Perf counters differs greatly from CPU > to CPU (even between two CPUs of the same manufacturer), and we want to > keep using them for profiling guests. And I don't see what problem it > will solve anyway that can be solved by simple delay betwee
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Gleb Natapov wrote: > On Fri, May 28, 2010 at 08:06:45PM +, Blue Swirl wrote: >> 2010/5/28 Gleb Natapov : >>> On Thu, May 27, 2010 at 06:37:10PM +, Blue Swirl wrote: 2010/5/27 Gleb Natapov : > On Wed, May 26, 2010 at 08:35:00PM +, Blue Swirl wrote: >> On Wed, May 26, 2010 at 8:09 PM, Jan Kiszka wrote: >>> Blue Swirl wrote: On Tue, May 25, 2010 at 9:44 PM, Jan Kiszka wrote: > Anthony Liguori wrote: >> On 05/25/2010 02:09 PM, Blue Swirl wrote: >>> On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka >>> wrote: >>> From: Jan Kiszka This allows to communicate potential IRQ coalescing during delivery from the sink back to the source. Targets that support IRQ coalescing workarounds need to register handlers that return the appropriate QEMU_IRQ_* code, and they have to propergate the code across all IRQ redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it can apply its workaround. If multiple sinks exist, the source may only consider an IRQ coalesced if all other sinks either report QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED. >>> No real devices are interested whether any of their output lines are >>> even connected. This would introduce a new signal type, >>> bidirectional >>> multi-level, which is not correct. >>> >> I don't think it's really an issue of correct, but I wouldn't >> disagree >> to a suggestion that we ought to introduce a new signal type for this >> type of bidirectional feedback. Maybe it's qemu_coalesced_irq and >> has a >> similar interface as qemu_irq. > A separate type would complicate the delivery of the feedback value > across GPIO pins (as Paul requested for the RTC->HPET routing). > >>> I think the real solution to coalescing is put the logic inside one >>> device, in this case APIC because it has the information about irq >>> delivery. APIC could monitor incoming RTC irqs for frequency >>> information and whether they get delivered or not. If not, an >>> internal >>> timer is installed which injects the lost irqs. > That won't fly as the IRQs will already arrive at the APIC with a > sufficiently high jitter. At the bare minimum, you need to tell the > interrupt controller about the fact that a particular IRQ should be > delivered at a specific regular rate. For this, you also need a > generic > interface - nothing really "won". OK, let's simplify: just reinject at next possible chance. No need to monitor or tell anything. >>> There are guests that won't like this (I know of one in-house, but >>> others may even have more examples), specifically if you end up firing >>> multiple IRQs in a row due to a longer backlog. For that reason, the RTC >>> spreads the reinjection according to the current rate. >> Then reinject with a constant delay, or next CPU exit. Such buggy > If guest's time frequency is the same as host time frequency you can't > reinject with constant delay. That is why current code mixes two > approaches: reinject M interrupts in a raw then delay. This approach can be also used by APIC-only version. >>> I don't know what APIC-only version you are talking about. I haven't >>> seen the code and I don't understand hand waving, sorry. >> There is no code, because we're still at architecture design stage. >> > Try to write test code to understand the problem better. > >> guests could also be assisted with special handling (like win2k >> install hack), for example guest instructions could be counted >> (approximately, for example using TB size or TSC) and only inject >> after at least N instructions have passed. 
> Guest instructions cannot be easily counted in KVM (it can be done more > or less reliably using perf counters, may be). Aren't there any debug registers or perf counters, which can generate an interrupt after some number of instructions have been executed? >>> Don't think debug registers have something like that and they are >>> available for guest use anyway. Perf counters differs greatly from CPU >>> to CPU (even between two CPUs of the same manufacturer), and we want to >>> keep using them for profiling guests. And I don't see what problem it >>> will solve anyway that can be solved by simple delay between irq >>> reinjection. >> This would allow counting the executed instructions and limit it. Thus >> we could emulate a 500MHz CPU on a 2GHz CPU more accurately. >> > Why would you want to limit number of instruction executed by guest if > CPU has nothing else to do anyway? The problem occurs not
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Blue Swirl wrote:

This is - according to my current understanding - the proposed alternative architecture:

     .---------------.
     | de-coalescing |
     |     logic     |
     '---------------'
         ^       |
  period,|       |IRQ
coalesced|       |(or tuned VM clock)
 (yes/no)|       v
 .-----.      .--------.      .------.
 | RTC |-IRQ->| router |-IRQ->| APIC |
 '-----'      '--------'      '------'
    |          ^    |            ^
    |          |    |            |
    '--period--'    '---period---'

>>> The period information is already known by the higher level clock
>>> management,

>> The timer device models program the next event of some qemu-timer. There
>> is no tag attached explaining the timer subsystem or anyone else the
>> reason behind this programming.

>
> Yes, but why would we care? All timers in the system besides RTC
> should be affected likewise, this would in fact be the benefit from
> handling the problem at higher level.

And how does your equation for calculating the clock slow-down look like?

Jan
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sat, May 29, 2010 at 10:16 AM, Jan Kiszka wrote: > Blue Swirl wrote: > This is - according to my current understanding - the proposed > alternative architecture: >
>             .---------------.
>             | de-coalescing |
>             |     logic     |
>             '---------------'
>                  ^     |
>           period,|     |IRQ
>         coalesced|     |(or tuned VM clock)
>          (yes/no)|     v
>   .-----.      .--------.      .------.
>   | RTC |-IRQ->| router |-IRQ->| APIC |
>   '-----'      '--------'      '------'
>      |             ^  |            ^
>      |             |  |            |
>      '---period----'  '--period----'
>>>> The period information is already known by the higher level clock >>>> management, >>> The timer device models program the next event of some qemu-timer. There >>> is no tag attached explaining the timer subsystem or anyone else the >>> reason behind this programming. >> >> Yes, but why would we care? All timers in the system besides RTC >> should be affected likewise, this would in fact be the benefit from >> handling the problem at higher level. > > And what does your equation for calculating the clock slow-down look like? How about icount_adjust()?
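For context, icount_adjust() belongs to QEMU's -icount machinery: virtual time advances 2^shift ns per executed instruction, and the shift is periodically nudged so that virtual time follows real time. A rough sketch of that idea (names, thresholds and structure here are illustrative, not the actual QEMU code):

    /* Illustrative sketch of the "-icount auto" adjustment idea.
     * Virtual time advances 2^shift ns per executed instruction; the
     * shift is nudged whenever virtual time drifts from real time. */
    static int64_t last_delta;
    static int shift = 3;                  /* start at 2^3 ns per instruction */

    static void icount_adjust_sketch(int64_t real_ns, int64_t virt_ns)
    {
        int64_t delta = virt_ns - real_ns;

        if (delta > 0 && delta > last_delta && shift > 0) {
            shift--;                       /* guest time ahead: slow it down */
        } else if (delta < 0 && delta < last_delta) {
            shift++;                       /* guest time behind: speed it up */
        }
        last_delta = delta;
    }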
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sat, May 29, 2010 at 09:35:30AM +, Blue Swirl wrote: > > I still don't see how the alternative is supposed to simplify our life > > or improve the efficiency of the de-coalescing workaround. It's rather > > problematic like Gleb pointed out: The de-coalescing logic needs to be > > informed about periodicity changes that can only be delivered along > > IRQs. So what to do with the backlog when the timer is stopped? > > What happens with the current design? Gleb only mentioned the > frequency change, I thought that was not so big problem. But I don't > think this case should be allowed happen at all, it can't exist on > real HW. > Hm, why can't it exist on real HW? Do a simple exercise. Run WindowsXP inside QEMU, connect gdb to the QEMU process and check what frequency the RTC is configured with (hint: it will be 64Hz), now play a video inside the guest and check the frequency again (hint: it will be 1kHz). -- Gleb.
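The 64Hz/1024Hz values Gleb cites come from the rate-selection bits in RTC register A. A minimal sketch of the decoding, following the MC146818 rate table (illustrative only, not QEMU's rtc code; the special-cased rate selectors 1 and 2 are omitted):

    #include <stdio.h>

    /* frequency = 32768 >> (rate_select - 1) for rate_select >= 3 */
    static unsigned rtc_periodic_hz(unsigned char reg_a)
    {
        unsigned rate_select = reg_a & 0x0f;
        if (rate_select == 0)
            return 0;                        /* periodic interrupt disabled */
        return 32768 >> (rate_select - 1);
    }

    int main(void)
    {
        printf("%u Hz\n", rtc_periodic_hz(0x0a));  /* 64 Hz: Windows idle      */
        printf("%u Hz\n", rtc_periodic_hz(0x06));  /* 1024 Hz: playing a video */
        return 0;
    }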
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sat, May 29, 2010 at 09:15:11AM +, Blue Swirl wrote: > >> There is no code, because we're still at architecture design stage. > >> > > Try to write test code to understand the problem better. > > I will. > Please do ASAP. This discussion shows that you don't understand what is the problem that we are dialing with. > >> >> >> guests could also be assisted with special handling (like win2k > >> >> >> install hack), for example guest instructions could be counted > >> >> >> (approximately, for example using TB size or TSC) and only inject > >> >> >> after at least N instructions have passed. > >> >> > Guest instructions cannot be easily counted in KVM (it can be done > >> >> > more > >> >> > or less reliably using perf counters, may be). > >> >> > >> >> Aren't there any debug registers or perf counters, which can generate > >> >> an interrupt after some number of instructions have been executed? > >> > Don't think debug registers have something like that and they are > >> > available for guest use anyway. Perf counters differs greatly from CPU > >> > to CPU (even between two CPUs of the same manufacturer), and we want to > >> > keep using them for profiling guests. And I don't see what problem it > >> > will solve anyway that can be solved by simple delay between irq > >> > reinjection. > >> > >> This would allow counting the executed instructions and limit it. Thus > >> we could emulate a 500MHz CPU on a 2GHz CPU more accurately. > >> > > Why would you want to limit number of instruction executed by guest if > > CPU has nothing else to do anyway? The problem occurs not when we have > > spare cycles so give to a guest, but in opposite case. > > I think one problem is that the guest has executed too much compared > to what would happen with real HW with a lesser CPU. That explains the > RTC frequency reprogramming case. You think wrong. The problem is exactly opposite: the guest haven't had enough execution time between two time interrupts. I don't know what RTC frequency reprogramming case you are talking about here. > > > > >> >> > >> >> >> > >> >> >> > And even if the rate did not matter, the APIC woult still have to > >> >> >> > now > >> >> >> > about the fact that an IRQ is really periodic and does not only > >> >> >> > appear > >> >> >> > as such for a certain interval. This really does not sound like > >> >> >> > simplifying things or even make them cleaner. > >> >> >> > >> >> >> It would, the voodoo would be contained only in APIC, RTC would be > >> >> >> just like any other device. With the bidirectional irqs, this voodoo > >> >> >> would probably eventually spread to many other devices. The logical > >> >> >> conclusion of that would be a system where all devices would be > >> >> >> careful not to disturb the guest at wrong moment because that would > >> >> >> trigger a bug. > >> >> >> > >> >> > This voodoo will be so complex and unreliable that it will make RTC > >> >> > hack > >> >> > pale in comparison (and I still don't see how you are going to make it > >> >> > actually work). > >> >> > >> >> Implement everything inside APIC: only coalescing and reinjection. > >> > APIC has zero info needed to implement reinjection correctly as was > >> > shown to you several time in this thread and you simply keep ignoring > >> > it. > >> > >> On the contrary, APIC is actually the only source of the IRQ ack > >> information. RTC hack would not work without APIC (or the > >> bidirectional IRQ) passing this info to RTC. 
> >> > >> What APIC doesn't have now is the timer frequency or period info. This > >> is known by RTC and also higher levels managing the clocks. > >> > > So APIC has one bit of information and RTC everything else. > > The information known by RTC (timer period) is also known by higher levels. > What do you mean by higher level here? vl.c or the APIC? > > The current > > approach (and proposed patch) brings this one bit of information to RTC, > > you are arguing that RTC should be able to communicate all its info to > > APIC. Sorry I don't see that your way has any advantage. Just more > > complex interface and it is much easier to get it wrong for other time > > sources. > > I don't think anymore that APIC should be handling this but the > generic stuff, like vl.c or exec.c. Then there would be only > information passing from APIC to higher levels. Handling reinjection by general timer code makes infinitely more sense than handling it in APIC. One thing (from the top of my head) that can't be implemented at that level is injection of interrupts back to back (i.e. injecting the next interrupt immediately after the guest acknowledges the previous one to the RTC). > > >> I keep ignoring the idea that the current model, where both RTC and > >> APIC must somehow work together to make coalescing work, is the only > >> possible just because it is committed and it happens to work in some > >> cases. It would be much better to concentrate this to one place, APIC > >> or preferably higher level where it ma
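A minimal sketch of the back-to-back reinjection Gleb describes, driven by an interrupt-ack notifier as discussed elsewhere in the thread (all names here are hypothetical, not QEMU's actual interfaces):

    typedef struct RTCState {
        int irq_coalesced;      /* ticks that could not be delivered yet  */
        int irq_pending;        /* IRQ raised but not yet acked by guest  */
    } RTCState;

    /* Placeholder for driving the RTC output line (hypothetical). */
    static void rtc_irq_raise(RTCState *s) { (void)s; }

    /* Periodic timer callback. */
    static void rtc_periodic_tick(RTCState *s)
    {
        if (s->irq_pending) {
            s->irq_coalesced++;             /* previous tick not consumed */
            return;
        }
        s->irq_pending = 1;
        rtc_irq_raise(s);
    }

    /* Ack notifier: called when the guest acknowledges the RTC IRQ. */
    static void rtc_irq_acked(RTCState *s)
    {
        s->irq_pending = 0;
        if (s->irq_coalesced > 0) {
            s->irq_coalesced--;
            s->irq_pending = 1;
            rtc_irq_raise(s);               /* back-to-back reinjection */
        }
    }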
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Fri, May 28, 2010 at 08:06:45PM +, Blue Swirl wrote: > 2010/5/28 Gleb Natapov : > > On Thu, May 27, 2010 at 06:37:10PM +, Blue Swirl wrote: > >> 2010/5/27 Gleb Natapov : > >> > On Wed, May 26, 2010 at 08:35:00PM +, Blue Swirl wrote: > >> >> On Wed, May 26, 2010 at 8:09 PM, Jan Kiszka wrote: > >> >> > Blue Swirl wrote: > >> >> >> On Tue, May 25, 2010 at 9:44 PM, Jan Kiszka > >> >> >> wrote: > >> >> >>> Anthony Liguori wrote: > >> >> On 05/25/2010 02:09 PM, Blue Swirl wrote: > >> >> > On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka > >> >> > wrote: > >> >> > > >> >> >> From: Jan Kiszka > >> >> >> > >> >> >> This allows to communicate potential IRQ coalescing during > >> >> >> delivery from > >> >> >> the sink back to the source. Targets that support IRQ coalescing > >> >> >> workarounds need to register handlers that return the appropriate > >> >> >> QEMU_IRQ_* code, and they have to propergate the code across all > >> >> >> IRQ > >> >> >> redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, > >> >> >> it can > >> >> >> apply its workaround. If multiple sinks exist, the source may > >> >> >> only > >> >> >> consider an IRQ coalesced if all other sinks either report > >> >> >> QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED. > >> >> >> > >> >> > No real devices are interested whether any of their output lines > >> >> > are > >> >> > even connected. This would introduce a new signal type, > >> >> > bidirectional > >> >> > multi-level, which is not correct. > >> >> > > >> >> I don't think it's really an issue of correct, but I wouldn't > >> >> disagree > >> >> to a suggestion that we ought to introduce a new signal type for > >> >> this > >> >> type of bidirectional feedback. Maybe it's qemu_coalesced_irq and > >> >> has a > >> >> similar interface as qemu_irq. > >> >> >>> A separate type would complicate the delivery of the feedback value > >> >> >>> across GPIO pins (as Paul requested for the RTC->HPET routing). > >> >> >>> > >> >> > I think the real solution to coalescing is put the logic inside > >> >> > one > >> >> > device, in this case APIC because it has the information about irq > >> >> > delivery. APIC could monitor incoming RTC irqs for frequency > >> >> > information and whether they get delivered or not. If not, an > >> >> > internal > >> >> > timer is installed which injects the lost irqs. > >> >> >>> That won't fly as the IRQs will already arrive at the APIC with a > >> >> >>> sufficiently high jitter. At the bare minimum, you need to tell the > >> >> >>> interrupt controller about the fact that a particular IRQ should be > >> >> >>> delivered at a specific regular rate. For this, you also need a > >> >> >>> generic > >> >> >>> interface - nothing really "won". > >> >> >> > >> >> >> OK, let's simplify: just reinject at next possible chance. No need to > >> >> >> monitor or tell anything. > >> >> > > >> >> > There are guests that won't like this (I know of one in-house, but > >> >> > others may even have more examples), specifically if you end up firing > >> >> > multiple IRQs in a row due to a longer backlog. For that reason, the > >> >> > RTC > >> >> > spreads the reinjection according to the current rate. > >> >> > >> >> Then reinject with a constant delay, or next CPU exit. Such buggy > >> > If guest's time frequency is the same as host time frequency you can't > >> > reinject with constant delay. That is why current code mixes two > >> > approaches: reinject M interrupts in a raw then delay. > >> > >> This approach can be also used by APIC-only version. 
> >> > > I don't know what APIC-only version you are talking about. I haven't > > seen the code and I don't understand hand waving, sorry. > > There is no code, because we're still at architecture design stage. > Try to write test code to understand the problem better. > >> >> guests could also be assisted with special handling (like win2k > >> >> install hack), for example guest instructions could be counted > >> >> (approximately, for example using TB size or TSC) and only inject > >> >> after at least N instructions have passed. > >> > Guest instructions cannot be easily counted in KVM (it can be done more > >> > or less reliably using perf counters, may be). > >> > >> Aren't there any debug registers or perf counters, which can generate > >> an interrupt after some number of instructions have been executed? > > Don't think debug registers have something like that and they are > > available for guest use anyway. Perf counters differs greatly from CPU > > to CPU (even between two CPUs of the same manufacturer), and we want to > > keep using them for profiling guests. And I don't see what problem it > > will solve anyway that can be solved by simple delay between irq > > reinjection. > > This would allow counting the executed instructions a
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
2010/5/28 Gleb Natapov : > On Thu, May 27, 2010 at 06:37:10PM +, Blue Swirl wrote: >> 2010/5/27 Gleb Natapov : >> > On Wed, May 26, 2010 at 08:35:00PM +, Blue Swirl wrote: >> >> On Wed, May 26, 2010 at 8:09 PM, Jan Kiszka wrote: >> >> > Blue Swirl wrote: >> >> >> On Tue, May 25, 2010 at 9:44 PM, Jan Kiszka wrote: >> >> >>> Anthony Liguori wrote: >> >> On 05/25/2010 02:09 PM, Blue Swirl wrote: >> >> > On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka >> >> > wrote: >> >> > >> >> >> From: Jan Kiszka >> >> >> >> >> >> This allows to communicate potential IRQ coalescing during >> >> >> delivery from >> >> >> the sink back to the source. Targets that support IRQ coalescing >> >> >> workarounds need to register handlers that return the appropriate >> >> >> QEMU_IRQ_* code, and they have to propergate the code across all >> >> >> IRQ >> >> >> redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it >> >> >> can >> >> >> apply its workaround. If multiple sinks exist, the source may only >> >> >> consider an IRQ coalesced if all other sinks either report >> >> >> QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED. >> >> >> >> >> > No real devices are interested whether any of their output lines are >> >> > even connected. This would introduce a new signal type, >> >> > bidirectional >> >> > multi-level, which is not correct. >> >> > >> >> I don't think it's really an issue of correct, but I wouldn't >> >> disagree >> >> to a suggestion that we ought to introduce a new signal type for this >> >> type of bidirectional feedback. Maybe it's qemu_coalesced_irq and >> >> has a >> >> similar interface as qemu_irq. >> >> >>> A separate type would complicate the delivery of the feedback value >> >> >>> across GPIO pins (as Paul requested for the RTC->HPET routing). >> >> >>> >> >> > I think the real solution to coalescing is put the logic inside one >> >> > device, in this case APIC because it has the information about irq >> >> > delivery. APIC could monitor incoming RTC irqs for frequency >> >> > information and whether they get delivered or not. If not, an >> >> > internal >> >> > timer is installed which injects the lost irqs. >> >> >>> That won't fly as the IRQs will already arrive at the APIC with a >> >> >>> sufficiently high jitter. At the bare minimum, you need to tell the >> >> >>> interrupt controller about the fact that a particular IRQ should be >> >> >>> delivered at a specific regular rate. For this, you also need a >> >> >>> generic >> >> >>> interface - nothing really "won". >> >> >> >> >> >> OK, let's simplify: just reinject at next possible chance. No need to >> >> >> monitor or tell anything. >> >> > >> >> > There are guests that won't like this (I know of one in-house, but >> >> > others may even have more examples), specifically if you end up firing >> >> > multiple IRQs in a row due to a longer backlog. For that reason, the RTC >> >> > spreads the reinjection according to the current rate. >> >> >> >> Then reinject with a constant delay, or next CPU exit. Such buggy >> > If guest's time frequency is the same as host time frequency you can't >> > reinject with constant delay. That is why current code mixes two >> > approaches: reinject M interrupts in a raw then delay. >> >> This approach can be also used by APIC-only version. >> > I don't know what APIC-only version you are talking about. I haven't > seen the code and I don't understand hand waving, sorry. There is no code, because we're still at architecture design stage. 
>> >> guests could also be assisted with special handling (like win2k >> >> install hack), for example guest instructions could be counted >> >> (approximately, for example using TB size or TSC) and only inject >> >> after at least N instructions have passed. >> > Guest instructions cannot be easily counted in KVM (it can be done more >> > or less reliably using perf counters, may be). >> >> Aren't there any debug registers or perf counters, which can generate >> an interrupt after some number of instructions have been executed? > Don't think debug registers have something like that and they are > available for guest use anyway. Perf counters differs greatly from CPU > to CPU (even between two CPUs of the same manufacturer), and we want to > keep using them for profiling guests. And I don't see what problem it > will solve anyway that can be solved by simple delay between irq > reinjection. This would allow counting the executed instructions and limit it. Thus we could emulate a 500MHz CPU on a 2GHz CPU more accurately. >> >> >> >> >> > And even if the rate did not matter, the APIC woult still have to now >> >> > about the fact that an IRQ is really periodic and does not only appear >> >> > as such for a certain interval. This really does not sound like >> >> > simplifying things or even make them cleaner. >> >> >>
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Thu, May 27, 2010 at 10:21 PM, Paul Brook wrote: > >> >> Then the amount >> >> of CPU cycles between timer interrupts would increase and hopefully >> >> the guest can keep up. If the guest sleeps, time base could be >> >> accelerated to catch up with wall clock and then set back to 1:1 rate. >> > >> > Can't follow you ATM, sorry. What should be slowed down then? And how >> > precisely? >> >> I think vm_clock and everything that depends on vm_clock, also >> rtc_clock should be tied to vm_clock in this mode, not host_clock. > > The problem is more fundamental than that. There is no real correlation > between vm_clock and the amount of code executed by the guest, especially not > on timescales less than a second. Can we measure (or at least estimate with higher accuracy than the tick IRQ delivery jitter) the amount of code executed, somehow? For example, add TSC sampling to all TB or KVM VCPU exit and load/store paths?
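A rough sketch of the TSC sampling suggested above, done on the host around VCPU entry/exit (illustrative only; the TSC counts host cycles, so frequency scaling, SMT and preemption make this an approximation rather than an instruction count):

    #include <stdint.h>
    #include <x86intrin.h>

    static uint64_t tsc_at_entry;
    static uint64_t guest_cycles_total;

    static void on_vcpu_enter(void)
    {
        tsc_at_entry = __rdtsc();
    }

    static void on_vcpu_exit(void)
    {
        guest_cycles_total += __rdtsc() - tsc_at_entry;
    }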
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Thu, May 27, 2010 at 10:19 PM, Jan Kiszka wrote: > Blue Swirl wrote: >> On Thu, May 27, 2010 at 7:08 PM, Jan Kiszka wrote: >>> Blue Swirl wrote: On Thu, May 27, 2010 at 6:31 PM, Jan Kiszka wrote: > Blue Swirl wrote: >> On Wed, May 26, 2010 at 11:26 PM, Paul Brook >> wrote: At the other extreme, would it be possible to make the educated guests aware of the virtualization also in clock aspect: virtio-clock? >>> The guest doesn't even need to be aware of virtualization. It just >>> needs to be >>> able to accommodate the lack of guaranteed realtime behavior. >>> >>> The fundamental problem here is that some guest operating systems >>> assume that >>> the hardware provides certain realtime guarantees with respect to >>> execution of >>> interrupt handlers. In particular they assume that the CPU will always >>> be >>> able to complete execution of the timer IRQ handler before the periodic >>> timer >>> triggers again. In most virtualized environments you have absolutely no >>> guarantee of realtime response. >>> >>> With Linux guests this was solved a long time ago by the introduction of >>> tickless kernels. These separate the timekeeping from wakeup events, >>> so it >>> doesn't matter if several wakeup triggers end up getting merged (either >>> at the >>> hardware level or via top/bottom half guest IRQ handlers). >>> >>> >>> It's worth mentioning that this problem also occurs on real hardware, >>> typically due to lame hardware/drivers which end up masking interrupts >>> or >>> otherwise stall the CPU for for long periods of time. >>> >>> >>> The PIT hack attempts to workaround broken guests by adding artificial >>> latency >>> to the timer event, ensuring that the guest "sees" them all. >>> Unfortunately >>> guests vary on when it is safe for them to see the next timer event, and >>> trying to observe this behavior involves potentially harmful heuristics >>> and >>> collusion between unrelated devices (e.g. interrupt controller and >>> timer). >>> >>> In some cases we don't even do that, and just reschedule the event some >>> arbitrarily small amount of time later. This assumes the guest to do >>> useful >>> work in that time. In a single threaded environment this is probably >>> true - >>> qemu got enough CPU to inject the first interrupt, so will probably >>> manage to >>> execute some guest code before the end of its timeslice. In an >>> environment >>> where interrupt processing/delivery and execution of the guest code >>> happen in >>> different threads this becomes increasingly likely to fail. >> So any voodoo around timer events is doomed to fail in some cases. >> What's the amount of hacks what we want then? Is there any generic > The aim of this patch is to reduce the amount of existing and upcoming > hacks. It may still require some refinements, but I think we haven't > found any smarter approach yet that fits existing use cases. I don't feel we have tried other possibilities hard enough. >>> Well, seeing prototypes wouldn't be bad, also to run real load againt >>> them. But at least I'm currently clueless what to implement. >> >> Perhaps now is then not the time to rush to implement something, but >> to brainstorm for a clean solution. > > And sometimes it can help to understand how ideas could even be improved > or why others doesn't work at all. > >> >> solution, like slowing down the guest system to the point where we can >> guarantee the interrupt rate vs. CPU execution speed? > That's generally a non-option in virtualized production environments. 
> Specifically if the guest system lost interrupts due to host > overcommitment, you do not want it slow down even further. I meant that the guest time could be scaled down, for example 2s in wall clock time would be presented to the guest as 1s. >>> But that is precisely what already happens when the guest loses timer >>> interrupts. There is no other time source for this kind of guests - >>> often except for some external events generated by systems which you >>> don't want to fall behind arbitrarily. >>> Then the amount of CPU cycles between timer interrupts would increase and hopefully the guest can keep up. If the guest sleeps, time base could be accelerated to catch up with wall clock and then set back to 1:1 rate. >>> Can't follow you ATM, sorry. What should be slowed down then? And how >>> precisely? >> >> I think vm_clock and everything that depends on vm_clock, also >> rtc_clock should be tied to vm_clock in this mode, not host_clock. > > Let me check if I got this idea correctly: Instead of tuning just the > tick frequency of the affected timer device / sending its backlog in a > row, you rather want t
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Thu, May 27, 2010 at 06:37:10PM +, Blue Swirl wrote: > 2010/5/27 Gleb Natapov : > > On Wed, May 26, 2010 at 08:35:00PM +, Blue Swirl wrote: > >> On Wed, May 26, 2010 at 8:09 PM, Jan Kiszka wrote: > >> > Blue Swirl wrote: > >> >> On Tue, May 25, 2010 at 9:44 PM, Jan Kiszka wrote: > >> >>> Anthony Liguori wrote: > >> On 05/25/2010 02:09 PM, Blue Swirl wrote: > >> > On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka > >> > wrote: > >> > > >> >> From: Jan Kiszka > >> >> > >> >> This allows to communicate potential IRQ coalescing during delivery > >> >> from > >> >> the sink back to the source. Targets that support IRQ coalescing > >> >> workarounds need to register handlers that return the appropriate > >> >> QEMU_IRQ_* code, and they have to propergate the code across all IRQ > >> >> redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it > >> >> can > >> >> apply its workaround. If multiple sinks exist, the source may only > >> >> consider an IRQ coalesced if all other sinks either report > >> >> QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED. > >> >> > >> > No real devices are interested whether any of their output lines are > >> > even connected. This would introduce a new signal type, bidirectional > >> > multi-level, which is not correct. > >> > > >> I don't think it's really an issue of correct, but I wouldn't disagree > >> to a suggestion that we ought to introduce a new signal type for this > >> type of bidirectional feedback. Maybe it's qemu_coalesced_irq and > >> has a > >> similar interface as qemu_irq. > >> >>> A separate type would complicate the delivery of the feedback value > >> >>> across GPIO pins (as Paul requested for the RTC->HPET routing). > >> >>> > >> > I think the real solution to coalescing is put the logic inside one > >> > device, in this case APIC because it has the information about irq > >> > delivery. APIC could monitor incoming RTC irqs for frequency > >> > information and whether they get delivered or not. If not, an > >> > internal > >> > timer is installed which injects the lost irqs. > >> >>> That won't fly as the IRQs will already arrive at the APIC with a > >> >>> sufficiently high jitter. At the bare minimum, you need to tell the > >> >>> interrupt controller about the fact that a particular IRQ should be > >> >>> delivered at a specific regular rate. For this, you also need a generic > >> >>> interface - nothing really "won". > >> >> > >> >> OK, let's simplify: just reinject at next possible chance. No need to > >> >> monitor or tell anything. > >> > > >> > There are guests that won't like this (I know of one in-house, but > >> > others may even have more examples), specifically if you end up firing > >> > multiple IRQs in a row due to a longer backlog. For that reason, the RTC > >> > spreads the reinjection according to the current rate. > >> > >> Then reinject with a constant delay, or next CPU exit. Such buggy > > If guest's time frequency is the same as host time frequency you can't > > reinject with constant delay. That is why current code mixes two > > approaches: reinject M interrupts in a raw then delay. > > This approach can be also used by APIC-only version. > I don't know what APIC-only version you are talking about. I haven't seen the code and I don't understand hand waving, sorry. > >> guests could also be assisted with special handling (like win2k > >> install hack), for example guest instructions could be counted > >> (approximately, for example using TB size or TSC) and only inject > >> after at least N instructions have passed. 
> > Guest instructions cannot be easily counted in KVM (it can be done more > > or less reliably using perf counters, may be). > > Aren't there any debug registers or perf counters, which can generate > an interrupt after some number of instructions have been executed? Don't think debug registers have something like that and they are available for guest use anyway. Perf counters differs greatly from CPU to CPU (even between two CPUs of the same manufacturer), and we want to keep using them for profiling guests. And I don't see what problem it will solve anyway that can be solved by simple delay between irq reinjection. > > >> > >> > And even if the rate did not matter, the APIC woult still have to now > >> > about the fact that an IRQ is really periodic and does not only appear > >> > as such for a certain interval. This really does not sound like > >> > simplifying things or even make them cleaner. > >> > >> It would, the voodoo would be contained only in APIC, RTC would be > >> just like any other device. With the bidirectional irqs, this voodoo > >> would probably eventually spread to many other devices. The logical > >> conclusion of that would be a system where all devices would be > >> careful not to disturb the guest at wrong moment because that would > >> trigge
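For reference, the kind of hardware instruction counter being discussed can be set up from user space with perf_event_open(2); a minimal sketch follows. It counts everything the calling thread executes, which is exactly the host/guest attribution problem Gleb points out:

    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <string.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_INSTRUCTIONS;
        attr.disabled = 1;
        attr.exclude_kernel = 1;

        int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
        if (fd < 0) {
            perror("perf_event_open");
            return 1;
        }
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        /* ... run the code whose instructions should be counted ... */
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        uint64_t count = 0;
        if (read(fd, &count, sizeof(count)) == sizeof(count))
            printf("instructions: %llu\n", (unsigned long long)count);
        close(fd);
        return 0;
    }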
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
> >> Then the amount > >> of CPU cycles between timer interrupts would increase and hopefully > >> the guest can keep up. If the guest sleeps, time base could be > >> accelerated to catch up with wall clock and then set back to 1:1 rate. > > > > Can't follow you ATM, sorry. What should be slowed down then? And how > > precisely? > > I think vm_clock and everything that depends on vm_clock, also > rtc_clock should be tied to vm_clock in this mode, not host_clock. The problem is more fundamental than that. There is no real correlation between vm_clock and the amount of code executed by the guest, especially not on timescales less than a second. Paul
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Blue Swirl wrote: > On Thu, May 27, 2010 at 7:08 PM, Jan Kiszka wrote: >> Blue Swirl wrote: >>> On Thu, May 27, 2010 at 6:31 PM, Jan Kiszka wrote: Blue Swirl wrote: > On Wed, May 26, 2010 at 11:26 PM, Paul Brook > wrote: >>> At the other extreme, would it be possible to make the educated guests >>> aware of the virtualization also in clock aspect: virtio-clock? >> The guest doesn't even need to be aware of virtualization. It just needs >> to be >> able to accommodate the lack of guaranteed realtime behavior. >> >> The fundamental problem here is that some guest operating systems assume >> that >> the hardware provides certain realtime guarantees with respect to >> execution of >> interrupt handlers. In particular they assume that the CPU will always >> be >> able to complete execution of the timer IRQ handler before the periodic >> timer >> triggers again. In most virtualized environments you have absolutely no >> guarantee of realtime response. >> >> With Linux guests this was solved a long time ago by the introduction of >> tickless kernels. These separate the timekeeping from wakeup events, so >> it >> doesn't matter if several wakeup triggers end up getting merged (either >> at the >> hardware level or via top/bottom half guest IRQ handlers). >> >> >> It's worth mentioning that this problem also occurs on real hardware, >> typically due to lame hardware/drivers which end up masking interrupts or >> otherwise stall the CPU for for long periods of time. >> >> >> The PIT hack attempts to workaround broken guests by adding artificial >> latency >> to the timer event, ensuring that the guest "sees" them all. >> Unfortunately >> guests vary on when it is safe for them to see the next timer event, and >> trying to observe this behavior involves potentially harmful heuristics >> and >> collusion between unrelated devices (e.g. interrupt controller and >> timer). >> >> In some cases we don't even do that, and just reschedule the event some >> arbitrarily small amount of time later. This assumes the guest to do >> useful >> work in that time. In a single threaded environment this is probably >> true - >> qemu got enough CPU to inject the first interrupt, so will probably >> manage to >> execute some guest code before the end of its timeslice. In an >> environment >> where interrupt processing/delivery and execution of the guest code >> happen in >> different threads this becomes increasingly likely to fail. > So any voodoo around timer events is doomed to fail in some cases. > What's the amount of hacks what we want then? Is there any generic The aim of this patch is to reduce the amount of existing and upcoming hacks. It may still require some refinements, but I think we haven't found any smarter approach yet that fits existing use cases. >>> I don't feel we have tried other possibilities hard enough. >> Well, seeing prototypes wouldn't be bad, also to run real load againt >> them. But at least I'm currently clueless what to implement. > > Perhaps now is then not the time to rush to implement something, but > to brainstorm for a clean solution. And sometimes it can help to understand how ideas could even be improved or why others doesn't work at all. > > solution, like slowing down the guest system to the point where we can > guarantee the interrupt rate vs. CPU execution speed? That's generally a non-option in virtualized production environments. Specifically if the guest system lost interrupts due to host overcommitment, you do not want it slow down even further. 
>>> I meant that the guest time could be scaled down, for example 2s in >>> wall clock time would be presented to the guest as 1s. >> But that is precisely what already happens when the guest loses timer >> interrupts. There is no other time source for this kind of guests - >> often except for some external events generated by systems which you >> don't want to fall behind arbitrarily. >> >>> Then the amount >>> of CPU cycles between timer interrupts would increase and hopefully >>> the guest can keep up. If the guest sleeps, time base could be >>> accelerated to catch up with wall clock and then set back to 1:1 rate. >> Can't follow you ATM, sorry. What should be slowed down then? And how >> precisely? > > I think vm_clock and everything that depends on vm_clock, also > rtc_clock should be tied to vm_clock in this mode, not host_clock. Let me check if I got this idea correctly: Instead of tuning just the tick frequency of the affected timer device / sending its backlog in a row, you rather want to tune the vm_clock correspondingly? Maybe a way to abstract the required logic currently sitting only in the RTC for use by other timer sources as well. But just s
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
> > In some cases we don't even do that, and just reschedule the event some > > arbitrarily small amount of time later. This assumes the guest to do > > useful work in that time. In a single threaded environment this is > > probably true - qemu got enough CPU to inject the first interrupt, so > > will probably manage to execute some guest code before the end of its > > timeslice. In an environment where interrupt processing/delivery and > > execution of the guest code happen in different threads this becomes > > increasingly likely to fail. > > So any voodoo around timer events is doomed to fail in some cases. Depends on the level of voodoo. My guess is that common guest operating systems require hacks which result in demonstrably incorrect behavior. > What's the amount of hacks what we want then? Is there any generic > solution, like slowing down the guest system to the point where we can > guarantee the interrupt rate vs. CPU execution speed? The "-icount N" option gives deterministic virtual realtime behavior, however the guest is completely decoupled from real-world time. The "-icount auto" option gives semi-deterministic behavior while maintaining overall consistency with the real world. This may introduce some small-scale time jitter, but will still satisfy all but the most demanding hard-real-time assumptions. Neither of these options works with KVM. It may be possible to implement something using performance counters. I don't know how much additional overhead this would involve. Paul
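For reference, the options Paul mentions are QEMU command-line switches: roughly, -icount N makes the virtual CPU execute one instruction per 2^N ns of virtual time, while -icount auto tunes that shift at runtime. A usage illustration (not taken from this thread):

    qemu-system-x86_64 -icount 1 ...     # deterministic: ~1 instruction per 2 ns of virtual time
    qemu-system-x86_64 -icount auto ...  # shift adjusted dynamically to follow real time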
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Thu, May 27, 2010 at 7:08 PM, Jan Kiszka wrote: > Blue Swirl wrote: >> On Thu, May 27, 2010 at 6:31 PM, Jan Kiszka wrote: >>> Blue Swirl wrote: On Wed, May 26, 2010 at 11:26 PM, Paul Brook wrote: >> At the other extreme, would it be possible to make the educated guests >> aware of the virtualization also in clock aspect: virtio-clock? > The guest doesn't even need to be aware of virtualization. It just needs > to be > able to accommodate the lack of guaranteed realtime behavior. > > The fundamental problem here is that some guest operating systems assume > that > the hardware provides certain realtime guarantees with respect to > execution of > interrupt handlers. In particular they assume that the CPU will always be > able to complete execution of the timer IRQ handler before the periodic > timer > triggers again. In most virtualized environments you have absolutely no > guarantee of realtime response. > > With Linux guests this was solved a long time ago by the introduction of > tickless kernels. These separate the timekeeping from wakeup events, so > it > doesn't matter if several wakeup triggers end up getting merged (either > at the > hardware level or via top/bottom half guest IRQ handlers). > > > It's worth mentioning that this problem also occurs on real hardware, > typically due to lame hardware/drivers which end up masking interrupts or > otherwise stall the CPU for for long periods of time. > > > The PIT hack attempts to workaround broken guests by adding artificial > latency > to the timer event, ensuring that the guest "sees" them all. > Unfortunately > guests vary on when it is safe for them to see the next timer event, and > trying to observe this behavior involves potentially harmful heuristics > and > collusion between unrelated devices (e.g. interrupt controller and timer). > > In some cases we don't even do that, and just reschedule the event some > arbitrarily small amount of time later. This assumes the guest to do > useful > work in that time. In a single threaded environment this is probably true > - > qemu got enough CPU to inject the first interrupt, so will probably > manage to > execute some guest code before the end of its timeslice. In an environment > where interrupt processing/delivery and execution of the guest code > happen in > different threads this becomes increasingly likely to fail. So any voodoo around timer events is doomed to fail in some cases. What's the amount of hacks what we want then? Is there any generic >>> The aim of this patch is to reduce the amount of existing and upcoming >>> hacks. It may still require some refinements, but I think we haven't >>> found any smarter approach yet that fits existing use cases. >> >> I don't feel we have tried other possibilities hard enough. > > Well, seeing prototypes wouldn't be bad, also to run real load againt > them. But at least I'm currently clueless what to implement. Perhaps now is then not the time to rush to implement something, but to brainstorm for a clean solution. >> solution, like slowing down the guest system to the point where we can guarantee the interrupt rate vs. CPU execution speed? >>> That's generally a non-option in virtualized production environments. >>> Specifically if the guest system lost interrupts due to host >>> overcommitment, you do not want it slow down even further. >> >> I meant that the guest time could be scaled down, for example 2s in >> wall clock time would be presented to the guest as 1s. 
> > But that is precisely what already happens when the guest loses timer > interrupts. There is no other time source for this kind of guests - > often except for some external events generated by systems which you > don't want to fall behind arbitrarily. > >> Then the amount >> of CPU cycles between timer interrupts would increase and hopefully >> the guest can keep up. If the guest sleeps, time base could be >> accelerated to catch up with wall clock and then set back to 1:1 rate. > > Can't follow you ATM, sorry. What should be slowed down then? And how > precisely? I think vm_clock and everything that depends on vm_clock, also rtc_clock should be tied to vm_clock in this mode, not host_clock. > > Jan > >> >> Slowing down could be triggered by measuring the guest load (for >> example, by checking for presence of halt instructions), if it's close >> to 1, time would be slowed down. If the guest starts to issue halt >> instructions because it's more idle, we can increase speed. >> >> If this approach worked, even APIC could be made ignorant about >> coalescing voodoo so it should be a major cleanup. > > >
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Blue Swirl wrote: > On Thu, May 27, 2010 at 6:31 PM, Jan Kiszka wrote: >> Blue Swirl wrote: >>> On Wed, May 26, 2010 at 11:26 PM, Paul Brook wrote: > At the other extreme, would it be possible to make the educated guests > aware of the virtualization also in clock aspect: virtio-clock? The guest doesn't even need to be aware of virtualization. It just needs to be able to accommodate the lack of guaranteed realtime behavior. The fundamental problem here is that some guest operating systems assume that the hardware provides certain realtime guarantees with respect to execution of interrupt handlers. In particular they assume that the CPU will always be able to complete execution of the timer IRQ handler before the periodic timer triggers again. In most virtualized environments you have absolutely no guarantee of realtime response. With Linux guests this was solved a long time ago by the introduction of tickless kernels. These separate the timekeeping from wakeup events, so it doesn't matter if several wakeup triggers end up getting merged (either at the hardware level or via top/bottom half guest IRQ handlers). It's worth mentioning that this problem also occurs on real hardware, typically due to lame hardware/drivers which end up masking interrupts or otherwise stall the CPU for for long periods of time. The PIT hack attempts to workaround broken guests by adding artificial latency to the timer event, ensuring that the guest "sees" them all. Unfortunately guests vary on when it is safe for them to see the next timer event, and trying to observe this behavior involves potentially harmful heuristics and collusion between unrelated devices (e.g. interrupt controller and timer). In some cases we don't even do that, and just reschedule the event some arbitrarily small amount of time later. This assumes the guest to do useful work in that time. In a single threaded environment this is probably true - qemu got enough CPU to inject the first interrupt, so will probably manage to execute some guest code before the end of its timeslice. In an environment where interrupt processing/delivery and execution of the guest code happen in different threads this becomes increasingly likely to fail. >>> So any voodoo around timer events is doomed to fail in some cases. >>> What's the amount of hacks what we want then? Is there any generic >> The aim of this patch is to reduce the amount of existing and upcoming >> hacks. It may still require some refinements, but I think we haven't >> found any smarter approach yet that fits existing use cases. > > I don't feel we have tried other possibilities hard enough. Well, seeing prototypes wouldn't be bad, also to run real load againt them. But at least I'm currently clueless what to implement. > >>> solution, like slowing down the guest system to the point where we can >>> guarantee the interrupt rate vs. CPU execution speed? >> That's generally a non-option in virtualized production environments. >> Specifically if the guest system lost interrupts due to host >> overcommitment, you do not want it slow down even further. > > I meant that the guest time could be scaled down, for example 2s in > wall clock time would be presented to the guest as 1s. But that is precisely what already happens when the guest loses timer interrupts. There is no other time source for this kind of guests - often except for some external events generated by systems which you don't want to fall behind arbitrarily. 
> Then the amount > of CPU cycles between timer interrupts would increase and hopefully > the guest can keep up. If the guest sleeps, time base could be > accelerated to catch up with wall clock and then set back to 1:1 rate. Can't follow you ATM, sorry. What should be slowed down then? And how precisely? Jan > > Slowing down could be triggered by measuring the guest load (for > example, by checking for presence of halt instructions), if it's close > to 1, time would be slowed down. If the guest starts to issue halt > instructions because it's more idle, we can increase speed. > > If this approach worked, even APIC could be made ignorant about > coalescing voodoo so it should be a major cleanup.
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Thu, May 27, 2010 at 6:31 PM, Jan Kiszka wrote: > Blue Swirl wrote: >> On Wed, May 26, 2010 at 11:26 PM, Paul Brook wrote: At the other extreme, would it be possible to make the educated guests aware of the virtualization also in clock aspect: virtio-clock? >>> The guest doesn't even need to be aware of virtualization. It just needs to >>> be >>> able to accommodate the lack of guaranteed realtime behavior. >>> >>> The fundamental problem here is that some guest operating systems assume >>> that >>> the hardware provides certain realtime guarantees with respect to execution >>> of >>> interrupt handlers. In particular they assume that the CPU will always be >>> able to complete execution of the timer IRQ handler before the periodic >>> timer >>> triggers again. In most virtualized environments you have absolutely no >>> guarantee of realtime response. >>> >>> With Linux guests this was solved a long time ago by the introduction of >>> tickless kernels. These separate the timekeeping from wakeup events, so it >>> doesn't matter if several wakeup triggers end up getting merged (either at >>> the >>> hardware level or via top/bottom half guest IRQ handlers). >>> >>> >>> It's worth mentioning that this problem also occurs on real hardware, >>> typically due to lame hardware/drivers which end up masking interrupts or >>> otherwise stall the CPU for for long periods of time. >>> >>> >>> The PIT hack attempts to workaround broken guests by adding artificial >>> latency >>> to the timer event, ensuring that the guest "sees" them all. Unfortunately >>> guests vary on when it is safe for them to see the next timer event, and >>> trying to observe this behavior involves potentially harmful heuristics and >>> collusion between unrelated devices (e.g. interrupt controller and timer). >>> >>> In some cases we don't even do that, and just reschedule the event some >>> arbitrarily small amount of time later. This assumes the guest to do useful >>> work in that time. In a single threaded environment this is probably true - >>> qemu got enough CPU to inject the first interrupt, so will probably manage >>> to >>> execute some guest code before the end of its timeslice. In an environment >>> where interrupt processing/delivery and execution of the guest code happen >>> in >>> different threads this becomes increasingly likely to fail. >> >> So any voodoo around timer events is doomed to fail in some cases. >> What's the amount of hacks what we want then? Is there any generic > > The aim of this patch is to reduce the amount of existing and upcoming > hacks. It may still require some refinements, but I think we haven't > found any smarter approach yet that fits existing use cases. I don't feel we have tried other possibilities hard enough. >> solution, like slowing down the guest system to the point where we can >> guarantee the interrupt rate vs. CPU execution speed? > > That's generally a non-option in virtualized production environments. > Specifically if the guest system lost interrupts due to host > overcommitment, you do not want it slow down even further. I meant that the guest time could be scaled down, for example 2s in wall clock time would be presented to the guest as 1s. Then the amount of CPU cycles between timer interrupts would increase and hopefully the guest can keep up. If the guest sleeps, time base could be accelerated to catch up with wall clock and then set back to 1:1 rate. 
Slowing down could be triggered by measuring the guest load (for example, by checking for the presence of halt instructions); if the load is close to 1, time would be slowed down. If the guest starts to issue halt instructions because it's more idle, we can increase speed. If this approach worked, even the APIC could be made ignorant about coalescing voodoo, so it should be a major cleanup.
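A minimal sketch of the load-based scaling policy proposed here (all names and thresholds are illustrative; "load" is taken to be the fraction of its timeslice the guest spends running rather than halted):

    /* scale > 1.0 means guest time runs slower than wall-clock time,
     * e.g. 2.0 presents 2s of wall clock as 1s of guest time. */
    static double pick_time_scale(double run_ns, double halt_ns,
                                  double current_scale)
    {
        double load = run_ns / (run_ns + halt_ns);

        if (load > 0.95) {
            return current_scale * 2.0;       /* guest can't keep up: slow time */
        }
        if (load < 0.5 && current_scale > 1.0) {
            return current_scale / 2.0;       /* guest mostly idle: speed up */
        }
        return current_scale;
    }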
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Blue Swirl wrote: > On Wed, May 26, 2010 at 11:26 PM, Paul Brook wrote: >>> At the other extreme, would it be possible to make the educated guests >>> aware of the virtualization also in clock aspect: virtio-clock? >> The guest doesn't even need to be aware of virtualization. It just needs to >> be >> able to accommodate the lack of guaranteed realtime behavior. >> >> The fundamental problem here is that some guest operating systems assume that >> the hardware provides certain realtime guarantees with respect to execution >> of >> interrupt handlers. In particular they assume that the CPU will always be >> able to complete execution of the timer IRQ handler before the periodic timer >> triggers again. In most virtualized environments you have absolutely no >> guarantee of realtime response. >> >> With Linux guests this was solved a long time ago by the introduction of >> tickless kernels. These separate the timekeeping from wakeup events, so it >> doesn't matter if several wakeup triggers end up getting merged (either at >> the >> hardware level or via top/bottom half guest IRQ handlers). >> >> >> It's worth mentioning that this problem also occurs on real hardware, >> typically due to lame hardware/drivers which end up masking interrupts or >> otherwise stall the CPU for for long periods of time. >> >> >> The PIT hack attempts to workaround broken guests by adding artificial >> latency >> to the timer event, ensuring that the guest "sees" them all. Unfortunately >> guests vary on when it is safe for them to see the next timer event, and >> trying to observe this behavior involves potentially harmful heuristics and >> collusion between unrelated devices (e.g. interrupt controller and timer). >> >> In some cases we don't even do that, and just reschedule the event some >> arbitrarily small amount of time later. This assumes the guest to do useful >> work in that time. In a single threaded environment this is probably true - >> qemu got enough CPU to inject the first interrupt, so will probably manage to >> execute some guest code before the end of its timeslice. In an environment >> where interrupt processing/delivery and execution of the guest code happen in >> different threads this becomes increasingly likely to fail. > > So any voodoo around timer events is doomed to fail in some cases. > What's the amount of hacks what we want then? Is there any generic The aim of this patch is to reduce the amount of existing and upcoming hacks. It may still require some refinements, but I think we haven't found any smarter approach yet that fits existing use cases. > solution, like slowing down the guest system to the point where we can > guarantee the interrupt rate vs. CPU execution speed? That's generally a non-option in virtualized production environments. Specifically if the guest system lost interrupts due to host overcommitment, you do not want it slow down even further. Jan signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
2010/5/27 Gleb Natapov : > On Wed, May 26, 2010 at 08:35:00PM +, Blue Swirl wrote: >> On Wed, May 26, 2010 at 8:09 PM, Jan Kiszka wrote: >> > Blue Swirl wrote: >> >> On Tue, May 25, 2010 at 9:44 PM, Jan Kiszka wrote: >> >>> Anthony Liguori wrote: >> On 05/25/2010 02:09 PM, Blue Swirl wrote: >> > On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka wrote: >> > >> >> From: Jan Kiszka >> >> >> >> This allows to communicate potential IRQ coalescing during delivery >> >> from >> >> the sink back to the source. Targets that support IRQ coalescing >> >> workarounds need to register handlers that return the appropriate >> >> QEMU_IRQ_* code, and they have to propergate the code across all IRQ >> >> redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it can >> >> apply its workaround. If multiple sinks exist, the source may only >> >> consider an IRQ coalesced if all other sinks either report >> >> QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED. >> >> >> > No real devices are interested whether any of their output lines are >> > even connected. This would introduce a new signal type, bidirectional >> > multi-level, which is not correct. >> > >> I don't think it's really an issue of correct, but I wouldn't disagree >> to a suggestion that we ought to introduce a new signal type for this >> type of bidirectional feedback. Maybe it's qemu_coalesced_irq and has a >> similar interface as qemu_irq. >> >>> A separate type would complicate the delivery of the feedback value >> >>> across GPIO pins (as Paul requested for the RTC->HPET routing). >> >>> >> > I think the real solution to coalescing is put the logic inside one >> > device, in this case APIC because it has the information about irq >> > delivery. APIC could monitor incoming RTC irqs for frequency >> > information and whether they get delivered or not. If not, an internal >> > timer is installed which injects the lost irqs. >> >>> That won't fly as the IRQs will already arrive at the APIC with a >> >>> sufficiently high jitter. At the bare minimum, you need to tell the >> >>> interrupt controller about the fact that a particular IRQ should be >> >>> delivered at a specific regular rate. For this, you also need a generic >> >>> interface - nothing really "won". >> >> >> >> OK, let's simplify: just reinject at next possible chance. No need to >> >> monitor or tell anything. >> > >> > There are guests that won't like this (I know of one in-house, but >> > others may even have more examples), specifically if you end up firing >> > multiple IRQs in a row due to a longer backlog. For that reason, the RTC >> > spreads the reinjection according to the current rate. >> >> Then reinject with a constant delay, or next CPU exit. Such buggy > If guest's time frequency is the same as host time frequency you can't > reinject with constant delay. That is why current code mixes two > approaches: reinject M interrupts in a raw then delay. This approach can be also used by APIC-only version. >> guests could also be assisted with special handling (like win2k >> install hack), for example guest instructions could be counted >> (approximately, for example using TB size or TSC) and only inject >> after at least N instructions have passed. > Guest instructions cannot be easily counted in KVM (it can be done more > or less reliably using perf counters, may be). Aren't there any debug registers or perf counters, which can generate an interrupt after some number of instructions have been executed? 
>> >> > And even if the rate did not matter, the APIC woult still have to now >> > about the fact that an IRQ is really periodic and does not only appear >> > as such for a certain interval. This really does not sound like >> > simplifying things or even make them cleaner. >> >> It would, the voodoo would be contained only in APIC, RTC would be >> just like any other device. With the bidirectional irqs, this voodoo >> would probably eventually spread to many other devices. The logical >> conclusion of that would be a system where all devices would be >> careful not to disturb the guest at wrong moment because that would >> trigger a bug. >> > This voodoo will be so complex and unreliable that it will make RTC hack > pale in comparison (and I still don't see how you are going to make it > actually work). Implement everything inside APIC: only coalescing and reinjection. Maybe that version would not bend backwards as much as the current to cater for buggy hosts. > The fact is that timer device is not "just like any > other device" in virtual world. Any other device is easy: you just > implement spec as close as possible and everything works. For time > source device this is not enough. You can implement RTC+HPET to the > letter and your guest will drift like crazy. It's doable: a cycle accurate emulator will not cause any drift, without any voodoo. The interrupts wou
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Wed, May 26, 2010 at 11:26 PM, Paul Brook wrote: >> At the other extreme, would it be possible to make the educated guests >> aware of the virtualization also in clock aspect: virtio-clock? > > The guest doesn't even need to be aware of virtualization. It just needs to be > able to accommodate the lack of guaranteed realtime behavior. > > The fundamental problem here is that some guest operating systems assume that > the hardware provides certain realtime guarantees with respect to execution of > interrupt handlers. In particular they assume that the CPU will always be > able to complete execution of the timer IRQ handler before the periodic timer > triggers again. In most virtualized environments you have absolutely no > guarantee of realtime response. > > With Linux guests this was solved a long time ago by the introduction of > tickless kernels. These separate the timekeeping from wakeup events, so it > doesn't matter if several wakeup triggers end up getting merged (either at the > hardware level or via top/bottom half guest IRQ handlers). > > > It's worth mentioning that this problem also occurs on real hardware, > typically due to lame hardware/drivers which end up masking interrupts or > otherwise stall the CPU for for long periods of time. > > > The PIT hack attempts to workaround broken guests by adding artificial latency > to the timer event, ensuring that the guest "sees" them all. Unfortunately > guests vary on when it is safe for them to see the next timer event, and > trying to observe this behavior involves potentially harmful heuristics and > collusion between unrelated devices (e.g. interrupt controller and timer). > > In some cases we don't even do that, and just reschedule the event some > arbitrarily small amount of time later. This assumes the guest to do useful > work in that time. In a single threaded environment this is probably true - > qemu got enough CPU to inject the first interrupt, so will probably manage to > execute some guest code before the end of its timeslice. In an environment > where interrupt processing/delivery and execution of the guest code happen in > different threads this becomes increasingly likely to fail. So any voodoo around timer events is doomed to fail in some cases. What's the amount of hacks what we want then? Is there any generic solution, like slowing down the guest system to the point where we can guarantee the interrupt rate vs. CPU execution speed?
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Wed, May 26, 2010 at 08:14:38PM +, Blue Swirl wrote: > On Wed, May 26, 2010 at 8:08 AM, Gleb Natapov wrote: > > On Tue, May 25, 2010 at 11:44:50PM +0200, Jan Kiszka wrote: > >> > > >> >> I think the real solution to coalescing is put the logic inside one > >> >> device, in this case APIC because it has the information about irq > >> >> delivery. APIC could monitor incoming RTC irqs for frequency > >> >> information and whether they get delivered or not. If not, an internal > >> >> timer is installed which injects the lost irqs. > >> > >> That won't fly as the IRQs will already arrive at the APIC with a > >> sufficiently high jitter. At the bare minimum, you need to tell the > >> interrupt controller about the fact that a particular IRQ should be > >> delivered at a specific regular rate. For this, you also need a generic > >> interface - nothing really "won". > >> > > Not only that. Suppose RTC runs with 64Hz frequency and N interrupts > > were coalesced during some period. Now the guest reprograms RTC to > > 1024Hz. N should be scaled accordingly otherwise reinjection will not > > fix the drift. Do you propose to put this logic into APIC to? > > Interesting case, I don't think this is handled by the current code > either. Of course it is: if (period != s->period) s->irq_coalesced = (s->irq_coalesced * s->period) / period; Qemu is not a toy anymore and deals with real loads in production now. > Could this happen if the target machine does not have HPET and > the guest runs a tickless kernel? > Windows doesn't have a tickless kernel and Linux does not use the RTC as a time source. > I think the guest would be buggy to reprogram the timer without > checking that the interrupt from the previous timer frequency won't > interfere, for example by stopping the clock, or doing the > reprogramming only at timer interrupt handler. Otherwise the period > may be unpredictable for one period, which means that the time base > has been lost. The guest knows nothing about an interrupt that was not yet delivered. And Windows doesn't stop the clock while changing the frequency. > > But let's consider how this could be handled with the current code (or > with the magical interrupts). When doing the scaling, the guest, while > reprogramming, is unaware of the old queued interrupts with the > previous rate. Why would scaling these be more correct? I'd think the > old ones should be just reinjected ASAP without any scaling. If the old ones are reinjected as-is, time calculation in the guest will be incorrect. Windows is really stupid when it comes to timekeeping. It just counts interrupts and multiplies them by the timer period to calculate the time. So if qemu accumulated 64 coalesced interrupts while the clock frequency was 64Hz, it means that the guest is 1 second behind. If the guest changes the frequency to 1024Hz, reinjecting 64 interrupts will not fix the guest time; 1024 interrupts should be injected instead. -- Gleb.
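To make the rescaling Gleb describes above concrete, here is a small, self-contained C sketch built around the one-line rtc.c expression he quotes; the helper name, the 32.768 kHz-derived period values and the main() driver are illustrative, only the rescaling formula itself comes from the quoted snippet.

#include <stdint.h>
#include <stdio.h>

/* Rescale the backlog of coalesced periodic ticks when the guest reprograms
 * the rate, as in the rtc.c expression quoted above.  Periods are given in
 * the RTC's 32768 Hz base cycles here (64 Hz -> 512, 1024 Hz -> 32), but any
 * consistent unit works. */
static void rescale_coalesced(uint32_t *irq_coalesced,
                              uint32_t old_period, uint32_t new_period)
{
    if (new_period != old_period) {
        *irq_coalesced = (*irq_coalesced * old_period) / new_period;
    }
}

int main(void)
{
    /* 64 ticks lost at 64 Hz correspond to 1 s of guest time. */
    uint32_t backlog = 64;

    rescale_coalesced(&backlog, 512, 32);   /* guest switches 64 Hz -> 1024 Hz */

    /* 1 s at 1024 Hz is 1024 ticks, so the backlog must become 1024. */
    printf("backlog after rescale: %u ticks\n", (unsigned)backlog);
    return 0;
}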
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Wed, May 26, 2010 at 08:35:00PM +, Blue Swirl wrote: > On Wed, May 26, 2010 at 8:09 PM, Jan Kiszka wrote: > > Blue Swirl wrote: > >> On Tue, May 25, 2010 at 9:44 PM, Jan Kiszka wrote: > >>> Anthony Liguori wrote: > On 05/25/2010 02:09 PM, Blue Swirl wrote: > > On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka wrote: > > > >> From: Jan Kiszka > >> > >> This allows to communicate potential IRQ coalescing during delivery > >> from > >> the sink back to the source. Targets that support IRQ coalescing > >> workarounds need to register handlers that return the appropriate > >> QEMU_IRQ_* code, and they have to propergate the code across all IRQ > >> redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it can > >> apply its workaround. If multiple sinks exist, the source may only > >> consider an IRQ coalesced if all other sinks either report > >> QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED. > >> > > No real devices are interested whether any of their output lines are > > even connected. This would introduce a new signal type, bidirectional > > multi-level, which is not correct. > > > I don't think it's really an issue of correct, but I wouldn't disagree > to a suggestion that we ought to introduce a new signal type for this > type of bidirectional feedback. Maybe it's qemu_coalesced_irq and has a > similar interface as qemu_irq. > >>> A separate type would complicate the delivery of the feedback value > >>> across GPIO pins (as Paul requested for the RTC->HPET routing). > >>> > > I think the real solution to coalescing is put the logic inside one > > device, in this case APIC because it has the information about irq > > delivery. APIC could monitor incoming RTC irqs for frequency > > information and whether they get delivered or not. If not, an internal > > timer is installed which injects the lost irqs. > >>> That won't fly as the IRQs will already arrive at the APIC with a > >>> sufficiently high jitter. At the bare minimum, you need to tell the > >>> interrupt controller about the fact that a particular IRQ should be > >>> delivered at a specific regular rate. For this, you also need a generic > >>> interface - nothing really "won". > >> > >> OK, let's simplify: just reinject at next possible chance. No need to > >> monitor or tell anything. > > > > There are guests that won't like this (I know of one in-house, but > > others may even have more examples), specifically if you end up firing > > multiple IRQs in a row due to a longer backlog. For that reason, the RTC > > spreads the reinjection according to the current rate. > > Then reinject with a constant delay, or next CPU exit. Such buggy If the guest's timer frequency is the same as the host's timer frequency you can't reinject with a constant delay. That is why the current code mixes two approaches: reinject M interrupts in a row, then delay. > guests could also be assisted with special handling (like win2k > install hack), for example guest instructions could be counted > (approximately, for example using TB size or TSC) and only inject > after at least N instructions have passed. Guest instructions cannot be easily counted in KVM (it can be done more or less reliably using perf counters, maybe). > > > And even if the rate did not matter, the APIC woult still have to now > > about the fact that an IRQ is really periodic and does not only appear > > as such for a certain interval. This really does not sound like > > simplifying things or even make them cleaner.
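The "reinject M interrupts in a row, then delay" mix Gleb mentions can be pictured with the following conceptual C sketch; the type names, the BURST constant and the period/8 replay spacing are invented for illustration and do not correspond to QEMU's actual rtc code.

#include <stdbool.h>
#include <stdint.h>

#define BURST 2   /* how many backlogged ticks to replay back-to-back */

typedef struct {
    int64_t period_ns;       /* programmed periodic rate */
    uint32_t irq_coalesced;  /* ticks the sink reported as coalesced */
    uint32_t burst_left;     /* ticks still to replay in the current burst */
} PeriodicSource;

/* Called after each injection attempt; returns when the timer should fire
 * next.  A non-empty backlog is drained in short bursts of BURST ticks,
 * with a full period between bursts, instead of one long burst. */
static int64_t next_expiry(PeriodicSource *s, int64_t now, bool delivered)
{
    if (!delivered) {
        s->irq_coalesced++;                 /* remember the lost tick */
    } else if (s->irq_coalesced) {
        if (s->burst_left == 0) {
            s->burst_left = BURST;          /* start a new replay burst */
        }
        s->irq_coalesced--;
        s->burst_left--;
        if (s->burst_left) {
            return now + s->period_ns / 8;  /* replay quickly, not instantly */
        }
    }
    return now + s->period_ns;              /* normal spacing */
}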
> > It would, the voodoo would be contained only in APIC, RTC would be > just like any other device. With the bidirectional irqs, this voodoo > would probably eventually spread to many other devices. The logical > conclusion of that would be a system where all devices would be > careful not to disturb the guest at wrong moment because that would > trigger a bug. > This voodoo will be so complex and unreliable that it will make the RTC hack pale in comparison (and I still don't see how you are going to make it actually work). The fact is that a timer device is not "just like any other device" in the virtual world. Any other device is easy: you just implement the spec as closely as possible and everything works. For a time source device this is not enough. You can implement RTC+HPET to the letter and your guest will drift like crazy. > At the other extreme, would it be possible to make the educated guests > aware of the virtualization also in clock aspect: virtio-clock? No. -- Gleb.
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Wed, May 26, 2010 at 10:09:52PM +0200, Jan Kiszka wrote: > Blue Swirl wrote: > > On Tue, May 25, 2010 at 9:44 PM, Jan Kiszka wrote: > >> Anthony Liguori wrote: > >>> On 05/25/2010 02:09 PM, Blue Swirl wrote: > On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka wrote: > > > From: Jan Kiszka > > > > This allows to communicate potential IRQ coalescing during delivery from > > the sink back to the source. Targets that support IRQ coalescing > > workarounds need to register handlers that return the appropriate > > QEMU_IRQ_* code, and they have to propergate the code across all IRQ > > redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it can > > apply its workaround. If multiple sinks exist, the source may only > > consider an IRQ coalesced if all other sinks either report > > QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED. > > > No real devices are interested whether any of their output lines are > even connected. This would introduce a new signal type, bidirectional > multi-level, which is not correct. > > >>> I don't think it's really an issue of correct, but I wouldn't disagree > >>> to a suggestion that we ought to introduce a new signal type for this > >>> type of bidirectional feedback. Maybe it's qemu_coalesced_irq and has a > >>> similar interface as qemu_irq. > >> A separate type would complicate the delivery of the feedback value > >> across GPIO pins (as Paul requested for the RTC->HPET routing). > >> > I think the real solution to coalescing is put the logic inside one > device, in this case APIC because it has the information about irq > delivery. APIC could monitor incoming RTC irqs for frequency > information and whether they get delivered or not. If not, an internal > timer is installed which injects the lost irqs. > >> That won't fly as the IRQs will already arrive at the APIC with a > >> sufficiently high jitter. At the bare minimum, you need to tell the > >> interrupt controller about the fact that a particular IRQ should be > >> delivered at a specific regular rate. For this, you also need a generic > >> interface - nothing really "won". > > > > OK, let's simplify: just reinject at next possible chance. No need to > > monitor or tell anything. > > There are guests that won't like this (I know of one in-house, but > others may even have more examples), specifically if you end up firing > multiple IRQs in a row due to a longer backlog. For that reason, the RTC > spreads the reinjection according to the current rate. > No need to look far for such guests. See commit dd17765b5f77ca. > And even if the rate did not matter, the APIC woult still have to now > about the fact that an IRQ is really periodic and does not only appear > as such for a certain interval. This really does not sound like > simplifying things or even make them cleaner. > > Jan > -- Gleb.
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
> At the other extreme, would it be possible to make the educated guests > aware of the virtualization also in clock aspect: virtio-clock? The guest doesn't even need to be aware of virtualization. It just needs to be able to accommodate the lack of guaranteed realtime behavior. The fundamental problem here is that some guest operating systems assume that the hardware provides certain realtime guarantees with respect to execution of interrupt handlers. In particular they assume that the CPU will always be able to complete execution of the timer IRQ handler before the periodic timer triggers again. In most virtualized environments you have absolutely no guarantee of realtime response. With Linux guests this was solved a long time ago by the introduction of tickless kernels. These separate the timekeeping from wakeup events, so it doesn't matter if several wakeup triggers end up getting merged (either at the hardware level or via top/bottom half guest IRQ handlers). It's worth mentioning that this problem also occurs on real hardware, typically due to lame hardware/drivers which end up masking interrupts or otherwise stalling the CPU for long periods of time. The PIT hack attempts to work around broken guests by adding artificial latency to the timer event, ensuring that the guest "sees" them all. Unfortunately guests vary on when it is safe for them to see the next timer event, and trying to observe this behavior involves potentially harmful heuristics and collusion between unrelated devices (e.g. interrupt controller and timer). In some cases we don't even do that, and just reschedule the event some arbitrarily small amount of time later. This assumes the guest can do useful work in that time. In a single-threaded environment this is probably true - qemu got enough CPU to inject the first interrupt, so it will probably manage to execute some guest code before the end of its timeslice. In an environment where interrupt processing/delivery and execution of the guest code happen in different threads this becomes increasingly likely to fail. Paul
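A minimal sketch of the "artificial latency" idea Paul describes: rather than letting a tick the guest has not consumed yet get overwritten, the next one is pushed slightly into the future so every edge is seen. The type and field names below are illustrative, not QEMU's PIT/RTC code.

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool pending;       /* previous tick not yet acknowledged by the guest */
    int64_t period_ns;  /* nominal periodic rate */
    int64_t defer_ns;   /* artificial latency added per deferral */
} TickSource;

/* Decide when the next tick should be delivered.  If the guest is still
 * holding the previous one, defer instead of coalescing, accepting a bit
 * of short-term latency in exchange for the guest seeing every tick. */
static int64_t schedule_next_tick(TickSource *s, int64_t now)
{
    if (s->pending) {
        return now + s->defer_ns;   /* guest busy: add artificial latency */
    }
    s->pending = true;              /* cleared when the guest acks the IRQ */
    return now + s->period_ns;
}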
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Blue Swirl wrote: >> I think the real solution to coalescing is put the logic inside one >> device, in this case APIC because it has the information about irq >> delivery. APIC could monitor incoming RTC irqs for frequency >> information and whether they get delivered or not. If not, an internal >> timer is installed which injects the lost irqs. That won't fly as the IRQs will already arrive at the APIC with a sufficiently high jitter. At the bare minimum, you need to tell the interrupt controller about the fact that a particular IRQ should be delivered at a specific regular rate. For this, you also need a generic interface - nothing really "won". >>> OK, let's simplify: just reinject at next possible chance. No need to >>> monitor or tell anything. >> There are guests that won't like this (I know of one in-house, but >> others may even have more examples), specifically if you end up firing >> multiple IRQs in a row due to a longer backlog. For that reason, the RTC >> spreads the reinjection according to the current rate. > > Then reinject with a constant delay, or next CPU exit. Such buggy > guests could also be assisted with special handling (like win2k > install hack), for example guest instructions could be counted > (approximately, for example using TB size or TSC) and only inject > after at least N instructions have passed. Let's assume that would be an alternative... > >> And even if the rate did not matter, the APIC woult still have to now >> about the fact that an IRQ is really periodic and does not only appear >> as such for a certain interval. This really does not sound like >> simplifying things or even make them cleaner. > > It would, the voodoo would be contained only in APIC, RTC would be > just like any other device. ...we would still need that communication channel from the RTC to the APIC to tell the latter about entering/leaving periodic IRQ generation. And as we don't want the RTC requiring any clue about who's finally delivering the IRQ, we still need some generic interface. This cannot be contained in the APIC - even leaving the PIC as another IRQ delivery service aside for now, and maybe other virtualized architectures. > With the bidirectional irqs, this voodoo > would probably eventually spread to many other devices. Yes, the HPET was reported to need this feature as well. > The logical > conclusion of that would be a system where all devices would be > careful not to disturb the guest at wrong moment because that would > trigger a bug. > > At the other extreme, would it be possible to make the educated guests > aware of the virtualization also in clock aspect: virtio-clock? Paravirtual clocks have been around for ages (e.g. kvm-clock), just unfortunately not long enough to find them supported by legacy guests. Jan
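To picture the generic channel Jan argues would be needed, here is a purely hypothetical C sketch of a "periodic IRQ hint" that a timer source could emit and that routers would forward unchanged, so only the final interrupt controller has to act on it; none of these names exist in QEMU.

#include <stdint.h>
#include <stddef.h>

typedef struct PeriodicIrqHint {
    int64_t period_ns;   /* 0 means "this pin is no longer periodic" */
} PeriodicIrqHint;

typedef void (*periodic_hint_handler)(void *opaque, int pin,
                                      const PeriodicIrqHint *hint);

/* One hop in the routing chain.  It neither interprets nor consumes the
 * hint; it only passes it downstream, so the source never needs to know
 * whether a PIC, an APIC or something else finally delivers the IRQ. */
typedef struct HintRoute {
    periodic_hint_handler handler;  /* next hop, or NULL if unconnected */
    void *opaque;
    int pin;
} HintRoute;

static void forward_periodic_hint(HintRoute *r, const PeriodicIrqHint *hint)
{
    if (r->handler) {
        r->handler(r->opaque, r->pin, hint);
    }
}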
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Wed, May 26, 2010 at 8:09 PM, Jan Kiszka wrote: > Blue Swirl wrote: >> On Tue, May 25, 2010 at 9:44 PM, Jan Kiszka wrote: >>> Anthony Liguori wrote: On 05/25/2010 02:09 PM, Blue Swirl wrote: > On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka wrote: > >> From: Jan Kiszka >> >> This allows to communicate potential IRQ coalescing during delivery from >> the sink back to the source. Targets that support IRQ coalescing >> workarounds need to register handlers that return the appropriate >> QEMU_IRQ_* code, and they have to propergate the code across all IRQ >> redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it can >> apply its workaround. If multiple sinks exist, the source may only >> consider an IRQ coalesced if all other sinks either report >> QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED. >> > No real devices are interested whether any of their output lines are > even connected. This would introduce a new signal type, bidirectional > multi-level, which is not correct. > I don't think it's really an issue of correct, but I wouldn't disagree to a suggestion that we ought to introduce a new signal type for this type of bidirectional feedback. Maybe it's qemu_coalesced_irq and has a similar interface as qemu_irq. >>> A separate type would complicate the delivery of the feedback value >>> across GPIO pins (as Paul requested for the RTC->HPET routing). >>> > I think the real solution to coalescing is put the logic inside one > device, in this case APIC because it has the information about irq > delivery. APIC could monitor incoming RTC irqs for frequency > information and whether they get delivered or not. If not, an internal > timer is installed which injects the lost irqs. >>> That won't fly as the IRQs will already arrive at the APIC with a >>> sufficiently high jitter. At the bare minimum, you need to tell the >>> interrupt controller about the fact that a particular IRQ should be >>> delivered at a specific regular rate. For this, you also need a generic >>> interface - nothing really "won". >> >> OK, let's simplify: just reinject at next possible chance. No need to >> monitor or tell anything. > > There are guests that won't like this (I know of one in-house, but > others may even have more examples), specifically if you end up firing > multiple IRQs in a row due to a longer backlog. For that reason, the RTC > spreads the reinjection according to the current rate. Then reinject with a constant delay, or next CPU exit. Such buggy guests could also be assisted with special handling (like win2k install hack), for example guest instructions could be counted (approximately, for example using TB size or TSC) and only inject after at least N instructions have passed. > And even if the rate did not matter, the APIC woult still have to now > about the fact that an IRQ is really periodic and does not only appear > as such for a certain interval. This really does not sound like > simplifying things or even make them cleaner. It would, the voodoo would be contained only in APIC, RTC would be just like any other device. With the bidirectional irqs, this voodoo would probably eventually spread to many other devices. The logical conclusion of that would be a system where all devices would be careful not to disturb the guest at wrong moment because that would trigger a bug. At the other extreme, would it be possible to make the educated guests aware of the virtualization also in clock aspect: virtio-clock?
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Wed, May 26, 2010 at 8:08 AM, Gleb Natapov wrote: > On Tue, May 25, 2010 at 11:44:50PM +0200, Jan Kiszka wrote: >> > >> >> I think the real solution to coalescing is put the logic inside one >> >> device, in this case APIC because it has the information about irq >> >> delivery. APIC could monitor incoming RTC irqs for frequency >> >> information and whether they get delivered or not. If not, an internal >> >> timer is installed which injects the lost irqs. >> >> That won't fly as the IRQs will already arrive at the APIC with a >> sufficiently high jitter. At the bare minimum, you need to tell the >> interrupt controller about the fact that a particular IRQ should be >> delivered at a specific regular rate. For this, you also need a generic >> interface - nothing really "won". >> > Not only that. Suppose RTC runs with 64Hz frequency and N interrupts > were coalesced during some period. Now the guest reprograms RTC to > 1024Hz. N should be scaled accordingly otherwise reinjection will not > fix the drift. Do you propose to put this logic into APIC to? Interesting case, I don't think this is handled by the current code either. Could this happen if the target machine does not have HPET and the guest runs a tickless kernel? I think the guest would be buggy to reprogram the timer without checking that the interrupt from the previous timer frequency won't interfere, for example by stopping the clock, or doing the reprogramming only at timer interrupt handler. Otherwise the period may be unpredictable for one period, which means that the time base has been lost. But let's consider how this could be handled with the current code (or with the magical interrupts). When doing the scaling, the guest, while reprogramming, is unaware of the old queued interrupts with the previous rate. Why would scaling these be more correct? I'd think the old ones should be just reinjected ASAP without any scaling.
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Blue Swirl wrote: > On Tue, May 25, 2010 at 9:44 PM, Jan Kiszka wrote: >> Anthony Liguori wrote: >>> On 05/25/2010 02:09 PM, Blue Swirl wrote: On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka wrote: > From: Jan Kiszka > > This allows to communicate potential IRQ coalescing during delivery from > the sink back to the source. Targets that support IRQ coalescing > workarounds need to register handlers that return the appropriate > QEMU_IRQ_* code, and they have to propergate the code across all IRQ > redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it can > apply its workaround. If multiple sinks exist, the source may only > consider an IRQ coalesced if all other sinks either report > QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED. > No real devices are interested whether any of their output lines are even connected. This would introduce a new signal type, bidirectional multi-level, which is not correct. >>> I don't think it's really an issue of correct, but I wouldn't disagree >>> to a suggestion that we ought to introduce a new signal type for this >>> type of bidirectional feedback. Maybe it's qemu_coalesced_irq and has a >>> similar interface as qemu_irq. >> A separate type would complicate the delivery of the feedback value >> across GPIO pins (as Paul requested for the RTC->HPET routing). >> I think the real solution to coalescing is put the logic inside one device, in this case APIC because it has the information about irq delivery. APIC could monitor incoming RTC irqs for frequency information and whether they get delivered or not. If not, an internal timer is installed which injects the lost irqs. >> That won't fly as the IRQs will already arrive at the APIC with a >> sufficiently high jitter. At the bare minimum, you need to tell the >> interrupt controller about the fact that a particular IRQ should be >> delivered at a specific regular rate. For this, you also need a generic >> interface - nothing really "won". > > OK, let's simplify: just reinject at next possible chance. No need to > monitor or tell anything. There are guests that won't like this (I know of one in-house, but others may even have more examples), specifically if you end up firing multiple IRQs in a row due to a longer backlog. For that reason, the RTC spreads the reinjection according to the current rate. And even if the rate did not matter, the APIC would still have to know about the fact that an IRQ is really periodic and does not only appear as such for a certain interval. This really does not sound like simplifying things or even making them cleaner. Jan
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Tue, May 25, 2010 at 8:16 PM, Anthony Liguori wrote: > On 05/25/2010 02:09 PM, Blue Swirl wrote: >> >> On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka wrote: >> >>> >>> From: Jan Kiszka >>> >>> This allows to communicate potential IRQ coalescing during delivery from >>> the sink back to the source. Targets that support IRQ coalescing >>> workarounds need to register handlers that return the appropriate >>> QEMU_IRQ_* code, and they have to propergate the code across all IRQ >>> redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it can >>> apply its workaround. If multiple sinks exist, the source may only >>> consider an IRQ coalesced if all other sinks either report >>> QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED. >>> >> >> No real devices are interested whether any of their output lines are >> even connected. This would introduce a new signal type, bidirectional >> multi-level, which is not correct. >> > > I don't think it's really an issue of correct, but I wouldn't disagree to a > suggestion that we ought to introduce a new signal type for this type of > bidirectional feedback. Maybe it's qemu_coalesced_irq and has a similar > interface as qemu_irq. I'd rather try harder to find a solution for the problem without this. >> I think the real solution to coalescing is put the logic inside one >> device, in this case APIC because it has the information about irq >> delivery. APIC could monitor incoming RTC irqs for frequency >> information and whether they get delivered or not. If not, an internal >> timer is installed which injects the lost irqs. >> >> Of course, no real device could do such de-coalescing, but with this >> approach, the voodoo is contained to insides of one device, APIC. >> >> We should also take a step back to think what was the cause of lost >> irqs, IIRC uneven execution rate in QEMU. > > Not only that. The pathological case is where a host is limited to a 1khz > timer frequency and the guest requests a 1khz timer frequency. Practically > speaking, there is no way we'll ever be able to adjust timers to reinject > lost interrupts because of the host timer limitation. But APIC knows when the guest has acked a timer irq and so can immediately issue another one to cover a lost irq. APIC could also arrange a CPU exit. The irqs would not arrive at even rate, but in this pathological case, the host can't ever schedule any delays less than 1ms. >> Could this be fixed or taken >> into account in timer handling? For example, CPU loop could analyze >> the wall clock time between CPU exits and use that to offset the >> timers. Thus the timer frequency (in wall clock time) could be made to >> correspond a bit more to VCPU execution rate. >> > > A lot of what motivates the timer reinjection work is very old linux kernels > that had fixed userspace timer frequencies. On newer host kernels, it's > probably not nearly as important except when you get into pathological cases > like exposing a high frequency HPET timer to the guest where you cannot keep > up with the host. There will always be some delays and overhead because of the emulation. We can either hide this completely from the guest (by stretching the guest time) or we can try to keep the guest time base in synch with host but then the guest can't expect native class performance to be available between timer irqs. On a native hardware, you could have a 1kHz timer and a kernel task always running for max. 
999us with interrupts disabled, then for at least 1us the interrupts are enabled and the timer interrupt handler gets a chance to run. On QEMU, such a small time window may be impossible to emulate. No coalescing would help in this case. With a cycle counting emulation it's possible, but the speed would probably be slower than native. With the interrupt coalescing you may be able to cover some easy cases. > > Regards, > > Anthony Liguori > >>> Signed-off-by: Jan Kiszka >>> --- >>> hw/irq.c | 38 +- >>> hw/irq.h | 22 +++--- >>> 2 files changed, 44 insertions(+), 16 deletions(-) >>> >>> diff --git a/hw/irq.c b/hw/irq.c >>> index 7703f62..db2cce6 100644 >>> --- a/hw/irq.c >>> +++ b/hw/irq.c >>> @@ -26,19 +26,27 @@ >>> >>> struct IRQState { >>> qemu_irq_handler handler; >>> + qemu_irq_fb_handler feedback_handler; >>> void *opaque; >>> int n; >>> }; >>> >>> -void qemu_set_irq(qemu_irq irq, int level) >>> +int qemu_set_irq(qemu_irq irq, int level) >>> { >>> - if (!irq) >>> - return; >>> - >>> - irq->handler(irq->opaque, irq->n, level); >>> + if (!irq) { >>> + return 0; >>> + } >>> + if (irq->feedback_handler) { >>> + return irq->feedback_handler(irq->opaque, irq->n, level); >>> + } else { >>> + irq->handler(irq->opaque, irq->n, level); >>> + return QEMU_IRQ_DELIVERED; >>> + } >>> } >>> >>> -qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, int >>> n) >>> +static qemu_irq *allo
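Blue Swirl's point above, that the interrupt controller already sees the guest's EOI and could replay a lost tick right there, can be sketched like this; the state, hook names and callbacks are invented for illustration and are not QEMU's apic.c.

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t backlog;    /* periodic IRQs that could not be delivered */
    int timer_vector;    /* vector the periodic timer is routed to */
} TimerIrqTracking;

/* Injection path: count a tick that could not be delivered (e.g. because
 * the vector was still pending) instead of silently dropping it. */
static void timer_irq_arrived(TimerIrqTracking *s, bool deliverable,
                              void (*inject)(int vector))
{
    if (deliverable) {
        inject(s->timer_vector);
    } else {
        s->backlog++;
    }
}

/* EOI path: the guest just finished handling one tick, so this is the
 * earliest moment it can observe another one - replay a lost tick now. */
static void timer_irq_eoi(TimerIrqTracking *s, int vector,
                          void (*inject)(int vector))
{
    if (vector == s->timer_vector && s->backlog) {
        s->backlog--;
        inject(vector);
    }
}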
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Tue, May 25, 2010 at 9:44 PM, Jan Kiszka wrote: > Anthony Liguori wrote: >> On 05/25/2010 02:09 PM, Blue Swirl wrote: >>> On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka wrote: >>> From: Jan Kiszka This allows to communicate potential IRQ coalescing during delivery from the sink back to the source. Targets that support IRQ coalescing workarounds need to register handlers that return the appropriate QEMU_IRQ_* code, and they have to propergate the code across all IRQ redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it can apply its workaround. If multiple sinks exist, the source may only consider an IRQ coalesced if all other sinks either report QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED. >>> No real devices are interested whether any of their output lines are >>> even connected. This would introduce a new signal type, bidirectional >>> multi-level, which is not correct. >>> >> >> I don't think it's really an issue of correct, but I wouldn't disagree >> to a suggestion that we ought to introduce a new signal type for this >> type of bidirectional feedback. Maybe it's qemu_coalesced_irq and has a >> similar interface as qemu_irq. > > A separate type would complicate the delivery of the feedback value > across GPIO pins (as Paul requested for the RTC->HPET routing). > >> >>> I think the real solution to coalescing is put the logic inside one >>> device, in this case APIC because it has the information about irq >>> delivery. APIC could monitor incoming RTC irqs for frequency >>> information and whether they get delivered or not. If not, an internal >>> timer is installed which injects the lost irqs. > > That won't fly as the IRQs will already arrive at the APIC with a > sufficiently high jitter. At the bare minimum, you need to tell the > interrupt controller about the fact that a particular IRQ should be > delivered at a specific regular rate. For this, you also need a generic > interface - nothing really "won". OK, let's simplify: just reinject at next possible chance. No need to monitor or tell anything.
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Tue, May 25, 2010 at 11:44:50PM +0200, Jan Kiszka wrote: > > > >> I think the real solution to coalescing is put the logic inside one > >> device, in this case APIC because it has the information about irq > >> delivery. APIC could monitor incoming RTC irqs for frequency > >> information and whether they get delivered or not. If not, an internal > >> timer is installed which injects the lost irqs. > > That won't fly as the IRQs will already arrive at the APIC with a > sufficiently high jitter. At the bare minimum, you need to tell the > interrupt controller about the fact that a particular IRQ should be > delivered at a specific regular rate. For this, you also need a generic > interface - nothing really "won". > Not only that. Suppose the RTC runs at a 64Hz frequency and N interrupts were coalesced during some period. Now the guest reprograms the RTC to 1024Hz. N should be scaled accordingly, otherwise reinjection will not fix the drift. Do you propose to put this logic into the APIC too? -- Gleb.
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Anthony Liguori wrote: > On 05/25/2010 02:09 PM, Blue Swirl wrote: >> On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka wrote: >> >>> From: Jan Kiszka >>> >>> This allows to communicate potential IRQ coalescing during delivery from >>> the sink back to the source. Targets that support IRQ coalescing >>> workarounds need to register handlers that return the appropriate >>> QEMU_IRQ_* code, and they have to propergate the code across all IRQ >>> redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it can >>> apply its workaround. If multiple sinks exist, the source may only >>> consider an IRQ coalesced if all other sinks either report >>> QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED. >>> >> No real devices are interested whether any of their output lines are >> even connected. This would introduce a new signal type, bidirectional >> multi-level, which is not correct. >> > > I don't think it's really an issue of correct, but I wouldn't disagree > to a suggestion that we ought to introduce a new signal type for this > type of bidirectional feedback. Maybe it's qemu_coalesced_irq and has a > similar interface as qemu_irq. A separate type would complicate the delivery of the feedback value across GPIO pins (as Paul requested for the RTC->HPET routing). > >> I think the real solution to coalescing is put the logic inside one >> device, in this case APIC because it has the information about irq >> delivery. APIC could monitor incoming RTC irqs for frequency >> information and whether they get delivered or not. If not, an internal >> timer is installed which injects the lost irqs. That won't fly as the IRQs will already arrive at the APIC with a sufficiently high jitter. At the bare minimum, you need to tell the interrupt controller about the fact that a particular IRQ should be delivered at a specific regular rate. For this, you also need a generic interface - nothing really "won". Jan signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On 05/25/2010 02:09 PM, Blue Swirl wrote: On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka wrote: From: Jan Kiszka This allows to communicate potential IRQ coalescing during delivery from the sink back to the source. Targets that support IRQ coalescing workarounds need to register handlers that return the appropriate QEMU_IRQ_* code, and they have to propergate the code across all IRQ redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it can apply its workaround. If multiple sinks exist, the source may only consider an IRQ coalesced if all other sinks either report QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED. No real devices are interested whether any of their output lines are even connected. This would introduce a new signal type, bidirectional multi-level, which is not correct. I don't think it's really an issue of correct, but I wouldn't disagree to a suggestion that we ought to introduce a new signal type for this type of bidirectional feedback. Maybe it's qemu_coalesced_irq and has a similar interface as qemu_irq. I think the real solution to coalescing is put the logic inside one device, in this case APIC because it has the information about irq delivery. APIC could monitor incoming RTC irqs for frequency information and whether they get delivered or not. If not, an internal timer is installed which injects the lost irqs. Of course, no real device could do such de-coalescing, but with this approach, the voodoo is contained to insides of one device, APIC. We should also take a step back to think what was the cause of lost irqs, IIRC uneven execution rate in QEMU. Not only that. The pathological case is where a host is limited to a 1khz timer frequency and the guest requests a 1khz timer frequency. Practically speaking, there is no way we'll ever be able to adjust timers to reinject lost interrupts because of the host timer limitation. Could this be fixed or taken into account in timer handling? For example, CPU loop could analyze the wall clock time between CPU exits and use that to offset the timers. Thus the timer frequency (in wall clock time) could be made to correspond a bit more to VCPU execution rate. A lot of what motivates the timer reinjection work is very old linux kernels that had fixed userspace timer frequencies. On newer host kernels, it's probably not nearly as important except when you get into pathological cases like exposing a high frequency HPET timer to the guest where you cannot keep up with the host. 
Regards, Anthony Liguori Signed-off-by: Jan Kiszka --- hw/irq.c | 38 +- hw/irq.h | 22 +++--- 2 files changed, 44 insertions(+), 16 deletions(-) diff --git a/hw/irq.c b/hw/irq.c index 7703f62..db2cce6 100644 --- a/hw/irq.c +++ b/hw/irq.c @@ -26,19 +26,27 @@ struct IRQState { qemu_irq_handler handler; +qemu_irq_fb_handler feedback_handler; void *opaque; int n; }; -void qemu_set_irq(qemu_irq irq, int level) +int qemu_set_irq(qemu_irq irq, int level) { -if (!irq) -return; - -irq->handler(irq->opaque, irq->n, level); +if (!irq) { +return 0; +} +if (irq->feedback_handler) { +return irq->feedback_handler(irq->opaque, irq->n, level); +} else { +irq->handler(irq->opaque, irq->n, level); +return QEMU_IRQ_DELIVERED; +} } -qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, int n) +static qemu_irq *allocate_irqs(qemu_irq_handler handler, + qemu_irq_fb_handler feedback_handler, + void *opaque, int n) { qemu_irq *s; struct IRQState *p; @@ -48,6 +56,7 @@ qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, int n) p = (struct IRQState *)qemu_mallocz(sizeof(struct IRQState) * n); for (i = 0; i< n; i++) { p->handler = handler; +p->feedback_handler = feedback_handler; p->opaque = opaque; p->n = i; s[i] = p; @@ -56,22 +65,33 @@ qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, int n) return s; } +qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, int n) +{ +return allocate_irqs(handler, NULL, opaque, n); +} + +qemu_irq *qemu_allocate_feedback_irqs(qemu_irq_fb_handler handler, + void *opaque, int n) +{ +return allocate_irqs(NULL, handler, opaque, n); +} + void qemu_free_irqs(qemu_irq *s) { qemu_free(s[0]); qemu_free(s); } -static void qemu_notirq(void *opaque, int line, int level) +static int qemu_notirq(void *opaque, int line, int level) { struct IRQState *irq = opaque; -irq->handler(irq->opaque, irq->n, !level); +return qemu_set_irq(irq, !level); } qemu_irq qemu_irq_invert(qemu_irq irq) { /* The default state for IRQs is low, so raise the output now. */ qemu_irq_raise(irq); -
[Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka wrote: > From: Jan Kiszka > > This allows to communicate potential IRQ coalescing during delivery from > the sink back to the source. Targets that support IRQ coalescing > workarounds need to register handlers that return the appropriate > QEMU_IRQ_* code, and they have to propergate the code across all IRQ > redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it can > apply its workaround. If multiple sinks exist, the source may only > consider an IRQ coalesced if all other sinks either report > QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED. No real devices are interested whether any of their output lines are even connected. This would introduce a new signal type, bidirectional multi-level, which is not correct. I think the real solution to coalescing is put the logic inside one device, in this case APIC because it has the information about irq delivery. APIC could monitor incoming RTC irqs for frequency information and whether they get delivered or not. If not, an internal timer is installed which injects the lost irqs. Of course, no real device could do such de-coalescing, but with this approach, the voodoo is contained to insides of one device, APIC. We should also take a step back to think what was the cause of lost irqs, IIRC uneven execution rate in QEMU. Could this be fixed or taken into account in timer handling? For example, CPU loop could analyze the wall clock time between CPU exits and use that to offset the timers. Thus the timer frequency (in wall clock time) could be made to correspond a bit more to VCPU execution rate. > > Signed-off-by: Jan Kiszka > --- > hw/irq.c | 38 +- > hw/irq.h | 22 +++--- > 2 files changed, 44 insertions(+), 16 deletions(-) > > diff --git a/hw/irq.c b/hw/irq.c > index 7703f62..db2cce6 100644 > --- a/hw/irq.c > +++ b/hw/irq.c > @@ -26,19 +26,27 @@ > > struct IRQState { > qemu_irq_handler handler; > + qemu_irq_fb_handler feedback_handler; > void *opaque; > int n; > }; > > -void qemu_set_irq(qemu_irq irq, int level) > +int qemu_set_irq(qemu_irq irq, int level) > { > - if (!irq) > - return; > - > - irq->handler(irq->opaque, irq->n, level); > + if (!irq) { > + return 0; > + } > + if (irq->feedback_handler) { > + return irq->feedback_handler(irq->opaque, irq->n, level); > + } else { > + irq->handler(irq->opaque, irq->n, level); > + return QEMU_IRQ_DELIVERED; > + } > } > > -qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, int n) > +static qemu_irq *allocate_irqs(qemu_irq_handler handler, > + qemu_irq_fb_handler feedback_handler, > + void *opaque, int n) > { > qemu_irq *s; > struct IRQState *p; > @@ -48,6 +56,7 @@ qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void > *opaque, int n) > p = (struct IRQState *)qemu_mallocz(sizeof(struct IRQState) * n); > for (i = 0; i < n; i++) { > p->handler = handler; > + p->feedback_handler = feedback_handler; > p->opaque = opaque; > p->n = i; > s[i] = p; > @@ -56,22 +65,33 @@ qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, > void *opaque, int n) > return s; > } > > +qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, int n) > +{ > + return allocate_irqs(handler, NULL, opaque, n); > +} > + > +qemu_irq *qemu_allocate_feedback_irqs(qemu_irq_fb_handler handler, > + void *opaque, int n) > +{ > + return allocate_irqs(NULL, handler, opaque, n); > +} > + > void qemu_free_irqs(qemu_irq *s) > { > qemu_free(s[0]); > qemu_free(s); > } > > -static void qemu_notirq(void *opaque, int line, int level) > +static int qemu_notirq(void 
*opaque, int line, int level) > { > struct IRQState *irq = opaque; > > - irq->handler(irq->opaque, irq->n, !level); > + return qemu_set_irq(irq, !level); > } > > qemu_irq qemu_irq_invert(qemu_irq irq) > { > /* The default state for IRQs is low, so raise the output now. */ > qemu_irq_raise(irq); > - return qemu_allocate_irqs(qemu_notirq, irq, 1)[0]; > + return allocate_irqs(NULL, qemu_notirq, irq, 1)[0]; > } > diff --git a/hw/irq.h b/hw/irq.h > index 5daae44..eee03e6 100644 > --- a/hw/irq.h > +++ b/hw/irq.h > @@ -3,15 +3,18 @@ > > /* Generic IRQ/GPIO pin infrastructure. */ > > -/* FIXME: Rmove one of these. */ > +#define QEMU_IRQ_DELIVERED 0 > +#define QEMU_IRQ_COALESCED (-1) > +#define QEMU_IRQ_MASKED (-2) > + > typedef void (*qemu_irq_handler)(void *opaque, int n, int level); > -typedef void SetIRQFunc(void *opaque, int irq_num, int level); > +typedef int (*qemu_irq_fb_handler)(void *opaque, int n, int level); > > -void qemu_set_irq(qemu_irq irq, int level); > +int qemu_set_irq(qemu_irq irq, int level); > > -static inline void qemu_irq_raise(qemu_irq irq) >
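From the source device's side, the feedback interface added by the patch above would be consumed roughly as follows; only qemu_set_irq()'s int return value, qemu_allocate_feedback_irqs() and the QEMU_IRQ_* constants come from the patch (hence the hw/irq.h include), while the TimerSourceState type and the tick handler are illustrative.

#include <stdint.h>
#include "hw/irq.h"   /* the patched interface: int qemu_set_irq(), QEMU_IRQ_* */

typedef struct {
    qemu_irq irq;            /* line allocated via qemu_allocate_feedback_irqs() */
    uint32_t irq_coalesced;  /* ticks reported back as coalesced */
} TimerSourceState;

/* Periodic tick handler of a source device: pulse the line and use the
 * feedback code to decide whether this tick needs to be replayed later. */
static void timer_source_tick(TimerSourceState *s)
{
    int ret = qemu_set_irq(s->irq, 1);   /* rising edge, feedback in ret */
    qemu_set_irq(s->irq, 0);

    if (ret == QEMU_IRQ_COALESCED) {
        s->irq_coalesced++;              /* schedule a reinjection later */
    }
    /* QEMU_IRQ_MASKED could be treated differently, e.g. not counted at
     * all, since the guest masked the line deliberately. */
}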