Thomas Gleixner wrote:
On Wed, 2007-03-07 at 17:01 -0800, Daniel Arai wrote:
But more importantly, we want a kernel that can run both on native hardware and
in a paravirtualized environment. Linux doesn't really provide abstractions for
replacing the appropriate code. We tried to hook into the source code at a
level that seemed possible.
Again. You just refuse to change your implementation and you want to
keep it by arguing how hard it is because there are no abstractions.
It is no longer possible to change our _hypervisor_ implementation. The
Linux side of our code is entirely flexible, and we are trying to change
it, but it hasn't always been clear what you want us to do.
Your prayer wheel argument of missing abstractions and easiness of
emulating things is annoying. If you think it is better to emulate APIC,
please emulate it without paravirt ops. If you want the speed
improvement, work with us to create the interfaces and abstractions
which are necessary to have a sane, maintainable and useful for all
hypervisors implementation.
That's what we are doing. Our prayer wheel would be easier appeased if
you actually told us which parts of the VMI timer you objected to. As I
understand it now:
1) We should not call into external functions in other time sources; any
common code should be merged up
2) We should not be using global_clock_event; it is a horrible hack
which you want to remove
3) We should not use the smp_apic_timer_interrupt assembly code which
calls up to the lapic timer handlers
4) We should not add our own assembly code to call out to a local timer
handler (from Ingo)
These last two points create a conflict which is a little tricky to
solve. We can't add our own custom timer handler, and we can't re-use
the APIC timer handler. But there is no timer handler available on i386
that works, since the handlers will fall back to either PIC or IO-APIC
edge handling. Using either of those for the local timer interrupt on
SMP does not work because they assume traditional IRQ semantics - an IRQ
raised from the bus should be serviced by one processor. Re-raises of
the same IRQ on remote processors are locked out by the handler, and
dropped. Thus simultaneous local timers firing on multiple CPUs cause
only one to be serviced.
This does not work for local timer interrupts in NO_HZ mode, because
they must always be serviced so that they can reschedule the next local
timer. I have a proposed solution to this issue, but it fails to work
when the IO-APIC assumes control of all IRQs based on ACPI results
(which we control, but can't change because of compatibility issues with
other operating systems).
My proposal is to keep IRQ-0 as the timer interrupt, on all CPUs, but
fire it from the LAPIC after local apic timers get initialized. We
would do this by converting the irq handler using set_irq_handler(0,
handle_percpu_irq). The only problem is the IO-APIC code will want to
take over IRQ0 and convert it to an edge triggered IO-APIC interrupt.
But for the local irq handlers to work, we have to keep them using the
handle_percpu_irq handler, and can't let the IO-APIC steal these
vectors. There is no way to do conditionally for just a specific set of
IRQs in tree today, so we would need to add a special case to io_apic.c
to allow early boot code to reserve specific vectors so they are not
subsumed by the IO-APIC. This seems reasonable, but is a special case.
If, on the other hand, we are allowed to use our own assembly code to
call out to our local timer handler (dropping constraint #4 above), we
can simply rewire LOCAL_TIMER_VECTOR to point to this code, but now we
must emulate the semantics of irq_enter / leave / etc inside our code,
which is also not the cleanest solution. We used to do this, and it
caught flak I believe from Ingo.
The basic problem is that a local IRQ doesn't behave like a global IRQ,
and the i386 backend is unaware of how to set up any local IRQs except
in the case of local APIC, but you have told us we should not re-use the
APIC handlers by overloading global_clock_event. The patches we sent
out recently did just this, but seemed to meet even more violence than
our previous way of doing things.
So the question is, which approach do you prefer?
Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/