Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-04-12 Thread David Miller
From: Paul Mackerras <[EMAIL PROTECTED]>
Date: Wed, 21 Mar 2007 11:03:14 +1100

> Linus Torvalds writes:
> 
> > We should just do this natively. There's been several tests over the years 
> > saying that it's much more efficient to do sti/cli as a simple store, and 
> > handling the "oops, we got an interrupt while interrupts were disabled" as 
> > a special case.
> > 
> > I have this dim memory that ARM has done it that way for a long time 
> > because it's so expensive to do a "real" cli/sti.
> > 
> > And I think -rt does it for other reasons. It's just more flexible.
> 
> 64-bit powerpc does this now as well.

I was curious about this so I had a look.

There appears to be three pieces of state used to manage this
on powerpc, PACASOFTIRQEN(r13), PACAHARDIRQEN(r13) and the
SOFTE() in the stackframe.

Plus there is all of this complicated logic on trap entry and
exit to manage these three values properly.

local_irq_restore() doesn't look like a simple piece of code
either.  Logically it should be simple, update the software
binary state, and if enabling see if any interrupts came in
while we were disable so we can run them.

Given all of that, is it really cheaper than just flipping the
bit in the cpu control register? :-/
___
Virtualization mailing list
[EMAIL PROTECTED]
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Andrew Morton
On Tue, 20 Mar 2007 21:23:52 +0100 Andi Kleen <[EMAIL PROTECTED]> wrote:

> Well it causes additional problems. We had some cases where it was really
> hard to distingush garbage and the true call chain.

yes, for some reason the naive backtraces seem to have got messier and messier 
over
the years and some of them are really quite hard to piece together nowadays.

An accurate backtrace would have some value, if we can get it bullet-proof.

The fault-injection driver wants it too.  And lockdep, I guess.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Zachary Amsden
Linus Torvalds wrote:
> On Tue, 20 Mar 2007, Zachary Amsden wrote:
>   
>> Actually, I was thinking the irq handlers would just not mess around with
>> eflags on the stack, just call the chip to ack the interrupt and re-enable
>> hardware interrupts when they left, since that is free anyway with the iret.
>> 
>
> No can do. Think level-triggered. You *need* to disable the interrupt, and 
> disabling it at the CPU is the easiest approach. Even so, you need to 
> worry about SMP and screaming interrupts at all CPU's, but if you don't 
> ack it to the IO-APIC until later, that should be ok (alternatively, you 
> need to just mask-and-ack the irq controller).
>   

Well, you can keep it masked, but a more important point is that I've 
entirely neglected local interrupts.  This might work for IRQs, but for 
local timer or thermal or IPIs, using the tasklet based replay simply 
will not work.

> One of the advantages of doing that is that you only ever have a queue of 
> one single entry, which then makes it easier to do the replay.
>   

Yes.  Unfortunately now both do_IRQ and all the smp_foo interrupt 
handlers need to detect and queue for replay, but fortunately they all 
have the interrupt number conveniently on the stack.

Zach
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Linus Torvalds


On Tue, 20 Mar 2007, Zachary Amsden wrote:
> 
> Actually, I was thinking the irq handlers would just not mess around with
> eflags on the stack, just call the chip to ack the interrupt and re-enable
> hardware interrupts when they left, since that is free anyway with the iret.

No can do. Think level-triggered. You *need* to disable the interrupt, and 
disabling it at the CPU is the easiest approach. Even so, you need to 
worry about SMP and screaming interrupts at all CPU's, but if you don't 
ack it to the IO-APIC until later, that should be ok (alternatively, you 
need to just mask-and-ack the irq controller).

> Maybe leaving irqs disabled is better.

One of the advantages of doing that is that you only ever have a queue of 
one single entry, which then makes it easier to do the replay.

Linus
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Zachary Amsden
Linus Torvalds wrote:
> On Tue, 20 Mar 2007, Zachary Amsden wrote:
>   
>> void local_irq_restore(int enabled)
>> {
>>pda.intr_mask = enabled;
>>/*
>> * note there is a window here where softirqs are not processed by
>> * the interrupt handler, but that is not a problem, since it will
>> * get done here in the outer enable of any nested pair.
>> */
>>if (enabled)
>>local_bh_enable();
>> }
>> 
>
> Actually, this one is more complicated. You also need to actually enable 
> hardware interrupts again if they got disabled by an interrupt actually 
> occurring while the "soft-interrupt" was disabled.
>   

Actually, I was thinking the irq handlers would just not mess around 
with eflags on the stack, just call the chip to ack the interrupt and 
re-enable hardware interrupts when they left, since that is free anyway 
with the iret.  Maybe leaving irqs disabled is better.

> Anyway, it really *should* be pretty damn simple. No need to disable 
> preemption, there should be no events that can *cause* it, since all 
> interrupts get headed off at the pass.. (the return-from-interrupt thng 
> should already notice that it's returning to an interrupts-disabled 
> section and not try to do any preemption).
>   
> What did I miss?
>   

I wasn't disabling preemption to actually disable preemption.  I was 
just using bh_disable as a global hammer to stop softirqs (thus the irq 
replay tasklet) from running during the normal irq_exit path.  Then, we 
can just use the existing software IRQ replay code, and I think barely 
any new code (queue_irq(), etc) has to be written.

Zach
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Linus Torvalds


On Tue, 20 Mar 2007, Zachary Amsden wrote:
> 
> void local_irq_restore(int enabled)
> {
>pda.intr_mask = enabled;
>/*
> * note there is a window here where softirqs are not processed by
> * the interrupt handler, but that is not a problem, since it will
> * get done here in the outer enable of any nested pair.
> */
>if (enabled)
>local_bh_enable();
> }

Actually, this one is more complicated. You also need to actually enable 
hardware interrupts again if they got disabled by an interrupt actually 
occurring while the "soft-interrupt" was disabled.

But since it's all a local-cpu issue, you can do things like test 
cpu-local memory flags for whetehr that has happened or not.

So it *should* be something as simple as

local_irq_disable()
{
pda.irq_enable = 0;
}

handle_interrupt()
{
if (!pda.irq_enable) {
pda.irq_queued = 1;
queue_interrupt();
.. make sure we return with hardirq's now 
   disabled: just clear IF in the pt_regs ..
return;
}
.. normal ..
}

local_irq_enable()
{
pda.irq_enable = 1;
barrier();
/* Common case - nothing happened while we were fake-disabled.. 
*/
if (!pda.irq_queued)
return; 

/* Ok, actually handle the things! */
handle_queued_irqs();

/*
 * And enable the hw interrupts again, they got disabled 
 * when we were queueing stuff.. 
 */
hardware_sti();
}

but I haven't really gone over it in any detail, I may have missed 
something really obvious.

Anyway, it really *should* be pretty damn simple. No need to disable 
preemption, there should be no events that can *cause* it, since all 
interrupts get headed off at the pass.. (the return-from-interrupt thng 
should already notice that it's returning to an interrupts-disabled 
section and not try to do any preemption).

What did I miss?

Linus
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Jeremy Fitzhardinge
Matt Mackall wrote:
> On Tue, Mar 20, 2007 at 09:31:58AM -0700, Jeremy Fitzhardinge wrote:
>   
>> Linus Torvalds wrote:
>> 
>>> On Tue, 20 Mar 2007, Eric W. Biederman wrote:
>>>   
>>>   
 If that is the case.  In the normal kernel what would
 the "the oops, we got an interrupt code do?"
 I assume it would leave interrupts disabled when it returns?
 Like we currently do with the delayed disable of normal interrupts?
 
 
>>> Yeah, disable interrupts, and set a flag that the fake "sti" can test, and 
>>> just return without doing anything.
>>>
>>> (You may or may not also need to do extra work to Ack the hardware 
>>> interrupt etc, which may be irq-controller specific. Once the CPU has 
>>> accepted the interrupt, you may not be able to just leave it dangling)
>>>   
>>>   
>> So it would be something like:
>>
>> pda.intr_mask = 1;   /* disable interrupts */
>> ...
>> pda.intr_mask = 0;   /* enable interrupts */
>> if (xchg(&pda.intr_pending, 0))  /* check pending */
>>  asm("sti"); /* was pending; isr left cpu interrupts masked 
>> */
>> 
>
> I don't know that you need an xchg there. If you're still on the same
> CPU, it should all be nice and causal even across an interrupt handler.
> So it could be:
>
>pda.intr_mask = 0; /* intr_pending can't get set after this */
>if (unlikely(pda.intr_pending)) {
>   pda.intr_pending = 0;
>   asm("sti");
>}
>
> (This would actually need a C barrier, but I'll ignore that as this'd
> end up being asm...)
>
> But other interesting things could happen. If we never did a real CLI
> and we get preempted and switched to another CPU between clearing
> intr_mask and checking intr_pending, we get a little confused. 
>   

Could prevent preempt if pda.intr_mask is set.  preemptible() is defined as:

# define preemptible()(preempt_count() == 0 && !irqs_disabled())

anyway, so that would be changed to look at the intr_mask rather than
eflags.
(I'm not sure if preemptible() is actually used to determine whether
preempt or not).

Alternatively, the intr_mask could be encoded in a bit of preempt_count...

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Zachary Amsden
Jeremy Fitzhardinge wrote:
> Zachary Amsden wrote:
>   
>> I think Jeremy's idea was to have interrupt handlers leave interrupts
>> disabled on exit if pda.intr_mask was set.  In which case, they would
>> bypass all work and we could never get preempted.
>> 
>
> Yes, I was worried that if we left the isr without actually handling the
> interrupt, it would still be asserted and we'd just get interrupted
> again.  The idea is that we avoid touching cli/sti for the common case
> of no interrupts while interrupts are disabled, but we'd still need to
> fall back to using them if an interrupt becomes pending.
>
>   
>> I don't think leaving hardware interrupts disabled for such a long
>> time is good though. 
>> 
>
> How long?  It would be no longer than now, and possibly less, wouldn't it?
>   

Hmm.  Perhaps.  Something about the asymmetry bothers me alot though.

Zach

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Paul Mackerras
Linus Torvalds writes:

> We should just do this natively. There's been several tests over the years 
> saying that it's much more efficient to do sti/cli as a simple store, and 
> handling the "oops, we got an interrupt while interrupts were disabled" as 
> a special case.
> 
> I have this dim memory that ARM has done it that way for a long time 
> because it's so expensive to do a "real" cli/sti.
> 
> And I think -rt does it for other reasons. It's just more flexible.

64-bit powerpc does this now as well.

Paul.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Matt Mackall
On Tue, Mar 20, 2007 at 03:08:19PM -0800, Zachary Amsden wrote:
> Matt Mackall wrote:
> >I don't know that you need an xchg there. If you're still on the same
> >CPU, it should all be nice and causal even across an interrupt handler.
> >So it could be:
> >
> >   pda.intr_mask = 0; /* intr_pending can't get set after this */
> >  
> 
> Why not?  Oh, I see.  intr_mask is inverted form of EFLAGS_IF.

It's not even that. There are two things that can happen:

case 1:

  intr_mask = 1;

  intr_mask = 0;
  /* intr_pending is already set and CLI is in effect */
  if(intr_pending)

case 2:

  intr_mask = 1;
  intr_mask = 0;

  /* intr_pending remains cleared */
  if(intr_pending)

As this is all about local interrupts, it's all on a single CPU and
out of order issues aren't visible..
 
> >(This would actually need a C barrier, but I'll ignore that as this'd
> >end up being asm...)

..unless the compiler is doing the reordering, of course.

> >But other interesting things could happen. If we never did a real CLI
> >and we get preempted and switched to another CPU between clearing
> >intr_mask and checking intr_pending, we get a little confused. 
> 
> I think Jeremy's idea was to have interrupt handlers leave interrupts 
> disabled on exit if pda.intr_mask was set.  In which case, they would 
> bypass all work and we could never get preempted.

I was actually worrying about the case where the interrupt came in
"late". But I don't think it's a problem there either.

> I don't think leaving 
> hardware interrupts disabled for such a long time is good though.

It can only be worse than the current situation by the amount of time
it takes to defer an interrupt once. On average, it'll be a lot
better as most critical sections are -tiny-.

-- 
Mathematics is the supreme nostalgia of our time.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [Xen-devel] Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Linus Torvalds


On Tue, 20 Mar 2007, Andi Kleen wrote:
> 
> Linus is worried about the unwinder crashing -- that wouldn't help with that.

And to make it clear: this is not a theoretical worry. It happened many 
times over the months the unwinder was in. 

It was supposed to help debugging, but it made bugs that *would* have been 
nicely debuggable without it into nightmares. So the only reason for it 
existing in the first place was actually the thing that made it not work.

Linus
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Linus Torvalds


On Tue, 20 Mar 2007, Andi Kleen wrote:

> On Tue, Mar 20, 2007 at 11:49:39AM -0700, Linus Torvalds wrote:
> > 
> > the thing is, I'd rather see a long backtrace that is hard to decipher but 
> > that *never* *ever* causes any additional problems, over a pretty one.
> 
> Well it causes additional problems. We had some cases where it was really
> hard to distingush garbage and the true call chain. I can probably dig
> out some examples if you want.

Well, by "additional problems" _I_ mean things like "a warning turned into 
a fatal oops and didn't get logged at all".

That's a lot more serious than "there were a few extra entries in the 
traceback that caused us some confusion".

And yes, we had exactly that case happen several times.

> With lots of call backs (e.g. common with sysfs) it is also frequently
> not obvious how the call chains are supposed to go.

With callbacks, it's actually often nice to see the callback data that is 
on the stack (and it's very obvious from the "" ksymtab 
explanation: you can't have a  that is anything but a callback 
pointer (since it isn't a return address).

Linus
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
> I think Jeremy's idea was to have interrupt handlers leave interrupts
> disabled on exit if pda.intr_mask was set.  In which case, they would
> bypass all work and we could never get preempted.

Yes, I was worried that if we left the isr without actually handling the
interrupt, it would still be asserted and we'd just get interrupted
again.  The idea is that we avoid touching cli/sti for the common case
of no interrupts while interrupts are disabled, but we'd still need to
fall back to using them if an interrupt becomes pending.

> I don't think leaving hardware interrupts disabled for such a long
> time is good though. 

How long?  It would be no longer than now, and possibly less, wouldn't it?

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Zachary Amsden
Matt Mackall wrote:
> I don't know that you need an xchg there. If you're still on the same
> CPU, it should all be nice and causal even across an interrupt handler.
> So it could be:
>
>pda.intr_mask = 0; /* intr_pending can't get set after this */
>   

Why not?  Oh, I see.  intr_mask is inverted form of EFLAGS_IF.

>if (unlikely(pda.intr_pending)) {
>   pda.intr_pending = 0;
>   asm("sti");
>}
>
> (This would actually need a C barrier, but I'll ignore that as this'd
> end up being asm...)
>
> But other interesting things could happen. If we never did a real CLI
> and we get preempted and switched to another CPU between clearing
> intr_mask and checking intr_pending, we get a little confused. 
>   

I think Jeremy's idea was to have interrupt handlers leave interrupts 
disabled on exit if pda.intr_mask was set.  In which case, they would 
bypass all work and we could never get preempted.  I don't think leaving 
hardware interrupts disabled for such a long time is good though.

> But perhaps that doesn't matter because we'd by definition have no
> pending interrupts on either processor?
>
> Is it expensive to do an STI if interrupts are already enabled?
>   

Yes.

Zach
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Matt Mackall
On Tue, Mar 20, 2007 at 09:31:58AM -0700, Jeremy Fitzhardinge wrote:
> Linus Torvalds wrote:
> > On Tue, 20 Mar 2007, Eric W. Biederman wrote:
> >   
> >> If that is the case.  In the normal kernel what would
> >> the "the oops, we got an interrupt code do?"
> >> I assume it would leave interrupts disabled when it returns?
> >> Like we currently do with the delayed disable of normal interrupts?
> >> 
> >
> > Yeah, disable interrupts, and set a flag that the fake "sti" can test, and 
> > just return without doing anything.
> >
> > (You may or may not also need to do extra work to Ack the hardware 
> > interrupt etc, which may be irq-controller specific. Once the CPU has 
> > accepted the interrupt, you may not be able to just leave it dangling)
> >   
> 
> So it would be something like:
> 
> pda.intr_mask = 1;/* disable interrupts */
> ...
> pda.intr_mask = 0;/* enable interrupts */
> if (xchg(&pda.intr_pending, 0))   /* check pending */
>   asm("sti"); /* was pending; isr left cpu interrupts masked 
> */

I don't know that you need an xchg there. If you're still on the same
CPU, it should all be nice and causal even across an interrupt handler.
So it could be:

   pda.intr_mask = 0; /* intr_pending can't get set after this */
   if (unlikely(pda.intr_pending)) {
  pda.intr_pending = 0;
  asm("sti");
   }

(This would actually need a C barrier, but I'll ignore that as this'd
end up being asm...)

But other interesting things could happen. If we never did a real CLI
and we get preempted and switched to another CPU between clearing
intr_mask and checking intr_pending, we get a little confused. 

But perhaps that doesn't matter because we'd by definition have no
pending interrupts on either processor?

Is it expensive to do an STI if interrupts are already enabled?

-- 
Mathematics is the supreme nostalgia of our time.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Rusty Russell
On Tue, 2007-03-20 at 09:58 -0600, Eric W. Biederman wrote:
> Looking at the above code snippet.  I guess it is about time to
> merge our per_cpu and pda variables...

Indeed, thanks for the prod.  Now 2.6.21-rc4-mm1 is out, I'll resend the
patches.

Cheers,
Rusty.


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Zachary Amsden
Jeremy Fitzhardinge wrote:

>> Yeah, disable interrupts, and set a flag that the fake "sti" can test, and 
>> just return without doing anything.
>>
>> (You may or may not also need to do extra work to Ack the hardware 
>> interrupt etc, which may be irq-controller specific. Once the CPU has 
>> accepted the interrupt, you may not be able to just leave it dangling)
>>   
>> 
>
> So it would be something like:
>
> pda.intr_mask = 1;/* disable interrupts */
> ...
> pda.intr_mask = 0;/* enable interrupts */
> if (xchg(&pda.intr_pending, 0))   /* check pending */
>   

Well, can't do xchg, since it implies #LOCK, and you'll lose more than 
you gain on the processors where it matters.  Cmpxchg is fine, but 
processor dependent.

Or, just make the interrupt handlers use software resend for IRQs when 
pda.intr_mask is set to zero.  Now, local_irq_save / restore are very 
pretty:

int local_irq_save(void)
{
   int tmp = pda.intr_mask;
   pda.intr_mask = 0;
   /*
* note there is a window here where local IRQs notice intr_mask == 0
* in that case, they will attempt to resend the IRQ via a tasklet,
* and will succeed, albeit through a slightly longer path
*/
   local_bh_disable();
   return tmp;
}

void local_irq_restore(int enabled)
{
pda.intr_mask = enabled;
/*
 * note there is a window here where softirqs are not processed by
 * the interrupt handler, but that is not a problem, since it will
 * get done here in the outer enable of any nested pair.
 */
if (enabled)
local_bh_enable();
}

I think Ingo's suggestion of using the hardirq tracing is another way 
that could work, but it seems to be too heavyweight and tied too much to 
the lockdep verification code - plus it inserts additional 
raw_irq_disable in places that seem counter to the goal of getting rid 
of them in the first place.  Perhaps I'm misunderstanding the code, though.

Zach
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [Xen-devel] Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Andi Kleen
On Tue, Mar 20, 2007 at 09:39:18PM +, Alan Cox wrote:
> > > Because that's really the issue: do you want a "pretty" backtrace, or do 
> > > you want one that is rock solid but has some crud in it.
> > 
> > I just want an as exact backtrace as possible. I also think
> > that we can make the unwinder robust enough.
> 
> Any reason you can't put the exact back trace in "[xxx]" and the ones we
> see on the stack which dont look like call trace as ?xxx? It makes the
> code a bit trickier but we depend on the quality of traces

Linus is worried about the unwinder crashing -- that wouldn't help with that.

What the (now out of tree) unwinder does is to check if it finishes
the trace and if not fall back to the old unwinder.

-Andi
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [Xen-devel] Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Alan Cox
> > Because that's really the issue: do you want a "pretty" backtrace, or do 
> > you want one that is rock solid but has some crud in it.
> 
> I just want an as exact backtrace as possible. I also think
> that we can make the unwinder robust enough.

Any reason you can't put the exact back trace in "[xxx]" and the ones we
see on the stack which dont look like call trace as ?xxx? It makes the
code a bit trickier but we depend on the quality of traces

Alan
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Andi Kleen
> Worth it on 32-bit.  On AMD64, probably not.  On Intel 64-bit, maybe, 
> but less important than in P4 days.

Well most of Intel 64bit is P4 -- and Intel is still shipping
millions more of them each quarter.

> This could change character completely if used at the tail of a function 
> where you now have
> 
> sti; 1: ret
> 
> Which generates an interrupt holdoff on the ret, an unusual thing to do.

Unusual yes, but I don't see how it should cause problems. Or do you
have anything specific in mind? 

Worse is probably that on K8 this case might cause a pipeline stall
if there isn't a prefix in front of the ret.

-Andi
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Zachary Amsden
Andi Kleen wrote:
> One thing I was pondering was to replace the expensive popfs with
>
> bt  $IF,(%rsp) 
> jnc 1f 
> sti
> 1: 
>
> But would be mostly a P4 optimization again and I'm not 100% sure it is
> worth it.
>   

Worth it on 32-bit.  On AMD64, probably not.  On Intel 64-bit, maybe, 
but less important than in P4 days.

This could change character completely if used at the tail of a function 
where you now have

sti; 1: ret

Which generates an interrupt holdoff on the ret, an unusual thing to do.

Zach
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Andi Kleen
> But I was throwing that out as a long-term thing. I'm not claiming it's 
> trivial, but it should be doable.

One thing I was pondering was to replace the expensive popfs with

bt  $IF,(%rsp) 
jnc 1f 
sti
1: 

But would be mostly a P4 optimization again and I'm not 100% sure it is
worth it.

-Andi

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Andi Kleen
On Tue, Mar 20, 2007 at 11:49:39AM -0700, Linus Torvalds wrote:
> 
> 
> On Tue, 20 Mar 2007, Andi Kleen wrote:
> > 
> > So what is your proposed alternative to handle long backtraces? 
> > You didn't answer that question. Please do, I'm curious about your thoughts
> > in this area.
> 
> the thing is, I'd rather see a long backtrace that is hard to decipher but 
> that *never* *ever* causes any additional problems, over a pretty one.

Well it causes additional problems. We had some cases where it was really
hard to distingush garbage and the true call chain. I can probably dig
out some examples if you want.

With lots of call backs (e.g. common with sysfs) it is also frequently
not obvious how the call chains are supposed to go.

I would have agreed with you in the 2.4 time frame. It was also
always fun to explain this to a newbie with oopses and watch their
face when they secretly think you're crazy ;-) But code is getting
much and more complex unfortunately and I had a few cases where
it was really ugly to sort them out. 

> Because that's really the issue: do you want a "pretty" backtrace, or do 
> you want one that is rock solid but has some crud in it.

I just want an as exact backtrace as possible. I also think
that we can make the unwinder robust enough.

> I'll take the rock solid one any day. Especially as even the pretty one 
> won't fix the most common problem, which is "I don't see the caller" (due 
> to inlining or tail-calls).

The problem is not one or a few levels of calls. That can be always figured
out from the source. The problem is when you have a double
digit number of them with non obvious (dynamic indirect) dependencies.  Yes it 
can be figured out too, but it is a long and very annoying process
with lots of grepping etc. and even then you sometimes can't be 100% 
sure in some cases. 

It's also a mechanic process that just cries to be done by a machine instead.

> In contrast, the ugly backtrace will have some "garbage entries" from 
> previous frames that didn't get overwritten, but there have actually been 

If it was only a few that would be great ...

-Andi

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Linus Torvalds


On Tue, 20 Mar 2007, Andi Kleen wrote:
> 
> So what is your proposed alternative to handle long backtraces? 
> You didn't answer that question. Please do, I'm curious about your thoughts
> in this area.

the thing is, I'd rather see a long backtrace that is hard to decipher but 
that *never* *ever* causes any additional problems, over a pretty one.

Because that's really the issue: do you want a "pretty" backtrace, or do 
you want one that is rock solid but has some crud in it.

I'll take the rock solid one any day. Especially as even the pretty one 
won't fix the most common problem, which is "I don't see the caller" (due 
to inlining or tail-calls).

In contrast, the ugly backtrace will have some "garbage entries" from 
previous frames that didn't get overwritten, but there have actually been 
(admittedly rare) cases where those garbage entries have given hints about 
what happened just before.

Linus
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Andi Kleen
On Tue, Mar 20, 2007 at 10:27:00AM -0700, Linus Torvalds wrote:
> 
> 
> On Tue, 20 Mar 2007, Andi Kleen wrote:
> > 
> > The code never did that. In fact many of the problems we had initially
> > especially came out of that -- the fallback code that would handle
> > this case wasn't fully correct.
> 
> I don't keep my emails any more, but you *never* fixed the problems in 
> arch/*/kernel/traps.c.

I fixed that one after you dropped it (hmm, double checking: or at least
I thought I had fixed it, but don't see the code right now; will redo then) 
Basically it was just a one liner anyways - always check against all the
stacks that are there.


> Yes, the kernel/unwind.c issues generally got fixed. The infinite loops in 
> the *callers* never did.

There was later a weaker form that should have caught most loops, but admittedly
it wasn't 100% bullet-proof with exception stacks.

> 
> > Also frankly often your analysis about what went wrong was just
> > incorrect.
> 
> Still in denial, I see.
> 
> Do you still claim that "the fallback position always did the right 
> thing"

No initially it was buggy and that caused several of the crashes.

> Despite the fact that the unwinder had sometimes *corrupted* the 
> incoming information so much that the fallback position was the one that 
> oopsed? And no, you didn't fix that.

No, it oopsed because it was broken by itself.
Anyways that got fixed quickly.

> 
> And no, IT DID NOT use probe_kernel_address like you still claim.

There definitely was a patch that made it use it. You might have
not merged it though.
 
> Anyway, you work for Suse, I don't care what you do to the Suse kernel. 
> Maybe it will get stable some day. Somehow, I doubt it. 

So what is your proposed alternative to handle long backtraces? 
You didn't answer that question. Please do, I'm curious about your thoughts
in this area.

-Andi

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Linus Torvalds


On Tue, 20 Mar 2007, Andi Kleen wrote:
> 
> The code never did that. In fact many of the problems we had initially
> especially came out of that -- the fallback code that would handle
> this case wasn't fully correct.

I don't keep my emails any more, but you *never* fixed the problems in 
arch/*/kernel/traps.c.

Yes, the kernel/unwind.c issues generally got fixed. The infinite loops in 
the *callers* never did.

> Also frankly often your analysis about what went wrong was just
> incorrect.

Still in denial, I see.

Do you still claim that "the fallback position always did the right 
thing"? Despite the fact that the unwinder had sometimes *corrupted* the 
incoming information so much that the fallback position was the one that 
oopsed? And no, you didn't fix that.

And no, IT DID NOT use probe_kernel_address like you still claim.

Anyway, you work for Suse, I don't care what you do to the Suse kernel. 
Maybe it will get stable some day. Somehow, I doubt it. 

Linus
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Andi Kleen
On Tue, Mar 20, 2007 at 09:52:42AM -0700, Linus Torvalds wrote:
> The ones I reported were all about trusting the stack contents implicitly, 

The latest version added the full double check you wanted - every
new RSP is validated against the current stack or the exception stacks.

> and assuming that the unwind info was there and valid. Using things like 

The code never did that. In fact many of the problems we had initially
especially came out of that -- the fallback code that would handle
this case wasn't fully correct.

> "__get_user()" didn't fix it, because if a WARN_ON() happened while we 
> held the mm semaphore and the unwind info was bogus, it would take a 
> page-fault and deadlock.

That was fixed too even before you dropped it by using probe_kernel_address()

> And I told you guys this. Over *months*. And you ignored me. You told me 
> everything was fine. Each time, somebody else ended up reporting a hang 
> where the unwinder was at fault. 

That was because most of the bugs were reported many times duplicated --
and reports from older releases kept streaming in even after the fix
went in.

Also frankly often your analysis about what went wrong was just
incorrect.
 
> And you clearly *still* haven't accepted the fact that the code was buggy. 

There were some bugs, but they were all fixed. Or at least I hope -- 
we will find out during testing.

> Does anybody wonder why I wouldn't merge it back?

So do you have an alternative to the unwinder? Don't tell me 
"real men decode 30+ entry long stack traces with lots of callbacks
by hand". Or do you prefer to use frame pointers everywhere? 

-Andi

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> I have this dim memory that ARM has done it that way for a long time 
> because it's so expensive to do a "real" cli/sti.
> 
> And I think -rt does it for other reasons. It's just more flexible.

-rt doesnt wrap cli/sti anymore: spin_lock_irq*() doesnt disable irq 
flags on -rt, thus the amount of real irqs-off sections is very small 
and reviewable.

But nevertheless an incarnation of the code survived and is upstream 
already, in the form of TRACE_IRQFLAGS lockdep code ;) This implements a 
soft hardirq flag _today_: all that would be needed is for Xen to define 
raw_local_irq_disable() as a NOP, and to use the 
current->hardirqs_enabled as 'soft IRQ-off flag'.

Note that ->hardirqs_enabled is self-maintained, i.e. it's not just a 
stupid shadow of the hardirq flag, it's an independently maintained flag 
that does not rely on the existence of the hard flag.

[ this code even has its own debugging code, so out-of-sync-flags,
  double-off and double-on is detected and complained about. So all the 
  hard stuff has already been done as part of lockdep, and it's even 
  long-term maintainable because under a native lockdep kernel we check 
  the soft flag against the hard flag. ]

so i think a soft cli/sti flag support should be merged/unified with the 
trace_hardirqs_on()/trace_hardirqs_off() code.

Ingo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Linus Torvalds


On Tue, 20 Mar 2007, Andi Kleen wrote:
> 
> No, me and Jan fixed all reported bugs as far as I know.

No you did not. You didn't fix the ones I reported. Which is why it got 
removed, and will not get added back until there is another maintainer.

The ones I reported were all about trusting the stack contents implicitly, 
and assuming that the unwind info was there and valid. Using things like 
"__get_user()" didn't fix it, because if a WARN_ON() happened while we 
held the mm semaphore and the unwind info was bogus, it would take a 
page-fault and deadlock.

Those kinds of things are not acceptable for debugging output. If I cannot 
use WARN_ON() because I hold the MM lock and I'm afraid there might be 
kernel corruption, then something is *wrong*!

And I told you guys this. Over *months*. And you ignored me. You told me 
everything was fine. Each time, somebody else ended up reporting a hang 
where the unwinder was at fault. And since I couldn't trust the 
maintainers to fix it, removing the broken feature that only caused more 
problems than it fixed was the only option.

And you clearly *still* haven't accepted the fact that the code was buggy. 

Does anybody wonder why I wouldn't merge it back?

Linus
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Andi Kleen
On Tue, Mar 20, 2007 at 10:25:20AM -0600, Eric W. Biederman wrote:
> What I recall observing is call traces that made no sense.  Not just
> extra noise in the stack trace but things like seeing a function that
> has exactly one path to it, and not seeing all of the functions on
> that path in the call trace.

That's tail call/sibling call optimization. No unwinder can untangle that
because the return address is lost.  But it's also an quite important 
optimization.

> >
> > In 2.4 it was often very reasonable to just sort out the false positives,
> > but with sometimes 20-30+ level deep call chains in 2.6 with many callbacks 
> > that
> > just
> > gets far too tenuous. 
> 
> Hmm.  I haven't seen those traces, but I wonder if the size of those
> stack traces indicates potential stack overflow problems.

Most functions have quite small frames, so 20-30 is still not a problem

> Do you also validate the unwind data?

There are many sanity checks in the unwind code and it will fall back
to the old unwinder when it gets stuck.

> 
> > Although in future it would be good if people did some more analysis in root
> > causes for failures before let the paranoia take over and revert patches.
> >
> > We see a good example here of what I call the JFS/ACPI effect: code gets 
> > merged
> > too early with some visible problems. It gets a bad name and afterwards 
> > people
> > never look objectively at it again and just trust their prejudices. 
> 
> I don't know.  The impression I got was the root cause analysis stopped 
> when it was observed that the code was unsuitable for solving the problem.

No, me and Jan fixed all reported bugs as far as I know.

-Andi

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Jeremy Fitzhardinge
Linus Torvalds wrote:
> On Tue, 20 Mar 2007, Eric W. Biederman wrote:
>   
>> If that is the case.  In the normal kernel what would
>> the "the oops, we got an interrupt code do?"
>> I assume it would leave interrupts disabled when it returns?
>> Like we currently do with the delayed disable of normal interrupts?
>> 
>
> Yeah, disable interrupts, and set a flag that the fake "sti" can test, and 
> just return without doing anything.
>
> (You may or may not also need to do extra work to Ack the hardware 
> interrupt etc, which may be irq-controller specific. Once the CPU has 
> accepted the interrupt, you may not be able to just leave it dangling)
>   

So it would be something like:

pda.intr_mask = 1;  /* disable interrupts */
...
pda.intr_mask = 0;  /* enable interrupts */
if (xchg(&pda.intr_pending, 0)) /* check pending */
asm("sti"); /* was pending; isr left cpu interrupts masked 
*/
  

and in the interrupt handler:

if (pda.intr_mask) {
pda.intr_pending = 1;
regs->eflags &= ~IF;
maybe_ack_interrupt_controller();
iret
  

}

?

J

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Eric W. Biederman
Andi Kleen <[EMAIL PROTECTED]> writes:

>> I'm conflicted about the dwarf unwinder.  I was off doing other things
>> at the time so I missed the pain, but I do have a distinct recollection of
>> the back traces on x86_64 being distinctly worse the on i386. 
>
> The only case were i386 was better was with frame pointers, which
> was never fully implemented for x86-64. However i find that hilarious: 
> people are spending a lot of time right here in this thread to squeeze
> out the best call sequences for the paravirt ops, but then accept
> losing a full frame pointer register on i386. I never found that
> acceptable, that is why I prefered the unwinder instead. 
>
> This said the big problem with the frame pointers is mostly gone now:
> on older CPUs it tended to cause a pipeline stall early in the function.
> That is now fixed in the latest Intel/upcomming AMD CPUs, but there 
> are still millions and millions of older CPUs out there so I still
> don't consider it acceptable.

What I recall observing is call traces that made no sense.  Not just
extra noise in the stack trace but things like seeing a function that
has exactly one path to it, and not seeing all of the functions on
that path in the call trace.

In my later debugging I have been reasonably able to attribute those
kinds of things to compiler optimizations like inlining and tail call
optimization.

Now I will agree that having fewer or no false positives to weed
through is a good thing, if we can do it reliably.

>> Lately 
>> I haven't seen that so it may be I was misinterpreting what I was
>> seeing, and the compiler optimizations were what gave me such weird
>> back traces. 
>
> The main problem is that subsystems are getting more and more complex
> and especially callbacks seem to multiply far too quickly.
>
> In 2.4 it was often very reasonable to just sort out the false positives,
> but with sometimes 20-30+ level deep call chains in 2.6 with many callbacks 
> that
> just
> gets far too tenuous. 

Hmm.  I haven't seen those traces, but I wonder if the size of those
stack traces indicates potential stack overflow problems.
  
>> But if the quality of our backtraces has gone down and dwarf unwinder
>> could give us better back traces it is likely worth pursuing.  Of
>> course it would need to start with the assumption that it's tables
>> may be borked (the kernel is busted after all) and be much more
>> careful than Andi's last attempt.
>
> The latest version validates the stack always. It was only a few lines
> of change. I doubt it will make much difference though. The few true crashes
> we had were not actually due the unwinder itself, but the buggy fallback code
> (which were fixed quickly). But anyways it should satisfy everybody's paranoia
> now.

Do you also validate the unwind data?

> Although in future it would be good if people did some more analysis in root
> causes for failures before let the paranoia take over and revert patches.
>
> We see a good example here of what I call the JFS/ACPI effect: code gets 
> merged
> too early with some visible problems. It gets a bad name and afterwards people
> never look objectively at it again and just trust their prejudices. 

I don't know.  The impression I got was the root cause analysis stopped 
when it was observed that the code was unsuitable for solving the problem.
When asked about it, it appeared the developer did not understand the
question.  Therefore the root cause was assumed to be the developer.

At least that is how I have read the few little bits I have seen.

> But that's not a good strategy to get good code in the end I think. If there
> is enough evidence the early problems were fixed then prejudices should
> be reevaluated.

Certainly.  However if the developer has lost a certain amount of
initial trust, the burden becomes much higher.

Eric
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Jeremy Fitzhardinge
Eric W. Biederman wrote:
> Looking at the above code snippet.  I guess it is about time to
> merge our per_cpu and pda variables...

Rusty has a nice patch series to do just that; I think he's still
looking for a taker.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Chuck Ebbert
Eric W. Biederman wrote:
 
> I'm conflicted about the dwarf unwinder.  I was off doing other things
> at the time so I missed the pain, but I do have a distinct recollection of
> the back traces on x86_64 being distinctly worse the on i386.  Lately
> I haven't seen that so it may be I was misinterpreting what I was
> seeing, and the compiler optimizations were what gave me such weird
> back traces.  
> 

Well, if you compile x86_64 with frame pointers it helps a bit because
the compiler doesn't tail merge function calls. But the stack backtrace
ignores the frame pointers even if they're present, unlike i386 which
will use them.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Linus Torvalds


On Tue, 20 Mar 2007, Eric W. Biederman wrote:
> 
> If that is the case.  In the normal kernel what would
> the "the oops, we got an interrupt code do?"
> I assume it would leave interrupts disabled when it returns?
> Like we currently do with the delayed disable of normal interrupts?

Yeah, disable interrupts, and set a flag that the fake "sti" can test, and 
just return without doing anything.

(You may or may not also need to do extra work to Ack the hardware 
interrupt etc, which may be irq-controller specific. Once the CPU has 
accepted the interrupt, you may not be able to just leave it dangling)

But I was throwing that out as a long-term thing. I'm not claiming it's 
trivial, but it should be doable.

Linus
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Eric W. Biederman
Linus Torvalds <[EMAIL PROTECTED]> writes:

> On Mon, 19 Mar 2007, Jeremy Fitzhardinge wrote:
>> Actually, it still does need a temp register.  The sequence for cli is:
>> 
>> mov %fs:xen_vcpu, %eax
>> movb $1,1(%eax)
>
> We should just do this natively. There's been several tests over the years 
> saying that it's much more efficient to do sti/cli as a simple store, and 
> handling the "oops, we got an interrupt while interrupts were disabled" as 
> a special case.
>
> I have this dim memory that ARM has done it that way for a long time 
> because it's so expensive to do a "real" cli/sti.
>
> And I think -rt does it for other reasons. It's just more flexible.

If that is the case.  In the normal kernel what would
the "the oops, we got an interrupt code do?"
I assume it would leave interrupts disabled when it returns?
Like we currently do with the delayed disable of normal interrupts?

I'm trying to understand the proposed semantics.

Looking at the above code snippet.  I guess it is about time to
merge our per_cpu and pda variables...

Eric
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Linus Torvalds


On Mon, 19 Mar 2007, Jeremy Fitzhardinge wrote:
>
> Zachary Amsden wrote:
> > For VMI, the default clobber was "cc", and you need a way to allow at
> > least that, because saving and restoring flags is too expensive on x86.
> 
> According to lore (Andi, I think), asm() always clobbers cc. 

On x86, yes. Practically any instruction will clobber cc anyway, so it's 
the default.

Although the gcc guys have occasionally been suggesting we should set it 
in our asms.

> Actually, it still does need a temp register.  The sequence for cli is:
> 
> mov %fs:xen_vcpu, %eax
> movb $1,1(%eax)

We should just do this natively. There's been several tests over the years 
saying that it's much more efficient to do sti/cli as a simple store, and 
handling the "oops, we got an interrupt while interrupts were disabled" as 
a special case.

I have this dim memory that ARM has done it that way for a long time 
because it's so expensive to do a "real" cli/sti.

And I think -rt does it for other reasons. It's just more flexible.

Linus

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Andi Kleen
\
> I'm conflicted about the dwarf unwinder.  I was off doing other things
> at the time so I missed the pain, but I do have a distinct recollection of
> the back traces on x86_64 being distinctly worse the on i386. 

The only case were i386 was better was with frame pointers, which
was never fully implemented for x86-64. However i find that hilarious: 
people are spending a lot of time right here in this thread to squeeze
out the best call sequences for the paravirt ops, but then accept
losing a full frame pointer register on i386. I never found that
acceptable, that is why I prefered the unwinder instead. 

This said the big problem with the frame pointers is mostly gone now:
on older CPUs it tended to cause a pipeline stall early in the function.
That is now fixed in the latest Intel/upcomming AMD CPUs, but there 
are still millions and millions of older CPUs out there so I still
don't consider it acceptable.

> Lately 
> I haven't seen that so it may be I was misinterpreting what I was
> seeing, and the compiler optimizations were what gave me such weird
> back traces. 

The main problem is that subsystems are getting more and more complex
and especially callbacks seem to multiply far too quickly.

In 2.4 it was often very reasonable to just sort out the false positives,
but with sometimes 20-30+ level deep call chains in 2.6 with many callbacks 
that just
gets far too tenuous. 
 
> But if the quality of our backtraces has gone down and dwarf unwinder
> could give us better back traces it is likely worth pursuing.  Of
> course it would need to start with the assumption that it's tables
> may be borked (the kernel is busted after all) and be much more
> careful than Andi's last attempt.

The latest version validates the stack always. It was only a few lines
of change. I doubt it will make much difference though. The few true crashes
we had were not actually due the unwinder itself, but the buggy fallback code
(which were fixed quickly). But anyways it should satisfy everybody's paranoia 
now.

Although in future it would be good if people did some more analysis in root 
causes
for failures before let the paranoia take over and revert patches.

We see a good example here of what I call the JFS/ACPI effect: code gets merged
too early with some visible problems. It gets a bad name and afterwards people 
never
look objectively at it again and just trust their prejudices. 

But that's not a good strategy to get good code in the end I think. If there
is enough evidence the early problems were fixed then prejudices should
be reevaluated.

I will let it cook some time in -mm* and we will see if it works now or not.
I'm pretty confident it will though. And if it does there is no reason not 
to resubmit it.

-Andi
 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Andreas Kleen
Am Di 20.03.2007 06:54 schrieb Jeremy Fitzhardinge <[EMAIL PROTECTED]>:

> Zachary Amsden wrote:
> > For VMI, the default clobber was "cc", and you need a way to allow
> > at
> > least that, because saving and restoring flags is too expensive on
> > x86.
>
> According to lore (Andi, I think), asm() always clobbers cc.

asm with input and/or output. I'm not sure about asms without that.

-Andi


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
> For VMI, the default clobber was "cc", and you need a way to allow at
> least that, because saving and restoring flags is too expensive on x86.

According to lore (Andi, I think), asm() always clobbers cc. 

> I still don't think this was a good trade.  The primary motivation for
> clobbering %eax was that Xen wanted a free register to use for
> computing the offset into the shared data in the case of SMP
> preemptible kernels.  Xen no longer needs such a register, they can
> use the PDA offset instead.  And it does hurt native performance by
> unconditionally stealing a register in the four most commonly invoked
> paravirt-ops code sequences.

Actually, it still does need a temp register.  The sequence for cli is:

mov %fs:xen_vcpu, %eax
movb $1,1(%eax)

At some point I hope to move the vcpu structure directly into the
pda/percpu variables, at which point it will need no temps.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread Rusty Russell
On Mon, 2007-03-19 at 18:00 -0800, Zachary Amsden wrote:
> Rusty Russell wrote:
> > *This* was the reason that the current hand-coded calls only clobber %
> > eax.  It was a compromise between native (no clobbers) and others (might
> > need a reg).
> 
> I still don't think this was a good trade.
...
> Xen no longer needs such a register

Hmm, well, if VMI is happy, Xen is happy, and lguest is happy, then
perhaps we're better off with a cc-only clobber rule?  Certainly makes
life simpler.

> > Now, since we decided to allow paravirt_ops operations to be normal C
> > (ie. the patching is optional and done late), we actually push and pop %
> > ecx and %edx.  This makes the call site 10 bytes long, which is a nice
> > size for patching anyway (enough for a movl $0, , a-la lguest's
> > cli, or movw $0, %gs: if we supported SMP).
> 
> You can do it in 11 bytes with no clobbers and normal C semantics by 
> linking to a direct address instead of calling to an indirect, but then 
> you need some gross fixup technology in paravirt_patch:
> 
> if (call_addr == (void*)native_sti) {
>   ...
> }

Well, I don't think we need such hacks: since we have to use handcoded
asm and mark the callsites anyway, marking what they're calling is
trivial.

The other idea from "btfixup" is that we can do the patching *much*
earlier, so we don't need the initial code to be valid at all if we
wanted to: we just need room to patch in a call insn.  We could then
generate trampolines which do the necessary pushes & pops automatically
for backends which want to use C calling conventions.

Perhaps it's time for code and benchmarks?

Rusty.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread Eric W. Biederman
David Miller <[EMAIL PROTECTED]> writes:

> From: Linus Torvalds <[EMAIL PROTECTED]>
> Date: Mon, 19 Mar 2007 20:18:14 -0700 (PDT)
>
>> > > Please don't subject us to another couple months of hair-pulling only
>> > > to have Linus yank the thing out again, there are certainly more
>> > > useful things to spend time on :-)
>> 
>> Good call. Dwarf2 unwinding simply isn't worth doing. But I won't yank it 
>> out, I simply won't merge it. It was more than just totally buggy code, it 
>> was an inability of the people to understand that even bugfree code 
>> isn't enough - you have to be able to also handle buggy data.
>
> Thank you.

Hmm..

I know the feeling I have had a similar rant about the kexec on panic
code path.   The code is still no where near as paranoid about normal
kernel things not working as it could be, but by ranting about it
periodically the people doing the work are gradually making it better.

I'm conflicted about the dwarf unwinder.  I was off doing other things
at the time so I missed the pain, but I do have a distinct recollection of
the back traces on x86_64 being distinctly worse the on i386.  Lately
I haven't seen that so it may be I was misinterpreting what I was
seeing, and the compiler optimizations were what gave me such weird
back traces.  

But if the quality of our backtraces has gone down and dwarf unwinder
could give us better back traces it is likely worth pursuing.  Of
course it would need to start with the assumption that it's tables
may be borked (the kernel is busted after all) and be much more
careful than Andi's last attempt.

Eric
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread David Miller
From: Linus Torvalds <[EMAIL PROTECTED]>
Date: Mon, 19 Mar 2007 20:18:14 -0700 (PDT)

> > > Please don't subject us to another couple months of hair-pulling only
> > > to have Linus yank the thing out again, there are certainly more
> > > useful things to spend time on :-)
> 
> Good call. Dwarf2 unwinding simply isn't worth doing. But I won't yank it 
> out, I simply won't merge it. It was more than just totally buggy code, it 
> was an inability of the people to understand that even bugfree code 
> isn't enough - you have to be able to also handle buggy data.

Thank you.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread Linus Torvalds


On Mon, 19 Mar 2007, Andi Kleen wrote:
> 
> Initially we had some bugs that accounted for near all failures, but they 
> were all fixed in the latest version.

No. The real bugs were that the people involved wouldn't even accept that 
unwinding information was inevitably buggy and/or incomplete.

That much more fundamental bug never got fixed, as far as I know. 

I'm not going to merge anything that depends on unwind tables as things 
stand. The pain just isn't worth it.

> > Please don't subject us to another couple months of hair-pulling only
> > to have Linus yank the thing out again, there are certainly more
> > useful things to spend time on :-)

Good call. Dwarf2 unwinding simply isn't worth doing. But I won't yank it 
out, I simply won't merge it. It was more than just totally buggy code, it 
was an inability of the people to understand that even bugfree code 
isn't enough - you have to be able to also handle buggy data.

Linus
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread Zachary Amsden
Rusty Russell wrote:
> On Mon, 2007-03-19 at 11:38 -0700, Linus Torvalds wrote:
>   
>> On Mon, 19 Mar 2007, Eric W. Biederman wrote:
>> 
>>> True.  You can use all of the call clobbered registers.
>>>   
>> Quite often, the biggest single win of inlining is not so much the code 
>> size (although if done right, that will be smaller too), but the fact that 
>> inlining DOES NOT CLOBBER AS MANY REGISTERS!
>> 

For VMI, the default clobber was "cc", and you need a way to allow at 
least that, because saving and restoring flags is too expensive on x86.

> Thanks Linus.
>
> *This* was the reason that the current hand-coded calls only clobber %
> eax.  It was a compromise between native (no clobbers) and others (might
> need a reg).
>   

I still don't think this was a good trade.  The primary motivation for 
clobbering %eax was that Xen wanted a free register to use for computing 
the offset into the shared data in the case of SMP preemptible kernels.  
Xen no longer needs such a register, they can use the PDA offset 
instead.  And it does hurt native performance by unconditionally 
stealing a register in the four most commonly invoked paravirt-ops code 
sequences.

> Now, since we decided to allow paravirt_ops operations to be normal C
> (ie. the patching is optional and done late), we actually push and pop %
> ecx and %edx.  This makes the call site 10 bytes long, which is a nice
> size for patching anyway (enough for a movl $0, , a-la lguest's
> cli, or movw $0, %gs: if we supported SMP).
>   

You can do it in 11 bytes with no clobbers and normal C semantics by 
linking to a direct address instead of calling to an indirect, but then 
you need some gross fixup technology in paravirt_patch:

if (call_addr == (void*)native_sti) {
  ...
}

I think we should probably try to do it in 12 bytes.  Freeing eax to the 
inline caller is likely to make up the 2 bytes of space more we have to nop.

One thing I always tried to get in VMI was to encapsulate the actual 
code which went through the business of computing arguments that were 
not even used in the native case.  Unfortunately, that seems impossible 
in the current design, but I don't think it is an issue because I don't 
think there is actually a way to express:

SWITCHABLE_CODE_BLOCK_BEGIN {
   /* arbitrary C code for native */
} SWITCHABLE_CODE_BLOCK_ALTERNATIVE {
   /* arbitrary C code for something else */
}

Dave's linker suggestion is probably the best for things like that.

Zach
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
> Jeremy Fitzhardinge wrote:
>>  If we then work out in each direction and see matched push/pops,
>> then we know what registers can be trashed in the call.  This also
>> allows us to determine the callsite size, and therefore how much space
>> we need for inlining.
>>   
>
> No, that is a very dangerous suggestion.  You absolutely *cannot* do
> this safely without explicitly marking the start EIP of this code. 
> You *must* use metadata to do that.  It is never safe to disassemble
> backwards or "rewind" EIP for x86 code. 

What do you mean the instruction before is "mov $0x52515000,%eax"?

Yeah, you're right.  Oh well.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread Zachary Amsden
Jeremy Fitzhardinge wrote:
> For example, say we wanted to put a general call for sti into entry.S,
> where its expected it won't touch any registers.  In that case, we'd
> have a sequence like:
>
> push %eax
> push %ecx
> push %edx
> call paravirt_cli
> pop %edx
> pop %ecx
> pop %eax
>   
>
> If we parse the relocs, then we'd find the reference to paravirt_cli. 
> If we look at the byte before and see 0xe8, then we can see if its a
> call.  If we then work out in each direction and see matched push/pops,
> then we know what registers can be trashed in the call.  This also
> allows us to determine the callsite size, and therefore how much space
> we need for inlining.
>   

No, that is a very dangerous suggestion.  You absolutely *cannot* do 
this safely without explicitly marking the start EIP of this code.  You 
*must* use metadata to do that.  It is never safe to disassemble 
backwards or "rewind" EIP for x86 code.

Zach
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread Rusty Russell
On Mon, 2007-03-19 at 11:38 -0700, Linus Torvalds wrote:
> 
> On Mon, 19 Mar 2007, Eric W. Biederman wrote:
> > 
> > True.  You can use all of the call clobbered registers.
> 
> Quite often, the biggest single win of inlining is not so much the code 
> size (although if done right, that will be smaller too), but the fact that 
> inlining DOES NOT CLOBBER AS MANY REGISTERS!

Thanks Linus.

*This* was the reason that the current hand-coded calls only clobber %
eax.  It was a compromise between native (no clobbers) and others (might
need a reg).

Now, since we decided to allow paravirt_ops operations to be normal C
(ie. the patching is optional and done late), we actually push and pop %
ecx and %edx.  This makes the call site 10 bytes long, which is a nice
size for patching anyway (enough for a movl $0, , a-la lguest's
cli, or movw $0, %gs: if we supported SMP).

The current 6 paravirt ops which are patched cover the vast majority of
calls (until the Xen patches, then we need ~4 more?).  Jeremy chose to
expand patching to cover *all* paravirt ops, rather than just the new
hot ones, and that's where we tipped over the ugliness threshold.

Cheers,
Rusty.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread Andi Kleen
> Possibly not, but I'd like to be able to say with confidence that
> running a PARAVIRT kernel on bare hardware has no performance loss
> compared to running a !PARAVIRT kernel.  There's the case of small
> instruction sequences which have been replaced with calls (such as
> sti/cli/push;popf/etc), 

My guess is that most critical pushf/popf are in spin_lock_irqsave(). It would 
be possible to special case that one -- inline it -- and use out of line
versions for all the others.

-Andi
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread Andi Kleen

> It's inability to handle sequences like the above sounds to me like
> a very good argument to _not_ merge the unwinder back into the tree.

The unwinder can handle it fine, it is just that gcc right now cannot
be taught to generate correct unwind tables for it. If paravirt ops
is widely used i guess it would be possible to find some workaround
for that though.
 
There is one case it truly cannot handle (it would be possible by switching 
to dwarf3), but that was relatively easily eliminated and wasn't a problem
imho.

> To me, that unwinder is nothing but trouble, it severly limits what
> cases you can use special calling conventions via inline asm (and we
> have done that on several occaisions) and even ignoring that the
> unwinder only works half the time.

Initially we had some bugs that accounted for near all failures, but they 
were all fixed in the latest version.
 
> Please don't subject us to another couple months of hair-pulling only
> to have Linus yank the thing out again, there are certainly more
> useful things to spend time on :-)

I think better backtraces are a good use of my time. Sorry if you
disagree on that.

-Andi
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread Jeremy Fitzhardinge
David Miller wrote:
> Another point worth making is that for function calls you
> can fix things up lazily if you want.
> [...]
> In fact forget I mentioned this idea :)
>   

OK :)  I think we'll only ever want to bind to a hypervisor once, since
the underlying hypervisor can't change on the fly (well, in principle it
might if you migrate, but you'll have more problems than just dealing
with the hook points).  And lazy binding doesn't buy much; I think its
much better to get it all out of the way at once, and then throw the
reloc info away.

> As another note, I do agree with Linus about the register usage
> arguments.  It is important.  I think it's been mentioned but what you
> could do is save nothing (so that "sti" and "cli" are just that and
> cost nothing), but the more complicated versions save and restore
> enough registers to operate.
>   

Right, that's pretty much what we do now.

> It all depends upon what you're trying to do.  For example, it's
> easy to use patching to make different PTE layouts be supportable
> in the same kernel image.  We do this on sparc64 since sun4v
> has a different PTE layout than sun4u, you can see the code in
> asm-sparc64/pgtable.h for details (search for "sun4v_*_patch")
>   

I see.  We want to do something conceptually like this, but we need to
handle more than just adjusting which of two constants to use as a
mask.  For example, Xen needs to run pfns through a pfn->mfn (machine
frame number) conversion and back when making/unpacking pagetable entries.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread David Miller
From: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Date: Mon, 19 Mar 2007 12:10:08 -0700

> All this is doable; I'd probably end up hacking boot/compressed/relocs.c
> to generate the appropriate reloc table.  My main concern is hacking the
> kernel build process itself; I'm unsure of what it would actually take
> to implement all this.

32-bit Sparc's btfixup might be usable as a guide.

Another point worth making is that for function calls you
can fix things up lazily if you want.

So you link, build the reloc tables, then link in a *.o file that
does provide the functions in the form of stubs.  The stubs intercept
the call, and patch the callsite, then revector to the real handler.

I don't like this idea actually because it essentially means you
either:

1) Only allow one setting of the operations

OR

2) Need to have code which walks the whole reloc table anyways
   to handle settings after the first so you can revector
   everyone back to the stubs and lazy reloc properly again

In fact forget I mentioned this idea :)

As another note, I do agree with Linus about the register usage
arguments.  It is important.  I think it's been mentioned but what you
could do is save nothing (so that "sti" and "cli" are just that and
cost nothing), but the more complicated versions save and restore
enough registers to operate.

It all depends upon what you're trying to do.  For example, it's
easy to use patching to make different PTE layouts be supportable
in the same kernel image.  We do this on sparc64 since sun4v
has a different PTE layout than sun4u, you can see the code in
asm-sparc64/pgtable.h for details (search for "sun4v_*_patch")
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread Jeremy Fitzhardinge
Linus Torvalds wrote:
> On Mon, 19 Mar 2007, Eric W. Biederman wrote:
>   
>> True.  You can use all of the call clobbered registers.
>> 
>
> Quite often, the biggest single win of inlining is not so much the code 
> size (although if done right, that will be smaller too), but the fact that 
> inlining DOES NOT CLOBBER AS MANY REGISTERS!
>   

Yes, that's something that we've been conscious of when designing the
pv_ops patching mechanism.  As it stands, each time you emit a call to a
pv-ops operation, it not only stores the start and end of the patchable
area, but also which registers are available for clobbering at that
callsite by the patched-in code.  For things like sti/cli, where the
surrounding code expects nothing to be clobbered, we make sure that the
regs are preserved on the pv-ops side, even though there's a call to a
normal C function in the middle.  That gives in the pvops backend the
flexibility to patch over that site with either some inline code or a
call out to some function which doesn't necessarily clobber the full
caller-save set (or even any registers).

> In short: people here seem to think that inlining is about avoiding the 
> call/ret sequence. Not so. The real advantages of inlining are elsewhere.
>   

Yes.  Probably the biggest win of inlining is constant propagation into
the inline function, but register-use flexibility is good small-scale
win (vs the micro-scale call/ret elimination).

> Side note, you can certainly fix things like this at least in theory, but 
> it requires:
>
>  - the "call" instruction that is used instead of the inlining should 
>basically have no callee-clobbers. Any paravirt_ops called this way
>should have a special calling sequence where they simple save and 
>restore all the registers they use.
>
>This is usually not that bad. Just create a per-architecture wrapper 
>function that saves/restores anything that the C calling convention on 
>that architecture says is clobbered by calls.
>   

Most of the places we intercept are normal C calls anyway, so this isn't
a big issue.  Its mainly the places where we replace single instructions
that this will have a big effect.

The trouble with this is that we're back to having to have special
wrappers around each call to hide the normal C calling convention from
the compiler, so that it knows that it has more registers to play with. 
This was the main complaint about the original version of the patch,
because it all looks very ugly.

>  - if the function has arguments, and the inlined sequence can take the 
>arguments in arbitrary registers, you are going to penalize the inlined 
>sequence anyway (by forcing some fixed arbitrary register allocation 
>policy).This thing is largely unfixable without some really extreme 
>compiler games (like post-processing the assembler output and having 
>different entry-points depending on where the arguments are..)
>   

Yeah, that doesn't sound like much fun.  I think using the normal
regparm calling convention will be OK.  Aside from some slightly longer
instruction encodings, all the registers are more or less
interchangeable anyway.

> .. it will obviously depend on how thngs are done whether these things are 
> useful or not. But it does mean that it's always a good idea to just have 
> a config option of "turn off all the paravirt crap, because it *does* add 
> overhead, and replacing instructions on the fly doesn't make that 
> overhead go away".

Yes, there's a big switch to turn all this off.  It would be nice if we
could get things to the point that it isn't necessary to have (ie,
running on bare hardware really is indistinguishable either way), but
we're a fair way from being able to prove that.  In the meantime, the
goal is to try to keep the source-level changes a local as possible so
that maintaining the CONFIG_PARAVIRT vs non-PARAVIRT distinction is
straightforward.


J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread Jeremy Fitzhardinge
Eric W. Biederman wrote:
> Rusty Russell <[EMAIL PROTECTED]> writes:
>
>   
>> On Sun, 2007-03-18 at 13:08 +0100, Andi Kleen wrote:
>> 
 The idea is _NOT_ that you go look for references to the paravirt_ops
 members structure, that would be stupid and you wouldn't be able to
 use the most efficient addressing mode on a given cpu, you'd be
 patching up indirect calls and crap like that.  Just say no...
 
>>> That wouldn't handle inlines though. At least some of the current
>>> paravirtops like cli/sti are critical enough to require inlining.
>>>   
>> Well, we'd patch the inline over the call if we have room.
>>
>> Magic patching would be neat, but the downsides are that (1) we can't
>> expand the patching room and (2) there's no way of attaching clobber
>> info to the call site (doing register liveness analysis is not
>> appealing).
>> 
>
> True.  You can use all of the call clobbered registers.
>   

The trouble is that there are places where we want to intercept things
like sti/cli in entry.S in a context where all the registers are in use,
so we can't generally make the assumption that caller-saved regs are
clobberable.  For any call-site in compiler-generated code we can make
that assumption, of course.

>> Now, this may not be fatal.  5 bytes is enough for all the native ops to
>> be patched inline.   For lguest this covers popf and pushf, but not cli
>> and sti (10 bytes): they'd have to be calls.
>>
>> As for clobber info, it turns out that almost all of the calls can
>> clobber %eax, which is probably enough.  We just need to mark the
>> handful of asm ones where this isn't true.
>> 
>
> I guess if the code is larger than a function call I'm failing to see
> the disadvantage in making it a direct function call.  Any modern
> processor ought to be able to predict it perfectly, and processors
> like the P4 may even optimize the call out of their L1 instruction
> cache.
>   

For current processors, I think i-cache pollution is really the only
downside to a call.  But maybe you could put multiple pv-op functions
into a cacheline if they're used in close proximity
(cli/sti/save_fl/restore_fl would all fit into a single cacheline).

> If what David is suggesting works, making all of these direct calls
> looks easy and very maintainable.   At which point patching
> instructions inline is quite possibly overkill.
>   

I spent some time looking at what it would take to get this going.  It
would probably look something like:

   1. make CONFIG_PARAVIRT select CONFIG_RELOCATABLE
   2. generate vmlinux
   3. extract relocs, and process the references to paravirt symbols
  into a table
   4. compile and link that table into the kernel again (like the ksyms
  dance)
   5. at boot time, use that table to redirect calls/references into the
  paravirt backend to the appropriate one

I think if we store the reloc references as section-relative, and the
reloc table itself is linked into its own section, then there's no need
for multipass linking to combine the reloc table into the kernel image,
since adding that table won't affect any existing reloc.

All this is doable; I'd probably end up hacking boot/compressed/relocs.c
to generate the appropriate reloc table.  My main concern is hacking the
kernel build process itself; I'm unsure of what it would actually take
to implement all this.

> Is it truly critical to inline any of these instructions?
>   

Possibly not, but I'd like to be able to say with confidence that
running a PARAVIRT kernel on bare hardware has no performance loss
compared to running a !PARAVIRT kernel.  There's the case of small
instruction sequences which have been replaced with calls (such as
sti/cli/push;popf/etc), and also the case of hooks which are noops on
native (like make_pte/pte_val, etc).  In those cases it would be nice to
patch in a replacement, either a real instruction or just nops.


J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread David Miller
From: Andi Kleen <[EMAIL PROTECTED]>
Date: Mon, 19 Mar 2007 11:57:28 +0100

> On Monday 19 March 2007 00:46, Jeremy Fitzhardinge wrote:
> > Andi Kleen wrote:
> > For example, say we wanted to put a general call for sti into entry.S,
> > where its expected it won't touch any registers.  In that case, we'd
> > have a sequence like:
> >
> > push %eax
> > push %ecx
> > push %edx
> > call paravirt_cli
> > pop %edx
> > pop %ecx
> > pop %eax
> 
> This cannot right now be expressed as inline assembly in the unwinder at all 
> because there is no way to inject the push/pops into the compiler generated
> ehframe tables.
> 
> [BTW I plan to resubmit the unwinder with some changes]

It's inability to handle sequences like the above sounds to me like
a very good argument to _not_ merge the unwinder back into the tree.

To me, that unwinder is nothing but trouble, it severly limits what
cases you can use special calling conventions via inline asm (and we
have done that on several occaisions) and even ignoring that the
unwinder only works half the time.

Please don't subject us to another couple months of hair-pulling only
to have Linus yank the thing out again, there are certainly more
useful things to spend time on :-)
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread Linus Torvalds


On Mon, 19 Mar 2007, Linus Torvalds wrote:
> 
> So *please* don't believe that you can make it "as cheap" to have some 
> automatic fixup of two sequences, one inlined and one as a "call".  It may 
> look so when you look at the single instruction generated, but you're 
> ignoring all the instructions *around* the site.

Side note, you can certainly fix things like this at least in theory, but 
it requires:

 - the "call" instruction that is used instead of the inlining should 
   basically have no callee-clobbers. Any paravirt_ops called this way
   should have a special calling sequence where they simple save and 
   restore all the registers they use.

   This is usually not that bad. Just create a per-architecture wrapper 
   function that saves/restores anything that the C calling convention on 
   that architecture says is clobbered by calls.

 - if the function has arguments, and the inlined sequence can take the 
   arguments in arbitrary registers, you are going to penalize the inlined 
   sequence anyway (by forcing some fixed arbitrary register allocation 
   policy).This thing is largely unfixable without some really extreme 
   compiler games (like post-processing the assembler output and having 
   different entry-points depending on where the arguments are..)

.. it will obviously depend on how thngs are done whether these things are 
useful or not. But it does mean that it's always a good idea to just have 
a config option of "turn off all the paravirt crap, because it *does* add 
overhead, and replacing instructions on the fly doesn't make that 
overhead go away".

Linus
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread Chris Wright
* Eric W. Biederman ([EMAIL PROTECTED]) wrote:
> Is it truly critical to inline any of these instructions?

I don't have any current measurements.  But we'd been aiming
at getting irq_{en,dis}able to a simple memory write to pda.
But simplicity, maintenance, etc. win over trimming a couple
cycles, so still worth real look.

thanks,
-chris
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread Linus Torvalds


On Mon, 19 Mar 2007, Eric W. Biederman wrote:
> 
> True.  You can use all of the call clobbered registers.

Quite often, the biggest single win of inlining is not so much the code 
size (although if done right, that will be smaller too), but the fact that 
inlining DOES NOT CLOBBER AS MANY REGISTERS!

The lack of register clobbering, and the freedom for the compiler to 
choose registers around an inlined function is usually the biggest win! If 
you can't do that, then inlining generally doesn't actually even help: a 
call/return to a single instruction isn't all that much slower than just 
doing the "cli" in the first place.

If we end up with a setup where any inlined instruction needs to act as if 
it was a function call (just with the "call" instruction papered over with 
the inlined instruction sequence), then there is no point to this at all. 

In short: people here seem to think that inlining is about avoiding the 
call/ret sequence. Not so. The real advantages of inlining are elsewhere.

So *please* don't believe that you can make it "as cheap" to have some 
automatic fixup of two sequences, one inlined and one as a "call".  It may 
look so when you look at the single instruction generated, but you're 
ignoring all the instructions *around* the site.

Linus
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread Eric W. Biederman
Rusty Russell <[EMAIL PROTECTED]> writes:

> On Sun, 2007-03-18 at 13:08 +0100, Andi Kleen wrote:
>> > The idea is _NOT_ that you go look for references to the paravirt_ops
>> > members structure, that would be stupid and you wouldn't be able to
>> > use the most efficient addressing mode on a given cpu, you'd be
>> > patching up indirect calls and crap like that.  Just say no...
>> 
>> That wouldn't handle inlines though. At least some of the current
>> paravirtops like cli/sti are critical enough to require inlining.
>
> Well, we'd patch the inline over the call if we have room.
>
> Magic patching would be neat, but the downsides are that (1) we can't
> expand the patching room and (2) there's no way of attaching clobber
> info to the call site (doing register liveness analysis is not
> appealing).

True.  You can use all of the call clobbered registers.

> Now, this may not be fatal.  5 bytes is enough for all the native ops to
> be patched inline.   For lguest this covers popf and pushf, but not cli
> and sti (10 bytes): they'd have to be calls.
>
> As for clobber info, it turns out that almost all of the calls can
> clobber %eax, which is probably enough.  We just need to mark the
> handful of asm ones where this isn't true.

I guess if the code is larger than a function call I'm failing to see
the disadvantage in making it a direct function call.  Any modern
processor ought to be able to predict it perfectly, and processors
like the P4 may even optimize the call out of their L1 instruction
cache.

If what David is suggesting works, making all of these direct calls
looks easy and very maintainable.   At which point patching
instructions inline is quite possibly overkill.

Is it truly critical to inline any of these instructions?

Eric
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread Jeremy Fitzhardinge
Andi Kleen wrote:
>>
>> push %eax
>> push %ecx
>> push %edx
>> call paravirt_cli
>> pop %edx
>> pop %ecx
>> pop %eax
>> 
>
> This cannot right now be expressed as inline assembly in the unwinder at all 
> because there is no way to inject the push/pops into the compiler generated
> ehframe tables.
>   

I was thinking more of entry.S rather than inline asms.  The only ones I
can think of which would be relevant are the interrupt operations in
spinlocks, and even then we get a bit more leeway.

> gcc normally doesn't generate push/pops around directly around the
> call site, but somewhere else due to the way its register allocator works.
> It can be anywhere in the function or even not there at all if the register
> didn't contain anything useful. And they're not necessarily push/pops of 
> course.
>   

Right, I'm not making any assumptions about what gcc generates for
calls, other than assuming it uses a standard "call X" direct call
instruction.

> So you would need to write it as inline assembly. I'm not sure it would
> be significantly cleaner than just having tables then.
>   

No, my intention is that the vast majority of pv_ops uses, which are
calls from C code, would simply be normal C calls, syntatically and
semantically.

> It's unlikely you can do much useful in 5 bytes I guess.
>   

I think the main value is retaining non-PARAVIRT native performance
levels.  In the native case, the inlined instructions amount to 1 or 2
bytes, and having small patch sites is an advantage because there's less
space wasted on nops.  My understanding is that a direct call has almost
zero overhead on modern processors, because both the call and the return
can be completely predicted and prefetched, so the threshhold at which
you make inline vs call tradeoff is pretty small.

> Regarding cli/sti: i've been actually thinking about changing it in the
> non paravirt kernel. IIRC most save_flags/restore_flags are inside
> spin_lock_irqsave/restore() and that is a separate function anyways
> so a little larger special case code is ok as long as it is not slower. 
> There is some evidence that at least on P4 a software cli/sti flag without 
> pushf/popf would be faster.
>   

There are also the ones in entry.S.  I suppose spinlocks do get used
more than syscalls and interrupts.


J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread Andi Kleen
On Monday 19 March 2007 00:46, Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> > Yes. All inline assembly tells gcc what registers are clobbered
> > and it fills in the tables. Hand clobbering in inline assembly cannot
> > be expressed with the current toolchain, so we moved all those
> > out of line.
> >
> > But again I'm not sure it will work anyways. For once you would
> > need large padding around the calls anyways for inline replacement --
> > how would you generate that? I expect you would need to put the calls
> > into asm() again and with that a custom annotiation format looks
> > reasonable.
>
> Inlining is most important for very small code: sti, cli, pushf;pop eax,
> etc (in many cases, no-ops).  We'd have at least 5 bytes to work in, and
> maybe more if there are surrounding push/pops to be consumed.
>
> For example, say we wanted to put a general call for sti into entry.S,
> where its expected it won't touch any registers.  In that case, we'd
> have a sequence like:
>
> push %eax
> push %ecx
> push %edx
> call paravirt_cli
> pop %edx
> pop %ecx
> pop %eax

This cannot right now be expressed as inline assembly in the unwinder at all 
because there is no way to inject the push/pops into the compiler generated
ehframe tables.

[BTW I plan to resubmit the unwinder with some changes]

>
>
> If we parse the relocs, then we'd find the reference to paravirt_cli.
> If we look at the byte before and see 0xe8, then we can see if its a
> call.  If we then work out in each direction and see matched push/pops,
> then we know what registers can be trashed in the call.  This also
> allows us to determine the callsite size, and therefore how much space
> we need for inlining.

gcc normally doesn't generate push/pops around directly around the
call site, but somewhere else due to the way its register allocator works.
It can be anywhere in the function or even not there at all if the register
didn't contain anything useful. And they're not necessarily push/pops of 
course.

So you would need to write it as inline assembly. I'm not sure it would
be significantly cleaner than just having tables then.

> So in this case, we see that there are 5 bytes for the call and a
> further 6 bytes of push/pops available for inlining.
>
> Of course this is hand-written code anyway, so there's no particular
> burden to having some extra metadata stashed away in another section.
> For compiler-generated code, we know that it's already expecting
> standard C ABI calling conventions.  The downside, of course, is that
> only the 5 byte call space is available for inline patching.

It's unlikely you can do much useful in 5 bytes I guess.

Regarding cli/sti: i've been actually thinking about changing it in the
non paravirt kernel. IIRC most save_flags/restore_flags are inside
spin_lock_irqsave/restore() and that is a separate function anyways
so a little larger special case code is ok as long as it is not slower. 
There is some evidence that at least on P4 a software cli/sti flag without 
pushf/popf would be faster.

-Andi

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-18 Thread Rusty Russell
On Sun, 2007-03-18 at 13:08 +0100, Andi Kleen wrote:
> > The idea is _NOT_ that you go look for references to the paravirt_ops
> > members structure, that would be stupid and you wouldn't be able to
> > use the most efficient addressing mode on a given cpu, you'd be
> > patching up indirect calls and crap like that.  Just say no...
> 
> That wouldn't handle inlines though. At least some of the current
> paravirtops like cli/sti are critical enough to require inlining.

Well, we'd patch the inline over the call if we have room.

Magic patching would be neat, but the downsides are that (1) we can't
expand the patching room and (2) there's no way of attaching clobber
info to the call site (doing register liveness analysis is not
appealing).

Now, this may not be fatal.  5 bytes is enough for all the native ops to
be patched inline.   For lguest this covers popf and pushf, but not cli
and sti (10 bytes): they'd have to be calls.

As for clobber info, it turns out that almost all of the calls can
clobber %eax, which is probably enough.  We just need to mark the
handful of asm ones where this isn't true.

Thoughts?
Rusty.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-18 Thread Jeremy Fitzhardinge
Andi Kleen wrote:
> Yes. All inline assembly tells gcc what registers are clobbered
> and it fills in the tables. Hand clobbering in inline assembly cannot
> be expressed with the current toolchain, so we moved all those
> out of line.
>
> But again I'm not sure it will work anyways. For once you would
> need large padding around the calls anyways for inline replacement --
> how would you generate that? I expect you would need to put the calls
> into asm() again and with that a custom annotiation format looks reasonable.

Inlining is most important for very small code: sti, cli, pushf;pop eax,
etc (in many cases, no-ops).  We'd have at least 5 bytes to work in, and
maybe more if there are surrounding push/pops to be consumed.

For example, say we wanted to put a general call for sti into entry.S,
where its expected it won't touch any registers.  In that case, we'd
have a sequence like:

push %eax
push %ecx
push %edx
call paravirt_cli
pop %edx
pop %ecx
pop %eax
  

If we parse the relocs, then we'd find the reference to paravirt_cli. 
If we look at the byte before and see 0xe8, then we can see if its a
call.  If we then work out in each direction and see matched push/pops,
then we know what registers can be trashed in the call.  This also
allows us to determine the callsite size, and therefore how much space
we need for inlining.

So in this case, we see that there are 5 bytes for the call and a
further 6 bytes of push/pops available for inlining.

Of course this is hand-written code anyway, so there's no particular
burden to having some extra metadata stashed away in another section. 
For compiler-generated code, we know that it's already expecting
standard C ABI calling conventions.  The downside, of course, is that
only the 5 byte call space is available for inline patching.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-18 Thread Andi Kleen
On Sun, Mar 18, 2007 at 10:29:10AM -0700, Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> > You could use the dwarf2 unwind tables. They have exact information
> > what register has what. But it would likely get complicated.
> 
> Yes.  And would they be accurate for hand-written asm, which is where we
> have this problem?

Yes. All inline assembly tells gcc what registers are clobbered
and it fills in the tables. Hand clobbering in inline assembly cannot
be expressed with the current toolchain, so we moved all those
out of line.

But again I'm not sure it will work anyways. For once you would
need large padding around the calls anyways for inline replacement --
how would you generate that? I expect you would need to put the calls
into asm() again and with that a custom annotiation format looks reasonable.

-Andi
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-18 Thread Jeremy Fitzhardinge
Andi Kleen wrote:
> You could use the dwarf2 unwind tables. They have exact information
> what register has what. But it would likely get complicated.

Yes.  And would they be accurate for hand-written asm, which is where we
have this problem?

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-18 Thread Andi Kleen
> The bigger problem is that you don't know what registers you can
> clobber.  For the pv_ops in hand-written asm, that a big pain.  The code
> either has to assume the worst, or the "relocator" has to do something
> more sophisticated (like look for push/pop pairs surrounding the call,
> perhaps?).

You could use the dwarf2 unwind tables. They have exact information
what register has what. But it would likely get complicated.

-Andi
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-18 Thread Jeremy Fitzhardinge
Andi Kleen wrote:
>> The idea is _NOT_ that you go look for references to the paravirt_ops
>> members structure, that would be stupid and you wouldn't be able to
>> use the most efficient addressing mode on a given cpu, you'd be
>> patching up indirect calls and crap like that.  Just say no...
>> 
>
> That wouldn't handle inlines though. At least some of the current
> paravirtops like cli/sti are critical enough to require inlining.
>   

You could special-case it in the thing handling the relocs; if you're
relocating to point to a function which you have an inline substitution,
then inline.

The bigger problem is that you don't know what registers you can
clobber.  For the pv_ops in hand-written asm, that a big pain.  The code
either has to assume the worst, or the "relocator" has to do something
more sophisticated (like look for push/pop pairs surrounding the call,
perhaps?).

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-18 Thread Andi Kleen
> The idea is _NOT_ that you go look for references to the paravirt_ops
> members structure, that would be stupid and you wouldn't be able to
> use the most efficient addressing mode on a given cpu, you'd be
> patching up indirect calls and crap like that.  Just say no...

That wouldn't handle inlines though. At least some of the current
paravirtops like cli/sti are critical enough to require inlining.

-Andi
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-18 Thread Jeremy Fitzhardinge
David Miller wrote:
> The idea is _NOT_ that you go look for references to the paravirt_ops
> members structure, that would be stupid and you wouldn't be able to
> use the most efficient addressing mode on a given cpu, you'd be
> patching up indirect calls and crap like that.  Just say no...
>
> Instead you get rid of paravirt ops completely, and you call functions
> whose symbol name will not resolve in the initial kernel link.
>   

Yeah, I came to that conclusion after thinking about it for a while. 
Thanks for the pointer to the sparc stuff; it looks very interesting.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-17 Thread David Miller
From: Rusty Russell <[EMAIL PROTECTED]>
Date: Sat, 17 Mar 2007 21:33:58 +1100

> On Fri, 2007-03-16 at 13:38 -0700, Jeremy Fitzhardinge wrote:
> > David Miller wrote:
> > > Perhaps the problem can be dealt with using ELF relocations.
> > >
> > > There is another case, discussed yesterday on netdev, where run-time
> > > resolution of ELF relocations would be useful (for
> > > very-very-very-read-only variables) so if it can solve this problem
> > > too it would be nice to have a generic infrastructure for it.
> > 
> > That's an interesting idea.  Have you or anyone else looked at what it
> > would take to code up?
> > 
> > For this case, I guess you'd walk the relocs looking for references into
> > the paravirt_ops structure.  You'd need to check that was a reference
> > from an indirect jump or call instruction, which would identify a
> > patchable callsite.  The offset into the pv_ops structure would identify
> > which operation is involved.
> 
> I wrote a whole email on ways to do this, BUT...

The idea is _NOT_ that you go look for references to the paravirt_ops
members structure, that would be stupid and you wouldn't be able to
use the most efficient addressing mode on a given cpu, you'd be
patching up indirect calls and crap like that.  Just say no...

Instead you get rid of paravirt ops completely, and you call functions
whose symbol name will not resolve in the initial kernel link.

You do an initial link of the kernel, look for the unresolved symbols
in the ELF relocation tables (just like the linker does), and put
those references into a table that is use to patch things up and you
can use standard ELF relocation code to handle this, exactly like code
we already have for module loading in the kernel already.

This idea is about 15 years old, sparc32 has been doing exactly this
via something called "btfixup" to handle the page table, TLB, and
cache differences of 15 different cpu+cache type combinations.

> #define pv_patch(call, args...) \
>   asm volatile(":"); 
>   call(args);
>   asm volatile("8889:"
>[ stuff to put 8889,  and call in fixup section ]

Please, use ELF and it's powerful and clean existing way to
do this please. :-)

> > What are the netdev requirements?
> 
> Reading Ben LaHaise's (very cool!) patch, it's not clear that using
> reloc postprocessing is going to be clearer than open-coding it as he
> has done.

Ben's case can be handled in the same way.  Just do not define the
symbols, pre-link, look for the references in the relocation tables,
and run through that when you do the set_very_readonly() or
install_paravirt_ops() thing.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-17 Thread Rusty Russell
On Fri, 2007-03-16 at 13:38 -0700, Jeremy Fitzhardinge wrote:
> David Miller wrote:
> > Perhaps the problem can be dealt with using ELF relocations.
> >
> > There is another case, discussed yesterday on netdev, where run-time
> > resolution of ELF relocations would be useful (for
> > very-very-very-read-only variables) so if it can solve this problem
> > too it would be nice to have a generic infrastructure for it.
> 
> That's an interesting idea.  Have you or anyone else looked at what it
> would take to code up?
> 
> For this case, I guess you'd walk the relocs looking for references into
> the paravirt_ops structure.  You'd need to check that was a reference
> from an indirect jump or call instruction, which would identify a
> patchable callsite.  The offset into the pv_ops structure would identify
> which operation is involved.

I wrote a whole email on ways to do this, BUT...

Perhaps our patching code can already be vastly simplified to something
like:

#define pv_patch(call, args...) \
asm volatile(":"); 
call(args);
asm volatile("8889:"
 [ stuff to put 8889,  and call in fixup section ]

The patching code doesn't even need to decode between those two: it
simply looks for an indirect call insn (0xff 0xd?).  If it finds more
than one, it doesn't patch.  If it only finds one, it patches.  We'll
probably hit them all just doing that.

> What are the netdev requirements?

Reading Ben LaHaise's (very cool!) patch, it's not clear that using
reloc postprocessing is going to be clearer than open-coding it as he
has done.

Rusty.


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [Xen-devel] Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-17 Thread Rusty Russell
On Fri, 2007-03-16 at 10:24 +0100, Ingo Molnar wrote:
> * Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
> 
> > Wrap a set of interesting paravirt_ops calls in a wrapper which makes 
> > the callsites available for patching.  Unfortunately this is pretty 
> > ugly because there's no way to get gcc to generate a function call, 
> > but also wrap just the callsite itself with the necessary labels.
> > 
> > This patch supports functions with 0-4 arguments, and either void or 
> > returning a value.  64-bit arguments must be split into a pair of 
> > 32-bit arguments (lower word first).  Small structures are returned in 
> > registers.
> 
> ugh. This is beyond ugly! Why dont we just compile two images, one for 
> Xen and one for native, do two passes to get those two images and 
> 'merge' them into a single vmlinuz (so that we still have a 'single' 
> kernel unit to deal with on the distro side). This way we avoid all this 
> crazy, limited, fragile patchery business...

But with lguest, VMI and kvm I don't think that's a good idea.

For background: the current patching code is ugly too, but it only deals
with the 6 most common functions, so it's contained.  This gets us
pretty close to !CONFIG_PARAVIRT performance.  (But slowdown is still
measurable).

We get worse with Xen wanting to alter mkpte et al.  By the time we
patch another 6 functions (each one slightly different), we get pretty
close to general coverage anyway, so it's clearer to do them all.

I think the question is, can we do it better than this?  My previous
attempts (which lead to the current code) ranged from ugly to very ugly
8(

Rusty.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-17 Thread Zachary Amsden
Jeremy Fitzhardinge wrote:
> I think the suggestion is much simpler.  If you convince gcc/binutils to
> leave the .reloc section in vmlinux, and make that available to the
> kernel itself, then you can scan all the kernel's relocs to find ones
> which refer to paravirt_ops, and use those to determine which are
> callsites that can be patched.
>   

Yes, that is pretty nice.

> The main upside is that all the callsites are just normal C calls;
> there's no special syntax or strange macros, and we get the full benefit
> of typechecking, etc.
>
> But I can see a few downsides compared the current scheme:
>
>1. Identifying the callsites is a somewhat hackish process of looking
>   at a reloc and doing a bit of dissassembly to see what is using
>   the reloc, to identify calls and jumps
>2. There's nothing explicit to tell us how much space there is to
>   patch into; we just have to assume sizeof(indirect call/jmp)
>3. There's no information about the register environment at the
>   callsite, so we just have to adopt normal C ABI rules.  For the
>   patch sites in hand-written asm, this could be tricky.
>4. gcc could do strange things which prevent detection of patch
>   sites.  For example, it might CSE the value of, say,
>   paravirt_ops.irq_enable, which would be a reasonable optimisation,
>   but prevent any of the resulting indirect calls from being
>   patched.  In general it relies on gcc to generate identifiable
>   callsites, which is a bit unpredictable.
>5. There's still a moderate amount of binutils hackery to get the
>   relocs into the right form, and there's plenty of scope for it to
>   screw up.
>   

And yes, those are nasty points.  I think I'd be interested in seeing 
more discussion on it, perhaps those issues could be worked out.

>> [ Roswell technology deleted ]
>> 

Nack.  Everyone needs Roswell technology.

Actually, that was not a serious proposal.  I think the effort and 
complexity would likely not justify the gain.  But I still had to throw 
it out, since it is what we use on Betelgeuse.

Z
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-16 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
> I like this code very much; although it is unavoidably ugly, it is a
> nice general mechanism for doing code rewriting.  Much more
> elaboration on this below.
>

Thanks.

> static inline void local_irq_restore(const unsigned long flags)
> {
>vmi_wrap_call(
>SetInterruptMask, "pushl %0; popfl",
>VMI_NO_OUTPUT,
>1, VMI_IREG1 (flags),
>XCONC("cc", "memory"));
> }
>
> So the constraints are obvious and tied to the inline assembly.  But
> Jeremy seems to have done even better with the vcall stuff.  Prettier:
>
> +PVOP_VCALL0(setup_boot_clock);

Yeah, it doesn't try as hard as your example, so its all based around
the function call ABI.  If you want to inline something, you need to do
that elsewhere, which I guess is OK because that's not the
common case (only very simple cases can be replaced by inlines, and only
a few of those are worth doing).

> We went through this design exercise, and thought it was pretty
> promising.  Basically, you would reserve a set of "local" relocation
> types that should never be emitted by the toolchain.  Then you can
> have complex relocations, such as "replace pushf; popf %0 with
> arbitrary code."  You can even leave the arguments unfixed and grant
> the compiler register allocation, as long as you took care to encode
> the input / output registers somewhere (in a .reloc section of some
> sort, or encoded in the relocation type itself).

I'm pretty sure that's not what he means.  The big objection to the
PVOP_* stuff is the fact that there are these massive macros full of
inline asm to wrap the calls, which have to be invoked in a fragile
type-unsafe way.  Adding custom relocs would suffer the same problem,
since you'd need inline asm to deal with them, and I'm deathly
frightened of whatever binutils would do if you mean real relocs.

I think the suggestion is much simpler.  If you convince gcc/binutils to
leave the .reloc section in vmlinux, and make that available to the
kernel itself, then you can scan all the kernel's relocs to find ones
which refer to paravirt_ops, and use those to determine which are
callsites that can be patched.

The main upside is that all the callsites are just normal C calls;
there's no special syntax or strange macros, and we get the full benefit
of typechecking, etc.

But I can see a few downsides compared the current scheme:

   1. Identifying the callsites is a somewhat hackish process of looking
  at a reloc and doing a bit of dissassembly to see what is using
  the reloc, to identify calls and jumps
   2. There's nothing explicit to tell us how much space there is to
  patch into; we just have to assume sizeof(indirect call/jmp)
   3. There's no information about the register environment at the
  callsite, so we just have to adopt normal C ABI rules.  For the
  patch sites in hand-written asm, this could be tricky.
   4. gcc could do strange things which prevent detection of patch
  sites.  For example, it might CSE the value of, say,
  paravirt_ops.irq_enable, which would be a reasonable optimisation,
  but prevent any of the resulting indirect calls from being
  patched.  In general it relies on gcc to generate identifiable
  callsites, which is a bit unpredictable.
   5. There's still a moderate amount of binutils hackery to get the
  relocs into the right form, and there's plenty of scope for it to
  screw up.


> [ Roswell technology deleted ]

J
___
Virtualization mailing list
[EMAIL PROTECTED]
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-16 Thread Zachary Amsden
Jeremy Fitzhardinge wrote:
> Well, one thing to make clear is this is absolutely not a Xen-specific
> patch or piece of code.  This is part of the paravirt_ops infrastructure
> designed to remove the overhead of all the indirect calls which are
> scattered all over the place.  (Perhaps I should post the general
> paravirt and Xen specific patches in separate patch series to make this
> clear...).
>
> The idea is to wrap the callsite itself with in the same manner as the
> other altinstructions so that the general patcher can, at the very
> least, convert the indirect call to a direct one, or nop it out if its
> an indirect call.  This means that a pv_ops implementation can get about
> 90% of the benefit of patching without any extra effort.
>   

I like this code very much; although it is unavoidably ugly, it is a 
nice general mechanism for doing code rewriting.  Much more elaboration 
on this below.

> So, I disagree with your characterisation that its "limited"; this is a
> pretty general mechanism.  The fragile part is in using the PVOP_*
> macros properly to match the ABI's calling convention, particularly with
> tricky cases like passed and returned structures and 64-bit args.  But
> that just needs to be done once in one place, and is otherwise
> self-contained.
>   

You could just use the VMI ABI, then patch everything at runtime to call 
into the Xen paravirt-ops backend ;)

> I would love a better mechanism.  I played with using things like gcc's
> builtin_apply stuff, and so on, but I could find no way to get gcc to
> set up the args and then be able to just emit the call itself under asm
> control.
>   

I fought tooth and nail to get something cleaner than this for VMI back 
when it was a subarch.  In the end, the best I could do was wrap the 
constraints into prettier macros so the asm volatile stuff wasn't 
sticking out everywhere.  It was pretty, but the macros were so 
grotesque that I was exiled from my home planet.

static inline void local_irq_restore(const unsigned long flags)
{
vmi_wrap_call(
SetInterruptMask, "pushl %0; popfl",
VMI_NO_OUTPUT,
1, VMI_IREG1 (flags),
XCONC("cc", "memory"));
}

So the constraints are obvious and tied to the inline assembly.  But 
Jeremy seems to have done even better with the vcall stuff.  Prettier:

+   PVOP_VCALL0(setup_boot_clock);



>
> I haven't looked at Dave's reply in detail, but I saw some mention of
> using relocs.  The idea is intriguing , but I don't quite see how it
> would all fit together.
>   

We went through this design exercise, and thought it was pretty 
promising.  Basically, you would reserve a set of "local" relocation 
types that should never be emitted by the toolchain.  Then you can have 
complex relocations, such as "replace pushf; popf %0 with arbitrary 
code."  You can even leave the arguments unfixed and grant the compiler 
register allocation, as long as you took care to encode the input / 
output registers somewhere (in a .reloc section of some sort, or encoded 
in the relocation type itself).

Now you can make complex decisions at runtime, and apply choice 
functions to these relocations that can cope with a variety of different 
circumstances - you could encode not just paravirt-ops as relocations, 
but all of the alternative instructions, and smp alternatives, and even 
higher level constructs, such as choices made by the user with the 
kernel command line - some potential examples:

acpi=noirq
idle=halt

With proper synchronization, using something like stop_machine_run, you 
can even make these choices dynamically, and then relink the kernel in 
place to take faster paths.  And the technique is universal, so you 
could use it cross architecture, which would be really helpful for 
architectures that say, have really slow indirect branches.

Once the technique gains wide acceptance, you could use it for all 
kernel interfaces which have static function pointers for the post-init 
lifetime of the kernel.  Which might contribute to a global performance 
improvement of perhaps a couple percent.  But the cost is clearly the 
complexity.

I just had a slightly interesting idea - you could even catch bugs where 
dynamic assignments to function pointers fail to update the appropriate 
patch sites by checking for non .init code sections which write through 
accelerated_fn_ptr_t's using static checking from sparse.

Is that sort of what you were thinking of Dave?

Zach
___
Virtualization mailing list
[EMAIL PROTECTED]
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-16 Thread Jeremy Fitzhardinge
David Miller wrote:
> Perhaps the problem can be dealt with using ELF relocations.
>
> There is another case, discussed yesterday on netdev, where run-time
> resolution of ELF relocations would be useful (for
> very-very-very-read-only variables) so if it can solve this problem
> too it would be nice to have a generic infrastructure for it.

That's an interesting idea.  Have you or anyone else looked at what it
would take to code up?

For this case, I guess you'd walk the relocs looking for references into
the paravirt_ops structure.  You'd need to check that was a reference
from an indirect jump or call instruction, which would identify a
patchable callsite.  The offset into the pv_ops structure would identify
which operation is involved.  That would be enough to use the existing
paravirt_ops patch machinery to insert a patch, though it doesn't
provide quite as much information as the current scheme.  The current
scheme also tells us how much space is available for patching over, and
what registers the patched-in code can use/clobber.

What are the netdev requirements?

J
___
Virtualization mailing list
[EMAIL PROTECTED]
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-16 Thread Jeremy Fitzhardinge
Ingo Molnar wrote:

> * David Miller <[EMAIL PROTECTED]> wrote:

> > Perhaps the problem can be dealt with using ELF relocations.
> > 
> > There is another case, discussed yesterday on netdev, where run-time 
> > resolution of ELF relocations would be useful (for 
> > very-very-very-read-only variables) so if it can solve this problem 
> > too it would be nice to have a generic infrastructure for it.
>   


> yeah, and i really think this is very fundamental: [...]

I think what Dave is suggesting is that we use the reloc information the
compiler generates to find the patchable callsites rather than have
special wrappers.  This is an interesting idea.

> Limited, instruction-level patching like alternatives.h is fine because 
> that makes it easier to support multiple, incompatible CPU 
> architectures, without having to do a hugely intrusive split at the 
> kernel RPM level.
>
> but the level of 'binary patching' done by the paravirt and Xen goes way 
> beyond that,

Not really.  There are only three cases:

   1. replace an indirect call with a direct call
   2. nop out a callsite
   3. patch in a short inline sequence

And as I pointed out, this is used by all pv_op backends, using a common
piece of code to implement at least 1 and 2.  3 could be implemented
semi-generically by using rules like "if (func == native_sti) {
patch("sti"); }", which would cover many cases where a hypervisor
doesn't need any special handling for a particular operation.

The goal is to eliminate the cost of the indirect calls with nice
predictable indirect calls.  There's a 1 byte/callsite overhead, but I
don't think that's a horrible overhead.

And, at worst, its only a little more complex than the kinds of
transformations.

Ideally, its a mechanism which could be used elsewhere.  It applies with
you have some kind of ops_vector table which is updated once (or perhaps
very rarely), and you don't want to wear the overhead of indirect calls
everywhere.

>  and the changes here really underscore that we:
>
>   _should not emulate the closed source world_
>
> There the only solution is to binary-patch - because they have no source 
> code. But here, we've got all the source code.
>   

I don't think this is a relevant comparison.  This is purely a matter of
optimising out unnecessary indirect calls.

> nobody wants to boot a xen-paravirt kernel from a floppy, so image size 
> is not an issue. In-RAM overhead would in fact be /reduced/, because 
> currently all the paravirt overhead hits both the native and the 
> paravirt kernel. Nor would /all/ of the vmlinuz have to be replicated in 
> the images - it's enough to replicate only those functions that truly 
> differ between the two build methods.

One of the explicit goals of pv_ops was to allow a single kernel to
either boot on native hardware or under any one of the supported
hypervisors, explicitly to avoid having to manage multiple kernel
images.  Compiling the kernel N+1 times for N hypervisors, and then
bundling them up in some kind of multi-image format doesn't seem like a
particularly good tradeoff.  The kernel RPM on my machine here is
already ~50Mbytes; expanding that to 250Mbytes to support native, Xen,
vmi, lguest and kvm doesn't seem reasonable.

J
___
Virtualization mailing list
[EMAIL PROTECTED]
https://lists.linux-foundation.org/mailman/listinfo/virtualization