Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-08 Thread Eric W. Biederman
>
> The version I wound up testing is below, and it doesn't work.
> I still get "No irq handler for vector" warnings as well as
> a couple of complaints from lock/irq debugging.  The debugging
> doesn't worry me.  The fact that I don't have a good way to ensure
> I have no more irqs in flight does.
>
> So unless someone can find a sure way to drain the irqs in flight,
> I can't migrate an irq from process context, and looking at irr and
> handling a pending irq appears required.

Bah.  I had not taken into account that the local apic, despite
being tightly coupled with the cpu, is for programming purposes
an asynchronous device.  If I want to give it time to react to something
I need to read from it.

The routine below actually works. 

My remaining practical question is whether this can be done cleanly.

Ingo's lock debugging dislikes this routine.
By using raw_local_irq_enable I have avoided all but one message, on
the irq return path; I haven't quite worked out where it comes from.

But at least this version feels like it could be done better
(less inline? different helpers?) by someone.

For interrupts coming through a sane interrupt controller moving
this into process context would certainly simplify things.  For edge
triggered interrupts coming through an io_apic I'm not at all certain
what makes sense.

When the routine below is used to ack an edge triggered interrupt
it runs before the edge triggered interrupt handler, so losing an
edge shouldn't happen (we haven't acknowledged the hardware yet),
and even if we do the device driver gets to run at least once.

So doing migration in the irq handler still looks like the best
solution even if it is ugly.  As long as the little bit of
stack overhead isn't a problem I think enabling interrupts to
clear out any pending irqs certainly looks simpler. 

In another vein.  I went and looked through all of Intel's and
AMD's public errata that I could find and there weren't any associated
with irr or isr, so I think my previous version of the code is still
sane, and not likely to break.

I can improve it a little by getting the vector as:
"vector = ~ get_irq_regs()->orig_rax;" instead of reading
ISR.  That still leaves reading the pending bit in ISR and the
other funny tricks.

I'm conflicted between the two approaches a little because playing
games with enabling interrupts in an interrupt handler seems to
have some weird corner cases.

static void ack_apic(unsigned int irq)
{
#if defined(CONFIG_GENERIC_PENDING_IRQ) || defined(CONFIG_IRQBALANCE)
	struct irq_desc *desc;

	desc = irq_desc + irq;
	if (likely(!(desc->status & IRQ_MOVE_PENDING)))
		goto simple;

	if (hardirq_count() != HARDIRQ_OFFSET)
		goto simple;

	desc->chip->mask(irq);
	ack_APIC_irq();

	/* Ensure all of the irq handlers for this irq have completed
	 * before we migrate it.
	 */
	spin_unlock(&desc->lock);
	raw_local_irq_enable();
	apic_read(APIC_ID);
	raw_local_irq_disable();
	spin_lock(&desc->lock);

	move_masked_irq(irq);
	desc->chip->unmask(irq);
	return;
simple:
#endif
	ack_APIC_irq();
}


BUG: at /home/eric/projects/linux/linux-2.6-devel/kernel/lockdep.c:1860 
trace_hardirqs_on()

Call Trace:
 <IRQ>  [8048562f] trace_hardirqs_on_thunk+0x35/0x37
 [80290401] generic_delete_inode+0x0/0x13e
 [8020a0fc] restore_args+0x0/0x30
 [80290401] generic_delete_inode+0x0/0x13e
 [8021648d] ack_apic+0x63/0x99
 [80216485] ack_apic+0x5b/0x99
 [8025881e] handle_fasteoi_irq+0xc1/0xd1
 [80290401] generic_delete_inode+0x0/0x13e
 [8020c0de] do_IRQ+0x89/0xf3
 [80208ce8] default_idle+0x35/0x51
 [80208cb3] default_idle+0x0/0x51
 [8020a0a6] ret_from_intr+0x0/0xf
 <EOI>  [80290401] generic_delete_inode+0x0/0x13e
 [80208cb3] default_idle+0x0/0x51
 [80208ce8] default_idle+0x35/0x51
 [80208cea] default_idle+0x37/0x51
 [80208ce8] default_idle+0x35/0x51
 [80208d5a] cpu_idle+0x56/0x75
 [808b9a69] start_secondary+0x481/0x490

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-08 Thread Eric W. Biederman
[EMAIL PROTECTED] (Eric W. Biederman) writes:

> Ingo Molnar <[EMAIL PROTECTED]> writes:
>
>> * Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>>
>>> Ingo would it be reasonable to get a wait queue so I can wait for an 
>>> irq that needs the delayed disable action to actually become masked?
>>
>> that might make sense, but what will do the wakeup - incidental IRQ 
>> arriving on the new CPU? Isnt that a bit risky - maybe the device wont 
>> generate IRQs for a really long time.
>
> I still need to test this, but I believe I have found a simpler
> way to avoid irr problems during migration, and I believe the code
> works equally well with either edge or level triggered interrupts.
>
> The idea is this: Instead of trying to test for and handle a pending
> irr, simply enable local interrupts after disabling and
> acknowledging the irq so that anything pending will be processed
> before we perform the migration operation.
>
> I don't think the edge case cares about the mask/ack order, but
> masking before acking appears important for the level triggered
> case, so we might as well use that order for both.
>
> Does this look like a sane way to handle this?

The version I wound up testing is below, and it doesn't work.
I still get "No irq handler for vector" warnings as well as
a couple of complaints from lock/irq debugging.  The debugging
doesn't worry me.  The fact that I don't have a good way to ensure
I have no more irqs in flight does.

So unless someone can find a sure way to drain the irqs in flight,
I can't migrate an irq from process context, and looking at irr and
handling a pending irq appears required.

Eric

static void ack_apic(unsigned int irq)
{
#if defined(CONFIG_GENERIC_PENDING_IRQ) || defined(CONFIG_IRQBALANCE)
	if (unlikely((irq_desc[irq].status & IRQ_MOVE_PENDING) &&
		     (hardirq_count() == HARDIRQ_OFFSET))) {
		struct irq_desc *desc = irq_desc + irq;

		desc->chip->mask(irq);
		ack_APIC_irq();

		/* Ensure all of the irq handlers for this irq have completed
		 * before we migrate it.
		 */
		raw_local_irq_enable();
		cpu_relax();
		raw_local_irq_disable();
		synchronize_irq(irq);

		move_masked_irq(irq);
		desc->chip->unmask(irq);
		return;
	}
#endif
	ack_APIC_irq();
}


Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-08 Thread Eric W. Biederman
Ingo Molnar <[EMAIL PROTECTED]> writes:

> * Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>
>> Ingo would it be reasonable to get a wait queue so I can wait for an 
>> irq that needs the delayed disable action to actually become masked?
>
> that might make sense, but what will do the wakeup - incidental IRQ 
> arriving on the new CPU? Isnt that a bit risky - maybe the device wont 
> generate IRQs for a really long time.

I still need to test this, but I believe I have found a simpler
way to avoid irr problems during migration, and I believe the code
works equally well with either edge or level triggered interrupts.

The idea is this: Instead of trying to test for and handle a pending
irr, simply enable local interrupts after disabling and
acknowledging the irq so that anything pending will be processed
before we perform the migration operation.

I don't think the edge case cares about the mask/ack order, but
masking before acking appears important for the level triggered
case, so we might as well use that order for both.

Does this look like a sane way to handle this?

static void ack_apic(unsigned int irq)
{
	struct irq_desc *desc = irq_desc + irq;
	int do_unmask_irq = 0;

	if (unlikely((irq_desc[irq].status & IRQ_MOVE_PENDING) && !in_irq())) {
		do_unmask_irq = 1;
		desc->chip->mask(irq);
	}
	ack_APIC_irq();
	if (unlikely(do_unmask_irq)) {
		/* Don't let pending irqs accumulate */
		local_irq_enable();
		synchronize_irq(irq);
		move_masked_irq(irq);
		local_irq_disable();
		desc->chip->unmask(irq);
	}
}

Eric




Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-06 Thread Eric W. Biederman
Ingo Molnar <[EMAIL PROTECTED]> writes:

> * Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>
>> Ingo would it be reasonable to get a wait queue so I can wait for an 
>> irq that needs the delayed disable action to actually become masked?
>
> that might make sense, but what will do the wakeup - incidental IRQ 
> arriving on the new CPU? 

That is what I was thinking.

> Isnt that a bit risky - maybe the device wont 
> generate IRQs for a really long time.

Well this is in a user space context called from user space and it
exactly matches the semantics we have now.  If we make it an
interruptible sleep the user space process shouldn't block.

I guess the other thing to do is do it in a non-blocking fashion
and just call schedule_work from the interrupt context when the
irq is disabled.  For i386 with its in kernel irq scheduler
that might be better.

I think the nasty case is probably what to do when it is
the timer interrupt we are dealing with.

Hmm.  I think I should look up what the rules are for
calling local_irq_enable when in interrupt context.  That
might be another way to satisfy this problem.

If local irqs are enabled I don't have to worry about the irr
register.

You've got me brainstorming now.

Eric


Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-06 Thread Ingo Molnar

* Eric W. Biederman <[EMAIL PROTECTED]> wrote:

> Ingo would it be reasonable to get a wait queue so I can wait for an 
> irq that needs the delayed disable action to actually become masked?

that might make sense, but what will do the wakeup - incidental IRQ 
arriving on the new CPU? Isn't that a bit risky - maybe the device won't 
generate IRQs for a really long time.

Ingo


Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-06 Thread Eric W. Biederman

Ingo would it be reasonable to get a wait queue so I can wait for an
irq that needs the delayed disable action to actually become masked?

There are a lot of practical reasons why I cannot reprogram an
unmasked irq.  However if we can wait it should be possible to get all of
the convoluted irq migration logic out of interrupt context, and into
process context.

If I know interrupts have been enabled on the cpus where an irq could be
delivered after the irq has been masked, I know that the irq is not
currently in progress.

Therefore I know the irq will not be in progress again until the irq
is unmasked.

Once I know the irq will not be received again it is safe to modify
the irq handling data structures and the interrupt controller.

I think the only generic piece missing from this equation is a wait
queue so I can wait for the irq to become masked.  Say a generic
sleeping mask_irq() call?

Eric


Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-06 Thread Eric W. Biederman
"Luigi Genoni" <[EMAIL PROTECTED]> writes:

> btw, I tested it on 8 CPU and there was no way to reproduce the bug (no messages 
> appeared and load average was quite as high as expected). 
> Going to test on a 16 CPU (the same that triggered the bug at the beginning) 
> immediately when it will be free.

Thanks.  Hopefully that will confirm I have properly tracked this bug.

Eric


Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-06 Thread Eric W. Biederman

Ingo thanks for the review.

Ingo Molnar <[EMAIL PROTECTED]> writes:

> * Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>
>> When making the interrupt vectors per cpu I failed to handle a case 
>> during irq migration.  If the same interrupt comes in while we are 
>> servicing the irq but before we migrate it the pending bit in the 
>> local apic IRR register will be set for that irq.
>
> hm. I think this needs more work. For example this part of the fix looks 
> quite ugly to me:

I'm not at all certain I can make an ugly reality look beautiful.

>> +static unsigned apic_in_service_vector(void)
>> +{
>> +	unsigned isr, vector;
>> +	/* Figure out which vector we are servicing */
>> +	for (vector = FIRST_EXTERNAL_VECTOR; vector < FIRST_SYSTEM_VECTOR; vector += 32) {
>> +		isr = apic_read(APIC_ISR + ((vector/32) * 0x10));
>> +		if (isr)
>> +			break;
>> +	}
>> +	/* Find the low bits of the vector we are servicing */
>> +	vector += __ffs(isr);
>> +	return vector;
>
> so we read the hardware to figure out what the hell we are doing. The 
> problem is - why doesnt the kernel know at this point what it is doing? 
> It knows the CPU and it should know all the vector numbers. It also has 
> an irq number.

Yes.  And by adding a percpu global I can do this.  I figured since this
should be a rare case it would be equally simple to read this from
the hardware, as it already has the information stored in roughly the
form I would need to store it in.  If there are errata in this area and I
am likely to hit them then it is probably a good idea to look at a
different implementation.


>> +/* If the irq we are servicing has moved and is not pending
>> + * free it's vector.
>> + */
>> +irr = apic_read(APIC_IRR + ((vector/32) * 0x10));
>
> the IRR is quite fragile. It's a rarely used hardware field and it has 
> erratums attached to it. Again, it seems like this function too tries to 
> recover information it /should/ already have.

If I am servicing an interrupt and that same interrupt comes in again
before I acknowledge it how /should/ I know that?

The only way I know to find that information is to ask the hardware.

>> @@ -1400,19 +1439,24 @@ static int ioapic_retrigger_irq(unsigned int irq)
>>  
>>  static void ack_apic_edge(unsigned int irq)
>>  {
>> -move_native_irq(irq);
>> +if (unlikely(irq_desc[irq].status & IRQ_MOVE_PENDING)) {
>> +move_native_irq(irq);
>> +apic_handle_pending_vector(apic_in_service_vector());
>> +}
>>  ack_APIC_irq();
>
> this looks a bit ugly and a bit fragile. We had a simple 
> 'move_native_irq()' call for IRQ balancing, which is straightforward, 
> but now we've got this complex condition open coded.

Well the condition is testing a single bit so I don't think it is that
complex.  Maybe taking it out of line will help or maybe that will obscure
things.  I'm inclined to believe hiding the irq migration logic will
obscure things and make it harder to debug.

Now part of the reason I did it this way is I have at least 3
set_affinity implementations and this issue really has nothing to do
with the external interrupt controller but everything to do with the
cpu local interrupt controller, so this did not seem like something
that was reasonably buried in set_affinity.

> If then this should be done in some sort of helper - but even then, the 
> whole approach looks a bit erroneous.
>
> To me the cleanest way to migrate an IRQ between two different vectors 
> on two different CPUs would be to first mask the IRQ source in the PIC, 
> then to do the migration /atomically/ (no intermediary state), and then 
> unmask. If the PIC loses edges while masked then that's a problem of the 
> irq chip implementation of the PIC: its ->set_affinity() method should 
> refuse to migrate edge-triggered IRQs if it can lose edges while 
> unmasked!

Ingo I believe what you have described is essentially what we were
doing before my patches, or what we were doing in even older versions
that had other races and problems.

To some extent I have inherited the current design that mostly works.  The
only known reliable way to block an edge triggered irq is to be servicing it.
The practical problem there is that when we sit on an irq the irq can come
in again and queue up in irr.  Which means that once we have updated the
data structures, acked the irq, and returned, the irq will come in again
at the old location because the io_apic has a 1 deep queue for each
irq.  Of course for the irq controllers that can be masked safely your
proposed change to disable irq is unfortunate.

You appear to be lobbying for disabling the irq asynchronously to all
of the irq reception activity.  That would seem to me to require
running on the cpu where the irq is currently programmed to be
delivered, disabling local interrupts, as well as disabling the irq in
the interrupt controller before reprogramming it.  For the irqs that
applies to there does seem to be some merit in that.

Of course the irq will queue in irr so it can be

Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-06 Thread Eric W. Biederman

Ingo thanks for the review.

Ingo Molnar [EMAIL PROTECTED] writes:

 * Eric W. Biederman [EMAIL PROTECTED] wrote:

 When making the interrupt vectors per cpu I failed to handle a case 
 during irq migration.  If the same interrupt comes in while we are 
 servicing the irq but before we migrate it the pending bit in the 
 local apic IRR register will be set for that irq.

 hm. I think this needs more work. For example this part of the fix looks 
 quite ugly to me:

I'm not at all certain I can make an ugly reality look beautiful.

 +static unsigned apic_in_service_vector(void)
 +{
 +unsigned isr, vector;
 +/* Figure out which vector we are servicing */
 + for (vector = FIRST_EXTERNAL_VECTOR; vector  FIRST_SYSTEM_VECTOR; vector 
 +=
 32) {
 +isr = apic_read(APIC_ISR + ((vector/32) * 0x10));
 +if (isr)
 +break;
 +}
 +/* Find the low bits of the vector we are servicing */
 +vector += __ffs(isr);
 +return vector;

 so we read the hardware to figure out what the hell we are doing. The 
 problem is - why doesnt the kernel know at this point what it is doing? 
 It knows the CPU and it should know all the vector numbers. It also has 
 an irq number.

Yes.  And by adding a percpu global I can do this.  If figured since this
should be a rare case it would be equally simple to read this from
the hardware, as it already has the information stored in roughly the
form I would need to store it in.  If there errata in this area and I
am likely to hit them then it is probably a good idea to look at a
different implementation.


 +/* If the irq we are servicing has moved and is not pending
 + * free it's vector.
 + */
 +irr = apic_read(APIC_IRR + ((vector/32) * 0x10));

 the IRR is quite fragile. It's a rarely used hardware field and it has 
 erratums attached to it. Again, it seems like this function too tries to 
 recover information it /should/ already have.

If I am servicing an interrupt and that same interrupt comes in again
before I acknowledge it how /should/ I know that?

The only way I know to find that information is to ask the hardware.

 @@ -1400,19 +1439,24 @@ static int ioapic_retrigger_irq(unsigned int irq)
  
  static void ack_apic_edge(unsigned int irq)
  {
 -move_native_irq(irq);
 +if (unlikely(irq_desc[irq].status  IRQ_MOVE_PENDING)) {
 +move_native_irq(irq);
 +apic_handle_pending_vector(apic_in_service_vector());
 +}
  ack_APIC_irq();

 this looks a bit ugly and a bit fragile. We had a simple 
 'move_native_irq()' call for IRQ balancing, which is straightforward, 
 but now we've got this complex condition open coded.

Well the condition is testing a single bit so I don't think it is that
complex.  Maybe taking it out of line will help or maybe that will obscure
things.  I'm inclined to believe hiding the irq migration logic will
obscure things and make it harder to debug.

Now part of the reason I did it this way is I have at least 3
set_affinity implementations and this issue really has nothing to do
with the external interrupt controller but everything to do with the
cpu local interrupt controller, so this did not seem like something
that was reasonably buried in set_affinity.

 If then this should be done in some sort of helper - but even then, the 
 whole approach looks a bit erroneous.

 To me the cleanest way to migrate an IRQ between two different vectors 
 on two different CPUs would be to first mask the IRQ source in the PIC, 
 then to do the migration /atomically/ (no intermediary state), and then 
 unmask. If the PIC loses edges while masked then that's a problem of the 
 irq chip implementation of the PIC: its -set_affinity() method should 
 refuse to migrate edge-triggered IRQs if it can lose edges while 
 unmasked!

Ingo I believe what you have described is essentially what we are
doing before my patches, or what we were doing in even older versions
that had other races and problems. 

To some extent I have inherited the current design that mostly works.  The
only known reliable way to block and edge triggered irq is to be servicing it.
The practical problem there is that when we sit on an irq the irq can come
in again and queue up in irr.  Which means that once we have updated the
data structures acked the irq and returned the irq will come in again
in the old location because the io_apic has a 1 deep queue for each
irq.  Of course for the irq controllers that can be masked safely your
proposed change to disable irq is unfortunate.

You appear to be lobbying for disabling the irq asynchronously to all
of the irq reception activity.  That would seem to me to require
running on the cpu where the irq is currently programmed to be
delivered disabling local interrupts, as well as disabling the irq in
the interrupt controller before reprogramming it.  For the irqs that
applies to there does seem to be some merit in that. 

Of course the irq will queue in irr so it can be 

Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-06 Thread Eric W. Biederman
Luigi Genoni [EMAIL PROTECTED] writes:

 btw, I tested in on ( CPU and no way to reproduce the bug, (no messages 
 appeared and load average was quite as high as exspected). 
 Going to test an a 16 CPU (the same who triggered the bug at the beginning) 
 immediatelly when it will be free.

Thanks.  Hopefully that will confirm I have properly tracked this bug.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-06 Thread Eric W. Biederman

Ingo would it be reasonable to get a wait queue so I can wait for an
irq that needs the delayed disable action to actually become masked?

There are a lot of practical reasons why I cannot reprogram an
unmasked irq.  However if we can wait, it should be possible to get all of
the convoluted irq migration logic out of interrupt context, and into
process context.

If I know interrupts have been enabled on the cpus where an irq could be
delivered after the irq has been masked, I know that the irq is not
currently in progress. 

Therefore I know the irq will not be in progress again until the irq
is unmasked.

Once I know the irq will not be received again it is safe to modify
the irq handling data structures and the interrupt controller.

I think the only generic piece missing from this equation is a wait
queue so I can wait for the irq to become masked.  Say a generic
sleeping mask_irq() call?

Eric


Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-06 Thread Ingo Molnar

* Eric W. Biederman [EMAIL PROTECTED] wrote:

> Ingo would it be reasonable to get a wait queue so I can wait for an 
> irq that needs the delayed disable action to actually become masked?

that might make sense, but what will do the wakeup - incidental IRQ 
arriving on the new CPU? Isn't that a bit risky - maybe the device won't 
generate IRQs for a really long time.

Ingo


Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-06 Thread Eric W. Biederman
Ingo Molnar [EMAIL PROTECTED] writes:

> * Eric W. Biederman [EMAIL PROTECTED] wrote:
>
>> Ingo would it be reasonable to get a wait queue so I can wait for an 
>> irq that needs the delayed disable action to actually become masked?
>
> that might make sense, but what will do the wakeup - incidental IRQ 
> arriving on the new CPU? 

That is what I was thinking.

> Isn't that a bit risky - maybe the device won't 
> generate IRQs for a really long time.

Well this is in process context, called from user space, and it
exactly matches the semantics we have now.  If we make it an
interruptible sleep the user space process shouldn't block.

I guess the other thing to do is do it in a non-blocking fashion
and just call schedule_work from the interrupt context when the
irq is disabled.  For i386 with its in-kernel irq scheduler
that might be better.

I think the nasty case is probably what to do when it is
the timer interrupt we are dealing with.

Hmm.  I think I should look up what the rules are for
calling local_irq_enable when in interrupt context.  That
might be another way to satisfy this problem.

If local irqs are enabled I don't have to worry about the irr
register.

You've got me brainstorming now.

Eric


Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-05 Thread Ingo Molnar

* Eric W. Biederman <[EMAIL PROTECTED]> wrote:

> When making the interrupt vectors per cpu I failed to handle a case 
> during irq migration.  If the same interrupt comes in while we are 
> servicing the irq but before we migrate it the pending bit in the 
> local apic IRR register will be set for that irq.

hm. I think this needs more work. For example this part of the fix looks 
quite ugly to me:

> +static unsigned apic_in_service_vector(void)
> +{
> + unsigned isr, vector;
> + /* Figure out which vector we are servicing */
> + for (vector = FIRST_EXTERNAL_VECTOR; vector < FIRST_SYSTEM_VECTOR; 
> vector += 32) {
> + isr = apic_read(APIC_ISR + ((vector/32) * 0x10));
> + if (isr)
> + break;
> + }
> + /* Find the low bits of the vector we are servicing */
> + vector += __ffs(isr);
> + return vector;

so we read the hardware to figure out what the hell we are doing. The 
problem is - why doesn't the kernel know at this point what it is doing? 
It knows the CPU and it should know all the vector numbers. It also has 
an irq number.

> + /* If the irq we are servicing has moved and is not pending
> +  * free it's vector.
> +  */
> + irr = apic_read(APIC_IRR + ((vector/32) * 0x10));

the IRR is quite fragile. It's a rarely used hardware field and it has 
errata attached to it. Again, it seems like this function too tries to 
recover information it /should/ already have.

> @@ -1400,19 +1439,24 @@ static int ioapic_retrigger_irq(unsigned int irq)
>  
>  static void ack_apic_edge(unsigned int irq)
>  {
> - move_native_irq(irq);
> + if (unlikely(irq_desc[irq].status & IRQ_MOVE_PENDING)) {
> + move_native_irq(irq);
> + apic_handle_pending_vector(apic_in_service_vector());
> + }
>   ack_APIC_irq();

this looks a bit ugly and a bit fragile. We had a simple 
'move_native_irq()' call for IRQ balancing, which is straightforward, 
but now we've got this complex condition open coded.

If then this should be done in some sort of helper - but even then, the 
whole approach looks a bit erroneous.

To me the cleanest way to migrate an IRQ between two different vectors 
on two different CPUs would be to first mask the IRQ source in the PIC, 
then to do the migration /atomically/ (no intermediary state), and then 
unmask. If the PIC loses edges while masked then that's a problem of the 
irq chip implementation of the PIC: its ->set_affinity() method should 
refuse to migrate edge-triggered IRQs if it can lose edges while 
masked!

Ingo


RE: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-05 Thread Lu, Yinghai
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Monday, February 05, 2007 3:03 PM
>Nope. irq routines are a stack.  if apic_in_service_vector could return
>the wrong value.  ack_APIC_irq() which use the same information would
>acknowledge the wrong irq.  If there was actually any danger of
>mis-computing that information I would just pass it from the interrupt
>service routine stash it in a per cpu variable and then read it out.
>But the apic already has registers doing that, so I was lazy and used
>what was available.  It should be the common case that we need that
>information.

OK.

I wonder if the current kernel supports having different cpus handle irq
requests for different devices at the same time.

YH




Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-05 Thread Eric W. Biederman
"Lu, Yinghai" <[EMAIL PROTECTED]> writes:

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
> Sent: Monday, February 05, 2007 12:37 PM
>
>
>>The only corner case I can see that might potentially happen is
>>"apic_in_service_vector() != irq_vector[irq]" and if that is the case
>>we don't want to migrate, because the precondition that we are in the
>>irq handler servicing the expected irq isn't true.
>
> Reuse vector could help in that case.

A little, but you are still on the wrong cpu, and that corner case might
even be worth looking at on i386.  I haven't assessed what might go wrong
yet, but the current strategy of migrating irqs is based on the interrupt
service routine being where new instances of that irq will be
delivered.

What I do know is all of this is a very narrow window, and insanely
hard to trigger with consequences that shouldn't hang the machine.
The only reason we are even getting a reasonable number of irqs being
in service and in the irr is because we are acking a level triggered
vector used by two irqs and then the pending irq retriggers.  Avoiding
that case would certainly be an efficiency improvement.

Basically I think it takes the alignment of several 1 in a million
chances before the code I posted will have problems with this
theoretical case.  

> In another case, if two irqs are migrated from one cpu to another cpu,
> ack_apic_edge for irq2 could get apic_in_service_vector for irq1,
> and handle that to clear irr for irq1 instead of irq2.

Nope.  irq routines nest like a stack.  If apic_in_service_vector could
return the wrong value, then ack_APIC_irq(), which uses the same
information, would acknowledge the wrong irq.  If there was actually any
danger of mis-computing that information I would just pass it from the
interrupt service routine, stash it in a per cpu variable, and then read
it out.  But the apic already has registers doing that, so I was lazy and
used what was available.  It should be the common case that we need that
information.


Eric


RE: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-05 Thread Lu, Yinghai
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Monday, February 05, 2007 12:37 PM


>The only corner case I can see that might potentially happen is
>"apic_in_service_vector() != irq_vector[irq]" and if that is the case
>we don't want to migrate, because the precondition that we are in the
>irq handler servicing the expected irq isn't true.

Reuse vector could help in that case.

In another case, if two irqs are migrated from one cpu to another cpu,
ack_apic_edge for irq2 could get apic_in_service_vector for irq1,
and handle that to clear irr for irq1 instead of irq2.

YH






Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-05 Thread Eric W. Biederman
"Lu, Yinghai" <[EMAIL PROTECTED]> writes:

> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
>>> How about let apic_handle_pending_vector take the irq too?
>
>>We can't compute the vector by reading the hardware registers after
>>we have acknowledged the irq.
>
>>I hope that was the answer you were looking for I'm not quite certain
>>what you mean by take.
>
> I mean 

I don't see the point.  We would have to miscompute the vector that
we are servicing or improperly fill in vector_irq for that to be useful.

The only corner case I can see that might potentially happen is
"apic_in_service_vector() != irq_vector[irq]" and if that is the case
we don't want to migrate, because the precondition that we are in the
irq handler servicing the expected irq isn't true.

Of course someone would have to set IRQ_MOVE_PENDING pretty fast for
that case to hit; it's a tiny hole, though probably worth closing.

Eric



RE: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-05 Thread Lu, Yinghai
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
>> How about let apic_handle_pending_vector take the irq too?

>We can't compute the vector by reading the hardware registers after
>we have acknowledged the irq.

>I hope that was the answer you were looking for I'm not quite certain
>what you mean by take.

I mean 


+static void apic_handle_pending_vector(unsigned vector)
+{
+   unsigned irr;
+   int irq;
+
+   irq = __get_cpu_var(vector_irq)[vector];
+   if (irq >= 0)
+   return;
+
+   /* If the irq we are servicing has moved and is not pending
+* free it's vector.
+*/

==>

+static void apic_handle_pending_vector(unsigned vector, unsigned irqx)
+{
+   unsigned irr;
+   int irq;
+
+   irq = __get_cpu_var(vector_irq)[vector];
+   if ( (-irq) != irqx)
+   return;
+ 
+
+   /* If the irq we are servicing has moved and is not pending
+* free it's vector.
+*/





Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-05 Thread Eric W. Biederman
"Lu, Yinghai" <[EMAIL PROTECTED]> writes:

> Eric,
>
> How about let apic_handle_pending_vector take the irq too?

We can't compute the vector by reading the hardware registers after
we have acknowledged the irq.

I hope that was the answer you were looking for I'm not quite certain
what you mean by take.

> I wonder if there is any chance that you have two IRQ_MOVE_PENDING?

I'm not quite certain what you are asking; on the face of it there
is one bit per irq, so it is impossible.

I don't think we can get into any trouble: we don't let a vector be
reused on a cpu until we know it isn't in irr.  So even if things
come in highly out of order they should still function.

I also have seen irqs migrate on nearly every irq received.

Eric


RE: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-05 Thread Lu, Yinghai
Eric,

How about let apic_handle_pending_vector take the irq too?
I wonder if there is any chance that you have two IRQ_MOVE_PENDING? 


BTW, Andi, do you want to try "reuse vector when doing the irq-balance
between CPUs"?

If so, I will update the patch that you dropped some months ago.

YH




Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-03 Thread l . genoni
As I reported when I tested this patch, it works, but I could see an 
abnormally high load average while triggering the error message.  Anyway, it 
is better to have a load average three or four times higher than what 
you would expect than a crash/reboot, isn't it? :)


Luigi Genoni

p.s.
will test the other definitive patch on Monday on both the 8 and 16 CPU 
systems.


On Sat, 3 Feb 2007, Eric W. Biederman wrote:


Date: Sat, 03 Feb 2007 00:55:11 -0700
From: Eric W. Biederman <[EMAIL PROTECTED]>
To: Arjan van de Ven <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>, linux-kernel@vger.kernel.org,
"Lu, Yinghai" <[EMAIL PROTECTED]>,
Luigi Genoni <[EMAIL PROTECTED]>, Ingo Molnar <[EMAIL PROTECTED]>,
Natalie Protasevich <[EMAIL PROTECTED]>, Andi Kleen <[EMAIL PROTECTED]>
Subject: Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq
migration.
Resent-Date: Sat, 03 Feb 2007 09:05:10 +0100
Resent-From: <[EMAIL PROTECTED]>

Arjan van de Ven <[EMAIL PROTECTED]> writes:


Once the migration operation is complete we know we will receive
no more interrupts on this vector so the irq pending state for
this irq will no longer be updated.  If the irq is not pending and
we are in the intermediate state we immediately free the vector,
otherwise we free the vector in do_IRQ when the pending irq
arrives.


So is this a for-2.6.20 thing?  The bug was present in 2.6.19, so
I assume it doesn't affect many people?


I got a few reports of this; irqbalance may trigger this kernel bug it
seems... I would suggest to consider this for 2.6.20 since it's a
hard-hang case



Yes.  The bug I fixed will not happen if you don't migrate irqs.

At the very least we want the patch below (already in -mm)
that makes it not a hard hang case.

Subject: [PATCH] x86_64:  Survive having no irq mapping for a vector

Occasionally the kernel has bugs that result in no irq being
found for a given cpu vector.  If we acknowledge the irq
the system has a good chance of continuing even though we dropped
and missed an irq message.  If we continue to simply print a
message and drop but not acknowledge the irq, the system is
likely to become non-responsive shortly thereafter.

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
---
arch/x86_64/kernel/irq.c |   11 ---
1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86_64/kernel/irq.c b/arch/x86_64/kernel/irq.c
index 0c06af6..648055a 100644
--- a/arch/x86_64/kernel/irq.c
+++ b/arch/x86_64/kernel/irq.c
@@ -120,9 +120,14 @@ asmlinkage unsigned int do_IRQ(struct pt_regs *regs)

if (likely(irq < NR_IRQS))
generic_handle_irq(irq);
-   else if (printk_ratelimit())
-   printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n",
-   __func__, smp_processor_id(), vector);
+   else {
+   if (!disable_apic)
+   ack_APIC_irq();
+
+   if (printk_ratelimit())
+   printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n",
+   __func__, smp_processor_id(), vector);
+   }

irq_exit();

--
1.4.4.1.g278f



Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-03 Thread Andi Kleen
On Saturday 03 February 2007 11:22, Eric W. Biederman wrote:
> Andi Kleen <[EMAIL PROTECTED]> writes:
> 
> >> Once the migration operation is complete we know we will receive
> >> no more interrupts on this vector so the irq pending state for
> >> this irq will no longer be updated.  If the irq is not pending and
> >> we are in the intermediate state we immediately free the vector,
> >> otherwise we free the vector in do_IRQ when the pending irq
> >> arrives.
> >
> > Ok for me, although the magic numbers are a little nasty.
> 
> You must be talking about (vector/32) *0x10.

No I meant the -1s

-Andi


Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-03 Thread Eric W. Biederman
Andi Kleen <[EMAIL PROTECTED]> writes:

>> Once the migration operation is complete we know we will receive
>> no more interrupts on this vector so the irq pending state for
>> this irq will no longer be updated.  If the irq is not pending and
>> we are in the intermediate state we immediately free the vector,
>> otherwise we free the vector in do_IRQ when the pending irq
>> arrives.
>
> Ok for me, although the magic numbers are a little nasty.

You must be talking about (vector/32) * 0x10.
The 32 is the number of bits per register and 0x10 the gap between apic
registers.  I couldn't think of a better form.  If someone
can think of a better way it probably warrants a cleanup patch
at some point.

> What about i386?

i386 does not handle this case, but since it is still globally
allocating all of its vectors and never changes its vectors during
migration, it is totally harmless when an irq comes in on a cpu other
than the one we are expecting it on.

Eric


Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-03 Thread Andi Kleen

> Once the migration operation is complete we know we will receive
> no more interrupts on this vector so the irq pending state for
> this irq will no longer be updated.  If the irq is not pending and
> we are in the intermediate state we immediately free the vector,
> otherwise we free the vector in do_IRQ when the pending irq
> arrives.

Ok for me, although the magic numbers are a little nasty.

What about i386?

-Andi



Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-03 Thread l . genoni
As I reported when I tested this patch, it works, but I could see an
abnormally high load average while triggering the error message.  Anyway, it
is better to have a load average three or four times higher than what
you would expect than a crash/reboot, isn't it? :)


Luigi Genoni

p.s.
will test the other definitive patch on Monday on both 8 and 16 CPU
systems.


On Sat, 3 Feb 2007, Eric W. Biederman wrote:


Date: Sat, 03 Feb 2007 00:55:11 -0700
From: Eric W. Biederman [EMAIL PROTECTED]
To: Arjan van de Ven [EMAIL PROTECTED]
Cc: Andrew Morton [EMAIL PROTECTED], linux-kernel@vger.kernel.org,
Lu, Yinghai [EMAIL PROTECTED],
Luigi Genoni [EMAIL PROTECTED], Ingo Molnar [EMAIL PROTECTED],
Natalie Protasevich [EMAIL PROTECTED], Andi Kleen [EMAIL PROTECTED]
Subject: Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq
migration.
Resent-Date: Sat, 03 Feb 2007 09:05:10 +0100
Resent-From: [EMAIL PROTECTED]

Arjan van de Ven [EMAIL PROTECTED] writes:


Once the migration operation is complete we know we will receive
no more interrupts on this vector so the irq pending state for
this irq will no longer be updated.  If the irq is not pending and
we are in the intermediate state we immediately free the vector,
otherwise we free the vector in do_IRQ when the pending irq
arrives.


So is this a for-2.6.20 thing?  The bug was present in 2.6.19, so
I assume it doesn't affect many people?


I got a few reports of this; irqbalance may trigger this kernel bug it
seems... I would suggest to consider this for 2.6.20 since it's a
hard-hang case



Yes.  The bug I fixed will not happen if you don't migrate irqs.

At the very least we want the patch below (already in -mm)
that makes it not a hard hang case.

Subject: [PATCH] x86_64:  Survive having no irq mapping for a vector

Occasionally the kernel has bugs that result in no irq being
found for a given cpu vector.  If we acknowledge the irq
the system has a good chance of continuing even though we dropped
an irq message.  If we continue to simply print a
message and do not acknowledge the irq, the system is
likely to become non-responsive shortly thereafter.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
arch/x86_64/kernel/irq.c |   11 ---
1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86_64/kernel/irq.c b/arch/x86_64/kernel/irq.c
index 0c06af6..648055a 100644
--- a/arch/x86_64/kernel/irq.c
+++ b/arch/x86_64/kernel/irq.c
@@ -120,9 +120,14 @@ asmlinkage unsigned int do_IRQ(struct pt_regs *regs)

if (likely(irq < NR_IRQS))
generic_handle_irq(irq);
-   else if (printk_ratelimit())
-   printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n",
-   __func__, smp_processor_id(), vector);
+   else {
+   if (!disable_apic)
+   ack_APIC_irq();
+
+   if (printk_ratelimit())
+   printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n",
+   __func__, smp_processor_id(), vector);
+   }

irq_exit();

--
1.4.4.1.g278f



Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-02 Thread Eric W. Biederman
Arjan van de Ven <[EMAIL PROTECTED]> writes:

>> > Once the migration operation is complete we know we will receive
>> > no more interrupts on this vector so the irq pending state for
>> > this irq will no longer be updated.  If the irq is not pending and
>> > we are in the intermediate state we immediately free the vector,
>> > otherwise we free the vector in do_IRQ when the pending irq
>> > arrives.
>> 
>> So is this a for-2.6.20 thing?  The bug was present in 2.6.19, so
>> I assume it doesn't affect many people?
>
> I got a few reports of this; irqbalance may trigger this kernel bug it
> seems... I would suggest to consider this for 2.6.20 since it's a
> hard-hang case


Yes.  The bug I fixed will not happen if you don't migrate irqs.

At the very least we want the patch below (already in -mm)
that makes it not a hard hang case.

Subject: [PATCH] x86_64:  Survive having no irq mapping for a vector

Occasionally the kernel has bugs that result in no irq being
found for a given cpu vector.  If we acknowledge the irq
the system has a good chance of continuing even though we dropped
an irq message.  If we continue to simply print a
message and do not acknowledge the irq, the system is
likely to become non-responsive shortly thereafter.

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/irq.c |   11 ---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86_64/kernel/irq.c b/arch/x86_64/kernel/irq.c
index 0c06af6..648055a 100644
--- a/arch/x86_64/kernel/irq.c
+++ b/arch/x86_64/kernel/irq.c
@@ -120,9 +120,14 @@ asmlinkage unsigned int do_IRQ(struct pt_regs *regs)
 
if (likely(irq < NR_IRQS))
generic_handle_irq(irq);
-   else if (printk_ratelimit())
-   printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n",
-   __func__, smp_processor_id(), vector);
+   else {
+   if (!disable_apic)
+   ack_APIC_irq();
+
+   if (printk_ratelimit())
+   printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n",
+   __func__, smp_processor_id(), vector);
+   }
 
irq_exit();
 
-- 
1.4.4.1.g278f



Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-02 Thread Arjan van de Ven

> > Once the migration operation is complete we know we will receive
> > no more interrupts on this vector so the irq pending state for
> > this irq will no longer be updated.  If the irq is not pending and
> > we are in the intermediate state we immediately free the vector,
> > otherwise we free the vector in do_IRQ when the pending irq
> > arrives.
> 
> So is this a for-2.6.20 thing?  The bug was present in 2.6.19, so
> I assume it doesn't affect many people?

I got a few reports of this; irqbalance may trigger this kernel bug it
seems... I would suggest to consider this for 2.6.20 since it's a
hard-hang case



Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-02 Thread Andrew Morton
On Fri, 02 Feb 2007 18:39:15 -0700
[EMAIL PROTECTED] (Eric W. Biederman) wrote:

> Andrew Morton <[EMAIL PROTECTED]> writes:
> 
> > So is this a for-2.6.20 thing?  The bug was present in 2.6.19, so
> > I assume it doesn't affect many people?
> 
> If it's not too late, and this patch isn't too scary.
> 
> It's a really rare set of circumstances that triggers it, but the
> possibility of being hit is pretty widespread: anything with
> more than one cpu and more than one irq could see this.
> 
> The easiest way to trigger this is to have two level triggered irqs on
> two different cpus using the same vector.  In that case if one acks
> its irq while the other irq is migrating to a different cpu, 2.6.19
> gets completely confused and stops handling interrupts properly.
> 
> With my previous bug fix (not to drop the ack when we are confused)
> the machine will stay up, and that is obviously correct and can't
> affect anything else so is probably a candidate for the stable tree.
> 
> With this fix everything just works.
> 
> I don't know how common a legitimate case of the exact same irq
> going off twice in a row is, but that is a possibility as well,
> especially with edge triggered interrupts.
> 
> Setting up the test scenario was a pain, but by extremely limiting
> my choice of vectors I was able to confirm I survived several hundred
> of these events within a couple of minutes with no problem.
> 

OK, thanks.  Let's await Andi's feedback.


Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-02 Thread Eric W. Biederman
Andrew Morton <[EMAIL PROTECTED]> writes:

> So is this a for-2.6.20 thing?  The bug was present in 2.6.19, so
> I assume it doesn't affect many people?

If it's not too late, and this patch isn't too scary.

It's a really rare set of circumstances that triggers it, but the
possibility of being hit is pretty widespread: anything with
more than one cpu and more than one irq could see this.

The easiest way to trigger this is to have two level triggered irqs on
two different cpus using the same vector.  In that case if one acks
its irq while the other irq is migrating to a different cpu, 2.6.19
gets completely confused and stops handling interrupts properly.

With my previous bug fix (not to drop the ack when we are confused)
the machine will stay up, and that is obviously correct and can't
affect anything else so is probably a candidate for the stable tree.

With this fix everything just works.

I don't know how common a legitimate case of the exact same irq
going off twice in a row is, but that is a possibility as well,
especially with edge triggered interrupts.

Setting up the test scenario was a pain, but by extremely limiting
my choice of vectors I was able to confirm I survived several hundred
of these events within a couple of minutes with no problem.

Eric


Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

2007-02-02 Thread Andrew Morton
On Fri, 02 Feb 2007 17:35:31 -0700
[EMAIL PROTECTED] (Eric W. Biederman) wrote:

> When making the interrupt vectors per cpu I failed to handle a case
> during irq migration.  If the same interrupt comes in while we are
> servicing the irq but before we migrate it the pending bit in the
> local apic IRR register will be set for that irq.
> 
> After migrating the irq to another cpu and or vector the data
> structures will no longer be set up to handle this pending irq.  Then as
> soon as we return from servicing the irq we just migrated we will get
> a nasty: "No irq handler for vector" message. 
> 
> Since we do not disable irqs for edge triggered interrupts except
> in the smallest possible window during migration I cannot avoid
> migrating an irq in the unlikely event that it becomes pending.
> This is because by the time the irq could no longer become pending
> I have already updated all of my data structures.
> 
> Therefore this patch introduces an intermediate state that
> exists solely on the cpu where we are handling the irq during
> migration.  The irq number is changed to negative in the
> vector_irq data structure.
> 
> Once the migration operation is complete we know we will receive
> no more interrupts on this vector so the irq pending state for
> this irq will no longer be updated.  If the irq is not pending and
> we are in the intermediate state we immediately free the vector,
> otherwise we free the vector in do_IRQ when the pending irq
> arrives.

So is this a for-2.6.20 thing?  The bug was present in 2.6.19, so
I assume it doesn't affect many people?
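
The negative-irq intermediate state Eric's patch description quotes above can be sketched roughly as follows (a simplified illustration based only on the description: `vector_irq`, the negation convention, and both helper names are assumptions, not the kernel's exact identifiers, and the real `vector_irq` is per-cpu):

```c
#define NR_VECTORS 256

/* Per-cpu in the real kernel; -1 means "no irq mapped to this vector". */
static int vector_irq[NR_VECTORS];

/* During migration, mark the old vector by negating the irq number so
 * do_IRQ can recognize a late irq that was already pending in the IRR. */
static void mark_vector_migrating(int vector)
{
        vector_irq[vector] = -vector_irq[vector];
}

/* Once the migration completes: if nothing is pending in the IRR for
 * this vector, free it immediately; otherwise leave the marker in place
 * and let do_IRQ free it when the pending irq finally arrives. */
static void finish_migration(int vector, int irr_pending)
{
        if (vector_irq[vector] < 0 && !irr_pending)
                vector_irq[vector] = -1;
}
```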

