Re: [PATCH 2/2] powerpc/book3s: mce: Use add_taint_no_warn() in machine_check_early().

2017-04-17 Thread Mahesh Jagannath Salgaonkar
On 04/17/2017 04:09 PM, Daniel Axtens wrote:
> Hi Mahesh,
> 
>> Fixes: 27ea2c420cad powerpc: Set the correct kernel taint on machine check 
>> errors.
> 
> I notice this Fixes a commit I introduced. Please could you cc me when
> you do this? I am likely to miss it otherwise, especially since I have
> now left IBM.

Sure will do. :-)

> 
> Being cced allows me to provide an Ack or a review. And getting feedback
> on my changes is very helpful in becoming a better programmer.
> 
> In this case, as per Michael's comment, why don't we just move the
> add_taint from machine_check_early to
> machine_check_process_queued_event - the other side of the work queue.

Yes, that is my plan. It is not the only place, though: add_taint() also
needs to be called from machine_check_exception(), so it will be called
from two places.
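
A minimal user-space sketch of why tainting from two places is harmless, assuming the taint is a single bit in a mask. The `_sketch` functions, `taint_mask` and the bit number are hypothetical stand-ins for illustration, not the kernel's code:

```c
/* Hypothetical stand-in for the machine check taint bit (value assumed). */
enum { TAINT_MACHINE_CHECK_BIT = 4 };

/* Stand-in for the kernel's taint mask. */
static unsigned long taint_mask;

/* Setting a bit that is already set leaves the mask unchanged. */
static void set_taint_sketch(unsigned int bit)
{
	taint_mask |= 1UL << bit;
}

/* First call site: deferred processing of queued MCE events. */
static void process_queued_event_sketch(void)
{
	set_taint_sketch(TAINT_MACHINE_CHECK_BIT);
}

/* Second call site: the virtual-mode machine check handler. */
static void machine_check_exception_sketch(void)
{
	set_taint_sketch(TAINT_MACHINE_CHECK_BIT);
}
```

Whichever path runs first sets the bit; the other leaves the mask as it is.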

Thanks,
-Mahesh.

> 
> The work queue system is supposed to provide us with a safe place to do
> printing, etc., so it's an appropriate place. Also, we already do
> machine_check_print_event_info there, and adding the taint doesn't need
> to be done synchronously.
> 
> Regards,
> Daniel
> 
> Mahesh J Salgaonkar  writes:
> 
>> From: Mahesh Salgaonkar 
>>
>> machine_check_early() gets called in real mode. The very first time
>> add_taint() is called, it prints a warning, which ends up making an
>> OPAL call (through the OPAL_CALL wrapper) to write it to the console.
>> If the very first machine check arrives while we are in OPAL, we are
>> doomed: OPAL_CALL overwrites the PACASAVEDMSR in r13, so once MCE
>> handling is done the original OPAL call uses this new MSR on its way
>> back to opal_return. This usually leads to unexpected behaviour or a
>> kernel panic. Instead, use add_taint_no_warn(), which does not call
>> printk.
>>
>> This is broken with the current FW level. We have been lucky so far not
>> to hit the very first MCE while in OPAL, but it is easily reproducible
>> on Mambo. This should go to stable as well, along with patch 1/2.
>>
>> Signed-off-by: Mahesh Salgaonkar 
>> ---
>>  arch/powerpc/kernel/traps.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>> index 62b587f..4a048dc 100644
>> --- a/arch/powerpc/kernel/traps.c
>> +++ b/arch/powerpc/kernel/traps.c
>> @@ -306,7 +306,7 @@ long machine_check_early(struct pt_regs *regs)
>>  
>>  	__this_cpu_inc(irq_stat.mce_exceptions);
>>  
>> -	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
>> +	add_taint_no_warn(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
>>  
>>  	/*
>>  	 * See if platform is capable of handling machine check. (e.g. PowerNV
> 



Re: [PATCH 2/2] powerpc/book3s: mce: Use add_taint_no_warn() in machine_check_early().

2017-04-17 Thread Daniel Axtens
Hi Mahesh,

> Fixes: 27ea2c420cad powerpc: Set the correct kernel taint on machine check 
> errors.

I notice this Fixes a commit I introduced. Please could you cc me when
you do this? I am likely to miss it otherwise, especially since I have
now left IBM.

Being cced allows me to provide an Ack or a review. And getting feedback
on my changes is very helpful in becoming a better programmer.

In this case, as per Michael's comment, why don't we just move the
add_taint from machine_check_early to
machine_check_process_queued_event - the other side of the work queue.

The work queue system is supposed to provide us with a safe place to do
printing, etc., so it's an appropriate place. Also, we already do
machine_check_print_event_info there, and adding the taint doesn't need
to be done synchronously.
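
A minimal user-space sketch of that split, assuming a small fixed-size event queue. Every name ending in `_sketch`, the queue layout and the flags are hypothetical and only model the idea; they are not the kernel's MCE code:

```c
#include <stdbool.h>
#include <stddef.h>

#define MCE_QUEUE_SIZE 4	/* arbitrary size chosen for the sketch */

struct mce_event_sketch {
	int severity;
};

static struct mce_event_sketch mce_queue[MCE_QUEUE_SIZE];
static size_t mce_queue_len;
static bool tainted_machine_check;	/* stands in for the taint bit */
static int messages_printed;		/* counts simulated console output */

/* "Real mode" side: only record the event. No printing, no tainting. */
static bool early_handler_sketch(int severity)
{
	if (mce_queue_len == MCE_QUEUE_SIZE)
		return false;		/* queue full, event dropped */
	mce_queue[mce_queue_len++].severity = severity;
	return true;
}

/* Deferred side: runs later, in a context where printing is safe. */
static void process_queued_events_sketch(void)
{
	for (size_t i = 0; i < mce_queue_len; i++) {
		messages_printed++;		/* models printing the event info */
		tainted_machine_check = true;	/* models add_taint() */
	}
	mce_queue_len = 0;
}
```

In this model the early handler never touches the console, so the taint and any warning happen only once the queue is drained.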

Regards,
Daniel

Mahesh J Salgaonkar  writes:

> From: Mahesh Salgaonkar 
>
> machine_check_early() gets called in real mode. The very first time
> add_taint() is called, it prints a warning, which ends up making an
> OPAL call (through the OPAL_CALL wrapper) to write it to the console.
> If the very first machine check arrives while we are in OPAL, we are
> doomed: OPAL_CALL overwrites the PACASAVEDMSR in r13, so once MCE
> handling is done the original OPAL call uses this new MSR on its way
> back to opal_return. This usually leads to unexpected behaviour or a
> kernel panic. Instead, use add_taint_no_warn(), which does not call
> printk.
>
> This is broken with the current FW level. We have been lucky so far not
> to hit the very first MCE while in OPAL, but it is easily reproducible
> on Mambo. This should go to stable as well, along with patch 1/2.
>
> Signed-off-by: Mahesh Salgaonkar 
> ---
>  arch/powerpc/kernel/traps.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 62b587f..4a048dc 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -306,7 +306,7 @@ long machine_check_early(struct pt_regs *regs)
>  
>  	__this_cpu_inc(irq_stat.mce_exceptions);
>  
> -	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
> +	add_taint_no_warn(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
>  
>  	/*
>  	 * See if platform is capable of handling machine check. (e.g. PowerNV


Re: [PATCH 2/2] powerpc/book3s: mce: Use add_taint_no_warn() in machine_check_early().

2017-04-11 Thread Michael Ellerman
Mahesh J Salgaonkar  writes:

> From: Mahesh Salgaonkar 
>
> machine_check_early() gets called in real mode. The very first time
> add_taint() is called, it prints a warning, which ends up making an
> OPAL call (through the OPAL_CALL wrapper) to write it to the console.
> If the very first machine check arrives while we are in OPAL, we are
> doomed: OPAL_CALL overwrites the PACASAVEDMSR in r13, so once MCE
> handling is done the original OPAL call uses this new MSR on its way
> back to opal_return. This usually leads to unexpected behaviour or a
> kernel panic. Instead, use add_taint_no_warn(), which does not call
> printk.
>
> This is broken with the current FW level. We have been lucky so far not
> to hit the very first MCE while in OPAL, but it is easily reproducible
> on Mambo. This should go to stable as well, along with patch 1/2.

This is not a good way to fix a bug that needs to go back to stable.
Changing generic code means I need to sync up with the right maintainer,
get acks, etc. And then convince people that it should go to stable also.

So can you please fix this a different way for stable?

Can we just do the tainting later, once we're in virtual mode?

cheers

> Fixes: 27ea2c420cad powerpc: Set the correct kernel taint on machine check 
> errors.
> Signed-off-by: Mahesh Salgaonkar 
> ---
>  arch/powerpc/kernel/traps.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 62b587f..4a048dc 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -306,7 +306,7 @@ long machine_check_early(struct pt_regs *regs)
>  
>  	__this_cpu_inc(irq_stat.mce_exceptions);
>  
> -	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
> +	add_taint_no_warn(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
>  
>  	/*
>  	 * See if platform is capable of handling machine check. (e.g. PowerNV


[PATCH 2/2] powerpc/book3s: mce: Use add_taint_no_warn() in machine_check_early().

2017-02-20 Thread Mahesh J Salgaonkar
From: Mahesh Salgaonkar 

machine_check_early() gets called in real mode. The very first time
add_taint() is called, it prints a warning, which ends up making an
OPAL call (through the OPAL_CALL wrapper) to write it to the console.
If the very first machine check arrives while we are in OPAL, we are
doomed: OPAL_CALL overwrites the PACASAVEDMSR in r13, so once MCE
handling is done the original OPAL call uses this new MSR on its way
back to opal_return. This usually leads to unexpected behaviour or a
kernel panic. Instead, use add_taint_no_warn(), which does not call
printk.

This is broken with the current FW level. We have been lucky so far not
to hit the very first MCE while in OPAL, but it is easily reproducible
on Mambo. This should go to stable as well, along with patch 1/2.

Fixes: 27ea2c420cad ("powerpc: Set the correct kernel taint on machine check errors.")
Signed-off-by: Mahesh Salgaonkar 
---
 arch/powerpc/kernel/traps.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 62b587f..4a048dc 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -306,7 +306,7 @@ long machine_check_early(struct pt_regs *regs)
 
 	__this_cpu_inc(irq_stat.mce_exceptions);
 
-	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
+	add_taint_no_warn(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
 
 	/*
 	 * See if platform is capable of handling machine check. (e.g. PowerNV
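
For readers without patch 1/2 at hand, here is a rough user-space model of the difference the one-line change relies on. It assumes, as the description above says, that add_taint() prints a warning on its first call and that add_taint_no_warn() sets the same taint without printing. The bodies, the bit number, `console_write()` and the counters are illustrative assumptions, not the kernel's implementation:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins; values and names are assumptions for the sketch. */
enum { TAINT_MACHINE_CHECK = 4 };
enum lockdep_ok { LOCKDEP_STILL_OK, LOCKDEP_NOW_UNRELIABLE };

static unsigned long tainted_mask;
static bool lock_debugging_on = true;
static int console_writes;	/* counts simulated console (firmware) calls */

/* Models printk(): on the affected platform this path ends in an OPAL call. */
static void console_write(const char *msg)
{
	console_writes++;
	fputs(msg, stderr);
}

/* Sets the taint bit without ever touching the console. */
static void add_taint_no_warn(unsigned int flag, enum lockdep_ok ok)
{
	if (ok == LOCKDEP_NOW_UNRELIABLE)
		lock_debugging_on = false;
	tainted_mask |= 1UL << flag;
}

/* Warns once, the first time lock debugging gets switched off. */
static void add_taint(unsigned int flag, enum lockdep_ok ok)
{
	if (ok == LOCKDEP_NOW_UNRELIABLE && lock_debugging_on)
		console_write("Disabling lock debugging due to kernel taint\n");
	add_taint_no_warn(flag, ok);
}
```

In this model only the first add_taint() call reaches the console, which matches the "very first machine check" condition described above; add_taint_no_warn() never does.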


