Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded

2022-08-19 Thread Guilherme G. Piccoli
On 17/08/2022 19:19, Borislav Petkov wrote:
> On Wed, Aug 17, 2022 at 07:09:26PM -0300, Guilherme G. Piccoli wrote:
>> Again - a matter of a trade-off, a good compromise must be agreed by all
>> parties (kdump maintainers are usually extremely afraid of taking risks
>> to not break kdump).
> 
> Right, and logging hw errors from a panic notifier feels simply weird.
> 
> x86 has its own, special notifier exactly for that and it is independent
> from the panic path but it gets run right after the exception raised due
> to the hw error is done.
> 
> Dunno if ARM has such facilities tho...
> 
> Thx.
> 

Yeah, MCE stuff right? Not sure for ARM and other architectures as well.
Cheers,


Guilherme

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded

2022-08-19 Thread Guilherme G. Piccoli
On 17/08/2022 14:31, Borislav Petkov wrote:
> [...]
> 
> How does the fact that kdump is loaded, obviate the need to print
> information about the errors?
> 
> Are you suggesting that people who have the whole vmcore would be able
> to piece together the error information?
> 

Hi Boris, thanks for chiming in.

So, this is part of an effort to clean-up the panic path. Currently, if
a kdump happens, *all* the panic notifiers are skipped by default,
including this one. In this scenario, this patch seems like a no-op.

But happens that in the refactor we are proposing [0], some notifiers
should run before the kdump. We are basically putting some ordering in
the way notifiers are executed, while documenting this properly and with
the goal of not increasing the failure risk for kdump.

This patch is useful so we can bring the altera EDAC notifier to run
earlier while not increasing the risk on kdump - this operation is a bit
"delicate" to happen in the panic scenario. The origin of this patch was
a discussion with Tony/Peter [1], guess we can call it a "compromise
solution".

Let me know if you disagree or have more questions, and in case you'd
like to check/participate in the whole panic notifiers refactor
discussion, it would be great =)
Cheers,


Guilherme


[0]
https://lore.kernel.org/lkml/20220427224924.592546-1-gpicc...@igalia.com/

[1]
https://lore.kernel.org/lkml/62a63fc2-346f-f375-043a-fa2138527...@igalia.com/

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded

2022-08-19 Thread Guilherme G. Piccoli
On 17/08/2022 19:00, Borislav Petkov wrote:
> On Wed, Aug 17, 2022 at 06:56:11PM -0300, Guilherme G. Piccoli wrote:
>> But do you agree that currently, in case of a kdump, that information
>> *is not collected*, with our without my patch?
> 
> If for some reason that panic notifier does not get run before kdump,
> then that's a bug and that driver should not use a panic notifier to log
> hw errors in the first place.
> 

Indeed, the notifiers don't run before kdump by default, in a conscious
decision of the kdump maintainers.

You might be right, in the sense that maybe the edac error handler
shouldn't run as a panic notifier. Let's see if Tony / Dinh can chime in
on that discussion - we could move it to run in the kexec event as well,
so it'd always run before a kdump, but maybe the risk it offers during
panic time is not worth.

Again - a matter of a trade-off, a good compromise must be agreed by all
parties (kdump maintainers are usually extremely afraid of taking risks
to not break kdump).

Cheers!

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded

2022-08-19 Thread Guilherme G. Piccoli
On 17/08/2022 18:46, Borislav Petkov wrote:
> On Wed, Aug 17, 2022 at 06:39:07PM -0300, Guilherme G. Piccoli wrote:
>> Sorry for the confusion, let me try to be a bit more clear:
> 
> I think you're missing the point. Lemme try again:
> 
> You *absolutely* must log those errors because they're important. It
> doesn't matter if this is done in a panic notifier and you're changing
> that whole shebang or through some other magic.
> 
> If you do, then this driver needs to *still* *log* those fatal errors -
> regardless through a panic notifier or some novel contraption it wants
> to use.
> 
> So if you want to change the panic notifiers, you *must* make sure those
> errors still do get logged.
> 
> Better?
> 

Boris, I understand the importance of the logs, for sure!

But do you agree that currently, in case of a kdump, that information
*is not collected*, with our without my patch?

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded

2022-08-19 Thread Guilherme G. Piccoli
On 17/08/2022 18:02, Borislav Petkov wrote:
> On Wed, Aug 17, 2022 at 05:28:34PM -0300, Guilherme G. Piccoli wrote:
>> My understanding is the same as yours, i.e., this is not possible to
>> collect from vmcore, it requires register reading. But again: if you
>> kdump your machine today, you won't collect this information, patch
>> changed nothing in that regard.
> 
> Why won't you be able to collect it? You can certainly access dmesg in
> the vmcore and see those errors logged there.

Sorry for the confusion, let me try to be a bit more clear:

(1) if we kdump but we *didn't run* s10_edac_dberr_handler() before
kdump, the information is lost, since s10_edac_dberr_handler() performs
register readings. That information is not contained inside the vmcore.

(2) If for some reason the function s10_edac_dberr_handler() *was
executed prior to kdump*, of course the registers information would be
on dmesg, easy to collect in the vmcore.

Makes sense?


> 
>> The one thing it changes is that you'd skip the altera register dump if
>> kdump is set AND you managed to also set "crash_kexec_post_notifiers".
> 
> What your patch changes is, it prevents s10_edac_dberr_handler() from
> logging potentially important fatal hw errors when kdump is loaded.

Agreed. If kdump is loaded, we cannot log that information (modulo that
we do not collect it today by default on kdump as well).
The other part of story (the reason of the patch) is that we plan to
start running this panic notifier a bit earlier, being able to collect
such edac logs with pstore, for example.


> 
> If Dinh is fine with that, I'll take the patch. But it looks like a bad
> idea to me.
> 

I think we should seek what the majority of the folks consider the best,
in order to converge to some well-accepted solution. I'm completely OK
in dropping this one and rework with some other idea, or we can leave
this panic notifier as is, continue running that a bit later.

Tony / Petr (when back), suggestions are welcome =)
Cheers,


Guilherme

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded

2022-08-19 Thread Guilherme G. Piccoli
On 17/08/2022 16:34, Borislav Petkov wrote:
> [...]
> 
> What is "the failure risk for kdump"?
> 
> Some of the notifiers which run before kdump might fail and thus prevent
> the machine from kdumping?
>

Exactly; some notifiers could break the machine and prevent a successful
kdump. The EDAC one is consider medium risk, due to invasive operations
(register readings on panic situation).


> [...] 
> My question stands: if kdump is loaded and the s10_edac_dberr_handler()
> does not read the the fatal errors and they don't get shown in dmesg
> before the machine panics, how do you intend to show that information to
> the user?
> 
> Because fatal errors are something you absolutely wanna show, at least,
> in dmesg!
> 
> I don't think you can "read" the errors from vmcore - they need to be
> read from the hw registers before the machine dies.
> 

My understanding is the same as yours, i.e., this is not possible to
collect from vmcore, it requires register reading. But again: if you
kdump your machine today, you won't collect this information, patch
changed nothing in that regard.

The one thing it changes is that you'd skip the altera register dump if
kdump is set AND you managed to also set "crash_kexec_post_notifiers".

In case you / Dinh / Tony disagrees with the patch, it's fine and we can
discard it, but then this notifier couldn't run early in the refactor we
are doing, it'd postponed to run later. This are is full of trade-offs,
we just need to choose what compromise solution is preferred by the
majority of developers =)

Cheers,


Guilherme

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded

2022-08-17 Thread Guilherme G. Piccoli
On 16/08/2022 15:44, Dinh Nguyen wrote:
> [...] 
>> V2:
>> - new patch, based on the discussion in [0].
>> [0] 
>> https://lore.kernel.org/lkml/62a63fc2-346f-f375-043a-fa2138527...@igalia.com/
>>
> 
> Acked-by: Dinh Nguyen 

Thanks a lot Dinh!

There is something I'm asking for maintainers on patches in this series,
so here it goes for you (CC Borislav / Tony): do you think it's possible
to pick this one in your tree directly, from this series, or do you
prefer I re-submit as a standalone patch, on linux-edac maybe?

Thanks,


Guilherme

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded

2022-08-17 Thread Dinh Nguyen




On 7/19/22 14:53, Guilherme G. Piccoli wrote:

The altera_edac panic notifier performs some data collection with
regards errors detected; such code relies in the regmap layer to
perform reads/writes, so the code is abstracted and there is some
risk level to execute that, since the panic path runs in atomic
context, with interrupts/preemption and secondary CPUs disabled.

Users want the information collected in this panic notifier though,
so in order to balance the risk/benefit, let's skip the altera panic
notifier if kdump is loaded. While at it, remove a useless header
and encompass a macro inside the sole ifdef block it is used.

Cc: Dinh Nguyen 
Cc: Petr Mladek 
Cc: Tony Luck 
Signed-off-by: Guilherme G. Piccoli 

---

V2:
- new patch, based on the discussion in [0].
[0] 
https://lore.kernel.org/lkml/62a63fc2-346f-f375-043a-fa2138527...@igalia.com/



Acked-by: Dinh Nguyen 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded

2022-08-15 Thread Guilherme G. Piccoli
On 19/07/2022 16:53, Guilherme G. Piccoli wrote:
> The altera_edac panic notifier performs some data collection with
> regards errors detected; such code relies in the regmap layer to
> perform reads/writes, so the code is abstracted and there is some
> risk level to execute that, since the panic path runs in atomic
> context, with interrupts/preemption and secondary CPUs disabled.
> 
> Users want the information collected in this panic notifier though,
> so in order to balance the risk/benefit, let's skip the altera panic
> notifier if kdump is loaded. While at it, remove a useless header
> and encompass a macro inside the sole ifdef block it is used.
> 
> Cc: Dinh Nguyen 
> Cc: Petr Mladek 
> Cc: Tony Luck 
> Signed-off-by: Guilherme G. Piccoli 
> 
> ---
> 
> V2:
> - new patch, based on the discussion in [0].
> [0] 
> https://lore.kernel.org/lkml/62a63fc2-346f-f375-043a-fa2138527...@igalia.com/
> 
>  drivers/edac/altera_edac.c | 16 
>  1 file changed, 12 insertions(+), 4 deletions(-)
> [...]

Hey Tony / Dinh, do you think this patch is good, based on the
discussion we had [0] last iteration?

Thanks in advance,


Guilherme


[0]
https://lore.kernel.org/lkml/599b72f6-76a4-8e6d-5432-56fb1ffd7...@igalia.com/

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec