Re: [PATCH V2] rtc: mc146818: Detect and handle broken RTCs

2021-02-01 Thread Serge Belyshev
Hi!  "Me too":

> --- a/drivers/rtc/rtc-mc146818-lib.c
> +++ b/drivers/rtc/rtc-mc146818-lib.c
> @@ -21,6 +21,13 @@ unsigned int mc146818_get_time(struct rt
>  
>  again:
>  	spin_lock_irqsave(&rtc_lock, flags);
> +	/* Ensure that the RTC is accessible. Bit 0-6 must be 0! */
> +	if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x7f) != 0)) {
> +		spin_unlock_irqrestore(&rtc_lock, flags);
> +		memset(time, 0xff, sizeof(*time));
> +		return 0;
> +	}
> +

... triggers here on a different box (Xiaomi mi notebook air 12.5):

[3.524002] ------------[ cut here ]------------
[3.528317] WARNING: CPU: 3 PID: 273 at drivers/rtc/rtc-mc146818-lib.c:25 mc146818_get_time+0x1b6/0x210
[3.532558] CPU: 3 PID: 273 Comm: udevadm Not tainted 5.11.0-rc6 #760
[3.536748] Hardware name: Timi TM1612/TM1612, BIOS A04 08/06/2016
[3.540947] RIP: 0010:mc146818_get_time+0x1b6/0x210
[3.545103] Code: 76 0b 0f b6 d0 83 ea 13 6b d2 64 01 d5 83 fd 45 89 6b 14 7f 06 83 c5 64 89 6b 14 41 83 ed 01 b8 02 00 00 00 44 89 6b 10 eb 39 <0f> 0b 48 c7 c7 b4 e0 9e 82 48 89 ee e8 29 6b 34 00 48 c7 03 ff ff
[3.549883] RSP: :c900012efe30 EFLAGS: 00010002
[3.554387] RAX: 0081 RBX: c900012efe64 RCX: 0005b8d7
[3.558867] RDX: 0001 RSI: 8881000aa000 RDI: 000d
[3.56] RBP: 0046 R08: 0004 R09: fffe5e075ac6
[3.567748] iwlwifi :02:00.0: Applying debug destination EXTERNAL_DRAM
[3.567822] R10:  R11:  R12: 
[3.567827] R13: c900012efedc R14: 0008 R15: 888100051200
[3.577223] FS:  () GS:88816ad8() 
knlGS:
[3.579870] CS:  0010 DS:  ES:  CR0: 80050033
[3.581947] CR2: 7fface455e28 CR3: 000103244005 CR4: 003706a0
[3.583836] Call Trace:
[3.585699]  hpet_rtc_interrupt+0x1af/0x220
[3.587585]  __handle_irq_event_percpu+0x5a/0xc0
[3.589230]  handle_irq_event_percpu+0x1b/0x50
[3.590673]  handle_irq_event+0x22/0x40
[3.592107]  handle_edge_irq+0x6b/0x190
[3.593545]  common_interrupt+0x67/0x130
[3.594983]  ? asm_common_interrupt+0x8/0x40
[3.596432]  asm_common_interrupt+0x1e/0x40
[3.597618] RIP: 0033:0x7ffaceac9b31
[3.598794] Code: 48 83 fe 0a 0f 87 f5 fe ff ff be 41 ff ff 6f 48 29 d6 48 89 04 f1 e9 e4 fe ff ff 48 85 ff 74 79 49 8b 44 24 60 48 85 c0 74 04 <48> 01 78 08 49 8b 44 24 58 48 85 c0 74 04 48 01 78 08 49 8b 44 24
[3.600048] RSP: 002b:7ffc12303b00 EFLAGS: 00010202
[3.601343] RAX: 7fface455e20 RBX: 6dff RCX: 7fface80c040
[3.602587] RDX:  RSI: 0029 RDI: 7fface451000
[3.603809] RBP: 7ffc12303c50 R08: 6fff R09: eef5
[3.605015] R10: 7022 R11: 0032 R12: 7fface80c000
[3.606223] R13: 6eff R14: 6e35 R15: 7ffc12303ce0
[3.607421] ---[ end trace 5922ddf43b0f7b83 ]---
[3.608692] hpet: Lost 3 RTC interrupts


Re: [PATCH V2] rtc: mc146818: Detect and handle broken RTCs

2021-01-31 Thread Dirk Gouders
Thomas Gleixner writes:

> The recent fix for handling the UIP bit unearthed another issue in the RTC
> code. If the RTC is advertised but the readout is straight 0xFF because
> it's not available, the old code just proceeded with crappy values, but the
> new code hangs because it waits for the UIP bit to become low.
>
> Add a sanity check in the RTC CMOS probe function which reads the RTC_VALID
> register (Register D) which should have bit 0-6 cleared. If that's not the
> case then fail to register the CMOS.
>
> Add the same check to mc146818_get_time(), warn once when the condition
> is true and invalidate the rtc_time data.

In case it is helpful: on my hardware this patch triggers a warning
(attached below).

Without it the rtc messages look like this:

[2.783386] rtc_cmos 00:01: RTC can wake from S4
[2.784302] rtc_cmos 00:01: registered as rtc0
[2.785036] rtc_cmos 00:01: setting system clock to 2021-01-31T10:13:40 UTC (1612088020)
[2.785713] rtc_cmos 00:01: alarms up to one month, y3k, 114 bytes nvram, hpet irqs

Dirk

[7.258410] ------------[ cut here ]------------
[7.258414] WARNING: CPU: 2 PID: 0 at drivers/rtc/rtc-mc146818-lib.c:25 mc146818_get_time+0x2b/0x1e5
[7.258420] Modules linked in: iwlmvm(+) mac80211 iwlwifi sdhci_pci amdgpu(+) drm_ttm_helper cfg80211 ttm cqhci gpu_sched sdhci ccp thinkpad_acpi(+) rng_core nvram tpm_tis(+) tpm_tis_core wmi tpm pinctrl_amd
[7.258432] CPU: 2 PID: 0 Comm: swapper/2 Tainted: GW 5.11.0-rc5-next-20210129-x86_64 #180
[7.258434] Hardware name: LENOVO 20U50008GE/20U50008GE, BIOS R19ET26W (1.10 ) 06/22/2020
[7.258435] RIP: 0010:mc146818_get_time+0x2b/0x1e5
[7.258437] Code: 56 41 55 45 31 ed 41 54 55 53 48 89 fb 48 c7 c7 bc d9 eb 82 e8 26 d8 36 00 bf 0d 00 00 00 48 89 c5 e8 6d d1 8f ff a8 7f 74 24 <0f> 0b 48 c7 c7 bc d9 eb 82 48 89 ee e8 bc d6 36 00 b0 ff b9 24 00
[7.258438] RSP: 0018:c922cef0 EFLAGS: 00010002
[7.258440] RAX: 0031 RBX: c922cf24 RCX: 
[7.258441] RDX: 0001 RSI: 888105607000 RDI: 000d
[7.258441] RBP: 0046 R08: c922cf24 R09: 
[7.258442] R10:  R11:  R12: 888105607000
[7.258443] R13:  R14: c922cfa4 R15: 
[7.258444] FS:  () GS:88840ec8() 
knlGS:
[7.258445] CS:  0010 DS:  ES:  CR0: 80050033
[7.258446] CR2: 7f2ed26c4160 CR3: 0480a000 CR4: 00350ee0
[7.258447] Call Trace:
[7.258449]  <IRQ>
[7.258450]  hpet_rtc_interrupt+0xd3/0x1a3
[7.258454]  __handle_irq_event_percpu+0x6b/0x12e
[7.258457]  handle_irq_event_percpu+0x2c/0x6f
[7.258459]  handle_irq_event+0x23/0x43
[7.258461]  handle_edge_irq+0x9e/0xbb
[7.258463]  asm_call_irq_on_stack+0x12/0x20
[7.258467]  </IRQ>
[7.258467]  common_interrupt+0x9a/0x123
[7.258470]  asm_common_interrupt+0x1e/0x40
[7.258472] RIP: 0010:cpuidle_enter_state+0x13e/0x1fe
[7.258475] Code: 49 89 c4 e8 bd fd ff ff 31 ff e8 3e 80 92 ff 45 84 ff 74 12 9c 58 0f ba e0 09 73 03 0f 0b fa 31 ff e8 13 16 96 ff fb 45 85 f6 <0f> 88 97 00 00 00 49 63 d6 4c 2b 24 24 48 6b ca 68 48 6b c2 30 4c
[7.258476] RSP: 0018:c9167eb0 EFLAGS: 0206
[7.258477] RAX: 88840eca8240 RBX: 888101e0d400 RCX: 0001b0a24b16
[7.258478] RDX: 0002 RSI: 0002 RDI: 
[7.258478] RBP: 0003 R08:  R09: 
[7.258479] R10: 88810083c4a8 R11:  R12: 0001b0a24b48
[7.258480] R13: 8299cc60 R14: 0003 R15: 
[7.258482]  cpuidle_enter+0x2b/0x37
[7.258483]  do_idle+0x126/0x184
[7.258485]  cpu_startup_entry+0x18/0x1a
[7.258486]  secondary_startup_64_no_verify+0xb0/0xbb
[7.258489] ---[ end trace 9da59c3696ed99d8 ]---


> Reported-by: Mickaël Salaün 
> Signed-off-by: Thomas Gleixner 
> Tested-by: Mickaël Salaün 
> ---
> V2: Fixed the sizeof() as spotted by Mickaël
> ---
>  drivers/rtc/rtc-cmos.c         |    8 ++++++++
>  drivers/rtc/rtc-mc146818-lib.c |    7 +++++++
>  2 files changed, 15 insertions(+)
>
> --- a/drivers/rtc/rtc-cmos.c
> +++ b/drivers/rtc/rtc-cmos.c
> @@ -805,6 +805,14 @@ cmos_do_probe(struct device *dev, struct
>  
>  	spin_lock_irq(&rtc_lock);
>  
> +	/* Ensure that the RTC is accessible. Bit 0-6 must be 0! */
> +	if ((CMOS_READ(RTC_VALID) & 0x7f) != 0) {
> +		spin_unlock_irq(&rtc_lock);
> +		dev_warn(dev, "not accessible\n");
> +		retval = -ENXIO;
> +		goto cleanup1;
> +	}
> +
>  	if (!(flags & CMOS_RTC_FLAGS_NOFREQ)) {
>  		/* force periodic irq to CMOS reset default of 1024Hz;
>  		 *
> --- a/drivers/rtc/rtc-mc146818-lib.c
> +++ b/drivers/rtc/rtc-mc146818-lib.c
> @@ -21,6 +21,13 @@ u

[PATCH V2] rtc: mc146818: Detect and handle broken RTCs

2021-01-26 Thread Thomas Gleixner
The recent fix for handling the UIP bit unearthed another issue in the RTC
code. If the RTC is advertised but the readout is straight 0xFF because
it's not available, the old code just proceeded with crappy values, but the
new code hangs because it waits for the UIP bit to become low.

Add a sanity check in the RTC CMOS probe function which reads the RTC_VALID
register (Register D) which should have bit 0-6 cleared. If that's not the
case then fail to register the CMOS.

Add the same check to mc146818_get_time(), warn once when the condition
is true and invalidate the rtc_time data.

Reported-by: Mickaël Salaün 
Signed-off-by: Thomas Gleixner 
Tested-by: Mickaël Salaün 
---
V2: Fixed the sizeof() as spotted by Mickaël
---
 drivers/rtc/rtc-cmos.c         |    8 ++++++++
 drivers/rtc/rtc-mc146818-lib.c |    7 +++++++
 2 files changed, 15 insertions(+)

--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -805,6 +805,14 @@ cmos_do_probe(struct device *dev, struct
 
 	spin_lock_irq(&rtc_lock);
 
+	/* Ensure that the RTC is accessible. Bit 0-6 must be 0! */
+	if ((CMOS_READ(RTC_VALID) & 0x7f) != 0) {
+		spin_unlock_irq(&rtc_lock);
+		dev_warn(dev, "not accessible\n");
+		retval = -ENXIO;
+		goto cleanup1;
+	}
+
 	if (!(flags & CMOS_RTC_FLAGS_NOFREQ)) {
 		/* force periodic irq to CMOS reset default of 1024Hz;
 		 *
--- a/drivers/rtc/rtc-mc146818-lib.c
+++ b/drivers/rtc/rtc-mc146818-lib.c
@@ -21,6 +21,13 @@ unsigned int mc146818_get_time(struct rt
 
 again:
 	spin_lock_irqsave(&rtc_lock, flags);
+	/* Ensure that the RTC is accessible. Bit 0-6 must be 0! */
+	if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x7f) != 0)) {
+		spin_unlock_irqrestore(&rtc_lock, flags);
+		memset(time, 0xff, sizeof(*time));
+		return 0;
+	}
+
 	/*
 	 * Check whether there is an update in progress during which the
 	 * readout is unspecified. The maximum update time is ~2ms. Poll
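
For readers following the thread, here is a minimal userspace sketch of the check described in the changelog above. It assumes the Register D value has already been read and is passed in as a plain byte. The names fake_rtc_time, reg_d_plausible() and read_time_checked() are hypothetical stand-ins and not kernel API; only the 0x7f mask and the memset(..., 0xff, ...) invalidation are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a subset of the kernel's struct rtc_time. */
struct fake_rtc_time {
	int tm_sec;
	int tm_min;
	int tm_hour;
	int tm_mday;
	int tm_mon;
	int tm_year;
};

/* Bits 0-6 of Register D are expected to read as zero (mask from the patch). */
#define REG_D_LOW_BITS 0x7f

/* Returns 1 if the Register D value looks like a working RTC, 0 otherwise. */
static int reg_d_plausible(uint8_t reg_d)
{
	return (reg_d & REG_D_LOW_BITS) == 0;
}

/*
 * Mirrors the control flow added to mc146818_get_time(): when the check
 * fails, the time structure is filled with 0xff bytes and 0 is returned.
 * The "success" branch only zeroes the structure here; the real readout
 * from the CMOS registers is outside the scope of this sketch.
 */
static int read_time_checked(uint8_t reg_d, struct fake_rtc_time *t)
{
	assert(t != NULL);

	if (!reg_d_plausible(reg_d)) {
		memset(t, 0xff, sizeof(*t));	/* every int field becomes -1 */
		return 0;
	}

	memset(t, 0, sizeof(*t));	/* placeholder for the actual readout */
	return 1;
}
```

With this sketch, a readout of 0xff (the "straight 0xFF" case from the changelog) fails the check and yields the invalidated structure, while 0x80 or 0x00 pass. Any value with one of the low seven bits set is rejected in the same way.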


Re: [PATCH V2] rtc: mc146818: Detect and handle broken RTCs

2021-01-26 Thread Alexandre Belloni
On 26/01/2021 18:02:11+0100, Thomas Gleixner wrote:
> The recent fix for handling the UIP bit unearthed another issue in the RTC
> code. If the RTC is advertised but the readout is straight 0xFF because
> it's not available, the old code just proceeded with crappy values, but the
> new code hangs because it waits for the UIP bit to become low.
> 
> Add a sanity check in the RTC CMOS probe function which reads the RTC_VALID
> register (Register D) which should have bit 0-6 cleared. If that's not the
> case then fail to register the CMOS.
> 
> Add the same check to mc146818_get_time(), warn once when the condition
> is true and invalidate the rtc_time data.
> 
> Reported-by: Mickaël Salaün 
> Signed-off-by: Thomas Gleixner 
> Tested-by: Mickaël Salaün 
Acked-by: Alexandre Belloni 

> ---
> V2: Fixed the sizeof() as spotted by Mickaël
> ---
>  drivers/rtc/rtc-cmos.c         |    8 ++++++++
>  drivers/rtc/rtc-mc146818-lib.c |    7 +++++++
>  2 files changed, 15 insertions(+)
> 
> --- a/drivers/rtc/rtc-cmos.c
> +++ b/drivers/rtc/rtc-cmos.c
> @@ -805,6 +805,14 @@ cmos_do_probe(struct device *dev, struct
>  
>  	spin_lock_irq(&rtc_lock);
>  
> +	/* Ensure that the RTC is accessible. Bit 0-6 must be 0! */
> +	if ((CMOS_READ(RTC_VALID) & 0x7f) != 0) {
> +		spin_unlock_irq(&rtc_lock);
> +		dev_warn(dev, "not accessible\n");
> +		retval = -ENXIO;
> +		goto cleanup1;
> +	}
> +
>  	if (!(flags & CMOS_RTC_FLAGS_NOFREQ)) {
>  		/* force periodic irq to CMOS reset default of 1024Hz;
>  		 *
> --- a/drivers/rtc/rtc-mc146818-lib.c
> +++ b/drivers/rtc/rtc-mc146818-lib.c
> @@ -21,6 +21,13 @@ unsigned int mc146818_get_time(struct rt
>  
>  again:
>  	spin_lock_irqsave(&rtc_lock, flags);
> +	/* Ensure that the RTC is accessible. Bit 0-6 must be 0! */
> +	if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x7f) != 0)) {
> +		spin_unlock_irqrestore(&rtc_lock, flags);
> +		memset(time, 0xff, sizeof(*time));
> +		return 0;
> +	}
> +
>  	/*
>  	 * Check whether there is an update in progress during which the
>  	 * readout is unspecified. The maximum update time is ~2ms. Poll

-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com