[PATCH 5.11 117/210] i40e: Fix kernel oops when i40e driver removes VFs
From: Eryk Rybak

[ Upstream commit 347b5650cd158d1d953487cc2bec567af5c5bf96 ]

Fix the reason of kernel oops when i40e driver removed VFs.
Added new __I40E_VFS_RELEASING state to signalize releasing
process by PF, that it makes possible to exit of reset VF procedure.
Without this patch, it is possible to suspend the VFs reset by
releasing VFs resources procedure. Retrying the reset after the
timeout works on the freed VF memory causing a kernel oops.

Fixes: d43d60e5eb95 ("i40e: ensure reset occurs when disabling VF")
Signed-off-by: Eryk Rybak
Signed-off-by: Grzegorz Szczurek
Reviewed-by: Aleksandr Loktionov
Tested-by: Konrad Jankowski
Signed-off-by: Tony Nguyen
Signed-off-by: Sasha Levin
---
 drivers/net/ethernet/intel/i40e/i40e.h             | 1 +
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 9 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 118473dfdcbd..fe1258778cbc 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -142,6 +142,7 @@ enum i40e_state_t {
 	__I40E_VIRTCHNL_OP_PENDING,
 	__I40E_RECOVERY_MODE,
 	__I40E_VF_RESETS_DISABLED,	/* disable resets during i40e_remove */
+	__I40E_VFS_RELEASING,
 	/* This must be last as it determines the size of the BITMAP */
 	__I40E_STATE_SIZE__,
 };
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 1b6ec9be155a..5d301a466f5c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -137,6 +137,7 @@ void i40e_vc_notify_vf_reset(struct i40e_vf *vf)
  **/
 static inline void i40e_vc_disable_vf(struct i40e_vf *vf)
 {
+	struct i40e_pf *pf = vf->pf;
 	int i;
 
 	i40e_vc_notify_vf_reset(vf);
@@ -147,6 +148,11 @@ static inline void i40e_vc_disable_vf(struct i40e_vf *vf)
 	 * ensure a reset.
 	 */
 	for (i = 0; i < 20; i++) {
+		/* If PF is in VFs releasing state reset VF is impossible,
+		 * so leave it.
+		 */
+		if (test_bit(__I40E_VFS_RELEASING, pf->state))
+			return;
 		if (i40e_reset_vf(vf, false))
 			return;
 		usleep_range(1, 2);
@@ -1574,6 +1580,8 @@ void i40e_free_vfs(struct i40e_pf *pf)
 
 	if (!pf->vf)
 		return;
+
+	set_bit(__I40E_VFS_RELEASING, pf->state);
 	while (test_and_set_bit(__I40E_VF_DISABLE, pf->state))
 		usleep_range(1000, 2000);
 
@@ -1631,6 +1639,7 @@ void i40e_free_vfs(struct i40e_pf *pf)
 		}
 	}
 	clear_bit(__I40E_VF_DISABLE, pf->state);
+	clear_bit(__I40E_VFS_RELEASING, pf->state);
 }
 
 #ifdef CONFIG_PCI_IOV
-- 
2.30.2
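The interaction this patch relies on can be modelled outside the kernel. The following is only a userspace sketch, assuming C11 atomics in place of the kernel bitops; `pf_model`, `try_reset_vf` and `disable_vf` are hypothetical names for illustration and are not part of the i40e driver. It shows the retry loop giving up as soon as the "releasing" flag is seen, instead of retrying against state that the release path is about to free.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for two bits of the PF state bitmap. */
enum {
	PF_VF_DISABLE    = 1u << 0,	/* models __I40E_VF_DISABLE */
	PF_VFS_RELEASING = 1u << 1,	/* models __I40E_VFS_RELEASING */
};

struct pf_model {
	atomic_uint state;
	int reset_attempts;	/* how often the reset stub was called */
};

/* Stub reset: fails while another path holds PF_VF_DISABLE. */
static bool try_reset_vf(struct pf_model *pf)
{
	pf->reset_attempts++;
	return !(atomic_load(&pf->state) & PF_VF_DISABLE);
}

/* Returns true if a reset went through, false if the loop gave up. */
static bool disable_vf(struct pf_model *pf)
{
	for (int i = 0; i < 20; i++) {
		/* Release in progress: do not touch VF state at all. */
		if (atomic_load(&pf->state) & PF_VFS_RELEASING)
			return false;
		if (try_reset_vf(pf))
			return true;
	}
	return false;
}
```

In this model, setting the releasing flag before taking the disable bit (as the patch does in i40e_free_vfs) means a concurrent retry loop sees the flag on its next iteration and returns without another reset attempt.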
[PATCH 5.10 107/188] i40e: Fix kernel oops when i40e driver removes VFs
From: Eryk Rybak

[ Upstream commit 347b5650cd158d1d953487cc2bec567af5c5bf96 ]

Fix the reason of kernel oops when i40e driver removed VFs.
Added new __I40E_VFS_RELEASING state to signalize releasing
process by PF, that it makes possible to exit of reset VF procedure.
Without this patch, it is possible to suspend the VFs reset by
releasing VFs resources procedure. Retrying the reset after the
timeout works on the freed VF memory causing a kernel oops.

Fixes: d43d60e5eb95 ("i40e: ensure reset occurs when disabling VF")
Signed-off-by: Eryk Rybak
Signed-off-by: Grzegorz Szczurek
Reviewed-by: Aleksandr Loktionov
Tested-by: Konrad Jankowski
Signed-off-by: Tony Nguyen
Signed-off-by: Sasha Levin
---
 drivers/net/ethernet/intel/i40e/i40e.h             | 1 +
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 9 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 118473dfdcbd..fe1258778cbc 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -142,6 +142,7 @@ enum i40e_state_t {
 	__I40E_VIRTCHNL_OP_PENDING,
 	__I40E_RECOVERY_MODE,
 	__I40E_VF_RESETS_DISABLED,	/* disable resets during i40e_remove */
+	__I40E_VFS_RELEASING,
 	/* This must be last as it determines the size of the BITMAP */
 	__I40E_STATE_SIZE__,
 };
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 3b269c70dcfe..e4f13a49c3df 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -137,6 +137,7 @@ void i40e_vc_notify_vf_reset(struct i40e_vf *vf)
  **/
 static inline void i40e_vc_disable_vf(struct i40e_vf *vf)
 {
+	struct i40e_pf *pf = vf->pf;
 	int i;
 
 	i40e_vc_notify_vf_reset(vf);
@@ -147,6 +148,11 @@ static inline void i40e_vc_disable_vf(struct i40e_vf *vf)
 	 * ensure a reset.
 	 */
 	for (i = 0; i < 20; i++) {
+		/* If PF is in VFs releasing state reset VF is impossible,
+		 * so leave it.
+		 */
+		if (test_bit(__I40E_VFS_RELEASING, pf->state))
+			return;
 		if (i40e_reset_vf(vf, false))
 			return;
 		usleep_range(1, 2);
@@ -1574,6 +1580,8 @@ void i40e_free_vfs(struct i40e_pf *pf)
 
 	if (!pf->vf)
 		return;
+
+	set_bit(__I40E_VFS_RELEASING, pf->state);
 	while (test_and_set_bit(__I40E_VF_DISABLE, pf->state))
 		usleep_range(1000, 2000);
 
@@ -1631,6 +1639,7 @@ void i40e_free_vfs(struct i40e_pf *pf)
 		}
 	}
 	clear_bit(__I40E_VF_DISABLE, pf->state);
+	clear_bit(__I40E_VFS_RELEASING, pf->state);
 }
 
 #ifdef CONFIG_PCI_IOV
-- 
2.30.2
[PATCH 5.4 054/111] i40e: Fix kernel oops when i40e driver removes VFs
From: Eryk Rybak

[ Upstream commit 347b5650cd158d1d953487cc2bec567af5c5bf96 ]

Fix the reason of kernel oops when i40e driver removed VFs.
Added new __I40E_VFS_RELEASING state to signalize releasing
process by PF, that it makes possible to exit of reset VF procedure.
Without this patch, it is possible to suspend the VFs reset by
releasing VFs resources procedure. Retrying the reset after the
timeout works on the freed VF memory causing a kernel oops.

Fixes: d43d60e5eb95 ("i40e: ensure reset occurs when disabling VF")
Signed-off-by: Eryk Rybak
Signed-off-by: Grzegorz Szczurek
Reviewed-by: Aleksandr Loktionov
Tested-by: Konrad Jankowski
Signed-off-by: Tony Nguyen
Signed-off-by: Sasha Levin
---
 drivers/net/ethernet/intel/i40e/i40e.h             | 1 +
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 9 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 678e4190b8a8..e571c6116c4b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -152,6 +152,7 @@ enum i40e_state_t {
 	__I40E_VIRTCHNL_OP_PENDING,
 	__I40E_RECOVERY_MODE,
 	__I40E_VF_RESETS_DISABLED,	/* disable resets during i40e_remove */
+	__I40E_VFS_RELEASING,
 	/* This must be last as it determines the size of the BITMAP */
 	__I40E_STATE_SIZE__,
 };
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 5acd599d6b9a..e56107305486 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -137,6 +137,7 @@ void i40e_vc_notify_vf_reset(struct i40e_vf *vf)
  **/
 static inline void i40e_vc_disable_vf(struct i40e_vf *vf)
 {
+	struct i40e_pf *pf = vf->pf;
 	int i;
 
 	i40e_vc_notify_vf_reset(vf);
@@ -147,6 +148,11 @@ static inline void i40e_vc_disable_vf(struct i40e_vf *vf)
 	 * ensure a reset.
 	 */
 	for (i = 0; i < 20; i++) {
+		/* If PF is in VFs releasing state reset VF is impossible,
+		 * so leave it.
+		 */
+		if (test_bit(__I40E_VFS_RELEASING, pf->state))
+			return;
 		if (i40e_reset_vf(vf, false))
 			return;
 		usleep_range(1, 2);
@@ -1506,6 +1512,8 @@ void i40e_free_vfs(struct i40e_pf *pf)
 
 	if (!pf->vf)
 		return;
+
+	set_bit(__I40E_VFS_RELEASING, pf->state);
 	while (test_and_set_bit(__I40E_VF_DISABLE, pf->state))
 		usleep_range(1000, 2000);
 
@@ -1563,6 +1571,7 @@ void i40e_free_vfs(struct i40e_pf *pf)
 		}
 	}
 	clear_bit(__I40E_VF_DISABLE, pf->state);
+	clear_bit(__I40E_VFS_RELEASING, pf->state);
 }
 
 #ifdef CONFIG_PCI_IOV
-- 
2.30.2
[PATCH 4.19 34/66] i40e: Fix kernel oops when i40e driver removes VFs
From: Eryk Rybak

[ Upstream commit 347b5650cd158d1d953487cc2bec567af5c5bf96 ]

Fix the reason of kernel oops when i40e driver removed VFs.
Added new __I40E_VFS_RELEASING state to signalize releasing
process by PF, that it makes possible to exit of reset VF procedure.
Without this patch, it is possible to suspend the VFs reset by
releasing VFs resources procedure. Retrying the reset after the
timeout works on the freed VF memory causing a kernel oops.

Fixes: d43d60e5eb95 ("i40e: ensure reset occurs when disabling VF")
Signed-off-by: Eryk Rybak
Signed-off-by: Grzegorz Szczurek
Reviewed-by: Aleksandr Loktionov
Tested-by: Konrad Jankowski
Signed-off-by: Tony Nguyen
Signed-off-by: Sasha Levin
---
 drivers/net/ethernet/intel/i40e/i40e.h             | 1 +
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 9 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 738acba7a9a3..3c921dfc2056 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -149,6 +149,7 @@ enum i40e_state_t {
 	__I40E_CLIENT_L2_CHANGE,
 	__I40E_CLIENT_RESET,
 	__I40E_VF_RESETS_DISABLED,	/* disable resets during i40e_remove */
+	__I40E_VFS_RELEASING,
 	/* This must be last as it determines the size of the BITMAP */
 	__I40E_STATE_SIZE__,
 };
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 5d782148d35f..3c1533c627fd 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -137,6 +137,7 @@ void i40e_vc_notify_vf_reset(struct i40e_vf *vf)
  **/
 static inline void i40e_vc_disable_vf(struct i40e_vf *vf)
 {
+	struct i40e_pf *pf = vf->pf;
 	int i;
 
 	i40e_vc_notify_vf_reset(vf);
@@ -147,6 +148,11 @@ static inline void i40e_vc_disable_vf(struct i40e_vf *vf)
 	 * ensure a reset.
 	 */
 	for (i = 0; i < 20; i++) {
+		/* If PF is in VFs releasing state reset VF is impossible,
+		 * so leave it.
+		 */
+		if (test_bit(__I40E_VFS_RELEASING, pf->state))
+			return;
 		if (i40e_reset_vf(vf, false))
 			return;
 		usleep_range(1, 2);
@@ -1381,6 +1387,8 @@ void i40e_free_vfs(struct i40e_pf *pf)
 
 	if (!pf->vf)
 		return;
+
+	set_bit(__I40E_VFS_RELEASING, pf->state);
 	while (test_and_set_bit(__I40E_VF_DISABLE, pf->state))
 		usleep_range(1000, 2000);
 
@@ -1438,6 +1446,7 @@ void i40e_free_vfs(struct i40e_pf *pf)
 		}
 	}
 	clear_bit(__I40E_VF_DISABLE, pf->state);
+	clear_bit(__I40E_VFS_RELEASING, pf->state);
 }
 
 #ifdef CONFIG_PCI_IOV
-- 
2.30.2
[PATCH 4.14 01/18] Bluetooth: fix kernel oops in store_pending_adv_report
From: Alain Michaud

commit a2ec905d1e160a33b2e210e45ad30445ef26ce0e upstream.

Fix kernel oops observed when an ext adv data is larger than 31 bytes.

This can be reproduced by setting up an advertiser with advertisement
larger than 31 bytes. The issue is not sensitive to the advertisement
content. In particular, this was reproduced with an advertisement of
229 bytes filled with 'A'. See stack trace below.

This is fixed by not catching ext_adv as legacy adv are only cached to
be able to concatenate a scanable adv with its scan response before
sending it up through mgmt.

With ext_adv, this is no longer necessary.

  general protection fault: [#1] SMP PTI
  CPU: 6 PID: 205 Comm: kworker/u17:0 Not tainted 5.4.0-37-generic #41-Ubuntu
  Hardware name: Dell Inc. XPS 15 7590/0CF6RR, BIOS 1.7.0 05/11/2020
  Workqueue: hci0 hci_rx_work [bluetooth]
  RIP: 0010:hci_bdaddr_list_lookup+0x1e/0x40 [bluetooth]
  Code: ff ff e9 26 ff ff ff 0f 1f 44 00 00 0f 1f 44 00 00 55 48 8b 07 48 89 e5 48 39 c7 75 0a eb 24 48 8b 00 48 39 f8 74 1c 44 8b 06 <44> 39 40 10 75 ef 44 0f b7 4e 04 66 44 39 48 14 75 e3 38 50 16 75
  RSP: 0018:bc6a40493c70 EFLAGS: 00010286
  RAX: 4141414141414141 RBX: 001b RCX:
  RDX: RSI: 9903e76c100f RDI: 9904289d4b28
  RBP: bc6a40493c70 R08: 93570362 R09:
  R10: R11: 9904344eae38 R12: 9904289d4000
  R13: R14: ffa3 R15: 9903e76c100f
  FS: () GS:99043458() knlGS:
  CS: 0010 DS: ES: CR0: 80050033
  CR2: 7feed125a000 CR3: 0001b860a003 CR4: 003606e0
  Call Trace:
    process_adv_report+0x12e/0x560 [bluetooth]
    hci_le_meta_evt+0x7b2/0xba0 [bluetooth]
    hci_event_packet+0x1c29/0x2a90 [bluetooth]
    hci_rx_work+0x19b/0x360 [bluetooth]
    process_one_work+0x1eb/0x3b0
    worker_thread+0x4d/0x400
    kthread+0x104/0x140

Fixes: c215e9397b00 ("Bluetooth: Process extended ADV report event")
Reported-by: Andy Nguyen
Reported-by: Linus Torvalds
Reported-by: Balakrishna Godavarthi
Signed-off-by: Alain Michaud
Tested-by: Sonny Sasaka
Acked-by: Marcel Holtmann
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
---
 net/bluetooth/hci_event.c | 22 +
 1 file changed, 17 insertions(+), 5 deletions(-)

--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -1133,6 +1133,9 @@ static void store_pending_adv_report(str
 {
 	struct discovery_state *d = &hdev->discovery;
 
+	if (len > HCI_MAX_AD_LENGTH)
+		return;
+
 	bacpy(&d->last_adv_addr, bdaddr);
 	d->last_adv_addr_type = bdaddr_type;
 	d->last_adv_rssi = rssi;
@@ -4779,6 +4782,11 @@ static void process_adv_report(struct hc
 		return;
 	}
 
+	if (len > HCI_MAX_AD_LENGTH) {
+		pr_err_ratelimited("legacy adv larger than 31 bytes");
+		return;
+	}
+
 	/* Find the end of the data in case the report contains padded zero
 	 * bytes at the end causing an invalid length value.
 	 *
@@ -4839,7 +4847,7 @@ static void process_adv_report(struct hc
 	 */
 	conn = check_pending_le_conn(hdev, bdaddr, bdaddr_type, type,
 				     direct_addr);
-	if (conn && type == LE_ADV_IND) {
+	if (conn && type == LE_ADV_IND && len <= HCI_MAX_AD_LENGTH) {
 		/* Store report for later inclusion by
 		 * mgmt_device_connected
 		 */
@@ -4964,10 +4972,14 @@ static void hci_le_adv_report_evt(struct
 		struct hci_ev_le_advertising_info *ev = ptr;
 		s8 rssi;
 
-		rssi = ev->data[ev->length];
-		process_adv_report(hdev, ev->evt_type, &ev->bdaddr,
-				   ev->bdaddr_type, NULL, 0, rssi,
-				   ev->data, ev->length);
+		if (ev->length <= HCI_MAX_AD_LENGTH) {
+			rssi = ev->data[ev->length];
+			process_adv_report(hdev, ev->evt_type, &ev->bdaddr,
+					   ev->bdaddr_type, NULL, 0, rssi,
+					   ev->data, ev->length);
+		} else {
+			bt_dev_err(hdev, "Dropping invalid advertising data");
+		}
 
 		ptr += sizeof(*ev) + ev->length + 1;
 	}
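The core of the fix is a length check in front of a copy into a fixed-size cache. A minimal userspace sketch of that pattern follows, assuming a 31-byte limit as the commit message states; `pending_report`, `store_pending` and `MAX_AD_LENGTH` are hypothetical names for illustration, not the kernel's structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_AD_LENGTH 31	/* assumed to mirror the 31-byte legacy limit */

/* Hypothetical cache for one pending legacy advertising report. */
struct pending_report {
	uint8_t data[MAX_AD_LENGTH];
	uint8_t len;
};

/*
 * Cache a report only if it fits. An oversized (extended) report is
 * rejected and the cache is left untouched, so nothing past the end
 * of data[] is ever written.
 */
static bool store_pending(struct pending_report *p,
			  const uint8_t *data, size_t len)
{
	if (len > MAX_AD_LENGTH)
		return false;
	memcpy(p->data, data, len);
	p->len = (uint8_t)len;
	return true;
}
```

With a 229-byte buffer of 'A' bytes, as in the reproduction described in the commit message, this sketch refuses the store rather than overwriting whatever follows the array, which is the kind of corruption the 0x4141414141414141 value in RAX points at.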
[PATCH 4.9 04/16] Bluetooth: fix kernel oops in store_pending_adv_report
From: Alain Michaud

commit a2ec905d1e160a33b2e210e45ad30445ef26ce0e upstream.

Fix kernel oops observed when an ext adv data is larger than 31 bytes.

This can be reproduced by setting up an advertiser with advertisement
larger than 31 bytes. The issue is not sensitive to the advertisement
content. In particular, this was reproduced with an advertisement of
229 bytes filled with 'A'. See stack trace below.

This is fixed by not catching ext_adv as legacy adv are only cached to
be able to concatenate a scanable adv with its scan response before
sending it up through mgmt.

With ext_adv, this is no longer necessary.

  general protection fault: [#1] SMP PTI
  CPU: 6 PID: 205 Comm: kworker/u17:0 Not tainted 5.4.0-37-generic #41-Ubuntu
  Hardware name: Dell Inc. XPS 15 7590/0CF6RR, BIOS 1.7.0 05/11/2020
  Workqueue: hci0 hci_rx_work [bluetooth]
  RIP: 0010:hci_bdaddr_list_lookup+0x1e/0x40 [bluetooth]
  Code: ff ff e9 26 ff ff ff 0f 1f 44 00 00 0f 1f 44 00 00 55 48 8b 07 48 89 e5 48 39 c7 75 0a eb 24 48 8b 00 48 39 f8 74 1c 44 8b 06 <44> 39 40 10 75 ef 44 0f b7 4e 04 66 44 39 48 14 75 e3 38 50 16 75
  RSP: 0018:bc6a40493c70 EFLAGS: 00010286
  RAX: 4141414141414141 RBX: 001b RCX:
  RDX: RSI: 9903e76c100f RDI: 9904289d4b28
  RBP: bc6a40493c70 R08: 93570362 R09:
  R10: R11: 9904344eae38 R12: 9904289d4000
  R13: R14: ffa3 R15: 9903e76c100f
  FS: () GS:99043458() knlGS:
  CS: 0010 DS: ES: CR0: 80050033
  CR2: 7feed125a000 CR3: 0001b860a003 CR4: 003606e0
  Call Trace:
    process_adv_report+0x12e/0x560 [bluetooth]
    hci_le_meta_evt+0x7b2/0xba0 [bluetooth]
    hci_event_packet+0x1c29/0x2a90 [bluetooth]
    hci_rx_work+0x19b/0x360 [bluetooth]
    process_one_work+0x1eb/0x3b0
    worker_thread+0x4d/0x400
    kthread+0x104/0x140

Fixes: c215e9397b00 ("Bluetooth: Process extended ADV report event")
Reported-by: Andy Nguyen
Reported-by: Linus Torvalds
Reported-by: Balakrishna Godavarthi
Signed-off-by: Alain Michaud
Tested-by: Sonny Sasaka
Acked-by: Marcel Holtmann
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
---
 net/bluetooth/hci_event.c | 22 +
 1 file changed, 17 insertions(+), 5 deletions(-)

--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -1133,6 +1133,9 @@ static void store_pending_adv_report(str
 {
 	struct discovery_state *d = &hdev->discovery;
 
+	if (len > HCI_MAX_AD_LENGTH)
+		return;
+
 	bacpy(&d->last_adv_addr, bdaddr);
 	d->last_adv_addr_type = bdaddr_type;
 	d->last_adv_rssi = rssi;
@@ -4779,6 +4782,11 @@ static void process_adv_report(struct hc
 		return;
 	}
 
+	if (len > HCI_MAX_AD_LENGTH) {
+		pr_err_ratelimited("legacy adv larger than 31 bytes");
+		return;
+	}
+
 	/* Find the end of the data in case the report contains padded zero
 	 * bytes at the end causing an invalid length value.
 	 *
@@ -4839,7 +4847,7 @@ static void process_adv_report(struct hc
 	 */
 	conn = check_pending_le_conn(hdev, bdaddr, bdaddr_type, type,
 				     direct_addr);
-	if (conn && type == LE_ADV_IND) {
+	if (conn && type == LE_ADV_IND && len <= HCI_MAX_AD_LENGTH) {
 		/* Store report for later inclusion by
 		 * mgmt_device_connected
 		 */
@@ -4964,10 +4972,14 @@ static void hci_le_adv_report_evt(struct
 		struct hci_ev_le_advertising_info *ev = ptr;
 		s8 rssi;
 
-		rssi = ev->data[ev->length];
-		process_adv_report(hdev, ev->evt_type, &ev->bdaddr,
-				   ev->bdaddr_type, NULL, 0, rssi,
-				   ev->data, ev->length);
+		if (ev->length <= HCI_MAX_AD_LENGTH) {
+			rssi = ev->data[ev->length];
+			process_adv_report(hdev, ev->evt_type, &ev->bdaddr,
+					   ev->bdaddr_type, NULL, 0, rssi,
+					   ev->data, ev->length);
+		} else {
+			bt_dev_err(hdev, "Dropping invalid advertising data");
+		}
 
 		ptr += sizeof(*ev) + ev->length + 1;
 	}
[PATCH 4.4 03/16] Bluetooth: fix kernel oops in store_pending_adv_report
From: Alain Michaud

commit a2ec905d1e160a33b2e210e45ad30445ef26ce0e upstream.

Fix kernel oops observed when an ext adv data is larger than 31 bytes.

This can be reproduced by setting up an advertiser with advertisement
larger than 31 bytes. The issue is not sensitive to the advertisement
content. In particular, this was reproduced with an advertisement of
229 bytes filled with 'A'. See stack trace below.

This is fixed by not catching ext_adv as legacy adv are only cached to
be able to concatenate a scanable adv with its scan response before
sending it up through mgmt.

With ext_adv, this is no longer necessary.

  general protection fault: [#1] SMP PTI
  CPU: 6 PID: 205 Comm: kworker/u17:0 Not tainted 5.4.0-37-generic #41-Ubuntu
  Hardware name: Dell Inc. XPS 15 7590/0CF6RR, BIOS 1.7.0 05/11/2020
  Workqueue: hci0 hci_rx_work [bluetooth]
  RIP: 0010:hci_bdaddr_list_lookup+0x1e/0x40 [bluetooth]
  Code: ff ff e9 26 ff ff ff 0f 1f 44 00 00 0f 1f 44 00 00 55 48 8b 07 48 89 e5 48 39 c7 75 0a eb 24 48 8b 00 48 39 f8 74 1c 44 8b 06 <44> 39 40 10 75 ef 44 0f b7 4e 04 66 44 39 48 14 75 e3 38 50 16 75
  RSP: 0018:bc6a40493c70 EFLAGS: 00010286
  RAX: 4141414141414141 RBX: 001b RCX:
  RDX: RSI: 9903e76c100f RDI: 9904289d4b28
  RBP: bc6a40493c70 R08: 93570362 R09:
  R10: R11: 9904344eae38 R12: 9904289d4000
  R13: R14: ffa3 R15: 9903e76c100f
  FS: () GS:99043458() knlGS:
  CS: 0010 DS: ES: CR0: 80050033
  CR2: 7feed125a000 CR3: 0001b860a003 CR4: 003606e0
  Call Trace:
    process_adv_report+0x12e/0x560 [bluetooth]
    hci_le_meta_evt+0x7b2/0xba0 [bluetooth]
    hci_event_packet+0x1c29/0x2a90 [bluetooth]
    hci_rx_work+0x19b/0x360 [bluetooth]
    process_one_work+0x1eb/0x3b0
    worker_thread+0x4d/0x400
    kthread+0x104/0x140

Fixes: c215e9397b00 ("Bluetooth: Process extended ADV report event")
Reported-by: Andy Nguyen
Reported-by: Linus Torvalds
Reported-by: Balakrishna Godavarthi
Signed-off-by: Alain Michaud
Tested-by: Sonny Sasaka
Acked-by: Marcel Holtmann
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
---
 net/bluetooth/hci_event.c | 22 +
 1 file changed, 17 insertions(+), 5 deletions(-)

--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -1133,6 +1133,9 @@ static void store_pending_adv_report(str
 {
 	struct discovery_state *d = &hdev->discovery;
 
+	if (len > HCI_MAX_AD_LENGTH)
+		return;
+
 	bacpy(&d->last_adv_addr, bdaddr);
 	d->last_adv_addr_type = bdaddr_type;
 	d->last_adv_rssi = rssi;
@@ -4752,6 +4755,11 @@ static void process_adv_report(struct hc
 	u32 flags;
 	u8 *ptr, real_len;
 
+	if (len > HCI_MAX_AD_LENGTH) {
+		pr_err_ratelimited("legacy adv larger than 31 bytes");
+		return;
+	}
+
 	/* Find the end of the data in case the report contains padded zero
 	 * bytes at the end causing an invalid length value.
 	 *
@@ -4812,7 +4820,7 @@ static void process_adv_report(struct hc
 	 */
 	conn = check_pending_le_conn(hdev, bdaddr, bdaddr_type, type,
 				     direct_addr);
-	if (conn && type == LE_ADV_IND) {
+	if (conn && type == LE_ADV_IND && len <= HCI_MAX_AD_LENGTH) {
 		/* Store report for later inclusion by
 		 * mgmt_device_connected
 		 */
@@ -4937,10 +4945,14 @@ static void hci_le_adv_report_evt(struct
 		struct hci_ev_le_advertising_info *ev = ptr;
 		s8 rssi;
 
-		rssi = ev->data[ev->length];
-		process_adv_report(hdev, ev->evt_type, &ev->bdaddr,
-				   ev->bdaddr_type, NULL, 0, rssi,
-				   ev->data, ev->length);
+		if (ev->length <= HCI_MAX_AD_LENGTH) {
+			rssi = ev->data[ev->length];
+			process_adv_report(hdev, ev->evt_type, &ev->bdaddr,
+					   ev->bdaddr_type, NULL, 0, rssi,
+					   ev->data, ev->length);
+		} else {
+			bt_dev_err(hdev, "Dropping invalid advertising data");
+		}
 
 		ptr += sizeof(*ev) + ev->length + 1;
 	}
[sparc64] kernel OOPS bisected from "lockdep: improve current->(hard|soft)irqs_enabled synchronisation with actual irq state"
Hello!

The following git patch 044d0d6de9f50192f9697583504a382347ee95ca
(linux git master branch) introduced the following kernel OOPS upon
kernel boot on my sparc64 T5-2 ldom (VM):

$ uname -a
Linux ttip 5.9.0-rc2-00011-g044d0d6de9f5 #59 SMP Thu Sep 10 13:07:45 MSK 2020 sparc64 GNU/Linux

(OOPS is from the latest tag, but the same on commit above)

...
rcu: Hierarchical SRCU implementation.
smp: Bringing up secondary CPUs ...
[ cut here ]
WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:4875 check_flags+0x9c/0x2c0
DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4 #36
Call Trace:
 [<004727a8>] __warn+0xa8/0x120
 [<00472c10>] warn_slowpath_fmt+0x64/0x74
 [<004e859c>] check_flags+0x9c/0x2c0
 [<00c17ca0>] lock_is_held_type+0x20/0x140
 [<005095f4>] rcu_read_lock_sched_held+0x54/0xa0
 [<004ed4c0>] lock_acquire+0x120/0x480
 [<00c21610>] _raw_spin_lock+0x30/0x60
 [<009b9bdc>] p1275_cmd_direct+0x1c/0x60
 [<009b9ab0>] prom_startcpu_cpuid+0x30/0x40
 [<004427e4>] __cpu_up+0x184/0x3a0
 [<00474600>] bringup_cpu+0x20/0x120
 [<0047378c>] cpuhp_invoke_callback+0xec/0x340
 [<004753d4>] cpu_up+0x154/0x220
 [<00475c60>] bringup_nonboot_cpus+0x60/0xa0
 [<00fbc338>] smp_init+0x28/0xa0
 [<00fad3b4>] kernel_init_freeable+0x18c/0x300
irq event stamp: 5135
hardirqs last enabled at (5135): [<00c21a28>] _raw_spin_unlock_irqrestore+0x28/0x60
hardirqs last disabled at (5134): [<00c217e0>] _raw_spin_lock_irqsave+0x20/0x80
softirqs last enabled at (1474): [<00c245a0>] __do_softirq+0x4e0/0x560
softirqs last disabled at (1467): [<0042d394>] do_softirq_own_stack+0x34/0x60
random: get_random_bytes called from __warn+0xc8/0x120 with crng_init=0
---[ end trace 4cf960ae85148e2e ]---
possible reason: unannotated irqs-off.
irq event stamp: 5135
hardirqs last enabled at (5135): [<00c21a28>] _raw_spin_unlock_irqrestore+0x28/0x60
hardirqs last disabled at (5134): [<00c217e0>] _raw_spin_lock_irqsave+0x20/0x80
softirqs last enabled at (1474): [<00c245a0>] __do_softirq+0x4e0/0x560
softirqs last disabled at (1467): [<0042d394>] do_softirq_own_stack+0x34/0x60
smp: Brought up 1 node, 32 CPUs
devtmpfs: initialized
...

full boot log in [1], kernel config in [2]

linux-2.6$ git bisect log
git bisect start
# good: [d012a7190fc1fd72ed48911e77ca97ba4521bccd] Linux 5.9-rc2
git bisect good d012a7190fc1fd72ed48911e77ca97ba4521bccd
# bad: [34d4ddd359dbcdf6c5fb3f85a179243d7a1cb7f8] Merge tag 'linux-kselftest-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
git bisect bad 34d4ddd359dbcdf6c5fb3f85a179243d7a1cb7f8
# bad: [e1d0126ca3a66c284a02b083a42e2b39558002cd] Merge tag 'xfs-5.9-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
git bisect bad e1d0126ca3a66c284a02b083a42e2b39558002cd
# good: [24148d8648e37f8c15bedddfa50d14a31a0582c5] Merge tag 'io_uring-5.9-2020-08-28' of git://git.kernel.dk/linux-block
git bisect good 24148d8648e37f8c15bedddfa50d14a31a0582c5
# bad: [b69bea8a657b681442765b06be92a2607b1bd875] Merge tag 'locking-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad b69bea8a657b681442765b06be92a2607b1bd875
# good: [20934c0de13b49a072fb1e0ca79fe0fe0e40eae5] usb: storage: Add unusual_uas entry for Sony PSZ drives
git bisect good 20934c0de13b49a072fb1e0ca79fe0fe0e40eae5
# good: [c4011283a7d5d64a50991dd3baa9acdf3d49092c] Merge tag 'dma-mapping-5.9-2' of git://git.infradead.org/users/hch/dma-mapping
git bisect good c4011283a7d5d64a50991dd3baa9acdf3d49092c
# good: [8bb5021cc2ee5d5dd129a9f2f5ad2bb76eea297d] Merge tag 'powerpc-5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
git bisect good 8bb5021cc2ee5d5dd129a9f2f5ad2bb76eea297d
# good: [00b0ed2d4997af6d0a93edef820386951fd66d94] locking/lockdep: Cleanup
git bisect good 00b0ed2d4997af6d0a93edef820386951fd66d94
# bad: [044d0d6de9f50192f9697583504a382347ee95ca] lockdep: Only trace IRQ edges
git bisect bad 044d0d6de9f50192f9697583504a382347ee95ca
# good: [021c109330ebc1f54b546c63a078ea3c31356ecb] arm64: Implement arch_irqs_disabled()
git bisect good 021c109330ebc1f54b546c63a078ea3c31356ecb
# good: [99dc56feb7932020502d40107a712fa302b32082] mips: Implement arch_irqs_disabled()
git bisect good 99dc56feb7932020502d40107a712fa302b32082
# first bad commit: [044d0d6de9f50192f9697583504a382347ee95ca] lockdep: Only trace IRQ edges

1. https://github.com/mator/sparc64-dmesg/blob/master/dmesg-5.9.0-rc4
2. https://github.com/mator/sparc64-dmesg/blob/master/config-5.9.0-rc4.gz
Re: [sparc64] kernel OOPS bisected from "lockdep: improve current->(hard|soft)irqs_enabled synchronisation with actual irq state"
On Thu, Sep 10, 2020 at 4:40 PM wrote:
>
> On Thu, Sep 10, 2020 at 02:43:13PM +0300, Anatoly Pugachev wrote:
> > Hello!
> >
> > The following git patch 044d0d6de9f50192f9697583504a382347ee95ca
> > (linux git master branch) introduced the following kernel OOPS upon
> > kernel boot on my sparc64 T5-2 ldom (VM):
>
> https://lkml.kernel.org/r/20200908154157.gv1362...@hirez.programming.kicks-ass.net

Peter, thanks! That fixes the issue for me.
Re: [sparc64] kernel OOPS bisected from "lockdep: improve current->(hard|soft)irqs_enabled synchronisation with actual irq state"
On Thu, Sep 10, 2020 at 02:43:13PM +0300, Anatoly Pugachev wrote:
> Hello!
>
> The following git patch 044d0d6de9f50192f9697583504a382347ee95ca
> (linux git master branch) introduced the following kernel OOPS upon
> kernel boot on my sparc64 T5-2 ldom (VM):

https://lkml.kernel.org/r/20200908154157.gv1362...@hirez.programming.kicks-ass.net
[PATCH 5.4 073/214] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
From: Marc Zyngier

[ Upstream commit 63ef91f24f9bfc70b6446319f6cabfd094481372 ]

Booting a recent kernel on a rk3399-based system (nanopc-t4),
equipped with a recent u-boot and ATF results in an Oops due
to a NULL pointer dereference.

This turns out to be due to the rk3399-dmc driver looking for
an *undocumented* property (rockchip,pmu), and happily using
a NULL pointer when the property isn't there.

Instead, make most of what was brought in with 9173c5ceb035
("PM / devfreq: rk3399_dmc: Pass ODT and auto power down parameters
to TF-A.") conditioned on finding this property in the device-tree,
preventing the driver from exploding.

Cc: sta...@vger.kernel.org
Fixes: 9173c5ceb035 ("PM / devfreq: rk3399_dmc: Pass ODT and auto power down parameters to TF-A.")
Signed-off-by: Marc Zyngier
Signed-off-by: Chanwoo Choi
Signed-off-by: Sasha Levin
---
 drivers/devfreq/rk3399_dmc.c | 42
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/drivers/devfreq/rk3399_dmc.c b/drivers/devfreq/rk3399_dmc.c
index 24f04f78285b7..027769e39f9b8 100644
--- a/drivers/devfreq/rk3399_dmc.c
+++ b/drivers/devfreq/rk3399_dmc.c
@@ -95,18 +95,20 @@ static int rk3399_dmcfreq_target(struct device *dev, unsigned long *freq,
 
 	mutex_lock(&dmcfreq->lock);
 
-	if (target_rate >= dmcfreq->odt_dis_freq)
-		odt_enable = true;
-
-	/*
-	 * This makes a SMC call to the TF-A to set the DDR PD (power-down)
-	 * timings and to enable or disable the ODT (on-die termination)
-	 * resistors.
-	 */
-	arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, dmcfreq->odt_pd_arg0,
-		      dmcfreq->odt_pd_arg1,
-		      ROCKCHIP_SIP_CONFIG_DRAM_SET_ODT_PD,
-		      odt_enable, 0, 0, 0, &res);
+	if (dmcfreq->regmap_pmu) {
+		if (target_rate >= dmcfreq->odt_dis_freq)
+			odt_enable = true;
+
+		/*
+		 * This makes a SMC call to the TF-A to set the DDR PD
+		 * (power-down) timings and to enable or disable the
+		 * ODT (on-die termination) resistors.
+		 */
+		arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, dmcfreq->odt_pd_arg0,
+			      dmcfreq->odt_pd_arg1,
+			      ROCKCHIP_SIP_CONFIG_DRAM_SET_ODT_PD,
+			      odt_enable, 0, 0, 0, &res);
+	}
 
 	/*
 	 * If frequency scaling from low to high, adjust voltage first.
@@ -371,13 +373,14 @@ static int rk3399_dmcfreq_probe(struct platform_device *pdev)
 	}
 
 	node = of_parse_phandle(np, "rockchip,pmu", 0);
-	if (node) {
-		data->regmap_pmu = syscon_node_to_regmap(node);
-		of_node_put(node);
-		if (IS_ERR(data->regmap_pmu)) {
-			ret = PTR_ERR(data->regmap_pmu);
-			goto err_edev;
-		}
+	if (!node)
+		goto no_pmu;
+
+	data->regmap_pmu = syscon_node_to_regmap(node);
+	of_node_put(node);
+	if (IS_ERR(data->regmap_pmu)) {
+		ret = PTR_ERR(data->regmap_pmu);
+		goto err_edev;
 	}
 
 	regmap_read(data->regmap_pmu, RK3399_PMUGRF_OS_REG2, &val);
@@ -399,6 +402,7 @@ static int rk3399_dmcfreq_probe(struct platform_device *pdev)
 		goto err_edev;
 	};
 
+no_pmu:
 	arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, 0, 0,
 		      ROCKCHIP_SIP_CONFIG_DRAM_INIT,
 		      0, 0, 0, 0, &res);
-- 
2.25.1
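The shape of this fix, treating a resource as optional at probe time and guarding every later use, can be sketched in plain C. This is only an illustrative model under stated assumptions: `pmu_regs`, `dmc_model`, `dmc_probe` and `dmc_set_rate` are hypothetical names and do not correspond to the driver's API; the counter stands in for the SMC call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the optional PMU register map. */
struct pmu_regs {
	unsigned int odt_threshold;	/* rate at which ODT gets enabled */
};

struct dmc_model {
	const struct pmu_regs *pmu;	/* NULL when the property is absent */
	int odt_calls;			/* counts modelled firmware calls */
};

/* Probe: a missing node is not an error, the pointer just stays NULL. */
static void dmc_probe(struct dmc_model *dmc, const struct pmu_regs *node)
{
	dmc->odt_calls = 0;
	dmc->pmu = node;
}

/*
 * Rate change: the ODT step runs only when the PMU data was found.
 * Returns whether ODT would be enabled for this rate.
 */
static bool dmc_set_rate(struct dmc_model *dmc, unsigned int rate)
{
	bool odt_enable = false;

	if (dmc->pmu) {
		odt_enable = rate >= dmc->pmu->odt_threshold;
		dmc->odt_calls++;	/* stands in for the SMC call */
	}
	return odt_enable;
}
```

The design choice mirrored here is that the absence of the property disables one feature rather than failing the whole probe, so the rest of the frequency-scaling path keeps working.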
[PATCH 5.7 363/393] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
From: Marc Zyngier

commit 63ef91f24f9bfc70b6446319f6cabfd094481372 upstream.

Booting a recent kernel on a rk3399-based system (nanopc-t4),
equipped with a recent u-boot and ATF results in an Oops due
to a NULL pointer dereference.

This turns out to be due to the rk3399-dmc driver looking for
an *undocumented* property (rockchip,pmu), and happily using
a NULL pointer when the property isn't there.

Instead, make most of what was brought in with 9173c5ceb035
("PM / devfreq: rk3399_dmc: Pass ODT and auto power down parameters
to TF-A.") conditioned on finding this property in the device-tree,
preventing the driver from exploding.

Cc: sta...@vger.kernel.org
Fixes: 9173c5ceb035 ("PM / devfreq: rk3399_dmc: Pass ODT and auto power down parameters to TF-A.")
Signed-off-by: Marc Zyngier
Signed-off-by: Chanwoo Choi
Signed-off-by: Greg Kroah-Hartman
---
 drivers/devfreq/rk3399_dmc.c | 42 +++---
 1 file changed, 23 insertions(+), 19 deletions(-)

--- a/drivers/devfreq/rk3399_dmc.c
+++ b/drivers/devfreq/rk3399_dmc.c
@@ -95,18 +95,20 @@ static int rk3399_dmcfreq_target(struct
 
 	mutex_lock(&dmcfreq->lock);
 
-	if (target_rate >= dmcfreq->odt_dis_freq)
-		odt_enable = true;
-
-	/*
-	 * This makes a SMC call to the TF-A to set the DDR PD (power-down)
-	 * timings and to enable or disable the ODT (on-die termination)
-	 * resistors.
-	 */
-	arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, dmcfreq->odt_pd_arg0,
-		      dmcfreq->odt_pd_arg1,
-		      ROCKCHIP_SIP_CONFIG_DRAM_SET_ODT_PD,
-		      odt_enable, 0, 0, 0, &res);
+	if (dmcfreq->regmap_pmu) {
+		if (target_rate >= dmcfreq->odt_dis_freq)
+			odt_enable = true;
+
+		/*
+		 * This makes a SMC call to the TF-A to set the DDR PD
+		 * (power-down) timings and to enable or disable the
+		 * ODT (on-die termination) resistors.
+		 */
+		arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, dmcfreq->odt_pd_arg0,
+			      dmcfreq->odt_pd_arg1,
+			      ROCKCHIP_SIP_CONFIG_DRAM_SET_ODT_PD,
+			      odt_enable, 0, 0, 0, &res);
+	}
 
 	/*
 	 * If frequency scaling from low to high, adjust voltage first.
@@ -371,13 +373,14 @@ static int rk3399_dmcfreq_probe(struct p
 	}
 
 	node = of_parse_phandle(np, "rockchip,pmu", 0);
-	if (node) {
-		data->regmap_pmu = syscon_node_to_regmap(node);
-		of_node_put(node);
-		if (IS_ERR(data->regmap_pmu)) {
-			ret = PTR_ERR(data->regmap_pmu);
-			goto err_edev;
-		}
+	if (!node)
+		goto no_pmu;
+
+	data->regmap_pmu = syscon_node_to_regmap(node);
+	of_node_put(node);
+	if (IS_ERR(data->regmap_pmu)) {
+		ret = PTR_ERR(data->regmap_pmu);
+		goto err_edev;
 	}
 
 	regmap_read(data->regmap_pmu, RK3399_PMUGRF_OS_REG2, &val);
@@ -399,6 +402,7 @@ static int rk3399_dmcfreq_probe(struct p
 		goto err_edev;
 	};
 
+no_pmu:
 	arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, 0, 0,
 		      ROCKCHIP_SIP_CONFIG_DRAM_INIT,
 		      0, 0, 0, 0, &res);
[PATCH 5.8 434/464] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
From: Marc Zyngier

commit 63ef91f24f9bfc70b6446319f6cabfd094481372 upstream.

Booting a recent kernel on a rk3399-based system (nanopc-t4),
equipped with a recent u-boot and ATF results in an Oops due
to a NULL pointer dereference.

This turns out to be due to the rk3399-dmc driver looking for
an *undocumented* property (rockchip,pmu), and happily using
a NULL pointer when the property isn't there.

Instead, make most of what was brought in with 9173c5ceb035
("PM / devfreq: rk3399_dmc: Pass ODT and auto power down parameters
to TF-A.") conditioned on finding this property in the device-tree,
preventing the driver from exploding.

Cc: sta...@vger.kernel.org
Fixes: 9173c5ceb035 ("PM / devfreq: rk3399_dmc: Pass ODT and auto power down parameters to TF-A.")
Signed-off-by: Marc Zyngier
Signed-off-by: Chanwoo Choi
Signed-off-by: Greg Kroah-Hartman
---
 drivers/devfreq/rk3399_dmc.c | 42 +++---
 1 file changed, 23 insertions(+), 19 deletions(-)

--- a/drivers/devfreq/rk3399_dmc.c
+++ b/drivers/devfreq/rk3399_dmc.c
@@ -95,18 +95,20 @@ static int rk3399_dmcfreq_target(struct

 	mutex_lock(&dmcfreq->lock);

-	if (target_rate >= dmcfreq->odt_dis_freq)
-		odt_enable = true;
-
-	/*
-	 * This makes a SMC call to the TF-A to set the DDR PD (power-down)
-	 * timings and to enable or disable the ODT (on-die termination)
-	 * resistors.
-	 */
-	arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, dmcfreq->odt_pd_arg0,
-		      dmcfreq->odt_pd_arg1,
-		      ROCKCHIP_SIP_CONFIG_DRAM_SET_ODT_PD,
-		      odt_enable, 0, 0, 0, &res);
+	if (dmcfreq->regmap_pmu) {
+		if (target_rate >= dmcfreq->odt_dis_freq)
+			odt_enable = true;
+
+		/*
+		 * This makes a SMC call to the TF-A to set the DDR PD
+		 * (power-down) timings and to enable or disable the
+		 * ODT (on-die termination) resistors.
+		 */
+		arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, dmcfreq->odt_pd_arg0,
+			      dmcfreq->odt_pd_arg1,
+			      ROCKCHIP_SIP_CONFIG_DRAM_SET_ODT_PD,
+			      odt_enable, 0, 0, 0, &res);
+	}

 	/*
 	 * If frequency scaling from low to high, adjust voltage first.
@@ -371,13 +373,14 @@ static int rk3399_dmcfreq_probe(struct p
 	}

 	node = of_parse_phandle(np, "rockchip,pmu", 0);
-	if (node) {
-		data->regmap_pmu = syscon_node_to_regmap(node);
-		of_node_put(node);
-		if (IS_ERR(data->regmap_pmu)) {
-			ret = PTR_ERR(data->regmap_pmu);
-			goto err_edev;
-		}
+	if (!node)
+		goto no_pmu;
+
+	data->regmap_pmu = syscon_node_to_regmap(node);
+	of_node_put(node);
+	if (IS_ERR(data->regmap_pmu)) {
+		ret = PTR_ERR(data->regmap_pmu);
+		goto err_edev;
 	}

 	regmap_read(data->regmap_pmu, RK3399_PMUGRF_OS_REG2, &val);
@@ -399,6 +402,7 @@ static int rk3399_dmcfreq_probe(struct p
 		goto err_edev;
 	};

+no_pmu:
 	arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, 0, 0,
 		      ROCKCHIP_SIP_CONFIG_DRAM_INIT,
 		      0, 0, 0, 0, &res);
[PATCH 4.19 43/56] Bluetooth: fix kernel oops in store_pending_adv_report
From: Alain Michaud

[ Upstream commit a2ec905d1e160a33b2e210e45ad30445ef26ce0e ]

Fix kernel oops observed when an ext adv data is larger than 31 bytes.

This can be reproduced by setting up an advertiser with advertisement
larger than 31 bytes. The issue is not sensitive to the advertisement
content. In particular, this was reproduced with an advertisement of
229 bytes filled with 'A'. See stack trace below.

This is fixed by not caching ext_adv as legacy adv are only cached to
be able to concatenate a scannable adv with its scan response before
sending it up through mgmt.

With ext_adv, this is no longer necessary.

  general protection fault: [#1] SMP PTI
  CPU: 6 PID: 205 Comm: kworker/u17:0 Not tainted 5.4.0-37-generic #41-Ubuntu
  Hardware name: Dell Inc. XPS 15 7590/0CF6RR, BIOS 1.7.0 05/11/2020
  Workqueue: hci0 hci_rx_work [bluetooth]
  RIP: 0010:hci_bdaddr_list_lookup+0x1e/0x40 [bluetooth]
  Code: ff ff e9 26 ff ff ff 0f 1f 44 00 00 0f 1f 44 00 00 55 48 8b 07 48 89 e5 48 39 c7 75 0a eb 24 48 8b 00 48 39 f8 74 1c 44 8b 06 <44> 39 40 10 75 ef 44 0f b7 4e 04 66 44 39 48 14 75 e3 38 50 16 75
  RSP: 0018:bc6a40493c70 EFLAGS: 00010286
  RAX: 4141414141414141 RBX: 001b RCX:
  RDX: RSI: 9903e76c100f RDI: 9904289d4b28
  RBP: bc6a40493c70 R08: 93570362 R09:
  R10: R11: 9904344eae38 R12: 9904289d4000
  R13: R14: ffa3 R15: 9903e76c100f
  FS: () GS:99043458() knlGS:
  CS: 0010 DS: ES: CR0: 80050033
  CR2: 7feed125a000 CR3: 0001b860a003 CR4: 003606e0
  Call Trace:
   process_adv_report+0x12e/0x560 [bluetooth]
   hci_le_meta_evt+0x7b2/0xba0 [bluetooth]
   hci_event_packet+0x1c29/0x2a90 [bluetooth]
   hci_rx_work+0x19b/0x360 [bluetooth]
   process_one_work+0x1eb/0x3b0
   worker_thread+0x4d/0x400
   kthread+0x104/0x140

Fixes: c215e9397b00 ("Bluetooth: Process extended ADV report event")
Reported-by: Andy Nguyen
Reported-by: Linus Torvalds
Reported-by: Balakrishna Godavarthi
Signed-off-by: Alain Michaud
Tested-by: Sonny Sasaka
Acked-by: Marcel Holtmann
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
---
 net/bluetooth/hci_event.c | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index a044e6bb12b84..cdb92b129906f 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -1229,6 +1229,9 @@ static void store_pending_adv_report(struct hci_dev *hdev, bdaddr_t *bdaddr,
 {
 	struct discovery_state *d = &hdev->discovery;

+	if (len > HCI_MAX_AD_LENGTH)
+		return;
+
 	bacpy(&d->last_adv_addr, bdaddr);
 	d->last_adv_addr_type = bdaddr_type;
 	d->last_adv_rssi = rssi;
@@ -5116,7 +5119,8 @@ static struct hci_conn *check_pending_le_conn(struct hci_dev *hdev,

 static void process_adv_report(struct hci_dev *hdev, u8 type, bdaddr_t *bdaddr,
 			       u8 bdaddr_type, bdaddr_t *direct_addr,
-			       u8 direct_addr_type, s8 rssi, u8 *data, u8 len)
+			       u8 direct_addr_type, s8 rssi, u8 *data, u8 len,
+			       bool ext_adv)
 {
 	struct discovery_state *d = &hdev->discovery;
 	struct smp_irk *irk;
@@ -5138,6 +5142,11 @@ static void process_adv_report(struct hci_dev *hdev, u8 type, bdaddr_t *bdaddr,
 		return;
 	}

+	if (!ext_adv && len > HCI_MAX_AD_LENGTH) {
+		bt_dev_err_ratelimited(hdev, "legacy adv larger than 31 bytes");
+		return;
+	}
+
 	/* Find the end of the data in case the report contains padded zero
 	 * bytes at the end causing an invalid length value.
 	 *
@@ -5197,7 +5206,7 @@ static void process_adv_report(struct hci_dev *hdev, u8 type, bdaddr_t *bdaddr,
 	 */
 	conn = check_pending_le_conn(hdev, bdaddr, bdaddr_type, type,
 				     direct_addr);
-	if (conn && type == LE_ADV_IND) {
+	if (!ext_adv && conn && type == LE_ADV_IND && len <= HCI_MAX_AD_LENGTH) {
 		/* Store report for later inclusion by
 		 * mgmt_device_connected
 		 */
@@ -5251,7 +5260,7 @@ static void process_adv_report(struct hci_dev *hdev, u8 type, bdaddr_t *bdaddr,
 	 * event or send an immediate device found event if the data
 	 * should not be stored for later.
 	 */
-	if (!has_pending_adv_report(hdev)) {
+	if (!ext_adv && !has_pending_adv_report(hdev)) {
 		/* If the report will trigger a SCAN_REQ store it for
 		 * later merging.
 		 */
@@ -5286,7 +5295,8 @@ static void process
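
The core of the fix is a length check in front of a copy into a fixed-size cache. A minimal userspace sketch of that check follows; struct pending_report and store_pending are hypothetical names, and the 31-byte limit is taken from the error message in the patch rather than from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Legacy advertising data limit, as quoted in the patch's log message. */
#define MAX_AD_LENGTH 31

/* Hypothetical cache for one pending legacy advertising report. */
struct pending_report {
	uint8_t data[MAX_AD_LENGTH];
	uint8_t len;
};

/*
 * Stores a report for later merging with its scan response.
 * Returns false, leaving the cache untouched, when the payload
 * would not fit in data[].
 */
static bool store_pending(struct pending_report *p,
			  const uint8_t *data, uint8_t len)
{
	assert(p != NULL && data != NULL);

	if (len > MAX_AD_LENGTH)
		return false;

	memcpy(p->data, data, len);
	p->len = len;
	return true;
}
```

Without the check, a 229-byte payload such as the one in the report would be copied past the end of data[] and overwrite whatever follows it, which is consistent with RAX holding 0x4141414141414141 in the trace above.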
[PATCH 5.4 64/90] Bluetooth: fix kernel oops in store_pending_adv_report
From: Alain Michaud

[ Upstream commit a2ec905d1e160a33b2e210e45ad30445ef26ce0e ]

Fix kernel oops observed when an ext adv data is larger than 31 bytes.

This can be reproduced by setting up an advertiser with advertisement
larger than 31 bytes. The issue is not sensitive to the advertisement
content. In particular, this was reproduced with an advertisement of
229 bytes filled with 'A'. See stack trace below.

This is fixed by not caching ext_adv as legacy adv are only cached to
be able to concatenate a scannable adv with its scan response before
sending it up through mgmt.

With ext_adv, this is no longer necessary.

  general protection fault: [#1] SMP PTI
  CPU: 6 PID: 205 Comm: kworker/u17:0 Not tainted 5.4.0-37-generic #41-Ubuntu
  Hardware name: Dell Inc. XPS 15 7590/0CF6RR, BIOS 1.7.0 05/11/2020
  Workqueue: hci0 hci_rx_work [bluetooth]
  RIP: 0010:hci_bdaddr_list_lookup+0x1e/0x40 [bluetooth]
  Code: ff ff e9 26 ff ff ff 0f 1f 44 00 00 0f 1f 44 00 00 55 48 8b 07 48 89 e5 48 39 c7 75 0a eb 24 48 8b 00 48 39 f8 74 1c 44 8b 06 <44> 39 40 10 75 ef 44 0f b7 4e 04 66 44 39 48 14 75 e3 38 50 16 75
  RSP: 0018:bc6a40493c70 EFLAGS: 00010286
  RAX: 4141414141414141 RBX: 001b RCX:
  RDX: RSI: 9903e76c100f RDI: 9904289d4b28
  RBP: bc6a40493c70 R08: 93570362 R09:
  R10: R11: 9904344eae38 R12: 9904289d4000
  R13: R14: ffa3 R15: 9903e76c100f
  FS: () GS:99043458() knlGS:
  CS: 0010 DS: ES: CR0: 80050033
  CR2: 7feed125a000 CR3: 0001b860a003 CR4: 003606e0
  Call Trace:
   process_adv_report+0x12e/0x560 [bluetooth]
   hci_le_meta_evt+0x7b2/0xba0 [bluetooth]
   hci_event_packet+0x1c29/0x2a90 [bluetooth]
   hci_rx_work+0x19b/0x360 [bluetooth]
   process_one_work+0x1eb/0x3b0
   worker_thread+0x4d/0x400
   kthread+0x104/0x140

Fixes: c215e9397b00 ("Bluetooth: Process extended ADV report event")
Reported-by: Andy Nguyen
Reported-by: Linus Torvalds
Reported-by: Balakrishna Godavarthi
Signed-off-by: Alain Michaud
Tested-by: Sonny Sasaka
Acked-by: Marcel Holtmann
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
---
 net/bluetooth/hci_event.c | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index 88cd410e57289..44385252d7b6a 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -1274,6 +1274,9 @@ static void store_pending_adv_report(struct hci_dev *hdev, bdaddr_t *bdaddr,
 {
 	struct discovery_state *d = &hdev->discovery;

+	if (len > HCI_MAX_AD_LENGTH)
+		return;
+
 	bacpy(&d->last_adv_addr, bdaddr);
 	d->last_adv_addr_type = bdaddr_type;
 	d->last_adv_rssi = rssi;
@@ -5231,7 +5234,8 @@ static struct hci_conn *check_pending_le_conn(struct hci_dev *hdev,

 static void process_adv_report(struct hci_dev *hdev, u8 type, bdaddr_t *bdaddr,
 			       u8 bdaddr_type, bdaddr_t *direct_addr,
-			       u8 direct_addr_type, s8 rssi, u8 *data, u8 len)
+			       u8 direct_addr_type, s8 rssi, u8 *data, u8 len,
+			       bool ext_adv)
 {
 	struct discovery_state *d = &hdev->discovery;
 	struct smp_irk *irk;
@@ -5253,6 +5257,11 @@ static void process_adv_report(struct hci_dev *hdev, u8 type, bdaddr_t *bdaddr,
 		return;
 	}

+	if (!ext_adv && len > HCI_MAX_AD_LENGTH) {
+		bt_dev_err_ratelimited(hdev, "legacy adv larger than 31 bytes");
+		return;
+	}
+
 	/* Find the end of the data in case the report contains padded zero
 	 * bytes at the end causing an invalid length value.
 	 *
@@ -5312,7 +5321,7 @@ static void process_adv_report(struct hci_dev *hdev, u8 type, bdaddr_t *bdaddr,
 	 */
 	conn = check_pending_le_conn(hdev, bdaddr, bdaddr_type, type,
 				     direct_addr);
-	if (conn && type == LE_ADV_IND) {
+	if (!ext_adv && conn && type == LE_ADV_IND && len <= HCI_MAX_AD_LENGTH) {
 		/* Store report for later inclusion by
 		 * mgmt_device_connected
 		 */
@@ -5366,7 +5375,7 @@ static void process_adv_report(struct hci_dev *hdev, u8 type, bdaddr_t *bdaddr,
 	 * event or send an immediate device found event if the data
 	 * should not be stored for later.
 	 */
-	if (!has_pending_adv_report(hdev)) {
+	if (!ext_adv && !has_pending_adv_report(hdev)) {
 		/* If the report will trigger a SCAN_REQ store it for
 		 * later merging.
 		 */
@@ -5401,7 +5410,8 @@ static void process
[PATCH 5.7 084/120] Bluetooth: fix kernel oops in store_pending_adv_report
From: Alain Michaud

[ Upstream commit a2ec905d1e160a33b2e210e45ad30445ef26ce0e ]

Fix kernel oops observed when an ext adv data is larger than 31 bytes.

This can be reproduced by setting up an advertiser with advertisement
larger than 31 bytes. The issue is not sensitive to the advertisement
content. In particular, this was reproduced with an advertisement of
229 bytes filled with 'A'. See stack trace below.

This is fixed by not caching ext_adv as legacy adv are only cached to
be able to concatenate a scannable adv with its scan response before
sending it up through mgmt.

With ext_adv, this is no longer necessary.

  general protection fault: [#1] SMP PTI
  CPU: 6 PID: 205 Comm: kworker/u17:0 Not tainted 5.4.0-37-generic #41-Ubuntu
  Hardware name: Dell Inc. XPS 15 7590/0CF6RR, BIOS 1.7.0 05/11/2020
  Workqueue: hci0 hci_rx_work [bluetooth]
  RIP: 0010:hci_bdaddr_list_lookup+0x1e/0x40 [bluetooth]
  Code: ff ff e9 26 ff ff ff 0f 1f 44 00 00 0f 1f 44 00 00 55 48 8b 07 48 89 e5 48 39 c7 75 0a eb 24 48 8b 00 48 39 f8 74 1c 44 8b 06 <44> 39 40 10 75 ef 44 0f b7 4e 04 66 44 39 48 14 75 e3 38 50 16 75
  RSP: 0018:bc6a40493c70 EFLAGS: 00010286
  RAX: 4141414141414141 RBX: 001b RCX:
  RDX: RSI: 9903e76c100f RDI: 9904289d4b28
  RBP: bc6a40493c70 R08: 93570362 R09:
  R10: R11: 9904344eae38 R12: 9904289d4000
  R13: R14: ffa3 R15: 9903e76c100f
  FS: () GS:99043458() knlGS:
  CS: 0010 DS: ES: CR0: 80050033
  CR2: 7feed125a000 CR3: 0001b860a003 CR4: 003606e0
  Call Trace:
   process_adv_report+0x12e/0x560 [bluetooth]
   hci_le_meta_evt+0x7b2/0xba0 [bluetooth]
   hci_event_packet+0x1c29/0x2a90 [bluetooth]
   hci_rx_work+0x19b/0x360 [bluetooth]
   process_one_work+0x1eb/0x3b0
   worker_thread+0x4d/0x400
   kthread+0x104/0x140

Fixes: c215e9397b00 ("Bluetooth: Process extended ADV report event")
Reported-by: Andy Nguyen
Reported-by: Linus Torvalds
Reported-by: Balakrishna Godavarthi
Signed-off-by: Alain Michaud
Tested-by: Sonny Sasaka
Acked-by: Marcel Holtmann
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
---
 net/bluetooth/hci_event.c | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index b11f8d391ad82..fe75f435171ce 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -1305,6 +1305,9 @@ static void store_pending_adv_report(struct hci_dev *hdev, bdaddr_t *bdaddr,
 {
 	struct discovery_state *d = &hdev->discovery;

+	if (len > HCI_MAX_AD_LENGTH)
+		return;
+
 	bacpy(&d->last_adv_addr, bdaddr);
 	d->last_adv_addr_type = bdaddr_type;
 	d->last_adv_rssi = rssi;
@@ -5317,7 +5320,8 @@ static struct hci_conn *check_pending_le_conn(struct hci_dev *hdev,

 static void process_adv_report(struct hci_dev *hdev, u8 type, bdaddr_t *bdaddr,
 			       u8 bdaddr_type, bdaddr_t *direct_addr,
-			       u8 direct_addr_type, s8 rssi, u8 *data, u8 len)
+			       u8 direct_addr_type, s8 rssi, u8 *data, u8 len,
+			       bool ext_adv)
 {
 	struct discovery_state *d = &hdev->discovery;
 	struct smp_irk *irk;
@@ -5339,6 +5343,11 @@ static void process_adv_report(struct hci_dev *hdev, u8 type, bdaddr_t *bdaddr,
 		return;
 	}

+	if (!ext_adv && len > HCI_MAX_AD_LENGTH) {
+		bt_dev_err_ratelimited(hdev, "legacy adv larger than 31 bytes");
+		return;
+	}
+
 	/* Find the end of the data in case the report contains padded zero
 	 * bytes at the end causing an invalid length value.
 	 *
@@ -5398,7 +5407,7 @@ static void process_adv_report(struct hci_dev *hdev, u8 type, bdaddr_t *bdaddr,
 	 */
 	conn = check_pending_le_conn(hdev, bdaddr, bdaddr_type, type,
 				     direct_addr);
-	if (conn && type == LE_ADV_IND) {
+	if (!ext_adv && conn && type == LE_ADV_IND && len <= HCI_MAX_AD_LENGTH) {
 		/* Store report for later inclusion by
 		 * mgmt_device_connected
 		 */
@@ -5452,7 +5461,7 @@ static void process_adv_report(struct hci_dev *hdev, u8 type, bdaddr_t *bdaddr,
 	 * event or send an immediate device found event if the data
 	 * should not be stored for later.
 	 */
-	if (!has_pending_adv_report(hdev)) {
+	if (!ext_adv && !has_pending_adv_report(hdev)) {
 		/* If the report will trigger a SCAN_REQ store it for
 		 * later merging.
 		 */
@@ -5487,7 +5496,8 @@ static void process
[PATCH 5.4 134/138] ASoC: topology: fix kernel oops on route addition error
From: Pierre-Louis Bossart

commit 6f0307df83f2aa6bdf656c2219c89ce96502d20e upstream.

When errors happen while loading graph components, the kernel oopses
while trying to remove all topology components. This can be
root-caused to a list pointing to memory that was already freed on
error.

remove_route() is already called on errors and will perform the
required cleanups so there's no need to free the route memory in
soc_tplg_dapm_graph_elems_load() if the route was added to the
list. We do however want to free the routes allocated but not added to
the list.

Fixes: 7df04ea7a31ea ('ASoC: topology: modify dapm route loading routine and add dapm route unloading')
Signed-off-by: Pierre-Louis Bossart
Reviewed-by: Ranjani Sridharan
Reviewed-by: Kai Vehmanen
Link: https://lore.kernel.org/r/20200707203749.113883-2-pierre-louis.boss...@linux.intel.com
Signed-off-by: Mark Brown
Signed-off-by: Greg Kroah-Hartman
---
 sound/soc/soc-topology.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

--- a/sound/soc/soc-topology.c
+++ b/sound/soc/soc-topology.c
@@ -1284,17 +1284,29 @@ static int soc_tplg_dapm_graph_elems_loa
 		list_add(&routes[i]->dobj.list, &tplg->comp->dobj_list);

 		ret = soc_tplg_add_route(tplg, routes[i]);
-		if (ret < 0)
+		if (ret < 0) {
+			/*
+			 * this route was added to the list, it will
+			 * be freed in remove_route() so increment the
+			 * counter to skip it in the error handling
+			 * below.
+			 */
+			i++;
 			break;
+		}

 		/* add route, but keep going if some fail */
 		snd_soc_dapm_add_routes(dapm, routes[i], 1);
 	}

-	/* free memory allocated for all dapm routes in case of error */
-	if (ret < 0)
-		for (i = 0; i < count ; i++)
-			kfree(routes[i]);
+	/*
+	 * free memory allocated for all dapm routes not added to the
+	 * list in case of error
+	 */
+	if (ret < 0) {
+		while (i < count)
+			kfree(routes[i++]);
+	}

 	/*
 	 * free pointer to array of dapm routes as this is no longer needed.
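
The ownership rule behind this fix can be shown in a small userspace sketch: once an element has been handed to a registry that frees it later, the local error path must free only the elements that were never handed over. The names below (struct registry, load_routes, fail_at) are hypothetical and only imitate the shape of the kernel loop; malloc/free stand in for the kernel allocators.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_ROUTES 8

struct route {
	int id;
};

/* Hypothetical registry that owns every route handed to it. */
struct registry {
	struct route *items[MAX_ROUTES];
	int count;
};

/* Frees everything the registry owns (the role remove_route() plays). */
static void registry_release(struct registry *r)
{
	for (int i = 0; i < r->count; i++)
		free(r->items[i]);
	r->count = 0;
}

/*
 * Registers routes until the one at index fail_at "fails" (-1 = never).
 * On error, frees only routes[i..count), which the registry never took.
 * Returns the number of routes freed locally.
 */
static int load_routes(struct registry *r, struct route **routes,
		       int count, int fail_at)
{
	int i, ret = 0, freed = 0;

	assert(r->count + count <= MAX_ROUTES);

	for (i = 0; i < count; i++) {
		r->items[r->count++] = routes[i];	/* ownership moves here */
		if (i == fail_at) {
			ret = -1;
			i++;	/* skip the failed route: the registry owns it */
			break;
		}
	}

	if (ret < 0) {
		while (i < count) {
			free(routes[i++]);
			freed++;
		}
	}

	return freed;
}
```

Restarting the cleanup loop at index 0, as the old code did, would free routes the registry still points to, and the later registry_release() would then free them a second time.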
[PATCH 5.7 174/179] ASoC: topology: fix kernel oops on route addition error
From: Pierre-Louis Bossart

commit 6f0307df83f2aa6bdf656c2219c89ce96502d20e upstream.

When errors happen while loading graph components, the kernel oopses
while trying to remove all topology components. This can be
root-caused to a list pointing to memory that was already freed on
error.

remove_route() is already called on errors and will perform the
required cleanups so there's no need to free the route memory in
soc_tplg_dapm_graph_elems_load() if the route was added to the
list. We do however want to free the routes allocated but not added to
the list.

Fixes: 7df04ea7a31ea ('ASoC: topology: modify dapm route loading routine and add dapm route unloading')
Signed-off-by: Pierre-Louis Bossart
Reviewed-by: Ranjani Sridharan
Reviewed-by: Kai Vehmanen
Link: https://lore.kernel.org/r/20200707203749.113883-2-pierre-louis.boss...@linux.intel.com
Signed-off-by: Mark Brown
Signed-off-by: Greg Kroah-Hartman
---
 sound/soc/soc-topology.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

--- a/sound/soc/soc-topology.c
+++ b/sound/soc/soc-topology.c
@@ -1285,17 +1285,29 @@ static int soc_tplg_dapm_graph_elems_loa
 		list_add(&routes[i]->dobj.list, &tplg->comp->dobj_list);

 		ret = soc_tplg_add_route(tplg, routes[i]);
-		if (ret < 0)
+		if (ret < 0) {
+			/*
+			 * this route was added to the list, it will
+			 * be freed in remove_route() so increment the
+			 * counter to skip it in the error handling
+			 * below.
+			 */
+			i++;
 			break;
+		}

 		/* add route, but keep going if some fail */
 		snd_soc_dapm_add_routes(dapm, routes[i], 1);
 	}

-	/* free memory allocated for all dapm routes in case of error */
-	if (ret < 0)
-		for (i = 0; i < count ; i++)
-			kfree(routes[i]);
+	/*
+	 * free memory allocated for all dapm routes not added to the
+	 * list in case of error
+	 */
+	if (ret < 0) {
+		while (i < count)
+			kfree(routes[i++]);
+	}

 	/*
 	 * free pointer to array of dapm routes as this is no longer needed.
Re: kernel oops in 'typec_ucsi' due to commit 'drivers property: When no children in primary, try secondary'
On Thu, Jul 16, 2020 at 09:22:11PM +0300, Maxim Levitsky wrote:
> On Thu, 2020-07-16 at 21:21 +0300, Andy Shevchenko wrote:
> > On Thu, Jul 16, 2020 at 09:00:00PM +0300, Maxim Levitsky wrote:
> > > On Thu, 2020-07-16 at 18:47 +0300, Andy Shevchenko wrote:

...

> > > It works (no more oops)
> >
> > Thanks for testing. I'm about to send formal patch, can you give your
> > Tested-by tag there then?
>
> Of course.
>
> Tested-by: Maxim Levitsky

Thanks, I meant there [1] :-)

[1]: https://lore.kernel.org/lkml/20200716182747.54929-1-andriy.shevche...@linux.intel.com/T/#u

--
With Best Regards,
Andy Shevchenko
Re: kernel oops in 'typec_ucsi' due to commit 'drivers property: When no children in primary, try secondary'
On Thu, Jul 16, 2020 at 09:00:00PM +0300, Maxim Levitsky wrote:
> On Thu, 2020-07-16 at 18:47 +0300, Andy Shevchenko wrote:
> > On Thu, Jul 16, 2020 at 11:17:03AM +0300, Maxim Levitsky wrote:
> > > Hi!
> > >
> > > Few days ago I bisected a regression on 5.8 kernel:
> > >
> > > I have nvidia rtx 2070s and its USB type C port driver (which is open
> > > source)
> > > started to crash on load:
> >
> > ...
> >
> > > Reverting the commit helped fix this oops.
> > >
> > > My .config attached.
> > > If any more info is needed I'll be happy to provide it,
> > > and of course test patches.
> >
> > Can you test below?
> >
> > diff --git a/drivers/base/property.c b/drivers/base/property.c
> > index 1e6d75e65938..d58aa98fe964 100644
> > --- a/drivers/base/property.c
> > +++ b/drivers/base/property.c
> > @@ -721,7 +721,7 @@ struct fwnode_handle *device_get_next_child_node(struct
> > device *dev,
> >  		return next;
> >
> >  	/* When no more children in primary, continue with secondary */
> > -	if (!IS_ERR_OR_NULL(fwnode->secondary))
> > +	if (fwnode && !IS_ERR_OR_NULL(fwnode->secondary))
> >  		next = fwnode_get_next_child_node(fwnode->secondary, child);
> >
> >  	return next;
>
> It works (no more oops)

Thanks for testing. I'm about to send formal patch, can you give your
Tested-by tag there then?

--
With Best Regards,
Andy Shevchenko
Re: kernel oops in 'typec_ucsi' due to commit 'drivers property: When no children in primary, try secondary'
On Thu, 2020-07-16 at 21:21 +0300, Andy Shevchenko wrote:
> On Thu, Jul 16, 2020 at 09:00:00PM +0300, Maxim Levitsky wrote:
> > On Thu, 2020-07-16 at 18:47 +0300, Andy Shevchenko wrote:
> > > On Thu, Jul 16, 2020 at 11:17:03AM +0300, Maxim Levitsky wrote:
> > > > Hi!
> > > >
> > > > Few days ago I bisected a regression on 5.8 kernel:
> > > >
> > > > I have nvidia rtx 2070s and its USB type C port driver (which is open
> > > > source)
> > > > started to crash on load:
> > >
> > > ...
> > >
> > > > Reverting the commit helped fix this oops.
> > > >
> > > > My .config attached.
> > > > If any more info is needed I'll be happy to provide it,
> > > > and of course test patches.
> > >
> > > Can you test below?
> > >
> > > diff --git a/drivers/base/property.c b/drivers/base/property.c
> > > index 1e6d75e65938..d58aa98fe964 100644
> > > --- a/drivers/base/property.c
> > > +++ b/drivers/base/property.c
> > > @@ -721,7 +721,7 @@ struct fwnode_handle
> > > *device_get_next_child_node(struct device *dev,
> > >  		return next;
> > >
> > >  	/* When no more children in primary, continue with secondary */
> > > -	if (!IS_ERR_OR_NULL(fwnode->secondary))
> > > +	if (fwnode && !IS_ERR_OR_NULL(fwnode->secondary))
> > >  		next = fwnode_get_next_child_node(fwnode->secondary, child);
> > >
> > >  	return next;
> >
> > It works (no more oops)
>
> Thanks for testing. I'm about to send formal patch, can you give your
> Tested-by tag there then?

Of course.

Tested-by: Maxim Levitsky

Best regards,
	Maxim Levitsky
Re: kernel oops in 'typec_ucsi' due to commit 'drivers property: When no children in primary, try secondary'
On Thu, 2020-07-16 at 18:47 +0300, Andy Shevchenko wrote:
> On Thu, Jul 16, 2020 at 11:17:03AM +0300, Maxim Levitsky wrote:
> > Hi!
> >
> > Few days ago I bisected a regression on 5.8 kernel:
> >
> > I have nvidia rtx 2070s and its USB type C port driver (which is open
> > source)
> > started to crash on load:
>
> ...
>
> > Reverting the commit helped fix this oops.
> >
> > My .config attached.
> > If any more info is needed I'll be happy to provide it,
> > and of course test patches.
>
> Can you test below?
>
> diff --git a/drivers/base/property.c b/drivers/base/property.c
> index 1e6d75e65938..d58aa98fe964 100644
> --- a/drivers/base/property.c
> +++ b/drivers/base/property.c
> @@ -721,7 +721,7 @@ struct fwnode_handle *device_get_next_child_node(struct
> device *dev,
>  		return next;
>
>  	/* When no more children in primary, continue with secondary */
> -	if (!IS_ERR_OR_NULL(fwnode->secondary))
> +	if (fwnode && !IS_ERR_OR_NULL(fwnode->secondary))
>  		next = fwnode_get_next_child_node(fwnode->secondary, child);
>
>  	return next;

It works (no more oops)

Best regards,
	Maxim Levitsky
Re: kernel oops in 'typec_ucsi' due to commit 'drivers property: When no children in primary, try secondary'
23
> > # good: [081096d98bb23946f16215357b141c5616b234bf] Merge tag 'tty-5.8-rc1'
> > of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
> > git bisect good 081096d98bb23946f16215357b141c5616b234bf
> > # bad: [3a2a8751742133a7bbc49b9d1bcbd52e212edff6] Merge tag 'for-v5.8' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
> > git bisect bad 3a2a8751742133a7bbc49b9d1bcbd52e212edff6
> > # bad: [a1e81f9654eef650d3ee35c94a8cab00b5cd379c] m68k: implement
> > flush_icache_user_range
> > git bisect bad a1e81f9654eef650d3ee35c94a8cab00b5cd379c
> > # good: [c336c022503d1be719ca06f2526c211709e3d2d3] staging: wfx: remove
> > false positive warning
> > git bisect good c336c022503d1be719ca06f2526c211709e3d2d3
> > # good: [05c8a4fc44a916dd897769ca69b42381f9177ec4] habanalabs: correctly
> > cast u64 to void*
> > git bisect good 05c8a4fc44a916dd897769ca69b42381f9177ec4
> > # good: [a3975dea1696b7c81319dc4b66e3c378dd47ccfb] Merge tag 'iio-for-5.8c'
> > of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next
> > git bisect good a3975dea1696b7c81319dc4b66e3c378dd47ccfb
> > # bad: [f558b8364e19f9222e7976c64e9367f66bab02cc] Merge tag
> > 'driver-core-5.8-rc1' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
> > git bisect bad f558b8364e19f9222e7976c64e9367f66bab02cc
> > # good: [b6d90ef9a439b4ef73a350789bf766a1339a703d] staging: vchi: Get rid
> > of not implemented function declarations
> > git bisect good b6d90ef9a439b4ef73a350789bf766a1339a703d
> > # good: [93d2e4322aa74c1ad1e8c2160608eb9a960d69ff] of: platform: Batch
> > fwnode parsing when adding all top level devices
> > git bisect good 93d2e4322aa74c1ad1e8c2160608eb9a960d69ff
> > # bad: [c2c076166b5880eabe068ce1cab30bf6edeeea1a] firmware_loader: change
> > enum fw_opt to u32
> > git bisect bad c2c076166b5880eabe068ce1cab30bf6edeeea1a
> > # bad: [2cd38fd15e4ebcfe917a443734820269f8b5ba2b] driver core: Remove
> > unnecessary is_fwnode_dev variable in device_add()
> > git bisect bad 2cd38fd15e4ebcfe917a443734820269f8b5ba2b
> > # good: [c82c83c330654c5639960ebc3dabbae53c43f79e] driver core: platform:
> > Fix spelling errors in platform.c
> > git bisect good c82c83c330654c5639960ebc3dabbae53c43f79e
> > # bad: [114dbb4fa7c4053a51964d112e2851e818e085c6] drivers property: When no
> > children in primary, try secondary
> > git bisect bad 114dbb4fa7c4053a51964d112e2851e818e085c6
> > # first bad commit: [114dbb4fa7c4053a51964d112e2851e818e085c6] drivers
> > property: When no children in primary, try secondary
> >
> >
> > Reverting the commit helped fix this oops.
> >
> > My .config attached.
> > If any more info is needed I'll be happy to provide it,
> > and of course test patches.
> >
> > Best regards,
> > Maxim Levitsky
> >

Turns out that the kernel has decode_stacktrace.sh.
I always decoded the symbols manually. I will send the decoded trace from now on in bug reports.

IMHO it would be useful to include a pointer to it in the kernel oops report,
since many people like me don't know about this nice script.

[mlevitsk@starship ~/UPSTREAM/linux-kernel/work_area/ucsi_crash]$../../src/scripts/decode_stacktrace.sh ../../src/vmlinux ../../src/ ../../src/ < ./stacktrace.txt
[ +0.43] CPU: 19 PID: 31281 Comm: kworker/19:1 Tainted: PW O 5.8.0-rc3.stable #133
[ +0.45] Hardware name: Gigabyte Technology Co., Ltd. TRX40 DESIGNARE/TRX40 DESIGNARE, BIOS F4c 03/05/2020
[ +0.30] Workqueue: events_long ucsi_init_work [typec_ucsi]
[ +0.48] RIP: 0010:device_get_next_child_node (/home/mlevitsk/UPSTREAM/linux-kernel/src/drivers/base/property.c:715)
[ +0.24] Code: 18 48 85 db 74 24 48 8b 43 08 48 85 c0 74 1b 48 8b 40 50 48 85 c0 74 12 48 89 ee 48 89 df ff d0 48 85 c0 74 05 5b 5d 41 5c c3 <48> 8b 03 48 85 c0 74 f3 48>
All code
   0:   18 48 85                sbb    %cl,-0x7b(%rax)
   3:   db 74 24 48             (bad)  0x48(%rsp)
   7:   8b 43 08                mov    0x8(%rbx),%eax
   a:   48 85 c0                test   %rax,%rax
   d:   74 1b                   je     0x2a
   f:   48 8b 40 50             mov    0x50(%rax),%rax
  13:   48 85 c0                test   %rax,%rax
  16:   74 12                   je     0x2a
  18:   48 89 ee                mov    %rbp,%rsi
  1b:   48 89 df                mov    %rbx,%rdi
  1e:   ff d0                   callq  *%rax
  20:   48 85 c0                test   %rax,%rax
  23:   74 05                   je     0x2a
  25:   5b                      pop    %rbx
  26:   5d                      pop    %rbp
  27:   41 5c                   pop    %r12
  29:   c3                      retq
  2a:*  48 8b 03
Re: kernel oops in 'typec_ucsi' due to commit 'drivers property: When no children in primary, try secondary'
On Thu, Jul 16, 2020 at 11:17:03AM +0300, Maxim Levitsky wrote:
> Hi!
>
> Few days ago I bisected a regression on 5.8 kernel:
>
> I have nvidia rtx 2070s and its USB type C port driver (which is open source)
> started to crash on load:

...

> Reverting the commit helped fix this oops.
>
> My .config attached.
> If any more info is needed I'll be happy to provide it,
> and of course test patches.

Can you test below?

diff --git a/drivers/base/property.c b/drivers/base/property.c
index 1e6d75e65938..d58aa98fe964 100644
--- a/drivers/base/property.c
+++ b/drivers/base/property.c
@@ -721,7 +721,7 @@ struct fwnode_handle *device_get_next_child_node(struct device *dev,
 		return next;

 	/* When no more children in primary, continue with secondary */
-	if (!IS_ERR_OR_NULL(fwnode->secondary))
+	if (fwnode && !IS_ERR_OR_NULL(fwnode->secondary))
 		next = fwnode_get_next_child_node(fwnode->secondary, child);

 	return next;

--
With Best Regards,
Andy Shevchenko
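
The one-line change in that diff guards the pointer itself before reading a member through it. A rough userspace sketch of the same fallback logic is below; struct fw_node, struct device and next_child are simplified, hypothetical stand-ins, and plain NULL replaces the kernel's IS_ERR_OR_NULL() convention.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a firmware node with an optional secondary. */
struct fw_node {
	struct fw_node *secondary;
	const char *child;	/* name of the single child, or NULL */
};

/* Simplified stand-in for a device that may have no firmware node. */
struct device {
	struct fw_node *fwnode;
};

/* Returns the child from the primary node, else from the secondary. */
static const char *next_child(const struct device *dev)
{
	const struct fw_node *fwnode;

	assert(dev != NULL);
	fwnode = dev->fwnode;

	if (fwnode && fwnode->child)
		return fwnode->child;

	/* Check fwnode itself before touching fwnode->secondary. */
	if (fwnode && fwnode->secondary)
		return fwnode->secondary->child;

	return NULL;
}
```

A device with no firmware node at all now yields NULL instead of dereferencing a NULL pointer, which is the case the reported oops appears to hit.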
Re: kernel oops in 'typec_ucsi' due to commit 'drivers property: When no children in primary, try secondary'
On Thu, Jul 16, 2020 at 11:17:03AM +0300, Maxim Levitsky wrote: > Hi! > > Few days ago I bisected a regression on 5.8 kernel: > > I have nvidia rtx 2070s and its USB type C port driver (which is open source) > started to crash on load: I'm looking at this, but I have questions: - any pointers to the device tree excerpt which this tries to iterate over - can you provide full Code: line? Only way I see, why it happens, is that fwnode is not initialized properly somewhere (means it has garbage in the secondary pointer). > [ +0.43] CPU: 19 PID: 31281 Comm: kworker/19:1 Tainted: PW O > 5.8.0-rc3.stable #133 > [ +0.45] Hardware name: Gigabyte Technology Co., Ltd. TRX40 > DESIGNARE/TRX40 DESIGNARE, BIOS F4c 03/05/2020 > [ +0.30] Workqueue: events_long ucsi_init_work [typec_ucsi] > [ +0.48] RIP: 0010:device_get_next_child_node+0x5b/0xb0 > [ +0.24] Code: 18 48 85 db 74 24 48 8b 43 08 48 85 c0 74 1b 48 8b 40 50 > 48 85 c0 74 12 48 89 ee 48 89 df ff d0 48 85 c0 74 05 5b 5d 41 5c c3 <48> 8b > 03 48 85 c0 74 f3 48> > [ +0.65] RSP: 0018:c900038d7e08 EFLAGS: 00010246 > [ +0.44] RAX: 889fb6b62f00 RBX: RCX: > 0001 > [ +0.27] RDX: 889fb6fd4a70 RSI: RDI: > 889fb6b63608 > [ +0.46] RBP: R08: 0001 R09: > 7fff > [ +0.24] R10: 2075ce282580 R11: 0062de3e R12: > 889fb6b63608 > [ +0.43] R13: 0001 R14: 889fb6b63018 R15: > 0001 > [ +0.44] FS: () GS:889fbe4c() > knlGS: > [ +0.24] CS: 0010 DS: ES: CR0: 80050033 > [ +0.42] CR2: CR3: 00175621b000 CR4: > 00340ea0 > [ +0.46] Call Trace: > [ +0.30] ucsi_init+0x213/0x530 [typec_ucsi] > [ +0.28] ucsi_init_work+0x12/0x20 [typec_ucsi] > [ +0.49] process_one_work+0x1d2/0x390 > [ +0.27] worker_thread+0x4a/0x3b0 > [ +0.25] ? process_one_work+0x390/0x390 > [ +0.49] kthread+0xf9/0x130 > [ +0.26] ? 
kthread_park+0x90/0x90 > [ +0.28] ret_from_fork+0x1f/0x30 > [ +0.48] Modules linked in: ucsi_ccg typec_ucsi typec hfsplus cdrom ntfs > msdos vfio_pci vfio_virqfd vfio_iommu_type1 vfio vhost_net vhost vhost_iotlb > tap xfs rfcomm xt_M> > [ +0.39] usb_storage ext4 mbcache jbd2 amdgpu gpu_sched ttm > drm_kms_helper syscopyarea sysfillrect ahci sysimgblt fb_sys_fops > crc32_pclmul libahci crc32c_intel igb ccp > > [ +0.000289] CR2: > [ +0.26] ---[ end trace 38ebb9aebd55fbff ]--- > [ +0.014201] RIP: 0010:device_get_next_child_node+0x5b/0xb0 > [ +0.30] Code: 18 48 85 db 74 24 48 8b 43 08 48 85 c0 74 1b 48 8b 40 50 > 48 85 c0 74 12 48 89 ee 48 89 df ff d0 48 85 c0 74 05 5b 5d 41 5c c3 <48> 8b > 03 48 85 c0 74 f3 48> > [ +0.75] RSP: 0018:c900038d7e08 EFLAGS: 00010246 > [ +0.27] RAX: 889fb6b62f00 RBX: RCX: > 0001 > [ +0.48] RDX: 889fb6fd4a70 RSI: RDI: > 889fb6b63608 > [ +0.49] RBP: R08: 0001 R09: > 7fff > [ +0.27] R10: 2075ce282580 R11: 0062de3e R12: > 889fb6b63608 > [ +0.49] R13: 0001 R14: 889fb6b63018 R15: > 0001 > [ +0.50] FS: () GS:889fbe4c() > knlGS: > [ +0.27] CS: 0010 DS: ES: CR0: 80050033 > [ +0.50] CR2: CR3: 00175621b000 CR4: > 00340ea0 > > I bisected this, while passing the UCSI controller to a VM, and this > is the result: > > git bisect start > # good: [3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162] Linux 5.7 > git bisect good 3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162 > # bad: [48778464bb7d346b47157d21ffde2af6b2d39110] Linux 5.8-rc2 > git bisect bad 48778464bb7d346b47157d21ffde2af6b2d39110 > # good: [a98f670e41a99f53acb1fb33cee9c6abbb2e6f23] Merge tag 'media/v5.8-1' > of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media > git bisect good a98f670e41a99f53acb1fb33cee9c6abbb2e6f23 > # good: [081096d98bb23946f16215357b141c5616b234bf] Merge tag 'tty-5.8-rc1' of > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty > git bisect good 081096d98bb23946f16215357b141c5616b234bf > # bad: [3a2a8751742133a7bbc49b9d1bcbd52e212edff6] Merge tag 'for-v5.8' of > 
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply > git bisect bad 3a2a8751742133a7bbc49b9d1bcbd52e212edff6 > # bad: [a1e81f9654eef650d3ee35c94a8cab00b5cd379c] m68k: implement > flush_icache_user_range > git bisect bad a1e81f9654eef650d3ee35c94a8cab00b5cd379c > # good: [c336c022503d1be719ca06f2526c211709e3d2d3] staging: wfx: remove false > positive warning > git bisect good c336c022503d1be719ca06f2526c211709e3d2d3 > # good: [05c8a4fc44a916dd897769ca69b42381f9177ec4] habanalabs: correctly cast > u64
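The suspicion raised in this reply (a firmware node that was never set up properly, or whose secondary pointer is garbage) can be illustrated with a minimal userspace sketch in C. Everything here is a hypothetical stand-in: `struct node`, `next_child()` and `get_next_child()` are simplified illustrations of "try the primary node, then fall back to the secondary", not the kernel's fwnode API or the actual fix. The point is only that the fallback must check the primary node before touching its `secondary` member.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a firmware node. */
struct node {
	struct node *secondary;  /* optional fallback node, may be NULL */
	struct node **children;  /* NULL-terminated array, may be NULL */
};

/* Return the child following prev in n, or the first child if prev is NULL. */
static struct node *next_child(const struct node *n, const struct node *prev)
{
	size_t i;

	if (!n || !n->children)
		return NULL;
	if (!prev)
		return n->children[0];
	for (i = 0; n->children[i]; i++)
		if (n->children[i] == prev)
			return n->children[i + 1];
	return NULL;
}

/*
 * Look in the primary node first. Only when it yields nothing, and only
 * if the primary node actually exists, consult its secondary node.
 */
struct node *get_next_child(const struct node *primary, const struct node *prev)
{
	struct node *next = next_child(primary, prev);

	if (next)
		return next;
	if (primary && primary->secondary)
		next = next_child(primary->secondary, prev);
	return next;
}
```

Without the `primary &&` check, a device with no primary node at all would be dereferenced when reading `secondary`, which is one plausible way to end up with a NULL pointer oops in an iterator of this shape.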
Re: kernel oops in 'typec_ucsi' due to commit 'drivers property: When no children in primary, try secondary'
On Thu, 2020-07-16 at 10:28 +0200, Greg KH wrote:
> On Thu, Jul 16, 2020 at 11:17:03AM +0300, Maxim Levitsky wrote:
> > Hi!
> >
> > Few days ago I bisected a regression on 5.8 kernel:
> >
> > I have nvidia rtx 2070s and its USB type C port driver (which is open
> > source)
>
> Is that driver merged into the tree? If not, do you have a pointer to
> it somewhere?
>
> thanks,
>
> greg k-h
>

It is in the tree.

CONFIG_TYPEC_UCSI selects the generic UCSI driver.
CONFIG_UCSI_CCG selects the hardware driver, which is an i2c driver which
binds to an i2c device (I think with address 0x8) on an i2c controller,
which is exposed by function 3 of the NVIDIA card, and uses the
CONFIG_I2C_NVIDIA_GPU driver.

We also have CONFIG_TYPEC_NVIDIA_ALTMODE; I haven't researched what it does.

Best regards,
	Maxim Levitsky
Re: kernel oops in 'typec_ucsi' due to commit 'drivers property: When no children in primary, try secondary'
On Thu, Jul 16, 2020 at 11:17:03AM +0300, Maxim Levitsky wrote:
> Hi!
>
> Few days ago I bisected a regression on 5.8 kernel:
>
> I have nvidia rtx 2070s and its USB type C port driver (which is open source)

Is that driver merged into the tree? If not, do you have a pointer to
it somewhere?

thanks,

greg k-h
kernel oops in 'typec_ucsi' due to commit 'drivers property: When no children in primary, try secondary'
Hi! Few days ago I bisected a regression on 5.8 kernel: I have nvidia rtx 2070s and its USB type C port driver (which is open source) started to crash on load: [ +0.43] CPU: 19 PID: 31281 Comm: kworker/19:1 Tainted: PW O 5.8.0-rc3.stable #133 [ +0.45] Hardware name: Gigabyte Technology Co., Ltd. TRX40 DESIGNARE/TRX40 DESIGNARE, BIOS F4c 03/05/2020 [ +0.30] Workqueue: events_long ucsi_init_work [typec_ucsi] [ +0.48] RIP: 0010:device_get_next_child_node+0x5b/0xb0 [ +0.24] Code: 18 48 85 db 74 24 48 8b 43 08 48 85 c0 74 1b 48 8b 40 50 48 85 c0 74 12 48 89 ee 48 89 df ff d0 48 85 c0 74 05 5b 5d 41 5c c3 <48> 8b 03 48 85 c0 74 f3 48> [ +0.65] RSP: 0018:c900038d7e08 EFLAGS: 00010246 [ +0.44] RAX: 889fb6b62f00 RBX: RCX: 0001 [ +0.27] RDX: 889fb6fd4a70 RSI: RDI: 889fb6b63608 [ +0.46] RBP: R08: 0001 R09: 7fff [ +0.24] R10: 2075ce282580 R11: 0062de3e R12: 889fb6b63608 [ +0.43] R13: 0001 R14: 889fb6b63018 R15: 0001 [ +0.44] FS: () GS:889fbe4c() knlGS: [ +0.24] CS: 0010 DS: ES: CR0: 80050033 [ +0.42] CR2: CR3: 00175621b000 CR4: 00340ea0 [ +0.46] Call Trace: [ +0.30] ucsi_init+0x213/0x530 [typec_ucsi] [ +0.28] ucsi_init_work+0x12/0x20 [typec_ucsi] [ +0.49] process_one_work+0x1d2/0x390 [ +0.27] worker_thread+0x4a/0x3b0 [ +0.25] ? process_one_work+0x390/0x390 [ +0.49] kthread+0xf9/0x130 [ +0.26] ? 
kthread_park+0x90/0x90 [ +0.28] ret_from_fork+0x1f/0x30 [ +0.48] Modules linked in: ucsi_ccg typec_ucsi typec hfsplus cdrom ntfs msdos vfio_pci vfio_virqfd vfio_iommu_type1 vfio vhost_net vhost vhost_iotlb tap xfs rfcomm xt_M> [ +0.39] usb_storage ext4 mbcache jbd2 amdgpu gpu_sched ttm drm_kms_helper syscopyarea sysfillrect ahci sysimgblt fb_sys_fops crc32_pclmul libahci crc32c_intel igb ccp > [ +0.000289] CR2: [ +0.26] ---[ end trace 38ebb9aebd55fbff ]--- [ +0.014201] RIP: 0010:device_get_next_child_node+0x5b/0xb0 [ +0.30] Code: 18 48 85 db 74 24 48 8b 43 08 48 85 c0 74 1b 48 8b 40 50 48 85 c0 74 12 48 89 ee 48 89 df ff d0 48 85 c0 74 05 5b 5d 41 5c c3 <48> 8b 03 48 85 c0 74 f3 48> [ +0.75] RSP: 0018:c900038d7e08 EFLAGS: 00010246 [ +0.27] RAX: 889fb6b62f00 RBX: RCX: 0001 [ +0.48] RDX: 889fb6fd4a70 RSI: RDI: 889fb6b63608 [ +0.49] RBP: R08: 0001 R09: 7fff [ +0.27] R10: 2075ce282580 R11: 0062de3e R12: 889fb6b63608 [ +0.49] R13: 0001 R14: 889fb6b63018 R15: 0001 [ +0.50] FS: () GS:889fbe4c() knlGS: [ +0.27] CS: 0010 DS: ES: CR0: 80050033 [ +0.50] CR2: CR3: 00175621b000 CR4: 00340ea0 I bisected this, while passing the UCSI controller to a VM, and this is the result: git bisect start # good: [3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162] Linux 5.7 git bisect good 3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162 # bad: [48778464bb7d346b47157d21ffde2af6b2d39110] Linux 5.8-rc2 git bisect bad 48778464bb7d346b47157d21ffde2af6b2d39110 # good: [a98f670e41a99f53acb1fb33cee9c6abbb2e6f23] Merge tag 'media/v5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media git bisect good a98f670e41a99f53acb1fb33cee9c6abbb2e6f23 # good: [081096d98bb23946f16215357b141c5616b234bf] Merge tag 'tty-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty git bisect good 081096d98bb23946f16215357b141c5616b234bf # bad: [3a2a8751742133a7bbc49b9d1bcbd52e212edff6] Merge tag 'for-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply git bisect bad 
3a2a8751742133a7bbc49b9d1bcbd52e212edff6 # bad: [a1e81f9654eef650d3ee35c94a8cab00b5cd379c] m68k: implement flush_icache_user_range git bisect bad a1e81f9654eef650d3ee35c94a8cab00b5cd379c # good: [c336c022503d1be719ca06f2526c211709e3d2d3] staging: wfx: remove false positive warning git bisect good c336c022503d1be719ca06f2526c211709e3d2d3 # good: [05c8a4fc44a916dd897769ca69b42381f9177ec4] habanalabs: correctly cast u64 to void* git bisect good 05c8a4fc44a916dd897769ca69b42381f9177ec4 # good: [a3975dea1696b7c81319dc4b66e3c378dd47ccfb] Merge tag 'iio-for-5.8c' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next git bisect good a3975dea1696b7c81319dc4b66e3c378dd47ccfb # bad: [f558b8364e19f9222e7976c64e9367f66bab02cc] Merge tag 'driver-core-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core git bisect bad f558b8364e19f9222e7976c64e9367f66bab02cc # good: [b6d90ef9a439b4ef73a350789bf766a1339a703d] staging: vchi:
Re: [PATCH v1] Bluetooth: Fix kernel oops triggered by hci_adv_monitors_clear()
On Tue 2020-07-07 17:38:46, Marcel Holtmann wrote:
> Hi Miao-chen,
>
> > This fixes the kernel oops by removing unnecessary background scan
> > update from hci_adv_monitors_clear() which shouldn't invoke any work
> > queue.
> >
> > The following test was performed.
> > - Run "rmmod btusb" and verify that no kernel oops is triggered.
> >
> > Signed-off-by: Miao-chen Chou
> > Reviewed-by: Abhishek Pandit-Subedi
> > Reviewed-by: Alain Michaud
> > ---
> >
> > net/bluetooth/hci_core.c | 2 --
> > 1 file changed, 2 deletions(-)
>
> patch has been applied to bluetooth-next tree.

Bluetooth no longer seems to oops for me... but there's a different
showstopper in next (graphics -- i915 -- related). Oh well :-(.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH v1] Bluetooth: Fix kernel oops triggered by hci_adv_monitors_clear()
Hi Miao-chen,

> This fixes the kernel oops by removing unnecessary background scan
> update from hci_adv_monitors_clear() which shouldn't invoke any work
> queue.
>
> The following test was performed.
> - Run "rmmod btusb" and verify that no kernel oops is triggered.
>
> Signed-off-by: Miao-chen Chou
> Reviewed-by: Abhishek Pandit-Subedi
> Reviewed-by: Alain Michaud
> ---
>
> net/bluetooth/hci_core.c | 2 --
> 1 file changed, 2 deletions(-)

patch has been applied to bluetooth-next tree.

Regards

Marcel
Re: [PATCH v1] Bluetooth: Fix kernel oops triggered by hci_adv_monitors_clear()
Hi Marcel,

In case you missed this thread, my suggestion is to revert the
previous patch and apply this patch. Please see my earlier email for
the reason. Thanks.

Regards,
Miao

On Tue, Jun 30, 2020 at 2:55 PM Miao-chen Chou wrote:
>
> Hi Marcel,
>
> hci_unregister_dev() is invoked when the controller is intended to be
> removed by btusb driver. In other words, there should not be any
> activity on hdev's workqueue, so the destruction of the workqueue
> should be the first thing to do to prevent the clear helpers from
> issuing any work. So my suggestion is to revert the patch re-arranging
> the workqueue and apply this instead.
> I should have uploaded this earlier, but I encountered some troubles
> while verifying the changes. Sorry for the inconvenience.
>
> Regards,
> Miao
>
> On Mon, Jun 29, 2020 at 11:51 PM Marcel Holtmann wrote:
> >
> > Hi Miao-chen,
> >
> > > This fixes the kernel oops by removing unnecessary background scan
> > > update from hci_adv_monitors_clear() which shouldn't invoke any work
> > > queue.
> > >
> > > The following test was performed.
> > > - Run "rmmod btusb" and verify that no kernel oops is triggered.
> > >
> > > Signed-off-by: Miao-chen Chou
> > > Reviewed-by: Abhishek Pandit-Subedi
> > > Reviewed-by: Alain Michaud
> > > ---
> > >
> > > net/bluetooth/hci_core.c | 2 --
> > > 1 file changed, 2 deletions(-)
> > >
> > > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > > index 5577cf9e2c7cd..77615161c7d72 100644
> > > --- a/net/bluetooth/hci_core.c
> > > +++ b/net/bluetooth/hci_core.c
> > > @@ -3005,8 +3005,6 @@ void hci_adv_monitors_clear(struct hci_dev *hdev)
> > > 		hci_free_adv_monitor(monitor);
> > >
> > > 	idr_destroy(&hdev->adv_monitors_idr);
> > > -
> > > -	hci_update_background_scan(hdev);
> > > }
> >
> > I am happy to apply this as well, but I also applied another patch
> > re-arranging the workqueue destroy handling. Can you check which you
> > prefer or if we should include both patches.
> >
> > Regards
> >
> > Marcel
> >
Re: [PATCH v3] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
Hi Marc,

On 6/30/20 7:05 PM, Marc Zyngier wrote:
> Booting a recent kernel on a rk3399-based system (nanopc-t4),
> equipped with a recent u-boot and ATF results in an Oops due
> to a NULL pointer dereference.
>
> This turns out to be due to the rk3399-dmc driver looking for
> an *undocumented* property (rockchip,pmu), and happily using
> a NULL pointer when the property isn't there.
>
> Instead, make most of what was brought in with 9173c5ceb035
> ("PM / devfreq: rk3399_dmc: Pass ODT and auto power down parameters
> to TF-A.") conditioned on finding this property in the device-tree,
> preventing the driver from exploding.
>
> Cc: sta...@vger.kernel.org
> Fixes: 9173c5ceb035 ("PM / devfreq: rk3399_dmc: Pass ODT and auto power down parameters to TF-A.")
> Signed-off-by: Marc Zyngier
> ---
> * From v2:
>   - Trimmed down commit message
>   - Cc stable
>
>  drivers/devfreq/rk3399_dmc.c | 42 +++++++++++++++++++++++-------------------
>  1 file changed, 23 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/devfreq/rk3399_dmc.c b/drivers/devfreq/rk3399_dmc.c
> index 24f04f78285b..027769e39f9b 100644
> --- a/drivers/devfreq/rk3399_dmc.c
> +++ b/drivers/devfreq/rk3399_dmc.c
> @@ -95,18 +95,20 @@ static int rk3399_dmcfreq_target(struct device *dev, unsigned long *freq,
>
>  	mutex_lock(&dmcfreq->lock);
>
> -	if (target_rate >= dmcfreq->odt_dis_freq)
> -		odt_enable = true;
> -
> -	/*
> -	 * This makes a SMC call to the TF-A to set the DDR PD (power-down)
> -	 * timings and to enable or disable the ODT (on-die termination)
> -	 * resistors.
> -	 */
> -	arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, dmcfreq->odt_pd_arg0,
> -		      dmcfreq->odt_pd_arg1,
> -		      ROCKCHIP_SIP_CONFIG_DRAM_SET_ODT_PD,
> -		      odt_enable, 0, 0, 0, &res);
> +	if (dmcfreq->regmap_pmu) {
> +		if (target_rate >= dmcfreq->odt_dis_freq)
> +			odt_enable = true;
> +
> +		/*
> +		 * This makes a SMC call to the TF-A to set the DDR PD
> +		 * (power-down) timings and to enable or disable the
> +		 * ODT (on-die termination) resistors.
> +		 */
> +		arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, dmcfreq->odt_pd_arg0,
> +			      dmcfreq->odt_pd_arg1,
> +			      ROCKCHIP_SIP_CONFIG_DRAM_SET_ODT_PD,
> +			      odt_enable, 0, 0, 0, &res);
> +	}
>
>  	/*
>  	 * If frequency scaling from low to high, adjust voltage first.
> @@ -371,13 +373,14 @@ static int rk3399_dmcfreq_probe(struct platform_device *pdev)
>  	}
>
>  	node = of_parse_phandle(np, "rockchip,pmu", 0);
> -	if (node) {
> -		data->regmap_pmu = syscon_node_to_regmap(node);
> -		of_node_put(node);
> -		if (IS_ERR(data->regmap_pmu)) {
> -			ret = PTR_ERR(data->regmap_pmu);
> -			goto err_edev;
> -		}
> +	if (!node)
> +		goto no_pmu;
> +
> +	data->regmap_pmu = syscon_node_to_regmap(node);
> +	of_node_put(node);
> +	if (IS_ERR(data->regmap_pmu)) {
> +		ret = PTR_ERR(data->regmap_pmu);
> +		goto err_edev;
>  	}
>
>  	regmap_read(data->regmap_pmu, RK3399_PMUGRF_OS_REG2, &val);
> @@ -399,6 +402,7 @@ static int rk3399_dmcfreq_probe(struct platform_device *pdev)
>  		goto err_edev;
>  	};
>
> +no_pmu:
>  	arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, 0, 0,
>  		      ROCKCHIP_SIP_CONFIG_DRAM_INIT,
>  		      0, 0, 0, 0, &res);
>

Applied it. Thanks.

-- 
Best Regards,
Chanwoo Choi
Samsung Electronics
Re: [PATCH v1] Bluetooth: Fix kernel oops triggered by hci_adv_monitors_clear()
Hi Marcel,

hci_unregister_dev() is invoked when the controller is intended to be
removed by btusb driver. In other words, there should not be any
activity on hdev's workqueue, so the destruction of the workqueue
should be the first thing to do to prevent the clear helpers from
issuing any work. So my suggestion is to revert the patch re-arranging
the workqueue and apply this instead.
I should have uploaded this earlier, but I encountered some trouble
while verifying the changes. Sorry for the inconvenience.

Regards,
Miao

On Mon, Jun 29, 2020 at 11:51 PM Marcel Holtmann wrote:
>
> Hi Miao-chen,
>
> > This fixes the kernel oops by removing unnecessary background scan
> > update from hci_adv_monitors_clear() which shouldn't invoke any work
> > queue.
> >
> > The following test was performed.
> > - Run "rmmod btusb" and verify that no kernel oops is triggered.
> >
> > Signed-off-by: Miao-chen Chou
> > Reviewed-by: Abhishek Pandit-Subedi
> > Reviewed-by: Alain Michaud
> > ---
> >
> > net/bluetooth/hci_core.c | 2 --
> > 1 file changed, 2 deletions(-)
> >
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index 5577cf9e2c7cd..77615161c7d72 100644
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -3005,8 +3005,6 @@ void hci_adv_monitors_clear(struct hci_dev *hdev)
> > 		hci_free_adv_monitor(monitor);
> >
> > 	idr_destroy(&hdev->adv_monitors_idr);
> > -
> > -	hci_update_background_scan(hdev);
> > }
>
> I am happy to apply this as well, but I also applied another patch
> re-arranging the workqueue destroy handling. Can you check which you
> prefer or if we should include both patches.
>
> Regards
>
> Marcel
>
[PATCH v3] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
Booting a recent kernel on a rk3399-based system (nanopc-t4),
equipped with a recent u-boot and ATF results in an Oops due
to a NULL pointer dereference.

This turns out to be due to the rk3399-dmc driver looking for
an *undocumented* property (rockchip,pmu), and happily using
a NULL pointer when the property isn't there.

Instead, make most of what was brought in with 9173c5ceb035
("PM / devfreq: rk3399_dmc: Pass ODT and auto power down parameters
to TF-A.") conditioned on finding this property in the device-tree,
preventing the driver from exploding.

Cc: sta...@vger.kernel.org
Fixes: 9173c5ceb035 ("PM / devfreq: rk3399_dmc: Pass ODT and auto power down parameters to TF-A.")
Signed-off-by: Marc Zyngier
---
* From v2:
  - Trimmed down commit message
  - Cc stable

 drivers/devfreq/rk3399_dmc.c | 42 +++++++++++++++++++++++-------------------
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/drivers/devfreq/rk3399_dmc.c b/drivers/devfreq/rk3399_dmc.c
index 24f04f78285b..027769e39f9b 100644
--- a/drivers/devfreq/rk3399_dmc.c
+++ b/drivers/devfreq/rk3399_dmc.c
@@ -95,18 +95,20 @@ static int rk3399_dmcfreq_target(struct device *dev, unsigned long *freq,
 
 	mutex_lock(&dmcfreq->lock);
 
-	if (target_rate >= dmcfreq->odt_dis_freq)
-		odt_enable = true;
-
-	/*
-	 * This makes a SMC call to the TF-A to set the DDR PD (power-down)
-	 * timings and to enable or disable the ODT (on-die termination)
-	 * resistors.
-	 */
-	arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, dmcfreq->odt_pd_arg0,
-		      dmcfreq->odt_pd_arg1,
-		      ROCKCHIP_SIP_CONFIG_DRAM_SET_ODT_PD,
-		      odt_enable, 0, 0, 0, &res);
+	if (dmcfreq->regmap_pmu) {
+		if (target_rate >= dmcfreq->odt_dis_freq)
+			odt_enable = true;
+
+		/*
+		 * This makes a SMC call to the TF-A to set the DDR PD
+		 * (power-down) timings and to enable or disable the
+		 * ODT (on-die termination) resistors.
+		 */
+		arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, dmcfreq->odt_pd_arg0,
+			      dmcfreq->odt_pd_arg1,
+			      ROCKCHIP_SIP_CONFIG_DRAM_SET_ODT_PD,
+			      odt_enable, 0, 0, 0, &res);
+	}
 
 	/*
 	 * If frequency scaling from low to high, adjust voltage first.
@@ -371,13 +373,14 @@ static int rk3399_dmcfreq_probe(struct platform_device *pdev)
 	}
 
 	node = of_parse_phandle(np, "rockchip,pmu", 0);
-	if (node) {
-		data->regmap_pmu = syscon_node_to_regmap(node);
-		of_node_put(node);
-		if (IS_ERR(data->regmap_pmu)) {
-			ret = PTR_ERR(data->regmap_pmu);
-			goto err_edev;
-		}
+	if (!node)
+		goto no_pmu;
+
+	data->regmap_pmu = syscon_node_to_regmap(node);
+	of_node_put(node);
+	if (IS_ERR(data->regmap_pmu)) {
+		ret = PTR_ERR(data->regmap_pmu);
+		goto err_edev;
 	}
 
 	regmap_read(data->regmap_pmu, RK3399_PMUGRF_OS_REG2, &val);
@@ -399,6 +402,7 @@ static int rk3399_dmcfreq_probe(struct platform_device *pdev)
 		goto err_edev;
 	};
 
+no_pmu:
 	arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, 0, 0,
 		      ROCKCHIP_SIP_CONFIG_DRAM_INIT,
 		      0, 0, 0, 0, &res);
-- 
2.27.0
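The approach described in the commit message (treat the property as optional, and make everything that depends on it conditional) can be shown outside the kernel with a minimal sketch in plain C. All names below (`struct dmc`, `struct pmu_regmap`, `dmc_probe()`, `dmc_target()`, `mock_set_odt_pd()`) are hypothetical stand-ins for illustration, not the driver's real types or functions, and the firmware call is replaced by a counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PMU register map handle. */
struct pmu_regmap {
	unsigned int os_reg2;
};

/* Hypothetical stand-in for the driver's private data. */
struct dmc {
	struct pmu_regmap *regmap_pmu;  /* NULL when the optional property is absent */
	unsigned long odt_dis_freq;     /* rate at or above which ODT is enabled */
	int odt_calls;                  /* counts mock firmware calls */
};

/* Mock of the firmware call that sets ODT and power-down parameters. */
static void mock_set_odt_pd(struct dmc *d, bool odt_enable)
{
	(void)odt_enable;
	d->odt_calls++;
}

/* Probe time: remember the regmap only if the lookup produced one. */
void dmc_probe(struct dmc *d, struct pmu_regmap *found)
{
	d->regmap_pmu = found;
	d->odt_calls = 0;
}

/* Frequency change: skip the PMU-dependent step when no regmap exists. */
int dmc_target(struct dmc *d, unsigned long target_rate)
{
	if (d->regmap_pmu) {
		bool odt_enable = target_rate >= d->odt_dis_freq;

		mock_set_odt_pd(d, odt_enable);
	}
	/* Frequency and voltage scaling would continue here. */
	return 0;
}
```

With this shape, a system whose firmware does not provide the property still gets frequency scaling; it only loses the step that needs the PMU data.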
Re: [PATCH v1] Bluetooth: Fix kernel oops triggered by hci_adv_monitors_clear()
Hi Miao-chen,

> This fixes the kernel oops by removing unnecessary background scan
> update from hci_adv_monitors_clear() which shouldn't invoke any work
> queue.
>
> The following test was performed.
> - Run "rmmod btusb" and verify that no kernel oops is triggered.
>
> Signed-off-by: Miao-chen Chou
> Reviewed-by: Abhishek Pandit-Subedi
> Reviewed-by: Alain Michaud
> ---
>
> net/bluetooth/hci_core.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index 5577cf9e2c7cd..77615161c7d72 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -3005,8 +3005,6 @@ void hci_adv_monitors_clear(struct hci_dev *hdev)
> 		hci_free_adv_monitor(monitor);
>
> 	idr_destroy(&hdev->adv_monitors_idr);
> -
> -	hci_update_background_scan(hdev);
> }

I am happy to apply this as well, but I also applied another patch
re-arranging the workqueue destroy handling. Can you check which you
prefer or if we should include both patches.

Regards

Marcel
[PATCH v1] Bluetooth: Fix kernel oops triggered by hci_adv_monitors_clear()
This fixes the kernel oops by removing unnecessary background scan
update from hci_adv_monitors_clear() which shouldn't invoke any work
queue.

The following test was performed.
- Run "rmmod btusb" and verify that no kernel oops is triggered.

Signed-off-by: Miao-chen Chou
Reviewed-by: Abhishek Pandit-Subedi
Reviewed-by: Alain Michaud
---

 net/bluetooth/hci_core.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 5577cf9e2c7cd..77615161c7d72 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -3005,8 +3005,6 @@ void hci_adv_monitors_clear(struct hci_dev *hdev)
 		hci_free_adv_monitor(monitor);
 
 	idr_destroy(&hdev->adv_monitors_idr);
-
-	hci_update_background_scan(hdev);
 }
 
 void hci_free_adv_monitor(struct adv_monitor *monitor)
-- 
2.26.2
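The idea behind this patch (a clear helper that runs during teardown should only free state and must not schedule new work) can be sketched as a small userspace mock in C. The names `struct mock_dev`, `queue_work_mock()`, `monitors_clear()`, `monitors_clear_and_rescan()` and `unregister_dev()` are hypothetical and do not correspond to the Bluetooth core's API. The sketch also assumes that teardown destroys the work queue before calling the clear helper.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mock of a device that owns a work queue. */
struct mock_dev {
	bool wq_alive;   /* false once the work queue has been destroyed */
	int queued;      /* work items queued while the queue was alive */
	int bad_queues;  /* attempts to queue work after destruction */
	int monitors;    /* resources owned by the device */
};

/* Mock of queueing work; a late call is only counted, not executed. */
static void queue_work_mock(struct mock_dev *d)
{
	if (d->wq_alive)
		d->queued++;
	else
		d->bad_queues++;  /* in a real kernel this could oops */
}

/* Clear helper that only frees state and schedules nothing. */
void monitors_clear(struct mock_dev *d)
{
	d->monitors = 0;
}

/* Variant that also kicks off follow-up work, shown for comparison. */
void monitors_clear_and_rescan(struct mock_dev *d)
{
	d->monitors = 0;
	queue_work_mock(d);
}

/* Teardown: destroy the queue first, then run the given clear helper. */
void unregister_dev(struct mock_dev *d, void (*clear)(struct mock_dev *))
{
	d->wq_alive = false;
	clear(d);
}
```

In this mock, tearing down with `monitors_clear()` leaves `bad_queues` at zero, while `monitors_clear_and_rescan()` records one attempt to use a queue that no longer exists.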
Re: [PATCH v2] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
Hi Marc,

On 6/29/20 10:22 PM, Marc Zyngier wrote:
> On 2020-06-29 12:29, Chanwoo Choi wrote:
>> Hi Enric and Marc,
>>
>> On 6/29/20 8:05 PM, Enric Balletbo i Serra wrote:
>>> Hi Chanwoo and Marc,
>>>
>>> On 29/6/20 13:09, Chanwoo Choi wrote:
>>>> Hi Enric,
>>>>
>>>> Could you check this issue? Your patch[1] causes this issue.
>>>> As Marc mentioned, although rk3399-dmc.c handled 'rockchip,pmu'
>>>> as the mandatory property, your patch[1] didn't add the 'rockchip,pmu'
>>>> property to the documentation.
>>>
>>> I think the problem is that the DT binding patch, for some reason, was
>>> missed and didn't land. The patch seems to have all the required reviews
>>> and acks.
>>>
>>> https://patchwork.kernel.org/patch/10901593/
>>>
>>> Sorry because I didn't notice this issue when 9173c5ceb035 landed. And
>>> thanks for fixing the issue.
>>
>> If the 'rockchip,pmu' property is mandatory, instead of Marc's patch,
>> we better to require the merge of patch[1] to DT maintainer.
>
> It is way too late. Firmware exists (mainline u-boot, for one) that
> do not expose the new property, and you can't demand that people
> upgrade. This is an ABI bug, and we now have to live with it.

As you commented, it is proper for rk3399-dmc.c to treat the
'rockchip,pmu' property as optional.

Could you send v3 with an edited patch description and with the
stable mailing list added to Cc?

>
> So, yes to fixing the DT, and no to *only* fixing the DT.

-- 
Best Regards,
Chanwoo Choi
Samsung Electronics
Re: [PATCH v2] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
Hi Chanwoo,

On 29/6/20 13:29, Chanwoo Choi wrote:
> Hi Enric and Marc,
>
> On 6/29/20 8:05 PM, Enric Balletbo i Serra wrote:
>> Hi Chanwoo and Marc,
>>
>> On 29/6/20 13:09, Chanwoo Choi wrote:
>>> Hi Enric,
>>>
>>> Could you check this issue? Your patch[1] causes this issue.
>>> As Marc mentioned, although rk3399-dmc.c handled 'rockchip,pmu'
>>> as the mandatory property, your patch[1] didn't add the 'rockchip,pmu'
>>> property to the documentation.
>>>
>>
>> I think the problem is that the DT binding patch, for some reason, was missed
>> and didn't land. The patch seems to have all the required reviews and acks.
>>
>> https://patchwork.kernel.org/patch/10901593/
>>
>> Sorry because I didn't notice this issue when 9173c5ceb035 landed. And thanks
>> for fixing the issue.
>
> If the 'rockchip,pmu' property is mandatory, instead of Marc's patch,
> we better to require the merge of patch[1] to DT maintainer.
>
> [1] https://patchwork.kernel.org/patch/10901593/
>

Give me some time to double check, because I think that at this point it is
needed on some devices with old firmware, but not now. It's been a while since
I worked on this, but I suspect that being optional is the right way.

Maybe Heiko, who IIRC worked on TF-A, has a clearer view on this?

Thanks,
Enric

>>
>> Best regards,
>> Enric
>>
>>> [1] 9173c5ceb035 ("PM / devfreq: rk3399_dmc: Pass ODT
>>> and auto power down parameters to TF-A.")
>>>
>>>
>>> On 6/29/20 5:18 PM, Marc Zyngier wrote:
>>>> Hi Chanwoo,
>>>>
>>>> On Mon, 29 Jun 2020 03:43:37 +0100,
>>>> Chanwoo Choi wrote:
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> On 6/23/20 12:28 AM, Marc Zyngier wrote:
>>>>
>>>> [...]
>>>>
>>>>> It looks good to me. But, I think that it is not necessary
>>>>> fully kernel panic log about NULL pointer. It is enough
>>>>> just mentioning the NULL pointer issue without full kernel panic log.
>>>>
>>>> I personally find the backtrace useful as it allows people with the
>>>> same issue to trawl the kernel log and find whether it has already be
>>>> fixed upstream. But it's only me, and I'm not attached to it.
>>>>
>>>>> So, how about editing the patch description as following or others simply?
>>>>> and we need to add 'sta...@vger.kernel.org' to Cc list for applying it
>>>>> to stable branch.
>>>>
>>>> Looks good to me.
>>>>
>>>>>
>>>>>
>>>>> PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
>>>>>
>>>>> Booting a recent kernel on a rk3399-based system (nanopc-t4),
>>>>> equipped with a recent u-boot and ATF results in the kernel panic
>>>>> about NULL pointer issue.
>>>>
>>>> nit: "results in a kernel panic on dereferencing a NULL pointer".
>>>>
>>>>>
>>>>> This turns out to be due to the rk3399-dmc driver looking for
>>>>> an *undocumented* property (rockchip,pmu), and happily using
>>>>> a NULL pointer when the property isn't there.
>>>>>
>>>>> Instead, make most of what was brought in with 9173c5ceb035
>>>>> ("PM / devfreq: rk3399_dmc: Pass ODT and auto power down parameters
>>>>> to TF-A.") conditioned on finding this property in the device-tree,
>>>>> preventing the driver from exploding.
>>>>>
>>>>> Fixes: 9173c5ceb035 ("PM / devfreq: rk3399_dmc: Pass ODT and auto
>>>>> power down parameters to TF-A.")
>>>>> Signed-off-by: Marc Zyngier
>>>>> Signed-off-by: Chanwoo Choi
>>>>
>>>>
>>>> Note that the biggest issue is still there: the driver is using an
>>>> undocumented property, and this patch is just papering over it.
>>>> Since I expect this property to be useful for something, it would be
>>>> good for whoever knows what it does to document it.
>>>
>>> Hi Marc,
>>>
>>> You are right. We have to do two step:
>>> 1. Add missing explanation of 'rockchip,pmu' property to dt-binding document
>>> 2. If possible, add 'rockchip,pmu' property node to rk3399_dmc dt node.
>>>
>>> When I tried to
Re: [PATCH v2] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
Hi Enric and Marc,

On 6/29/20 8:05 PM, Enric Balletbo i Serra wrote:
> Hi Chanwoo and Marc,
>
> On 29/6/20 13:09, Chanwoo Choi wrote:
>> Hi Enric,
>>
>> Could you check this issue? Your patch[1] causes this issue.
>> As Marc mentioned, although rk3399-dmc.c handled 'rockchip,pmu'
>> as the mandatory property, your patch[1] didn't add the 'rockchip,pmu'
>> property to the documentation.
>>
>
> I think the problem is that the DT binding patch, for some reason, was missed
> and didn't land. The patch seems to have all the required reviews and acks.
>
> https://patchwork.kernel.org/patch/10901593/
>
> Sorry because I didn't notice this issue when 9173c5ceb035 landed. And thanks
> for fixing the issue.

If the 'rockchip,pmu' property is mandatory, then instead of Marc's patch,
we had better ask the DT maintainer to merge patch[1].

[1] https://patchwork.kernel.org/patch/10901593/

>
> Best regards,
> Enric
>
>> [1] 9173c5ceb035 ("PM / devfreq: rk3399_dmc: Pass ODT
>> and auto power down parameters to TF-A.")
>>
>>
>> On 6/29/20 5:18 PM, Marc Zyngier wrote:
>>> Hi Chanwoo,
>>>
>>> On Mon, 29 Jun 2020 03:43:37 +0100,
>>> Chanwoo Choi wrote:
>>>>
>>>> Hi Marc,
>>>>
>>>> On 6/23/20 12:28 AM, Marc Zyngier wrote:
>>>
>>> [...]
>>>
>>>> It looks good to me. But, I think that it is not necessary
>>>> fully kernel panic log about NULL pointer. It is enough
>>>> just mentioning the NULL pointer issue without full kernel panic log.
>>>
>>> I personally find the backtrace useful as it allows people with the
>>> same issue to trawl the kernel log and find whether it has already be
>>> fixed upstream. But it's only me, and I'm not attached to it.
>>>
>>>> So, how about editing the patch description as following or others simply?
>>>> and we need to add 'sta...@vger.kernel.org' to Cc list for applying it
>>>> to stable branch.
>>>
>>> Looks good to me.
>>>
>>>>
>>>>
>>>> PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
>>>>
>>>> Booting a recent kernel on a rk3399-based system (nanopc-t4),
>>>> equipped with a recent u-boot and ATF results in the kernel panic
>>>> about NULL pointer issue.
>>>
>>> nit: "results in a kernel panic on dereferencing a NULL pointer".
>>>
>>>>
>>>> This turns out to be due to the rk3399-dmc driver looking for
>>>> an *undocumented* property (rockchip,pmu), and happily using
>>>> a NULL pointer when the property isn't there.
>>>>
>>>> Instead, make most of what was brought in with 9173c5ceb035
>>>> ("PM / devfreq: rk3399_dmc: Pass ODT and auto power down parameters
>>>> to TF-A.") conditioned on finding this property in the device-tree,
>>>> preventing the driver from exploding.
>>>>
>>>> Fixes: 9173c5ceb035 ("PM / devfreq: rk3399_dmc: Pass ODT and auto
>>>> power down parameters to TF-A.")
>>>> Signed-off-by: Marc Zyngier
>>>> Signed-off-by: Chanwoo Choi
>>>
>>>
>>> Note that the biggest issue is still there: the driver is using an
>>> undocumented property, and this patch is just papering over it.
>>> Since I expect this property to be useful for something, it would be
>>> good for whoever knows what it does to document it.
>>
>> Hi Marc,
>>
>> You are right. We have to do two step:
>> 1. Add missing explanation of 'rockchip,pmu' property to dt-binding document
>> 2. If possible, add 'rockchip,pmu' property node to rk3399_dmc dt node.
>>
>> When I tried to find usage example of 'rockchip,pmu' property,
>> I found them as following: The 'rockchip,pmu' property[2] indicates
>> 'PMU (Power Management Unit)'.
>>
>> $ grep -rn "rockchip,pmu" arch/arm64/boot/dts/
>> arch/arm64/boot/dts/rockchip/px30.dtsi:1211: rockchip,pmu = <>;
>> arch/arm64/boot/dts/rockchip/rk3399.dtsi:1909: rockchip,pmu = <>;
>> arch/arm64/boot/dts/rockchip/rk3368.dtsi:807:rockchip,pmu = <>;
>>
>> [2] the description of 'rockchip,pmu' property
>> -
>> https://protect2.fireeye.com/url?k=e55f0ba3-b8384f85-e55e80ec-0cc47a31384a-d9c5f6b28aba9be6=1=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Fv5.7.2%2Fs
Re: [PATCH v2] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
Hi Enric,

Could you check this issue? Your patch[1] causes this issue.
As Marc mentioned, although rk3399-dmc.c handled 'rockchip,pmu'
as a mandatory property, your patch[1] didn't add the 'rockchip,pmu'
property to the documentation.

[1] 9173c5ceb035 ("PM / devfreq: rk3399_dmc: Pass ODT
and auto power down parameters to TF-A.")


On 6/29/20 5:18 PM, Marc Zyngier wrote:
> Hi Chanwoo,
>
> On Mon, 29 Jun 2020 03:43:37 +0100,
> Chanwoo Choi wrote:
>>
>> Hi Marc,
>>
>> On 6/23/20 12:28 AM, Marc Zyngier wrote:
>
> [...]
>
>> It looks good to me. But, I think that it is not necessary
>> fully kernel panic log about NULL pointer. It is enough
>> just mentioning the NULL pointer issue without full kernel panic log.
>
> I personally find the backtrace useful as it allows people with the
> same issue to trawl the kernel log and find whether it has already be
> fixed upstream. But it's only me, and I'm not attached to it.
>
>> So, how about editing the patch description as following or others simply?
>> and we need to add 'sta...@vger.kernel.org' to Cc list for applying it
>> to stable branch.
>
> Looks good to me.
>
>>
>>
>> PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
>>
>> Booting a recent kernel on a rk3399-based system (nanopc-t4),
>> equipped with a recent u-boot and ATF results in the kernel panic
>> about NULL pointer issue.
>
> nit: "results in a kernel panic on dereferencing a NULL pointer".
>
>>
>> This turns out to be due to the rk3399-dmc driver looking for
>> an *undocumented* property (rockchip,pmu), and happily using
>> a NULL pointer when the property isn't there.
>>
>> Instead, make most of what was brought in with 9173c5ceb035
>> ("PM / devfreq: rk3399_dmc: Pass ODT and auto power down parameters
>> to TF-A.") conditioned on finding this property in the device-tree,
>> preventing the driver from exploding.
>>
>> Fixes: 9173c5ceb035 ("PM / devfreq: rk3399_dmc: Pass ODT and auto power
>> down parameters to TF-A.")
>> Signed-off-by: Marc Zyngier
>> Signed-off-by: Chanwoo Choi
>
>
> Note that the biggest issue is still there: the driver is using an
> undocumented property, and this patch is just papering over it.
> Since I expect this property to be useful for something, it would be
> good for whoever knows what it does to document it.

Hi Marc,

You are right. We have to do two steps:
1. Add the missing explanation of the 'rockchip,pmu' property to the
   dt-binding document.
2. If possible, add the 'rockchip,pmu' property to the rk3399_dmc dt node.

When I looked for usage examples of the 'rockchip,pmu' property,
I found the following. The 'rockchip,pmu' property[2] refers to the
'PMU (Power Management Unit)'.

$ grep -rn "rockchip,pmu" arch/arm64/boot/dts/
arch/arm64/boot/dts/rockchip/px30.dtsi:1211:rockchip,pmu = <>;
arch/arm64/boot/dts/rockchip/rk3399.dtsi:1909: rockchip,pmu = <>;
arch/arm64/boot/dts/rockchip/rk3368.dtsi:807: rockchip,pmu = <>;

[2] the description of 'rockchip,pmu' property
- https://elixir.bootlin.com/linux/v5.7.2/source/Documentation/devicetree/bindings/pinctrl/rockchip,pinctrl.txt#L40

If I don't receive any reply, I'll add the following:

cwchoi00@chan-linux-pc:~/kernel/git.kernel/linux.chanwoo$ d
diff --git a/Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt b/Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt
index 0ec68141f85a..161e60ea874b 100644
--- a/Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt
+++ b/Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt
@@ -18,6 +18,8 @@ Optional properties:
 			 format depends on the interrupt controller.
 			 It should be a DCF interrupt. When DDR DVFS finishes
 			 a DCF interrupt is triggered.
+- rockchip,pmu:		 Phandle to the syscon managing the "pmu general
+			 register files".
 
 Following properties relate to DDR timing:
 

-- 
Best Regards,
Chanwoo Choi
Samsung Electronics
Re: [PATCH v2] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
Hi Chanwoo and Marc, On 29/6/20 13:09, Chanwoo Choi wrote: > Hi Enric, > > Could you check this issue? Your patch[1] causes this issue. > As Marc mentioned, although rk3399-dmc.c handled 'rockchip,pmu' > as the mandatory property, your patch[1] didn't add the 'rockchip,pmu' > property to the documentation. > I think the problem is that the DT binding patch, for some reason, was missed and didn't land. The patch seems to have all the required reviews and acks. https://patchwork.kernel.org/patch/10901593/ Sorry because I didn't notice this issue when 9173c5ceb035 landed. And thanks for fixing the issue. Best regards, Enric > [1] 9173c5ceb035 ("PM / devfreq: rk3399_dmc: Pass ODT > and auto power down parameters to TF-A.") > > > On 6/29/20 5:18 PM, Marc Zyngier wrote: >> Hi Chanwoo, >> >> On Mon, 29 Jun 2020 03:43:37 +0100, >> Chanwoo Choi wrote: >>> >>> Hi Marc, >>> >>> On 6/23/20 12:28 AM, Marc Zyngier wrote: >> >> [...] >> >>> It looks good to me. But, I think that it is not necessary >>> fully kernel panic log about NULL pointer. It is enoughspsp >>> just mentioning the NULL pointer issue without full kernel panic log. >> >> I personally find the backtrace useful as it allows people with the >> same issue to trawl the kernel log and find whether it has already be >> fixed upstream. But it's only me, and I'm not attached to it. >> >>> So, how about editing the patch description as following or others simply? >>> and we need to add 'sta...@vger.kernel.org' to Cc list for applying it >>> to stable branch. >> >> Looks good to me. >> >>> >>> >>> PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent >>> >>> Booting a recent kernel on a rk3399-based system (nanopc-t4), >>> equipped with a recent u-boot and ATF results in the kernel panic >>> about NULL pointer issue. >> >> nit: "results in a kernel panic on dereferencing a NULL pointer". 
Re: [PATCH v2] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
Hi Enric, On 6/29/20 8:26 PM, Enric Balletbo i Serra wrote: > Hi Chanwoo, > > On 29/6/20 13:29, Chanwoo Choi wrote: >> Hi Enric and Mark, >> >> On 6/29/20 8:05 PM, Enric Balletbo i Serra wrote: >>> Hi Chanwoo and Marc, >>> >>> On 29/6/20 13:09, Chanwoo Choi wrote: >>>> Hi Enric, >>>> >>>> Could you check this issue? Your patch[1] causes this issue. >>>> As Marc mentioned, although rk3399-dmc.c handled 'rockchip,pmu' >>>> as the mandatory property, your patch[1] didn't add the 'rockchip,pmu' >>>> property to the documentation. >>>> >>> >>> I think the problem is that the DT binding patch, for some reason, was >>> missed >>> and didn't land. The patch seems to have all the required reviews and acks. >>> >>> https://patchwork.kernel.org/patch/10901593/ >>> >>> Sorry because I didn't notice this issue when 9173c5ceb035 landed. And >>> thanks >>> for fixing the issue. >> >> If the 'rockchip,pmu' propery is mandatory, instead of Mark's patch, >> we better to require the merge of patch[1] to DT maintainer. >> >> [1] https://patchwork.kernel.org/patch/10901593/ >> > > Give me some time to double check, because I think that at this point, is > needed > on some devices with old firmware but not now. It's been a while since I > worked > on this, but I suspect that being optional is the right way. OK. Thanks for your reply. > > Maybe Heiko, who IIRC worked on TF-A has a more clear thought on this? > > Thanks, > Enric > >>> >>> Best regards, >>> Enric >>> >>>> [1] 9173c5ceb035 ("PM / devfreq: rk3399_dmc: Pass ODT >>>> and auto power down parameters to TF-A.") >>>> >>>> >>>> On 6/29/20 5:18 PM, Marc Zyngier wrote: >>>>> Hi Chanwoo, >>>>> >>>>> On Mon, 29 Jun 2020 03:43:37 +0100, >>>>> Chanwoo Choi wrote: >>>>>> >>>>>> Hi Marc, >>>>>> >>>>>> On 6/23/20 12:28 AM, Marc Zyngier wrote: >>>>> >>>>> [...] >>>>> >>>>>> It looks good to me. But, I think that it is not necessary >>>>>> fully kernel panic log about NULL pointer. 
It is enoughspsp >>>>>> just mentioning the NULL pointer issue without full kernel panic log. >>>>> >>>>> I personally find the backtrace useful as it allows people with the >>>>> same issue to trawl the kernel log and find whether it has already be >>>>> fixed upstream. But it's only me, and I'm not attached to it. >>>>> >>>>>> So, how about editing the patch description as following or others >>>>>> simply? >>>>>> and we need to add 'sta...@vger.kernel.org' to Cc list for applying it >>>>>> to stable branch. >>>>> >>>>> Looks good to me. >>>>> >>>>>> >>>>>> >>>>>> PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent >>>>>> >>>>>> Booting a recent kernel on a rk3399-based system (nanopc-t4), >>>>>> equipped with a recent u-boot and ATF results in the kernel panic >>>>>> about NULL pointer issue. >>>>> >>>>> nit: "results in a kernel panic on dereferencing a NULL pointer". >>>>> >>>>>> >>>>>> This turns out to be due to the rk3399-dmc driver looking for >>>>>> an *undocumented* property (rockchip,pmu), and happily using >>>>>> a NULL pointer when the property isn't there. >>>>>> >>>>>> Instead, make most of what was brought in with 9173c5ceb035 >>>>>> ("PM / devfreq: rk3399_dmc: Pass ODT and auto power down parameters >>>>>> to TF-A.") conditioned on finding this property in the device-tree, >>>>>> preventing the driver from exploding. >>>>>> >>>>>> Fixes: 9173c5ceb035 ("PM / devfreq: rk3399_dmc: Pass ODT and auto >>>>>> power down parameters to TF-A.") >>>>>> Signed-off-by: Marc Zyngier >>>>>> Signed-off-by: Chanwoo Choi >>>>> >>>>> >>>>> Note that the biggest issue is still there: the driver
Re: [PATCH v2] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
Hi Chanwoo,

On Mon, 29 Jun 2020 03:43:37 +0100,
Chanwoo Choi wrote:
>
> Hi Marc,
>
> On 6/23/20 12:28 AM, Marc Zyngier wrote:

[...]

> It looks good to me. But, I think that it is not necessary
> fully kernel panic log about NULL pointer. It is enough
> just mentioning the NULL pointer issue without full kernel panic log.

I personally find the backtrace useful as it allows people with the
same issue to trawl the kernel log and find whether it has already been
fixed upstream. But it's only me, and I'm not attached to it.

> So, how about editing the patch description as following or others simply?
> and we need to add 'sta...@vger.kernel.org' to Cc list for applying it
> to stable branch.

Looks good to me.

>
>
> PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
>
> Booting a recent kernel on a rk3399-based system (nanopc-t4),
> equipped with a recent u-boot and ATF results in the kernel panic
> about NULL pointer issue.

nit: "results in a kernel panic on dereferencing a NULL pointer".

>
> This turns out to be due to the rk3399-dmc driver looking for
> an *undocumented* property (rockchip,pmu), and happily using
> a NULL pointer when the property isn't there.
>
> Instead, make most of what was brought in with 9173c5ceb035
> ("PM / devfreq: rk3399_dmc: Pass ODT and auto power down parameters
> to TF-A.") conditioned on finding this property in the device-tree,
> preventing the driver from exploding.
>
> Fixes: 9173c5ceb035 ("PM / devfreq: rk3399_dmc: Pass ODT and auto power
> down parameters to TF-A.")
> Signed-off-by: Marc Zyngier
> Signed-off-by: Chanwoo Choi

Note that the biggest issue is still there: the driver is using an
undocumented property, and this patch is just papering over it.
Since I expect this property to be useful for something, it would be
good for whoever knows what it does to document it.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.
Re: [PATCH v2] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
On 2020-06-29 12:29, Chanwoo Choi wrote:
> Hi Enric and Mark,
>
> On 6/29/20 8:05 PM, Enric Balletbo i Serra wrote:
>> Hi Chanwoo and Marc,
>>
>> On 29/6/20 13:09, Chanwoo Choi wrote:
>>> Hi Enric,
>>>
>>> Could you check this issue? Your patch[1] causes this issue.
>>> As Marc mentioned, although rk3399-dmc.c handled 'rockchip,pmu'
>>> as the mandatory property, your patch[1] didn't add the 'rockchip,pmu'
>>> property to the documentation.
>>
>> I think the problem is that the DT binding patch, for some reason, was
>> missed and didn't land. The patch seems to have all the required reviews
>> and acks.
>>
>> https://patchwork.kernel.org/patch/10901593/
>>
>> Sorry because I didn't notice this issue when 9173c5ceb035 landed. And
>> thanks for fixing the issue.
>
> If the 'rockchip,pmu' propery is mandatory, instead of Mark's patch,
> we better to require the merge of patch[1] to DT maintainer.

It is way too late. Firmware exists (mainline u-boot, for one) that
do not expose the new property, and you can't demand that people
upgrade. This is an ABI bug, and we now have to live with it.

So, yes to fixing the DT, and no to *only* fixing the DT.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
Re: [PATCH v2] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
e. > > Instead, make most of what was brought in with 9173c5ceb035 > ("PM / devfreq: rk3399_dmc: Pass ODT and auto power down parameters > to TF-A.") conditioned on finding this property in the device-tree, > preventing the driver from exploding. > > Fixes: 9173c5ceb035 ("PM / devfreq: rk3399_dmc: Pass ODT and auto power down > parameters to TF-A.") > Signed-off-by: Marc Zyngier > --- > drivers/devfreq/rk3399_dmc.c | 42 > 1 file changed, 23 insertions(+), 19 deletions(-) > > diff --git a/drivers/devfreq/rk3399_dmc.c b/drivers/devfreq/rk3399_dmc.c > index 24f04f78285b..027769e39f9b 100644 > --- a/drivers/devfreq/rk3399_dmc.c > +++ b/drivers/devfreq/rk3399_dmc.c > @@ -95,18 +95,20 @@ static int rk3399_dmcfreq_target(struct device *dev, > unsigned long *freq, > > mutex_lock(>lock); > > - if (target_rate >= dmcfreq->odt_dis_freq) > - odt_enable = true; > - > - /* > - * This makes a SMC call to the TF-A to set the DDR PD (power-down) > - * timings and to enable or disable the ODT (on-die termination) > - * resistors. > - */ > - arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, dmcfreq->odt_pd_arg0, > - dmcfreq->odt_pd_arg1, > - ROCKCHIP_SIP_CONFIG_DRAM_SET_ODT_PD, > - odt_enable, 0, 0, 0, ); > + if (dmcfreq->regmap_pmu) { > + if (target_rate >= dmcfreq->odt_dis_freq) > + odt_enable = true; > + > + /* > + * This makes a SMC call to the TF-A to set the DDR PD > + * (power-down) timings and to enable or disable the > + * ODT (on-die termination) resistors. > + */ > + arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, dmcfreq->odt_pd_arg0, > + dmcfreq->odt_pd_arg1, > + ROCKCHIP_SIP_CONFIG_DRAM_SET_ODT_PD, > + odt_enable, 0, 0, 0, ); > + } > > /* >* If frequency scaling from low to high, adjust voltage first. 
> @@ -371,13 +373,14 @@ static int rk3399_dmcfreq_probe(struct platform_device > *pdev) > } > > node = of_parse_phandle(np, "rockchip,pmu", 0); > - if (node) { > - data->regmap_pmu = syscon_node_to_regmap(node); > - of_node_put(node); > - if (IS_ERR(data->regmap_pmu)) { > - ret = PTR_ERR(data->regmap_pmu); > - goto err_edev; > - } > + if (!node) > + goto no_pmu; > + > + data->regmap_pmu = syscon_node_to_regmap(node); > + of_node_put(node); > + if (IS_ERR(data->regmap_pmu)) { > + ret = PTR_ERR(data->regmap_pmu); > + goto err_edev; > } > > regmap_read(data->regmap_pmu, RK3399_PMUGRF_OS_REG2, ); > @@ -399,6 +402,7 @@ static int rk3399_dmcfreq_probe(struct platform_device > *pdev) > goto err_edev; > }; > > +no_pmu: > arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, 0, 0, > ROCKCHIP_SIP_CONFIG_DRAM_INIT, > 0, 0, 0, 0, ); > It looks good to me. But, I think that it is not necessary fully kernel panic log about NULL pointer. It is enoughspsp just mentioning the NULL pointer issue without full kernel panic log. So, how about editing the patch description as following or others simply? and we need to add 'sta...@vger.kernel.org' to Cc list for applying it to stable branch. PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent Booting a recent kernel on a rk3399-based system (nanopc-t4), equipped with a recent u-boot and ATF results in the kernel panic about NULL pointer issue. This turns out to be due to the rk3399-dmc driver looking for an *undocumented* property (rockchip,pmu), and happily using a NULL pointer when the property isn't there. Instead, make most of what was brought in with 9173c5ceb035 ("PM / devfreq: rk3399_dmc: Pass ODT and auto power down parameters to TF-A.") conditioned on finding this property in the device-tree, preventing the driver from exploding. 
Fixes: 9173c5ceb035 ("PM / devfreq: rk3399_dmc: Pass ODT and auto power down parameters to TF-A.") Signed-off-by: Marc Zyngier Signed-off-by: Chanwoo Choi -- Best Regards, Chanwoo Choi Samsung Electronics
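For readers who want to see the shape of the fix outside the kernel tree, here is a minimal userspace sketch of the guard pattern the patch applies: every use of the PMU regmap is conditioned on the optional 'rockchip,pmu' lookup having succeeded. This is an illustration only, not the driver code. The names fake_regmap, fake_dmcfreq, fake_probe and fake_target are hypothetical stand-ins, and the counter replaces the real SMC call to TF-A.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; NOT the real driver structs. */
struct fake_regmap {
	unsigned int os_reg2;	/* pretend register value */
};

struct fake_dmcfreq {
	struct fake_regmap *regmap_pmu;	/* NULL when "rockchip,pmu" is absent */
	unsigned long odt_dis_freq;
	unsigned int odt_calls;		/* counts simulated ODT/PD firmware calls */
};

/*
 * Mimics the probe path: the PMU-dependent setup only runs when the
 * optional syscon was found; otherwise it is skipped (the "no_pmu" case).
 */
static int fake_probe(struct fake_dmcfreq *d, struct fake_regmap *pmu,
		      unsigned long odt_dis_freq)
{
	d->regmap_pmu = pmu;
	d->odt_dis_freq = 0;
	d->odt_calls = 0;

	if (!d->regmap_pmu)
		return 0;	/* property absent: nothing to dereference */

	(void)d->regmap_pmu->os_reg2;	/* safe: pointer is known non-NULL */
	d->odt_dis_freq = odt_dis_freq;
	return 0;
}

/*
 * Mimics the frequency-change path: the ODT decision and the firmware
 * call are both conditioned on the PMU regmap being present.
 */
static bool fake_target(struct fake_dmcfreq *d, unsigned long target_rate)
{
	bool odt_enable = false;

	if (d->regmap_pmu) {
		if (target_rate >= d->odt_dis_freq)
			odt_enable = true;
		d->odt_calls++;	/* stands in for the SMC call */
	}
	return odt_enable;
}
```

Calling fake_probe() with a NULL regmap and then fake_target() exercises the path that used to oops: nothing is dereferenced and no ODT call is made, while the rest of the frequency change would proceed as before.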
Re: [PATCH] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
On 2020-06-23 09:55, Heiko Stübner wrote:
> On Monday, 22 June 2020, 17:07:52 CEST, Marc Zyngier wrote:

[...]

>> maz@fine-girl:~$ sudo dtc -I dtb /sys/firmware/fdt 2>/dev/null | grep -A 5 dmc
>> 	dmc {
>> 		u-boot,dm-pre-reloc;
>> 		compatible = "rockchip,rk3399-dmc";
>> 		devfreq-events = <0xc8>;
>> [followed by a ton of timings...]
>>
>> It is definitely coming from u-boot (I don't provide any DTB otherwise,
>> and you can find the corresponding node and timings in the u-boot tree).
>
> which is probably the source of the problem :-) .
>
> I'm pretty sure the "reviewed" binding in the kernel doesn't match the
> dt-nodes used in uboot.

and the driver doesn't match the binding either. Frankly, this is badly
messed up.

> While u-boot these days syncs the main devicetrees from Linux, the memory
> setup stuff is pretty specific to uboot (and lives in separate dtsi files).
>
> And I guess you're the only one feeding uboot's dtb to Linux directly,
> hence nobody else did encounter this before ;-) .

I'm not "feeding" it directly. I'm using the expected DT distribution
mechanism, which is the boot firmware. Nobody should ever have to
provide their own DT to the kernel.

Thanks,

M. (starting to like ACPI more and more every day)
-- 
Jazz is not dead. It just smells funny...
Re: [PATCH] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
Am Montag, 22. Juni 2020, 17:07:52 CEST schrieb Marc Zyngier: > Hi Heiko, > > On 2020-06-22 14:54, Heiko Stübner wrote: > > Hi Marc, > > > > Am Montag, 22. Juni 2020, 15:31:55 CEST schrieb Marc Zyngier: > >> On Sat, 13 Jun 2020 11:24:35 +0100 > >> Marc Zyngier wrote: > >> > >> > Booting a recent kernel on a rk3399-based system (nanopc-t4), > >> > equipped with a recent u-boot and ATF results in the following: > >> > > >> > [5.607431] Unable to handle kernel NULL pointer dereference at > >> > virtual address 01e4 > >> > [5.608219] Mem abort info: > >> > [5.608469] ESR = 0x9604 > >> > [5.608749] EC = 0x25: DABT (current EL), IL = 32 bits > >> > [5.609223] SET = 0, FnV = 0 > >> > [5.609600] EA = 0, S1PTW = 0 > >> > [5.609891] Data abort info: > >> > [5.610149] ISV = 0, ISS = 0x0004 > >> > [5.610489] CM = 0, WnR = 0 > >> > [5.610757] user pgtable: 4k pages, 48-bit VAs, pgdp=e62fb000 > >> > [5.611326] [01e4] pgd=, > >> > p4d= > >> > [5.611931] Internal error: Oops: 9604 [#1] SMP > >> > [5.612363] Modules linked in: rockchip_thermal(E+) rk3399_dmc(E+) > >> > soundcore(E) dw_wdt(E) rockchip_dfi(E) nvmem_rockchip_efuse(E) > >> > pwm_rockchip(E) cfg80211(E+) rockchip_saradc(E) industrialio(E) > >> > rfkill(E) cpufreq_dt(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) > >> > crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) realtek(E) nvme(E) > >> > nvme_core(E) t10_pi(E) xhci_plat_hcd(E) xhci_hcd(E) rtc_rk808(E) > >> > rk808_regulator(E) clk_rk808(E) dwc3(E) udc_core(E) roles(E) ulpi(E) > >> > rk808(E) > >> fan53555(E) rockchipdrm(E) analogix_dp(E) dw_hdmi(E) cec(E) > >> dw_mipi_dsi(E) fixed(E) dwc3_of_simple(E) phy_rockchip_emmc(E) > >> gpio_keys(E) drm_kms_helper(E) phy_rockchip_inno_usb2(E) > >> ehci_platform(E) dwmac_rk(E) stmmac_platform(E) phy_rockchip_pcie(E) > >> ohci_platform(E) ohci_hcd(E) rockchip_io_domain(E) stmmac(E) > >> phy_rockchip_typec(E) ehci_hcd(E) sdhci_of_arasan(E) mdio_xpcs(E) > >> sdhci_pltfm(E) cqhci(E) drm(E) sdhci(E) phylink(E) of_mdio(E) > >> 
usbcore(E) i2c_rk3x(E) dw_mmc_rockchip(E) dw_mmc_pltfm(E) dw_mmc(E) > >> fixed_phy(E) libphy(E) > >> > [5.612454] pl330(E) > >> > [5.620255] CPU: 1 PID: 270 Comm: systemd-udevd Tainted: G > >> > E 5.7.0-13692-g83ae758d8b22 #1157 > >> > [5.621110] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS > >> > 2020.07-rc4-00023-g10d4cafe0f 06/10/2020 > >> > [5.621947] pstate: 4005 (nZcv daif -PAN -UAO BTYPE=--) > >> > [5.622446] pc : regmap_read+0x1c/0x80 > >> > [5.622787] lr : rk3399_dmcfreq_probe+0x6a4/0x8c0 [rk3399_dmc] > >> > [5.623299] sp : 8000126cb8a0 > >> > [5.623594] x29: 8000126cb8a0 x28: 8000126cbdb0 > >> > [5.624063] x27: f22dac40 x26: f6779800 > >> > [5.624533] x25: f6779810 x24: ffea > >> > [5.625002] x23: ffea x22: f65b74c8 > >> > [5.625471] x21: f783ca08 x20: f65b7480 > >> > [5.625941] x19: x18: 0001 > >> > [5.626410] x17: x16: > >> > [5.626878] x15: f22db138 x14: > >> > [5.627347] x13: 0018 x12: 80001106a8c7 > >> > [5.627817] x11: 0003 x10: 0101010101010101 > >> > [5.627861] systemd[1]: Found device SPCC M.2 PCIE SSD 3. 
> >> > [5.628286] x9 : 88d7c89c x8 : 7f7f7f7f7f7f7f7f > >> > [5.629238] x7 : fefefeff646c606d x6 : 1c0e0e0ee3e8e9f0 > >> > [5.629709] x5 : 706968630e0e0e1c x4 : 80808080 > >> > [5.630178] x3 : 937b1b5b1b434b80 x2 : 8000126cb944 > >> > [5.630648] x1 : 0308 x0 : > >> > [5.631119] Call trace: > >> > [5.631346] regmap_read+0x1c/0x80 > >> > [5.631654] rk3399_dmcfreq_probe+0x6a4/0x8c0 [rk3399_dmc] > >> > [5.632142] platform_drv_probe+0x5c/0xb0 > >> > [5.632500] really_probe+0xe4/0x448 > >> > [5.632819] driver_probe_device+0xfc/0x168 > >> > [5.633191] device_driver_attach+0x7c/0x88 > >> > [5.633567] __driver_attach+0xac/0x178 > >> > [5.633914] bus_for_each_dev+0x78/0xc8 > >> > [5.634261] driver_attach+0x2c/0x38 > >> > [5.634582] bus_add_driver+0x14c/0x230 > >> > [5.634925] driver_register+0x6c/0x128 > >> > [5.635269] __platform_driver_register+0x50/0x60 > >> > [5.635692] rk3399_dmcfreq_driver_init+0x2c/0x1000 [rk3399_dmc] > >> > [5.636226] do_one_initcall+0x50/0x230 > >> > [5.636569] do_init_module+0x60/0x248 > >> > [5.636902] load_module+0x21f8/0x28d8 > >> > [5.637237] __do_sys_finit_module+0xb0/0x118 > >> > [5.637627] __arm64_sys_finit_module+0x28/0x38 > >> > [5.638031] el0_svc_common.constprop.0+0x7c/0x1f8 > >> > [5.638456] do_el0_svc+0x2c/0x98 > >> > [5.638754] el0_svc+0x18/0x48 > >> > [5.639029]
[PATCH v2] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
Booting a recent kernel on a rk3399-based system (nanopc-t4), equipped with a recent u-boot and ATF results in the following: [5.607431] Unable to handle kernel NULL pointer dereference at virtual address 01e4 [5.608219] Mem abort info: [5.608469] ESR = 0x9604 [5.608749] EC = 0x25: DABT (current EL), IL = 32 bits [5.609223] SET = 0, FnV = 0 [5.609600] EA = 0, S1PTW = 0 [5.609891] Data abort info: [5.610149] ISV = 0, ISS = 0x0004 [5.610489] CM = 0, WnR = 0 [5.610757] user pgtable: 4k pages, 48-bit VAs, pgdp=e62fb000 [5.611326] [01e4] pgd=, p4d= [5.611931] Internal error: Oops: 9604 [#1] SMP [5.612363] Modules linked in: rockchip_thermal(E+) rk3399_dmc(E+) soundcore(E) dw_wdt(E) rockchip_dfi(E) nvmem_rockchip_efuse(E) pwm_rockchip(E) cfg80211(E+) rockchip_saradc(E) industrialio(E) rfkill(E) cpufreq_dt(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) realtek(E) nvme(E) nvme_core(E) t10_pi(E) xhci_plat_hcd(E) xhci_hcd(E) rtc_rk808(E) rk808_regulator(E) clk_rk808(E) dwc3(E) udc_core(E) roles(E) ulpi(E) rk808(E) fan53555(E) rockchipdrm(E) analogix_dp(E) dw_hdmi(E) cec(E) dw_mipi_dsi(E) fixed(E) dwc3_of_simple(E) phy_rockchip_emmc(E) gpio_keys(E) drm_kms_helper(E) phy_rockchip_inno_usb2(E) ehci_platform(E) dwmac_rk(E) stmmac_platform(E) phy_rockchip_pcie(E) ohci_platform(E) ohci_hcd(E) rockchip_io_domain(E) stmmac(E) phy_rockchip_typec(E) ehci_hcd(E) sdhci_of_arasan(E) mdio_xpcs(E) sdhci_pltfm(E) cqhci(E) drm(E) sdhci(E) phylink(E) of_mdio(E) usbcore(E) i2c_rk3x(E) dw_mmc_rockchip(E) dw_mmc_pltfm(E) dw_mmc(E) fixed_phy(E) libphy(E) [5.612454] pl330(E) [5.620255] CPU: 1 PID: 270 Comm: systemd-udevd Tainted: GE 5.7.0-13692-g83ae758d8b22 #1157 [5.621110] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2020.07-rc4-00023-g10d4cafe0f 06/10/2020 [5.621947] pstate: 4005 (nZcv daif -PAN -UAO BTYPE=--) [5.622446] pc : regmap_read+0x1c/0x80 [5.622787] lr : rk3399_dmcfreq_probe+0x6a4/0x8c0 [rk3399_dmc] [5.623299] sp : 8000126cb8a0 
[5.623594] x29: 8000126cb8a0 x28: 8000126cbdb0 [5.624063] x27: f22dac40 x26: f6779800 [5.624533] x25: f6779810 x24: ffea [5.625002] x23: ffea x22: f65b74c8 [5.625471] x21: f783ca08 x20: f65b7480 [5.625941] x19: x18: 0001 [5.626410] x17: x16: [5.626878] x15: f22db138 x14: [5.627347] x13: 0018 x12: 80001106a8c7 [5.627817] x11: 0003 x10: 0101010101010101 [5.627861] systemd[1]: Found device SPCC M.2 PCIE SSD 3. [5.628286] x9 : 88d7c89c x8 : 7f7f7f7f7f7f7f7f [5.629238] x7 : fefefeff646c606d x6 : 1c0e0e0ee3e8e9f0 [5.629709] x5 : 706968630e0e0e1c x4 : 80808080 [5.630178] x3 : 937b1b5b1b434b80 x2 : 8000126cb944 [5.630648] x1 : 0308 x0 : [5.631119] Call trace: [5.631346] regmap_read+0x1c/0x80 [5.631654] rk3399_dmcfreq_probe+0x6a4/0x8c0 [rk3399_dmc] [5.632142] platform_drv_probe+0x5c/0xb0 [5.632500] really_probe+0xe4/0x448 [5.632819] driver_probe_device+0xfc/0x168 [5.633191] device_driver_attach+0x7c/0x88 [5.633567] __driver_attach+0xac/0x178 [5.633914] bus_for_each_dev+0x78/0xc8 [5.634261] driver_attach+0x2c/0x38 [5.634582] bus_add_driver+0x14c/0x230 [5.634925] driver_register+0x6c/0x128 [5.635269] __platform_driver_register+0x50/0x60 [5.635692] rk3399_dmcfreq_driver_init+0x2c/0x1000 [rk3399_dmc] [5.636226] do_one_initcall+0x50/0x230 [5.636569] do_init_module+0x60/0x248 [5.636902] load_module+0x21f8/0x28d8 [5.637237] __do_sys_finit_module+0xb0/0x118 [5.637627] __arm64_sys_finit_module+0x28/0x38 [5.638031] el0_svc_common.constprop.0+0x7c/0x1f8 [5.638456] do_el0_svc+0x2c/0x98 [5.638754] el0_svc+0x18/0x48 [5.639029] el0_sync_handler+0x8c/0x2d4 [5.639378] el0_sync+0x158/0x180 [5.639680] Code: a9bd7bfd 910003fd a90153f3 aa0003f3 (b941e400) [5.640221] ---[ end trace 63675fe5d0021970 ]--- This turns out to be due to the rk3399-dmc driver looking for an *undocumented* property (rockchip,pmu), and happily using a NULL pointer when the property isn't there. 
Instead, make most of what was brought in with 9173c5ceb035 ("PM / devfreq: rk3399_dmc: Pass ODT and auto power down parameters to TF-A.") conditioned on finding this property in the device-tree, preventing the driver from exploding. Fixes: 9173c5ceb035 ("PM / devfreq: rk3399_dmc: Pass ODT and auto power down parameters to TF-A.") Signed-off-by: Marc Zyngier --- drivers/devfreq/rk3399_dmc.c | 42 1 file changed, 23 insertions(+), 19
Re: [PATCH] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
Hi Heiko, On 2020-06-22 14:54, Heiko Stübner wrote: Hi Marc, Am Montag, 22. Juni 2020, 15:31:55 CEST schrieb Marc Zyngier: On Sat, 13 Jun 2020 11:24:35 +0100 Marc Zyngier wrote: > Booting a recent kernel on a rk3399-based system (nanopc-t4), > equipped with a recent u-boot and ATF results in the following: > > [5.607431] Unable to handle kernel NULL pointer dereference at virtual address 01e4 > [5.608219] Mem abort info: > [5.608469] ESR = 0x9604 > [5.608749] EC = 0x25: DABT (current EL), IL = 32 bits > [5.609223] SET = 0, FnV = 0 > [5.609600] EA = 0, S1PTW = 0 > [5.609891] Data abort info: > [5.610149] ISV = 0, ISS = 0x0004 > [5.610489] CM = 0, WnR = 0 > [5.610757] user pgtable: 4k pages, 48-bit VAs, pgdp=e62fb000 > [5.611326] [01e4] pgd=, p4d= > [5.611931] Internal error: Oops: 9604 [#1] SMP > [5.612363] Modules linked in: rockchip_thermal(E+) rk3399_dmc(E+) soundcore(E) dw_wdt(E) rockchip_dfi(E) nvmem_rockchip_efuse(E) pwm_rockchip(E) cfg80211(E+) rockchip_saradc(E) industrialio(E) rfkill(E) cpufreq_dt(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) realtek(E) nvme(E) nvme_core(E) t10_pi(E) xhci_plat_hcd(E) xhci_hcd(E) rtc_rk808(E) rk808_regulator(E) clk_rk808(E) dwc3(E) udc_core(E) roles(E) ulpi(E) rk808(E) fan53555(E) rockchipdrm(E) analogix_dp(E) dw_hdmi(E) cec(E) dw_mipi_dsi(E) fixed(E) dwc3_of_simple(E) phy_rockchip_emmc(E) gpio_keys(E) drm_kms_helper(E) phy_rockchip_inno_usb2(E) ehci_platform(E) dwmac_rk(E) stmmac_platform(E) phy_rockchip_pcie(E) ohci_platform(E) ohci_hcd(E) rockchip_io_domain(E) stmmac(E) phy_rockchip_typec(E) ehci_hcd(E) sdhci_of_arasan(E) mdio_xpcs(E) sdhci_pltfm(E) cqhci(E) drm(E) sdhci(E) phylink(E) of_mdio(E) usbcore(E) i2c_rk3x(E) dw_mmc_rockchip(E) dw_mmc_pltfm(E) dw_mmc(E) fixed_phy(E) libphy(E) > [5.612454] pl330(E) > [5.620255] CPU: 1 PID: 270 Comm: systemd-udevd Tainted: GE 5.7.0-13692-g83ae758d8b22 #1157 > [5.621110] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 
2020.07-rc4-00023-g10d4cafe0f 06/10/2020 > [5.621947] pstate: 4005 (nZcv daif -PAN -UAO BTYPE=--) > [5.622446] pc : regmap_read+0x1c/0x80 > [5.622787] lr : rk3399_dmcfreq_probe+0x6a4/0x8c0 [rk3399_dmc] > [5.623299] sp : 8000126cb8a0 > [5.623594] x29: 8000126cb8a0 x28: 8000126cbdb0 > [5.624063] x27: f22dac40 x26: f6779800 > [5.624533] x25: f6779810 x24: ffea > [5.625002] x23: ffea x22: f65b74c8 > [5.625471] x21: f783ca08 x20: f65b7480 > [5.625941] x19: x18: 0001 > [5.626410] x17: x16: > [5.626878] x15: f22db138 x14: > [5.627347] x13: 0018 x12: 80001106a8c7 > [5.627817] x11: 0003 x10: 0101010101010101 > [5.627861] systemd[1]: Found device SPCC M.2 PCIE SSD 3. > [5.628286] x9 : 88d7c89c x8 : 7f7f7f7f7f7f7f7f > [5.629238] x7 : fefefeff646c606d x6 : 1c0e0e0ee3e8e9f0 > [5.629709] x5 : 706968630e0e0e1c x4 : 80808080 > [5.630178] x3 : 937b1b5b1b434b80 x2 : 8000126cb944 > [5.630648] x1 : 0308 x0 : > [5.631119] Call trace: > [5.631346] regmap_read+0x1c/0x80 > [5.631654] rk3399_dmcfreq_probe+0x6a4/0x8c0 [rk3399_dmc] > [5.632142] platform_drv_probe+0x5c/0xb0 > [5.632500] really_probe+0xe4/0x448 > [5.632819] driver_probe_device+0xfc/0x168 > [5.633191] device_driver_attach+0x7c/0x88 > [5.633567] __driver_attach+0xac/0x178 > [5.633914] bus_for_each_dev+0x78/0xc8 > [5.634261] driver_attach+0x2c/0x38 > [5.634582] bus_add_driver+0x14c/0x230 > [5.634925] driver_register+0x6c/0x128 > [5.635269] __platform_driver_register+0x50/0x60 > [5.635692] rk3399_dmcfreq_driver_init+0x2c/0x1000 [rk3399_dmc] > [5.636226] do_one_initcall+0x50/0x230 > [5.636569] do_init_module+0x60/0x248 > [5.636902] load_module+0x21f8/0x28d8 > [5.637237] __do_sys_finit_module+0xb0/0x118 > [5.637627] __arm64_sys_finit_module+0x28/0x38 > [5.638031] el0_svc_common.constprop.0+0x7c/0x1f8 > [5.638456] do_el0_svc+0x2c/0x98 > [5.638754] el0_svc+0x18/0x48 > [5.639029] el0_sync_handler+0x8c/0x2d4 > [5.639378] el0_sync+0x158/0x180 > [5.639680] Code: a9bd7bfd 910003fd a90153f3 aa0003f3 (b941e400) > [5.640221] ---[ end trace 
63675fe5d0021970 ]--- > > This turns out to be due to the rk3399-dmc driver looking for > an *undocumented* property (rockchip,pmu), and happily using > a NULL pointer when the property isn't there. > > The very existence of this driver in the kernel is highly doubtful > (I'd expect firmware to deal with this directly), but in the meantime > let's prevent it from
Re: [PATCH] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
Hi Marc, Am Montag, 22. Juni 2020, 15:31:55 CEST schrieb Marc Zyngier: > On Sat, 13 Jun 2020 11:24:35 +0100 > Marc Zyngier wrote: > > > Booting a recent kernel on a rk3399-based system (nanopc-t4), > > equipped with a recent u-boot and ATF results in the following: > > > > [5.607431] Unable to handle kernel NULL pointer dereference at virtual > > address 01e4 > > [5.608219] Mem abort info: > > [5.608469] ESR = 0x9604 > > [5.608749] EC = 0x25: DABT (current EL), IL = 32 bits > > [5.609223] SET = 0, FnV = 0 > > [5.609600] EA = 0, S1PTW = 0 > > [5.609891] Data abort info: > > [5.610149] ISV = 0, ISS = 0x0004 > > [5.610489] CM = 0, WnR = 0 > > [5.610757] user pgtable: 4k pages, 48-bit VAs, pgdp=e62fb000 > > [5.611326] [01e4] pgd=, p4d= > > [5.611931] Internal error: Oops: 9604 [#1] SMP > > [5.612363] Modules linked in: rockchip_thermal(E+) rk3399_dmc(E+) > > soundcore(E) dw_wdt(E) rockchip_dfi(E) nvmem_rockchip_efuse(E) > > pwm_rockchip(E) cfg80211(E+) rockchip_saradc(E) industrialio(E) rfkill(E) > > cpufreq_dt(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) > > crc16(E) mbcache(E) jbd2(E) realtek(E) nvme(E) nvme_core(E) t10_pi(E) > > xhci_plat_hcd(E) xhci_hcd(E) rtc_rk808(E) rk808_regulator(E) clk_rk808(E) > > dwc3(E) udc_core(E) roles(E) ulpi(E) rk808(E) > fan53555(E) rockchipdrm(E) analogix_dp(E) dw_hdmi(E) cec(E) > dw_mipi_dsi(E) fixed(E) dwc3_of_simple(E) phy_rockchip_emmc(E) > gpio_keys(E) drm_kms_helper(E) phy_rockchip_inno_usb2(E) > ehci_platform(E) dwmac_rk(E) stmmac_platform(E) phy_rockchip_pcie(E) > ohci_platform(E) ohci_hcd(E) rockchip_io_domain(E) stmmac(E) > phy_rockchip_typec(E) ehci_hcd(E) sdhci_of_arasan(E) mdio_xpcs(E) > sdhci_pltfm(E) cqhci(E) drm(E) sdhci(E) phylink(E) of_mdio(E) > usbcore(E) i2c_rk3x(E) dw_mmc_rockchip(E) dw_mmc_pltfm(E) dw_mmc(E) > fixed_phy(E) libphy(E) > > [5.612454] pl330(E) > > [5.620255] CPU: 1 PID: 270 Comm: systemd-udevd Tainted: GE > >5.7.0-13692-g83ae758d8b22 #1157 > > [5.621110] Hardware name: rockchip 
evb_rk3399/evb_rk3399, BIOS > > 2020.07-rc4-00023-g10d4cafe0f 06/10/2020 > > [5.621947] pstate: 4005 (nZcv daif -PAN -UAO BTYPE=--) > > [5.622446] pc : regmap_read+0x1c/0x80 > > [5.622787] lr : rk3399_dmcfreq_probe+0x6a4/0x8c0 [rk3399_dmc] > > [5.623299] sp : 8000126cb8a0 > > [5.623594] x29: 8000126cb8a0 x28: 8000126cbdb0 > > [5.624063] x27: f22dac40 x26: f6779800 > > [5.624533] x25: f6779810 x24: ffea > > [5.625002] x23: ffea x22: f65b74c8 > > [5.625471] x21: f783ca08 x20: f65b7480 > > [5.625941] x19: x18: 0001 > > [5.626410] x17: x16: > > [5.626878] x15: f22db138 x14: > > [5.627347] x13: 0018 x12: 80001106a8c7 > > [5.627817] x11: 0003 x10: 0101010101010101 > > [5.627861] systemd[1]: Found device SPCC M.2 PCIE SSD 3. > > [5.628286] x9 : 88d7c89c x8 : 7f7f7f7f7f7f7f7f > > [5.629238] x7 : fefefeff646c606d x6 : 1c0e0e0ee3e8e9f0 > > [5.629709] x5 : 706968630e0e0e1c x4 : 80808080 > > [5.630178] x3 : 937b1b5b1b434b80 x2 : 8000126cb944 > > [5.630648] x1 : 0308 x0 : > > [5.631119] Call trace: > > [5.631346] regmap_read+0x1c/0x80 > > [5.631654] rk3399_dmcfreq_probe+0x6a4/0x8c0 [rk3399_dmc] > > [5.632142] platform_drv_probe+0x5c/0xb0 > > [5.632500] really_probe+0xe4/0x448 > > [5.632819] driver_probe_device+0xfc/0x168 > > [5.633191] device_driver_attach+0x7c/0x88 > > [5.633567] __driver_attach+0xac/0x178 > > [5.633914] bus_for_each_dev+0x78/0xc8 > > [5.634261] driver_attach+0x2c/0x38 > > [5.634582] bus_add_driver+0x14c/0x230 > > [5.634925] driver_register+0x6c/0x128 > > [5.635269] __platform_driver_register+0x50/0x60 > > [5.635692] rk3399_dmcfreq_driver_init+0x2c/0x1000 [rk3399_dmc] > > [5.636226] do_one_initcall+0x50/0x230 > > [5.636569] do_init_module+0x60/0x248 > > [5.636902] load_module+0x21f8/0x28d8 > > [5.637237] __do_sys_finit_module+0xb0/0x118 > > [5.637627] __arm64_sys_finit_module+0x28/0x38 > > [5.638031] el0_svc_common.constprop.0+0x7c/0x1f8 > > [5.638456] do_el0_svc+0x2c/0x98 > > [5.638754] el0_svc+0x18/0x48 > > [5.639029] el0_sync_handler+0x8c/0x2d4 > > 
[5.639378] el0_sync+0x158/0x180 > > [5.639680] Code: a9bd7bfd 910003fd a90153f3 aa0003f3 (b941e400) > > [5.640221] ---[ end trace 63675fe5d0021970 ]--- > > > > This turns out to be due to the rk3399-dmc driver looking for > > an *undocumented* property (rockchip,pmu), and happily using > > a NULL pointer when the property isn't there. > > > > The very existence
Re: [PATCH] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
Hi Heiko, On Sat, 13 Jun 2020 11:24:35 +0100 Marc Zyngier wrote: > Booting a recent kernel on a rk3399-based system (nanopc-t4), > equipped with a recent u-boot and ATF results in the following: > > [5.607431] Unable to handle kernel NULL pointer dereference at virtual > address 01e4 > [5.608219] Mem abort info: > [5.608469] ESR = 0x9604 > [5.608749] EC = 0x25: DABT (current EL), IL = 32 bits > [5.609223] SET = 0, FnV = 0 > [5.609600] EA = 0, S1PTW = 0 > [5.609891] Data abort info: > [5.610149] ISV = 0, ISS = 0x0004 > [5.610489] CM = 0, WnR = 0 > [5.610757] user pgtable: 4k pages, 48-bit VAs, pgdp=e62fb000 > [5.611326] [01e4] pgd=, p4d= > [5.611931] Internal error: Oops: 9604 [#1] SMP > [5.612363] Modules linked in: rockchip_thermal(E+) rk3399_dmc(E+) > soundcore(E) dw_wdt(E) rockchip_dfi(E) nvmem_rockchip_efuse(E) > pwm_rockchip(E) cfg80211(E+) rockchip_saradc(E) industrialio(E) rfkill(E) > cpufreq_dt(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) > crc16(E) mbcache(E) jbd2(E) realtek(E) nvme(E) nvme_core(E) t10_pi(E) > xhci_plat_hcd(E) xhci_hcd(E) rtc_rk808(E) rk808_regulator(E) clk_rk808(E) > dwc3(E) udc_core(E) roles(E) ulpi(E) rk808(E) fan53555(E) rockchipdrm(E) analogix_dp(E) dw_hdmi(E) cec(E) dw_mipi_dsi(E) fixed(E) dwc3_of_simple(E) phy_rockchip_emmc(E) gpio_keys(E) drm_kms_helper(E) phy_rockchip_inno_usb2(E) ehci_platform(E) dwmac_rk(E) stmmac_platform(E) phy_rockchip_pcie(E) ohci_platform(E) ohci_hcd(E) rockchip_io_domain(E) stmmac(E) phy_rockchip_typec(E) ehci_hcd(E) sdhci_of_arasan(E) mdio_xpcs(E) sdhci_pltfm(E) cqhci(E) drm(E) sdhci(E) phylink(E) of_mdio(E) usbcore(E) i2c_rk3x(E) dw_mmc_rockchip(E) dw_mmc_pltfm(E) dw_mmc(E) fixed_phy(E) libphy(E) > [5.612454] pl330(E) > [5.620255] CPU: 1 PID: 270 Comm: systemd-udevd Tainted: GE > 5.7.0-13692-g83ae758d8b22 #1157 > [5.621110] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS > 2020.07-rc4-00023-g10d4cafe0f 06/10/2020 > [5.621947] pstate: 4005 (nZcv daif -PAN -UAO BTYPE=--) > 
[5.622446] pc : regmap_read+0x1c/0x80 > [5.622787] lr : rk3399_dmcfreq_probe+0x6a4/0x8c0 [rk3399_dmc] > [5.623299] sp : 8000126cb8a0 > [5.623594] x29: 8000126cb8a0 x28: 8000126cbdb0 > [5.624063] x27: f22dac40 x26: f6779800 > [5.624533] x25: f6779810 x24: ffea > [5.625002] x23: ffea x22: f65b74c8 > [5.625471] x21: f783ca08 x20: f65b7480 > [5.625941] x19: x18: 0001 > [5.626410] x17: x16: > [5.626878] x15: f22db138 x14: > [5.627347] x13: 0018 x12: 80001106a8c7 > [5.627817] x11: 0003 x10: 0101010101010101 > [5.627861] systemd[1]: Found device SPCC M.2 PCIE SSD 3. > [5.628286] x9 : 88d7c89c x8 : 7f7f7f7f7f7f7f7f > [5.629238] x7 : fefefeff646c606d x6 : 1c0e0e0ee3e8e9f0 > [5.629709] x5 : 706968630e0e0e1c x4 : 80808080 > [5.630178] x3 : 937b1b5b1b434b80 x2 : 8000126cb944 > [5.630648] x1 : 0308 x0 : > [5.631119] Call trace: > [5.631346] regmap_read+0x1c/0x80 > [5.631654] rk3399_dmcfreq_probe+0x6a4/0x8c0 [rk3399_dmc] > [5.632142] platform_drv_probe+0x5c/0xb0 > [5.632500] really_probe+0xe4/0x448 > [5.632819] driver_probe_device+0xfc/0x168 > [5.633191] device_driver_attach+0x7c/0x88 > [5.633567] __driver_attach+0xac/0x178 > [5.633914] bus_for_each_dev+0x78/0xc8 > [5.634261] driver_attach+0x2c/0x38 > [5.634582] bus_add_driver+0x14c/0x230 > [5.634925] driver_register+0x6c/0x128 > [5.635269] __platform_driver_register+0x50/0x60 > [5.635692] rk3399_dmcfreq_driver_init+0x2c/0x1000 [rk3399_dmc] > [5.636226] do_one_initcall+0x50/0x230 > [5.636569] do_init_module+0x60/0x248 > [5.636902] load_module+0x21f8/0x28d8 > [5.637237] __do_sys_finit_module+0xb0/0x118 > [5.637627] __arm64_sys_finit_module+0x28/0x38 > [5.638031] el0_svc_common.constprop.0+0x7c/0x1f8 > [5.638456] do_el0_svc+0x2c/0x98 > [5.638754] el0_svc+0x18/0x48 > [5.639029] el0_sync_handler+0x8c/0x2d4 > [5.639378] el0_sync+0x158/0x180 > [5.639680] Code: a9bd7bfd 910003fd a90153f3 aa0003f3 (b941e400) > [5.640221] ---[ end trace 63675fe5d0021970 ]--- > > This turns out to be due to the rk3399-dmc driver looking for > an 
*undocumented* property (rockchip,pmu), and happily using > a NULL pointer when the property isn't there. > > The very existence of this driver in the kernel is highly doubtful > (I'd expect firmware to deal with this directly), but in the meantime > let's prevent it from oopsing the kernel at probe time if this > property isn't present. > > Signed-off-by: Marc Zyngier
[PATCH] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
Booting a recent kernel on a rk3399-based system (nanopc-t4), equipped with a recent u-boot and ATF results in the following: [5.607431] Unable to handle kernel NULL pointer dereference at virtual address 01e4 [5.608219] Mem abort info: [5.608469] ESR = 0x9604 [5.608749] EC = 0x25: DABT (current EL), IL = 32 bits [5.609223] SET = 0, FnV = 0 [5.609600] EA = 0, S1PTW = 0 [5.609891] Data abort info: [5.610149] ISV = 0, ISS = 0x0004 [5.610489] CM = 0, WnR = 0 [5.610757] user pgtable: 4k pages, 48-bit VAs, pgdp=e62fb000 [5.611326] [01e4] pgd=, p4d= [5.611931] Internal error: Oops: 9604 [#1] SMP [5.612363] Modules linked in: rockchip_thermal(E+) rk3399_dmc(E+) soundcore(E) dw_wdt(E) rockchip_dfi(E) nvmem_rockchip_efuse(E) pwm_rockchip(E) cfg80211(E+) rockchip_saradc(E) industrialio(E) rfkill(E) cpufreq_dt(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) realtek(E) nvme(E) nvme_core(E) t10_pi(E) xhci_plat_hcd(E) xhci_hcd(E) rtc_rk808(E) rk808_regulator(E) clk_rk808(E) dwc3(E) udc_core(E) roles(E) ulpi(E) rk808(E) fan53555(E) rockchipdrm(E) analogix_dp(E) dw_hdmi(E) cec(E) dw_mipi_dsi(E) fixed(E) dwc3_of_simple(E) phy_rockchip_emmc(E) gpio_keys(E) drm_kms_helper(E) phy_rockchip_inno_usb2(E) ehci_platform(E) dwmac_rk(E) stmmac_platform(E) phy_rockchip_pcie(E) ohci_platform(E) ohci_hcd(E) rockchip_io_domain(E) stmmac(E) phy_rockchip_typec(E) ehci_hcd(E) sdhci_of_arasan(E) mdio_xpcs(E) sdhci_pltfm(E) cqhci(E) drm(E) sdhci(E) phylink(E) of_mdio(E) usbcore(E) i2c_rk3x(E) dw_mmc_rockchip(E) dw_mmc_pltfm(E) dw_mmc(E) fixed_phy(E) libphy(E) [5.612454] pl330(E) [5.620255] CPU: 1 PID: 270 Comm: systemd-udevd Tainted: GE 5.7.0-13692-g83ae758d8b22 #1157 [5.621110] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2020.07-rc4-00023-g10d4cafe0f 06/10/2020 [5.621947] pstate: 4005 (nZcv daif -PAN -UAO BTYPE=--) [5.622446] pc : regmap_read+0x1c/0x80 [5.622787] lr : rk3399_dmcfreq_probe+0x6a4/0x8c0 [rk3399_dmc] [5.623299] sp : 8000126cb8a0 
[5.623594] x29: 8000126cb8a0 x28: 8000126cbdb0 [5.624063] x27: f22dac40 x26: f6779800 [5.624533] x25: f6779810 x24: ffea [5.625002] x23: ffea x22: f65b74c8 [5.625471] x21: f783ca08 x20: f65b7480 [5.625941] x19: x18: 0001 [5.626410] x17: x16: [5.626878] x15: f22db138 x14: [5.627347] x13: 0018 x12: 80001106a8c7 [5.627817] x11: 0003 x10: 0101010101010101 [5.627861] systemd[1]: Found device SPCC M.2 PCIE SSD 3. [5.628286] x9 : 88d7c89c x8 : 7f7f7f7f7f7f7f7f [5.629238] x7 : fefefeff646c606d x6 : 1c0e0e0ee3e8e9f0 [5.629709] x5 : 706968630e0e0e1c x4 : 80808080 [5.630178] x3 : 937b1b5b1b434b80 x2 : 8000126cb944 [5.630648] x1 : 0308 x0 : [5.631119] Call trace: [5.631346] regmap_read+0x1c/0x80 [5.631654] rk3399_dmcfreq_probe+0x6a4/0x8c0 [rk3399_dmc] [5.632142] platform_drv_probe+0x5c/0xb0 [5.632500] really_probe+0xe4/0x448 [5.632819] driver_probe_device+0xfc/0x168 [5.633191] device_driver_attach+0x7c/0x88 [5.633567] __driver_attach+0xac/0x178 [5.633914] bus_for_each_dev+0x78/0xc8 [5.634261] driver_attach+0x2c/0x38 [5.634582] bus_add_driver+0x14c/0x230 [5.634925] driver_register+0x6c/0x128 [5.635269] __platform_driver_register+0x50/0x60 [5.635692] rk3399_dmcfreq_driver_init+0x2c/0x1000 [rk3399_dmc] [5.636226] do_one_initcall+0x50/0x230 [5.636569] do_init_module+0x60/0x248 [5.636902] load_module+0x21f8/0x28d8 [5.637237] __do_sys_finit_module+0xb0/0x118 [5.637627] __arm64_sys_finit_module+0x28/0x38 [5.638031] el0_svc_common.constprop.0+0x7c/0x1f8 [5.638456] do_el0_svc+0x2c/0x98 [5.638754] el0_svc+0x18/0x48 [5.639029] el0_sync_handler+0x8c/0x2d4 [5.639378] el0_sync+0x158/0x180 [5.639680] Code: a9bd7bfd 910003fd a90153f3 aa0003f3 (b941e400) [5.640221] ---[ end trace 63675fe5d0021970 ]--- This turns out to be due to the rk3399-dmc driver looking for an *undocumented* property (rockchip,pmu), and happily using a NULL pointer when the property isn't there. 
The very existence of this driver in the kernel is highly doubtful (I'd expect firmware to deal with this directly), but in the meantime let's prevent it from oopsing the kernel at probe time if this property isn't present. Signed-off-by: Marc Zyngier --- drivers/devfreq/rk3399_dmc.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/devfreq/rk3399_dmc.c b/drivers/devfreq/rk3399_dmc.c index 24f04f78285b..bee233a2e0ce 100644 ---
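The diff itself is cut off above, so what follows is not Marc's patch. It is a minimal userspace sketch of the guard the commit message describes: look up an optional property, and refuse to touch the register map when the lookup yields NULL. All names (model_regmap, model_prop, model_lookup, model_read_pmu) and the return codes are hypothetical stand-ins, not kernel APIs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a regmap and a device-tree property table. */
struct model_regmap {
	unsigned int regs[4];
};

struct model_prop {
	const char *name;
	struct model_regmap *map;
};

/* Return the regmap behind an optional property, or NULL when it is absent. */
static struct model_regmap *model_lookup(const struct model_prop *props,
					 size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(props[i].name, name) == 0)
			return props[i].map;
	return NULL;
}

/*
 * Read one register through the optional "rockchip,pmu" property.
 * Returns 0 on success, -1 when the property is missing (the case that
 * oopsed in the report), -2 when the index is out of range.
 */
static int model_read_pmu(const struct model_prop *props, size_t n,
			  unsigned int idx, unsigned int *val)
{
	struct model_regmap *map = model_lookup(props, n, "rockchip,pmu");

	if (!map)
		return -1;	/* property absent: bail out, do not dereference */
	if (idx >= 4)
		return -2;
	*val = map->regs[idx];
	return 0;
}
```

With the guard in place, a table that lacks the property produces an error return instead of a NULL dereference inside the read helper.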
[PATCH 5.6 102/194] drm/i915/gvt: Fix kernel oops for 3-level ppgtt guest
From: Zhenyu Wang [ Upstream commit 72a7a9925e2beea09b109dffb3384c9bf920d9da ] As i915 won't allocate extra PDP for current default PML4 table, so for 3-level ppgtt guest, we would hit kernel pointer access failure on extra PDP pointers. So this tries to bypass that now. It won't impact real shadow PPGTT setup, so guest context still works. This is verified on 4.15 guest kernel with i915.enable_ppgtt=1 to force on old aliasing ppgtt behavior. Fixes: 4f15665ccbba ("drm/i915: Add ppgtt to GVT GEM context") Reviewed-by: Xiong Zhang Signed-off-by: Zhenyu Wang Link: http://patchwork.freedesktop.org/patch/msgid/20200506095918.124913-1-zhen...@linux.intel.com Signed-off-by: Sasha Levin --- drivers/gpu/drm/i915/gvt/scheduler.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c index 685d1e04a5ff6..709ad181bc94a 100644 --- a/drivers/gpu/drm/i915/gvt/scheduler.c +++ b/drivers/gpu/drm/i915/gvt/scheduler.c @@ -375,7 +375,11 @@ static void set_context_ppgtt_from_shadow(struct intel_vgpu_workload *workload, for (i = 0; i < GVT_RING_CTX_NR_PDPS; i++) { struct i915_page_directory * const pd = i915_pd_entry(ppgtt->pd, i); - + /* skip now as current i915 ppgtt alloc won't allocate + top level pdp for non 4-level table, won't impact + shadow ppgtt. */ + if (!pd) + break; px_dma(pd) = mm->ppgtt_mm.shadow_pdps[i]; } } -- 2.20.1
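The loop change in the hunk above can be modelled in userspace as follows. This is only a sketch: model_pd, model_set_pdps and MODEL_NR_PDPS are hypothetical names, and the value 4 is an assumption chosen for illustration rather than taken from the driver headers.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_PDPS 4	/* assumption: stands in for GVT_RING_CTX_NR_PDPS */

/* Hypothetical stand-in for a page directory with a DMA address. */
struct model_pd {
	uint64_t dma;
};

/*
 * Copy shadow PDP addresses into the allocated page directories and stop
 * at the first entry that was never allocated, as the patched loop does.
 * Returns the number of entries written.
 */
static int model_set_pdps(struct model_pd *pds[MODEL_NR_PDPS],
			  const uint64_t shadow[MODEL_NR_PDPS])
{
	int i;

	for (i = 0; i < MODEL_NR_PDPS; i++) {
		if (!pds[i])
			break;	/* unallocated entry: writing through it would fault */
		pds[i]->dma = shadow[i];
	}
	return i;
}
```

Using break rather than continue matches the patch: once one entry is missing, the remaining ones are not visited either.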
[PATCH 5.4 078/147] drm/i915/gvt: Fix kernel oops for 3-level ppgtt guest
From: Zhenyu Wang [ Upstream commit 72a7a9925e2beea09b109dffb3384c9bf920d9da ] As i915 won't allocate extra PDP for current default PML4 table, so for 3-level ppgtt guest, we would hit kernel pointer access failure on extra PDP pointers. So this tries to bypass that now. It won't impact real shadow PPGTT setup, so guest context still works. This is verified on 4.15 guest kernel with i915.enable_ppgtt=1 to force on old aliasing ppgtt behavior. Fixes: 4f15665ccbba ("drm/i915: Add ppgtt to GVT GEM context") Reviewed-by: Xiong Zhang Signed-off-by: Zhenyu Wang Link: http://patchwork.freedesktop.org/patch/msgid/20200506095918.124913-1-zhen...@linux.intel.com Signed-off-by: Sasha Levin --- drivers/gpu/drm/i915/gvt/scheduler.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c index 6c79d16b381ea..058dcd5416440 100644 --- a/drivers/gpu/drm/i915/gvt/scheduler.c +++ b/drivers/gpu/drm/i915/gvt/scheduler.c @@ -374,7 +374,11 @@ static void set_context_ppgtt_from_shadow(struct intel_vgpu_workload *workload, for (i = 0; i < GVT_RING_CTX_NR_PDPS; i++) { struct i915_page_directory * const pd = i915_pd_entry(ppgtt->pd, i); - + /* skip now as current i915 ppgtt alloc won't allocate + top level pdp for non 4-level table, won't impact + shadow ppgtt. */ + if (!pd) + break; px_dma(pd) = mm->ppgtt_mm.shadow_pdps[i]; } } -- 2.20.1
[PATCH 4.19 059/114] drm/amdgpu: Fix KFD-related kernel oops on Hawaii
From: Felix Kuehling [ Upstream commit dcafbd50f2e4d5cc964aae409fb5691b743fba23 ] Hawaii needs to flush caches explicitly, submitting an IB in a user VMID from kernel mode. There is no s_fence in this case. Fixes: eb3961a57424 ("drm/amdgpu: remove fence context from the job") Signed-off-by: Felix Kuehling Reviewed-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c index 51b5e977ca885..f4e9d1b10e3ed 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c @@ -139,7 +139,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs, /* ring tests don't use a job */ if (job) { vm = job->vm; - fence_ctx = job->base.s_fence->scheduled.context; + fence_ctx = job->base.s_fence ? + job->base.s_fence->scheduled.context : 0; } else { vm = NULL; fence_ctx = 0; -- 2.20.1
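As a rough userspace illustration of the fallback in the hunk above: the fence context is read only when a scheduler fence exists, and defaults to 0 otherwise. The model_* types below are hypothetical and only mimic the shape of the fields named in the diff.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring job->base.s_fence->scheduled.context. */
struct model_fence {
	uint64_t context;
};

struct model_sched_fence {
	struct model_fence scheduled;
};

struct model_job {
	struct model_sched_fence *s_fence;	/* may be NULL, as on the Hawaii path */
};

/* Return the fence context for a job, or 0 when there is no job or no s_fence. */
static uint64_t model_fence_ctx(const struct model_job *job)
{
	if (!job)
		return 0;	/* mirrors the "ring tests don't use a job" branch */
	return job->s_fence ? job->s_fence->scheduled.context : 0;
}
```

The 0 default for a missing s_fence is the same value the existing else branch already uses when there is no job at all.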
[PATCH 5.3 104/148] drm/amdgpu: Fix KFD-related kernel oops on Hawaii
From: Felix Kuehling [ Upstream commit dcafbd50f2e4d5cc964aae409fb5691b743fba23 ] Hawaii needs to flush caches explicitly, submitting an IB in a user VMID from kernel mode. There is no s_fence in this case. Fixes: eb3961a57424 ("drm/amdgpu: remove fence context from the job") Signed-off-by: Felix Kuehling Reviewed-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c index 7850084a05e3a..60655834d6498 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c @@ -143,7 +143,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs, /* ring tests don't use a job */ if (job) { vm = job->vm; - fence_ctx = job->base.s_fence->scheduled.context; + fence_ctx = job->base.s_fence ? + job->base.s_fence->scheduled.context : 0; } else { vm = NULL; fence_ctx = 0; -- 2.20.1
Applied "spi: stm32-qspi: Fix kernel oops when unbinding driver" to the spi tree
The patch spi: stm32-qspi: Fix kernel oops when unbinding driver has been applied to the spi tree at https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-5.4 All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark From 3c0af1dd2fe78adc02fe21f6cfe7d6cb8602573e Mon Sep 17 00:00:00 2001 From: Patrice Chotard Date: Fri, 4 Oct 2019 14:36:06 +0200 Subject: [PATCH] spi: stm32-qspi: Fix kernel oops when unbinding driver spi_master_put() must only be called in .probe() in case of error. As devm_spi_register_master() is used during probe, spi_master_put() mustn't be called in .remove() callback.
It fixes the following kernel WARNING/Oops when executing echo "58003000.spi" > /sys/bus/platform/drivers/stm32-qspi/unbind : [ cut here ] WARNING: CPU: 1 PID: 496 at fs/kernfs/dir.c:1504 kernfs_remove_by_name_ns+0x9c/0xa4 kernfs: can not remove 'uevent', no directory Modules linked in: CPU: 1 PID: 496 Comm: sh Not tainted 5.3.0-rc1-00219-ga0e07bb51a37 #62 Hardware name: STM32 (Device Tree Support) [] (unwind_backtrace) from [] (show_stack+0x10/0x14) [] (show_stack) from [] (dump_stack+0xb4/0xc8) [] (dump_stack) from [] (__warn.part.3+0xbc/0xd8) [] (__warn.part.3) from [] (warn_slowpath_fmt+0x68/0x8c) [] (warn_slowpath_fmt) from [] (kernfs_remove_by_name_ns+0x9c/0xa4) [] (kernfs_remove_by_name_ns) from [] (device_del+0x128/0x358) [] (device_del) from [] (device_unregister+0x24/0x64) [] (device_unregister) from [] (spi_unregister_controller+0x88/0xe8) [] (spi_unregister_controller) from [] (release_nodes+0x1bc/0x200) [] (release_nodes) from [] (device_release_driver_internal+0xec/0x1ac) [] (device_release_driver_internal) from [] (unbind_store+0x60/0xd4) [] (unbind_store) from [] (kernfs_fop_write+0xe8/0x1c4) [] (kernfs_fop_write) from [] (__vfs_write+0x2c/0x1c0) [] (__vfs_write) from [] (vfs_write+0xa4/0x184) [] (vfs_write) from [] (ksys_write+0x58/0xd0) [] (ksys_write) from [] (ret_fast_syscall+0x0/0x54) Exception stack(0xdd289fa8 to 0xdd289ff0) 9fa0: 006c 000e20e8 0001 000e20e8 000d 9fc0: 006c 000e20e8 b6f87da0 0004 000d 000d 9fe0: 0004 bee639b0 b6f2286b b6eaf6c6 ---[ end trace 1b15df8a02d76aef ]--- [ cut here ] WARNING: CPU: 1 PID: 496 at fs/kernfs/dir.c:1504 kernfs_remove_by_name_ns+0x9c/0xa4 kernfs: can not remove 'online', no directory Modules linked in: CPU: 1 PID: 496 Comm: sh Tainted: GW 5.3.0-rc1-00219-ga0e07bb51a37 #62 Hardware name: STM32 (Device Tree Support) [] (unwind_backtrace) from [] (show_stack+0x10/0x14) [] (show_stack) from [] (dump_stack+0xb4/0xc8) [] (dump_stack) from [] (__warn.part.3+0xbc/0xd8) [] (__warn.part.3) from [] 
(warn_slowpath_fmt+0x68/0x8c) [] (warn_slowpath_fmt) from [] (kernfs_remove_by_name_ns+0x9c/0xa4) [] (kernfs_remove_by_name_ns) from [] (device_remove_attrs+0x20/0x5c) [] (device_remove_attrs) from [] (device_del+0x134/0x358) [] (device_del) from [] (device_unregister+0x24/0x64) [] (device_unregister) from [] (spi_unregister_controller+0x88/0xe8) [] (spi_unregister_controller) from [] (release_nodes+0x1bc/0x200) [] (release_nodes) from [] (device_release_driver_internal+0xec/0x1ac) [] (device_release_driver_internal) from [] (unbind_store+0x60/0xd4) [] (unbind_store) from [] (kernfs_fop_write+0xe8/0x1c4) [] (kernfs_fop_write) from [] (__vfs_write+0x2c/0x1c0) [] (__vfs_write) from [] (vfs_write+0xa4/0x184) [] (vfs_write) from [] (ksys_write+0x58/0xd0) [] (ksys_write) from [] (ret_fast_syscall+0x0/0x54) Exception stack(0xdd289fa8 to 0xdd289ff0) 9fa0: 006c 000e20e8 0001 000e20e8 000d 9fc0: 006c 000e20e8 b6f87da0 0004 000d 000d 9fe0: 0004 bee639b0 b6f2286b b6eaf6c6 ---[ end trace 1b15df8a02d76af0 ]--- 8<--- cut here --- Unable to handle kernel NULL pointer dereference at virtual address 0050 pgd = e612f14d [0050] *pgd=ff1f5835 Internal error: Oops: 17 [#1] SMP ARM Modules linked in: CPU: 1 PID: 496 Comm: sh Tainted: GW 5.3.0-rc1-00219-ga0e07bb51a37 #62 Hardware name: STM32 (Device Tree Support) PC is at kernfs_find_ns+0x8/0xfc LR is at kernfs_find_and_get_ns+0x30/0x48 pc : []lr : []
spi: stm32-qspi: Fix kernel oops when unbinding driver
From: Patrice Chotard spi_master_put() must only be called in .probe() in case of error. As devm_spi_register_master() is used during probe, spi_master_put() mustn't be called in .remove() callback. It fixes the following kernel WARNING/Oops when executing echo "58003000.spi" > /sys/bus/platform/drivers/stm32-qspi/unbind : [ cut here ] WARNING: CPU: 1 PID: 496 at fs/kernfs/dir.c:1504 kernfs_remove_by_name_ns+0x9c/0xa4 kernfs: can not remove 'uevent', no directory Modules linked in: CPU: 1 PID: 496 Comm: sh Not tainted 5.3.0-rc1-00219-ga0e07bb51a37 #62 Hardware name: STM32 (Device Tree Support) [] (unwind_backtrace) from [] (show_stack+0x10/0x14) [] (show_stack) from [] (dump_stack+0xb4/0xc8) [] (dump_stack) from [] (__warn.part.3+0xbc/0xd8) [] (__warn.part.3) from [] (warn_slowpath_fmt+0x68/0x8c) [] (warn_slowpath_fmt) from [] (kernfs_remove_by_name_ns+0x9c/0xa4) [] (kernfs_remove_by_name_ns) from [] (device_del+0x128/0x358) [] (device_del) from [] (device_unregister+0x24/0x64) [] (device_unregister) from [] (spi_unregister_controller+0x88/0xe8) [] (spi_unregister_controller) from [] (release_nodes+0x1bc/0x200) [] (release_nodes) from [] (device_release_driver_internal+0xec/0x1ac) [] (device_release_driver_internal) from [] (unbind_store+0x60/0xd4) [] (unbind_store) from [] (kernfs_fop_write+0xe8/0x1c4) [] (kernfs_fop_write) from [] (__vfs_write+0x2c/0x1c0) [] (__vfs_write) from [] (vfs_write+0xa4/0x184) [] (vfs_write) from [] (ksys_write+0x58/0xd0) [] (ksys_write) from [] (ret_fast_syscall+0x0/0x54) Exception stack(0xdd289fa8 to 0xdd289ff0) 9fa0: 006c 000e20e8 0001 000e20e8 000d 9fc0: 006c 000e20e8 b6f87da0 0004 000d 000d 9fe0: 0004 bee639b0 b6f2286b b6eaf6c6 ---[ end trace 1b15df8a02d76aef ]--- [ cut here ] WARNING: CPU: 1 PID: 496 at fs/kernfs/dir.c:1504 kernfs_remove_by_name_ns+0x9c/0xa4 kernfs: can not remove 'online', no directory Modules linked in: CPU: 1 PID: 496 Comm: sh Tainted: GW 5.3.0-rc1-00219-ga0e07bb51a37 #62 Hardware name: STM32 (Device Tree 
Support) [] (unwind_backtrace) from [] (show_stack+0x10/0x14) [] (show_stack) from [] (dump_stack+0xb4/0xc8) [] (dump_stack) from [] (__warn.part.3+0xbc/0xd8) [] (__warn.part.3) from [] (warn_slowpath_fmt+0x68/0x8c) [] (warn_slowpath_fmt) from [] (kernfs_remove_by_name_ns+0x9c/0xa4) [] (kernfs_remove_by_name_ns) from [] (device_remove_attrs+0x20/0x5c) [] (device_remove_attrs) from [] (device_del+0x134/0x358) [] (device_del) from [] (device_unregister+0x24/0x64) [] (device_unregister) from [] (spi_unregister_controller+0x88/0xe8) [] (spi_unregister_controller) from [] (release_nodes+0x1bc/0x200) [] (release_nodes) from [] (device_release_driver_internal+0xec/0x1ac) [] (device_release_driver_internal) from [] (unbind_store+0x60/0xd4) [] (unbind_store) from [] (kernfs_fop_write+0xe8/0x1c4) [] (kernfs_fop_write) from [] (__vfs_write+0x2c/0x1c0) [] (__vfs_write) from [] (vfs_write+0xa4/0x184) [] (vfs_write) from [] (ksys_write+0x58/0xd0) [] (ksys_write) from [] (ret_fast_syscall+0x0/0x54) Exception stack(0xdd289fa8 to 0xdd289ff0) 9fa0: 006c 000e20e8 0001 000e20e8 000d 9fc0: 006c 000e20e8 b6f87da0 0004 000d 000d 9fe0: 0004 bee639b0 b6f2286b b6eaf6c6 ---[ end trace 1b15df8a02d76af0 ]--- 8<--- cut here --- Unable to handle kernel NULL pointer dereference at virtual address 0050 pgd = e612f14d [0050] *pgd=ff1f5835 Internal error: Oops: 17 [#1] SMP ARM Modules linked in: CPU: 1 PID: 496 Comm: sh Tainted: GW 5.3.0-rc1-00219-ga0e07bb51a37 #62 Hardware name: STM32 (Device Tree Support) PC is at kernfs_find_ns+0x8/0xfc LR is at kernfs_find_and_get_ns+0x30/0x48 pc : []lr : []psr: 40010013 sp : dd289dac ip : fp : r10: r9 : def6ec58 r8 : dd289e54 r7 : r6 : c0abb234 r5 : r4 : c0d26a30 r3 : ddab5080 r2 : r1 : c0abb234 r0 : Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: dd11c06a DAC: 0051 Process sh (pid: 496, stack limit = 0xe13a592d) Stack: (0xdd289dac to 0xdd28a000) 9da0:c0d26a30 c0abb234 c02e4ac8 9dc0: c0976b44 def6ec00 dea53810 dd289e54 
c02e864c c0a61a48 c0a4a5ec 9de0: c0d630a8 def6ec00 c0d04c48 c02e86e0 def6ec00 de909338 c0d04c48 c05833b0 9e00: c0638144 dd289e54 def59900 475b3ee5 def6ec00 9e20: def6ec00 def59b80 dd289e54 def59900 c05835f8 def6ec00 c0638dac 9e40: 000a dea53810 c0d04c48 c058c580 dea53810 def59500 def59b80 475b3ee5 9e60: ddc63e00 dea53810 dea3fe10 c0d63a0c dea53810 ddc63e00 dd289f78 dd240d10 9e80: c0588a44 c0d59a20 000d c0d63a0c c0586840 000d dd240d00 9ea0: ddc63e00 c02e64e8 c0d04c48 dd9bbcc0 9ec0: c02e6400 dd289f78
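The lifetime rule stated at the top of this commit message (no spi_master_put() in .remove() when registration is devres-managed) can be pictured with a deliberately simplified reference-count model. It is a sketch under assumptions: model_ctrl and the model_* functions are hypothetical, and the real device-core and devres paths do considerably more than decrement a counter.

```c
#include <stdbool.h>

/* Hypothetical refcounted controller; not the kernel's struct spi_controller. */
struct model_ctrl {
	int refs;		/* references held on the controller */
	bool registered;	/* true while the devres-managed registration is live */
};

/* Model of allocation plus devm-style registration during probe(). */
static void model_probe(struct model_ctrl *c)
{
	c->refs = 1;
	c->registered = true;
}

/* Model of the reference drop that spi_master_put() stands for. */
static void model_put(struct model_ctrl *c)
{
	c->refs--;
}

/*
 * Model of unbind: remove() runs first, then the devres release
 * unregisters the controller and drops the reference it owns.
 * Returns the final count; a negative value marks a drop performed
 * on a controller that had already been released.
 */
static int model_unbind(struct model_ctrl *c, bool remove_calls_put)
{
	if (remove_calls_put)
		model_put(c);	/* the extra put that the patch removes */
	if (c->registered) {
		c->registered = false;
		model_put(c);	/* devres-managed release */
	}
	return c->refs;
}
```

In this model the count ends at 0 when remove() leaves the controller alone and at -1 when it also drops a reference, which is the situation the traces above come from: the managed unregister then works on an object that is already gone.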
[PATCH 5.2 137/313] PM / devfreq: Fix kernel oops on governor module load
From: Ezequiel Garcia [ Upstream commit 7544fd7f384591038646d3cd9efb311ab4509e24 ] A bit unexpectedly (but still documented), request_module may return a positive value, in case of a modprobe error. This is currently causing issues in the devfreq framework. When a request_module exits with a positive value, we currently return that via ERR_PTR. However, because the value is positive, it's not a ERR_VALUE proper, and is therefore treated as a valid struct devfreq_governor pointer, leading to a kernel oops. Fix this by returning -EINVAL if request_module returns a positive value. Fixes: b53b0128052ff ("PM / devfreq: Fix static checker warning in try_then_request_governor") Signed-off-by: Ezequiel Garcia Reviewed-by: Chanwoo Choi Signed-off-by: MyungJoo Ham Signed-off-by: Sasha Levin --- drivers/devfreq/devfreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index ab22bf8a12d69..a0e19802149fc 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -254,7 +254,7 @@ static struct devfreq_governor *try_then_request_governor(const char *name) /* Restore previous state before return */ mutex_lock(&devfreq_list_lock); if (err) - return ERR_PTR(err); + return (err < 0) ? ERR_PTR(err) : ERR_PTR(-EINVAL); governor = find_devfreq_governor(name); } -- 2.20.1
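To see why a positive return value slips past the error check, here is a small userspace model of the error-pointer encoding. It follows the kernel's convention that only the top 4095 values of the address range count as errors; the model_* helpers are hypothetical reimplementations written for this note, not the kernel's ERR_PTR()/IS_ERR().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095	/* same bound the kernel uses for error pointers */

/* Encode an error code as a pointer value. */
static inline void *model_err_ptr(intptr_t error)
{
	return (void *)error;
}

/* A pointer counts as an error only if it falls in the last 4095 addresses. */
static inline bool model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/*
 * The fix from the patch: pass negative codes through unchanged and
 * map anything positive to -EINVAL so the caller still sees an error.
 */
static void *model_governor_err(int err)
{
	return (err < 0) ? model_err_ptr(err) : model_err_ptr(-EINVAL);
}
```

A value such as 256 encodes to the small address 0x100, which model_is_err() does not flag, so a caller would go on to dereference it. After the mapping, the same input yields an encoded -EINVAL that the check catches.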
[PATCH 5.3 152/344] PM / devfreq: Fix kernel oops on governor module load
From: Ezequiel Garcia [ Upstream commit 7544fd7f384591038646d3cd9efb311ab4509e24 ] A bit unexpectedly (but still documented), request_module may return a positive value, in case of a modprobe error. This is currently causing issues in the devfreq framework. When a request_module exits with a positive value, we currently return that via ERR_PTR. However, because the value is positive, it's not a ERR_VALUE proper, and is therefore treated as a valid struct devfreq_governor pointer, leading to a kernel oops. Fix this by returning -EINVAL if request_module returns a positive value. Fixes: b53b0128052ff ("PM / devfreq: Fix static checker warning in try_then_request_governor") Signed-off-by: Ezequiel Garcia Reviewed-by: Chanwoo Choi Signed-off-by: MyungJoo Ham Signed-off-by: Sasha Levin --- drivers/devfreq/devfreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index ab22bf8a12d69..a0e19802149fc 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -254,7 +254,7 @@ static struct devfreq_governor *try_then_request_governor(const char *name) /* Restore previous state before return */ mutex_lock(&devfreq_list_lock); if (err) - return ERR_PTR(err); + return (err < 0) ? ERR_PTR(err) : ERR_PTR(-EINVAL); governor = find_devfreq_governor(name); } -- 2.20.1
[PATCH AUTOSEL 5.2 18/63] drm/amdgpu: Fix KFD-related kernel oops on Hawaii
From: Felix Kuehling [ Upstream commit dcafbd50f2e4d5cc964aae409fb5691b743fba23 ] Hawaii needs to flush caches explicitly, submitting an IB in a user VMID from kernel mode. There is no s_fence in this case. Fixes: eb3961a57424 ("drm/amdgpu: remove fence context from the job") Signed-off-by: Felix Kuehling Reviewed-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c index fe393a46f8811..5eed2423dbb5e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c @@ -141,7 +141,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs, /* ring tests don't use a job */ if (job) { vm = job->vm; - fence_ctx = job->base.s_fence->scheduled.context; + fence_ctx = job->base.s_fence ? + job->base.s_fence->scheduled.context : 0; } else { vm = NULL; fence_ctx = 0; -- 2.20.1
[PATCH AUTOSEL 4.19 13/43] drm/amdgpu: Fix KFD-related kernel oops on Hawaii
From: Felix Kuehling [ Upstream commit dcafbd50f2e4d5cc964aae409fb5691b743fba23 ] Hawaii needs to flush caches explicitly, submitting an IB in a user VMID from kernel mode. There is no s_fence in this case. Fixes: eb3961a57424 ("drm/amdgpu: remove fence context from the job") Signed-off-by: Felix Kuehling Reviewed-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c index 51b5e977ca885..f4e9d1b10e3ed 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c @@ -139,7 +139,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs, /* ring tests don't use a job */ if (job) { vm = job->vm; - fence_ctx = job->base.s_fence->scheduled.context; + fence_ctx = job->base.s_fence ? + job->base.s_fence->scheduled.context : 0; } else { vm = NULL; fence_ctx = 0; -- 2.20.1
[PATCH AUTOSEL 5.3 119/203] PM / devfreq: Fix kernel oops on governor module load
From: Ezequiel Garcia [ Upstream commit 7544fd7f384591038646d3cd9efb311ab4509e24 ] A bit unexpectedly (but still documented), request_module may return a positive value, in case of a modprobe error. This is currently causing issues in the devfreq framework. When a request_module exits with a positive value, we currently return that via ERR_PTR. However, because the value is positive, it's not a ERR_VALUE proper, and is therefore treated as a valid struct devfreq_governor pointer, leading to a kernel oops. Fix this by returning -EINVAL if request_module returns a positive value. Fixes: b53b0128052ff ("PM / devfreq: Fix static checker warning in try_then_request_governor") Signed-off-by: Ezequiel Garcia Reviewed-by: Chanwoo Choi Signed-off-by: MyungJoo Ham Signed-off-by: Sasha Levin --- drivers/devfreq/devfreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index ab22bf8a12d69..a0e19802149fc 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -254,7 +254,7 @@ static struct devfreq_governor *try_then_request_governor(const char *name) /* Restore previous state before return */ mutex_lock(&devfreq_list_lock); if (err) - return ERR_PTR(err); + return (err < 0) ? ERR_PTR(err) : ERR_PTR(-EINVAL); governor = find_devfreq_governor(name); } -- 2.20.1
[PATCH AUTOSEL 5.2 108/185] PM / devfreq: Fix kernel oops on governor module load
From: Ezequiel Garcia [ Upstream commit 7544fd7f384591038646d3cd9efb311ab4509e24 ] A bit unexpectedly (but still documented), request_module may return a positive value, in case of a modprobe error. This is currently causing issues in the devfreq framework. When a request_module exits with a positive value, we currently return that via ERR_PTR. However, because the value is positive, it's not a ERR_VALUE proper, and is therefore treated as a valid struct devfreq_governor pointer, leading to a kernel oops. Fix this by returning -EINVAL if request_module returns a positive value. Fixes: b53b0128052ff ("PM / devfreq: Fix static checker warning in try_then_request_governor") Signed-off-by: Ezequiel Garcia Reviewed-by: Chanwoo Choi Signed-off-by: MyungJoo Ham Signed-off-by: Sasha Levin --- drivers/devfreq/devfreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index ab22bf8a12d69..a0e19802149fc 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -254,7 +254,7 @@ static struct devfreq_governor *try_then_request_governor(const char *name) /* Restore previous state before return */ mutex_lock(&devfreq_list_lock); if (err) - return ERR_PTR(err); + return (err < 0) ? ERR_PTR(err) : ERR_PTR(-EINVAL); governor = find_devfreq_governor(name); } -- 2.20.1
[PATCH 5.2 081/162] SMB3: Kernel oops mounting a encryptData share with CONFIG_DEBUG_VIRTUAL
[ Upstream commit ee9d66182392695535cc9fccfcb40c16f72de2a9 ] Fix kernel oops when mounting an encryptData CIFS share with CONFIG_DEBUG_VIRTUAL Signed-off-by: Sebastien Tisserant Reviewed-by: Pavel Shilovsky Signed-off-by: Steve French Signed-off-by: Sasha Levin --- fs/cifs/smb2ops.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c index ae10d6e297c3a..42de31d206169 100644 --- a/fs/cifs/smb2ops.c +++ b/fs/cifs/smb2ops.c @@ -3439,7 +3439,15 @@ fill_transform_hdr(struct smb2_transform_hdr *tr_hdr, unsigned int orig_len, static inline void smb2_sg_set_buf(struct scatterlist *sg, const void *buf, unsigned int buflen) { - sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf)); + void *addr; + /* + * VMAP_STACK (at least) puts stack into the vmalloc address space + */ + if (is_vmalloc_addr(buf)) + addr = vmalloc_to_page(buf); + else + addr = virt_to_page(buf); + sg_set_page(sg, addr, buflen, offset_in_page(buf)); } /* Assumes the first rqst has a transform header as the first iov. -- 2.20.1
[PATCH 4.19 40/98] SMB3: Kernel oops mounting a encryptData share with CONFIG_DEBUG_VIRTUAL
[ Upstream commit ee9d66182392695535cc9fccfcb40c16f72de2a9 ]

Fix kernel oops when mounting a encryptData CIFS share with
CONFIG_DEBUG_VIRTUAL

Signed-off-by: Sebastien Tisserant
Reviewed-by: Pavel Shilovsky
Signed-off-by: Steve French
Signed-off-by: Sasha Levin
---
 fs/cifs/smb2ops.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 97fdbec54db97..cc9e846a38658 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -2545,7 +2545,15 @@ fill_transform_hdr(struct smb2_transform_hdr *tr_hdr, unsigned int orig_len,
 static inline void smb2_sg_set_buf(struct scatterlist *sg, const void *buf,
 				   unsigned int buflen)
 {
-	sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
+	void *addr;
+	/*
+	 * VMAP_STACK (at least) puts stack into the vmalloc address space
+	 */
+	if (is_vmalloc_addr(buf))
+		addr = vmalloc_to_page(buf);
+	else
+		addr = virt_to_page(buf);
+	sg_set_page(sg, addr, buflen, offset_in_page(buf));
 }

 /* Assumes the first rqst has a transform header as the first iov.
--
2.20.1
[PATCH 4.14 23/62] SMB3: Kernel oops mounting a encryptData share with CONFIG_DEBUG_VIRTUAL
[ Upstream commit ee9d66182392695535cc9fccfcb40c16f72de2a9 ]

Fix kernel oops when mounting a encryptData CIFS share with
CONFIG_DEBUG_VIRTUAL

Signed-off-by: Sebastien Tisserant
Reviewed-by: Pavel Shilovsky
Signed-off-by: Steve French
Signed-off-by: Sasha Levin
---
 fs/cifs/smb2ops.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 23326b0cd5628..58a502e622aa4 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -2168,7 +2168,15 @@ fill_transform_hdr(struct smb2_transform_hdr *tr_hdr, struct smb_rqst *old_rq)
 static inline void smb2_sg_set_buf(struct scatterlist *sg, const void *buf,
 				   unsigned int buflen)
 {
-	sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
+	void *addr;
+	/*
+	 * VMAP_STACK (at least) puts stack into the vmalloc address space
+	 */
+	if (is_vmalloc_addr(buf))
+		addr = vmalloc_to_page(buf);
+	else
+		addr = virt_to_page(buf);
+	sg_set_page(sg, addr, buflen, offset_in_page(buf));
 }

 static struct scatterlist *
--
2.20.1
[PATCH AUTOSEL 5.2 091/123] SMB3: Kernel oops mounting a encryptData share with CONFIG_DEBUG_VIRTUAL
From: Sebastien Tisserant

[ Upstream commit ee9d66182392695535cc9fccfcb40c16f72de2a9 ]

Fix kernel oops when mounting a encryptData CIFS share with
CONFIG_DEBUG_VIRTUAL

Signed-off-by: Sebastien Tisserant
Reviewed-by: Pavel Shilovsky
Signed-off-by: Steve French
Signed-off-by: Sasha Levin
---
 fs/cifs/smb2ops.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index ae10d6e297c3a..42de31d206169 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -3439,7 +3439,15 @@ fill_transform_hdr(struct smb2_transform_hdr *tr_hdr, unsigned int orig_len,
 static inline void smb2_sg_set_buf(struct scatterlist *sg, const void *buf,
 				   unsigned int buflen)
 {
-	sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
+	void *addr;
+	/*
+	 * VMAP_STACK (at least) puts stack into the vmalloc address space
+	 */
+	if (is_vmalloc_addr(buf))
+		addr = vmalloc_to_page(buf);
+	else
+		addr = virt_to_page(buf);
+	sg_set_page(sg, addr, buflen, offset_in_page(buf));
 }

 /* Assumes the first rqst has a transform header as the first iov.
--
2.20.1
[PATCH AUTOSEL 4.19 47/68] SMB3: Kernel oops mounting a encryptData share with CONFIG_DEBUG_VIRTUAL
From: Sebastien Tisserant

[ Upstream commit ee9d66182392695535cc9fccfcb40c16f72de2a9 ]

Fix kernel oops when mounting a encryptData CIFS share with
CONFIG_DEBUG_VIRTUAL

Signed-off-by: Sebastien Tisserant
Reviewed-by: Pavel Shilovsky
Signed-off-by: Steve French
Signed-off-by: Sasha Levin
---
 fs/cifs/smb2ops.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 97fdbec54db97..cc9e846a38658 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -2545,7 +2545,15 @@ fill_transform_hdr(struct smb2_transform_hdr *tr_hdr, unsigned int orig_len,
 static inline void smb2_sg_set_buf(struct scatterlist *sg, const void *buf,
 				   unsigned int buflen)
 {
-	sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
+	void *addr;
+	/*
+	 * VMAP_STACK (at least) puts stack into the vmalloc address space
+	 */
+	if (is_vmalloc_addr(buf))
+		addr = vmalloc_to_page(buf);
+	else
+		addr = virt_to_page(buf);
+	sg_set_page(sg, addr, buflen, offset_in_page(buf));
 }

 /* Assumes the first rqst has a transform header as the first iov.
--
2.20.1
[PATCH AUTOSEL 4.14 29/44] SMB3: Kernel oops mounting a encryptData share with CONFIG_DEBUG_VIRTUAL
From: Sebastien Tisserant

[ Upstream commit ee9d66182392695535cc9fccfcb40c16f72de2a9 ]

Fix kernel oops when mounting a encryptData CIFS share with
CONFIG_DEBUG_VIRTUAL

Signed-off-by: Sebastien Tisserant
Reviewed-by: Pavel Shilovsky
Signed-off-by: Steve French
Signed-off-by: Sasha Levin
---
 fs/cifs/smb2ops.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 23326b0cd5628..58a502e622aa4 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -2168,7 +2168,15 @@ fill_transform_hdr(struct smb2_transform_hdr *tr_hdr, struct smb_rqst *old_rq)
 static inline void smb2_sg_set_buf(struct scatterlist *sg, const void *buf,
 				   unsigned int buflen)
 {
-	sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
+	void *addr;
+	/*
+	 * VMAP_STACK (at least) puts stack into the vmalloc address space
+	 */
+	if (is_vmalloc_addr(buf))
+		addr = vmalloc_to_page(buf);
+	else
+		addr = virt_to_page(buf);
+	sg_set_page(sg, addr, buflen, offset_in_page(buf));
 }

 static struct scatterlist *
--
2.20.1
[PATCH 5.1 33/96] ASoC: Intel: cht_bsw_nau8824: fix kernel oops with platform_name override
[ Upstream commit 096701e8131425044d2054a0c210d6ea24ee7386 ]

The platform override code uses devm_ functions to allocate memory for
the new name but the card device is not initialized. Fix by moving the
init earlier.

Fixes: 4506db8043341 ("ASoC: Intel: cht_bsw_nau8824: platform name fixup support")
Signed-off-by: Pierre-Louis Bossart
Signed-off-by: Mark Brown
Signed-off-by: Sasha Levin
---
 sound/soc/intel/boards/cht_bsw_nau8824.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/intel/boards/cht_bsw_nau8824.c b/sound/soc/intel/boards/cht_bsw_nau8824.c
index 02c2fa239331..20fae391c75a 100644
--- a/sound/soc/intel/boards/cht_bsw_nau8824.c
+++ b/sound/soc/intel/boards/cht_bsw_nau8824.c
@@ -257,6 +257,7 @@ static int snd_cht_mc_probe(struct platform_device *pdev)
 	snd_soc_card_set_drvdata(&snd_soc_card_cht, drv);

 	/* override plaform name, if required */
+	snd_soc_card_cht.dev = &pdev->dev;
 	mach = (&pdev->dev)->platform_data;
 	platform_name = mach->mach_params.platform;

@@ -266,7 +267,6 @@ static int snd_cht_mc_probe(struct platform_device *pdev)
 		return ret_val;

 	/* register the soc card */
-	snd_soc_card_cht.dev = &pdev->dev;
 	ret_val = devm_snd_soc_register_card(&pdev->dev, &snd_soc_card_cht);
 	if (ret_val) {
 		dev_err(&pdev->dev,
--
2.20.1
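The ordering problem shared by this group of ASoC fixes can be shown with a small userspace model. This is a sketch under stated assumptions: `model_device`, `model_card`, `model_devm_strdup` and `model_fixup_platform_name` are hypothetical stand-ins, and the model returns an error where the real driver would dereference an unset device pointer and oops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the device and the sound card. */
struct model_device {
	size_t nr_allocs;	/* allocations owned by this device */
};

struct model_card {
	struct model_device *dev;
	char *platform_name;
};

/* Device-managed allocation: it needs a valid owning device. */
static char *model_devm_strdup(struct model_device *dev, const char *s)
{
	char *copy;

	if (!dev)
		return NULL;	/* the kernel would oops here instead */

	copy = malloc(strlen(s) + 1);
	if (copy) {
		strcpy(copy, s);
		dev->nr_allocs++;
	}
	return copy;
}

/* The fixup allocates the new name against card->dev. */
static int model_fixup_platform_name(struct model_card *card, const char *name)
{
	char *copy = model_devm_strdup(card->dev, name);

	if (!copy)
		return -1;
	card->platform_name = copy;
	return 0;
}
```

Calling the fixup before `card.dev` is assigned fails in the model. Assigning `card.dev` first, which is the reordering these patches perform, lets the allocation succeed.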
[PATCH 5.1 31/96] ASoC: Intel: cht_bsw_max98090: fix kernel oops with platform_name override
[ Upstream commit fb54555134b9b17835545e4d096b5550c27eed64 ]

The platform override code uses devm_ functions to allocate memory for
the new name but the card device is not initialized. Fix by moving the
init earlier.

Fixes: 7e7e24d7c7ff0 ("ASoC: Intel: cht_bsw_max98090_ti: platform name fixup support")
Signed-off-by: Pierre-Louis Bossart
Signed-off-by: Mark Brown
Signed-off-by: Sasha Levin
---
 sound/soc/intel/boards/cht_bsw_max98090_ti.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/intel/boards/cht_bsw_max98090_ti.c b/sound/soc/intel/boards/cht_bsw_max98090_ti.c
index c0e0844f75b9..572e336ae0f9 100644
--- a/sound/soc/intel/boards/cht_bsw_max98090_ti.c
+++ b/sound/soc/intel/boards/cht_bsw_max98090_ti.c
@@ -454,6 +454,7 @@ static int snd_cht_mc_probe(struct platform_device *pdev)
 	}

 	/* override plaform name, if required */
+	snd_soc_card_cht.dev = &pdev->dev;
 	mach = (&pdev->dev)->platform_data;
 	platform_name = mach->mach_params.platform;

@@ -463,7 +464,6 @@ static int snd_cht_mc_probe(struct platform_device *pdev)
 		return ret_val;

 	/* register the soc card */
-	snd_soc_card_cht.dev = &pdev->dev;
 	snd_soc_card_set_drvdata(&snd_soc_card_cht, drv);

 	if (drv->quirks & QUIRK_PMC_PLT_CLK_0)
--
2.20.1
[PATCH 5.1 34/96] ASoC: Intel: cht_bsw_rt5672: fix kernel oops with platform_name override
[ Upstream commit 9bbc799318a34061703f2a980e2b6df7fc6760f0 ]

The platform override code uses devm_ functions to allocate memory for
the new name but the card device is not initialized. Fix by moving the
init earlier.

Fixes: f403906da05cd ("ASoC: Intel: cht_bsw_rt5672: platform name fixup support")
Signed-off-by: Pierre-Louis Bossart
Signed-off-by: Mark Brown
Signed-off-by: Sasha Levin
---
 sound/soc/intel/boards/cht_bsw_rt5672.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/intel/boards/cht_bsw_rt5672.c b/sound/soc/intel/boards/cht_bsw_rt5672.c
index 3d5a2b3a06f0..87ce3857376d 100644
--- a/sound/soc/intel/boards/cht_bsw_rt5672.c
+++ b/sound/soc/intel/boards/cht_bsw_rt5672.c
@@ -425,6 +425,7 @@ static int snd_cht_mc_probe(struct platform_device *pdev)
 	}

 	/* override plaform name, if required */
+	snd_soc_card_cht.dev = &pdev->dev;
 	platform_name = mach->mach_params.platform;

 	ret_val = snd_soc_fixup_dai_links_platform_name(&snd_soc_card_cht,
@@ -442,7 +443,6 @@ static int snd_cht_mc_probe(struct platform_device *pdev)
 	snd_soc_card_set_drvdata(&snd_soc_card_cht, drv);

 	/* register the soc card */
-	snd_soc_card_cht.dev = &pdev->dev;
 	ret_val = devm_snd_soc_register_card(&pdev->dev, &snd_soc_card_cht);
 	if (ret_val) {
 		dev_err(&pdev->dev,
--
2.20.1
[PATCH 5.1 32/96] ASoC: Intel: bytcht_es8316: fix kernel oops with platform_name override
[ Upstream commit 79136a016add1acb690fe8d96be50dd22a143d26 ]

The platform override code uses devm_ functions to allocate memory for
the new name but the card device is not initialized. Fix by moving the
init earlier.

Fixes: e4bc6b1195f64 ("ASoC: Intel: bytcht_es8316: platform name fixup support")
Signed-off-by: Pierre-Louis Bossart
Signed-off-by: Mark Brown
Signed-off-by: Sasha Levin
---
 sound/soc/intel/boards/bytcht_es8316.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/intel/boards/bytcht_es8316.c b/sound/soc/intel/boards/bytcht_es8316.c
index d2a7e6ba11ae..1c686f83220a 100644
--- a/sound/soc/intel/boards/bytcht_es8316.c
+++ b/sound/soc/intel/boards/bytcht_es8316.c
@@ -471,6 +471,7 @@ static int snd_byt_cht_es8316_mc_probe(struct platform_device *pdev)
 	}

 	/* override plaform name, if required */
+	byt_cht_es8316_card.dev = dev;
 	platform_name = mach->mach_params.platform;

 	ret = snd_soc_fixup_dai_links_platform_name(&byt_cht_es8316_card,
@@ -538,7 +539,6 @@ static int snd_byt_cht_es8316_mc_probe(struct platform_device *pdev)
 		 (quirk & BYT_CHT_ES8316_MONO_SPEAKER) ? "mono" : "stereo",
 		 mic_name[BYT_CHT_ES8316_MAP(quirk)]);
 	byt_cht_es8316_card.long_name = long_name;
-	byt_cht_es8316_card.dev = dev;

 	snd_soc_card_set_drvdata(&byt_cht_es8316_card, priv);
 	ret = devm_snd_soc_register_card(dev, &byt_cht_es8316_card);
--
2.20.1
[PATCH AUTOSEL 5.1 30/51] ASoC: Intel: cht_bsw_rt5672: fix kernel oops with platform_name override
From: Pierre-Louis Bossart

[ Upstream commit 9bbc799318a34061703f2a980e2b6df7fc6760f0 ]

The platform override code uses devm_ functions to allocate memory for
the new name but the card device is not initialized. Fix by moving the
init earlier.

Fixes: f403906da05cd ("ASoC: Intel: cht_bsw_rt5672: platform name fixup support")
Signed-off-by: Pierre-Louis Bossart
Signed-off-by: Mark Brown
Signed-off-by: Sasha Levin
---
 sound/soc/intel/boards/cht_bsw_rt5672.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/intel/boards/cht_bsw_rt5672.c b/sound/soc/intel/boards/cht_bsw_rt5672.c
index 3d5a2b3a06f0..87ce3857376d 100644
--- a/sound/soc/intel/boards/cht_bsw_rt5672.c
+++ b/sound/soc/intel/boards/cht_bsw_rt5672.c
@@ -425,6 +425,7 @@ static int snd_cht_mc_probe(struct platform_device *pdev)
 	}

 	/* override plaform name, if required */
+	snd_soc_card_cht.dev = &pdev->dev;
 	platform_name = mach->mach_params.platform;

 	ret_val = snd_soc_fixup_dai_links_platform_name(&snd_soc_card_cht,
@@ -442,7 +443,6 @@ static int snd_cht_mc_probe(struct platform_device *pdev)
 	snd_soc_card_set_drvdata(&snd_soc_card_cht, drv);

 	/* register the soc card */
-	snd_soc_card_cht.dev = &pdev->dev;
 	ret_val = devm_snd_soc_register_card(&pdev->dev, &snd_soc_card_cht);
 	if (ret_val) {
 		dev_err(&pdev->dev,
--
2.20.1
[PATCH AUTOSEL 5.1 27/51] ASoC: Intel: cht_bsw_max98090: fix kernel oops with platform_name override
From: Pierre-Louis Bossart

[ Upstream commit fb54555134b9b17835545e4d096b5550c27eed64 ]

The platform override code uses devm_ functions to allocate memory for
the new name but the card device is not initialized. Fix by moving the
init earlier.

Fixes: 7e7e24d7c7ff0 ("ASoC: Intel: cht_bsw_max98090_ti: platform name fixup support")
Signed-off-by: Pierre-Louis Bossart
Signed-off-by: Mark Brown
Signed-off-by: Sasha Levin
---
 sound/soc/intel/boards/cht_bsw_max98090_ti.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/intel/boards/cht_bsw_max98090_ti.c b/sound/soc/intel/boards/cht_bsw_max98090_ti.c
index c0e0844f75b9..572e336ae0f9 100644
--- a/sound/soc/intel/boards/cht_bsw_max98090_ti.c
+++ b/sound/soc/intel/boards/cht_bsw_max98090_ti.c
@@ -454,6 +454,7 @@ static int snd_cht_mc_probe(struct platform_device *pdev)
 	}

 	/* override plaform name, if required */
+	snd_soc_card_cht.dev = &pdev->dev;
 	mach = (&pdev->dev)->platform_data;
 	platform_name = mach->mach_params.platform;

@@ -463,7 +464,6 @@ static int snd_cht_mc_probe(struct platform_device *pdev)
 		return ret_val;

 	/* register the soc card */
-	snd_soc_card_cht.dev = &pdev->dev;
 	snd_soc_card_set_drvdata(&snd_soc_card_cht, drv);

 	if (drv->quirks & QUIRK_PMC_PLT_CLK_0)
--
2.20.1
[PATCH AUTOSEL 5.1 28/51] ASoC: Intel: bytcht_es8316: fix kernel oops with platform_name override
From: Pierre-Louis Bossart

[ Upstream commit 79136a016add1acb690fe8d96be50dd22a143d26 ]

The platform override code uses devm_ functions to allocate memory for
the new name but the card device is not initialized. Fix by moving the
init earlier.

Fixes: e4bc6b1195f64 ("ASoC: Intel: bytcht_es8316: platform name fixup support")
Signed-off-by: Pierre-Louis Bossart
Signed-off-by: Mark Brown
Signed-off-by: Sasha Levin
---
 sound/soc/intel/boards/bytcht_es8316.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/intel/boards/bytcht_es8316.c b/sound/soc/intel/boards/bytcht_es8316.c
index d2a7e6ba11ae..1c686f83220a 100644
--- a/sound/soc/intel/boards/bytcht_es8316.c
+++ b/sound/soc/intel/boards/bytcht_es8316.c
@@ -471,6 +471,7 @@ static int snd_byt_cht_es8316_mc_probe(struct platform_device *pdev)
 	}

 	/* override plaform name, if required */
+	byt_cht_es8316_card.dev = dev;
 	platform_name = mach->mach_params.platform;

 	ret = snd_soc_fixup_dai_links_platform_name(&byt_cht_es8316_card,
@@ -538,7 +539,6 @@ static int snd_byt_cht_es8316_mc_probe(struct platform_device *pdev)
 		 (quirk & BYT_CHT_ES8316_MONO_SPEAKER) ? "mono" : "stereo",
 		 mic_name[BYT_CHT_ES8316_MAP(quirk)]);
 	byt_cht_es8316_card.long_name = long_name;
-	byt_cht_es8316_card.dev = dev;

 	snd_soc_card_set_drvdata(&byt_cht_es8316_card, priv);
 	ret = devm_snd_soc_register_card(dev, &byt_cht_es8316_card);
--
2.20.1
[PATCH AUTOSEL 5.1 29/51] ASoC: Intel: cht_bsw_nau8824: fix kernel oops with platform_name override
From: Pierre-Louis Bossart

[ Upstream commit 096701e8131425044d2054a0c210d6ea24ee7386 ]

The platform override code uses devm_ functions to allocate memory for
the new name but the card device is not initialized. Fix by moving the
init earlier.

Fixes: 4506db8043341 ("ASoC: Intel: cht_bsw_nau8824: platform name fixup support")
Signed-off-by: Pierre-Louis Bossart
Signed-off-by: Mark Brown
Signed-off-by: Sasha Levin
---
 sound/soc/intel/boards/cht_bsw_nau8824.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/intel/boards/cht_bsw_nau8824.c b/sound/soc/intel/boards/cht_bsw_nau8824.c
index 02c2fa239331..20fae391c75a 100644
--- a/sound/soc/intel/boards/cht_bsw_nau8824.c
+++ b/sound/soc/intel/boards/cht_bsw_nau8824.c
@@ -257,6 +257,7 @@ static int snd_cht_mc_probe(struct platform_device *pdev)
 	snd_soc_card_set_drvdata(&snd_soc_card_cht, drv);

 	/* override plaform name, if required */
+	snd_soc_card_cht.dev = &pdev->dev;
 	mach = (&pdev->dev)->platform_data;
 	platform_name = mach->mach_params.platform;

@@ -266,7 +267,6 @@ static int snd_cht_mc_probe(struct platform_device *pdev)
 		return ret_val;

 	/* register the soc card */
-	snd_soc_card_cht.dev = &pdev->dev;
 	ret_val = devm_snd_soc_register_card(&pdev->dev, &snd_soc_card_cht);
 	if (ret_val) {
 		dev_err(&pdev->dev,
--
2.20.1
bpf: test_btf : kernel Oops: 207 : PC is at memcpy+0xc0/0x330
while running kernel selftest bpf: test_btf the following kernel oops detected on beaglebone x15 board. Linux version 5.2.0-rc3-next-20190604 Full test log link can be found below [1] bpf: test_btf_ # # BTF GET_INFO test[3] (Large bpf_btf_info) OK GET_INFO: test[3]_(Large # # BTF GET_INFO test[4] (BTF ID) OK GET_INFO: test[4]_(BTF # [ 341.144885] 8<--- cut here --- [ 341.148164] Unable to handle kernel NULL pointer dereference at virtual address [ 341.156443] pgd = b0902156 [ 341.159294] [] *pgd=9655e003, *pmd=ff918003 [ 341.164229] Internal error: Oops: 207 [#1] SMP ARM [ 341.169052] Modules linked in: tun sha1_generic sha1_arm_neon sha1_arm algif_hash af_alg snd_soc_simple_card snd_soc_simple_card_utils snd_soc_core ac97_bus snd_pcm_dmaengine snd_pcm snd_timer snd soundcore fuse [ 341.187962] CPU: 0 PID: 6773 Comm: test_sockmap Not tainted 5.2.0-rc3-next-20190604 #1 [ 341.195923] Hardware name: Generic DRA74X (Flattened Device Tree) [ 341.202058] PC is at memcpy+0xc0/0x330 [ 341.205836] LR is at bpf_msg_push_data+0x70c/0x728 [ 341.210654] pc : []lr : []psr: 800b0013 [ 341.216957] sp : e99ad6cc ip : 0002 fp : e99ad83c [ 341.12] r10: d1bdc000 r9 : 0001 r8 : [ 341.227467] r7 : cd1de000 r6 : r5 : d1bdc000 r4 : [ 341.234032] r3 : r2 : 8000 r1 : r0 : cd1de000 [ 341.240597] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 341.247771] Control: 30c5387d Table: 91b19880 DAC: fffd [ 341.253553] Process test_sockmap (pid: 6773, stack limit = 0x3ad4028c) [ 341.260118] Stack: (0xe99ad6cc to 0xe99ae000) [ 341.264502] d6c0:cd1de000 c10ea4a4 [ 341.272725] d6e0: ea2759a0 0001 [ 341.280948] d700: [ 341.289171] d720: [ 341.297394] d740: e99ad78c d03f0580 [ 341.305615] d760: 0004 d03f 0007 03a9 [ 341.313836] d780: d03f c0581f6c 0060 c1e09fd0 [ 341.322060] d7a0: e99ad85c e99ad7b0 c04c1d1c c06c3638 c1dc19b8 e99ad7ec d03f0540 [ 341.330283] d7c0: 0002 d03f 0007 03a9 0001 d03f0560 d03f e3444ce4 [ 341.338506] d7e0: 0060 c1e09fd0 e99ad8a4 e99ad7f8 cbc66e00 e99ad8a8 c1419868 
c059b69c [ 341.346730] d800: e99ad824 e99ad818 c04e3b7c f006b240 e99ad8b8 c1419868 [ 341.354954] d820: c10e9d98 c0581ddc e99ad894 e99ad840 c0581f6c c10e9da4 [ 341.363175] d840: f5388145 290412b8 [ 341.371399] d860: c2432908 f08d7937 c1e08488 f006b028 0011 c11cd828 f006b000 [ 341.379620] d880: e99ad9e4 c1fc9e37 e99ad934 e99ad898 c0584910 c0581e48 [ 341.387841] d8a0: 0005 0004 0003 0002 0001 cbc66ee8 [ 341.396063] d8c0: d1bdc000 [ 341.404286] d8e0: d1bdc000 cbc66ee0 0007 0006 [ 341.412509] d900: 0010 c1fc9e37 e99ad8b8 cf380840 c10fdef4 c11cd828 9fdbe7c7 [ 341.420732] d920: d1bdc000 d1bdc000 e99ad984 e99ad938 c10fdf18 c05848d0 [ 341.428954] d940: c10fde14 c0459978 e99ad97c e99ad958 c056165c e7bdd400 01ff d1bdc000 [ 341.437178] d960: e7bdd400 0011 cf380840 e99ad9e4 c1fc9e37 e99ad9cc e99ad988 [ 341.445407] d980: c11cd828 c10fde20 c11cda3c e99ad9a0 [ 341.453631] d9a0: 0001 e7bdd400 cf380840 eb6d1030 c1e08488 0003 c1fc9e37 [ 341.461855] d9c0: e99adcac e99ad9d0 c11cdb60 c11cd518 c11cd90c e8577024 [ 341.470078] d9e0: 0020 0001 0001 0001 0001 [ 341.478300] da00: eb6d1030 0001 [ 341.486522] da20: [ 341.494742] da40: [ 341.502965] da60: [ 341.511184] da80: [ 341.519406] daa0: 0087 0001 d03f0500 [ 341.527628] dac0: d03f c1e47f04 c1e09fd0 e99adb8c e99adae0 c04c1d1c c04c10d0 [ 341.535850] dae0: 0078 [ 341.544074] db00: d03f04f0 0087 c2432908 c2638640 0087 d03f0500 [ 341.552296] db20: c1e08488 c2418b30 406293ec 295bca2f 0
Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
On 5/20/19 8:25 PM, Nicholas Piggin wrote:
> Bharata B Rao's on May 21, 2019 12:29 am:
> > On Mon, May 20, 2019 at 01:50:35PM +0530, Bharata B Rao wrote:
> >> On Mon, May 20, 2019 at 05:00:21PM +1000, Nicholas Piggin wrote:
> >> > Bharata B Rao's on May 20, 2019 3:56 pm:
> >> > > On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
> >> > >> >> > git bisect points to
> >> > >> >> >
> >> > >> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> >> > >> >> > Author: Nicholas Piggin
> >> > >> >> > Date: Fri Jul 27 21:48:17 2018 +1000
> >> > >> >> >
> >> > >> >> > powerpc/64s: Fix page table fragment refcount race vs speculative references
> >> > >> >> >
> >> > >> >> > The page table fragment allocator uses the main page refcount racily
> >> > >> >> > with respect to speculative references. A customer observed a BUG due
> >> > >> >> > to page table page refcount underflow in the fragment allocator. This
> >> > >> >> > can be caused by the fragment allocator set_page_count stomping on a
> >> > >> >> > speculative reference, and then the speculative failure handler
> >> > >> >> > decrements the new reference, and the underflow eventually pops when
> >> > >> >> > the page tables are freed.
> >> > >> >> >
> >> > >> >> > Fix this by using a dedicated field in the struct page for the page
> >> > >> >> > table fragment allocator.
> >> > >> >> >
> >> > >> >> > Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
> >> > >> >> > Cc: sta...@vger.kernel.org # v3.10+
> >> > >> >>
> >> > >> >> That's the commit that added the BUG_ON(), so prior to that you won't
> >> > >> >> see the crash.
> >> > >> >
> >> > >> > Right, but the commit says it fixes page table page refcount underflow by
> >> > >> > introducing a new field &page->pt_frag_refcount. Now we are hitting the
> >> > >> > underflow for this pt_frag_refcount.
> >> > >>
> >> > >> The fixed underflow is caused by a bug (race on page count) that got
> >> > >> fixed by that patch. You are hitting a different underflow here. It's
> >> > >> not certain my patch caused it, I'm just trying to reproduce now.
> >> > >
> >> > > Ok.
> >> >
> >> > Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
> >> > 4GB guest (via host adding / removing memory device), and it just works.
> >>
> >> Boot, add 8G, reboot, remove 8G is the sequence to reproduce.
> >>
> >> > It's likely to be an edge case like an off by one or rounding error
> >> > that just happens to trigger in your config. Might be easiest if you
> >> > could test with a debug patch.
> >>
> >> Sure, I will continue debugging.
> >
> > When the guest is rebooted after hotplug, the entire memory (which includes
> > the hotplugged memory) gets remapped again freshly. However at this time
> > since no slab is available yet, pt_frag_refcount never gets initialized as we
> > never do pte_fragment_alloc() for these mappings. So we right away hit the
> > underflow during the first unplug itself, it looks like.
>
> Nice catch, good debugging work.
>
> > I will check how this can be fixed.
>
> Tricky problem. What do you think? You might be able to make the early
> page table allocations in the same pattern as the frag allocations, and
> then fill in the struct page metadata when you have those.

I guess we need to do something similar to what x86 does. We need to walk
the init_mm page table again and re-init struct page and other data
structures backing the tables?

-aneesh
Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
On Tue, May 21, 2019 at 12:55:49AM +1000, Nicholas Piggin wrote:
> Bharata B Rao's on May 21, 2019 12:29 am:
> > On Mon, May 20, 2019 at 01:50:35PM +0530, Bharata B Rao wrote:
> >> On Mon, May 20, 2019 at 05:00:21PM +1000, Nicholas Piggin wrote:
> >> > Bharata B Rao's on May 20, 2019 3:56 pm:
> >> > > On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
> >> > >> >> > git bisect points to
> >> > >> >> >
> >> > >> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> >> > >> >> > Author: Nicholas Piggin
> >> > >> >> > Date: Fri Jul 27 21:48:17 2018 +1000
> >> > >> >> >
> >> > >> >> > powerpc/64s: Fix page table fragment refcount race vs speculative references
> >> > >> >> >
> >> > >> >> > The page table fragment allocator uses the main page refcount racily
> >> > >> >> > with respect to speculative references. A customer observed a BUG due
> >> > >> >> > to page table page refcount underflow in the fragment allocator. This
> >> > >> >> > can be caused by the fragment allocator set_page_count stomping on a
> >> > >> >> > speculative reference, and then the speculative failure handler
> >> > >> >> > decrements the new reference, and the underflow eventually pops when
> >> > >> >> > the page tables are freed.
> >> > >> >> >
> >> > >> >> > Fix this by using a dedicated field in the struct page for the page
> >> > >> >> > table fragment allocator.
> >> > >> >> >
> >> > >> >> > Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
> >> > >> >> > Cc: sta...@vger.kernel.org # v3.10+
> >> > >> >>
> >> > >> >> That's the commit that added the BUG_ON(), so prior to that you won't
> >> > >> >> see the crash.
> >> > >> >
> >> > >> > Right, but the commit says it fixes page table page refcount underflow by
> >> > >> > introducing a new field &page->pt_frag_refcount. Now we are hitting the
> >> > >> > underflow for this pt_frag_refcount.
> >> > >>
> >> > >> The fixed underflow is caused by a bug (race on page count) that got
> >> > >> fixed by that patch. You are hitting a different underflow here. It's
> >> > >> not certain my patch caused it, I'm just trying to reproduce now.
> >> > >
> >> > > Ok.
> >> >
> >> > Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
> >> > 4GB guest (via host adding / removing memory device), and it just works.
> >>
> >> Boot, add 8G, reboot, remove 8G is the sequence to reproduce.
> >>
> >> >
> >> > It's likely to be an edge case like an off by one or rounding error
> >> > that just happens to trigger in your config. Might be easiest if you
> >> > could test with a debug patch.
> >>
> >> Sure, I will continue debugging.
> >
> > When the guest is rebooted after hotplug, the entire memory (which includes
> > the hotplugged memory) gets remapped again freshly. However at this time
> > since no slab is available yet, pt_frag_refcount never gets initialized as we
> > never do pte_fragment_alloc() for these mappings. So we right away hit the
> > underflow during the first unplug itself, it looks like.
>
> Nice catch, good debugging work.

Thanks, with help from Aneesh.

>
> > I will check how this can be fixed.
>
> Tricky problem. What do you think? You might be able to make the early
> page table allocations in the same pattern as the frag allocations, and
> then fill in the struct page metadata when you have those.

Will explore.

>
> Other option may be create a new set of page tables after mm comes up
> to replace the early page tables with. That's a bigger hammer though.

Will also check if similar scenario exists on x86 and if so, how and when
pte frag data is fixed there.

Regards,
Bharata.
Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
Bharata B Rao's on May 21, 2019 12:29 am:
> On Mon, May 20, 2019 at 01:50:35PM +0530, Bharata B Rao wrote:
>> On Mon, May 20, 2019 at 05:00:21PM +1000, Nicholas Piggin wrote:
>> > Bharata B Rao's on May 20, 2019 3:56 pm:
>> > > On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
>> > >> >> > git bisect points to
>> > >> >> >
>> > >> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
>> > >> >> > Author: Nicholas Piggin
>> > >> >> > Date: Fri Jul 27 21:48:17 2018 +1000
>> > >> >> >
>> > >> >> > powerpc/64s: Fix page table fragment refcount race vs speculative references
>> > >> >> >
>> > >> >> > The page table fragment allocator uses the main page refcount racily
>> > >> >> > with respect to speculative references. A customer observed a BUG due
>> > >> >> > to page table page refcount underflow in the fragment allocator. This
>> > >> >> > can be caused by the fragment allocator set_page_count stomping on a
>> > >> >> > speculative reference, and then the speculative failure handler
>> > >> >> > decrements the new reference, and the underflow eventually pops when
>> > >> >> > the page tables are freed.
>> > >> >> >
>> > >> >> > Fix this by using a dedicated field in the struct page for the page
>> > >> >> > table fragment allocator.
>> > >> >> >
>> > >> >> > Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
>> > >> >> > Cc: sta...@vger.kernel.org # v3.10+
>> > >> >>
>> > >> >> That's the commit that added the BUG_ON(), so prior to that you won't
>> > >> >> see the crash.
>> > >> >
>> > >> > Right, but the commit says it fixes page table page refcount underflow by
>> > >> > introducing a new field &page->pt_frag_refcount. Now we are hitting the
>> > >> > underflow for this pt_frag_refcount.
>> > >>
>> > >> The fixed underflow is caused by a bug (race on page count) that got
>> > >> fixed by that patch. You are hitting a different underflow here. It's
>> > >> not certain my patch caused it, I'm just trying to reproduce now.
>> > >
>> > > Ok.
>> >
>> > Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
>> > 4GB guest (via host adding / removing memory device), and it just works.
>>
>> Boot, add 8G, reboot, remove 8G is the sequence to reproduce.
>>
>> >
>> > It's likely to be an edge case like an off by one or rounding error
>> > that just happens to trigger in your config. Might be easiest if you
>> > could test with a debug patch.
>>
>> Sure, I will continue debugging.
>
> When the guest is rebooted after hotplug, the entire memory (which includes
> the hotplugged memory) gets remapped again freshly. However at this time
> since no slab is available yet, pt_frag_refcount never gets initialized as we
> never do pte_fragment_alloc() for these mappings. So we right away hit the
> underflow during the first unplug itself, it looks like.

Nice catch, good debugging work.

> I will check how this can be fixed.

Tricky problem. What do you think? You might be able to make the early
page table allocations in the same pattern as the frag allocations, and
then fill in the struct page metadata when you have those.

Other option may be create a new set of page tables after mm comes up
to replace the early page tables with. That's a bigger hammer though.

Thanks,
Nick
Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
On Mon, May 20, 2019 at 01:50:35PM +0530, Bharata B Rao wrote:
> On Mon, May 20, 2019 at 05:00:21PM +1000, Nicholas Piggin wrote:
> > Bharata B Rao's on May 20, 2019 3:56 pm:
> > > On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
> > >> >> > git bisect points to
> > >> >> >
> > >> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> > >> >> > Author: Nicholas Piggin
> > >> >> > Date: Fri Jul 27 21:48:17 2018 +1000
> > >> >> >
> > >> >> > powerpc/64s: Fix page table fragment refcount race vs
> > >> >> > speculative references
> > >> >> >
> > >> >> > The page table fragment allocator uses the main page refcount
> > >> >> > racily
> > >> >> > with respect to speculative references. A customer observed a
> > >> >> > BUG due
> > >> >> > to page table page refcount underflow in the fragment
> > >> >> > allocator. This
> > >> >> > can be caused by the fragment allocator set_page_count stomping
> > >> >> > on a
> > >> >> > speculative reference, and then the speculative failure handler
> > >> >> > decrements the new reference, and the underflow eventually pops
> > >> >> > when
> > >> >> > the page tables are freed.
> > >> >> >
> > >> >> > Fix this by using a dedicated field in the struct page for the
> > >> >> > page
> > >> >> > table fragment allocator.
> > >> >> >
> > >> >> > Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
> > >> >> > Cc: sta...@vger.kernel.org # v3.10+
> > >> >>
> > >> >> That's the commit that added the BUG_ON(), so prior to that you won't
> > >> >> see the crash.
> > >> >
> > >> > Right, but the commit says it fixes page table page refcount underflow
> > >> > by
> > >> > introducing a new field &page->pt_frag_refcount. Now we are hitting
> > >> > the underflow
> > >> > for this pt_frag_refcount.
> > >>
> > >> The fixed underflow is caused by a bug (race on page count) that got
> > >> fixed by that patch. You are hitting a different underflow here. It's
> > >> not certain my patch caused it, I'm just trying to reproduce now.
> > >
> > > Ok.
> >
> > Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
> > 4GB guest (via host adding / removing memory device), and it just works.
>
> Boot, add 8G, reboot, remove 8G is the sequence to reproduce.
>
> >
> > It's likely to be an edge case like an off by one or rounding error
> > that just happens to trigger in your config. Might be easiest if you
> > could test with a debug patch.
>
> Sure, I will continue debugging.

When the guest is rebooted after hotplug, the entire memory (which includes
the hotplugged memory) gets remapped again freshly. However at this time
since no slab is available yet, pt_frag_refcount never gets initialized as we
never do pte_fragment_alloc() for these mappings. So we right away hit the
underflow during the first unplug itself, it looks like.

I will check how this can be fixed.

>
> Regards,
> Bharata.
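The failure mode described above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not kernel code:
model_page, model_fragment_alloc(), model_early_alloc(),
model_fragment_free() and FRAGS_PER_PAGE are hypothetical names made up for
the illustration, and the check in model_fragment_free() only roughly stands
in for the BUG_ON() reported at arch/powerpc/mm/pgtable-frag.c:113.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-page fragment counter. In the
 * kernel the counter lives in struct page as pt_frag_refcount. */
struct model_page {
	int pt_frag_refcount;
};

/* Illustrative value only; not taken from the kernel. */
#define FRAGS_PER_PAGE 4

/* Normal path: the fragment allocator initializes the counter when it
 * hands out a page-table page. */
static void model_fragment_alloc(struct model_page *page)
{
	page->pt_frag_refcount = FRAGS_PER_PAGE;
}

/* Early-boot path (assumption based on the analysis in this thread):
 * the table is set up before the allocator is usable, so the counter
 * is never initialized and stays at zero. */
static void model_early_alloc(struct model_page *page)
{
	page->pt_frag_refcount = 0;
}

/* Free one fragment. Returns false where the kernel would hit its
 * BUG_ON() because there is no reference left to drop. */
static bool model_fragment_free(struct model_page *page)
{
	if (page->pt_frag_refcount <= 0)
		return false;
	page->pt_frag_refcount--;
	return true;
}
```

In this model, freeing a fragment of a page set up with model_early_alloc()
fails on the very first call, which matches the observation that the
underflow is hit on the first unplug after the reboot. A page set up with
model_fragment_alloc() can be freed FRAGS_PER_PAGE times before the check
trips.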
Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
On Mon, May 20, 2019 at 05:00:21PM +1000, Nicholas Piggin wrote:
> Bharata B Rao's on May 20, 2019 3:56 pm:
> > On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
> >> >> > git bisect points to
> >> >> >
> >> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> >> >> > Author: Nicholas Piggin
> >> >> > Date: Fri Jul 27 21:48:17 2018 +1000
> >> >> >
> >> >> > powerpc/64s: Fix page table fragment refcount race vs speculative
> >> >> > references
> >> >> >
> >> >> > The page table fragment allocator uses the main page refcount
> >> >> > racily
> >> >> > with respect to speculative references. A customer observed a BUG
> >> >> > due
> >> >> > to page table page refcount underflow in the fragment allocator.
> >> >> > This
> >> >> > can be caused by the fragment allocator set_page_count stomping
> >> >> > on a
> >> >> > speculative reference, and then the speculative failure handler
> >> >> > decrements the new reference, and the underflow eventually pops
> >> >> > when
> >> >> > the page tables are freed.
> >> >> >
> >> >> > Fix this by using a dedicated field in the struct page for the
> >> >> > page
> >> >> > table fragment allocator.
> >> >> >
> >> >> > Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
> >> >> > Cc: sta...@vger.kernel.org # v3.10+
> >> >>
> >> >> That's the commit that added the BUG_ON(), so prior to that you won't
> >> >> see the crash.
> >> >
> >> > Right, but the commit says it fixes page table page refcount underflow by
> >> > introducing a new field &page->pt_frag_refcount. Now we are hitting the
> >> > underflow
> >> > for this pt_frag_refcount.
> >>
> >> The fixed underflow is caused by a bug (race on page count) that got
> >> fixed by that patch. You are hitting a different underflow here. It's
> >> not certain my patch caused it, I'm just trying to reproduce now.
> >
> > Ok.
>
> Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
> 4GB guest (via host adding / removing memory device), and it just works.

Boot, add 8G, reboot, remove 8G is the sequence to reproduce.

>
> It's likely to be an edge case like an off by one or rounding error
> that just happens to trigger in your config. Might be easiest if you
> could test with a debug patch.

Sure, I will continue debugging.

Regards,
Bharata.
Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
Bharata B Rao's on May 20, 2019 3:56 pm:
> On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
>> >> > git bisect points to
>> >> >
>> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
>> >> > Author: Nicholas Piggin
>> >> > Date: Fri Jul 27 21:48:17 2018 +1000
>> >> >
>> >> > powerpc/64s: Fix page table fragment refcount race vs speculative
>> >> > references
>> >> >
>> >> > The page table fragment allocator uses the main page refcount racily
>> >> > with respect to speculative references. A customer observed a BUG
>> >> > due
>> >> > to page table page refcount underflow in the fragment allocator.
>> >> > This
>> >> > can be caused by the fragment allocator set_page_count stomping on a
>> >> > speculative reference, and then the speculative failure handler
>> >> > decrements the new reference, and the underflow eventually pops when
>> >> > the page tables are freed.
>> >> >
>> >> > Fix this by using a dedicated field in the struct page for the page
>> >> > table fragment allocator.
>> >> >
>> >> > Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
>> >> > Cc: sta...@vger.kernel.org # v3.10+
>> >>
>> >> That's the commit that added the BUG_ON(), so prior to that you won't
>> >> see the crash.
>> >
>> > Right, but the commit says it fixes page table page refcount underflow by
>> > introducing a new field &page->pt_frag_refcount. Now we are hitting the
>> > underflow
>> > for this pt_frag_refcount.
>>
>> The fixed underflow is caused by a bug (race on page count) that got
>> fixed by that patch. You are hitting a different underflow here. It's
>> not certain my patch caused it, I'm just trying to reproduce now.
>
> Ok.

Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
4GB guest (via host adding / removing memory device), and it just works.

It's likely to be an edge case like an off by one or rounding error
that just happens to trigger in your config. Might be easiest if you
could test with a debug patch.

Thanks,
Nick
Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
> >> > git bisect points to
> >> >
> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> >> > Author: Nicholas Piggin
> >> > Date: Fri Jul 27 21:48:17 2018 +1000
> >> >
> >> > powerpc/64s: Fix page table fragment refcount race vs speculative
> >> > references
> >> >
> >> > The page table fragment allocator uses the main page refcount racily
> >> > with respect to speculative references. A customer observed a BUG due
> >> > to page table page refcount underflow in the fragment allocator. This
> >> > can be caused by the fragment allocator set_page_count stomping on a
> >> > speculative reference, and then the speculative failure handler
> >> > decrements the new reference, and the underflow eventually pops when
> >> > the page tables are freed.
> >> >
> >> > Fix this by using a dedicated field in the struct page for the page
> >> > table fragment allocator.
> >> >
> >> > Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
> >> > Cc: sta...@vger.kernel.org # v3.10+
> >>
> >> That's the commit that added the BUG_ON(), so prior to that you won't
> >> see the crash.
> >
> > Right, but the commit says it fixes page table page refcount underflow by
> > introducing a new field &page->pt_frag_refcount. Now we are hitting the
> > underflow
> > for this pt_frag_refcount.
>
> The fixed underflow is caused by a bug (race on page count) that got
> fixed by that patch. You are hitting a different underflow here. It's
> not certain my patch caused it, I'm just trying to reproduce now.

Ok.

>
> >
> > BTW, if I go below this commit, I don't hit the pagecount
> >
> > VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
> >
> > which is in pte_fragment_free() path.
>
> Do you have CONFIG_DEBUG_VM=y?

Yes.

Regards,
Bharata.
Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
Bharata B Rao's on May 20, 2019 2:25 pm: > On Mon, May 20, 2019 at 12:02:23PM +1000, Michael Ellerman wrote: >> Bharata B Rao writes: >> > On Thu, May 16, 2019 at 07:44:20PM +0530, srikanth wrote: >> >> Hello, >> >> >> >> On power9 host, performing memory hotunplug from ppc64le guest results in >> >> kernel oops. >> >> >> >> Kernel used : https://github.com/torvalds/linux/tree/v5.1 built using >> >> ppc64le_defconfig for host and ppc64le_guest_defconfig for guest. >> >> >> >> Recreation steps: >> >> >> >> 1. Boot a guest with below mem configuration: >> >> 33554432 >> >> 8388608 >> >> 4194304 >> >> >> >> >> >> >> >> >> >> >> >> >> >> 2. From host hotplug 8G memory -> verify memory hotadded succesfully -> >> >> now >> >> reboot guest -> once guest comes back try to unplug 8G memory >> >> >> >> mem.xml used: >> >> >> >> >> >> 8 >> >> 0 >> >> >> >> >> >> >> >> Memory attach and detach commands used: >> >> virsh attach-device vm1 ./mem.xml --live >> >> virsh detach-device vm1 ./mem.xml --live >> >> >> >> Trace seen inside guest after unplug, guest just hangs there forever: >> >> >> >> [ 21.962986] kernel BUG at arch/powerpc/mm/pgtable-frag.c:113! 
>> >> [ 21.963064] Oops: Exception in kernel mode, sig: 5 [#1] >> >> [ 21.963090] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA >> >> pSeries >> >> [ 21.963131] Modules linked in: xt_tcpudp iptable_filter squashfs fuse >> >> vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi >> >> scsi_transport_iscsi >> >> ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress >> >> lzo_compress >> >> raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx >> >> xor raid6_pq multipath crc32c_vpmsum >> >> [ 21.963281] CPU: 11 PID: 316 Comm: kworker/u64:5 Kdump: loaded Not >> >> tainted 5.1.0-dirty #2 >> >> [ 21.963323] Workqueue: pseries hotplug workque pseries_hp_work_fn >> >> [ 21.963355] NIP: c0079e18 LR: c0c79308 CTR: >> >> 8000 >> >> [ 21.963392] REGS: c003f88034f0 TRAP: 0700 Not tainted >> >> (5.1.0-dirty) >> >> [ 21.963422] MSR: 8282b033 >> >> CR: >> >> 28002884 XER: 2004 >> >> [ 21.963470] CFAR: c0c79304 IRQMASK: 0 >> >> [ 21.963470] GPR00: c0c79308 c003f8803780 c1521000 >> >> 00fff8c0 >> >> [ 21.963470] GPR04: 0001 ffe30005 0005 >> >> 0020 >> >> [ 21.963470] GPR08: 0001 c00a00fff8e0 >> >> c16d21a0 >> >> [ 21.963470] GPR12: c16e7b90 c7ff2700 c00a00a0 >> >> c003ffe30100 >> >> [ 21.963470] GPR16: c003ffe3 c14aa4de c00a009f >> >> c16d21b0 >> >> [ 21.963470] GPR20: c14de588 0001 c16d21b8 >> >> c00a00a0 >> >> [ 21.963470] GPR24: c00a00a0 >> >> c003ffe96000 >> >> [ 21.963470] GPR28: c00a00a0 c00a00a0 c003fffec000 >> >> c00a00fff8c0 >> >> [ 21.963802] NIP [c0079e18] pte_fragment_free+0x48/0xd0 >> >> [ 21.963838] LR [c0c79308] remove_pagetable+0x49c/0x5b4 >> >> [ 21.963873] Call Trace: >> >> [ 21.963890] [c003f8803780] [c003ffe997f0] 0xc003ffe997f0 >> >> (unreliable) >> >> [ 21.963933] [c003f88037b0] [] (null) >> >> [ 21.963969] [c003f88038c0] [c006f038] >> >> vmemmap_free+0x218/0x2e0 >> >> [ 21.964006] [c003f8803940] [c036f100] >> >> sparse_remove_one_section+0xd0/0x138 >> >> [ 21.964050] [c003f8803980] [c0383a50] >> >> 
__remove_pages+0x410/0x560 >> >> [ 21.964093] [c003f8803a90] [c0c784d8] >> >> arch_remove_memory+0x68/0xdc >> >> [ 21.964136] [c003f8803ad0] [c0385d74] >> >&
Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
On Mon, May 20, 2019 at 12:02:23PM +1000, Michael Ellerman wrote: > Bharata B Rao writes: > > On Thu, May 16, 2019 at 07:44:20PM +0530, srikanth wrote: > >> Hello, > >> > >> On power9 host, performing memory hotunplug from ppc64le guest results in > >> kernel oops. > >> > >> Kernel used : https://github.com/torvalds/linux/tree/v5.1 built using > >> ppc64le_defconfig for host and ppc64le_guest_defconfig for guest. > >> > >> Recreation steps: > >> > >> 1. Boot a guest with below mem configuration: > >> 33554432 > >> 8388608 > >> 4194304 > >> > >> > >> > >> > >> > >> > >> 2. From host hotplug 8G memory -> verify memory hotadded succesfully -> now > >> reboot guest -> once guest comes back try to unplug 8G memory > >> > >> mem.xml used: > >> > >> > >> 8 > >> 0 > >> > >> > >> > >> Memory attach and detach commands used: > >> virsh attach-device vm1 ./mem.xml --live > >> virsh detach-device vm1 ./mem.xml --live > >> > >> Trace seen inside guest after unplug, guest just hangs there forever: > >> > >> [ 21.962986] kernel BUG at arch/powerpc/mm/pgtable-frag.c:113! 
> >> [ 21.963064] Oops: Exception in kernel mode, sig: 5 [#1] > >> [ 21.963090] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA > >> pSeries > >> [ 21.963131] Modules linked in: xt_tcpudp iptable_filter squashfs fuse > >> vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi > >> scsi_transport_iscsi > >> ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress > >> raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx > >> xor raid6_pq multipath crc32c_vpmsum > >> [ 21.963281] CPU: 11 PID: 316 Comm: kworker/u64:5 Kdump: loaded Not > >> tainted 5.1.0-dirty #2 > >> [ 21.963323] Workqueue: pseries hotplug workque pseries_hp_work_fn > >> [ 21.963355] NIP: c0079e18 LR: c0c79308 CTR: > >> 8000 > >> [ 21.963392] REGS: c003f88034f0 TRAP: 0700 Not tainted > >> (5.1.0-dirty) > >> [ 21.963422] MSR: 8282b033 > >> CR: > >> 28002884 XER: 2004 > >> [ 21.963470] CFAR: c0c79304 IRQMASK: 0 > >> [ 21.963470] GPR00: c0c79308 c003f8803780 c1521000 > >> 00fff8c0 > >> [ 21.963470] GPR04: 0001 ffe30005 0005 > >> 0020 > >> [ 21.963470] GPR08: 0001 c00a00fff8e0 > >> c16d21a0 > >> [ 21.963470] GPR12: c16e7b90 c7ff2700 c00a00a0 > >> c003ffe30100 > >> [ 21.963470] GPR16: c003ffe3 c14aa4de c00a009f > >> c16d21b0 > >> [ 21.963470] GPR20: c14de588 0001 c16d21b8 > >> c00a00a0 > >> [ 21.963470] GPR24: c00a00a0 > >> c003ffe96000 > >> [ 21.963470] GPR28: c00a00a0 c00a00a0 c003fffec000 > >> c00a00fff8c0 > >> [ 21.963802] NIP [c0079e18] pte_fragment_free+0x48/0xd0 > >> [ 21.963838] LR [c0c79308] remove_pagetable+0x49c/0x5b4 > >> [ 21.963873] Call Trace: > >> [ 21.963890] [c003f8803780] [c003ffe997f0] 0xc003ffe997f0 > >> (unreliable) > >> [ 21.963933] [c003f88037b0] [] (null) > >> [ 21.963969] [c003f88038c0] [c006f038] > >> vmemmap_free+0x218/0x2e0 > >> [ 21.964006] [c003f8803940] [c036f100] > >> sparse_remove_one_section+0xd0/0x138 > >> [ 21.964050] [c003f8803980] [c0383a50] > >> __remove_pages+0x410/0x560 > >> [ 21.964093] [c003f8803a90] [c0c784d8] > 
>> arch_remove_memory+0x68/0xdc > >> [ 21.964136] [c003f8803ad0] [c0385d74] > >> __remove_memory+0xc4/0x110 > >> [ 21.964180] [c003f8803b10] [c00d44e4] > >> dlpar_remove_lmb+0x94/0x140 > >> [ 21.964223] [c003f8803b50] [c00d52b4] > >> dlpar_memory+0x464/0xd00 > >> [ 21.964259] [c003f8803be0] [c00cd5c0] > >> handle_dlpar_errorlog+0xc0/0x190 > >> [ 21.964303] [c003f8803c50] [c00cd6bc] > >
Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
Bharata B Rao writes: > On Thu, May 16, 2019 at 07:44:20PM +0530, srikanth wrote: >> Hello, >> >> On power9 host, performing memory hotunplug from ppc64le guest results in >> kernel oops. >> >> Kernel used : https://github.com/torvalds/linux/tree/v5.1 built using >> ppc64le_defconfig for host and ppc64le_guest_defconfig for guest. >> >> Recreation steps: >> >> 1. Boot a guest with below mem configuration: >> 33554432 >> 8388608 >> 4194304 >> >> >> >> >> >> >> 2. From host hotplug 8G memory -> verify memory hotadded succesfully -> now >> reboot guest -> once guest comes back try to unplug 8G memory >> >> mem.xml used: >> >> >> 8 >> 0 >> >> >> >> Memory attach and detach commands used: >> virsh attach-device vm1 ./mem.xml --live >> virsh detach-device vm1 ./mem.xml --live >> >> Trace seen inside guest after unplug, guest just hangs there forever: >> >> [ 21.962986] kernel BUG at arch/powerpc/mm/pgtable-frag.c:113! >> [ 21.963064] Oops: Exception in kernel mode, sig: 5 [#1] >> [ 21.963090] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA >> pSeries >> [ 21.963131] Modules linked in: xt_tcpudp iptable_filter squashfs fuse >> vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi >> ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress >> raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx >> xor raid6_pq multipath crc32c_vpmsum >> [ 21.963281] CPU: 11 PID: 316 Comm: kworker/u64:5 Kdump: loaded Not >> tainted 5.1.0-dirty #2 >> [ 21.963323] Workqueue: pseries hotplug workque pseries_hp_work_fn >> [ 21.963355] NIP: c0079e18 LR: c0c79308 CTR: >> 8000 >> [ 21.963392] REGS: c003f88034f0 TRAP: 0700 Not tainted (5.1.0-dirty) >> [ 21.963422] MSR: 8282b033 CR: >> 28002884 XER: 2004 >> [ 21.963470] CFAR: c0c79304 IRQMASK: 0 >> [ 21.963470] GPR00: c0c79308 c003f8803780 c1521000 >> 00fff8c0 >> [ 21.963470] GPR04: 0001 ffe30005 0005 >> 0020 >> [ 21.963470] GPR08: 0001 c00a00fff8e0 >> c16d21a0 >> [ 21.963470] 
GPR12: c16e7b90 c7ff2700 c00a00a0 >> c003ffe30100 >> [ 21.963470] GPR16: c003ffe3 c14aa4de c00a009f >> c16d21b0 >> [ 21.963470] GPR20: c14de588 0001 c16d21b8 >> c00a00a0 >> [ 21.963470] GPR24: c00a00a0 >> c003ffe96000 >> [ 21.963470] GPR28: c00a00a0 c00a00a0 c003fffec000 >> c00a00fff8c0 >> [ 21.963802] NIP [c0079e18] pte_fragment_free+0x48/0xd0 >> [ 21.963838] LR [c0c79308] remove_pagetable+0x49c/0x5b4 >> [ 21.963873] Call Trace: >> [ 21.963890] [c003f8803780] [c003ffe997f0] 0xc003ffe997f0 >> (unreliable) >> [ 21.963933] [c003f88037b0] [] (null) >> [ 21.963969] [c003f88038c0] [c006f038] >> vmemmap_free+0x218/0x2e0 >> [ 21.964006] [c003f8803940] [c036f100] >> sparse_remove_one_section+0xd0/0x138 >> [ 21.964050] [c003f8803980] [c0383a50] >> __remove_pages+0x410/0x560 >> [ 21.964093] [c003f8803a90] [c0c784d8] >> arch_remove_memory+0x68/0xdc >> [ 21.964136] [c003f8803ad0] [c0385d74] >> __remove_memory+0xc4/0x110 >> [ 21.964180] [c003f8803b10] [c00d44e4] >> dlpar_remove_lmb+0x94/0x140 >> [ 21.964223] [c003f8803b50] [c00d52b4] >> dlpar_memory+0x464/0xd00 >> [ 21.964259] [c003f8803be0] [c00cd5c0] >> handle_dlpar_errorlog+0xc0/0x190 >> [ 21.964303] [c003f8803c50] [c00cd6bc] >> pseries_hp_work_fn+0x2c/0x60 >> [ 21.964346] [c003f8803c80] [c013a4a0] >> process_one_work+0x2b0/0x5a0 >> [ 21.964388] [c003f8803d10] [c013a818] >> worker_thread+0x88/0x610 >> [ 21.964434] [c003f8803db0] [c0143884] kthread+0x1a4/0x1b0 >> [ 21.964468] [c003f8803e20] [c000bdc4] >> ret_from_kernel_thread+0x5c/0x78 >> [ 21.964506] Instruction dump: >> [ 21.964527] fbe1fff8 f821ffd1 78638502 78633664 ebe9 7fff1a14 >> 395f0020 813f0020 >> [ 21.964569] 7d2907b4 7d2900d0 79290fe0 69290001 &
Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
On Thu, May 16, 2019 at 07:44:20PM +0530, srikanth wrote: > Hello, > > On power9 host, performing memory hotunplug from ppc64le guest results in > kernel oops. > > Kernel used : https://github.com/torvalds/linux/tree/v5.1 built using > ppc64le_defconfig for host and ppc64le_guest_defconfig for guest. > > Recreation steps: > > 1. Boot a guest with below mem configuration: > 33554432 > 8388608 > 4194304 > > > > > > > 2. From host hotplug 8G memory -> verify memory hotadded succesfully -> now > reboot guest -> once guest comes back try to unplug 8G memory > > mem.xml used: > > > 8 > 0 > > > > Memory attach and detach commands used: > virsh attach-device vm1 ./mem.xml --live > virsh detach-device vm1 ./mem.xml --live > > Trace seen inside guest after unplug, guest just hangs there forever: > > [ 21.962986] kernel BUG at arch/powerpc/mm/pgtable-frag.c:113! > [ 21.963064] Oops: Exception in kernel mode, sig: 5 [#1] > [ 21.963090] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA > pSeries > [ 21.963131] Modules linked in: xt_tcpudp iptable_filter squashfs fuse > vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi > ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress > raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx > xor raid6_pq multipath crc32c_vpmsum > [ 21.963281] CPU: 11 PID: 316 Comm: kworker/u64:5 Kdump: loaded Not > tainted 5.1.0-dirty #2 > [ 21.963323] Workqueue: pseries hotplug workque pseries_hp_work_fn > [ 21.963355] NIP: c0079e18 LR: c0c79308 CTR: > 8000 > [ 21.963392] REGS: c003f88034f0 TRAP: 0700 Not tainted (5.1.0-dirty) > [ 21.963422] MSR: 8282b033 CR: > 28002884 XER: 2004 > [ 21.963470] CFAR: c0c79304 IRQMASK: 0 > [ 21.963470] GPR00: c0c79308 c003f8803780 c1521000 > 00fff8c0 > [ 21.963470] GPR04: 0001 ffe30005 0005 > 0020 > [ 21.963470] GPR08: 0001 c00a00fff8e0 > c16d21a0 > [ 21.963470] GPR12: c16e7b90 c7ff2700 c00a00a0 > c003ffe30100 > [ 21.963470] GPR16: c003ffe3 
c14aa4de c00a009f > c16d21b0 > [ 21.963470] GPR20: c14de588 0001 c16d21b8 > c00a00a0 > [ 21.963470] GPR24: c00a00a0 > c003ffe96000 > [ 21.963470] GPR28: c00a00a0 c00a00a0 c003fffec000 > c00a00fff8c0 > [ 21.963802] NIP [c0079e18] pte_fragment_free+0x48/0xd0 > [ 21.963838] LR [c0c79308] remove_pagetable+0x49c/0x5b4 > [ 21.963873] Call Trace: > [ 21.963890] [c003f8803780] [c003ffe997f0] 0xc003ffe997f0 > (unreliable) > [ 21.963933] [c003f88037b0] [] (null) > [ 21.963969] [c003f88038c0] [c006f038] > vmemmap_free+0x218/0x2e0 > [ 21.964006] [c003f8803940] [c036f100] > sparse_remove_one_section+0xd0/0x138 > [ 21.964050] [c003f8803980] [c0383a50] > __remove_pages+0x410/0x560 > [ 21.964093] [c003f8803a90] [c0c784d8] > arch_remove_memory+0x68/0xdc > [ 21.964136] [c003f8803ad0] [c0385d74] > __remove_memory+0xc4/0x110 > [ 21.964180] [c003f8803b10] [c00d44e4] > dlpar_remove_lmb+0x94/0x140 > [ 21.964223] [c003f8803b50] [c00d52b4] > dlpar_memory+0x464/0xd00 > [ 21.964259] [c003f8803be0] [c00cd5c0] > handle_dlpar_errorlog+0xc0/0x190 > [ 21.964303] [c003f8803c50] [c00cd6bc] > pseries_hp_work_fn+0x2c/0x60 > [ 21.964346] [c003f8803c80] [c013a4a0] > process_one_work+0x2b0/0x5a0 > [ 21.964388] [c003f8803d10] [c013a818] > worker_thread+0x88/0x610 > [ 21.964434] [c003f8803db0] [c0143884] kthread+0x1a4/0x1b0 > [ 21.964468] [c003f8803e20] [c000bdc4] > ret_from_kernel_thread+0x5c/0x78 > [ 21.964506] Instruction dump: > [ 21.964527] fbe1fff8 f821ffd1 78638502 78633664 ebe9 7fff1a14 > 395f0020 813f0020 > [ 21.964569] 7d2907b4 7d2900d0 79290fe0 69290001 <0b09> 7c0004ac > 7d205028 3129 > [ 21.964613] ---[ end trace aaa571aa1636fee6 ]--- > [ 21.966349] > [ 21.966383] Sending IPI to other CPUs > [ 21.978335] IPI complete > [ 21.981354] kexec: Starting switchover sequence. > I'm in purgatory git bisect points to commit 4231aba000f5a4583dd9f67057aadb68c3eca99d Author: Nicholas Piggin Date: Fri Jul 27 21:48:17 2018 +1000 powerpc/64s:
Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
srikanth writes: > Hello, > > On power9 host, performing memory hotunplug from ppc64le guest results > in kernel oops. Thanks for the report. Did this used to work in the past? If so what is the last version that worked? > Kernel used : https://github.com/torvalds/linux/tree/v5.1 built using > ppc64le_defconfig for host and ppc64le_guest_defconfig for guest. > > Recreation steps: > > 1. Boot a guest with below mem configuration: > 33554432 > 8388608 > 4194304 > > > > > > > 2. From host hotplug 8G memory -> verify memory hotadded succesfully -> > now reboot guest -> once guest comes back try to unplug 8G memory I assume the reboot is required to trigger the bug? ie. if you unplug without rebooting it doesn't crash? > mem.xml used: > > > 8 > 0 > > > > Memory attach and detach commands used: > virsh attach-device vm1 ./mem.xml --live > virsh detach-device vm1 ./mem.xml --live > > Trace seen inside guest after unplug, guest just hangs there forever: > > [ 21.962986] kernel BUG at arch/powerpc/mm/pgtable-frag.c:113! 
> [ 21.963064] Oops: Exception in kernel mode, sig: 5 [#1] > [ 21.963090] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA > pSeries > [ 21.963131] Modules linked in: xt_tcpudp iptable_filter squashfs fuse > vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi > scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_decompress > zstd_compress lzo_compress raid10 raid456 async_raid6_recov async_memcpy > async_pq async_xor async_tx xor raid6_pq multipath crc32c_vpmsum > [ 21.963281] CPU: 11 PID: 316 Comm: kworker/u64:5 Kdump: loaded Not > tainted 5.1.0-dirty #2 > [ 21.963323] Workqueue: pseries hotplug workque pseries_hp_work_fn > [ 21.963355] NIP: c0079e18 LR: c0c79308 CTR: > 8000 > [ 21.963392] REGS: c003f88034f0 TRAP: 0700 Not tainted (5.1.0-dirty) > [ 21.963422] MSR: 8282b033 > CR: 28002884 XER: 2004 > [ 21.963470] CFAR: c0c79304 IRQMASK: 0 > [ 21.963470] GPR00: c0c79308 c003f8803780 c1521000 > 00fff8c0 Can you try not to word wrap these, it makes them much harder to read. 
There's some instructions here on configuring Thunderbird: https://www.kernel.org/doc/html/latest/process/email-clients.html#thunderbird-gui > [ 21.963470] GPR04: 0001 ffe30005 0005 > 0020 > [ 21.963470] GPR08: 0001 c00a00fff8e0 > c16d21a0 > [ 21.963470] GPR12: c16e7b90 c7ff2700 c00a00a0 > c003ffe30100 > [ 21.963470] GPR16: c003ffe3 c14aa4de c00a009f > c16d21b0 > [ 21.963470] GPR20: c14de588 0001 c16d21b8 > c00a00a0 > [ 21.963470] GPR24: c00a00a0 > c003ffe96000 > [ 21.963470] GPR28: c00a00a0 c00a00a0 c003fffec000 > c00a00fff8c0 > [ 21.963802] NIP [c0079e18] pte_fragment_free+0x48/0xd0 > [ 21.963838] LR [c0c79308] remove_pagetable+0x49c/0x5b4 > [ 21.963873] Call Trace: > [ 21.963890] [c003f8803780] [c003ffe997f0] 0xc003ffe997f0 > (unreliable) > [ 21.963933] [c003f88037b0] [] (null) > [ 21.963969] [c003f88038c0] [c006f038] > vmemmap_free+0x218/0x2e0 > [ 21.964006] [c003f8803940] [c036f100] > sparse_remove_one_section+0xd0/0x138 > [ 21.964050] [c003f8803980] [c0383a50] > __remove_pages+0x410/0x560 > [ 21.964093] [c003f8803a90] [c0c784d8] > arch_remove_memory+0x68/0xdc > [ 21.964136] [c003f8803ad0] [c0385d74] > __remove_memory+0xc4/0x110 > [ 21.964180] [c003f8803b10] [c00d44e4] > dlpar_remove_lmb+0x94/0x140 > [ 21.964223] [c003f8803b50] [c00d52b4] > dlpar_memory+0x464/0xd00 > [ 21.964259] [c003f8803be0] [c00cd5c0] > handle_dlpar_errorlog+0xc0/0x190 > [ 21.964303] [c003f8803c50] [c00cd6bc] > pseries_hp_work_fn+0x2c/0x60 > [ 21.964346] [c003f8803c80] [c013a4a0] > process_one_work+0x2b0/0x5a0 > [ 21.964388] [c003f8803d10] [c013a818] > worker_thread+0x88/0x610 > [ 21.964434] [c003f8803db0] [c0143884] kthread+0x1a4/0x1b0 > [ 21.964468] [c003f8803e20] [c000bdc4] > ret_from_kernel_thread+0x5c/0x78 > [ 21.964506] Instruction dump: > [ 21.964527] fbe1fff8 f821ffd1 78638502 78633664 ebe9 7fff1a14 > 395f0020 813f0020 > [ 21.964569] 7d2907b4 7d2900d0 79290fe0 69290001 <0b09> 7c0004ac > 7d20502
PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
Hello, On power9 host, performing memory hotunplug from ppc64le guest results in kernel oops. Kernel used : https://github.com/torvalds/linux/tree/v5.1 built using ppc64le_defconfig for host and ppc64le_guest_defconfig for guest. Recreation steps: 1. Boot a guest with below mem configuration: 33554432 8388608 4194304 2. From host hotplug 8G memory -> verify memory hotadded succesfully -> now reboot guest -> once guest comes back try to unplug 8G memory mem.xml used: 8 0 Memory attach and detach commands used: virsh attach-device vm1 ./mem.xml --live virsh detach-device vm1 ./mem.xml --live Trace seen inside guest after unplug, guest just hangs there forever: [ 21.962986] kernel BUG at arch/powerpc/mm/pgtable-frag.c:113! [ 21.963064] Oops: Exception in kernel mode, sig: 5 [#1] [ 21.963090] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA pSeries [ 21.963131] Modules linked in: xt_tcpudp iptable_filter squashfs fuse vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq multipath crc32c_vpmsum [ 21.963281] CPU: 11 PID: 316 Comm: kworker/u64:5 Kdump: loaded Not tainted 5.1.0-dirty #2 [ 21.963323] Workqueue: pseries hotplug workque pseries_hp_work_fn [ 21.963355] NIP: c0079e18 LR: c0c79308 CTR: 8000 [ 21.963392] REGS: c003f88034f0 TRAP: 0700 Not tainted (5.1.0-dirty) [ 21.963422] MSR: 8282b033 CR: 28002884 XER: 2004 [ 21.963470] CFAR: c0c79304 IRQMASK: 0 [ 21.963470] GPR00: c0c79308 c003f8803780 c1521000 00fff8c0 [ 21.963470] GPR04: 0001 ffe30005 0005 0020 [ 21.963470] GPR08: 0001 c00a00fff8e0 c16d21a0 [ 21.963470] GPR12: c16e7b90 c7ff2700 c00a00a0 c003ffe30100 [ 21.963470] GPR16: c003ffe3 c14aa4de c00a009f c16d21b0 [ 21.963470] GPR20: c14de588 0001 c16d21b8 c00a00a0 [ 21.963470] GPR24: c00a00a0 c003ffe96000 [ 21.963470] GPR28: c00a00a0 c00a00a0 c003fffec000 c00a00fff8c0 [ 
21.963802] NIP [c0079e18] pte_fragment_free+0x48/0xd0 [ 21.963838] LR [c0c79308] remove_pagetable+0x49c/0x5b4 [ 21.963873] Call Trace: [ 21.963890] [c003f8803780] [c003ffe997f0] 0xc003ffe997f0 (unreliable) [ 21.963933] [c003f88037b0] [] (null) [ 21.963969] [c003f88038c0] [c006f038] vmemmap_free+0x218/0x2e0 [ 21.964006] [c003f8803940] [c036f100] sparse_remove_one_section+0xd0/0x138 [ 21.964050] [c003f8803980] [c0383a50] __remove_pages+0x410/0x560 [ 21.964093] [c003f8803a90] [c0c784d8] arch_remove_memory+0x68/0xdc [ 21.964136] [c003f8803ad0] [c0385d74] __remove_memory+0xc4/0x110 [ 21.964180] [c003f8803b10] [c00d44e4] dlpar_remove_lmb+0x94/0x140 [ 21.964223] [c003f8803b50] [c00d52b4] dlpar_memory+0x464/0xd00 [ 21.964259] [c003f8803be0] [c00cd5c0] handle_dlpar_errorlog+0xc0/0x190 [ 21.964303] [c003f8803c50] [c00cd6bc] pseries_hp_work_fn+0x2c/0x60 [ 21.964346] [c003f8803c80] [c013a4a0] process_one_work+0x2b0/0x5a0 [ 21.964388] [c003f8803d10] [c013a818] worker_thread+0x88/0x610 [ 21.964434] [c003f8803db0] [c0143884] kthread+0x1a4/0x1b0 [ 21.964468] [c003f8803e20] [c000bdc4] ret_from_kernel_thread+0x5c/0x78 [ 21.964506] Instruction dump: [ 21.964527] fbe1fff8 f821ffd1 78638502 78633664 ebe9 7fff1a14 395f0020 813f0020 [ 21.964569] 7d2907b4 7d2900d0 79290fe0 69290001 <0b09> 7c0004ac 7d205028 3129 [ 21.964613] ---[ end trace aaa571aa1636fee6 ]--- [ 21.966349] [ 21.966383] Sending IPI to other CPUs [ 21.978335] IPI complete [ 21.981354] kexec: Starting switchover sequence. I'm in purgatory
[PATCH 3.16 014/202] ASoC: tlv320aic32x4: Kernel OOPS while entering DAPM standby mode
3.16.66-rc1 review patch. If anyone has any objections, please let me know.

--

From: b-ak

commit 667e9334fa64da2273e36ce131b05ac9e47c5769 upstream.

During the bootup of the kernel, the DAPM bias level is in the OFF state.
As soon as the DAPM framework kicks in it pushes the codec into STANDBY
state.

The probe function doesn't prepare the clock, and STANDBY state does a
clk_disable_unprepare() without checking the previous state. This leads
to an OOPS.

Not transitioning from an OFF state to the STANDBY state fixes the
problem.

Signed-off-by: b-ak
Signed-off-by: Mark Brown
[bwh: Backported to 3.16:
 - Open-code snd_soc_component_get_bias_level()
 - Adjust context]
Signed-off-by: Ben Hutchings
---
 sound/soc/codecs/tlv320aic32x4.c | 4 ++++
 1 file changed, 4 insertions(+)

--- a/sound/soc/codecs/tlv320aic32x4.c
+++ b/sound/soc/codecs/tlv320aic32x4.c
@@ -534,6 +534,10 @@ static int aic32x4_set_bias_level(struct
 	case SND_SOC_BIAS_PREPARE:
 		break;
 	case SND_SOC_BIAS_STANDBY:
+		/* Initial cold start */
+		if (codec->dapm.bias_level == SND_SOC_BIAS_OFF)
+			break;
+
 		/* Switch off BCLK_N Divider */
 		snd_soc_update_bits(codec, AIC32X4_BCLKN,
 				    AIC32X4_BCLKEN, 0);
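The effect of the cold-start check in the patch above can be sketched with a
small user-space model. This is an illustration under stated assumptions, not
the driver's code: model_codec, model_bias, model_set_bias_level() and the
clk_prepare_count field are hypothetical names, the counter merely stands in
for the prepare/enable balance of the codec clock, and the assumption that
the ON state is where the clock gets prepared is mine, not the commit
message's.

```c
#include <stdbool.h>

/* Hypothetical mirror of the DAPM bias levels named in the patch. */
enum model_bias {
	MODEL_BIAS_OFF,
	MODEL_BIAS_STANDBY,
	MODEL_BIAS_PREPARE,
	MODEL_BIAS_ON
};

struct model_codec {
	enum model_bias bias_level;	/* last level that was applied */
	int clk_prepare_count;		/* models the clock prepare/enable balance */
};

/* Apply a bias level. Returns 0 on success, or -1 where the clock
 * would be released without ever having been prepared, which is the
 * situation the commit message describes as leading to the oops.
 * cold_start_guard selects the behaviour with (true) or without
 * (false) the check added by the patch. */
static int model_set_bias_level(struct model_codec *codec,
				enum model_bias level, bool cold_start_guard)
{
	switch (level) {
	case MODEL_BIAS_ON:
		/* Assumption: stands in for preparing and enabling the clock. */
		codec->clk_prepare_count++;
		break;
	case MODEL_BIAS_STANDBY:
		/* Initial cold start: nothing was prepared, so skip the release. */
		if (cold_start_guard && codec->bias_level == MODEL_BIAS_OFF)
			break;
		/* Stands in for clk_disable_unprepare(). */
		if (codec->clk_prepare_count == 0)
			return -1;
		codec->clk_prepare_count--;
		break;
	case MODEL_BIAS_PREPARE:
	case MODEL_BIAS_OFF:
		break;
	}

	codec->bias_level = level;
	return 0;
}
```

Starting from MODEL_BIAS_OFF with a zero counter, a STANDBY request without
the guard reports the unbalanced release, while the same request with the
guard succeeds and leaves the counter untouched. A later ON followed by
STANDBY still releases the clock as before.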
[PATCH 4.19 053/110] drm/cirrus: Use drm_framebuffer_put to avoid kernel oops in clean-up
[ Upstream commit abf7b30d7f61d981bfcca65d1e8331b27021b475 ]

In the Cirrus driver, the regular clean-up code also performs the clean-up
of a failed initialization. If the fbdev's framebuffer was not initialized,
the clean-up will fail within drm_framebuffer_unregister_private. Booting
with cirrus.bpp=16 triggers this bug.

The framebuffer is currently stored directly within struct cirrus_fbdev. To
fix the bug, we turn it into a pointer that is only set for initialized
framebuffers. The fbdev's clean-up code skips uninitialized framebuffers.

The memory for struct drm_framebuffer is allocated dynamically. This requires
additional error handling within cirrusfb_create. The framebuffer clean-up
is now performed by drm_framebuffer_put, which also frees the data
structure's memory.

Link: https://bugzilla.suse.com/show_bug.cgi?id=1101822
Signed-off-by: Thomas Zimmermann
Link: http://patchwork.freedesktop.org/patch/msgid/20180720112743.27159-1-tzimmerm...@suse.de
Signed-off-by: Gerd Hoffmann
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/cirrus/cirrus_drv.h   |  2 +-
 drivers/gpu/drm/cirrus/cirrus_fbdev.c | 48 +++
 drivers/gpu/drm/cirrus/cirrus_mode.c  |  2 +-
 3 files changed, 29 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/cirrus/cirrus_drv.h b/drivers/gpu/drm/cirrus/cirrus_drv.h
index ce9db7aab225..a29f87e98d9d 100644
--- a/drivers/gpu/drm/cirrus/cirrus_drv.h
+++ b/drivers/gpu/drm/cirrus/cirrus_drv.h
@@ -146,7 +146,7 @@ struct cirrus_device {
 
 struct cirrus_fbdev {
 	struct drm_fb_helper helper;
-	struct drm_framebuffer gfb;
+	struct drm_framebuffer *gfb;
 	void *sysram;
 	int size;
 	int x1, y1, x2, y2; /* dirty rect */
diff --git a/drivers/gpu/drm/cirrus/cirrus_fbdev.c b/drivers/gpu/drm/cirrus/cirrus_fbdev.c
index b643ac92801c..82cc82e0bd80 100644
--- a/drivers/gpu/drm/cirrus/cirrus_fbdev.c
+++ b/drivers/gpu/drm/cirrus/cirrus_fbdev.c
@@ -22,14 +22,14 @@ static void cirrus_dirty_update(struct cirrus_fbdev *afbdev,
 	struct drm_gem_object *obj;
 	struct cirrus_bo *bo;
 	int src_offset, dst_offset;
-	int bpp = afbdev->gfb.format->cpp[0];
+	int bpp = afbdev->gfb->format->cpp[0];
 	int ret = -EBUSY;
 	bool unmap = false;
 	bool store_for_later = false;
 	int x2, y2;
 	unsigned long flags;
 
-	obj = afbdev->gfb.obj[0];
+	obj = afbdev->gfb->obj[0];
 	bo = gem_to_cirrus_bo(obj);
 
 	/*
@@ -82,7 +82,7 @@ static void cirrus_dirty_update(struct cirrus_fbdev *afbdev,
 	}
 	for (i = y; i < y + height; i++) {
 		/* assume equal stride for now */
-		src_offset = dst_offset = i * afbdev->gfb.pitches[0] + (x * bpp);
+		src_offset = dst_offset = i * afbdev->gfb->pitches[0] + (x * bpp);
 		memcpy_toio(bo->kmap.virtual + src_offset, afbdev->sysram + src_offset, width * bpp);
 
 	}
@@ -192,23 +192,26 @@ static int cirrusfb_create(struct drm_fb_helper *helper,
 		return -ENOMEM;
 
 	info = drm_fb_helper_alloc_fbi(helper);
-	if (IS_ERR(info))
-		return PTR_ERR(info);
+	if (IS_ERR(info)) {
+		ret = PTR_ERR(info);
+		goto err_vfree;
+	}
 
 	info->par = gfbdev;
 
-	ret = cirrus_framebuffer_init(cdev->dev, &gfbdev->gfb, &mode_cmd, gobj);
+	fb = kzalloc(sizeof(*fb), GFP_KERNEL);
+	if (!fb) {
+		ret = -ENOMEM;
+		goto err_drm_gem_object_put_unlocked;
+	}
+
+	ret = cirrus_framebuffer_init(cdev->dev, fb, &mode_cmd, gobj);
 	if (ret)
-		return ret;
+		goto err_kfree;
 
 	gfbdev->sysram = sysram;
 	gfbdev->size = size;
-
-	fb = &gfbdev->gfb;
-	if (!fb) {
-		DRM_INFO("fb is NULL\n");
-		return -EINVAL;
-	}
+	gfbdev->gfb = fb;
 
 	/* setup helper */
 	gfbdev->helper.fb = fb;
@@ -241,24 +244,27 @@ static int cirrusfb_create(struct drm_fb_helper *helper,
 	DRM_INFO(" pitch is %d\n", fb->pitches[0]);
 
 	return 0;
+
+err_kfree:
+	kfree(fb);
+err_drm_gem_object_put_unlocked:
+	drm_gem_object_put_unlocked(gobj);
+err_vfree:
+	vfree(sysram);
+	return ret;
 }
 
 static int cirrus_fbdev_destroy(struct drm_device *dev,
 				struct cirrus_fbdev *gfbdev)
 {
-	struct drm_framebuffer *gfb = &gfbdev->gfb;
+	struct drm_framebuffer *gfb = gfbdev->gfb;
 
 	drm_fb_helper_unregister_fbi(&gfbdev->helper);
 
-	if (gfb->obj[0]) {
-		drm_gem_object_put_unlocked(gfb->obj[0]);
-		gfb->obj[0] = NULL;
-	}
-
 	vfree(gfbdev->sysram);
 	drm_fb_helper_fini(&gfbdev->helper);
-	drm_framebuffer_unregister_private(gfb);
-	drm_framebuffer_cleanup(gfb);
+	if (gfb)
+		drm_framebuffer_put(gfb);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/cirrus/cirrus_mode.c
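The pattern described in the commit message (publish the framebuffer pointer
only after every initialization step has succeeded, unwind with goto labels
on failure, and let the clean-up path skip a NULL pointer) can be shown in
plain userspace C. This is only a rough sketch: struct fb, struct fbdev,
fb_init, fbdev_create and fbdev_destroy are hypothetical names, and
malloc/calloc/free stand in for the kernel allocators. It is not the driver
code.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from the cirrus driver. */
struct fb {
	int pitch;
};

struct fbdev {
	struct fb *fb;	/* stays NULL until creation fully succeeds */
	void *sysram;	/* shadow buffer, owned only after success */
};

/* Simulated initialization step that fails on request. */
static int fb_init(struct fb *fb, bool fail)
{
	if (fail)
		return -1;
	fb->pitch = 1024;
	return 0;
}

static int fbdev_create(struct fbdev *dev, size_t size, bool fail_init)
{
	struct fb *fb;
	void *sysram;
	int ret;

	sysram = malloc(size);
	if (!sysram)
		return -1;

	fb = calloc(1, sizeof(*fb));
	if (!fb) {
		ret = -1;
		goto err_free_sysram;
	}

	ret = fb_init(fb, fail_init);
	if (ret)
		goto err_free_fb;

	/* Publish the pointers only after every step has succeeded. */
	dev->sysram = sysram;
	dev->fb = fb;
	return 0;

	/* Unwind in reverse order of acquisition. */
err_free_fb:
	free(fb);
err_free_sysram:
	free(sysram);
	return ret;
}

static void fbdev_destroy(struct fbdev *dev)
{
	free(dev->sysram);	/* free(NULL) is a no-op */
	dev->sysram = NULL;

	/* Skip the framebuffer if creation never got that far. */
	if (dev->fb) {
		free(dev->fb);
		dev->fb = NULL;
	}
}
```

Because the device structure starts out zeroed, calling fbdev_destroy after a
failed fbdev_create touches no uninitialized state, which is the property the
patch aims for.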
[PATCH 4.19 030/103] ASoC: tlv320aic32x4: Kernel OOPS while entering DAPM standby mode
4.19-stable review patch.  If anyone has any objections, please let me know.

--

From: b-ak

commit 667e9334fa64da2273e36ce131b05ac9e47c5769 upstream.

During the bootup of the kernel, the DAPM bias level is in the OFF
state. As soon as the DAPM framework kicks in it pushes the codec
into STANDBY state.

The probe function doesn't prepare the clock, and STANDBY state does
a clk_disable_unprepare() without checking the previous state. This
leads to an OOPS.

Not transitioning from an OFF state to the STANDBY state fixes the
problem.

Signed-off-by: b-ak
Signed-off-by: Mark Brown
Cc: sta...@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman
---
 sound/soc/codecs/tlv320aic32x4.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/sound/soc/codecs/tlv320aic32x4.c
+++ b/sound/soc/codecs/tlv320aic32x4.c
@@ -822,6 +822,10 @@ static int aic32x4_set_bias_level(struct
 	case SND_SOC_BIAS_PREPARE:
 		break;
 	case SND_SOC_BIAS_STANDBY:
+		/* Initial cold start */
+		if (snd_soc_component_get_bias_level(component) == SND_SOC_BIAS_OFF)
+			break;
+
 		/* Switch off BCLK_N Divider */
 		snd_soc_component_update_bits(component, AIC32X4_BCLKN,
 				    AIC32X4_BCLKEN, 0);
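The cold-start guard can be modelled outside the kernel. The following is a
minimal sketch under stated assumptions: struct codec, enum bias_level,
codec_set_bias_level and clk_enable_count are hypothetical names, and the
counter merely stands in for the clock framework's enable count. It is not
the tlv320aic32x4 driver.

```c
/* Hypothetical model; none of these names exist in the real driver. */
enum bias_level { BIAS_OFF, BIAS_STANDBY, BIAS_PREPARE, BIAS_ON };

struct codec {
	enum bias_level level;	/* current level, BIAS_OFF at boot */
	int clk_enable_count;	/* stands in for the clock's enable count */
};

static void codec_set_bias_level(struct codec *c, enum bias_level next)
{
	switch (next) {
	case BIAS_ON:
		/* Models clk_prepare_enable(). */
		c->clk_enable_count++;
		break;
	case BIAS_PREPARE:
		break;
	case BIAS_STANDBY:
		/* Cold start: the clock was never enabled, nothing to undo. */
		if (c->level == BIAS_OFF)
			break;
		/* Models clk_disable_unprepare(). */
		c->clk_enable_count--;
		break;
	case BIAS_OFF:
		break;
	}

	/* Record the new level only after the transition has been handled. */
	c->level = next;
}
```

Without the BIAS_OFF check, the first OFF to STANDBY transition would drive
the counter below zero, which corresponds to the unbalanced disable the
commit message describes.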
[PATCH 4.20 034/117] ASoC: tlv320aic32x4: Kernel OOPS while entering DAPM standby mode
4.20-stable review patch.  If anyone has any objections, please let me know.

--

From: b-ak

commit 667e9334fa64da2273e36ce131b05ac9e47c5769 upstream.

During the bootup of the kernel, the DAPM bias level is in the OFF
state. As soon as the DAPM framework kicks in it pushes the codec
into STANDBY state.

The probe function doesn't prepare the clock, and STANDBY state does
a clk_disable_unprepare() without checking the previous state. This
leads to an OOPS.

Not transitioning from an OFF state to the STANDBY state fixes the
problem.

Signed-off-by: b-ak
Signed-off-by: Mark Brown
Cc: sta...@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman
---
 sound/soc/codecs/tlv320aic32x4.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/sound/soc/codecs/tlv320aic32x4.c
+++ b/sound/soc/codecs/tlv320aic32x4.c
@@ -822,6 +822,10 @@ static int aic32x4_set_bias_level(struct
 	case SND_SOC_BIAS_PREPARE:
 		break;
 	case SND_SOC_BIAS_STANDBY:
+		/* Initial cold start */
+		if (snd_soc_component_get_bias_level(component) == SND_SOC_BIAS_OFF)
+			break;
+
 		/* Switch off BCLK_N Divider */
 		snd_soc_component_update_bits(component, AIC32X4_BCLKN,
 				    AIC32X4_BCLKEN, 0);
[PATCH v2] ASoC: tlv320aic32x4: Kernel OOPS while entering DAPM standby mode
During the bootup of the kernel, the DAPM bias level is in the OFF
state. As soon as the DAPM framework kicks in it pushes the codec
into STANDBY state.

The probe function doesn't prepare the clock, and STANDBY state does
a clk_disable_unprepare() without checking the previous state. This
leads to an OOPS.

Not transitioning from an OFF state to the STANDBY state fixes the
problem.

Signed-off-by: b-ak
---
 sound/soc/codecs/tlv320aic32x4.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/sound/soc/codecs/tlv320aic32x4.c b/sound/soc/codecs/tlv320aic32x4.c
index e2b5a11b16d1..f03195d2ab2e 100644
--- a/sound/soc/codecs/tlv320aic32x4.c
+++ b/sound/soc/codecs/tlv320aic32x4.c
@@ -822,6 +822,10 @@ static int aic32x4_set_bias_level(struct snd_soc_component *component,
 	case SND_SOC_BIAS_PREPARE:
 		break;
 	case SND_SOC_BIAS_STANDBY:
+		/* Initial cold start */
+		if (snd_soc_component_get_bias_level(component) == SND_SOC_BIAS_OFF)
+			break;
+
 		/* Switch off BCLK_N Divider */
 		snd_soc_component_update_bits(component, AIC32X4_BCLKN,
 				    AIC32X4_BCLKEN, 0);
-- 
2.19.1
Re: [PATCH] ASoC: tlv320aic32x4: Kernel OOPS while entering DAPM standby mode
On Mon, Jan 07, 2019 at 12:59:07PM +, Mark Brown wrote:
> On Sat, Jan 05, 2019 at 10:16:22AM +0530, b-ak wrote:
> >
> > Hi Mark,
> >
> > Fixed the build error.
> >
> > Thanks,
> > Bhargav
> >
>
> Please submit patches following the process covered in
> submitting-patches.rst, don't send them as attachments to replies in the
> middle of threads.  Doing that confuses all the tooling for handling
> patches.

Ok. I made a mistake while sending it with Mutt. Will be sending it inline now.
Re: [PATCH] ASoC: tlv320aic32x4: Kernel OOPS while entering DAPM standby mode
On Sat, Jan 05, 2019 at 10:16:22AM +0530, b-ak wrote:
>
> Hi Mark,
>
> Fixed the build error.
>
> Thanks,
> Bhargav
>

Please submit patches following the process covered in
submitting-patches.rst, don't send them as attachments to replies in the
middle of threads.  Doing that confuses all the tooling for handling
patches.