Re: [RFC] taint: add module firmware crash taint support
On Fri, 8 May 2020 02:14:38 + Luis Chamberlain wrote: > The new helper module_firmware_crashed() uses LOCKDEP_STILL_OK as > this fact should in no way shape or form affect lockdep. This taint > is device driver specific. Looks fine to me. -- Steve
Re: [RFC] taint: add module firmware crash taint support
On Fri, May 08, 2020 at 02:14:38AM +, Luis Chamberlain wrote: > Device driver firmware can crash, and sometimes, this can leave your > system in a state which makes the device or subsystem completely > useless. Detecting this by inspecting /proc/sys/kernel/tainted instead > of scraping some magical words from the kernel log, which is driver > specific, is much easier. So instead provide a helper which lets drivers > annotate this. > > Once this happens, scrapers can easily scrape modules taint flags. > This will taint both the kernel and respective calling module. > > The new helper module_firmware_crashed() uses LOCKDEP_STILL_OK as > this fact should in no way shape or form affect lockdep. This taint > is device driver specific. > > Signed-off-by: Luis Chamberlain > --- > > Below is the full diff stat of manual inspection throughout the kernel > when this happens. My methodology is to just scrape for "crash" and > then study the driver a bit to see if indeed it seems like that the > firmware crashes there. In *many* cases there is even infrastructure > for this, so this is sometimes clearly obvious. Some other times it > required a bit of deciphering. > > The diff stat below is what I have so far, however the patch below > only includes the drivers that start with Q, as they were a source of > inspiration for this, and to make this RFC easier to read. > > If this seems sensible, I can follow up with the kernel helper first, > and then tackle each subsystem independently. > > I purposely skipped review of remoteproc and virtualization. That should > require its own separate careful review and considerations. > ... > diff --git a/include/linux/kernel.h b/include/linux/kernel.h > index 04a5885cec1b..19e1541c82c7 100644 > --- a/include/linux/kernel.h > +++ b/include/linux/kernel.h > @@ -601,7 +601,8 @@ extern enum system_states { > #define TAINT_LIVEPATCH 15 > #define TAINT_AUX16 > #define TAINT_RANDSTRUCT 17 > -#define TAINT_FLAGS_COUNT18 > +#define TAINT_FIRMWARE_CRASH 18 > +#define TAINT_FLAGS_COUNT19 > As others have already mentioned (and I guess it was your idea too) it would be nicer to split the changes into a set. We also will need to update Documentation/admin-guide/tainted-kernels.rst to reflect the inclusion of this new flag. -- Rafael
Re: [RFC] taint: add module firmware crash taint support
On Fri, 8 May 2020 13:16:04 +0300 Andy Shevchenko wrote: > > +++ b/include/trace/events/module.h > > @@ -26,7 +26,8 @@ struct module; > > { (1UL << TAINT_OOT_MODULE),"O" }, \ > > { (1UL << TAINT_FORCED_MODULE), "F" }, \ > > { (1UL << TAINT_CRAP), "C" }, \ > > - { (1UL << TAINT_UNSIGNED_MODULE), "E" }) > > + { (1UL << TAINT_UNSIGNED_MODULE), "E" }, \ > > + { (1UL << TAINT_FIRMWARE_CRASH),"Q" }) > > Perhaps split out the closing parenthesis to avoid changing additional line in > the future? I don't think that will make a difference, as the last line must not contain a comma. New updates will still add additional line changes just to insert a comma before adding a new flag. -- Steve > > > TRACE_EVENT(module_load,
Re: [RFC] taint: add module firmware crash taint support
On Fri, May 08, 2020 at 02:14:38AM +, Luis Chamberlain wrote: > Device driver firmware can crash, and sometimes, this can leave your > system in a state which makes the device or subsystem completely > useless. Detecting this by inspecting /proc/sys/kernel/tainted instead > of scraping some magical words from the kernel log, which is driver > specific, is much easier. So instead provide a helper which lets drivers > annotate this. > > Once this happens, scrapers can easily scrape modules taint flags. > This will taint both the kernel and respective calling module. > > The new helper module_firmware_crashed() uses LOCKDEP_STILL_OK as > this fact should in no way shape or form affect lockdep. This taint > is device driver specific. ... > +++ b/include/trace/events/module.h > @@ -26,7 +26,8 @@ struct module; > { (1UL << TAINT_OOT_MODULE),"O" }, \ > { (1UL << TAINT_FORCED_MODULE), "F" }, \ > { (1UL << TAINT_CRAP), "C" }, \ > - { (1UL << TAINT_UNSIGNED_MODULE), "E" }) > + { (1UL << TAINT_UNSIGNED_MODULE), "E" }, \ > + { (1UL << TAINT_FIRMWARE_CRASH),"Q" }) Perhaps split out the closing parenthesis to avoid changing additional line in the future? > TRACE_EVENT(module_load, -- With Best Regards, Andy Shevchenko
Re: [RFC] taint: add module firmware crash taint support
On Fri, May 08, 2020 at 08:11:24AM +0200, Daniel Vetter wrote: > On Fri, May 08, 2020 at 02:14:38AM +, Luis Chamberlain wrote: > > Device driver firmware can crash, and sometimes, this can leave your > > system in a state which makes the device or subsystem completely > > useless. Detecting this by inspecting /proc/sys/kernel/tainted instead > > of scraping some magical words from the kernel log, which is driver > > specific, is much easier. So instead provide a helper which lets drivers > > annotate this. > > > > Once this happens, scrapers can easily scrape modules taint flags. > > This will taint both the kernel and respective calling module. > > > > The new helper module_firmware_crashed() uses LOCKDEP_STILL_OK as > > this fact should in no way shape or form affect lockdep. This taint > > is device driver specific. > > > > Signed-off-by: Luis Chamberlain > > --- > > > > Below is the full diff stat of manual inspection throughout the kernel > > when this happens. My methodology is to just scrape for "crash" and > > then study the driver a bit to see if indeed it seems like that the > > firmware crashes there. In *many* cases there is even infrastructure > > for this, so this is sometimes clearly obvious. Some other times it > > required a bit of deciphering. > > > > The diff stat below is what I have so far, however the patch below > > only includes the drivers that start with Q, as they were a source of > > inspiration for this, and to make this RFC easier to read. > > > > If this seems sensible, I can follow up with the kernel helper first, > > and then tackle each subsystem independently. > > > > I purposely skipped review of remoteproc and virtualization. That should > > require its own separate careful review and considerations. > > > > drivers/atm/nicstar.c | 1 + > > drivers/bluetooth/hci_qca.c | 1 + > > drivers/gpu/drm/i915/i915_gpu_error.c | 1 + > > drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 2 ++ > > drivers/gpu/drm/msm/msm_gpu.c | 1 + > > I'm not finding the drm changes in your diff below ... That was on purpose, as this was an RFC and I didnt' want to clutter this with noise. > Also what Kees > said, I think best to split this up and properly cc per > get_maintainers.pl. Sounds good. Luis
Re: [RFC] taint: add module firmware crash taint support
On Fri, May 08, 2020 at 02:14:38AM +, Luis Chamberlain wrote: > Device driver firmware can crash, and sometimes, this can leave your > system in a state which makes the device or subsystem completely > useless. Detecting this by inspecting /proc/sys/kernel/tainted instead > of scraping some magical words from the kernel log, which is driver > specific, is much easier. So instead provide a helper which lets drivers > annotate this. > > Once this happens, scrapers can easily scrape modules taint flags. > This will taint both the kernel and respective calling module. > > The new helper module_firmware_crashed() uses LOCKDEP_STILL_OK as > this fact should in no way shape or form affect lockdep. This taint > is device driver specific. > > Signed-off-by: Luis Chamberlain > --- > > Below is the full diff stat of manual inspection throughout the kernel > when this happens. My methodology is to just scrape for "crash" and > then study the driver a bit to see if indeed it seems like that the > firmware crashes there. In *many* cases there is even infrastructure > for this, so this is sometimes clearly obvious. Some other times it > required a bit of deciphering. > > The diff stat below is what I have so far, however the patch below > only includes the drivers that start with Q, as they were a source of > inspiration for this, and to make this RFC easier to read. > > If this seems sensible, I can follow up with the kernel helper first, > and then tackle each subsystem independently. > > I purposely skipped review of remoteproc and virtualization. That should > require its own separate careful review and considerations. > > drivers/atm/nicstar.c | 1 + > drivers/bluetooth/hci_qca.c | 1 + > drivers/gpu/drm/i915/i915_gpu_error.c | 1 + > drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 2 ++ > drivers/gpu/drm/msm/msm_gpu.c | 1 + I'm not finding the drm changes in your diff below ... Also what Kees said, I think best to split this up and properly cc per get_maintainers.pl. -Daniel > drivers/media/dvb-frontends/sp8870.c| 1 + > drivers/media/pci/ttpci/av7110.c| 1 + > drivers/misc/mic/cosm/cosm_scif_server.c| 1 + > drivers/net/ethernet/8390/axnet_cs.c| 4 +++- > drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c| 1 + > drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 1 + > drivers/net/ethernet/brocade/bna/bfa_ioc.c | 1 + > drivers/net/ethernet/cavium/liquidio/lio_main.c | 1 + > drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 1 + > drivers/net/ethernet/ibm/ehea/ehea_main.c | 2 ++ > drivers/net/ethernet/qlogic/qed/qed_debug.c | 3 +++ > drivers/net/ipa/ipa_modem.c | 1 + > drivers/net/wimax/i2400m/rx.c | 1 + > drivers/net/wireless/ath/ath10k/pci.c | 2 ++ > drivers/net/wireless/ath/ath10k/sdio.c | 2 ++ > drivers/net/wireless/ath/ath10k/snoc.c | 1 + > drivers/net/wireless/ath/ath6kl/hif.c | 1 + > .../net/wireless/broadcom/brcm80211/brcmfmac/core.c | 1 + > drivers/net/wireless/marvell/mwl8k.c| 1 + > drivers/scsi/bfa/bfa_ioc.c | 1 + > drivers/scsi/megaraid/megaraid_sas_fusion.c | 1 + > drivers/scsi/qla4xxx/ql4_mbx.c | 2 ++ > drivers/staging/media/omap4iss/iss.c| 1 + > include/linux/kernel.h | 3 ++- > include/linux/module.h | 13 + > include/trace/events/module.h | 3 ++- > kernel/module.c | 5 +++-- > kernel/panic.c | 1 + > 33 files changed, 58 insertions(+), 5 deletions(-) > > diff --git a/drivers/net/ethernet/qlogic/qed/qed_debug.c > b/drivers/net/ethernet/qlogic/qed/qed_debug.c > index f4eebaabb6d0..9cc6287b889b 100644 > --- a/drivers/net/ethernet/qlogic/qed/qed_debug.c > +++ b/drivers/net/ethernet/qlogic/qed/qed_debug.c > @@ -7854,6 +7854,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void *buffer) >REGDUMP_HEADER_SIZE, >_size); > if (!rc) { > + module_firmware_crashed(); > *(u32 *)((u8 *)buffer + offset) = > qed_calc_regdump_header(cdev, PROTECTION_OVERRIDE, > cur_engine, > @@ -7869,6 +7870,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void *buffer) > rc = qed_dbg_fw_asserts(cdev, (u8 *)buffer + offset + > REGDUMP_HEADER_SIZE, _size); > if (!rc) { > + module_firmware_crashed(); >
Re: [RFC] taint: add module firmware crash taint support
On Thu, May 07, 2020 at 10:47:08PM -0700, Kees Cook wrote: > On Fri, May 08, 2020 at 02:14:38AM +, Luis Chamberlain wrote: > > Device driver firmware can crash, and sometimes, this can leave your > > system in a state which makes the device or subsystem completely > > useless. Detecting this by inspecting /proc/sys/kernel/tainted instead > > of scraping some magical words from the kernel log, which is driver > > specific, is much easier. So instead provide a helper which lets drivers > > annotate this. > > > > Once this happens, scrapers can easily scrape modules taint flags. > > This will taint both the kernel and respective calling module. > > > > The new helper module_firmware_crashed() uses LOCKDEP_STILL_OK as > > this fact should in no way shape or form affect lockdep. This taint > > is device driver specific. > > > > Signed-off-by: Luis Chamberlain > > --- > > > > Below is the full diff stat of manual inspection throughout the kernel > > when this happens. My methodology is to just scrape for "crash" and > > then study the driver a bit to see if indeed it seems like that the > > firmware crashes there. In *many* cases there is even infrastructure > > for this, so this is sometimes clearly obvious. Some other times it > > required a bit of deciphering. > > > > The diff stat below is what I have so far, however the patch below > > only includes the drivers that start with Q, as they were a source of > > inspiration for this, and to make this RFC easier to read. > > > > If this seems sensible, I can follow up with the kernel helper first, > > and then tackle each subsystem independently. > > > > I purposely skipped review of remoteproc and virtualization. That should > > require its own separate careful review and considerations. > > This all seems reasonable to me. You might need to break these up into > per-maintainer patches to get appropriate review. Perhaps land the > infrastructure and some initial patches via netdev and in the next > release send patches for DRM, media, etc? Works for me. I'll give it a few more days for review on the RFC before I shoot out a series for netdev. Luis
Re: [RFC] taint: add module firmware crash taint support
On Fri, May 08, 2020 at 02:14:38AM +, Luis Chamberlain wrote: > Device driver firmware can crash, and sometimes, this can leave your > system in a state which makes the device or subsystem completely > useless. Detecting this by inspecting /proc/sys/kernel/tainted instead > of scraping some magical words from the kernel log, which is driver > specific, is much easier. So instead provide a helper which lets drivers > annotate this. > > Once this happens, scrapers can easily scrape modules taint flags. > This will taint both the kernel and respective calling module. > > The new helper module_firmware_crashed() uses LOCKDEP_STILL_OK as > this fact should in no way shape or form affect lockdep. This taint > is device driver specific. > > Signed-off-by: Luis Chamberlain > --- > > Below is the full diff stat of manual inspection throughout the kernel > when this happens. My methodology is to just scrape for "crash" and > then study the driver a bit to see if indeed it seems like that the > firmware crashes there. In *many* cases there is even infrastructure > for this, so this is sometimes clearly obvious. Some other times it > required a bit of deciphering. > > The diff stat below is what I have so far, however the patch below > only includes the drivers that start with Q, as they were a source of > inspiration for this, and to make this RFC easier to read. > > If this seems sensible, I can follow up with the kernel helper first, > and then tackle each subsystem independently. > > I purposely skipped review of remoteproc and virtualization. That should > require its own separate careful review and considerations. This all seems reasonable to me. You might need to break these up into per-maintainer patches to get appropriate review. Perhaps land the infrastructure and some initial patches via netdev and in the next release send patches for DRM, media, etc? -- Kees Cook
Re: [RFC] taint: add module firmware crash taint support
On Fri, May 08, 2020 at 02:14:38AM +, Luis Chamberlain wrote: > Device driver firmware can crash, and sometimes, this can leave your > system in a state which makes the device or subsystem completely > useless. Detecting this by inspecting /proc/sys/kernel/tainted instead > of scraping some magical words from the kernel log, which is driver > specific, is much easier. So instead provide a helper which lets drivers > annotate this. > > Once this happens, scrapers can easily scrape modules taint flags. > This will taint both the kernel and respective calling module. > > The new helper module_firmware_crashed() uses LOCKDEP_STILL_OK as > this fact should in no way shape or form affect lockdep. This taint > is device driver specific. > > Signed-off-by: Luis Chamberlain > --- > > Below is the full diff stat of manual inspection throughout the kernel > when this happens. My methodology is to just scrape for "crash" and > then study the driver a bit to see if indeed it seems like that the > firmware crashes there. In *many* cases there is even infrastructure > for this, so this is sometimes clearly obvious. Some other times it > required a bit of deciphering. > > The diff stat below is what I have so far, however the patch below > only includes the drivers that start with Q, as they were a source of > inspiration for this, and to make this RFC easier to read. > > If this seems sensible, I can follow up with the kernel helper first, > and then tackle each subsystem independently. > > I purposely skipped review of remoteproc and virtualization. That should > require its own separate careful review and considerations. > I should also note that another motivation for this is it is *very* common for support issues to creep up, where the issues at hand are firmware related. Easily ruling out a firmware crash, and later devising tooling which captures this, can help with support. Luis
[RFC] taint: add module firmware crash taint support
Device driver firmware can crash, and sometimes, this can leave your system in a state which makes the device or subsystem completely useless. Detecting this by inspecting /proc/sys/kernel/tainted instead of scraping some magical words from the kernel log, which is driver specific, is much easier. So instead provide a helper which lets drivers annotate this. Once this happens, scrapers can easily scrape modules taint flags. This will taint both the kernel and respective calling module. The new helper module_firmware_crashed() uses LOCKDEP_STILL_OK as this fact should in no way shape or form affect lockdep. This taint is device driver specific. Signed-off-by: Luis Chamberlain --- Below is the full diff stat of manual inspection throughout the kernel when this happens. My methodology is to just scrape for "crash" and then study the driver a bit to see if indeed it seems like that the firmware crashes there. In *many* cases there is even infrastructure for this, so this is sometimes clearly obvious. Some other times it required a bit of deciphering. The diff stat below is what I have so far, however the patch below only includes the drivers that start with Q, as they were a source of inspiration for this, and to make this RFC easier to read. If this seems sensible, I can follow up with the kernel helper first, and then tackle each subsystem independently. I purposely skipped review of remoteproc and virtualization. That should require its own separate careful review and considerations. drivers/atm/nicstar.c | 1 + drivers/bluetooth/hci_qca.c | 1 + drivers/gpu/drm/i915/i915_gpu_error.c | 1 + drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 2 ++ drivers/gpu/drm/msm/msm_gpu.c | 1 + drivers/media/dvb-frontends/sp8870.c| 1 + drivers/media/pci/ttpci/av7110.c| 1 + drivers/misc/mic/cosm/cosm_scif_server.c| 1 + drivers/net/ethernet/8390/axnet_cs.c| 4 +++- drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c| 1 + drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 1 + drivers/net/ethernet/brocade/bna/bfa_ioc.c | 1 + drivers/net/ethernet/cavium/liquidio/lio_main.c | 1 + drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 1 + drivers/net/ethernet/ibm/ehea/ehea_main.c | 2 ++ drivers/net/ethernet/qlogic/qed/qed_debug.c | 3 +++ drivers/net/ipa/ipa_modem.c | 1 + drivers/net/wimax/i2400m/rx.c | 1 + drivers/net/wireless/ath/ath10k/pci.c | 2 ++ drivers/net/wireless/ath/ath10k/sdio.c | 2 ++ drivers/net/wireless/ath/ath10k/snoc.c | 1 + drivers/net/wireless/ath/ath6kl/hif.c | 1 + .../net/wireless/broadcom/brcm80211/brcmfmac/core.c | 1 + drivers/net/wireless/marvell/mwl8k.c| 1 + drivers/scsi/bfa/bfa_ioc.c | 1 + drivers/scsi/megaraid/megaraid_sas_fusion.c | 1 + drivers/scsi/qla4xxx/ql4_mbx.c | 2 ++ drivers/staging/media/omap4iss/iss.c| 1 + include/linux/kernel.h | 3 ++- include/linux/module.h | 13 + include/trace/events/module.h | 3 ++- kernel/module.c | 5 +++-- kernel/panic.c | 1 + 33 files changed, 58 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_debug.c b/drivers/net/ethernet/qlogic/qed/qed_debug.c index f4eebaabb6d0..9cc6287b889b 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_debug.c +++ b/drivers/net/ethernet/qlogic/qed/qed_debug.c @@ -7854,6 +7854,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void *buffer) REGDUMP_HEADER_SIZE, _size); if (!rc) { + module_firmware_crashed(); *(u32 *)((u8 *)buffer + offset) = qed_calc_regdump_header(cdev, PROTECTION_OVERRIDE, cur_engine, @@ -7869,6 +7870,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void *buffer) rc = qed_dbg_fw_asserts(cdev, (u8 *)buffer + offset + REGDUMP_HEADER_SIZE, _size); if (!rc) { + module_firmware_crashed(); *(u32 *)((u8 *)buffer + offset) = qed_calc_regdump_header(cdev, FW_ASSERTS, cur_engine, feature_size, @@ -7906,6 +7908,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void *buffer) rc = qed_dbg_grc(cdev, (u8 *)buffer + offset +